review.tizen.org Git - platform/kernel/linux-starfive.git/log

mmc: mmci_sdmmc: Fix clear busyd0end irq flag

The busyd0 line transition can be very fast. The busy request may be
completed by busy_d0end, without waiting for the busy_d0 steps. Therefore,
clear the busyd0end irq flag, even if no busy_status.

Fixes: 0e68de6aa7b1 ("mmc: mmci: sdmmc: add busy_complete callback")
Cc: stable@vger.kernel.org
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200325143409.13005-2-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

dt-bindings: mmc: Fix node name in an example

The $nodename allows only "mmc@*" whereas the example node is named
"sdhci".

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Link: https://lore.kernel.org/r/20200317093922.20785-19-lkundrak@v3.sk
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: core: Re-work the code for eMMC sanitize

The error path for sanitize operations that completes with -ETIMEDOUT, is
tightly coupled with the internal request handling code of the core. More
precisely, mmc_wait_for_req_done() checks for specific sanitize errors.
This is not only inefficient as it affects all types of requests, but also
hackish.

Therefore, let's improve the behaviour by moving the error path out of the
mmc core. To do that, retuning needs to be held while running the sanitize
operation.

Moreover, to avoid exporting unnecessary symbols to the mmc block module,
let's move the code into the mmc_ops.c file. While updating the actual
code, let's also take the opportunity to clean up some of the mess around
it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20200316152152.15122-1-ulf.hansson@linaro.org

mmc: sdhci: use FIELD_GET for preset value bit masks

Use the FIELD_GET macro to get access to the register fields.
Delete the shift macros.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20200312110050.21732-1-yamada.masahiro@socionext.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-of-at91: Display clock changes for debug purpose only

The sdhci_at91_set_clks_presets() function is called multiple times at
runtime and the messages are shown on the console. Display clk mul, gck
rate and clk base for debug purpose only.

Signed-off-by: Cristian Birsan <cristian.birsan@microchip.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200312142904.232822-1-tudor.ambarus@microchip.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: iproc: Add custom set_power() callback for bcm2711

The controller needs a valid bus voltage in its power register
regardless of whether an external regulator is taking care of the power
supply.

The sdhci core already provides a helper function for this,
sdhci_set_power_and_bus_voltage(), so create a bcm2711 specific 'struct
sdhci_ops' which makes use of it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-10-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: am654: Use sdhci_set_power_and_voltage()

The sdhci core provides a helper function with the same functionality as
this controller's set_power() callback. Use it instead.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-8-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: at91: Use sdhci_set_power_and_voltage()

The sdhci core provides a helper function with the same functionality as
this controller's set_power() callback. Use it instead.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-5-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: milbeaut: Use sdhci_set_power_and_voltage()

The sdhci core provides a helper function with the same functionality as
this controller's set_power() callback. Use it instead.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-4-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: arasan: Use sdhci_set_power_and_voltage()

The sdhci core provides a helper function with the same functionality as
this controller's set_power() callback. Use it instead.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-3-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: Introduce sdhci_set_power_and_bus_voltage()

Some controllers diverge from the standard way of setting power and need
their bus voltage register to be configured regardless of the whether
they use regulators. As this is a common pattern across sdhci hosts,
create a helper function.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200306174413.20634-2-nsaenzjulienne@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: vub300: Use scnprintf() for avoiding potential buffer overflow

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20200311080439.13928-1-tiwai@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

dt-bindings: mmc: synopsys-dw-mshc: fix clock-freq-min-max in example

A test with the command below does not detect all errors
in combination with 'additionalProperties: false' and
allOf:
- $ref: "synopsys-dw-mshc-common.yaml#"
allOf:
- $ref: "mmc-controller.yaml#"

'additionalProperties' applies to all properties that are not
accounted-for by 'properties' or 'patternProperties' in
the immediate schema.

First when we combine synopsys-dw-mshc.yaml,
synopsys-dw-mshc-common.yaml and mmc-controller.yaml it gives
this error:

Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.example.dt.yaml:
mmc@12200000: 'clock-freq-min-max' does not match any of the regexes:
'^.*@[0-9]+$', '^clk-phase-(legacy|sd-hs|mmc-(hs|hs[24]00|ddr52)|
uhs-(sdr(12|25|50|104)|ddr50))$', 'pinctrl-[0-9]+'

'clock-freq-min-max' is deprecated, so replace it by 'max-frequency'.

make ARCH=arm dt_binding_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.yaml

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20200307160556.16226-1-jbx6244@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

sdhci: tegra: Enable MMC_CAP_WAIT_WHILE_BUSY host capability

Tegra sdhci host supports HW busy detection of the device busy
signaling over data0 lane.

So, this patch enables host capability MMC_CAP_wAIT_WHILE_BUSY.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Link: https://lore.kernel.org/r/1583941675-9884-2-git-send-email-skomatineni@nvidia.com
[Ulf: Lumped together the caps assignments]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

sdhci: tegra: Implement Tegra specific set_timeout callback

Tegra host supports HW busy detection and timeouts based on the
count programmed in SDHCI_TIMEOUT_CONTROL register and max busy
timeout it supports is 11s in finite busy wait mode.

Some operations like SLEEP_AWAKE, ERASE and flush cache through
SWITCH commands take longer than 11s and Tegra host supports
infinite HW busy wait mode where HW waits forever till the card
is busy without HW timeout.

This patch implements Tegra specific set_timeout sdhci_ops to allow
switching between finite and infinite HW busy detection wait modes
based on the device command expected operation time.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1583941675-9884-1-git-send-email-skomatineni@nvidia.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-omap: Add Support for Suspend/Resume

Add power management ops which save and restore the driver context and
facilitate a system suspend and resume.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Link: https://lore.kernel.org/r/20200305151228.24692-1-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: simplify execute_tuning

After refactoring, 'ret' variable is superfluous. Remove it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20200213163715.8212-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: Use BITS_PER_LONG helper

Use the existing BITS_PER_LONG helper definition instead of calculating
this value.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20200302093534.9055-1-geert+renesas@glider.be
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: cqhci: Update cqhci memory ioresource name

Update cqhci memory ioresource name from cqhci_mem to cqhci since
suffix _mem is redundant.

Only sdhci-msm driver is making use of this resource as of now.
No other vendor's driver is using it. So this update shouldn't affect
any other vendor's cqhci functionality.

Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/1583328320-9981-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-msm: Deactivate CQE during SDHC reset

When SDHC gets reset (E.g. in runtime suspend path), CQE also gets
reset and goes to disable state. But s/w state still points it as CQE
is in enabled state. Since s/w and h/w states goes out of sync,
it results in s/w request timeout for subsequent CQE requests.

To synchronize CQE s/w and h/w state during SDHC reset,
explicitly deactivate CQE just before SDHC reset.

Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1583503724-13943-3-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: cqhci: Add cqhci_deactivate()

Host controllers can reset CQHCI either directly or as a consequence of
host controller reset. Add cqhci_deactivate() which puts the CQHCI
driver into a state that is consistent with that.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Link: https://lore.kernel.org/r/1583503724-13943-2-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200226223125.GA20630@embeddedor
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmc_test: Pass different sg lists for non-blocking requests

Supply a separate sg list for each of the request in non-blocking
IO test cases where two requests will be issued at same time.

Otherwise, sg memory may get unmapped when a request is done while
same memory is being accessed by controller from the other request,
and it leads to iommu errors with below call stack:

__arm_lpae_unmap+0x2e0/0x478
arm_lpae_unmap+0x54/0x70
arm_smmu_unmap+0x64/0xa4
__iommu_unmap+0xb8/0x1f0
iommu_unmap_fast+0x38/0x48
__iommu_dma_unmap+0x88/0x108
iommu_dma_unmap_sg+0x90/0xa4
sdhci_post_req+0x5c/0x78
mmc_test_start_areq+0x10c/0x120 [mmc_test]
mmc_test_area_io_seq+0x150/0x264 [mmc_test]
mmc_test_rw_multiple+0x174/0x1c0 [mmc_test]
mmc_test_rw_multiple_sg_len+0x44/0x6c [mmc_test]
mmc_test_profile_sglen_wr_nonblock_perf+0x6c/0x94 [mmc_test]
mtf_test_write+0x238/0x3cc [mmc_test]

Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582714668-17247-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

dt-bindings: mmc: sdhci-msm: Add CQE reg map

CQE feature has been enabled on sdhci-msm. Add CQE reg map
and reg names that need to be supplied for supporting CQE feature.

Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/1582545470-11530-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-sprd: Set the missing MMC_CAP_WAIT_WHILE_BUSY flag

The Spreadtrum host controller supports HW busy detection for commands with
R1B responses, but also for I/O operations. This means when the host gets a
transfer complete event, that always indicates the busy signal is released.

Let's inform the mmc core about this, via setting the corresponding
MMC_CAP_WAIT_WHILE_BUSY flag, as to remove some redundant software busy
polling.

Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/96f16647f6a6e8cb058c44e46c61b122df027059.1582535202.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: core: Fix indentation

sdio_single_irq_set() was indented with a mix of tabs and spaces.

Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Link: https://lore.kernel.org/r/20200221163147.608677-1-Jerome.Pouiller@silabs.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: restore pin state when resume back

In some low power mode, SoC will lose the pin state, so need to restore
the pin state when resume back.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/r/1582100757-20683-8-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: clear DMA_SEL when disable DMA mode

Currently, when use standard tuning, driver default disable DMA just before
send tuning command. But on i.MX8 usdhc, this is not enough. Need also clear
DMA_SEL. If not, once the DMA_SEL select AMDA2 before, even dma already disabled,
when send tuning command, usdhc will still prefetch the ADMA script from wrong
DMA address, then we will see IOMMU report some error which show lack of TLB
mapping.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-7-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: clear pending interrupt and halt cqhci

On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card
as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the
the 1st linux configure power/clock for the 2nd Linux.

When the 2nd Linux is booting into rootfs stage, we let the 1st Linux
to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump
as following, after we clear the pending interrupt and halt CQCTL, issue
gone.

[ 1.334594] mmc2: Got command interrupt 0x00000001 even though no command operation was in progress.
[ 1.334595] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 1.334599] mmc2: sdhci: Sys addr: 0xa05dcc00 | Version: 0x00000002
[ 1.345538] mmc2: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000000
[ 1.345541] mmc2: sdhci: Argument: 0x00018000 | Trn mode: 0x00000033
[ 1.345543] mmc2: sdhci: Present: 0x01f88008 | Host ctl: 0x00000031
[ 1.345547] mmc2: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 1.357903] mmc2: sdhci: Wake-up: 0x00000008 | Clock: 0x0000003f
[ 1.357905] mmc2: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 1.357908] mmc2: sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
[ 1.357911] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000502
[ 1.370268] mmc2: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 1.370270] mmc2: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 1.370273] mmc2: sdhci: Resp[0]: 0x00000b00 | Resp[1]: 0xffffffff
[ 1.370276] mmc2: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00d00f00
[ 1.382132] mmc2: sdhci: Host ctl2: 0x00000000
[ 1.382135] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xa2040208

[ 2.060932] mmc2: Unexpected interrupt 0x00004000.
[ 2.065538] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 2.071720] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000002
[ 2.077902] mmc2: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000001
[ 2.084083] mmc2: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 2.090264] mmc2: sdhci: Present: 0x01f88009 | Host ctl: 0x00000011
[ 2.096446] mmc2: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 2.102627] mmc2: sdhci: Wake-up: 0x00000008 | Clock: 0x000010ff
[ 2.108809] mmc2: sdhci: Timeout: 0x0000008f | Int stat: 0x00004000
[ 2.114990] mmc2: sdhci: Int enab: 0x007f1003 | Sig enab: 0x007f1003
[ 2.121171] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000502
[ 2.127353] mmc2: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 2.133534] mmc2: sdhci: Cmd: 0x0000371a | Max curr: 0x00ffffff
[ 2.139715] mmc2: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
[ 2.145896] mmc2: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00d00f00
[ 2.152077] mmc2: sdhci: Host ctl2: 0x00000000
[ 2.156342] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-6-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: Add an new esdhc_soc_data for i.MX8MM

Add new esdhc_soc_data, with compatible string "fsl,imx8mm-usdhc".

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-5-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: add flag ESDHC_FLAG_BROKEN_AUTO_CMD23

Since L4.15, community involve the commit 105819c8a545 ("mmc: core: use mrq->sbc
when sending CMD23 for RPMB"), let the usdhc to decide whether to use ACMD23 for
RPMB. This CMD23 for RPMB need to set the bit 31 to its argument, if not, the
RPMB write operation will return general fail.

According to the sdhci logic, SDMA mode will disable the ACMD23, and only in
ADMA mode, it will chose to use ACMD23 if the host support. But according to
debug, and confirm with IC, the imx6qpdl/imx6sx/imx6sl/imx7d do not support
the ACMD23 feature completely. These SoCs only use the 16 bit block count of
the register 0x4 (BLOCK_ATT) as the CMD23's argument in ACMD23 mode, which
means it will ignore the upper 16 bit of the CMD23's argument. This will block
the reliable write operation in RPMB, because RPMB reliable write need to set
the bit31 of the CMD23's argument. This is the hardware limitation. So for
imx6qpdl/imx6sx/imx6sl/imx7d, it need to broke the ACMD23 for eMMC, SD card do
not has this limitation, because SD card do not support reliable write.

For imx6ul/imx6ull/imx6sll/imx7ulp/imx8, it support the ACMD23 completely, it
change to use the 0x0 register (DS_ADDR) to put the CMD23's argument in ADMA mode.

This patch add a new flag ESDHC_FLAG_BROKEN_AUTO_CMD23, and add this flag to
imx6q/imx6sx/imx6sl/imx7d, and due to the imx6sll share the same compatible string
with imx6sx before, and imx6sll do not has this limitation, so add a new compatible
string for imx6sll.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-4-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: optimize the strobe dll setting

After set the STROBE SLV delay target value, it need to wait some
time to let the usdhc lock the REF and SLV clock. In normal case,
1~2us is enough for imx8/imx6 and imx7d, and 4~5us is enough for
imx7ulp, but when do reboot stress test or do the bind/unbind stress
test, sometimes need to wait about 10us to get the status lock.

This patch optimize delay handle method, only print the warning
message if the status is still not lock after 1ms delay.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-3-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: optimize the clock setting

When force clock off, check the SDOFF of register PRSSTAT to make sure
the clock is gate off. Before force clock on, check the SDSTB of register
PRSSTAT to make sure the clock is stable, this will eliminate the clock
glitch.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-2-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: add strobe-dll-delay-target support

strobe-dll-delay-target is the delay cell add on the strobe line.
Strobe line the the uSDHC loopback read clock which is use in HS400
mode. Different strobe-dll-delay-target may need to set for different
board/SoC. If this delay cell is not set to an appropriate value,
we may see some read operation meet CRC error after HS400 mode select
which already pass the tuning.

This patch add the strobe-dll-delay-target setting in driver, so that
user can easily config this delay cell in dts file.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100757-20683-1-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

doc: dt: fsl-imx-esdhc: add strobe-dll-delay-target binding

Add fsl,strobe-dll-delay-target binding.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/r/1582100704-20601-1-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: restore the per_clk rate in PM_RUNTIME

When pm_runtime_suspend is run, a call to SCFW power off the SS (SS is a
power domain, usdhc belong to this SS power domain) in which the resource
resides is made. The SCFW can power off the SS if no other resource in
active in that SS. If so, all state associated with all the resources within
the SS that is powered off is lost, this includes the clock rates, clock state
etc. When pm_runtime_resume is called, the SS associated with that resource
is powered up. But the clocks are left in the default state.

This patch restore clock rate in pm_runtime_resume, make sure the
clock is right rather than depending on the default state setting
by SCFW.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100563-20555-5-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhci-imx: retune needed for Mega/Mix enabled SoCs

For Mega/Mix enabled SoCs like MX7D and MX6SX, uSDHC will lost power in
LP mode no matter whether the MMC_KEEP_POWER flag is set or not.
This may cause state misalign between kernel and HW, especially for
SDIO3.0 WiFi cards.
e.g. SDIO WiFi driver usually will keep power during system suspend.
And after resume, no card re-enumeration called.
But the tuning state is lost due to Mega/Mix.
Then CRC error may happen during next data transfer.

So we should always fire a mmc_retune_needed() for such type SoC
to tell MMC core retuning is needed for next data transfer.
mmc: sdhci-esdhci-imx: retune needed for Mega/Mix enabled SoCs

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1582100563-20555-4-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-esdhc-imx: no fail when no pinctrl available

When using jailhouse to support two Linux on i.MX8MQ EVK, we use the
1st Linux to configure pinctrl for the 2nd Linux. Then the 2nd Linux
could use the mmc without pinctrl driver.

So give a warning message when no pinctrl available, but no fail probe.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/1582100563-20555-3-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci: do not enable card detect interrupt for gpio cd type

Except SDHCI_QUIRK_BROKEN_CARD_DETECTION and MMC_CAP_NONREMOVABLE,
we also do not need to handle controller native card detect interrupt
for gpio cd type.
If we wrong enabled the card detect interrupt for gpio case, it will
cause a lot of unexpected card detect interrupts during data transfer
which should not happen.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/1582100563-20555-2-git-send-email-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci_am654: Enable DLL only for some speed modes

Its recommended that DLL must only be enabled for SDR50, DDR50, DDR52,
SDR104, HS200 and HS400 speed modes. Move DLL configuration to its own
function and call it only in the above speed modes.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108150920.14547-4-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci_am654: Update OTAPDLY writes

According to the latest AM65x Data Manual[1], a different output tap
delay value is optimum for a given speed mode. Therefore, deprecate the
ti,otap-del-sel binding and introduce a new binding for each of the
possible MMC/SD speed modes. If the legacy mode is not found, fall back
to old binding to maintain dts compatibility.

[1] http://www.ti.com/lit/gpn/am6526

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108150920.14547-3-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

dt-bindings: mmc: sdhci-am654: Update Output tap delay binding

According to latest AM65x Data Manual[1], a different output tap delay
value is recommended for all speed modes. Therefore, replace the
ti,otap-del-sel binding with one ti,otap-del-sel- for each MMC/SD speed
mode.

[1] http://www.ti.com/lit/gpn/am6526

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Link: https://lore.kernel.org/r/20200108150920.14547-2-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: host: hsq: Add missing MODULE_LICENSE() and MODULE_DESCRIPTION()

Add missing MODULE_LICENSE() and MODULE_DESCRIPTION() in hsq driver to
fix below warning when compiling the hsq as a module.

"WARNING: modpost: missing MODULE_LICENSE() in drivers/mmc/host/mmc_hsq.o".

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/98ce471185f037fce57520763621590588766381.1582161803.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci: Add support for sdmmc variant revision 2.0

This patch adds a sdmmc variant revision 2.0. This revision is backward
compatible with 1.1, but adds DMA linked list support.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-10-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci_sdmmc: Implement signal voltage callbacks

To prepare the voltage switch procedure, the VSWITCHEN bit must be set
before sending the CMD11. To confirm completion of voltage switch, the
VSWEND flag must be checked.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-9-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci: Add callbacks for to manage signal voltage switch

A variant may need to define some actions before and after a voltage
switch.

This patch adds 2 callbacks to manage signal voltage switch in the struct
mmci_host_ops. ->pre_sig_volt_switch() allows to prepare a signal voltage
switch before sending the SD_SWITCH_VOLTAGE command (CMD11).
->post_sig_volt_switch callback allows specific actions to be executed,
after the I/O signal voltage level has been changed.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-8-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci_sdmmc: Add execute tuning with delay block

The hardware delay block is used to align the sampling clock on the data
received by SDMMC. It is mandatory for SDMMC to support the SDR104 mode.
The delay block is used to generate an output clock which is dephased from
the input clock. The phase of the output clock must be programmed by the
execute tuning interface.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-7-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

dt-bindings: mmc: mmci: add delay block base register for sdmmc

To support the sdr104 mode, the sdmmc variant has a hardware delay block to
manage the clock phase when sampling data received by the card.

This patch adds a second base register (optional) for sdmmc delay block.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200128090636.13689-6-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci: Add private pointer for variant

In variant init function, some references may be allocated for variant
specific usage. Add a private void* to mmci_host struct allows at variant
functions to access on this references by mmci_host structure.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-5-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci: Add a reference at mmc_host_ops in mmci struct

The variant init function may need to add a mmc_host_ops, for example to
add the execute_tuning support if this feature is available. This patch
adds mmc_host_ops pointer in mmci struct.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-4-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci_sdmmc: Rename sdmmc_priv struct to sdmmc_idma

This patch renames sdmmc_priv struct to sdmmc_idma which is assigned to
host->dma_priv.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-3-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: mmci_sdmmc: Replace sg_dma_xxx macros

sg_dma_xxx should be used after a dma_map_sg call has been done to get bus
addresses of each of the SG entries and their lengths. But mmci_host_ops
validate_data can be called before dma_map_sg. This patch replaces theses
macros by sg->offset and sg->length which are always defined.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200128090636.13689-2-ludovic.barre@st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: core: Fixup support for HW busy detection for HPI commands

In case the host specify a max_busy_timeout, we need to validate that the
needed timeout for the HPI command conforms to that requirement. If that's
not the case, let's convert from a R1B response to a R1 response, as to
instruct the host to avoid HW busy detection.

Additionally, when R1B is used we must also inform the host about the busy
timeout for the command, so let's do that via updating cmd.busy_timeout.

Finally, when R1B is used and in case the host supports HW busy detection,
there should be no need for doing polling, so then skip that.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-12-ulf.hansson@linaro.org

mmc: core: Convert to mmc_poll_for_busy() for HPI commands

Rather than open coding the polling loop in mmc_interrupt_hpi(), let's
convert to use mmc_poll_for_busy().

Note that, moving to mmc_poll_for_busy() for HPI also improves the
behaviour according to below.

- Adds support for polling via the optional ->card_busy() host ops.
- Require R1_READY_FOR_DATA to be set in the CMD13 response before exiting
the polling loop.
- Adds a throttling mechanism to avoid CPU hogging when polling.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-11-ulf.hansson@linaro.org

mmc: core: Drop redundant out-parameter to mmc_send_hpi_cmd()

The 'u32 *status' is unused by the caller, so let's drop it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-10-ulf.hansson@linaro.org

mmc: core: Convert to mmc_poll_for_busy() for erase/trim/discard

Rather than open coding the polling loop in mmc_do_erase(), let's convert
to use mmc_poll_for_busy().

To allow a slightly different error parsing during polling, compared to the
__mmc_switch() case, a new in-parameter to mmc_poll_for_busy() is needed,
but other than that the conversion is straight forward.

Besides addressing the open coding issue, moving to mmc_poll_for_busy() for
erase/trim/discard improves the behaviour according to below.

- Adds support for polling via the optional ->card_busy() host ops.
- Returns zero to indicate success when the final polling attempt finds the
card non-busy, even if the timeout expired.
- Exits the polling loop when state moves to R1_STATE_TRAN, rather than
when leaving R1_STATE_PRG.
- Decreases the starting range for throttling to 32-64us.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-9-ulf.hansson@linaro.org

mmc: core: Update CMD13 busy check for CMD6 commands

Through mmc_poll_for_busy() a CMD13 may be sent to get the status of the
(e)MMC card. If the state of the card is R1_STATE_PRG, the card is
considered as being busy, which means we continue to poll with CMD13. This
seems to be sufficient, but it's also unnecessary fragile, as it means a
new command/request could potentially be sent to the card when it's in an
unknown state.

To try to improve the situation, but also to move towards a more consistent
CMD13 polling behaviour in the mmc core, let's deploy the same policy we
use for regular I/O write requests. In other words, let's check that card
returns to the R1_STATE_TRAN and that the R1_READY_FOR_DATA bit is set in
the CMD13 response, before exiting the polling loop.

Note that, potentially this changed behaviour could lead to unnecessary
waiting for the timeout to expire, if the card for some reason, moves to an
unexpected error state. However, as we bail out from the polling loop when
R1_SWITCH_ERROR bit is set or when the CMD13 fails, this shouldn't be an
issue.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-8-ulf.hansson@linaro.org

mmc: core: Enable re-use of mmc_blk_in_tran_state()

To allow subsequent changes to re-use the code from the static function
mmc_blk_in_tran_state(), let's move it to a public header. While at it,
let's also rename it to mmc_ready_for_data(), as to try to better describe
its purpose.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-7-ulf.hansson@linaro.org

mmc: core: Split up mmc_poll_for_busy()

To make the code more readable, move the part that gets the busy status of
the card out into a separate function, mmc_busy_status(). Then call it from
mmc_poll_for_busy().

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-6-ulf.hansson@linaro.org

mmc: core: Drop redundant in-parameter to __mmc_switch()

The use_busy_signal in-parameter is set true by all callers of
__mmc_switch(), hence it's redundant so drop it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-5-ulf.hansson@linaro.org

mmc: core: Extend mmc_switch_status() to rid of __mmc_switch_status()

To simplify code, let's extend mmc_switch_status() to cope with needs
addressed in __mmc_switch_status(). Then move all users to the updated
mmc_switch_status() API and drop __mmc_switch_status() altogether.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-4-ulf.hansson@linaro.org

mmc: core: Drop unused define

The last user of MMC_OPS_TIMEOUT_MS was recently removed, however the
define stayed around. Let's remove it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-3-ulf.hansson@linaro.org

mmc: core: Throttle polling rate for CMD6

In mmc_poll_for_busy() we loop continuously, either by sending a CMD13 or
by invoking the ->card_busy() host ops, as to detect when the card stops
signaling busy. This behaviour is problematic as it may cause CPU hogging,
especially when the busy signal time reaches beyond a few ms.

Let's fix the issue by adding a throttling mechanism, that inserts a
usleep_range() in between the polling attempts. The sleep range starts at
32-64us, but increases for each loop by a factor of 2, up until the range
reaches ~32-64ms. In this way, we are able to keep the loop fine-grained
enough for short busy signaling times, while also not hogging the CPU for
longer times.

Note that, this change is inspired by the similar throttling mechanism that
we already use for mmc_do_erase().

Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Tested-by: Ludovic Barre <ludovic.barre@st.com>
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Link: https://lore.kernel.org/r/20200204085449.32585-2-ulf.hansson@linaro.org

mmc: host: sdhci-sprd: Add software queue support

Add software queue support to improve the performance.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/f629b32943aae9e30ffa17acf4af06c270417001.1581478569.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: host: sdhci: Add a variable to defer to complete requests if needed

When using the host software queue, it will trigger the next request in
irq handler without a context switch. But the sdhci_request() can not be
called in interrupt context when using host software queue for some host
drivers, due to the get_cd() ops can be sleepable.

But for some host drivers, such as Spreadtrum host driver, the card is
nonremovable, so the get_cd() ops is not sleepable, which means we can
complete the data request and trigger the next request in irq handler
to remove the context switch for the Spreadtrum host driver.

As suggested by Adrian, we should introduce a request_atomic() API to
indicate that a request can be called in interrupt context to remove
the context switch when using mmc host software queue. But this should
be done in another thread to convert the users of mmc host software queue.
Thus we can introduce a variable in struct sdhci_host to indicate that
we will always to defer to complete requests when using the host software
queue.

Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/e693e7a29beb3c1922b333f4603ea81f43d5c5b1.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: host: sdhci: Add request_done ops for struct sdhci_ops

Add request_done ops for struct sdhci_ops as a preparation in case some
host controllers have different method to complete one request, such as
supporting request completion of MMC software queue.

Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/1539c801c8bbdbcd1d86f8c2dab375f5803c765a.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: core: Enable the MMC host software queue for the SD card

Enable the MMC host software queue for the SD card if the host controller
supports the MMC host software queue.

On my Spreadtrum platform, I did not see any obvious performance changes
in 4K block size when changing to use hsq for the SD cards, I think the
reason is the SD card works at a low speed on my platform, and most of
time is spent in the hardware. But we can see some obvious improvements
when enabling the packed request based on hsq, that's why we still add hsq
support for the SD cards.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/0065b4631fef2d61c3b89d14a4ea4f2b7499ea56.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: Add MMC host software queue support

Now the MMC read/write stack will always wait for previous request is
completed by mmc_blk_rw_wait(), before sending a new request to hardware,
or queue a work to complete request, that will bring context switching
overhead and spend some extra time to poll the card for busy completion
for I/O writes via sending CMD13, especially for high I/O per second
rates, to affect the IO performance.

Thus this patch introduces MMC software queue interface based on the
hardware command queue engine's interfaces, which is similar with the
hardware command queue engine's idea, that can remove the context
switching. Moreover we set the default queue depth as 64 for software
queue, which allows more requests to be prepared, merged and inserted
into IO scheduler to improve performance, but we only allow 2 requests
in flight, that is enough to let the irq handler always trigger the
next request without a context switch, as well as avoiding a long latency.

Moreover the host controller should support HW busy detection for I/O
operations when enabling the host software queue. That means, the host
controller must not complete a data transfer request, until after the
card stops signals busy.

From the fio testing data in cover letter, we can see the software
queue can improve some performance with 4K block size, increasing
about 16% for random read, increasing about 90% for random write,
though no obvious improvement for sequential read and write.

Moreover we can expand the software queue interface to support MMC
packed request or packed command in future.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Link: https://lore.kernel.org/r/4409c1586a9b3ed20d57ad2faf6c262fc3ccb6e2.1581478568.git.baolin.wang7@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-msm: Don't enable PWRSAVE_DLL for certain sdhc hosts

SDHC core with new 14lpp and later tech DLL should not enable
PWRSAVE_DLL since such controller's internal gating cannot meet
following MCLK requirement:
When MCLK is gated OFF, it is not gated for less than 0.5us and MCLK
must be switched on for at-least 1us before DATA starts coming.

Adding support for this requirement.

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1581077075-26011-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-of-arasan: Remove quirk for broken base clock

This patch removes quirk which indicates a broken base clock. This was
making the kernel report wrong base clock of ~187MHz instead of 200MHz
even as the measurement on the hardware was showing 200MHz.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1579602095-30060-5-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: sdhci-of-arasan: Add support for DLL reset for ZynqMP platforms

The DLL resets are required while executing the auto tuning procedure in
ZynqMP. This patch adds code to support the same.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/1579602095-30060-4-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

firmware: xilinx: Add DLL reset support

SD DLL resets are required for some of the operations on ZynqMP platform.
Add DLL reset support in ZynqMP firmware driver for SD DLL reset.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/1579602095-30060-3-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

firmware: xilinx: Add ZynqMP Tap Delay setup ioctl to the valid list

The Tap Delay setup ioctl was not added to valid list due to which it
may fail to set Tap Delays for SD. This patch fixes the same.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/1579602095-30060-2-git-send-email-manish.narani@xilinx.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: tmio: remove superfluous callback wrappers

After various refactoring, we can populate the mmc_ops callbacks
directly and don't need to have wrappers for them anymore.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-7-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: tmio: factor out TAP usage

TAPs are Renesas SDHI specific. Now that we moved all handling to the
SDHI core, we can also move the definitions from the TMIO struct to the
SDHI one.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-6-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: tmio: enforce retune after runtime suspend

Currently, select_tuning() is called after RPM resume. But
select_tuning() needs some additional function calls to work correctly.
Instead of reimplementing the whole postprocessing, just enforce
retuning.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-5-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: tmio: give callback a generic name

check_scc_error() is too Renesas specific. Let's just call it
check_retune() to make it also easier understandable what it does.
Only a rename, no functional change.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-4-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: complain loudly if driver needs update

When the tap array in the driver is too low, this is not a warning but
an error. Also _once is not helpful, we should make sure it is
prominently in the logs. It is safe to do this because this will only
show up during SoC enablement when we a new SoCs needs more taps (if
that ever will happen).

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-3-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: tmio: refactor tuning execution into SDHI driver

Move Renesas specific code for executing the tuning with a SCC into the
SDHI driver and leave only a generic call in the TMIO driver. Simplify
the code a little by removing init_tuning() and prepare_tuning()
callbacks. The latter is directly folded into the new execute_tuning()
callbacks.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200129203709.30493-2-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: cleanup SCC defines

Use increasing BIT numbers consistently and remove some superfluous
comments.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191217114034.13290-6-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: enforce manual correction for Gen3

HW engineers say that automatic tap correction cannot be used for HS400
in all R-Car Gen3 SoCs. So, check for that SDHI variant and disable it
when HS400 is about to be enabled.

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-5-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: only check CMD status for HS400 manual correction

R-Car Gen3 cannot use correction error status with HS400.
HS200: CMD and DAT signal timing are based on CLK signal.
HS400: CMD signal is based on CLK. DAT signal is based on DS signal.

In HS400, CMD signal is 200MHz(SDR). DAT signal is 200MHz(DDR).
Center position of signal is different between CMD and DAT.

TAP position should be adjusted to the center position of CMD signal.
DAT sampling timing is adjusted by HS400 calibration circuit regardless
of TAP position. Refer to renesas_sdhi_adjust_hs400mode_enable().

However, correction error status contains CMD and DAT status in HS400
(DAT signal is not masked in HS400). Therefore, correction error status
cannot use in HS400. It means that auto correction cannot be uses in
HS400. Manual correction can change to the correct TAP position by
ignoring DAT correction error status and using only CMD correction
status.

Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
[wsa: refactored patch from BSP]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20191217114034.13290-4-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: Add manual correction

This patch adds a manual correction mechanism for SDHI. Currently, SDHI
uses automatic TAP position correction. However, TAP position can also
be corrected manually via correction error status flags.

Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-3-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

mmc: renesas_sdhi: remove double clear of automatic correction

hw_reset() clears the automatic correction bit twice. I couldn't find
anything in the docs recommending that. Removing one of them didn't
cause any regressions here, so keep it simple.

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20191217114034.13290-2-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Linux 5.6-rc7

Merge tag 'for-5.6-rc6-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Two fixes.

  The first is a regression: when dropping some incompat bits the
  conditions were reversed. The other is a fix for rename whiteout
  potentially leaving stack memory linked to a list"

* tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
  btrfs: fix log context list corruption after rename whiteout error

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  x86/mm: split vmalloc_sync_all()
  mm, slub: prevent kmalloc_node crashes and memory leaks
  mm/mmu_notifier: silence PROVE_RCU_LIST warnings
  epoll: fix possible lost wakeup on epoll_ctl() path
  mm: do not allow MADV_PAGEOUT for CoW pages
  mm, memcg: throttle allocators based on ancestral memory.high
  mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
  page-flags: fix a crash at SetPageError(THP_SWAP)
  mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event

x86/mm: split vmalloc_sync_all()

Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

* vmalloc_sync_mappings(), and
* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, slub: prevent kmalloc_node crashes and memory leaks

Sachin reports [1] a crash in SLUB __slab_alloc():

  BUG: Kernel NULL pointer dereference on read at 0x000073b0
  Faulting instruction address: 0xc0000000003d55f4
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
  NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
  REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
  CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
  GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
  GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
  GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
  GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
  GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
  GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
  GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
  GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
  NIP ___slab_alloc+0x1f4/0x760
  LR __slab_alloc+0x34/0x60
  Call Trace:
    ___slab_alloc+0x334/0x760 (unreliable)
    __slab_alloc+0x34/0x60
    __kmalloc_node+0x110/0x490
    kvmalloc_node+0x58/0x110
    mem_cgroup_css_online+0x108/0x270
    online_css+0x48/0xd0
    cgroup_apply_control_enable+0x2ec/0x4d0
    cgroup_mkdir+0x228/0x5f0
    kernfs_iop_mkdir+0x90/0xf0
    vfs_mkdir+0x110/0x230
    do_mkdirat+0xb0/0x1a0
    system_call+0x5c/0x68

This is a PowerPC platform with following NUMA topology:

  available: 2 nodes (0-1)
  node 0 cpus:
  node 0 size: 0 MB
  node 0 free: 0 MB
  node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  node 1 size: 35247 MB
  node 1 free: 30907 MB
  node distances:
  node   0   1
    0:  10  40
    1:  40  10

  possible numa nodes: 0-31

This only happens with a mmotm patch "mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node" [2] which effectively calls
kmalloc_node for each possible node.  SLUB however only allocates
kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
node_to_mem_node to return such valid node for other nodes since commit
a561ce00b09e ("slub: fall back to node_to_mem_node() node if allocating
on memoryless node").  This is however not true in this configuration
where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
thus it contains zeroes and get_partial() ends up accessing
non-allocated kmem_cache_node.

A related issue was reported by Bharata (originally by Ramachandran) [3]
where a similar PowerPC configuration, but with mainline kernel without
patch [2] ends up allocating large amounts of pages by kmalloc-1k
kmalloc-512.  This seems to have the same underlying issue with
node_to_mem_node() not behaving as expected, and might probably also
lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].

This patch should fix both issues by not relying on node_to_mem_node()
anymore and instead simply falling back to NUMA_NO_NODE, when
kmalloc_node(node) is attempted for a node that's not online, or has no
usable memory.  The "usable memory" condition is also changed from
node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
the condition that SLUB uses to allocate kmem_cache_node structures.
The check in get_partial() is removed completely, as the checks in
___slab_alloc() are now sufficient to prevent get_partial() being
reached with an invalid node.

[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/

Fixes: a561ce00b09e ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/mmu_notifier: silence PROVE_RCU_LIST warnings

It is safe to traverse mm->notifier_subscriptions->list either under
SRCU read lock or mm->notifier_subscriptions->lock using
hlist_for_each_entry_rcu().  Silence the PROVE_RCU_LIST false positives,
for example,

  WARNING: suspicious RCU usage
  -----------------------------
  mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  3 locks held by libvirtd/802:
   #0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
   #1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
   #2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460

  stack backtrace:
  CPU: 7 PID: 802 Comm: libvirtd Tainted: G          I       5.6.0-rc6-next-20200317+ #2
  Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
  Call Trace:
    dump_stack+0xa4/0xfe
    lockdep_rcu_suspicious+0xeb/0xf5
    __mmu_notifier_invalidate_range_start+0x3ff/0x460
    change_p4d_range+0x746/0x800
    change_protection+0x1df/0x300
    mprotect_fixup+0x245/0x3e0
    do_mprotect_pkey+0x23b/0x3e0
    __x64_sys_mprotect+0x51/0x70
    do_syscall_64+0x91/0xae8
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

epoll: fix possible lost wakeup on epoll_ctl() path

This fixes possible lost wakeup introduced by commit a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc491420 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock internally,
but actual waitqueue_active() check prior wake_up() call can be
reordered with modifications of ep ready list, thus wake up can be lost.

And yes, can be healed by explicit smp_mb():

  list_add_tail(&epi->rdlink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
wake_up(&ep->wp);

But let's make it simple, thus current patch replaces ep->wq.lock with
the ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correcty.

Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org> [5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: do not allow MADV_PAGEOUT for CoW pages

Jann has brought up a very interesting point [1].  While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way.  This can lead to all sorts of hard to debug
problems.  E.g.  performance problems outlined by Daniel [2].

There are runtime environments where there is a substantial memory
shared among security domains via CoW memory and a easy to reclaim way
of that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either
performance degradation in for the parent process which might be more
privileged or even open side channel attacks.

The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage.  It seems there is no real
use case to depend on reclaiming CoW memory via madvise at this stage so
it is much easier to simply disallow it and this is what this patch
does.  Put it simply MADV_{PAGEOUT,COLD} can operate only on the
exclusively owned memory which is a straightforward semantic.

[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com

Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, memcg: throttle allocators based on ancestral memory.high

Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.

This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge. This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, memcg: fix corruption on 64-bit divisor in memory.high throttling

Commit 0e4b01df8659 had a bunch of fixups to use the right division
method. However, it seems that after all that it still wasn't right --
div_u64 takes a 32-bit divisor.

The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

page-flags: fix a crash at SetPageError(THP_SWAP)

Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped
out") supported writing THP to a swap device but forgot to upgrade an
older commit df8c94d13c7e ("page-flags: define behavior of FS/IO-related
flags on compound pages") which could trigger a crash during THP
swapping out with DEBUG_VM_PGFLAGS=y,

  kernel BUG at include/linux/page-flags.h:317!

  page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
  page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0
  anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked)

  end_swap_bio_write()
    SetPageError(page)
      VM_BUG_ON_PAGE(1 && PageCompound(page))

  <IRQ>
  bio_endio+0x297/0x560
  dec_pending+0x218/0x430 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x297/0x560
  blk_update_request+0x201/0x920
  scsi_end_request+0x6b/0x4b0
  scsi_io_completion+0x509/0x7e0
  scsi_finish_command+0x1ed/0x2a0
  scsi_softirq_done+0x1c9/0x1d0
  __blk_mqnterrupt+0xf/0x20
  </IRQ>

Fix by checking PF_NO_TAIL in those places instead.

Fixes: bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

In section_deactivate(), pfn_to_page() doesn't work any more after
ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.  It
causes a hot remove failure:

  kernel BUG at mm/page_alloc.c:4806!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:free_pages+0x85/0xa0
  Call Trace:
   __remove_pages+0x99/0xc0
   arch_remove_memory+0x23/0x4d
   try_remove_memory+0xc8/0x130
   __remove_memory+0xa/0x11
   acpi_memory_device_remove+0x72/0x100
   acpi_bus_trim+0x55/0x90
   acpi_device_hotplug+0x2eb/0x3d0
   acpi_hotplug_work_fn+0x1a/0x30
   process_one_work+0x1a7/0x370
   worker_thread+0x30/0x380
   kthread+0x112/0x130
   ret_from_fork+0x35/0x40

Let's move the ->section_mem_map resetting after
depopulate_section_memmap() to fix it.

[akpm@linux-foundation.org: remove unneeded initialization, per David]
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event

An eventfd monitors multiple memory thresholds of the cgroup, closes them,
the kernel deletes all events related to this eventfd.  Before all events
are deleted, another eventfd monitors the memory threshold of this cgroup,
leading to a crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000004
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
  Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
  Workqueue: events memcg_event_remove
  RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
  RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
  RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
  RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
  R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
  R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
  Call Trace:
    memcg_event_remove+0x32/0x90
    process_one_work+0x172/0x380
    worker_thread+0x49/0x3f0
    kthread+0xf8/0x130
    ret_from_fork+0x35/0x40
  CR2: 0000000000000004

We can reproduce this problem in the following ways:

1. We create a new cgroup subdirectory and a new eventfd, and then we
   monitor multiple memory thresholds of the cgroup through this eventfd.

2.  closing this eventfd, and __mem_cgroup_usage_unregister_event ()
   will be called multiple times to delete all events related to this
   eventfd.

The first time __mem_cgroup_usage_unregister_event() is called, the
kernel will clear all items related to this eventfd in thresholds->
primary.

Since there is currently only one eventfd, thresholds-> primary becomes
empty, so the kernel will set thresholds-> primary and hresholds-> spare
to NULL.  If at this time, the user creates a new eventfd and monitor
the memory threshold of this cgroup, kernel will re-initialize
thresholds-> primary.

Then when __mem_cgroup_usage_unregister_event () is called for the
second time, because thresholds-> primary is not empty, the system will
access thresholds-> spare, but thresholds-> spare is NULL, which will
trigger a crash.

In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.

The solution is to check whether the thresholds associated with the
eventfd has been cleared when deleting the event.  If so, we do nothing.

[akpm@linux-foundation.org: fix comment, per Kirill]
Fixes: 907860ed381a ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Just two NVMe fabrics fixes that should go into 5.6"

* tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block:
nvmet-tcp: set MSG_MORE only if we actually have more to send
nvme-rdma: Avoid double freeing of async event data

Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Two different fixes in here:

   - Fix for a potential NULL pointer deref for links with async or
     drain marked (Pavel)

   - Fix for not properly checking RLIMIT_NOFILE for async punted
     operations.

     This affects openat/openat2, which were added this cycle, and
     accept4. I did a full audit of other cases where we might check
     current->signal->rlim[] and found only RLIMIT_FSIZE for buffered
     writes and fallocate. That one is fixed and queued for 5.7 and
     marked stable"

* tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block:
  io_uring: make sure accept honor rlimit nofile
  io_uring: make sure openat/openat2 honor rlimit nofile
  io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}

Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
"Update to turbostat v20.03.20.

  These patches unlock the full turbostat features for some new
  machines, plus a couple other minor tweaks"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: update version
  tools/power turbostat: Print cpuidle information
  tools/power turbostat: Fix 32-bit capabilities warning
  tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks
  tools/power turbostat: Support Elkhart Lake
  tools/power turbostat: Support Jasper Lake
  tools/power turbostat: Support Ice Lake server
  tools/power turbostat: Support Tiger Lake
  tools/power turbostat: Fix gcc build warnings
  tools/power turbostat: Support Cometlake