Miquel Raynal [Sun, 17 Nov 2019 17:34:01 +0000 (18:34 +0100)]
Merge tag 'spi-nor/for-5.5' into mtd/next
SPI NOR core changes:
- introduce 'struct spi_nor_controller_ops',
- clean the Register Operations methods,
- use dev_dbg insted of dev_err for low level info,
- fix retlen handling in sst_write(),
- fix silent truncations in spi_nor_read and spi_nor_read_raw(),
- fix the clearing of QE bit on lock()/unlock(),
- rework the disabling of the block write protection,
- rework the Quad Enable methods,
- make sure nor->spimem and nor->controller_ops are mutually exclusive,
- set default Quad Enable method for ISSI flashes,
- add support for few flashes.
SPI NOR controller drivers changes:
- intel-spi:
- support chips without software sequencer,
- add support for Intel Cannon Lake and Intel Comet Lake-H flashes.
Miquel Raynal [Sun, 17 Nov 2019 17:32:35 +0000 (18:32 +0100)]
Merge CFI/Hyperbus tag 'for-v5.5-rc1' into mtd/next
CFI core changes:
* Code cleanups related useless initializers and coding style issues
* Fix for a possible double free problem in cfi_cmdset_0002
* Improved error reporting and handling in cfi_cmdset_0002 core for HyperFlash
Angelo Dureghello [Wed, 30 Oct 2019 11:39:57 +0000 (12:39 +0100)]
mtd: devices: fix mchp23k256 read and write
Due to the use of sizeof(), command size set for the spi transfer
was wrong. Driver was sending and receiving always 1 byte less
and especially on write, it was hanging.
echo -n -e "\\x1\\x2\\x3\\x4" > /dev/mtd1
And read part too now works as expected.
hexdump -C -n16 /dev/mtd1
00000000 01 02 03 04 ab f3 ad c2 ab e3 f4 36 dd 38 04 15
00000010
Fixes:
4379075a870b ("mtd: mchp23k256: Add support for mchp23lcv1024")
Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Greg Kroah-Hartman [Thu, 7 Nov 2019 08:51:11 +0000 (09:51 +0100)]
mtd: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: linux-mtd@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Sagar Shrikant Kadam [Tue, 22 Oct 2019 17:22:19 +0000 (17:22 +0000)]
mtd: spi-nor: Set default Quad Enable method for ISSI flashes
Set the default Quad Enable method for ISSI flashes. Used for
ISSI flashes (IS25WP256D-JMLE) that do not support SFDP tables
and can not determine the Quad Enable method by parsing BFPT.
Based on code originally written by Wesley Terpstra <wesley@sifive.com>
and/or Palmer Dabbelt <palmer@sifive.com>
https://github.com/riscv/riscv-linux/commit/
c94e267766d62bc9a669611c3d0c8ed5ea26569b
Signed-off-by: Sagar Shrikant Kadam <sagar.kadam@sifive.com>
[tudor.ambarus@microchip.com:
- rebase, split and adapt for latest spi-nor/next,
- use PMC CFI ID for ISSI. According to JEP106BA, "Programmable Micro Corp"
changed its name to Integrated Silicon Solution (ISSI)]
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Sagar Shrikant Kadam [Tue, 22 Oct 2019 17:22:18 +0000 (17:22 +0000)]
mtd: spi-nor: Add support for is25wp256
Update the spi_nor_id table for is25wp256 (32MB) device from ISSI,
present on HiFive Unleashed dev board (Rev: A00).
Use the post bfpt fixup hook for the is25wp256 device, as done for
the is25lp256 device to overwrite the wrong address width advertised
by BFPT.
Signed-off-by: Sagar Shrikant Kadam <sagar.kadam@sifive.com>
[tudor.ambarus@microchip.com: rebase, split and adapt for latest spi-nor/next]
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Manivannan Sadhasivam [Wed, 30 Oct 2019 09:01:24 +0000 (14:31 +0530)]
mtd: spi-nor: Add support for w25q256jw
Add MTD support for w25q256jw SPI NOR chip from Winbond. This chip
supports dual/quad I/O mode with 512 blocks of memory organized in
64KB sectors. In addition to this, there is also small 4KB sectors
available for flexibility. The device has been validated using Thor96
board.
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Tudor Ambarus <tudor.ambarus@microchip.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Darshak Patel <darshak.patel@einfochips.com>
[Mani: cleaned up for upstream]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Tudor Ambarus [Fri, 25 Oct 2019 14:28:36 +0000 (14:28 +0000)]
mtd: spi-nor: Move condition to avoid a NULL check
When the controller is not under the SPI-MEM interface it may implement
the optional controller_ops->erase() method.
nor->spimem and nor->controller_ops are mutually exclusive. Move the
nor->controller_ops->erase != NULL check as an 'else if' case to
nor->spimem, in order to avoid the nor->controller_ops != NULL
check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Fri, 25 Oct 2019 14:28:34 +0000 (14:28 +0000)]
mtd: spi-nor: Make sure nor->spimem and nor->controller_ops are mutually exclusive
Expand the spi_nor_check() to make sure that nor->spimem and
nor->controller_ops are mutually exclusive.
Fixes:
b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:42:09 +0000 (08:42 +0000)]
mtd: spi-nor: Rename Quad Enable methods
Rename macronix_quad_enable() to a generic name:
spi_nor_sr1_bit6_quad_enable().
Prepend "spi_nor_" to "sr2_bit7_quad_enable". All SPI NOR generic
methods should be prepended by "spi_nor_".
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:42:05 +0000 (08:42 +0000)]
mtd: spi-nor: Merge spansion Quad Enable methods
Merge
spansion_no_read_cr_quad_enable()
spansion_read_cr_quad_enable()
into
spi_nor_sr2_bit1_quad_enable().
Reduce code duplication by introducing spi_nor_write_16bit_cr_and_check().
The Configuration Register contains bits that can be updated in future:
FREEZE, CMP. Provide a generic method that allows updating all bits
of the Configuration Register.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:42:01 +0000 (08:42 +0000)]
mtd: spi-nor: Rename CR_QUAD_EN_SPAN to SR2_QUAD_EN_BIT1
JEDEC Basic Flash Parameter Table, 15th DWORD, bits 22:20,
refers to this bit as "bit 1 of the status register 2".
Rename the macro accordingly.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:41:58 +0000 (08:41 +0000)]
mtd: spi-nor: Extend the SR Read Back test
Test that all the bits from Status Register 1 and Status Register 2
were written correctly.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:41:55 +0000 (08:41 +0000)]
mtd: spi-nor: Rework the disabling of block write protection
spi_nor_unlock() unlocks blocks of memory or the entire flash memory
array, if requested. clear_sr_bp() unlocks the entire flash memory
array at boot time. This calls for some unification, clear_sr_bp() is
just an optimization for the case when the unlock request covers the
entire flash size.
Get rid of clear_sr_bp() and introduce spi_nor_unlock_all(), which is
just a call to spi_nor_unlock() for the entire flash memory array.
This fixes a bug that was present in spi_nor_spansion_clear_sr_bp().
When the QE bit was zero, we used the Write Status (01h) command with
one data byte, which might cleared the Status Register 2. We now always
use the Write Status (01h) command with two data bytes when
SNOR_F_HAS_16BIT_SR is set, to avoid clearing the Status Register 2.
The SNOR_F_NO_READ_CR case is treated as well. When the flash doesn't
support the CR Read command, we make an assumption about the value of
the QE bit. In spi_nor_init(), call spi_nor_quad_enable() first, then
spi_nor_unlock_all(), so that at the spi_nor_unlock_all() time we can
be sure the QE bit has value one, because of the previous call to
spi_nor_quad_enable().
Get rid of the MFR handling and implement specific manufacturer
default_init() fixup hooks.
Note that this changes a bit the logic for the SNOR_MFR_ATMEL,
SNOR_MFR_INTEL and SNOR_MFR_SST cases. Before this patch, the Atmel,
Intel and SST chips did not set the locking ops, but unlocked the entire
flash at boot time, while now they are setting the locking ops to
stm_locking_ops. This should work, since the disable of the block
protection at the boot time used the same Status Register bits to unlock
the flash, as in the stm_locking_ops case.
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Thu, 7 Nov 2019 08:41:51 +0000 (08:41 +0000)]
mtd: spi-nor: Fix clearing of QE bit on lock()/unlock()
Make sure that when doing a lock() or an unlock() operation we don't clear
the QE bit from Status Register 2.
JESD216 revB or later offers information about the *default* Status
Register commands to use (see BFPT DWORDS[15], bits 22:20). In this
standard, Status Register 1 refers to the first data byte transferred on a
Read Status (05h) or Write Status (01h) command. Status register 2 refers
to the byte read using instruction 35h. Status register 2 is the second
byte transferred in a Write Status (01h) command.
Industry naming and definitions of these Status Registers may differ.
The definitions are described in JESD216B, BFPT DWORDS[15], bits 22:20.
There are cases in which writing only one byte to the Status Register 1
has the side-effect of clearing Status Register 2 and implicitly the Quad
Enable bit. This side-effect is hit just by the
BFPT_DWORD15_QER_SR2_BIT1_BUGGY and BFPT_DWORD15_QER_SR2_BIT1 cases.
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Sergei Shtylyov [Thu, 31 Oct 2019 20:39:39 +0000 (23:39 +0300)]
mtd: cfi_cmdset_0002: fix delayed error detection on HyperFlash
The commit
4844ef80305d ("mtd: cfi_cmdset_0002: Add support for polling
status register") added checking for the status register error bits into
chip_good() to only return 1 if these bits are 0s. Unfortunately, this
means that polling using chip_good() always reaches a timeout condition
when erase or program failure bits are set. Let's fully delegate the task
of determining the error conditions to cfi_check_err_status() and make
chip_good() only look for the Device Ready/Busy condition.
Fixes:
4844ef80305d ("mtd: cfi_cmdset_0002: Add support for polling status register")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Sergei Shtylyov [Thu, 31 Oct 2019 20:37:27 +0000 (23:37 +0300)]
mtd: cfi_cmdset_0002: only check errors when ready in cfi_check_err_status()
Cypress S26K{L|S}P{128|256|512}S datasheet says that the error bits in
the status register are only valid when the "device ready" bit 7 is set.
Add the check for the device ready bit in cfi_check_err_status() as that
function isn't always called with this bit set.
Fixes:
4844ef80305d ("mtd: cfi_cmdset_0002: Add support for polling status register")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Hou Tao [Tue, 8 Oct 2019 02:36:37 +0000 (10:36 +0800)]
mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
Else there may be a double-free problem, because cfi->cfiq will
be freed by mtd_do_chip_probe() if both the two invocations of
check_cmd_set() return failure.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Sergei Shtylyov [Thu, 3 Oct 2019 20:27:39 +0000 (23:27 +0300)]
mtd: cfi_cmdset_*: kill useless 'ret' variable initializers
The 'ret' local variables are typically initialized to 0 but this value is
often unused, thus we can kill those initializers.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Sergei Shtylyov [Fri, 27 Sep 2019 20:22:32 +0000 (23:22 +0300)]
mtd: cfi_util: use DIV_ROUND_UP() in cfi_udelay()
Use DIV_ROUND_UP() in cfi_udelay() instead of open-coding it...
Doing this also helpfully gets rid of two complaints from
'scripts/checkpatch.pl --strict':
CHECK: spaces preferred around that '+' (ctx:VxV)
#29: FILE: drivers/mtd/chips/cfi_util.c:29:
+ msleep((us+999)/1000);
^
CHECK: spaces preferred around that '/' (ctx:VxV)
#29: FILE: drivers/mtd/chips/cfi_util.c:29:
+ msleep((us+999)/1000);
^
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:44 +0000 (11:23 +0000)]
mtd: spi-nor: Print debug message when the read back test fails
Demystify where the EIO error occurs.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:43 +0000 (11:23 +0000)]
mtd: spi-nor: Check all the bits written, not just the BP ones
Check that all the bits written in the write_sr_and_check() method
match the status_new received value. Failing to write the other bits
is dangerous too, extend the check.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:41 +0000 (11:23 +0000)]
mtd: spi-nor: Fix errno on Quad Enable methods
When the Read-Modify-Write-Read-Back Quad Enable methods failed on
the Read-Back, they returned -EINVAL. Since this is an I/O error,
return -EIO.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:39 +0000 (11:23 +0000)]
mtd: spi-nor: Drop spansion_quad_enable()
Drop the default spansion_quad_enable() method and replace it with
spansion_read_cr_quad_enable().
The function was buggy, it didn't care about the previous values
of the Status and Configuration Registers. spansion_read_cr_quad_enable()
is a Read-Modify-Write-Check function that keeps track of what were
the previous values of the Status and Configuration Registers.
In terms of instruction types sent to the flash, the only difference
between the spansion_quad_enable() and spansion_read_cr_quad_enable()
is that the later calls spi_nor_read_sr(). We can safely assume that all
flashes support spi_nor_read_sr(), because all flashes call it in
spi_nor_sr_ready(). The transition from spansion_quad_enable() to
spansion_read_cr_quad_enable() will not affect anybody, drop the buggy
code.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:37 +0000 (11:23 +0000)]
mtd: spi-nor: Describe all the Reg Ops
Document all the Register Operations.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:35 +0000 (11:23 +0000)]
mtd: spi-nor: Merge spi_nor_write_sr() and spi_nor_write_sr_cr()
Merge
static int spi_nor_write_sr(struct spi_nor *nor, u8 val)
static int spi_nor_write_sr_cr(struct spi_nor *nor, const u8 *sr_cr)
into
static int spi_nor_write_sr(struct spi_nor *nor, const u8 *sr, size_t len)
The Status Register can be written with one or two bytes. Merge
the two functions to avoid code duplication.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:34 +0000 (11:23 +0000)]
mtd: spi-nor: Move the WE and wait calls inside Write SR methods
Avoid duplicating code by moving the calls to spi_nor_write_enable() and
spi_nor_wait_till_ready() inside the Write Status Register methods.
Move spi_nor_write_sr() to avoid forward declaration of
spi_nor_wait_till_ready().
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:32 +0000 (11:23 +0000)]
mtd: spi-nor: Void return type for spi_nor_clear_sr/fsr()
spi_nor_clear_sr() and spi_nor_clear_fsr() are called just in case
of errors. The callers didn't check their return value, make them
of type void.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:30 +0000 (11:23 +0000)]
mtd: spi-nor: Rename label as it is no longer generic
Rename 'sst_write_err' label to 'out' as it is no longer generic for
all the errors in the sst_write() method, and may introduce confusion.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:28 +0000 (11:23 +0000)]
mtd: spi-nor: Check for errors after each Register Operation
Check for the return vales of each Register Operation.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:27 +0000 (11:23 +0000)]
mtd: spi-nor: Print debug info inside Reg Ops methods
Spare the callers of printing debug messages by themselves.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Tudor Ambarus [Sat, 2 Nov 2019 11:23:25 +0000 (11:23 +0000)]
mtd: spi-nor: Use dev_dbg insted of dev_err for low level info
What most users care about is "my dev is not working properly".
All low level information should be discovered when activating
the debug traces.
Keep error messages just for the following cases:
- when the SR/FSR report program or erase fails, or attempts of
modifying a protected sector,
- when the JEDEC ID is not recognized,
- when the resume() call fails,
- when the spi_nor_check() fails.
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Sergei Shtylyov [Wed, 30 Oct 2019 18:53:03 +0000 (21:53 +0300)]
mtd: spi-nor: fix silent truncation in spi_nor_read_raw()
spi_nor_read_raw() assigns the result of 'ssize_t spi_nor_read_data()'
to the 'int ret' variable, while 'ssize_t' is a 64-bit type and *int*
is a 32-bit type on the 64-bit machines. This silent truncation isn't
really valid, so fix up the variable's type.
Fixes:
f384b352cbf0 ("mtd: spi-nor: parse Serial Flash Discoverable Parameters (SFDP) tables")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Sergei Shtylyov [Wed, 30 Oct 2019 18:48:59 +0000 (21:48 +0300)]
mtd: spi-nor: fix silent truncation in spi_nor_read()
spi_nor_read() assigns the result of 'ssize_t spi_nor_read_data()'
to the 'int ret' variable, while 'ssize_t' is a 64-bit type and *int*
is a 32-bit type on the 64-bit machines. This silent truncation isn't
really valid, so fix up the variable's type.
Fixes:
59451e1233bd ("mtd: spi-nor: change return value of read/write")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Mika Westerberg [Wed, 23 Oct 2019 14:51:40 +0000 (17:51 +0300)]
mtd: spi-nor: intel-spi: Add support for Intel Comet Lake-H SPI serial flash
Intel Comet Lake-H PCH has the same SPI serial flash controller as Comet
Lake-LP. Add Comet Lake-H PCI ID to the driver list of supported devices.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Tudor Ambarus [Wed, 23 Oct 2019 23:28:07 +0000 (02:28 +0300)]
mtd: spi-nor: Print device info in case of error
Print identifying information about struct device.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Sat, 26 Oct 2019 10:34:00 +0000 (13:34 +0300)]
mtd: spi-nor: Constify data to write to the Status Register
Constify the data to write to the Status Register.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Fri, 25 Oct 2019 07:36:16 +0000 (10:36 +0300)]
mtd: spi-nor: Fix retlen handling in sst_write()
In case the write of the first byte failed, retlen was incorrectly
incremented to *retlen += actual; on the exit path. retlen should be
incremented when actual data was written to the flash.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Thu, 24 Oct 2019 14:23:23 +0000 (17:23 +0300)]
mtd: spi-nor: Drop redundant error reports in Reg Ops callers
Drop the error messages from the callers, since the callees
already print an error message in case of failure.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Thu, 24 Oct 2019 13:59:55 +0000 (16:59 +0300)]
mtd: spi-nor: Pointer parameter for CR in spi_nor_read_cr()
Let the callers pass the pointer to the DMA-able buffer where
the value of the Configuration Register will be written. This way we
avoid the casts between int and u8, which can be confusing.
Callers stop compare the return value of spi_nor_read_cr() with negative,
spi_nor_read_cr() returns 0 on success and -errno otherwise.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Thu, 24 Oct 2019 13:40:13 +0000 (16:40 +0300)]
mtd: spi-nor: Pointer parameter for FSR in spi_nor_read_fsr()
Let the callers pass the pointer to the DMA-able buffer where
the value of the Flag Status Register will be written. This way we
avoid the casts between int and u8, which can be confusing.
Caller stops compare the return value of spi_nor_read_fsr() with negative,
spi_nor_read_fsr() returns 0 on success and -errno otherwise.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Thu, 24 Oct 2019 12:55:34 +0000 (15:55 +0300)]
mtd: spi-nor: Pointer parameter for SR in spi_nor_read_sr()
Let the callers pass the pointer to the DMA-able buffer where
the value of the Status Register will be written. This way we
avoid the casts between int and u8, which can be confusing.
Callers stop compare the return value of spi_nor_read_sr() with negative,
spi_nor_read_sr() returns 0 on success and -errno otherwise.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:59 +0000 (11:16 +0000)]
mtd: spi-nor: Don't overwrite errno from Reg Ops
Do not overwrite the error numbers received the Register Operations
methods.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:55 +0000 (11:16 +0000)]
mtd: spi-nor: Drop explicit cast to int to already int value
ret is already of type int.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:54 +0000 (11:16 +0000)]
mtd: spi-nor: Stop compare with negative in Reg Ops methods
spi_mem_exec_op()
nor->controller_ops->write_reg()
nor->controller_ops->read_reg()
spi_nor_wait_till_ready()
Return 0 on success, -errno otherwise.
Stop compare with negative and compare with zero in all the register
operations methods.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:52 +0000 (11:16 +0000)]
mtd: spi-nor: Group all Reg Ops to avoid forward declarations
Group all register methods up in the file, to avoid forward
declarations.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:50 +0000 (11:16 +0000)]
mtd: spi-nor: Drop duplicated new line
Two new lines, one after another, drop one.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 29 Oct 2019 11:16:49 +0000 (11:16 +0000)]
mtd: spi-nor: Prepend spi_nor_ to all Reg Ops methods
All the core functions should begin with "spi_nor_".
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Miquel Raynal [Mon, 28 Oct 2019 16:02:08 +0000 (17:02 +0100)]
MAINTAINERS: ubi/ubifs: Update the Git repository
UBI/UBIFS development now happens on Richard Weinberger's kernel.org
'ubifs' repository.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Richard Weinberger <richard@nod.at>
Miquel Raynal [Tue, 22 Oct 2019 14:58:59 +0000 (16:58 +0200)]
mtd: spear_smi: Fix Write Burst mode
Any write with either dd or flashcp to a device driven by the
spear_smi.c driver will pass through the spear_smi_cpy_toio()
function. This function will get called for chunks of up to 256 bytes.
If the amount of data is smaller, we may have a problem if the data
length is not 4-byte aligned. In this situation, the kernel panics
during the memcpy:
# dd if=/dev/urandom bs=1001 count=1 of=/dev/mtd6
spear_smi_cpy_toio [620] dest
c9070000, src
c7be8800, len 256
spear_smi_cpy_toio [620] dest
c9070100, src
c7be8900, len 256
spear_smi_cpy_toio [620] dest
c9070200, src
c7be8a00, len 256
spear_smi_cpy_toio [620] dest
c9070300, src
c7be8b00, len 233
Unhandled fault: external abort on non-linefetch (0x808) at 0xc90703e8
[...]
PC is at memcpy+0xcc/0x330
The above error occurs because the implementation of memcpy_toio()
tries to optimize the number of I/O by writing 4 bytes at a time as
much as possible, until there are less than 4 bytes left and then
switches to word or byte writes.
Unfortunately, the specification states about the Write Burst mode:
"the next AHB Write request should point to the next
incremented address and should have the same size (byte,
half-word or word)"
This means ARM architecture implementation of memcpy_toio() cannot
reliably be used blindly here. Workaround this situation by update the
write path to stick to byte access when the burst length is not
multiple of 4.
Fixes:
f18dbbb1bfe0 ("mtd: ST SPEAr: Add SMI driver for serial NOR flash")
Cc: Russell King <linux@armlinux.org.uk>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Linus Walleij [Sun, 20 Oct 2019 23:00:42 +0000 (01:00 +0200)]
mtd: physmap_of: add a hook for Intel IXP4xx flash probing
In order to support device tree probing of IXP4xx NOR flash
chips, a certain big-endian or mixed-endian memory access
pattern need to be used.
I have opted to use the pattern set by previous plug-ins
to physmap for Gemini and Versatile, just override some
functions and reuse most of the physmap core code as it
is to minimize maintenance.
Parts of drivers/mtd/ixp4xx.c are copied into this file.
After we have IXP4xx converted fully to device tree, the
drivers/mtd/ixp4xx.c file will be deleted and this will
be the only access pattern to the IXP4xx flash.
I did not keep the quirk in the flash write function
after probe, where the old code for a while checks for
access to odd addresses, fails and assigns a "faster"
write function once it has convinced probe to only use
2-byte accesses. As we mandate that this device should
be using bank-width = <2> this should not be a problem
unless misconfigured.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Linus Walleij [Sun, 20 Oct 2019 23:00:41 +0000 (01:00 +0200)]
mtd: add DT bindings for the Intel IXP4xx Flash
This adds device tree bindings for the Intel IXP4xx
flash controller, a simple physmap which however need a
specific big-endian or mixed-endian access pattern to the
memory.
Cc: devicetree@vger.kernel.org
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Miquel Raynal [Thu, 17 Oct 2019 14:22:29 +0000 (16:22 +0200)]
MAINTAINERS: mtd/ubi/ubifs: Remove inactive maintainers
Despite their substantial personal investment in the MTD/UBI/UBIFS a
few years back, David, Brian, Artem and Adrian are not actively
maintaining the subsystem anymore. We warmly salute them for all the
work they have achieved and will of course still welcome their
participation and reviews.
That said, Marek retired himself a few weeks ago quoting Harald [1]:
It matters who has which title and when. Should somebody not
be an active maintainer, make sure he's not listed as such.
For this same reason, let’s trim the maintainers list with the
actually active ones over the past two years.
[1] http://laforge.gnumonks.org/blog/
20180307-mchardy-gpl/
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Artem Bityutskiy <dedekind1@gmail.com>
Fuqian Huang [Thu, 10 Oct 2019 13:58:09 +0000 (21:58 +0800)]
mtd: maps: l440gx: Avoid printing address to dmesg
Avoid printing the address of l440gx_map.virt every time l440gx init.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tudor Ambarus [Tue, 24 Sep 2019 07:45:58 +0000 (07:45 +0000)]
mtd: spi-nor: cadence-quadspi: Fix cqspi_command_read() definition
n_tx was never used, drop it. Replace 'const u8 *txbuf' with 'u8 opcode',
to comply with the SPI NOR int (*read_reg)() method. The 'const'
qualifier has no meaning for parameters passed by value, drop it.
Going furher, the opcode was passed to cqspi_calc_rdreg() and never used,
drop it.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 24 Sep 2019 07:45:53 +0000 (07:45 +0000)]
mtd: spi-nor: Introduce 'struct spi_nor_controller_ops'
Move all SPI NOR controller driver specific ops in a dedicated
structure. 'struct spi_nor' becomes lighter.
Use size_t for lengths in 'int (*write_reg)()' and 'int (*read_reg)()'.
Rename wite/read_buf to buf, the name of the functions are
suggestive enough. Constify buf in int (*write_reg). Comply with these
changes in the SPI NOR controller drivers.
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tudor Ambarus [Tue, 24 Sep 2019 07:45:50 +0000 (07:45 +0000)]
mtd: spi-nor: hisi-sfc: Drop nor->erase NULL assignment
The pointer to 'struct spi_nor' is kzalloc'ed above in the code.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
DENG Qingfang [Tue, 22 Oct 2019 16:59:39 +0000 (00:59 +0800)]
mtd: spi-nor: add support for en25qh16
Tested on HiWiFi C526A
Datasheet is available at:
http://www.xinyahong.com/upLoad/product/month_1411/
201411201256018276.pdf
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Jethro Beekman [Wed, 4 Sep 2019 01:15:24 +0000 (01:15 +0000)]
mtd: spi-nor: intel-spi: add support for Intel Cannon Lake SPI flash
Now that SPI flash controllers without a software sequencer are
supported, it's trivial to add support for CNL and its PCI ID.
Values from https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf
Signed-off-by: Jethro Beekman <jethro@fortanix.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Jethro Beekman [Wed, 4 Sep 2019 01:15:14 +0000 (01:15 +0000)]
mtd: spi-nor: intel-spi: support chips without software sequencer
Some flash controllers don't have a software sequencer. Avoid
configuring the register addresses for it, and double check
everywhere that its not accidentally trying to be used.
Every use of `sregs` is now guarded by a check of `sregs` or
`swseq_reg`. The check might be done in the calling function.
Signed-off-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Linus Torvalds [Sun, 20 Oct 2019 19:56:22 +0000 (15:56 -0400)]
Linux 5.4-rc4
Linus Torvalds [Sun, 20 Oct 2019 16:36:57 +0000 (12:36 -0400)]
Merge tag 'kbuild-fixes-v5.4-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- fix a bashism of setlocalversion
- do not use the too new --sort option of tar
* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: substituting --sort in archive creation
scripts: setlocalversion: fix a bashism
kbuild: update comment about KBUILD_ALLDIRS
Linus Torvalds [Sun, 20 Oct 2019 10:31:14 +0000 (06:31 -0400)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Prevent a NULL pointer dereference in the X2APIC code in case of a
CPU hotplug failure.
- Prevent boot failures on HP superdome machines by invalidating the
level2 kernel pagetable entries outside of the kernel area as
invalid so BIOS reserved space won't be touched unintentionally.
Also ensure that memory holes are rounded up to the next PMD
boundary correctly.
- Enable X2APIC support on Hyper-V to prevent boot failures.
- Set the paravirt name when running on Hyper-V for consistency
- Move a function under the appropriate ifdef guard to prevent build
warnings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
x86/hyperv: Set pv_info.name to "Hyper-V"
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
x86/hyperv: Make vapic support x2apic mode
x86/boot/64: Round memory hole size up to next PMD page
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Linus Torvalds [Sun, 20 Oct 2019 10:27:54 +0000 (06:27 -0400)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A small set of irq chip driver fixes and updates:
- Update the SIFIVE PLIC interrupt driver to use the fasteoi handler
to address the shortcomings of the existing flow handling which was
prone to lose interrupts
- Use the proper limit for GIC interrupt line numbers
- Add retrigger support for the recently merged Anapurna Labs Fabric
interrupt controller to make it complete
- Enable the ATMEL AIC5 interrupt controller driver on the new
SAM9X60 SoC"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Switch to fasteoi flow
irqchip/gic-v3: Fix GIC_LINE_NR accessor
irqchip/atmel-aic5: Add support for sam9x60 irqchip
irqchip/al-fic: Add support for irq retrigger
Linus Torvalds [Sun, 20 Oct 2019 10:25:12 +0000 (06:25 -0400)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull hrtimer fixlet from Thomas Gleixner:
"A single commit annotating the lockcless access to timer->base with
READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Annotate lockless access to timer->base
Linus Torvalds [Sun, 20 Oct 2019 10:22:25 +0000 (06:22 -0400)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull stop-machine fix from Thomas Gleixner:
"A single fix, amending stop machine with WRITE/READ_ONCE() to address
the fallout of KCSAN"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Avoid potential race behaviour
Linus Torvalds [Sat, 19 Oct 2019 21:09:11 +0000 (17:09 -0400)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
meanwhile, sorry about that.
Highlights:
1) Fix fd leak in various bpf selftests, from Brian Vazquez.
2) Fix crash in xsk when device doesn't support some methods, from
Magnus Karlsson.
3) Fix various leaks and use-after-free in rxrpc, from David Howells.
4) Fix several SKB leaks due to confusion of who owns an SKB and who
should release it in the llc code. From Eric Biggers.
5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.
6) Jumbo packets don't work after resume on r8169, as the BIOS resets
the chip into non-jumbo mode during suspend. From Heiner Kallweit.
7) Corrupt L2 header during MPLS push, from Davide Caratti.
8) Prevent possible infinite loop in tc_ctl_action, from Eric
Dumazet.
9) Get register bits right in bcmgenet driver, based upon chip
version. From Florian Fainelli.
10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
11) Cure race between route lookup and invalidation in ipv4, from Wei
Wang.
12) Fix performance regression due to false sharing in 'net'
structure, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
net: reorder 'struct net' fields to avoid false sharing
net: dsa: fix switch tree list
net: ethernet: dwmac-sun8i: show message only when switching to promisc
net: aquantia: add an error handling in aq_nic_set_multicast_list
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
macb: propagate errors when getting optional clocks
xen/netback: fix error path of xenvif_connect_data()
net: hns3: fix mis-counting IRQ vector numbers issue
net: usb: lan78xx: Connect PHY before registering MAC
vsock/virtio: discard packets if credit is not respected
vsock/virtio: send a credit update when buffer size is changed
mlxsw: spectrum_trap: Push Ethernet header before reporting trap
net: ensure correct skb->tstamp in various fragmenters
net: bcmgenet: reset 40nm EPHY on energy detect
net: bcmgenet: soft reset 40nm EPHYs before MAC init
net: phy: bcm7xxx: define soft_reset for 40nm EPHY
net: bcmgenet: don't set phydev->link from MAC
net: Update address for MediaTek ethernet driver in MAINTAINERS
ipv4: fix race condition between route lookup and invalidation
...
Eric Dumazet [Fri, 18 Oct 2019 22:20:05 +0000 (15:20 -0700)]
net: reorder 'struct net' fields to avoid false sharing
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.
Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)
So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.
We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.
We move in the first cache line fields that can be
dirtied often.
We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.
Fixes:
355b98553789 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 18 Oct 2019 21:02:46 +0000 (17:02 -0400)]
net: dsa: fix switch tree list
If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.
Fixes:
83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mans Rullgard [Fri, 18 Oct 2019 16:56:58 +0000 (17:56 +0100)]
net: ethernet: dwmac-sun8i: show message only when switching to promisc
Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam. Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chenwandun [Fri, 18 Oct 2019 10:20:37 +0000 (18:20 +0800)]
net: aquantia: add an error handling in aq_nic_set_multicast_list
add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 19 Oct 2019 19:12:36 +0000 (12:12 -0700)]
Merge branch 'netem-fix-further-issues-with-packet-corruption'
Jakub Kicinski says:
====================
net: netem: fix further issues with packet corruption
This set is fixing two more issues with the netem packet corruption.
First patch (which was previously posted) avoids NULL pointer dereference
if the first frame gets freed due to allocation or checksum failure.
v2 improves the clarity of the code a little as requested by Cong.
Second patch ensures we don't return SUCCESS if the frame was in fact
dropped. Thanks to this commit message for patch 1 no longer needs the
"this will still break with a single-frame failure" disclaimer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Oct 2019 16:16:58 +0000 (09:16 -0700)]
net: netem: correct the parent's backlog when corrupted packet was dropped
If packet corruption failed we jump to finish_segs and return
NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
increment its backlog, that's incorrect - we need to return
NET_XMIT_DROP.
Fixes:
6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Oct 2019 16:16:57 +0000 (09:16 -0700)]
net: netem: fix error path for corrupted GSO frames
To corrupt a GSO frame we first perform segmentation. We then
proceed using the first segment instead of the full GSO skb and
requeue the rest of the segments as separate packets.
If there are any issues with processing the first segment we
still want to process the rest, therefore we jump to the
finish_segs label.
Commit
177b8007463c ("net: netem: fix backlog accounting for
corrupted GSO frames") started using the pointer to the first
segment in the "rest of segments processing", but as mentioned
above the first segment may had already been freed at this point.
Backlog corrections for parent qdiscs have to be adjusted.
Fixes:
177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Tretter [Fri, 18 Oct 2019 14:11:43 +0000 (16:11 +0200)]
macb: propagate errors when getting optional clocks
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.
In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.
For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.
Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Juergen Gross [Fri, 18 Oct 2019 07:45:49 +0000 (09:45 +0200)]
xen/netback: fix error path of xenvif_connect_data()
xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().
Remove the superfluous module_put().
Fixes:
279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 18 Oct 2019 03:42:59 +0000 (11:42 +0800)]
net: hns3: fix mis-counting IRQ vector numbers issue
Currently, the num_msi_left means the vector numbers of NIC,
but if the PF supported RoCE, it contains the vector numbers
of NIC and RoCE(Not expected).
This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.
This patch adds a new variable num_nic_msi to store the vector
numbers of NIC, and adjust the default TQP numbers and rss_size
according to the value of num_nic_msi.
Fixes:
46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 19 Oct 2019 10:53:59 +0000 (06:53 -0400)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Rather a lot of fixes, almost all affecting mm/"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
scripts/gdb: fix debugging modules on s390
kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
mm/thp: allow dropping THP from page cache
mm/vmscan.c: support removing arbitrary sized pages from mapping
mm/thp: fix node page state in split_huge_page_to_list()
proc/meminfo: fix output alignment
mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
mm: include <linux/huge_mm.h> for is_vma_temporary_stack
zram: fix race between backing_dev_show and backing_dev_store
mm/memcontrol: update lruvec counters in mem_cgroup_move_account
ocfs2: fix panic due to ocfs2_wq is null
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
mm: memblock: do not enforce current limit for memblock_phys* family
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
mm/gup: fix a misnamed "write" argument, and a related bug
mm/gup_benchmark: add a missing "w" to getopt string
ocfs2: fix error handling in ocfs2_setattr()
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
mm/memunmap: don't access uninitialized memmap in memunmap_pages()
...
Ilya Leoshkevich [Sat, 19 Oct 2019 03:20:43 +0000 (20:20 -0700)]
scripts/gdb: fix debugging modules on s390
Currently lx-symbols assumes that module text is always located at
module->core_layout->base, but s390 uses the following layout:
+------+ <- module->core_layout->base
| GOT |
+------+ <- module->core_layout->base + module->arch->plt_offset
| PLT |
+------+ <- module->core_layout->base + module->arch->plt_offset +
| TEXT | module->arch->plt_size
+------+
Therefore, when trying to debug modules on s390, all the symbol
addresses are skewed by plt_offset + plt_size.
Fix by adding plt_offset + plt_size to module_addr in
load_module_symbols().
Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Song Liu [Sat, 19 Oct 2019 03:20:40 +0000 (20:20 -0700)]
kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries. On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.
However, the regroup is broken For perf_event based trace_uprobe. This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped
page table entry, which is not the desired behavior.
Fix this by only use FOLL_SPLIT_PMD for uprobe register case.
Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.
Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
Fixes:
5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:36 +0000 (20:20 -0700)]
mm/thp: allow dropping THP from page cache
Once a THP is added to the page cache, it cannot be dropped via
/proc/sys/vm/drop_caches. Fix this issue with proper handling in
invalidate_mapping_pages().
Link: http://lkml.kernel.org/r/20191017164223.2762148-5-songliubraving@fb.com
Fixes:
99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
William Kucharski [Sat, 19 Oct 2019 03:20:33 +0000 (20:20 -0700)]
mm/vmscan.c: support removing arbitrary sized pages from mapping
__remove_mapping() assumes that pages can only be either base pages or
HPAGE_PMD_SIZE. Ask the page what size it is.
Link: http://lkml.kernel.org/r/20191017164223.2762148-4-songliubraving@fb.com
Fixes:
99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:30 +0000 (20:20 -0700)]
mm/thp: fix node page state in split_huge_page_to_list()
Make sure split_huge_page_to_list() handles the state of shmem THP and
file THP properly.
Link: http://lkml.kernel.org/r/20191017164223.2762148-3-songliubraving@fb.com
Fixes:
60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:27 +0000 (20:20 -0700)]
proc/meminfo: fix output alignment
Patch series "Fixes for THP in page cache", v2.
This patch (of 5):
Add extra space for FileHugePages and FilePmdMapped, so the output is
aligned with other rows.
Link: http://lkml.kernel.org/r/20191017164223.2762148-2-songliubraving@fb.com
Fixes:
60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks (Codethink) [Sat, 19 Oct 2019 03:20:24 +0000 (20:20 -0700)]
mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
mm_init.c needs to include <linux/mman.h> for the definition of
vm_committed_as_batch. Fixes the following sparse warning:
mm/mm_init.c:141:5: warning: symbol 'vm_committed_as_batch' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20191016091509.26708-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Sat, 19 Oct 2019 03:20:20 +0000 (20:20 -0700)]
mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
The generic_file_vm_ops is defined in <linux/ramfs.h> so include it to
fix the following warning:
mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20191008102311.25432-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Sat, 19 Oct 2019 03:20:17 +0000 (20:20 -0700)]
mm: include <linux/huge_mm.h> for is_vma_temporary_stack
Include <linux/huge_mm.h> for the definition of is_vma_temporary_stack
to fix the following sparse warning:
mm/rmap.c:1673:6: warning: symbol 'is_vma_temporary_stack' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20191009151155.27763-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chenwandun [Sat, 19 Oct 2019 03:20:14 +0000 (20:20 -0700)]
zram: fix race between backing_dev_show and backing_dev_store
CPU0: CPU1:
backing_dev_show backing_dev_store
...... ......
file = zram->backing_dev;
down_read(&zram->init_lock); down_read(&zram->init_init_lock)
file_path(file, ...); zram->backing_dev = backing_dev;
up_read(&zram->init_lock); up_read(&zram->init_lock);
gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.
backtrace:
d_path+0xcc/0x174
file_path+0x10/0x18
backing_dev_show+0x40/0xb4
dev_attr_show+0x20/0x54
sysfs_kf_seq_show+0x9c/0x10c
kernfs_seq_show+0x28/0x30
seq_read+0x184/0x488
kernfs_fop_read+0x5c/0x1a4
__vfs_read+0x44/0x128
vfs_read+0xa0/0x138
SyS_read+0x54/0xb4
Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Sat, 19 Oct 2019 03:20:11 +0000 (20:20 -0700)]
mm/memcontrol: update lruvec counters in mem_cgroup_move_account
Mapped, dirty and writeback pages are also counted in per-lruvec stats.
These counters needs update when page is moved between cgroups.
Currently is nobody *consuming* the lruvec versions of these counters and
that there is no user-visible effect.
Link: http://lkml.kernel.org/r/157112699975.7360.1062614888388489788.stgit@buzz
Fixes:
00f3ca2c2d66 ("mm: memcontrol: per-lruvec stats infrastructure")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yi Li [Sat, 19 Oct 2019 03:20:08 +0000 (20:20 -0700)]
ocfs2: fix panic due to ocfs2_wq is null
mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error. ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.
Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
------------[ cut here ]------------
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E
4.14.148-200.ckv.x86_64 #1
Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
task:
ffff967af0520000 task.stack:
ffffa5f05484000
RIP: 0010:mutex_lock+0x19/0x20
Call Trace:
flush_workqueue+0x81/0x460
ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
ocfs2_dismount_volume+0x84/0x400 [ocfs2]
ocfs2_fill_super+0xa4/0x1270 [ocfs2]
? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
mount_bdev+0x17f/0x1c0
mount_fs+0x3a/0x160
Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li <yilikernel@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Sat, 19 Oct 2019 03:20:05 +0000 (20:20 -0700)]
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.
Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps. ZONE_DEVICE is not applicable.
page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps. This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.
Note: As explained by Michal in [1], alloc_contig_range() will verify
the range. So it boils down to the wrong access in this function.
[1] http://lkml.kernel.org/r/
20180423000943.GO17484@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20191015120717.4858-1-david@redhat.com
Fixes:
f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after
d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Sat, 19 Oct 2019 03:20:01 +0000 (20:20 -0700)]
mm: memblock: do not enforce current limit for memblock_phys* family
Until commit
92d12f9544b7 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address. The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.
This caused breakage of etnaviv GPU driver:
etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
etnaviv-gpu 130000.gpu: command buffer outside valid memory window
etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
etnaviv-gpu 134000.gpu: command buffer outside valid memory window
etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.
Link: http://lkml.kernel.org/r/1570915861-17633-1-git-send-email-rppt@kernel.org
Fixes:
92d12f9544b7 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Adam Ford <aford173@gmail.com>
Tested-by: Adam Ford <aford173@gmail.com> [imx6q-logicpd]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Honglei Wang [Sat, 19 Oct 2019 03:19:58 +0000 (20:19 -0700)]
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
Commit
1a61ab8038e72 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.
Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels. The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.
Tim Chen said:
: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15% 3.08% pmbench [kernel.vmlinux] [k] lruvec_lru_size
: |
: |--3.07%--lruvec_lru_size
: | |
: | |--2.11%--cpumask_next
: | | |
: | | --1.66%--find_next_bit
: | |
: | --0.57%--call_function_interrupt
: | |
: | --0.55%--smp_call_function_interrupt
: |
: |--1.59%--0x441f0fc3d009
: | _ops_rdtsc_init_base_freq
: | access_histogram
: | page_fault
: | __do_page_fault
: | handle_mm_fault
: | __handle_mm_fault
: | |
: | --1.54%--do_swap_page
: | swapin_readahead
: | swap_cluster_readahead
: | |
: | --1.53%--read_swap_cache_async
: | __read_swap_cache_async
: | alloc_pages_vma
: | __alloc_pages_nodemask
: | __alloc_pages_slowpath
: | try_to_free_pages
: | do_try_to_free_pages
: | shrink_node
: | shrink_node_memcg
: | |
: | |--0.77%--lruvec_lru_size
: | |
: | --0.76%--inactive_list_is_low
: | |
: | --0.76%--lruvec_lru_size
: |
: --1.50%--measure_read
: page_fault
: __do_page_fault
: handle_mm_fault
: __handle_mm_fault
: do_swap_page
: swapin_readahead
: swap_cluster_readahead
: |
: --1.48%--read_swap_cache_async
: __read_swap_cache_async
: alloc_pages_vma
: __alloc_pages_nodemask
: __alloc_pages_slowpath
: try_to_free_pages
: do_try_to_free_pages
: shrink_node
: shrink_node_memcg
: |
: |--0.75%--inactive_list_is_low
: | |
: | --0.75%--lruvec_lru_size
: |
: --0.73%--lruvec_lru_size
The likely culprit is the cache traffic the lruvec_page_state_local
generates. Dave Hansen says:
: I was thinking purely of the cache footprint. If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only. 1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1. 500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles
Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.
Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
Fixes:
1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Sat, 19 Oct 2019 03:19:53 +0000 (20:19 -0700)]
mm/gup: fix a misnamed "write" argument, and a related bug
In several routines, the "flags" argument is incorrectly named "write".
Change it to "flags".
Also, in one place, the misnaming led to an actual bug:
"flags & FOLL_WRITE" is required, rather than just "flags".
(That problem was flagged by krobot, in v1 of this patch.)
Also, change the flags argument from int, to unsigned int.
You can see that this was a simple oversight, because the
calling code passes "flags" to the fifth argument:
gup_pgd_range():
...
if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
PGDIR_SHIFT, next, flags, pages, nr))
...which, until this patch, the callees referred to as "write".
Also, change two lines to avoid checkpatch line length
complaints, and another line to fix another oversight
that checkpatch called out: missing "int" on pdshift.
Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
Fixes:
b798bec4741b ("mm/gup: change write parameter to flags in fast walk")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Hubbard [Sat, 19 Oct 2019 03:19:50 +0000 (20:19 -0700)]
mm/gup_benchmark: add a missing "w" to getopt string
Even though gup_benchmark.c has code to handle the -w command-line option,
the "w" is not part of the getopt string. It looks as if it has been
missing the whole time.
On my machine, this leads naturally to the following predictable result:
$ sudo ./gup_benchmark -w
./gup_benchmark: invalid option -- 'w'
...which is fixed with this commit.
Link: http://lkml.kernel.org/r/20191014184639.1512873-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: kbuild test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Sat, 19 Oct 2019 03:19:47 +0000 (20:19 -0700)]
ocfs2: fix error handling in ocfs2_setattr()
Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before
jumping to do dqput().
Link: http://lkml.kernel.org/r/20191010082349.1134-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Sat, 19 Oct 2019 03:19:44 +0000 (20:19 -0700)]
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
Karsten reported the following panic in __free_slab() happening on a s390x
machine:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address:
0000000000000000 TEID:
0000000000000483
Fault in home space mode while using kernel ASCE.
AS:
00000000017d4007 R3:
000000007fbd0007 S:
000000007fbff000 P:
000000000000003d
Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Krnl PSW :
0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS:
00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
Krnl Code:
00000000003cada6:
e310a1400004 lg %r1,320(%r10)
00000000003cadac:
c0e50046c286 brasl %r14,ca32b8
#
00000000003cadb2:
a7f4fe36 brc 15,3caa1e
>
00000000003cadb6:
e32060800024 stg %r2,128(%r6)
00000000003cadbc:
a7f4fd9e brc 15,3ca8f8
00000000003cadc0:
c0e50046790c brasl %r14,c99fd8
00000000003cadc6:
a7f4fe2c brc 15,3caa
00000000003cadc6:
a7f4fe2c brc 15,3caa1e
00000000003cadca:
ecb1ffff00d9 aghik %r11,%r1,-1
Call Trace:
(<
00000000003cabcc> __free_slab+0x49c/0x6b0)
<
00000000001f5886> rcu_core+0x5a6/0x7e0
<
0000000000ca2dea> __do_softirq+0xf2/0x5c0
<
0000000000152644> irq_exit+0x104/0x130
<
000000000010d222> do_IRQ+0x9a/0xf0
<
0000000000ca2344> ext_int_handler+0x130/0x134
<
0000000000103648> enabled_wait+0x58/0x128
(<
0000000000103634> enabled_wait+0x44/0x128)
<
0000000000103b00> arch_cpu_idle+0x40/0x58
<
0000000000ca0544> default_idle_call+0x3c/0x68
<
000000000018eaa4> do_idle+0xec/0x1c0
<
000000000018ee0e> cpu_startup_entry+0x36/0x40
<
000000000122df34> arch_call_rest_init+0x5c/0x88
<
0000000000000000> 0x0
INFO: lockdep is turned off.
Last Breaking-Event-Address:
<
00000000003ca8f4> __free_slab+0x1c4/0x6b0
Kernel panic - not syncing: Fatal exception in interrupt
The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.
These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag). The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works. It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.
To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params(). It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.
Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.
Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
Fixes:
fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Sat, 19 Oct 2019 03:19:39 +0000 (20:19 -0700)]
mm/memunmap: don't access uninitialized memmap in memunmap_pages()
Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.
This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory. Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).
We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum. Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.
--------------------------------------------------------------------------
Zones are now properly shrunk when offlining memory blocks or when
onlining failed. This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.
Example:
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
:/# echo "online_movable" > /sys/devices/system/memory/memory41/state
:/# echo "online_movable" > /sys/devices/system/memory/memory43/state
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 98304
present 65536
managed 65536
:/# echo 0 > /sys/devices/system/memory/memory43/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 32768
present 32768
managed 32768
:/# echo 0 > /sys/devices/system/memory/memory41/online
:/# cat /proc/zoneinfo
Node 1, zone Movable
spanned 0
present 0
managed 0
This patch (of 10):
With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.
This fixes a kernel crash that is observed when destroying a namespace:
kernel BUG at include/linux/mm.h:1107!
cpu 0x1: Vector: 700 (Program Check) at [
c000000274087890]
pc:
c0000000004b9728: memunmap_pages+0x238/0x340
lr:
c0000000004b9724: memunmap_pages+0x234/0x340
...
pid = 3669, comm = ndctl
kernel BUG at include/linux/mm.h:1107!
devm_action_release+0x30/0x50
release_nodes+0x268/0x2d0
device_release_driver_internal+0x174/0x240
unbind_store+0x13c/0x190
drv_attr_store+0x44/0x60
sysfs_kf_write+0x70/0xa0
kernfs_fop_write+0x1ac/0x290
__vfs_write+0x3c/0x70
vfs_write+0xe4/0x200
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The "page_zone(pfn_to_page(pfn)" was introduced by
69324b8f4833 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).
[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes:
2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Sat, 19 Oct 2019 03:19:33 +0000 (20:19 -0700)]
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
We might use the nid of memmaps that were never initialized. For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now. Let's use the calculated boundaries of the
separate zones instead. This now also avoids having to iterate over a
whole bunch of subsections again, after shrinking one zone.
Before commit
d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value. After that commit, the node might be garbage.
We'll have to fix shrink_zone_span() next.
Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
Fixes:
f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [
d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Sat, 19 Oct 2019 03:19:29 +0000 (20:19 -0700)]
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.
For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:
:/# echo 1 > /sys/devices/system/memory/memory40/online
:/# echo 1 > /sys/devices/system/memory/memory42/online
:/# cat /proc/pagetypeinfo > test.file
page:
fffff2c585200000 is uninitialized and poisoned
raw:
ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw:
ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
There is not page extension available.
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1107!
invalid opcode: 0000 [#1] SMP NOPTI
Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).
[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes:
f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after
d0dc12e86b319
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>