Heiko Carstens [Mon, 8 Feb 2021 15:32:27 +0000 (16:32 +0100)]
s390/debug: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 8 Feb 2021 15:27:33 +0000 (16:27 +0100)]
s390/kvm: use union tod_clock
Use union tod_clock and get rid of the kvm specific struct
kvm_s390_tod_clock_ext which apparently was introduced for the same
purpose.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 8 Feb 2021 15:16:28 +0000 (16:16 +0100)]
s390/vdso: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 8 Feb 2021 15:06:10 +0000 (16:06 +0100)]
s390/time: convert tod_clock_base to union
Convert tod_clock_base to union tod_clock. This simplifies quite a bit
of code and also fixes a bug in read_persistent_clock64();
void read_persistent_clock64(struct timespec64 *ts)
{
__u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
get_tod_clock_ext(clk);
*(__u64 *) &clk[1] -= delta;
if (*(__u64 *) &clk[1] > delta)
clk[0]--;
ext_to_timespec64(clk, ts);
}
Assume &clk[1] == 3 and delta == 2; then after the substraction the if
condition becomes true and the epoch part of the clock is decremented
by one because of an assumed overflow, even though there is none.
Fix this by using 128 bit arithmetics and let the compiler do the
right thing:
void read_persistent_clock64(struct timespec64 *ts)
{
union tod_clock clk;
u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
store_tod_clock_ext(&clk);
clk.eitod -= delta;
ext_to_timespec64(&clk, ts);
}
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 8 Feb 2021 12:58:47 +0000 (13:58 +0100)]
s390/time: introduce new store_tod_clock_ext()
Introduce new store_tod_clock_ext() function, which is the same like
store_tod_clock_ext_cc() except that it doesn't return a condition
code.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 8 Feb 2021 12:56:49 +0000 (13:56 +0100)]
s390/time: rename store_tod_clock_ext() and use union tod_clock
Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect
that it returns a condition code and also use union tod_clock as
parameter.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 7 Feb 2021 21:00:22 +0000 (22:00 +0100)]
s390/time: introduce union tod_clock
Introduce union tod_clock which is supposed to be used to decode and
access various fields of the result of STORE CLOCK EXTENDED.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Wed, 10 Feb 2021 20:51:02 +0000 (21:51 +0100)]
s390,alpha: switch to 64-bit ino_t
s390 and alpha are the only 64 bit architectures with a 32-bit ino_t.
Since this is quite unusual this causes bugs from time to time.
See e.g. commit
ebce3eb2f7ef ("ceph: fix inode number handling on
arches with 32-bit ino_t") for an example.
This (obviously) also prevents s390 and alpha to use 64-bit ino_t for
tmpfs. See commit
b85a7a8bb573 ("tmpfs: disallow CONFIG_TMPFS_INODE64
on s390").
Therefore switch both s390 and alpha to 64-bit ino_t. This should only
have an effect on the ustat system call. To prevent ABI breakage
define struct ustat compatible to the old layout and change
sys_ustat() accordingly.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Wed, 3 Feb 2021 16:50:00 +0000 (17:50 +0100)]
s390: split cleanup_sie
The current code uses the address in %r11 to figure out whether
it was called from the machine check handler or from a normal
interrupt handler. Instead of doing this implicit logic (which
is mostly a leftover from the old critical cleanup approach)
just add a second label and use that.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Wed, 3 Feb 2021 16:46:12 +0000 (17:46 +0100)]
s390: use r13 in cleanup_sie as temp register
Instead of thrashing r11 which is normally our pointer to struct
pt_regs on the stack, use r13 as temporary register in the BR_EX
macro. r13 is already used in cleanup_sie, so no need to thrash
another register.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Wed, 3 Feb 2021 08:16:45 +0000 (09:16 +0100)]
s390: fix kernel asce loading when sie is interrupted
If a machine check is coming in during sie, the PU saves the
control registers to the machine check save area. Afterwards
mcck_int_handler is called, which loads __LC_KERNEL_ASCE into
%cr1. Later the code restores %cr1 from the machine check area,
but that is wrong when SIE was interrupted because the machine
check area still contains the gmap asce. Instead it should return
with either __KERNEL_ASCE in %cr1 when interrupted in SIE or
the previous %cr1 content saved in the machine check save area.
Fixes:
87d598634521 ("s390/mm: remove set_fs / rework address space handling")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Wed, 3 Feb 2021 08:02:51 +0000 (09:02 +0100)]
s390: add stack for machine check handler
The previous code used the normal kernel stack for machine checks.
This is problematic when a machine check interrupts a system call
or interrupt handler right at the beginning where registers are set up.
Assume system_call is interrupted at the first instruction and a machine
check is triggered. The machine check handler is called, checks the PSW
to see whether it is coming from user space, notices that it is already
in kernel mode but %r15 still contains the user space stack. This would
lead to a kernel crash.
There are basically two ways of fixing that: Either using the 'critical
cleanup' approach which compares the address in the PSW to see whether
it is already at a point where the stack has been set up, or use an extra
stack for the machine check handler.
For simplicity, we will go with the second approach and allocate an extra
stack. This adds some memory overhead for large systems, but usually large
system have plenty of memory so this isn't really a concern. But it keeps
the mchk stack setup simple and less error prone.
Fixes:
0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Wed, 10 Feb 2021 12:39:19 +0000 (13:39 +0100)]
s390: use WRITE_ONCE when re-allocating async stack
The code does:
S390_lowcore.async_stack = new + STACK_INIT_OFFSET;
But the compiler is free to first assign one value and
add the other value later. If a IRQ would be coming in
between these two operations, it would run with an invalid
stack. Prevent this by using WRITE_ONCE.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Thu, 28 Jan 2021 12:06:05 +0000 (13:06 +0100)]
s390: open code SWITCH_KERNEL macro
This is a preparation patch for two later bugfixes. In the past both
int_handler and machine check handler used SWITCH_KERNEL to switch to
the kernel stack. However, SWITCH_KERNEL doesn't work properly in machine
check context. So instead of adding more complexity to this macro, just
remove it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Tue, 2 Feb 2021 15:59:50 +0000 (16:59 +0100)]
s390/vtime: use cpu alternative for stck/stckf
Use a cpu alternative to switch between stck and stckf instead of
making it compile time dependent. This will also make kernels compiled
for old machines, but running on newer machines, use stckf.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Tue, 2 Feb 2021 16:10:49 +0000 (17:10 +0100)]
s390/alternatives: add alternative_input() / alternative_io()
Add support for alternative inline assemblies with input and output
arguments. This is consistent to x86.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Tue, 2 Feb 2021 12:46:47 +0000 (13:46 +0100)]
s390/entry: use cpu alternative for stck/stckf
Use a cpu alternative to switch between stck and stckf instead of
making it compile time dependent. This will also make kernels compiled
for old machines, but running on newer machines, use stckf.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 1 Feb 2021 20:53:08 +0000 (21:53 +0100)]
s390/time: use stcke instead of stck
Use STORE CLOCK EXTENDED instead of STORE CLOCK in early tod clock
setup. This is just to remove another usage of stck, trying to remove
all usages of STORE CLOCK. This doesn't fix anything.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Mon, 1 Feb 2021 20:40:22 +0000 (21:40 +0100)]
s390/cpum_cf_diag: use get_tod_clock_fast()
Use get_tod_clock_fast() instead of store_tod_clock(), since
store_tod_clock() can be very slow.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Tue, 2 Feb 2021 15:45:37 +0000 (16:45 +0100)]
s390/vtime: fix inline assembly clobber list
The stck/stckf instruction used within the inline assembly within
do_account_vtime() changes the condition code. This is not reflected
with the clobber list, and therefore might result in incorrect code
generation.
It seems unlikely that the compiler could generate incorrect code
considering the surrounding C code, but it must still be fixed.
Cc: <stable@vger.kernel.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 31 Jan 2021 22:07:42 +0000 (23:07 +0100)]
s390/vdso: on timens page fault prefault also VVAR page
This is the s390 variant of commit
e6b28ec65b6d ("x86/vdso: On timens
page fault prefault also VVAR page").
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Fri, 5 Feb 2021 15:19:32 +0000 (16:19 +0100)]
s390/vdso: implement generic vdso time namespace support
Implement generic vdso time namespace support which also enables time
namespaces for s390. This is quite similar to what arm64 has.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 31 Jan 2021 21:27:36 +0000 (22:27 +0100)]
s390/vdso: simplify __arch_get_hw_counter()
Use the passed in vdso_data pointer instead of calculating it again.
This is also required as a prerequisite for vdso time namespaces: if a
process is part of a time namespace __arch_get_vdso_data() will return
a pointer to the time namespace data page instead of the vdso data
page, which is not what __arch_get_hw_counter() expects.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Fri, 5 Feb 2021 15:09:14 +0000 (16:09 +0100)]
s390/vdso: move data page before code pages
For consistency with x86 and arm64 move the data page before code
pages. Similar to commit
601255ae3c98 ("arm64: vdso: move data page
before code pages").
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 21:01:16 +0000 (22:01 +0100)]
s390/vdso: put vdso datapage in a separate vma
Add a separate "[vvar]" mapping for the vdso datapage, since it
doesn't need to be executable or COW-able.
This is actually the s390 implementation of commit
871549385278
("arm64: vdso: put vdso datapage in a separate vma")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 20:36:14 +0000 (21:36 +0100)]
s390/vdso: get rid of vdso_fault
Implement vdso mapping similar to arm64 and powerpc.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 19:57:08 +0000 (20:57 +0100)]
s390/vdso: misc simple code changes
- remove unneeded includes
- move functions around
- remove obvious and/or incorrect comments
- shorten some if conditions
No functional change.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 19:22:29 +0000 (20:22 +0100)]
s390/vdso: remove superfluous variables
A few local variables exist only so the contents of a global variable
can be copied to them, and use that value only for reading.
Just remove them and rename some global variables. Also change
vdso64_[start|end] to be character arrays to be consistent with other
architectures, and get rid of the global variable vdso64_kbase.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 19:10:27 +0000 (20:10 +0100)]
s390/vdso: remove superfluous check
vdso_pages (aka vdso64_pages) is never 0, therefore remove the check.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 19:08:40 +0000 (20:08 +0100)]
s390/vdso: remove BUG_ON()
Handle allocation error gracefully and simply disable vdso instead of
leaving the system in an undefined state.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 19:04:08 +0000 (20:04 +0100)]
s390/vdso: simplify vdso size calculation
The vdso is (and must) be page aligned and its size must also be
a multiple of PAGE_SIZE. Therefore no need to round upwards.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 18:51:34 +0000 (19:51 +0100)]
s390/vdso: convert vdso_init() to arch_initcall
Convert vdso_init() to arch_initcall like it is on all other architectures.
This requires to remove the vdso_getcpu_init() call from vdso_init()
since it must be called before smp is enabled.
vdso_getcpu_init() is now an early_initcall like on powerpc.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 18:44:18 +0000 (19:44 +0100)]
s390/vdso: fix vdso data page definition
The vdso data page actually contains an array. Fix that.
This doesn't fix a real bug, just reflects reality.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Sun, 24 Jan 2021 18:37:55 +0000 (19:37 +0100)]
s390/vdso: remove VDSO32_LBASE compat leftover
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Halil Pasic [Thu, 4 Feb 2021 15:22:05 +0000 (16:22 +0100)]
s390/defconfig: add some NFT modules
Since Fedora 33 the virtualization stack of Fedora requires a couple of
netfilter modules to function properly. Let's add these to defconfig and
debug_defconfig.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Tested-by: Bjoern Walk <bwalk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Marc Hartmayer [Fri, 5 Feb 2021 15:22:03 +0000 (15:22 +0000)]
s390/debug_config: enable kmemleak detector
...but set it to off by default. Use the kernel command line option
`kmemleak=on` to enable it.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Jan Höppner [Fri, 5 Feb 2021 12:50:57 +0000 (13:50 +0100)]
Documentations: scsi, kvm: Update s390-tools GitHub URL
The GitHub organisation name under which the s390-tools package is being
hosted has changed. Update the web link.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Harald Freudenberger [Thu, 21 Jan 2021 15:03:08 +0000 (16:03 +0100)]
s390/zcrypt: return EIO when msg retry limit reached
When a msg is retried because the lower ap layer returns -EAGAIN
there is a retry limit (currently 10). When this limit is reached
the last return code from the lower layer is returned, causing
the userspace to get -1 on the ioctl with errno EAGAIN.
This EAGAIN is misleading here. After 10 retry attempts the
userspace should receive a clear failure indication like EINVAL
or EIO or ENODEV. However, the reason why these retries all
fail is unclear. On an invalid message EINVAL would be returned
by the lower layer, and if devices go away or are not available
an ENODEV is seen. So this patch now reworks the retry loops
to return EIO to userspace when the retry limit is reached.
Fixes:
91ffc519c199 ("s390/zcrypt: introduce msg tracking in zcrypt functions")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Alexander Egorenkov [Wed, 3 Feb 2021 18:28:35 +0000 (19:28 +0100)]
s390/thread_info.h: fix task_struct declaration warning
Add missing forward declaration for task_struct.
The warning appears when the -Werror C compiler flag is being used.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Alexander Egorenkov [Thu, 28 Jan 2021 15:33:54 +0000 (16:33 +0100)]
s390: update defconfigs
Disable CONFIG_TMPFS_INODE64 which is currently broken on s390x
because size of ino_t on s390x is 4 bytes.
This fixes the following error with kdump:
[ 9.415082] [608]: Remounting '/' read-only in with options 'size=238372k,nr_inodes=59593,inode64'.
[ 9.415093] rootfs: Cannot use inode64 with <64bit inums in kernel
[ 9.415093]
[ 9.415100] [608]: Failed to remount '/' read-only: Invalid argument
Fixes:
5c60ed283e1d ("s390: update defconfigs")
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Jiapeng Zhong [Tue, 26 Jan 2021 09:09:12 +0000 (17:09 +0800)]
s390: Simplify the calculation of variables
Fix the following coccicheck warnings:
./arch/s390/include/asm/scsw.h:528:48-50: WARNING !A || A && B is
equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Chengyang Fan [Mon, 25 Jan 2021 09:58:39 +0000 (17:58 +0800)]
s390/ap: remove unneeded semicolon
Remove a superfluous semicolon after function definition.
Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Message-Id: <
20210125095839.1720265-1-cy.fan@huawei.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Niklas Schnelle [Wed, 22 Jul 2020 14:53:54 +0000 (16:53 +0200)]
s390/pci: refactor zpci_create_device()
Currently zpci_create_device() is only called in clp_add_pci_device()
which allocates the memory for the struct zpci_dev being created. There
is little separation of concerns as only both functions together can
create a zpci_dev and the only CLP specific code in
clp_add_pci_device() is a call to clp_query_pci_fn().
Improve this by removing clp_add_pci_device() and refactor
zpci_create_device() such that it alone creates and initializes the
zpci_dev given the FID and Function Handle. For this we need to make
clp_query_pci_fn() non-static. While at it remove the function handle
parameter since we can just take that from the zpci_dev. Also move
adding to the zpci_list to after the zdev has been fully created which
eliminates a window where a partially initialized zdev can be found by
get_zdev_by_fid().
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Tue, 15 Sep 2020 07:04:39 +0000 (10:04 +0300)]
s390/qdio: track time of last data IRQ for each device
We currently track the time of the most recent QDIO Adapter Interrupt.
This is a system-wide timestamp (as such interrupts are not bound to
one specific qdio device).
If interrupt processing stalls on one device but is functional for a
different device, the timestamp continues to be updated and is of no
help for problem diagnosis.
So for debugging purposes also track the time of the last Data IRQ on
a per-device level. Collect this data in the legacy non-AI path as well.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Wed, 30 Sep 2020 08:09:07 +0000 (10:09 +0200)]
s390/qdio: make thinint registration symmetric
tiqdio_add_device() adds the device to the tiq_list of eligible targets
for a data IRQ, which gets walked on each QDIO Adapter Interrupt to
inspect their DSCIs.
But currently the tiqdio_add_device() / tiqdio_remove_device() calls
are not symmetric - the device is removed within qdio_shutdown(),
but only added by qdio_activate().
So depending on the call sequence and encountered errors, we might
be trying to remove a list entry in qdio_shutdown() that was never even
added to the list. This required additional INIT_LIST_HEAD() calls to
ensure that the list entry was always in a consistent state.
All drivers now fence the IRQ delivery via qdio_start_irq() /
qdio_stop_irq(), so we can nicely integrate this tiq_list management
with the other steps needed for QDIO Adapter IRQ (de-)registration
(qdio_establish_thinint() / qdio_shutdown_thinint()).
As the naming suggests these get called during qdio_establish() and
qdio_shutdown(), with proper symmetry and roll-back after errors.
With this we longer need to worry about misplaced list removals, and
thus can clean up the list API abuse (INIT_LIST_HEAD() should not be
called on list entries).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Thu, 1 Oct 2020 07:47:47 +0000 (09:47 +0200)]
s390/qdio: adopt new tasklet API
Convert the Output Queue tasklet code to take a tasklet_struct as
parameter. Then initialize the tasklet with tasklet_setup() to indicate
that we follow the new model.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Mon, 25 Jan 2021 08:46:59 +0000 (09:46 +0100)]
s390/qdio: remove qdio_inbound_q_moved() wrapper
It's used in just one place, inline it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Tue, 2 Jun 2020 12:09:10 +0000 (14:09 +0200)]
s390/qdio: remove Input tasklet code
Both qeth and zfcp have fully moved to the polling-driven flow for
Input Queues with commit
0a6e634535f1 ("s390/qdio: extend polling
support to multiple queues") and commit
0b524abc2dd1 ("scsi: zfcp: Lift
Input Queue tasklet from qdio").
So remove the tasklet code for Input Queues, streamline the IRQ handlers
and push the tasklet struct into struct qdio_output_q.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Harald Freudenberger [Fri, 15 Jan 2021 07:56:19 +0000 (08:56 +0100)]
s390/crypto: improve retry logic in case of master key change
A master key change on a CCA card may cause an immediately
following request to derive an protected key from a secure
key to fail with error condition 8/2290. The recommendation
from firmware is to retry with 1 second sleep.
So now the low level cca functions return -EAGAIN when this
error condition is seen and the paes retry function will
evaluate the return value. Seeing EAGAIN and running in
process context results in trying to sleep for 1 s now.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Niklas Schnelle [Tue, 19 Jan 2021 07:49:37 +0000 (08:49 +0100)]
s390/pci: remove superfluous zdev->zbus check
Checking zdev->zbus for NULL in __zpci_event_availability() is
superfluous as it can never be NULL at this point. While harmless this
check causes smatch warnings because we later access zdev->zbus with
only having checked zdev != NULL which is sufficient.
The reason zdev->zbus can never be NULL is since with zdev != NULL given
we know the zdev came from get_zdev_by_fid() and thus the zpci_list.
Now on first glance at zpci_create_device() one may assume that there is
a window where the zdev is in the list without a zdev, however this
window can't overlap with __zpci_event_availability() as
zpci_create_device() either runs on the same kthread as part of
availability events, or during the initial CLP List PCI at which point
the __zpci_event_availability() is not yet called as zPCI is not yet
initialized.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Thu, 21 Jan 2021 09:36:13 +0000 (10:36 +0100)]
s390: add missing include to arch/s390/kernel/signal.c
This fixes the following warning:
CHECK linux/arch/s390/kernel/signal.c
linux/arch/s390/kernel/signal.c:465:6: warning: symbol 'arch_do_signal_or_restart' was not declared. Should it be static?
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Colin Ian King [Mon, 18 Jan 2021 11:32:22 +0000 (11:32 +0000)]
s390/tape: Fix spelling mistake in function name tape_3590_erp_succeded
Rename tape_3590_erp_succeded to tape_3590_erp_succeeded to fix a
spelling mistake in the function name.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20210118113222.71708-1-colin.king@canonical.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Mon, 18 Jan 2021 08:35:38 +0000 (09:35 +0100)]
s390: pass struct pt_regs instead of registers to syscalls
Instead of fetching all registers from struct pt_regs and passing
them to the syscall wrappers, let the system call wrappers only
fetch the values really required.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Mon, 18 Jan 2021 08:00:58 +0000 (09:00 +0100)]
s390: remove asmlinkage
On s390 asmlinkage is a nop, so remove it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Sven Schnelle [Sat, 21 Nov 2020 10:14:56 +0000 (11:14 +0100)]
s390: convert to generic entry
This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.
There are a few special things on s390:
- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
know about our PIF flags in exit_to_user_mode_loop().
- The old code had several ways to restart syscalls:
a) PIF_SYSCALL_RESTART, which was only set during execve to force a
restart after upgrading a process (usually qemu-kvm) to pgste page
table extensions.
b) PIF_SYSCALL, which is set by do_signal() to indicate that the
current syscall should be restarted. This is changed so that
do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
PIF_SYSCALL doesn't work with the generic code, and changing it
to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
more unique.
- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.
The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.
CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Fri, 15 Jan 2021 18:56:12 +0000 (19:56 +0100)]
s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Fri, 15 Jan 2021 19:01:36 +0000 (20:01 +0100)]
s390/bitops: remove small optimization to fix clang build
clang does not know about the 'b1' construct used in bitops inline
assembly. Since the plan is to use compiler atomic builtins anyway
there is no point in requesting clang support for this. Especially if
one considers that the kernel seems to be the only user of this.
With removing this small optimization it is possible to compile the
kernel also with -march=zEC12 and higher using clang.
Build error:
In file included from ./include/linux/bitops.h:32:
./arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi $0,${1:b}'
"oi %0,%b1\n"
^
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Heiko Carstens [Fri, 15 Jan 2021 18:39:23 +0000 (19:39 +0100)]
s390/atomic: remove small optimization to fix clang build
With commit
f0cbd3b83ed4 ("s390/atomic: circumvent gcc 10 build
regression") there was an attempt to workaroud a gcc build bug,
however with the workaround a similar problem with clang appeared.
It was recommended to use a workaround which would fail again with
gcc. Therefore simply remove the optimization. It is just not worth
the effort.
Besides that all of this will be changed to use compiler atomic
builtins instead anyway.
See https://reviews.llvm.org/D90231
and https://reviews.llvm.org/D91786
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Wed, 9 Dec 2020 10:24:13 +0000 (11:24 +0100)]
s390/cio: use dma helpers for setting masks
Bypassing the DMA API is bad style, even when we don't expect any
actual problems. Let's utilize the right API helpers for setting the
DMA masks and check for returned errors, so that we benefit from
common sanity checks.
io_subchannel_allocate_dev() required some extra massaging, so that we
can return an errno other than -ENOMEM.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Julian Wiedmann [Mon, 30 Nov 2020 08:19:57 +0000 (10:19 +0200)]
s390/cio: remove ccw_device_add() wrapper
Set the bus type when initializing the cdev structure. The device core
won't act on it until we call device_add().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Alexander Gordeev [Thu, 10 Dec 2020 11:32:51 +0000 (12:32 +0100)]
s390/tlb: make cleared_pXs flags consistent with generic code
On s390 cleared_pXs flags in struct mmu_gather are set by
corresponding pXd_free_tlb functions. Such approach is
inconsistent with how the generic code interprets these
flags, e.g pte_free_tlb() frees a PTE table - or a PMD
level entity, and so on.
This update does not bring any functional change, since
s390 does not use the flags at the moment.
Fixes:
9de7d833e3708 ("s390/tlb: Convert to generic mmu_gather")
Link: https://lore.kernel.org/lkml/fbb00ac0-9104-8d25-f225-7b3d1b17a01f@huawei.com/
Reported-by: Zhenyu Ye <yezhenyu2@huawei.com>
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Linus Torvalds [Mon, 18 Jan 2021 00:37:05 +0000 (16:37 -0800)]
Linux 5.11-rc4
Linus Torvalds [Sun, 17 Jan 2021 21:14:46 +0000 (13:14 -0800)]
Merge tag 'perf-tools-fixes-2021-01-17' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix 'CPU too large' error in Intel PT
- Correct event attribute sizes in 'perf inject'
- Sync build_bug.h and kvm.h kernel copies
- Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example
- libbpf tests fixes
- Fix shadow stat 'perf test' for non-bash shells
- Take cgroups into account for shadow stats in 'perf stat'
* tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf inject: Correct event attribute sizes
perf intel-pt: Fix 'CPU too large' error
perf stat: Take cgroups into account for shadow stats
perf stat: Introduce struct runtime_stat_data
libperf tests: Fail when failing to get a tracepoint id
libperf tests: If a test fails return non-zero
libperf tests: Avoid uninitialized variable warning
perf test: Fix shadow stat test for non-bash shells
tools headers: Syncronize linux/build_bug.h with the kernel sources
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf bpf examples: Fix bpf.h header include directive in 5sec.c example
Linus Torvalds [Sun, 17 Jan 2021 20:28:58 +0000 (12:28 -0800)]
Merge tag 'powerpc-5.11-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a lack of alignment in our linker script, that can lead to
crashes depending on configuration etc.
One fix for the 32-bit VDSO after the C VDSO conversion.
Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy"
* tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vdso: Fix clock_gettime_fallback for vdso32
powerpc: Fix alignment bug within the init sections
Linus Torvalds [Sun, 17 Jan 2021 20:16:47 +0000 (12:16 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro:
"Several assorted fixes.
I still think that audit ->d_name race is better fixed this way for
the benefit of backports, with any possibly fancier variants done on
top of it"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dump_common_audit_data(): fix racy accesses to ->d_name
iov_iter: fix the uaccess area in copy_compat_iovec_from_user
umount(2): move the flag validity checks first
Linus Torvalds [Sat, 16 Jan 2021 23:34:57 +0000 (15:34 -0800)]
mm: don't put pinned pages into the swap cache
So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.
However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page. That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.
Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache. But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.
Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).
But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.
It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors. If you do it wrong, you end up with an incorrectly
shared page.
As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic. Re-using the page requires a hard black-and-white rule with
no room for ambiguity.
In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.
Fixes:
09854ba94c6a ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 16 Jan 2021 20:25:40 +0000 (12:25 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Nine minor fixes, seven in drivers and two in the core SCSI disk
driver (sd) which should be harmless involving removing an unused
variable and quietening a spurious warning"
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Remove obsolete variable in sd_remove()
scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
scsi: scsi_debug: Fix memleak in scsi_debug_init()
scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
scsi: qedi: Correct max length of CHAP secret
scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
scsi: ufs: Relocate flush of exceptional event
scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
scsi: ufs: Fix possible power drain during system suspend
Al Viro [Tue, 5 Jan 2021 19:43:46 +0000 (14:43 -0500)]
dump_common_audit_data(): fix racy accesses to ->d_name
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us. And it's possible for
old long name to be freed after rename, leading to UAF here.
Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 16 Jan 2021 19:39:58 +0000 (11:39 -0800)]
Merge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just an nvme pull request via Christoph:
- don't initialize hwmon for discover controllers (Sagi Grimberg)
- fix iov_iter handling in nvme-tcp (Sagi Grimberg)
- fix a preempt warning in nvme-tcp (Sagi Grimberg)
- fix a possible NULL pointer dereference in nvme (Israel Rukshin)"
* tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
nvme: don't intialize hwmon for discovery controllers
nvme-tcp: fix possible data corruption with bio merges
nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
Linus Torvalds [Sat, 16 Jan 2021 19:12:02 +0000 (11:12 -0800)]
Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"We still have a pending fix for a cancelation issue, but it's still
being investigated. In the meantime:
- Dead mm handling fix (Pavel)
- SQPOLL setup error handling (Pavel)
- Flush timeout sequence fix (Marcelo)
- Missing finish_wait() for one exit case"
* tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
io_uring: flush timeouts that should already have expired
io_uring: do sqo disable on install_fd error
io_uring: fix null-deref in io_disable_sqo_submit
io_uring: don't take files/mm for a dead task
io_uring: drop mm and files after task_work_run
Linus Torvalds [Sat, 16 Jan 2021 19:00:08 +0000 (11:00 -0800)]
Merge tag 'riscv-for-linus-5.11-rc4' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"There are a few more fixes than a normal rc4, largely due to the
bubble introduced by the holiday break:
- return -ENOSYS for syscall number -1, which previously returned an
uninitialized value.
- ensure of_clk_init() has been called in time_init(), without which
clock drivers may not be initialized.
- fix sifive,uart0 driver to properly display the baud rate. A fix to
initialize MPIE that allows interrupts to be processed during
system calls.
- avoid erronously begin tracing IRQs when interrupts are disabled,
which at least triggers suprious lockdep failures.
- workaround for a warning related to calling smp_processor_id()
while preemptible. The warning itself is suprious on currently
availiable systems.
- properly include the generic time VDSO calls. A fix to our kasan
address mapping. A fix to the HiFive Unleashed device tree, which
allows the Ethernet PHY to be properly initialized by Linux (as
opposed to relying on the bootloader).
- defconfig update to include SiFive's GPIO driver, which is present
on the HiFive Unleashed and necessary to initialize the PHY.
- avoid allocating memory while initializing reserved memory.
- avoid allocating the last 4K of memory, as pointers there alias
with syscall errors.
There are also two cleanups that should have no functional effect but
do fix build warnings:
- drop a duplicated definition of PAGE_KERNEL_EXEC.
- properly declare the asm register SP shim.
- cleanup the rv32 memory size Kconfig entry, to reflect the actual
size of memory availiable"
* tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix maximum allowed phsyical memory for RV32
RISC-V: Set current memblock limit
RISC-V: Do not allocate memblock while iterating reserved memblocks
riscv: stacktrace: Move register keyword to beginning of declaration
riscv: defconfig: enable gpio support for HiFive Unleashed
dts: phy: add GPIO number and active state used for phy reset
dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
riscv: Fix KASAN memory mapping.
riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
riscv: cacheinfo: Fix using smp_processor_id() in preemptible
riscv: Trace irq on only interrupt is enabled
riscv: Drop a duplicated PAGE_KERNEL_EXEC
riscv: Enable interrupts during syscalls with M-Mode
riscv: Fix sifive serial driver
riscv: Fix kernel time_init()
riscv: return -ENOSYS for syscall -1
Linus Torvalds [Sun, 10 Jan 2021 01:09:10 +0000 (17:09 -0800)]
mm: don't play games with pinned pages in clear_page_refs
Turning a pinned page read-only breaks the pinning after COW. Don't do it.
The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 8 Jan 2021 21:13:41 +0000 (13:13 -0800)]
mm: fix clear_refs_write locking
Turning page table entries read-only requires the mmap_sem held for
writing.
So stop doing the odd games with turning things from read locks to write
locks and back. Just get the write lock.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atish Patra [Mon, 11 Jan 2021 23:45:04 +0000 (15:45 -0800)]
RISC-V: Fix maximum allowed phsyical memory for RV32
Linux kernel can only map 1GB of address space for RV32 as the page offset
is set to 0xC0000000. The current description in the Kconfig is confusing
as it indicates that RV32 can support 2GB of physical memory. That is
simply not true for current kernel. In future, a 2GB split support can be
added to allow 2GB physical address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Atish Patra [Mon, 11 Jan 2021 23:45:02 +0000 (15:45 -0800)]
RISC-V: Set current memblock limit
Currently, linux kernel can not use last 4k bytes of addressable space
because IS_ERR_VALUE macro treats those as an error. This will be an issue
for RV32 as any memblock allocator potentially allocate chunk of memory
from the end of DRAM (2GB) leading bad address error even though the
address was technically valid.
Fix this issue by limiting the memblock if available memory spans the
entire address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Atish Patra [Mon, 11 Jan 2021 23:45:01 +0000 (15:45 -0800)]
RISC-V: Do not allocate memblock while iterating reserved memblocks
Currently, resource tree allocates memory blocks while iterating on the
list. It leads to following kernel warning because memblock allocation
also invokes memory block reservation API.
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:795
__insert_resource+0x8e/0xd0
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-00022-ge20097fb37e2-dirty #549
[ 0.000000] epc:
c00125c2 ra :
c001262c sp :
c1c01f50
[ 0.000000] gp :
c1d456e0 tp :
c1c0a980 t0 :
ffffcf20
[ 0.000000] t1 :
00000000 t2 :
00000000 s0 :
c1c01f60
[ 0.000000] s1 :
ffffcf00 a0 :
ffffff00 a1 :
c1c0c0c4
[ 0.000000] a2 :
80c12b15 a3 :
80402000 a4 :
80402000
[ 0.000000] a5 :
c1c0c0c4 a6 :
80c12b15 a7 :
f5faf600
[ 0.000000] s2 :
c1c0c0c4 s3 :
c1c0e000 s4 :
c1009a80
[ 0.000000] s5 :
c1c0c000 s6 :
c1d48000 s7 :
c1613b4c
[ 0.000000] s8 :
00000fff s9 :
80000200 s10:
c1613b40
[ 0.000000] s11:
00000000 t3 :
c1d4a000 t4 :
ffffffff
This is also unnecessary as we can pre-compute the total memblocks required
for each memory region and allocate it before the loop. It save precious
boot time not going through memblock allocation code every time.
Fixes:
00ab027a3b82 ("RISC-V: Add kernel image sections to the resource tree")
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Christoph Hellwig [Mon, 11 Jan 2021 17:19:26 +0000 (18:19 +0100)]
iov_iter: fix the uaccess area in copy_compat_iovec_from_user
sizeof needs to be called on the compat pointer, not the native one.
Fixes:
89cd35c58bc2 ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 16 Jan 2021 02:01:17 +0000 (18:01 -0800)]
Merge tag 'for-5.11/dm-fixes-1' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM-raid's raid1 discard limits so discards work.
- Select missing Kconfig dependencies for DM integrity and zoned
targets.
- Four fixes for DM crypt target's support to optionally bypass kcryptd
workqueues.
- Fix DM snapshot merge supports missing data flushes before committing
metadata.
- Fix DM integrity data device flushing when external metadata is used.
- Fix DM integrity's maximum number of supported constructor arguments
that user can request when creating an integrity device.
- Eliminate DM core ioctl logging noise when an ioctl is issued without
required CAP_SYS_RAWIO permission.
* tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: defer decryption to a tasklet if interrupts disabled
dm integrity: fix the maximum number of arguments
dm crypt: do not call bio_endio() from the dm-crypt tasklet
dm integrity: fix flush with external metadata device
dm: eliminate potential source of excessive kernel log noise
dm snapshot: flush merged data before committing metadata
dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
dm crypt: do not wait for backlogged crypto request completion in softirq
dm zoned: select CONFIG_CRC32
dm integrity: select CRYPTO_SKCIPHER
dm raid: fix discard limits for raid1
Linus Torvalds [Fri, 15 Jan 2021 23:25:45 +0000 (15:25 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: MAINTAINERS and mm (slub,
pagealloc, memcg, kasan, vmalloc, migration, hugetlb, memory-failure,
and process_vm_access)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/process_vm_access.c: include compat.h
mm,hwpoison: fix printing of page flags
MAINTAINERS: add Vlastimil as slab allocators maintainer
mm/hugetlb: fix potential missing huge page size info
mm: migrate: initialize err in do_migrate_pages
mm/vmalloc.c: fix potential memory leak
arm/kasan: fix the array size of kasan_early_shadow_pte[]
mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
mm, slub: consider rest of partial list if acquire_slab() fails
Linus Torvalds [Fri, 15 Jan 2021 23:14:40 +0000 (15:14 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A fairly modest set of bug fixes, nothing abnormal from the merge
window
The ucma patch is a bit on the larger side, but given the regression
was recently added I've opted to forward it to the rc stream.
- Fix a ucma memory leak introduced in v5.9 while fixing the
Syzkaller bugs
- Don't fail when the xarray wraps for user verbs objects
- User triggerable oops regression from the umem page size rework
- Error unwind bugs in usnic, ocrdma, mlx5 and cma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cma: Fix error flow in default_roce_mode_store
RDMA/mlx5: Fix wrong free of blue flame register on error
IB/mlx5: Fix error unwinding when set_has_smi_cap fails
RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
RDMA/restrack: Don't treat as an error allocation ID wrapping
RDMA/ucma: Do not miss ctx destruction steps in some cases
Jens Axboe [Fri, 15 Jan 2021 23:04:23 +0000 (16:04 -0700)]
io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
If we enter with requests pending and performm cancelations, we'll have
a different inflight count before and after calling prepare_to_wait().
This causes the loop to restart. If we actually ended up canceling
everything, or everything completed in-between, then we'll break out
of the loop without calling finish_wait() on the waitqueue. This can
trigger a warning on exit_signals(), as we leave the task state in
TASK_UNINTERRUPTIBLE.
Put a finish_wait() after the loop to catch that case.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 15 Jan 2021 22:54:24 +0000 (14:54 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A number of bug fixes for ext4:
- Fix for the new fast_commit feature
- Fix some error handling codepaths in whiteout handling and
mountpoint sampling
- Fix how we write ext4_error information so it goes through the
journal when journalling is active, to avoid races that can lead to
lost error information, superblock checksum failures, or DIF/DIX
features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove expensive flush on fast commit
ext4: fix bug for rename with RENAME_WHITEOUT
ext4: fix wrong list_splice in ext4_fc_cleanup
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
ext4: don't leak old mountpoint samples
ext4: drop ext4_handle_dirty_super()
ext4: fix superblock checksum failure when setting password salt
ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
ext4: save error info to sb through journal if available
ext4: protect superblock modifications with a buffer lock
ext4: drop sync argument of ext4_commit_super()
ext4: combine ext4_handle_error() and save_error_info()
Linus Torvalds [Fri, 15 Jan 2021 22:39:21 +0000 (14:39 -0800)]
Merge tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Two small cifs fixes for stable (including an important handle leak
fix) and three small cleanup patches"
* tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: style: replace one-element array with flexible-array
cifs: connect: style: Simplify bool comparison
fs: cifs: remove unneeded variable in smb3_fs_context_dup
cifs: fix interrupted close commands
cifs: check pointer before freeing
Linus Torvalds [Fri, 15 Jan 2021 21:11:51 +0000 (13:11 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Set the minimum GCC version to 5.1 for arm64 due to earlier compiler
bugs.
- Make atomic helpers __always_inline to avoid a section mismatch when
compiling with clang.
- Fix the CMA and crashkernel reservations to use ZONE_DMA (remove the
arm64_dma32_phys_limit variable, no longer needed with a dynamic
ZONE_DMA sizing in 5.11).
- Remove redundant IRQ flag tracing that was leaving lockdep
inconsistent with the hardware state.
- Revert perf events based hard lockup detector that was causing
smp_processor_id() to be called in preemptible context.
- Some trivial cleanups - spelling fix, renaming S_FRAME_SIZE to
PT_REGS_SIZE, function prototypes added.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: selftests: Fix spelling of 'Mismatch'
arm64: syscall: include prototype for EL0 SVC functions
compiler.h: Raise minimum version of GCC to 5.1 for arm64
arm64: make atomic helpers __always_inline
arm64: rename S_FRAME_SIZE to PT_REGS_SIZE
Revert "arm64: Enable perf events based hard lockup detector"
arm64: entry: remove redundant IRQ flag tracing
arm64: Remove arm64_dma32_phys_limit and its uses
Linus Torvalds [Fri, 15 Jan 2021 21:07:33 +0000 (13:07 -0800)]
Merge tag 'mips_fixes_5.11.1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix coredumps on 64bit kernels
- fix for alignment bugs preventing booting
- fix checking for failed irq_alloc_desc calls
* tag 'mips_fixes_5.11.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: OCTEON: fix unreachable code in octeon_irq_init_ciu
MIPS: relocatable: fix possible boot hangup with KASLR enabled
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
Al Grant [Tue, 24 Nov 2020 19:58:17 +0000 (19:58 +0000)]
perf inject: Correct event attribute sizes
When 'perf inject' reads a perf.data file from an older version of perf,
it writes event attributes into the output with the original size field,
but lays them out as if they had the size currently used. Readers see a
corrupt file. Update the size field to match the layout.
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com
Signed-off-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 7 Jan 2021 17:41:59 +0000 (19:41 +0200)]
perf intel-pt: Fix 'CPU too large' error
In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:
Example on system with 8 cpus:
Before:
# echo 0 > /sys/devices/system/cpu/cpu2/online
# ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.147 MB perf.data ]
# ./perf script --itrace=e
Requested CPU 7 too large. Consider raising MAX_NR_CPUS
0x25908 [0x8]: failed to process type: 68 [Invalid argument]
After:
# ./perf script --itrace=e
#
Fixes:
8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes:
7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 15 Jan 2021 07:11:39 +0000 (16:11 +0900)]
perf stat: Take cgroups into account for shadow stats
As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot. This resulted in incorrect numbers when those cgroups have
different workloads.
For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
3,958,116,522 cycles A
6,722,650,929 instructions A # 2.53 insn per cycle
1,132,741 cycles B
571,743 instructions B # 0.00 insn per cycle
4,007,799,935 cycles C
6,793,181,523 instructions C # 2.56 insn per cycle
1.
001050869 seconds time elapsed
When I run 'perf stat' with single workload, it usually shows IPC around
1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.
But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.
avg cycle: (
3958116522 + 1132741 +
4007799935) / 3 =
2655683066
IPC (A) :
6722650929 /
2655683066 = 2.531
IPC (B) : 571743 /
2655683066 = 0.0002
IPC (C) :
6793181523 /
2655683066 = 2.557
We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified. With this patch, I can see correct
numbers like below:
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
4,171,051,687 cycles A
7,219,793,922 instructions A # 1.73 insn per cycle
1,051,189 cycles B
583,102 instructions B # 0.55 insn per cycle
4,171,124,710 cycles C
7,192,944,580 instructions C # 1.72 insn per cycle
1.
007909814 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 15 Jan 2021 07:11:38 +0000 (16:11 +0900)]
perf stat: Introduce struct runtime_stat_data
To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data. Currently it only has 'ctx' field but later
patch will add more.
Note that we intentionally pass 0 as ctx to clock-related events for
compatibility. It was already there in a few places. So move the code
into the saved_value_lookup() explicitly and add a comment.
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 14 Jan 2021 18:02:50 +0000 (10:02 -0800)]
libperf tests: Fail when failing to get a tracepoint id
Permissions are necessary to get a tracepoint id. Fail the test when the
read fails.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 14 Jan 2021 18:02:49 +0000 (10:02 -0800)]
libperf tests: If a test fails return non-zero
If a test fails return -1 rather than 0. This is consistent with the
return value in test-cpumap.c
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 14 Jan 2021 21:23:04 +0000 (13:23 -0800)]
libperf tests: Avoid uninitialized variable warning
The variable 'bf' is read (for a write call) without being initialized
triggering a memory sanitizer warning. Use 'bf' in the read and switch
the write to reading from a string.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Daejun Park [Wed, 6 Jan 2021 01:32:42 +0000 (10:32 +0900)]
ext4: remove expensive flush on fast commit
In the fast commit, it adds REQ_FUA and REQ_PREFLUSH on each fast
commit block when barrier is enabled. However, in recovery phase,
ext4 compares CRC value in the tail. So it is sufficient to add
REQ_FUA and REQ_PREFLUSH on the block that has tail.
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210106013242epcms2p5b6b4ed8ca86f29456fdf56aa580e74b4@epcms2p5
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
yangerkun [Tue, 5 Jan 2021 06:28:57 +0000 (14:28 +0800)]
ext4: fix bug for rename with RENAME_WHITEOUT
We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:
cd /dev/shm
mkdir test/
fallocate -l 128M img
mkfs.ext4 -b 1024 img
mount img test/
dd if=/dev/zero of=test/foo bs=1M count=128
mkdir test/dir/ && cd test/dir/
for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
/dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
ext4_rename will return ENOSPC!!
cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
We will get the output:
"ls: cannot access 'test/dir/file1': Structure needs cleaning"
and the dmesg show:
"EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
deleted inode referenced: 139"
ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Fixes:
cd808deced43 ("ext4: support RENAME_WHITEOUT")
Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Daejun Park [Wed, 30 Dec 2020 09:48:51 +0000 (18:48 +0900)]
ext4: fix wrong list_splice in ext4_fc_cleanup
After full/fast commit, entries in staging queue are promoted to main
queue. In ext4_fs_cleanup function, it splice to staging queue to
staging queue.
Fixes:
aa75f4d3daaeb ("ext4: main fast-commit commit path")
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201230094851epcms2p6eeead8cc984379b37b2efd21af90fd1a@epcms2p6
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Yi Li [Wed, 30 Dec 2020 03:38:27 +0000 (11:38 +0800)]
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
1: ext4_iget/ext4_find_extent never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.
2: ext4_fc_replay_inode should set the inode to NULL when IS_ERR.
and go to call iput properly.
Fixes:
8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Yi Li <yili@winhong.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201230033827.3996064-1-yili@winhong.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Namhyung Kim [Thu, 14 Jan 2021 05:06:09 +0000 (14:06 +0900)]
perf test: Fix shadow stat test for non-bash shells
It was using some bash-specific features and failed to parse when
running with a different shell like below:
root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
83: perf stat metrics (shadow stat) test :
--- start ---
test child forked, pid 3922
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
test child finished with -1
---- end ----
perf stat metrics (shadow stat) test: FAILED!
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 14 Jan 2021 15:35:27 +0000 (12:35 -0300)]
tools headers: Syncronize linux/build_bug.h with the kernel sources
To pick up the changes in:
3a176b94609a18f5 ("Revert "kbuild: avoid static_assert for genksyms"")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 14 Jan 2021 15:26:08 +0000 (12:26 -0300)]
tools headers UAPI: Sync kvm.h headers with the kernel sources
To pick the changes in:
647daca25d24fb6e ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
That don't cause any tooling change, just silences this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 29 Dec 2020 18:41:19 +0000 (15:41 -0300)]
perf bpf examples: Fix bpf.h header include directive in 5sec.c example
It was looking at bpf/bpf.h, which caused this problem:
# perf trace -e tools/perf/examples/bpf/5sec.c
/home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found
#include <bpf/bpf.h>
^~~~~~~~~~~
1 error generated.
ERROR: unable to compile tools/perf/examples/bpf/5sec.c
Hint: Check error message shown above.
Hint: You can also pre-compile it into .o using:
clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c
with proper -I and -D options.
event syntax error: 'tools/perf/examples/bpf/5sec.c'
\___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet
#
Change that to plain bpf.h, to make it work again:
# perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -
1776891872, rqtp:
5000000000)
# perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s
0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -
1776891872, rqtp:
5000000000)
hrtimer_nanosleep ([kernel.kallsyms])
common_nsleep ([kernel.kallsyms])
__x64_sys_clock_nanosleep ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
# perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>