Robin Murphy [Tue, 16 Jul 2019 23:30:44 +0000 (16:30 -0700)]
mm: clean up is_device_*_page() definitions
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page() to
make them all consistent in depending on their respective config options
even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons. This
allows a little more compile-time optimisation as well as the conceptual
and cosmetic cleanup.
Link: http://lkml.kernel.org/r/187c2ab27dea70635d375a61b2f2076d26c032b0.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 16 Jul 2019 23:30:41 +0000 (16:30 -0700)]
mm/mmap: move common defines to mman-common.h
Two architecture that use arch specific MMAP flags are powerpc and
sparc. We still have few flag values common across them and other
architectures. Consolidate this in mman-common.h.
Also update the comment to indicate where to find HugeTLB specific
reserved values
Link: http://lkml.kernel.org/r/20190604090950.31417-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 16 Jul 2019 23:30:38 +0000 (16:30 -0700)]
mm: move MAP_SYNC to asm-generic/mman-common.h
This enables support for synchronous DAX fault on powerpc
The generic changes are added as part of
b6fb293f2497 ("mm: Define
MAP_SYNC and VM_SYNC flags")
Without this, mmap returns EOPNOTSUPP for MAP_SYNC with
MAP_SHARED_VALIDATE
Instead of adding MAP_SYNC with same value to
arch/powerpc/include/uapi/asm/mman.h, I am moving the #define to
asm-generic/mman-common.h. Two architectures using mman-common.h
directly are sparc and powerpc. We should be able to consloidate more
#defines to mman-common.h. That can be done as a separate patch.
Link: http://lkml.kernel.org/r/20190528091120.13322-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Tatashin [Tue, 16 Jul 2019 23:30:35 +0000 (16:30 -0700)]
device-dax: "Hotremove" persistent memory that is used like normal RAM
It is now allowed to use persistent memory like a regular RAM, but
currently there is no way to remove this memory until machine is
rebooted.
This work expands the functionality to also allows hotremoving
previously hotplugged persistent memory, and recover the device for use
for other purposes.
To hotremove persistent memory, the management software must first
offline all memory blocks of dax region, and than unbind it from
device-dax/kmem driver. So, operations should look like this:
echo offline > /sys/devices/system/memory/memoryN/state
...
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Link: http://lkml.kernel.org/r/20190517215438.6487-4-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Tatashin [Tue, 16 Jul 2019 23:30:31 +0000 (16:30 -0700)]
mm/hotplug: make remove_memory() interface usable
Presently the remove_memory() interface is inherently broken. It tries
to remove memory but panics if some memory is not offline. The problem
is that it is impossible to ensure that all memory blocks are offline as
this function also takes lock_device_hotplug that is required to change
memory state via sysfs.
So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.
Make this interface to return an error if memory removal fails. This
way it is safe to call this function without panicking machine, and also
makes it symmetric to add_memory() which already returns an error.
Link: http://lkml.kernel.org/r/20190517215438.6487-3-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Tatashin [Tue, 16 Jul 2019 23:30:27 +0000 (16:30 -0700)]
device-dax: fix memory and resource leak if hotplug fails
Patch series ""Hotremove" persistent memory", v6.
Recently, adding a persistent memory to be used like a regular RAM was
added to Linux. This work extends this functionality to also allow hot
removing persistent memory.
We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there
is a userland state that is expensive to recreate (~2G).
The solution is to boot machines with 2G preserved for persistent
memory.
Copy the state, and hotadd the persistent memory so machine still has
all 8G available for runtime. Before reboot, offline and hotremove
device-dax 2G, copy the memory that is needed to be preserved to pmem0
device, and reboot.
The series of operations look like this:
1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
and free ramdisk.
2. Convert raw pmem0 to devdax
ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo online_movable > /sys/devices/system/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
echo offline > /sys/devices/system/memoryXXX/state
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do kexec reboot or reboot through firmware if firmware does not
zero memory in pmem0 region (These machines have only regular
volatile memory). So to have pmem0 device either memmap kernel
parameter is used, or devices nodes in dtb are specified.
This patch (of 3):
When add_memory() fails, the resource and the memory should be freed.
Link: http://lkml.kernel.org/r/20190517215438.6487-2-pasha.tatashin@soleen.com
Fixes:
c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Levy [Tue, 16 Jul 2019 23:30:24 +0000 (16:30 -0700)]
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
Fix a few spelling and grammar errors, and two places where fast/safe in
the documentation did not match the function.
Link: http://lkml.kernel.org/r/20190321014452.13297-1-tomlevy93@gmail.com
Signed-off-by: Tom Levy <tomlevy93@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 16 Jul 2019 23:30:21 +0000 (16:30 -0700)]
ipc/mqueue.c: only perform resource calculation if user valid
Andreas Christoforou reported:
UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
9 *
2305843009213693951 cannot be represented in type 'long int'
...
Call Trace:
mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
evict+0x472/0x8c0 fs/inode.c:558
iput_final fs/inode.c:1547 [inline]
iput+0x51d/0x8c0 fs/inode.c:1573
mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
vfs_mkobj+0x39e/0x580 fs/namei.c:2892
prepare_open ipc/mqueue.c:731 [inline]
do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
Which could be triggered by:
struct mq_attr attr = {
.mq_flags = 0,
.mq_maxmsg = 9,
.mq_msgsize = 0x1fffffffffffffff,
.mq_curmsgs = 0,
};
if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
perror("mq_open");
mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
preparing to return -EINVAL. During the cleanup, it calls
mqueue_evict_inode() which performed resource usage tracking math for
updating "user", before checking if there was a valid "user" at all
(which would indicate that the calculations would be sane). Instead,
delay this check to after seeing a valid "user".
The overflow was real, but the results went unused, so while the flaw is
harmless, it's noisy for kernel fuzzers, so just fix it by moving the
calculation under the non-NULL "user" where it actually gets used.
Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Andreas Christoforou <andreaschristofo@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drew Davenport [Tue, 16 Jul 2019 23:30:18 +0000 (16:30 -0700)]
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
For architectures using __WARN_TAINT, the WARN_ON macro did not print
out the "cut here" string. The other WARN_XXX macros would print "cut
here" inside __warn_printk, which is not called for WARN_ON since it
doesn't have a message to print.
Link: http://lkml.kernel.org/r/20190624154831.163888-1-ddavenport@chromium.org
Fixes:
a7bed27af194 ("bug: fix "cut here" location for __WARN_TAINT architectures")
Signed-off-by: Drew Davenport <ddavenport@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Leonard Crestez [Tue, 16 Jul 2019 23:30:15 +0000 (16:30 -0700)]
scripts/gdb: add helpers to find and list devices
Add helper commands and functions for finding pointers to struct device
by enumerating linux device bus/class infrastructure. This can be used
to fetch subsystem and driver-specific structs:
(gdb) p *$container_of($lx_device_find_by_class_name("net", "eth0"), "struct net_device", "dev")
(gdb) p *$container_of($lx_device_find_by_bus_name("i2c", "0-004b"), "struct i2c_client", "dev")
(gdb) p *(struct imx_port*)$lx_device_find_by_class_name("tty", "ttymxc1")->parent->driver_data
Several generic "lx-device-list" functions are included to enumerate
devices by bus and class:
(gdb) lx-device-list-bus usb
(gdb) lx-device-list-class
(gdb) lx-device-list-tree &platform_bus
Similar information is available in /sys but pointer values are
deliberately hidden.
Link: http://lkml.kernel.org/r/c948628041311cbf1b9b4cff3dda7d2073cb3eaa.1561492937.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Leonard Crestez [Tue, 16 Jul 2019 23:30:12 +0000 (16:30 -0700)]
scripts/gdb: add lx-genpd-summary command
This is like /sys/kernel/debug/pm/pm_genpd_summary except it's
accessible through a debugger.
This can be useful if the target crashes or hangs because power domains
were not properly enabled.
Link: http://lkml.kernel.org/r/f9ee627a0d4f94b894aa202fee8a98444049bed8.1561492937.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miroslav Lichvar [Tue, 16 Jul 2019 23:30:09 +0000 (16:30 -0700)]
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
The PPS assert/clear offset corrections are set by the PPS_SETPARAMS
ioctl in the pps_ktime structs, which also contain flags. The flags are
not initialized by applications (using the timepps.h header) and they
are not used by the kernel for anything except returning them back in
the PPS_GETPARAMS ioctl.
Set the flags to zero to make it clear they are unused and avoid leaking
uninitialized data of the PPS_SETPARAMS caller to other applications
that have a read access to the PPS device.
Link: http://lkml.kernel.org/r/20190702092251.24303-1-mlichvar@redhat.com
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joel Fernandes (Google) [Tue, 16 Jul 2019 23:30:06 +0000 (16:30 -0700)]
kernel/pid.c: convert struct pid count to refcount_t
struct pid's count is an atomic_t field used as a refcount. Use
refcount_t for it which is basically atomic_t but does additional
checking to prevent use-after-free bugs.
For memory ordering, the only change is with the following:
- if ((atomic_read(&pid->count) == 1) ||
- atomic_dec_and_test(&pid->count)) {
+ if (refcount_dec_and_test(&pid->count)) {
kmem_cache_free(ns->pid_cachep, pid);
Here the change is from: Fully ordered --> RELEASE + ACQUIRE (as per
refcount-vs-atomic.rst) This ACQUIRE should take care of making sure the
free happens after the refcount_dec_and_test().
The above hunk also removes atomic_read() since it is not needed for the
code to work and it is unclear how beneficial it is. The removal lets
refcount_dec_and_test() check for cases where get_pid() happened before
the object was freed.
Link: http://lkml.kernel.org/r/20190701183826.191936-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 16 Jul 2019 23:30:03 +0000 (16:30 -0700)]
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
The dev_info.name[] array has space for RIO_MAX_DEVNAME_SZ + 1
characters. But the problem here is that we don't ensure that the user
put a NUL terminator on the end of the string. It could lead to an out
of bounds read.
Link: http://lkml.kernel.org/r/20190529110601.GB19119@mwanda
Fixes:
e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 16 Jul 2019 23:29:59 +0000 (16:29 -0700)]
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
Now that restore_saved_sigmask_unless() is always called with the same
argument right before poll_select_copy_remaining() we can move it into
poll_select_copy_remaining() and make it the only caller of restore() in
fs/select.c.
The patch also renames poll_select_copy_remaining(),
poll_select_finish() looks better after this change.
kern_select() doesn't use set_user_sigmask(), so in this case
poll_select_finish() does restore_saved_sigmask_unless() "for no
reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still
valid.
Link: http://lkml.kernel.org/r/20190606140915.GC13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 16 Jul 2019 23:29:56 +0000 (16:29 -0700)]
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
do_poll() returns -EINTR if interrupted and after that all its callers
have to translate it into -ERESTARTNOHAND. Change do_poll() to return
-ERESTARTNOHAND and update (simplify) the callers.
Note that this also unifies all users of restore_saved_sigmask_unless(),
see the next patch.
Linus:
: The *right* return value will actually be then chosen by
: poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR
: when it can't update the timeout.
:
: Except for the cases that use restart_block and do that instead and
: don't have the whole timeout restart issue as a result.
Link: http://lkml.kernel.org/r/20190606140852.GB13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 16 Jul 2019 23:29:53 +0000 (16:29 -0700)]
signal: simplify set_user_sigmask/restore_user_sigmask
task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
syscall paths. This means that set_user_sigmask() can save ->blocked in
->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
was modified.
This way the callers do not need 2 sigset_t's passed to set/restore and
restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
into the trivial helper which just calls restore_saved_sigmask().
Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:29:50 +0000 (16:29 -0700)]
signal: reorder struct sighand_struct
struct sighand_struct::siglock field is the most used field by far, put
it first so that is can be accessed without IMM8 or IMM32 encoding on
x86_64.
Space savings (on trimmed down VM test config):
add/remove: 0/0 grow/shrink: 8/68 up/down: 49/-1147 (-1098)
Function old new delta
complete_signal 512 533 +21
do_signalfd4 335 346 +11
__cleanup_sighand 39 43 +4
unhandled_signal 49 52 +3
prepare_signal 692 695 +3
ignore_signals 37 40 +3
__tty_check_change.part 248 251 +3
ksys_unshare 780 781 +1
sighand_ctor 33 29 -4
ptrace_trap_notify 60 56 -4
sigqueue_free 98 91 -7
run_posix_cpu_timers 1389 1382 -7
proc_pid_status 2448 2441 -7
proc_pid_limits 344 337 -7
posix_cpu_timer_rearm 222 215 -7
posix_cpu_timer_get 249 242 -7
kill_pid_info_as_cred 243 236 -7
freeze_task 197 190 -7
flush_old_exec 1873 1866 -7
do_task_stat 3363 3356 -7
do_send_sig_info 98 91 -7
do_group_exit 147 140 -7
init_sighand 2088 2080 -8
do_notify_parent_cldstop 399 391 -8
signalfd_cleanup 50 41 -9
do_notify_parent 557 545 -12
__send_signal 1029 1017 -12
ptrace_stop 590 577 -13
get_signal 1576 1563 -13
__lock_task_sighand 112 99 -13
zap_pid_ns_processes 391 377 -14
update_rlimit_cpu 78 64 -14
tty_signal_session_leader 413 399 -14
tty_open_proc_set_tty 149 135 -14
tty_jobctrl_ioctl 936 922 -14
set_cpu_itimer 339 325 -14
ptrace_resume 226 212 -14
ptrace_notify 110 96 -14
proc_clear_tty 81 67 -14
posix_cpu_timer_del 229 215 -14
kernel_sigaction 156 142 -14
getrusage 977 963 -14
get_current_tty 98 84 -14
force_sigsegv 89 75 -14
force_sig_info 205 191 -14
flush_signals 83 69 -14
flush_itimer_signals 85 71 -14
do_timer_create 1120 1106 -14
do_sigpending 88 74 -14
do_signal_stop 537 523 -14
cgroup_init_fs_context 644 630 -14
call_usermodehelper_exec_async 402 388 -14
calculate_sigpending 58 44 -14
__x64_sys_timer_delete 248 234 -14
__set_current_blocked 80 66 -14
__ptrace_unlink 310 296 -14
__ptrace_detach.part 187 173 -14
send_sigqueue 362 347 -15
get_cpu_itimer 214 199 -15
signalfd_poll 175 159 -16
dequeue_signal 340 323 -17
do_getitimer 192 174 -18
release_task.part 1060 1040 -20
ptrace_peek_siginfo 408 387 -21
posix_cpu_timer_set 827 806 -21
exit_signals 437 416 -21
do_sigaction 541 520 -21
do_setitimer 485 464 -21
disassociate_ctty.part 545 517 -28
__x64_sys_rt_sigtimedwait 721 679 -42
__x64_sys_ptrace 1319 1277 -42
ptrace_request 1828 1782 -46
signalfd_read 507 459 -48
wait_consider_task 2027 1971 -56
do_coredump 3672 3616 -56
copy_process.part 6936 6871 -65
Link: http://lkml.kernel.org/r/20190503192800.GA18004@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:46 +0000 (16:29 -0700)]
selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFO
Check whether PTRACE_GET_SYSCALL_INFO semantics implemented in the
kernel matches userspace expectations.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190510152852.GG28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Elvira Khabirova [Tue, 16 Jul 2019 23:29:42 +0000 (16:29 -0700)]
ptrace: add PTRACE_GET_SYSCALL_INFO request
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit
84d77d3f06e7 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed by "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of data to be written by the kernel exceeds the size
specified by "addr" argument, the output is truncated.
[ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:39 +0000 (16:29 -0700)]
powerpc: define syscall_get_error()
syscall_get_error() is required to be implemented on this architecture in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152824.GE28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:35 +0000 (16:29 -0700)]
parisc: define syscall_get_error()
syscall_get_error() is required to be implemented on all architectures in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152812.GD28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:32 +0000 (16:29 -0700)]
mips: define syscall_get_error()
syscall_get_error() is required to be implemented on all architectures
in addition to already implemented syscall_get_nr(),
syscall_get_arguments(), syscall_get_return_value(), and
syscall_get_arch() functions in order to extend the generic ptrace API
with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152803.GC28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:28 +0000 (16:29 -0700)]
hexagon: define syscall_get_error() and syscall_get_return_value()
syscall_get_* functions are required to be implemented on all
architectures in order to extend the generic ptrace API with
PTRACE_GET_SYSCALL_INFO request.
This adds remaining 2 syscall_get_* functions as documented in
asm-generic/syscall.h: syscall_get_error and syscall_get_return_value.
Link: http://lkml.kernel.org/r/20190510152756.GB28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Tue, 16 Jul 2019 23:29:24 +0000 (16:29 -0700)]
nds32: fix asm/syscall.h
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit
84d77d3f06e7 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
PTRACE_GET_SYSCALL_INFO returns the following structure:
struct ptrace_syscall_info {
__u8 op; /* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls
* arch is always initialized to aid with tracing system calls such as
execve()
* instruction_pointer and stack_pointer are always initialized so they
could be easily obtained for non-syscall stops
* a boolean is_error field was added along with rval field, this way
the tracer can more reliably distinguish a return value from an error
value
strace has been ported to PTRACE_GET_SYSCALL_INFO. Starting with
release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API as the preferred
mechanism of obtaining syscall information.
[1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF_eVyg@mail.gmail.com/
[2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3AwEHA@mail.gmail.com/
This patch (of 7):
All syscall_get_*() and syscall_set_*() functions must be defined as
static inline as on all other architectures, otherwise asm/syscall.h
cannot be included in more than one compilation unit.
This bug has to be fixed in order to extend the generic
ptrace API with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152749.GA28558@altlinux.org
Fixes:
1932fbe36e02 ("nds32: System calls handling")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hariprasad Kelam [Tue, 16 Jul 2019 23:29:21 +0000 (16:29 -0700)]
fs/reiserfs/journal.c: change return type of dirty_one_transaction
Change return type of dirty_one_transaction from int to void. As this
function always return success.
Fixes below issue reported by coccicheck:
fs/reiserfs/journal.c:1690:5-8: Unneeded variable: "ret". Return "0" on line 1719
Link: http://lkml.kernel.org/r/20190702175430.GA5882@hari-Inspiron-1545
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bharath Vedartham <linux.bhar@gmail.com>
Cc: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Tue, 16 Jul 2019 23:29:18 +0000 (16:29 -0700)]
fs/ufs/super.c: remove set but not used variable 'usb3'
Fixes gcc '-Wunused-but-set-variable' warning:
fs/ufs/super.c: In function ufs_statfs:
fs/ufs/super.c:1409:32: warning: variable usb3 set but not used [-Wunused-but-set-variable]
It is not used since commmit
c596961d1b4c ("ufs: fix s_size/s_dsize
users")
Link: http://lkml.kernel.org/r/20190525140654.15924-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Malaterre [Tue, 16 Jul 2019 23:29:15 +0000 (16:29 -0700)]
fs/hfsplus/xattr.c: replace strncpy with memcpy
strncpy() was used to copy a fixed size buffer. Since NUL-terminating
string is not required here, prefer a memcpy function. The generated
code (ppc32) remains the same.
Silence the following warning triggered using W=1:
fs/hfsplus/xattr.c:410:3: warning: 'strncpy' output truncated before terminating nul copying 4 bytes from a string of the same length [-Wstringop-truncation]
Link: http://lkml.kernel.org/r/20190529113341.11972-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pedro Cuadra [Tue, 16 Jul 2019 23:29:13 +0000 (16:29 -0700)]
coda: add hinting support for partial file caching
This adds support for partial file caching in Coda. Every read, write
and mmap informs the userspace cache manager about what part of a file
is about to be accessed so that the cache manager can ensure the
relevant parts are available before the operation is allowed to proceed.
When a read or write operation completes, this is also reported to allow
the cache manager to track when partially cached content can be
released.
If the cache manager does not support partial file caching, or when the
entire file has been fetched into the local cache, the cache manager may
return an EOPNOTSUPP error to indicate that intent upcalls are no longer
necessary until the file is closed.
[akpm@linux-foundation.org: little whitespace fixup]
Link: http://lkml.kernel.org/r/20190618181301.6960-1-jaharkes@cs.cmu.edu
Signed-off-by: Pedro Cuadra <pjcuadra@gmail.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:29:09 +0000 (16:29 -0700)]
coda: ftoc validity check integration
This patch moves cfi check in coda_ftoc() instead of repeating it in the
wild.
Module size
text data bss dec hex filename
28297 1040 700 30037 7555 fs/coda/coda.ko.before
28263 980 700 29943 74f7 fs/coda/coda.ko.after
Link: http://lkml.kernel.org/r/a2c27663ec4547018c92d71c63b1dff4650b6546.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:29:06 +0000 (16:29 -0700)]
coda: remove sb test in coda_fid_to_inode()
coda_fid_to_inode() is only called by coda_downcall() where sb is already
being tested.
Link: http://lkml.kernel.org/r/d2163b3136348faf83ba47dc2d65a5d0a9a135dd.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:29:03 +0000 (16:29 -0700)]
coda: remove sysctl object from module when unused
Inspired by NFS sysctl process
Link: http://lkml.kernel.org/r/9afcc2cd09490849b309786bbf47fef75de7f91c.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:29:00 +0000 (16:29 -0700)]
coda: add __init to init_coda_psdev()
init_coda_psdev() was only called by __init function.
Link: http://lkml.kernel.org/r/a12a5a135fa6b0ea997e1a0af4be0a235c463a24.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:28:57 +0000 (16:28 -0700)]
coda: use SIZE() for stat
max_t expression was already defined in coda sources
Link: http://lkml.kernel.org/r/e6cda497ce8691db155cb35f8d13ea44ca6cedeb.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 16 Jul 2019 23:28:54 +0000 (16:28 -0700)]
coda: destroy mutex in put_super()
We can safely destroy vc_mutex at the end of umount process.
Link: http://lkml.kernel.org/r/f436f68908c467c5663bc6a9251b52cd7b95d2a5.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:51 +0000 (16:28 -0700)]
coda: remove uapi/linux/coda_psdev.h
Nothing is left in this header that is used by userspace.
Link: http://lkml.kernel.org/r/bb11378cef94739f2cf89425dd6d302a52c64480.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 16 Jul 2019 23:28:47 +0000 (16:28 -0700)]
coda: move internal defs out of include/linux/ [ver #2]
Move include/linux/coda_psdev.h to fs/coda/ as there's nothing else that
uses it.
Link: http://lkml.kernel.org/r/3ceeee0415a929b89fb02700b6b4b3a07938acb8.1558117389.git.jaharkes@cs.cmu.edu
Link: https://patchwork.kernel.org/patch/10590257/
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:44 +0000 (16:28 -0700)]
coda: bump module version
The out of tree module version had been bumped several times already,
but we haven't kept this in-tree one in sync, partly because most
changes go from here to the out-of-tree copy.
Link: http://lkml.kernel.org/r/8b0ab50a2da2f0180ac32c79d91811b4d1d0bd8b.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 16 Jul 2019 23:28:41 +0000 (16:28 -0700)]
coda: get rid of CODA_FREE()
The CODA_FREE() macro just calls kvfree(). We can call that directly
instead.
Link: http://lkml.kernel.org/r/4950a94fd30ec5f84835dd4ca0bb67c0448672f5.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 16 Jul 2019 23:28:38 +0000 (16:28 -0700)]
coda: get rid of CODA_ALLOC()
These days we have kvzalloc() so we can delete CODA_ALLOC().
I made a couple related changes in coda_psdev_write(). First, I added
some error handling to avoid a NULL dereference if the allocation
failed. Second, I used kvmalloc() instead of kvzalloc() because we copy
over the memory on the next line so there is no need to zero it first.
Link: http://lkml.kernel.org/r/e56010c822e7a7cbaa8a238cf82ad31c67eaa800.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:35 +0000 (16:28 -0700)]
coda: change Coda's user api to use 64-bit time_t in timespec
Move the 32-bit time_t problems to userspace.
Link: http://lkml.kernel.org/r/8d089068823bfb292a4020f773922fbd82ffad39.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 16 Jul 2019 23:28:32 +0000 (16:28 -0700)]
coda: stop using 'struct timespec' in user API
We exchange file timestamps with user space using psdev device
read/write operations with a fixed but architecture specific binary
layout.
On 32-bit systems, this uses a 'timespec' structure that is defined by
the C library to contain two 32-bit values for seconds and nanoseconds.
As we get ready for the year 2038 overflow of the 32-bit signed seconds,
the kernel now uses 64-bit timestamps internally, and user space will do
the same change by changing the 'timespec' definition in the future.
Unfortunately, this breaks the layout of the coda_vattr structure, so we
need to redefine that in terms of something that does not change. I'm
introducing a new 'struct vtimespec' structure here that keeps the
existing layout, and the same change has to be done in the coda user
space copy of linux/coda.h before anyone can use that on a 32-bit
architecture with 64-bit time_t.
An open question is what should happen to actual times past y2038, as
they are now truncated to the last valid date when sent to user space,
and interpreted as pre-1970 times when a timestamp with the MSB set is
read back into the kernel. Alternatively, we could change the new
timespec64_to_coda()/coda_to_timespec64() functions to use a different
interpretation and extend the available range further to the future by
disallowing past timestamps. This would require more changes in the
user space side though.
Link: http://lkml.kernel.org/r/562b7324149461743e4fbe2fedbf7c242f7e274a.1558117389.git.jaharkes@cs.cmu.edu
Link: https://patchwork.kernel.org/patch/10474735/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Tue, 16 Jul 2019 23:28:29 +0000 (16:28 -0700)]
coda: clean up indentation, replace spaces with tab
Trivial fix to clean up indentation, replace spaces with tab
Link: http://lkml.kernel.org/r/ffc2bfa5a37ffcdf891c51b2e2ed618103965b24.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:26 +0000 (16:28 -0700)]
uapi linux/coda_psdev.h: move CODA_REQ_ from uapi to kernel side headers
These constants only used internally and not exposed to userspace.
Link: http://lkml.kernel.org/r/baeafc30dad70d8b422ee679420099c2d8aa7da0.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:23 +0000 (16:28 -0700)]
coda: don't try to print names that were considered too long
Probably safer to just show the unexpected length and debug it from the
userspace side.
Link: http://lkml.kernel.org/r/582ae759a4fdfa31a64c35de489fa4efabac09d6.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sam Protsenko [Tue, 16 Jul 2019 23:28:20 +0000 (16:28 -0700)]
coda: fix build using bare-metal toolchain
The kernel is self-contained project and can be built with bare-metal
toolchain. But bare-metal toolchain doesn't define __linux__. Because
of this u_quad_t type is not defined when using bare-metal toolchain and
codafs build fails. This patch fixes it by defining u_quad_t type
unconditionally.
Link: http://lkml.kernel.org/r/3cbb40b0a57b6f9923a9d67b53473c0b691a3eaa.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:16 +0000 (16:28 -0700)]
coda: potential buffer overflow in coda_psdev_write()
Add checks to make sure the downcall message we got from the Coda cache
manager is large enough to contain the data it is supposed to have.
i.e. when we get a CODA_ZAPDIR we can access &out->coda_zapdir.CodaFid.
Link: http://lkml.kernel.org/r/894fb6b250add09e4e3935f14649f21284a5cb18.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhouyang Jia [Tue, 16 Jul 2019 23:28:13 +0000 (16:28 -0700)]
coda: add error handling for fget
When fget fails, the lack of error-handling code may cause unexpected
results.
This patch adds error-handling code after calling fget.
Link: http://lkml.kernel.org/r/2514ec03df9c33b86e56748513267a80dd8004d9.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikko Rapeli [Tue, 16 Jul 2019 23:28:10 +0000 (16:28 -0700)]
uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers
Only users of upc_req in kernel side fs/coda/psdev.c and
fs/coda/upcall.c already include linux/coda_psdev.h.
Suggested by Jan Harkes <jaharkes@cs.cmu.edu> in
https://lore.kernel.org/lkml/
20150531111913.GA23377@cs.cmu.edu/
Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace:
linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type
struct list_head uc_chain;
^
linux/coda_psdev.h:13:2: error: unknown type name `caddr_t'
caddr_t uc_data;
^
linux/coda_psdev.h:14:2: error: unknown type name `u_short'
u_short uc_flags;
^
linux/coda_psdev.h:15:2: error: unknown type name `u_short'
u_short uc_inSize; /* Size is at most 5000 bytes */
^
linux/coda_psdev.h:16:2: error: unknown type name `u_short'
u_short uc_outSize;
^
linux/coda_psdev.h:17:2: error: unknown type name `u_short'
u_short uc_opcode; /* copied from data to save lookup */
^
linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t'
wait_queue_head_t uc_sleep; /* process' wait queue */
^
Link: http://lkml.kernel.org/r/9f99f5ce6a0563d5266e6cf7aa9585aac2cae971.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikko Rapeli [Tue, 16 Jul 2019 23:28:07 +0000 (16:28 -0700)]
uapi linux/coda.h: use __kernel_pid_t for userspace
Part of a patch by Mikko Rapeli, as Arnd Bergman commented on the
original patch.
pid_t might differ between libc and the kernel, so the kernel
interface has to use types that the kernel defines.
Link: http://lkml.kernel.org/r/f374a71f4d351bc8c8b3ac18ad7765c88d806d10.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Harkes [Tue, 16 Jul 2019 23:28:04 +0000 (16:28 -0700)]
coda: pass the host file in vma->vm_file on mmap
Patch series "Coda updates".
The following patch series is a collection of various fixes for Coda,
most of which were collected from linux-fsdevel or linux-kernel but
which have as yet not found their way upstream.
This patch (of 22):
Various file systems expect that vma->vm_file points at their own file
handle, several use file_inode(vma->vm_file) to get at their inode or
use vma->vm_file->private_data. However the way Coda wrapped mmap on a
host file broke this assumption, vm_file was still pointing at the Coda
file and the host file systems would scribble over Coda's inode and
private file data.
This patch fixes the incorrect expectation and wraps vm_ops->open and
vm_ops->close to allow Coda to track when the vm_area_struct is
destroyed so we still release the reference on the Coda file handle at
the right time.
Link: http://lkml.kernel.org/r/0e850c6e59c0b147dc2dcd51a3af004c948c3697.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anshuman Khandual [Tue, 16 Jul 2019 23:28:00 +0000 (16:28 -0700)]
mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()
Architectures which support kprobes have very similar boilerplate around
calling kprobe_fault_handler(). Use a helper function in kprobes.h to
unify them, based on the x86 code.
This changes the behaviour for other architectures when preemption is
enabled. Previously, they would have disabled preemption while calling
the kprobe handler. However, preemption would be disabled if this fault
was due to a kprobe, so we know the fault was not due to a kprobe
handler and can simply return failure.
This behaviour was introduced in commit
a980c0ef9f6d ("x86/kprobes:
Refactor kprobes_fault() like kprobe_exceptions_notify()")
[anshuman.khandual@arm.com: export kprobe_fault_handler()]
Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 16 Jul 2019 23:27:57 +0000 (16:27 -0700)]
init/Kconfig: fix neighboring typos
This fixes a couple typos I noticed in the slab Kconfig:
sacrifies -> sacrifices
accellerate -> accelerate
Seeing as no other instances of these typos are found elsewhere in the
kernel and that I originally added one of the two, I can only assume
working on slab must have caused damage to the spelling centers of my
brain.
Link: http://lkml.kernel.org/r/201905292203.CD000546EB@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:27:54 +0000 (16:27 -0700)]
fs/binfmt_elf.c: delete stale comment
"passed_fileno" variable was deleted 11 years ago in 2.6.25.
Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2
Fixes:
d20894a23708 ("Remove a.out interpreter support in ELF loader")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Tue, 16 Jul 2019 23:27:51 +0000 (16:27 -0700)]
fs/binfmt_flat.c: remove set but not used variable 'inode'
Fixes gcc '-Wunused-but-set-variable' warning:
fs/binfmt_flat.c: In function load_flat_file:
fs/binfmt_flat.c:419:16: warning: variable inode set but not used [-Wunused-but-set-variable]
It's never used and can be removed.
Link: http://lkml.kernel.org/r/20190525125341.9844-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matteo Croce [Tue, 16 Jul 2019 23:27:48 +0000 (16:27 -0700)]
checkpatch.pl: warn on duplicate sysctl local variable
Commit
d91bff3011cf ("proc/sysctl: add shared variables for range
check") adds some shared const variables to be used instead of a local
copy in each source file. Warn when a chunk duplicates one of these
values in a ctl_table struct:
$ scripts/checkpatch.pl 0001-test-commit.patch
WARNING: duplicated sysctl range checking value 'zero', consider using the shared one in include/linux/sysctl.h
#27: FILE: arch/arm/kernel/isa.c:48:
+ .extra1 = &zero,
WARNING: duplicated sysctl range checking value 'int_max', consider using the shared one in include/linux/sysctl.h
#28: FILE: arch/arm/kernel/isa.c:49:
+ .extra2 = &int_max,
total: 0 errors, 2 warnings, 14 lines checked
Link: http://lkml.kernel.org/r/20190531131422.14970-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 16 Jul 2019 23:27:45 +0000 (16:27 -0700)]
lib/rbtree: avoid generating code twice for the cached versions
As was already noted in rbtree.h, the logic to cache rb_first (or
rb_last) can easily be implemented externally to the core rbtree api.
Change the implementation to do just that. Previously the update of
rb_leftmost was wired deeper into the implmentation, but there were some
disadvantages to that - mostly, lib/rbtree.c had separate instantiations
for rb_insert_color() vs rb_insert_color_cached(), as well as rb_erase()
vs rb_erase_cached(), which were doing exactly the same thing save for
the rb_leftmost update at the start of either function.
text data bss dec hex filename
5405 120 0 5525 1595 lib/rbtree.o-vanilla
3827 96 0 3923 f53 lib/rbtree.o-patch
[dave@stgolabs.net: changelog addition]
Link: http://lkml.kernel.org/r/20190628171416.by5gdizl3rcxk5h5@linux-r8p5
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190628045008.39926-1-walken@google.com
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Tue, 16 Jul 2019 23:27:42 +0000 (16:27 -0700)]
lib/test_meminit.c: minor test fixes
Fix the following issues in test_meminit.c:
- |size| in fill_with_garbage_skip() should be signed so that it
doesn't overflow if it's not aligned on sizeof(*p);
- fill_with_garbage_skip() should actually skip |skip| bytes;
- do_kmem_cache_size() should deallocate memory in the RCU case.
Link: http://lkml.kernel.org/r/20190626133135.217355-1-glider@google.com
Fixes:
7e659650cbda ("lib: introduce test_meminit module")
Fixes:
94e8988d91c7 ("lib/test_meminit.c: fix -Wmaybe-uninitialized false positive")
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 16 Jul 2019 23:27:39 +0000 (16:27 -0700)]
lib/test_meminit.c: fix -Wmaybe-uninitialized false positive
The conditional logic is too complicated for the compiler to fully
comprehend:
lib/test_meminit.c: In function 'test_meminit_init':
lib/test_meminit.c:236:5: error: 'buf_copy' may be used uninitialized in this function [-Werror=maybe-uninitialized]
kfree(buf_copy);
^~~~~~~~~~~~~~~
lib/test_meminit.c:201:14: note: 'buf_copy' was declared here
Simplify it by splitting out the non-rcu section.
Link: http://lkml.kernel.org/r/20190617131210.2190280-1-arnd@arndb.de
Fixes:
af734ee6ec85 ("lib: introduce test_meminit module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Corbet [Tue, 16 Jul 2019 23:27:36 +0000 (16:27 -0700)]
lib/string_helpers: fix some kerneldoc warnings
Due to some sad limitations in how kerneldoc comments are parsed, the
documentation in lib/string_helpers.c generates these warnings:
lib/string_helpers.c:236: WARNING: Unexpected indentation.
lib/string_helpers.c:241: WARNING: Block quote ends without a blank line; unexpected unindent.
lib/string_helpers.c:446: WARNING: Unexpected indentation.
lib/string_helpers.c:451: WARNING: Block quote ends without a blank line; unexpected unindent.
lib/string_helpers.c:474: WARNING: Unexpected indentation.
Rework the comments to obtain something like the desired result.
Link: http://lkml.kernel.org/r/20190607110952.409011ba@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anshuman Khandual [Tue, 16 Jul 2019 23:27:33 +0000 (16:27 -0700)]
mm/ioremap: probe platform for p4d huge map support
Finish up what commit
c2febafc6773 ("mm: convert generic code to 5-level
paging") started while levelling up P4D huge mapping support at par with
PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added
which just maintains status quo (P4D huge map not supported) on x86,
arm64 and powerpc.
When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the
arch about the support, hence runtime effects are minimal.
Link: http://lkml.kernel.org/r/1561699231-20991-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anshuman Khandual [Tue, 16 Jul 2019 23:27:30 +0000 (16:27 -0700)]
mm/ioremap: check virtual address alignment while creating huge mappings
Virtual address alignment is essential in ensuring correct clearing for
all intermediate level pgtable entries and freeing associated pgtable
pages. An unaligned address can end up randomly freeing pgtable page
that potentially still contains valid mappings. Hence also check it's
alignment along with existing phys_addr check.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Tue, 16 Jul 2019 23:27:27 +0000 (16:27 -0700)]
lib: introduce test_meminit module
Add tests for heap and pagealloc initialization. These can be used to
check init_on_alloc and init_on_free implementations as well as other
approaches to initialization.
Expected test output in the case the kernel provides heap initialization
(e.g. when running with either init_on_alloc=1 or init_on_free=1):
test_meminit: all 10 tests in test_pages passed
test_meminit: all 40 tests in test_kvmalloc passed
test_meminit: all 60 tests in test_kmemcache passed
test_meminit: all 10 tests in test_rcu_persistent passed
test_meminit: all 120 tests passed!
Link: http://lkml.kernel.org/r/20190529123812.43089-4-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 16 Jul 2019 23:27:24 +0000 (16:27 -0700)]
lib/test_overflow.c: avoid tainting the kernel and fix wrap size
This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
avoid tainting the kernel. Additionally fixes up the math on wrap size
to be architecture and page size agnostic.
Link: http://lkml.kernel.org/r/201905282012.0A8767E24@keescook
Fixes:
ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Rosin [Tue, 16 Jul 2019 23:27:21 +0000 (16:27 -0700)]
lib/test_string.c: add some testcases for strchr and strnchr
Make sure that the trailing NUL is considered part of the string and can
be found.
Link: http://lkml.kernel.org/r/20190506124634.6807-4-peda@axentia.se
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Rosin [Tue, 16 Jul 2019 23:27:18 +0000 (16:27 -0700)]
lib/test_string.c: avoid masking memset16/32/64 failures
If a memsetXX implementation is completely broken and fails in the first
iteration, when i, j, and k are all zero, the failure is masked as zero
is returned. Failing in the first iteration is perhaps the most likely
failure, so this makes the tests pretty much useless. Avoid the
situation by always setting a random unused bit in the result on
failure.
Link: http://lkml.kernel.org/r/20190506124634.6807-3-peda@axentia.se
Fixes:
03270c13c5ff ("lib/string.c: add testcases for memset16/32/64")
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Rosin [Tue, 16 Jul 2019 23:27:15 +0000 (16:27 -0700)]
lib/string.c: allow searching for NUL with strnchr
Patch series "lib/string: search for NUL with strchr/strnchr".
I noticed an inconsistency where strchr and strnchr do not behave the
same with respect to the trailing NUL. strchr is standardised and the
kernel function conforms, and the kernel relies on the behavior. So,
naturally strchr stays as-is and strnchr is what I change.
While writing a few tests to verify that my new strnchr loop was sane, I
noticed that the tests for memset16/32/64 had a problem. Since it's all
about the lib/string.c file I made a short series of it all...
This patch (of 3):
strchr considers the terminating NUL to be part of the string, and NUL
can thus be searched for with that function. For consistency, do the
same with strnchr.
Link: http://lkml.kernel.org/r/20190506124634.6807-2-peda@axentia.se
Signed-off-by: Peter Rosin <peda@axentia.se>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:27:12 +0000 (16:27 -0700)]
lib/list: tweak LIST_POISON2 for better code generation on x86_64
list_del() poisoning can generate 2 64-bit immediate loads but it also can
generate one 64-bit immediate load and an addition:
48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100
48 89 47 58 mov QWORD PTR [rdi+0x58],rax
48 05 00 01 00 00 <=====> add rax,0x100
48 89 47 60 mov QWORD PTR [rdi+0x60],rax
However on x86_64 not all constants are equal: those within [-128, 127]
range can be added with shorter "add r64, imm32" instruction:
48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100
48 89 47 58 mov QWORD PTR [rdi+0x58],rax
48 83 c0 22 <======> add rax,0x22
48 89 47 60 mov QWORD PTR [rdi+0x60],rax
Patch saves 2 bytes per some LIST_POISON2 usage.
(Slightly disappointing) space savings on F29 x86_64 config:
add/remove: 0/0 grow/shrink: 0/2164 up/down: 0/-5184 (-5184)
Function old new delta
zstd_get_workspace 548 546 -2
...
mlx4_delete_all_resources_for_slave 4826 4804 -22
Total: Before=
83304131, After=
83298947, chg -0.01%
New constants are:
0xdead000000000100
0xdead000000000122
Note: LIST_POISON1 can't be changed to ...11 because something in page
allocator requires low bit unset.
Link: http://lkml.kernel.org/r/20190513191502.GA8492@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 16 Jul 2019 23:27:09 +0000 (16:27 -0700)]
get_maintainer: add ability to skip moderated mailing lists
Add a command line switch --no-moderated to skip L: mailing lists marked
with 'moderated'.
Some people prefer not emailing moderated mailing lists as the
moderation time can be indeterminate and some emails can be
intentionally dropped by a moderator.
This can cause fragmentation of email threads when some are subscribed
to a moderated list but others are not and emails are dropped.
Link: http://lkml.kernel.org/r/6f23c2918ad9fc744269feb8f909bdfb105c5afc.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 16 Jul 2019 23:27:06 +0000 (16:27 -0700)]
asm-generic: fix a compilation warning
Fix this compilation warning on x86 by making flush_cache_vmap() inline.
lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:214:16: warning: variable 'start' set but not used [-Wunused-but-set-variable]
unsigned long start;
^~~~~
While at it, convert all other similar functions to inline for
consistency.
Link: http://lkml.kernel.org/r/1562594592-15228-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Kitt [Tue, 16 Jul 2019 23:27:04 +0000 (16:27 -0700)]
arch/*: remove unused isa_page_to_bus()
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove
it entirely.
Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Tue, 16 Jul 2019 23:27:01 +0000 (16:27 -0700)]
arch: replace _BITUL() in kernel-space headers with BIT()
Now that BIT() can be used from assembly code, we can safely replace
_BITUL() with equivalent BIT().
UAPI headers are still required to use _BITUL(), but there is no more
reason to use it in kernel headers. BIT() is shorter.
Link: http://lkml.kernel.org/r/20190609153941.17249-2-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Tue, 16 Jul 2019 23:26:57 +0000 (16:26 -0700)]
linux/bits.h: make BIT(), GENMASK(), and friends available in assembly
BIT(), GENMASK(), etc. are useful to define register bits of hardware.
However, low-level code is often written in assembly, where they are
not available due to the hard-coded 1UL, 0UL.
In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h
use _BITUL() instead of BIT() so that the register bit macros are
available in assembly.
Using macros in include/uapi/linux/const.h have two reasons:
[1] For use in uapi headers
We should use underscore-prefixed variants for user-space.
[2] For use in assembly code
Since _BITUL() uses UL(1) instead of 1UL, it can be used as an
alternative of BIT().
For [2], it is pretty easy to change BIT() etc. for use in assembly.
This allows to replace _BUTUL() in kernel-space headers with BIT().
Link: http://lkml.kernel.org/r/20190609153941.17249-1-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weitao Hou [Tue, 16 Jul 2019 23:26:54 +0000 (16:26 -0700)]
kernel: fix typos and some coding style in comments
fix lenght to length
Link: http://lkml.kernel.org/r/20190521050937.4370-1-houweitaoo@gmail.com
Signed-off-by: Weitao Hou <houweitaoo@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Radoslaw Burny [Tue, 16 Jul 2019 23:26:51 +0000 (16:26 -0700)]
fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc. Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes. In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace. I have confirmed this
with Eric Biederman at LPC and in this thread:
https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com
Consequences
============
Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference. However, it
causes an issue in a setup where:
* a namespace container is created without root user in container -
hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
* container creator tries to configure it by writing /proc/sys files,
e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
Kernel does not allow to open an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus - configure the
container.
Using a container with no root mapping is apparently rare, but we do use
this configuration at Google. Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.
History
=======
The invalid uids/gids in inodes first appeared due to
81754357770e (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues. The
inability to write to these "invalid" inodes was only caused by a later
commit
0bd23d09b874 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).
Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.
Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
Fixes:
0bd23d09b874 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:26:48 +0000 (16:26 -0700)]
proc: test /proc/sysvipc vs setns(CLONE_NEWIPC)
I thought that /proc/sysvipc has the same bug as /proc/net
commit
1fde6f21d90f8ba5da3cb9c54ca991ed72696c43
proc: fix /proc/net/* after setns(2)
However, it doesn't! /proc/sysvipc files do
get_ipc_ns(current->nsproxy->ipc_ns);
in their open() hook and avoid the problem.
Keep the test, maybe /proc/sysvipc will become broken someday :-\
Link: http://lkml.kernel.org/r/20190706180146.GA21015@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:26:45 +0000 (16:26 -0700)]
fs/proc/inode.c: use typeof_member() macro
Don't repeat function signatures twice.
This is a kind-of-precursor for "struct proc_ops".
Note:
typeof(pde->proc_fops->...) ...;
can't be used because ->proc_fops is "const struct file_operations *".
"const" prevents assignment down the code and it can't be deleted in the
type system.
Link: http://lkml.kernel.org/r/20190529191110.GB5703@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:26:42 +0000 (16:26 -0700)]
include/linux/kernel.h: add typeof_member() macro
Add typeof_member() macro so that types can be extracted without
introducing dummy variables.
Link: http://lkml.kernel.org/r/20190529190720.GA5703@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kairui Song [Tue, 16 Jul 2019 23:26:39 +0000 (16:26 -0700)]
vmcore: add a kernel parameter novmcoredd
Since commit
2724273e8fd0 ("vmcore: add API to collect hardware dump in
second kernel"), drivers are allowed to add device related dump data to
vmcore as they want by using the device dump API. This has a potential
issue, the data is stored in memory, drivers may append too much data
and use too much memory. The vmcore is typically used in a kdump kernel
which runs in a pre-reserved small chunk of memory. So as a result it
will make kdump unusable at all due to OOM issues.
So introduce new 'novmcoredd' command line option. User can disable
device dump to reduce memory usage. This is helpful if device dump is
using too much memory, disabling device dump could make sure a regular
vmcore without device dump data is still available.
[akpm@linux-foundation.org: tweak documentation]
[akpm@linux-foundation.org: vmcore.c needs moduleparam.h]
Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.com
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 16 Jul 2019 23:26:36 +0000 (16:26 -0700)]
tools/testing/selftests/proc/proc-pid-vm.c: hide "segfault at
ffffffffff600000" dmesg spam
Test tries to access vsyscall page and if it doesn't exist gets SIGSEGV
which can spam into dmesg. However the segfault happens by design.
Handle it and carry information via exit code to parent.
Link: http://lkml.kernel.org/r/20190524181256.GA2260@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 16 Jul 2019 23:26:33 +0000 (16:26 -0700)]
mm: stub out all of swapops.h for !CONFIG_MMU
The whole header file deals with swap entries and PTEs, none of which
can exist for nommu builds. The current nommu ports have lots of stubs
to allow the inline functions in swapops.h to compile, but as none of
this functionality is actually used there is no point in even providing
it. This way we don't have to provide the stubs for the upcoming RISC-V
nommu port, and can eventually remove it from the existing ports.
Link: http://lkml.kernel.org/r/20190703122359.18200-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 16 Jul 2019 23:26:30 +0000 (16:26 -0700)]
mm: provide a print_vma_addr stub for !CONFIG_MMU
Link: http://lkml.kernel.org/r/20190703122359.18200-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 16 Jul 2019 23:26:27 +0000 (16:26 -0700)]
mm: fix the MAP_UNINITIALIZED flag
We can't expose UAPI symbols differently based on CONFIG_ symbols, as
userspace won't have them available. Instead always define the flag,
but only respect it based on the config option.
Link: http://lkml.kernel.org/r/20190703122359.18200-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doug Berger [Tue, 16 Jul 2019 23:26:24 +0000 (16:26 -0700)]
mm/cma.c: fail if fixed declaration can't be honored
The description of cma_declare_contiguous() indicates that if the
'fixed' argument is true the reserved contiguous area must be exactly at
the address of the 'base' argument.
However, the function currently allows the 'base', 'size', and 'limit'
arguments to be silently adjusted to meet alignment constraints. This
commit enforces the documented behavior through explicit checks that
return an error if the region does not fit within a specified region.
Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com
Fixes:
5ea3b1b2f8ad ("cma: add placement specifier for "cma=" kernel parameter")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Yue Hu <huyue2@yulong.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henry Burns [Tue, 16 Jul 2019 23:26:21 +0000 (16:26 -0700)]
mm/z3fold.c: reinitialize zhdr structs after migration
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly coppied over (ex:
list_head, a circular linked list). We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are empty
Additionally it is possible that zhdr->work has been placed in a
workqueue. In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.
Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
Fixes:
1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henry Burns [Tue, 16 Jul 2019 23:26:18 +0000 (16:26 -0700)]
mm/z3fold.c: remove z3fold_migration trylock
z3fold_page_migrate() will never succeed because it attempts to acquire
a lock that has already been taken by migrate.c in __unmap_and_move().
__unmap_and_move() migrate.c
trylock_page(oldpage)
move_to_new_page(oldpage_newpage)
a_ops->migrate_page(oldpage, newpage)
z3fold_page_migrate(oldpage, newpage)
trylock_page(oldpage)
Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com
Fixes:
1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Snild Dolkow <snild@sony.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 16 Jul 2019 23:26:15 +0000 (16:26 -0700)]
mm/vmscan.c: add checks for incorrect handling of current->reclaim_state
Six sites are presently altering current->reclaim_state. There is a
risk that one function stomps on a caller's value. Use a helper
function to catch such errors.
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 16 Jul 2019 23:26:12 +0000 (16:26 -0700)]
mm/vmscan.c: calculate reclaimed slab caches in all reclaim paths
There are six different reclaim paths by now:
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path
The slab caches reclaimed in these paths are only calculated in the
above three paths.
There're some drawbacks if we don't calculate the reclaimed slab caches.
- The sc->nr_reclaimed isn't correct if there're some slab caches
relcaimed in this path.
- The slab caches may be reclaimed thoroughly if there're lots of
reclaimable slab caches and few page caches.
Let's take an easy example for this case. If one memcg is full of
slab caches and the limit of it is 512M, in other words there're
approximately 512M slab caches in this memcg. Then the limit of the
memcg is reached and the memcg reclaim begins, and then in this memcg
reclaim path it will continuesly reclaim the slab caches until the
sc->priority drops to 0. After this reclaim stops, you will find
there're few slab caches left, which is less than 20M in my test
case. While after this patch applied the number is greater than 300M
and the sc->priority only drops to 3.
Link: http://lkml.kernel.org/r/1561112086-6169-3-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 16 Jul 2019 23:26:09 +0000 (16:26 -0700)]
mm/vmscan.c: add a new member reclaim_state in struct shrink_control
Patch series "mm/vmscan: calculate reclaimed slab in all reclaim paths".
This patchset is to fix the issues in doing shrink slab.
There're six different reclaim paths by now,
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path
The slab caches reclaimed in these paths are only calculated in the
above three paths. The issues are detailed explained in patch #2. We
should calculate the reclaimed slab caches in every reclaim path. In
order to do it, the struct reclaim_state is placed into the struct
shrink_control.
In node reclaim path, there'is another issue about shrinking slab, which
is adressed in "mm/vmscan: shrink slab in node reclaim"
(https://lore.kernel.org/linux-mm/
1559874946-22960-1-git-send-email-laoar.shao@gmail.com/).
This patch (of 2):
The struct reclaim_state is used to record how many slab caches are
reclaimed in one reclaim path. The struct shrink_control is used to
control one reclaim path. So we'd better put reclaim_state into
shrink_control.
[laoar.shao@gmail.com: remove reclaim_state assignment from __perform_reclaim()]
Link: http://lkml.kernel.org/r/1561381582-13697-1-git-send-email-laoar.shao@gmail.com
Link: http://lkml.kernel.org/r/1561112086-6169-2-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 16 Jul 2019 23:26:06 +0000 (16:26 -0700)]
mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones
After commit
815744d75152 ("mm: memcontrol: don't batch updates of local
VM stats and events"), the local VM counter are not in sync with the
hierarchical ones.
Below is one example in a leaf memcg on my server (with 8 CPUs):
inactive_file
3567570944
total_inactive_file
3568029696
We find that the deviation is very great because the 'val' in
__mod_memcg_state() is in pages while the effective value in
memcg_stat_show() is in bytes.
So the maximum of this deviation between local VM stats and total VM
stats can be (32 * number_of_cpu * PAGE_SIZE), that may be an
unacceptably great value.
We should keep the local VM stats in sync with the total stats. In
order to keep this behavior the same across counters, this patch updates
__mod_lruvec_state() and __count_memcg_events() as well.
Link: http://lkml.kernel.org/r/1562851979-10610-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henry Burns [Tue, 16 Jul 2019 23:26:03 +0000 (16:26 -0700)]
mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc
One of the gfp flags used to show that a page is movable is
__GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We
strip the movability related flags from the call to kmem_cache_alloc()
for our slots since it is a kernel allocation.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Acked-by: Vitaly Wool <vitalywool@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryohei Suzuki [Tue, 16 Jul 2019 23:26:00 +0000 (16:26 -0700)]
mm/cma.c: fix a typo ("alloc_cma" -> "cma_alloc") in cma_release() comments
A comment referred to a non-existent function alloc_cma(), which should
have been cma_alloc().
Link: http://lkml.kernel.org/r/20190712085549.5920-1-ryh.szk.cmnty@gmail.com
Signed-off-by: Ryohei Suzuki <ryh.szk.cmnty@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 16 Jul 2019 23:25:57 +0000 (16:25 -0700)]
mm/slab_common.c: work around clang bug #42570
Clang gets rather confused about two variables in the same special
section when one of them is not initialized, leading to an assembler
warning later:
/tmp/slab_common-18f869.s: Assembler messages:
/tmp/slab_common-18f869.s:7526: Warning: ignoring changed section attributes for .data..ro_after_init
Adding an initialization to kmalloc_caches is rather silly here
but does avoid the issue.
Link: https://bugs.llvm.org/show_bug.cgi?id=42570
Link: http://lkml.kernel.org/r/20190712090455.266021-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 16 Jul 2019 23:25:54 +0000 (16:25 -0700)]
lib/mpi/longlong.h: fix building with 32-bit x86
The mpi library contains some rather old inline assembly statements that
produce a lot of warnings for 32-bit x86, such as:
lib/mpi/mpih-div.c:76:16: error: invalid use of a cast in a inline asm context requiring an l-value: remove the cast or build with -fheinous-gnu-extensions
udiv_qrnnd(qp[i], n1, n1, np[i], d);
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
lib/mpi/longlong.h:423:20: note: expanded from macro 'udiv_qrnnd'
: "=a" ((USItype)(q)), \
~~~~~~~~~~^~
There is no point in doing a type cast for the output of an inline
assembler statement, so just remove the cast here, as we have done for
other architectures in the past.
See also
dea632cadd12 ("lib/mpi: fix build with clang").
Link: http://lkml.kernel.org/r/20190712090740.340186-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 16 Jul 2019 23:25:51 +0000 (16:25 -0700)]
mm/shmem.c: fix unused shmem_parse_huge() function warning
When CONFIG_SYSFS is disabled but CONFIG_TMPFS is enabled, we get a
warning about shmem_parse_huge() never being called:
mm/shmem.c:417:12: error: unused function 'shmem_parse_huge' [-Werror,-Wunused-function]
static int shmem_parse_huge(const char *str)
Change the #ifdef so we no longer build this function in that configuration.
Link: http://lkml.kernel.org/r/20190712091141.673355-1-arnd@arndb.de
Fixes:
144df3b288c4 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 16 Jul 2019 23:25:48 +0000 (16:25 -0700)]
mm/z3fold: don't try to use buddy slots after free
As reported by Henry Burns:
Running z3fold stress testing with address sanitization showed zhdr->slots
was being used after it was freed.
z3fold_free(z3fold_pool, handle)
free_handle(handle)
kmem_cache_free(pool->c_handle, zhdr->slots)
release_z3fold_page_locked_list(kref)
__release_z3fold_page(zhdr, true)
zhdr_to_pool(zhdr)
slots_to_pool(zhdr->slots) *BOOM*
To fix this, add pointer to the pool back to z3fold_header and modify
zhdr_to_pool to return zhdr->pool.
Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com
Fixes:
7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 16 Jul 2019 16:25:04 +0000 (09:25 -0700)]
Merge tag 'backlight-next-5.3' of git://git./linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"New Functionality:
- Provide support for ACPI enumeration; gpio_backlight
Fix-ups:
- SPDX fixups; pwm_bl
- Fix linear brightness levels to include number available; pwm_bl"
* tag 'backlight-next-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: pwm_bl: Fix heuristic to determine number of brightness levels
backlight: gpio_backlight: Enable ACPI enumeration
backlight: pwm_bl: Convert to use SPDX identifier
Linus Torvalds [Tue, 16 Jul 2019 04:20:52 +0000 (21:20 -0700)]
Merge tag 'for-linus-
20190715' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"A later pull request with some followup items. I had some vacation
coming up to the merge window, so certain things items were delayed a
bit. This pull request also contains fixes that came in within the
last few days of the merge window, which I didn't want to push right
before sending you a pull request.
This contains:
- NVMe pull request, mostly fixes, but also a few minor items on the
feature side that were timing constrained (Christoph et al)
- Report zones fixes (Damien)
- Removal of dead code (Damien)
- Turn on cgroup psi memstall (Josef)
- block cgroup MAINTAINERS entry (Konstantin)
- Flush init fix (Josef)
- blk-throttle low iops timing fix (Konstantin)
- nbd resize fixes (Mike)
- nbd 0 blocksize crash fix (Xiubo)
- block integrity error leak fix (Wenwen)
- blk-cgroup writeback and priority inheritance fixes (Tejun)"
* tag 'for-linus-
20190715' of git://git.kernel.dk/linux-block: (42 commits)
MAINTAINERS: add entry for block io cgroup
null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
block: Limit zone array allocation size
sd_zbc: Fix report zones buffer allocation
block: Kill gfp_t argument of blkdev_report_zones()
block: Allow mapping of vmalloc-ed buffers
block/bio-integrity: fix a memory leak bug
nvme: fix NULL deref for fabrics options
nbd: add netlink reconfigure resize support
nbd: fix crash when the blksize is zero
block: Disable write plugging for zoned block devices
block: Fix elevator name declaration
block: Remove unused definitions
nvme: fix regression upon hot device removal and insertion
blk-throttle: fix zero wait time for iops throttled group
block: Fix potential overflow in blk_report_zones()
blkcg: implement REQ_CGROUP_PUNT
blkcg, writeback: Implement wbc_blkcg_css()
blkcg, writeback: Add wbc->no_cgroup_owner
blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
...
Linus Torvalds [Tue, 16 Jul 2019 04:10:39 +0000 (21:10 -0700)]
Merge branch 'i2c/for-5.3' of git://git./linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"New stuff from the I2C world:
- in the core, getting irqs from ACPI is now similar to OF
- new driver for MediaTek MT7621/7628/7688 SoCs
- bcm2835, i801, and tegra drivers got some more attention
- GPIO API cleanups
- cleanups in the core headers
- lots of usual driver updates"
* 'i2c/for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (74 commits)
i2c: mt7621: Fix platform_no_drv_owner.cocci warnings
i2c: cpm: remove casting dma_alloc
dt-bindings: i2c: sun6i-p2wi: Fix the binding example
dt-bindings: i2c: mv64xxx: Fix the example compatible
i2c: i801: Documentation update
i2c: i801: Add support for Intel Tiger Lake
i2c: i801: Fix PCI ID sorting
dt-bindings: i2c-stm32: document optional dmas
i2c: i2c-stm32f7: Add I2C_SMBUS_I2C_BLOCK_DATA support
i2c: core: Tidy up handling of init_irq
i2c: core: Move ACPI gpio IRQ handling into i2c_acpi_get_irq
i2c: core: Move ACPI IRQ handling to probe time
i2c: acpi: Factor out getting the IRQ from ACPI
i2c: acpi: Use available IRQ helper functions
i2c: core: Allow whole core to use i2c_dev_irq_from_resources
eeprom: at24: modify a comment referring to platform data
dt-bindings: i2c: omap: Add new compatible for J721E SoCs
dt-bindings: i2c: mv64xxx: Add YAML schemas
dt-bindings: i2c: sun6i-p2wi: Add YAML schemas
i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver
...
Linus Torvalds [Tue, 16 Jul 2019 04:06:15 +0000 (21:06 -0700)]
Merge tag 'for-v5.3' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Core:
- add HWMON compat layer
- new properties:
- input power limit
- input voltage limit
Drivers:
- qcom-pon: add gen2 support
- new driver for storing reboot move in NVMEM
- new driver for Wilco EC charger configuration
- simplify getting the adapter of a client"
* tag 'for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: reset: nvmem-reboot-mode: add CONFIG_OF dependency
power_supply: wilco_ec: Add charging config driver
power: supply: cros: allow to set input voltage and current limit
power: supply: add input power and voltage limit properties
power: supply: fix semicolon.cocci warnings
power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface
dt-bindings: power: reset: add document for NVMEM based reboot-mode
reset: qcom-pon: Add support for gen2 pon
dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatibility line
power: supply: Add HWMON compatibility layer
power: supply: sbs-manager: simplify getting the adapter of a client
power: supply: rt9455_charger: simplify getting the adapter of a client
power: supply: rt5033_battery: simplify getting the adapter of a client
power: supply: max17042_battery: simplify getting the adapter of a client
power: supply: max17040_battery: simplify getting the adapter of a client
power: supply: max14656_charger_detector: simplify getting the adapter of a client
power: supply: bq25890_charger: simplify getting the adapter of a client
power: supply: bq24257_charger: simplify getting the adapter of a client
power: supply: bq24190_charger: simplify getting the adapter of a client