Martin Kepplinger [Thu, 14 Sep 2017 06:01:38 +0000 (08:01 +0200)]
objtool: Fix memory leak in elf_create_rela_section()
Let's free the allocated char array 'relaname' before returning,
in order to avoid leaking memory.
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo.kernel.org@gmail.com
Link: http://lkml.kernel.org/r/20170914060138.26472-1-martink@posteo.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
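The pattern behind this fix can be shown in a small userspace sketch. It is not the objtool code: section_init() and create_rela_section() are hypothetical stand-ins, and the only point illustrated is that the temporary name buffer is freed on every return path once the callee has made its own copy.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a section object that keeps its own copy of the name. */
struct section {
	char name[64];
};

/* Hypothetical constructor: copies the name, so the caller still owns its buffer. */
static int section_init(struct section *sec, const char *name)
{
	if (strlen(name) >= sizeof(sec->name))
		return -1;
	strcpy(sec->name, name);
	return 0;
}

/* Build ".rela<base>" in a temporary buffer and release it before returning. */
static int create_rela_section(struct section *sec, const char *base_name)
{
	size_t len;
	char *relaname;
	int ret;

	assert(sec != NULL && base_name != NULL);

	len = strlen(".rela") + strlen(base_name) + 1;
	relaname = malloc(len);
	if (!relaname)
		return -1;

	snprintf(relaname, len, ".rela%s", base_name);
	ret = section_init(sec, relaname);

	free(relaname);	/* freed on success and on failure alike */
	return ret;
}
```

Because the free() sits after the last use and before the single return, no exit path can leak the buffer.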
Mimi Zohar [Wed, 13 Sep 2017 02:45:33 +0000 (22:45 -0400)]
vfs: constify path argument to kernel_read_file_from_path
This patch constifies the path argument to kernel_read_file_from_path().
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 15 Sep 2017 03:04:32 +0000 (20:04 -0700)]
Merge tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull more NFS client updates from Trond Myklebust:
"Highlights include:
Bugfixes:
- Various changes relating to reporting IO errors.
- pnfs: Use the standard I/O stateid when calling LAYOUTGET
Features:
- Add static NFS I/O tracepoints for debugging"
* tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: various changes relating to reporting IO errors.
NFS: Add static NFS I/O tracepoints
pNFS: Use the standard I/O stateid when calling LAYOUTGET
Linus Torvalds [Fri, 15 Sep 2017 03:01:41 +0000 (20:01 -0700)]
Merge branch 'work.misc' of git://git./linux/kernel/git/viro/vfs
Pull misc leftovers from Al Viro.
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the __user misannotations in asm-generic get_user/put_user
fput: Don't reinvent the wheel but use existing llist API
namespace.c: Don't reinvent the wheel but use existing llist API
Linus Torvalds [Fri, 15 Sep 2017 02:29:55 +0000 (19:29 -0700)]
Merge branch 'work.read_write' of git://git./linux/kernel/git/viro/vfs
Pull nowait read support from Al Viro:
"Support IOCB_NOWAIT for buffered reads and block devices"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
block_dev: support RFW_NOWAIT on block device nodes
fs: support RWF_NOWAIT for buffered reads
fs: support IOCB_NOWAIT in generic_file_buffered_read
fs: pass iocb to do_generic_file_read
Linus Torvalds [Fri, 15 Sep 2017 01:54:01 +0000 (18:54 -0700)]
Merge branch 'work.mount' of git://git./linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).
This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like
list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')
sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list
and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
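As a rough illustration of the split described in this pull request, the sketch below keeps the mount(2) ABI bit and the superblock-internal bit under separate names and hides the test behind a helper. It is a userspace approximation: the struct is reduced to one field, sb_flags_from_mount() is a hypothetical helper, and the constant values are chosen for the example rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; in this sketch both namespaces use bit 0. */
#define MS_RDONLY 1UL	/* flag as passed in through mount(2) */
#define SB_RDONLY 1UL	/* flag as stored in the superblock */

/* Reduced stand-in for the kernel's struct super_block. */
struct super_block {
	unsigned long s_flags;
};

/* Callers ask the helper instead of open-coding "s_flags & MS_RDONLY". */
static bool sb_rdonly(const struct super_block *sb)
{
	assert(sb != NULL);
	return (sb->s_flags & SB_RDONLY) != 0;
}

/* Translate the ABI bit into the internal bit at the boundary (hypothetical). */
static unsigned long sb_flags_from_mount(unsigned long ms_flags)
{
	unsigned long sb_flags = 0;

	if (ms_flags & MS_RDONLY)
		sb_flags |= SB_RDONLY;
	return sb_flags;
}
```

Once every test goes through a helper of this kind, the internal constant can be renamed or renumbered without touching the callers.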
Linus Torvalds [Fri, 15 Sep 2017 01:13:32 +0000 (18:13 -0700)]
Merge branch 'work.set_fs' of git://git./linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Linus Torvalds [Fri, 15 Sep 2017 00:37:26 +0000 (17:37 -0700)]
Merge branch 'work.ipc' of git://git./linux/kernel/git/viro/vfs
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
"IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
Dinamani"
* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
utimes: Make utimes y2038 safe
ipc: shm: Make shmid_kernel timestamps y2038 safe
ipc: sem: Make sem_array timestamps y2038 safe
ipc: msg: Make msg_queue timestamps y2038 safe
ipc: mqueue: Replace timespec with timespec64
ipc: Make sys_semtimedop() y2038 safe
get rid of SYSVIPC_COMPAT on ia64
semtimedop(): move compat to native
shmat(2): move compat to native
msgrcv(2), msgsnd(2): move compat to native
ipc(2): move compat to native
ipc: make use of compat ipc_perm helpers
semctl(): move compat to native
semctl(): separate all layout-dependent copyin/copyout
msgctl(): move compat to native
msgctl(): split the actual work from copyin/copyout
ipc: move compat shmctl to native
shmctl: split the work from copyin/copyout
Linus Torvalds [Fri, 15 Sep 2017 00:30:49 +0000 (17:30 -0700)]
Merge branch 'zstd-minimal' of git://git./linux/kernel/git/mason/linux-btrfs
Pull zstd support from Chris Mason:
"Nick Terrell's patch series to add zstd support to the kernel has been
floating around for a while. After talking with Dave Sterba, Herbert
and Phillip, we decided to send the whole thing in as one pull
request.
zstd is a big win in speed over zlib and in compression ratio over
lzo, and the compression team here at FB has gotten great results
using it in production. Nick will continue to update the kernel side
with new improvements from the open source zstd userland code.
Nick has a number of benchmarks for the main zstd code in his lib/zstd
commit:
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using
`silesia.tar` [3], which is 211,988,480 B large. Run the following
commands for the benchmark:
sudo modprobe zstd_compress_test
sudo mknod zstd_compress_test c 245 0
sudo cp silesia.tar zstd_compress_test
The time is reported by the time of the userland `cp`.
The MB/s is computed with
1,536,217,008 B / time(buffer size, hash)
which includes the time to copy from userland.
The Adjusted MB/s is computed with
1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
The memory reported is the amount of memory the compressor
requests.
| Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
|----------|----------|----------|-------|---------|----------|----------|
| none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
| zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
| zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
| zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
| zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
| zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
| zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
| zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
| zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
| zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
| zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
I benchmarked zstd decompression using the same method on the same
machine. The benchmark file is located in the upstream zstd repo
under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
memory reported is the amount of memory required to decompress
data compressed with the given compression level. If you know the
maximum size of your input, you can reduce the memory usage of
decompression irrespective of the compression level.
| Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
|----------|----------|---------|---------------|-------------|
| none | 0.025 | 8479.54 | - | - |
| zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
| zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
| zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
| zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
| zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
| zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
| zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
| zlib -9 | 0.837 | 253.27 | 261.07 | 0.04 |
I ran a long series of tests and benchmarks on the btrfs side and the
gains are very similar to the core benchmarks Nick ran"
* 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
squashfs: Add zstd support
btrfs: Add zstd support
lib: Add zstd modules
lib: Add xxhash module
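The throughput columns in the benchmark tables above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the byte count is the size of `silesia.tar` (211,988,480 B) and that MB means 10^6 bytes. Both are assumptions made for this example; under them the "zstd -1" compression row (1.044 s, with 0.100 s for "none") is reproduced.

```c
#include <assert.h>

/* Assumed input size: silesia.tar, as stated in the benchmark description. */
#define INPUT_BYTES 211988480.0

/* Throughput in MB/s (1 MB = 10^6 B), including the copy from userland. */
static double mb_per_s(double seconds)
{
	assert(seconds > 0.0);
	return INPUT_BYTES / seconds / 1e6;
}

/* Adjusted throughput: subtract the time of the "none" baseline first. */
static double adjusted_mb_per_s(double seconds, double baseline_seconds)
{
	assert(seconds > baseline_seconds);
	return INPUT_BYTES / (seconds - baseline_seconds) / 1e6;
}
```

For example, mb_per_s(1.044) comes out at roughly 203.05 and adjusted_mb_per_s(1.044, 0.100) at roughly 224.56, matching the table.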
Linus Torvalds [Thu, 14 Sep 2017 20:46:33 +0000 (13:46 -0700)]
Merge tag 'kbuild-v4.14' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Use Make-builtin $(abspath ...) helper to get absolute path
- Add W=2 extra warning option to detect unused macros
- Use more KCONFIG_CONFIG instead of hard-coded .config
- Fix bugs of tar*-pkg targets
* tag 'kbuild-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: buildtar: do not print successful message if tar returns error
kbuild: buildtar: fix tar error when CONFIG_MODULES is disabled
kbuild: Use KCONFIG_CONFIG in buildtar
Kbuild: enable -Wunused-macros warning for "make W=2"
kbuild: use $(abspath ...) instead of $(shell cd ... && /bin/pwd)
Linus Torvalds [Thu, 14 Sep 2017 20:43:16 +0000 (13:43 -0700)]
Merge tag 'for-4.14/dm-changes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Some request-based DM core and DM multipath fixes and cleanups
- Constify a few variables in DM core and DM integrity
- Add bufio optimization and checksum failure accounting to DM
integrity
- Fix DM integrity to avoid checking integrity of failed reads
- Fix DM integrity to use init_completion
- A couple DM log-writes target fixes
- Simplify DAX flushing by eliminating the unnecessary flush
abstraction that was stood up for DM's use.
* tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dax: remove the pmem_dax_ops->flush abstraction
dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
dm integrity: make blk_integrity_profile structure const
dm integrity: do not check integrity for failed read operations
dm log writes: fix >512b sectorsize support
dm log writes: don't use all the cpu while waiting to log blocks
dm ioctl: constify ioctl lookup table
dm: constify argument arrays
dm integrity: count and display checksum failures
dm integrity: optimize writing dm-bufio buffers that are partially changed
dm rq: do not update rq partially in each ending bio
dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
dm mpath: complain about unsupported __multipath_map_bio() return values
dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
Linus Torvalds [Thu, 14 Sep 2017 20:33:33 +0000 (13:33 -0700)]
Merge tag 'fbdev-v4.14' of git://github.com/bzolnier/linux
Pull fbdev updates from Bartlomiej Zolnierkiewicz:
- make fbcon a build-time dependency for fbdev (fbcon was tristate option
before, now it is a bool) - this is a first step in preparations for
making console_lock usage saner (currently it acts like the BKL for
all things fbdev/fbcon) (Daniel Vetter)
- add fbcon=margin:<color> command line option to select the fbcon
margin color (David Lechner)
- add DMI quirk table for x86 systems which need fbcon rotation
(devices like Asus T100HA, GPD Pocket, the GPD win and the I.T.Works
TW891) (Hans de Goede)
- fix 1bpp logo support for unusual width (needed by LEGO MINDSTORMS
EV3) (David Lechner)
- enable Xilinx FB driver for ARM ZynqMP platform (Michal Simek)
- fix use after free in the error path of udlfb driver (Anton Vasilyev)
- fix error return code handling in pxa3xx_gcu driver (Gustavo A. R.
Silva)
- fix bootparams.screeninfo arguments checking in vgacon (Jan H.
Schönherr)
- do not leak uninitialized padding in clk to userspace in the debug
code of atyfb driver (Vladis Dronov)
- fix compiler warnings in fbcon code and matroxfb driver (Arnd
Bergmann)
- convert fbdev subsystem to using %pOF instead of full_name (Rob
Herring)
- structures constifications (Arvind Yadav, Bhumika Goyal, Gustavo A.
R. Silva, Julia Lawall)
- misc cleanups (Gustavo A. R. Silva, Hyun Kwon, Julia Lawall, Kuninori
Morimoto, Lynn Lei)
* tag 'fbdev-v4.14' of git://github.com/bzolnier/linux: (75 commits)
video/console: Update BIOS dates list for GPD win console rotation DMI quirk
video/console: Add rotated LCD-panel DMI quirk for the VIOS LTH17
video: fbdev: sis: fix duplicated code for different branches
video: fbdev: make fb_var_screeninfo const
video: fbdev: aty: do not leak uninitialized padding in clk to userspace
vgacon: Prevent faulty bootparams.screeninfo from causing harm
video: fbdev: make fb_videomode const
video/console: Add new BIOS date for GPD pocket to dmi quirk table
fbcon: remove restriction on margin color
video: ARM CLCD: constify amba_id
video: fm2fb: constify zorro_device_id
video: fbdev: annotate fb_fix_screeninfo with const and __initconst
omapfb: constify omap_video_timings structures
video: fbdev: udlfb: Fix use after free on dlfb_usb_probe error path
fbdev: i810: make fb_ops const
fbdev: matrox: make fb_ops const
video: fbdev: pxa3xx_gcu: fix error return code in pxa3xx_gcu_probe()
video: fbdev: Enable Xilinx FB for ZynqMP
video: fbdev: Fix multiple style issues in xilinxfb
video: fbdev: udlfb: constify usb_device_id.
...
Linus Torvalds [Thu, 14 Sep 2017 20:28:30 +0000 (13:28 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add support for the watchdog on Meson8 and Meson8m2
- add support for MediaTek MT7623 and MT7622 SoC
- add support for the r8a77995 wdt
- explicitly request exclusive reset control for asm9260_wdt,
zx2967_wdt, rt2880_wdt and mt7621_wdt
- improvements to asm9260_wdt, aspeed_wdt, renesas_wdt and cadence_wdt
- add support for reading freq via CCF + suspend/resume support for
of_xilinx_wdt
- constify watchdog_ops and various device-id structures
- revert of commit 1fccb73011ea ("iTCO_wdt: all versions count down
twice") (Bug 196509)
* git://www.linux-watchdog.org/linux-watchdog: (40 commits)
watchdog: mei_wdt: constify mei_cl_device_id
watchdog: sp805: constify amba_id
watchdog: ziirave: constify i2c_device_id
watchdog: sc1200: constify pnp_device_id
dt-bindings: watchdog: renesas-wdt: Add support for the r8a77995 wdt
watchdog: renesas_wdt: update copyright dates
watchdog: renesas_wdt: make 'clk' a variable local to probe()
watchdog: renesas_wdt: consistently use RuntimePM for clock management
watchdog: aspeed: Support configuration of external signal properties
dt-bindings: watchdog: aspeed: External reset signal properties
drivers/watchdog: Add optional ASPEED device tree properties
drivers/watchdog: ASPEED reference dev tree properties for config
watchdog: da9063_wdt: Simplify by removing unneeded struct...
watchdog: bcm7038: Check the return value from clk_prepare_enable()
watchdog: qcom: Check for platform_get_resource() failure
watchdog: of_xilinx_wdt: Add suspend/resume support
watchdog: of_xilinx_wdt: Add support for reading freq via CCF
dt-bindings: watchdog: mediatek: add support for MediaTek MT7623 and MT7622 SoC
watchdog: max77620_wdt: constify platform_device_id
watchdog: pcwd_usb: constify usb_device_id
...
Linus Torvalds [Thu, 14 Sep 2017 20:10:48 +0000 (13:10 -0700)]
Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull dmi update from Jean Delvare:
"Mark all struct dmi_system_id instances const"
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
dmi: Mark all struct dmi_system_id instances const
Linus Torvalds [Thu, 14 Sep 2017 20:01:09 +0000 (13:01 -0700)]
Merge tag 'pinctrl-v4.14-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"This slew of fixes for pin control was noticed and patched up early,
so to get the annoyance out of the way for -rc1 it would make sense to
send them already.
- Fix a build include in the Uniphier driver to keep pace with
ongoing refactorings.
- Fix a slew of minor semantic and syntactic issues as well as
tightening up Kconfig for the new Spreadtrum driver.
- Fix the GPIO interrupt set-up on the Marvell 37xx Armada as fallout
for dynamically allocating irq descriptors from the core. (Also
tagged for stable.)
- Fix AMD register suspend/resume state spool/unspooling so that
wakeup works as it should. (Also tagged for stable.)"
* tag 'pinctrl-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl/amd: save pin registers over suspend/resume
pinctrl: armada-37xx: Fix gpio interrupt setup
pinctrl: sprd: fix off by one bugs
pinctrl: sprd: check for allocation failure
pinctrl: sprd: Restrict PINCTRL_SPRD to ARCH_SPRD or COMPILE_TEST
pinctrl: sprd: fix build errors and dependencies
pinctrl: sprd: make three local functions static
pinctrl: uniphier: include <linux/build_bug.h> instead of <linux/bug.h>
Linus Torvalds [Thu, 14 Sep 2017 19:25:34 +0000 (12:25 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"A few leftovers"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_owner: skip unnecessary stack_trace entries
arm64: stacktrace: avoid listing stacktrace functions in stacktrace
mm: treewide: remove GFP_TEMPORARY allocation flag
IB/mlx4: fix sprintf format warning
fscache: fix fscache_objlist_show format processing
lib/test_bitmap.c: use ULL suffix for 64-bit constants
procfs: remove unused variable
drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
idr: remove WARN_ON_ONCE() when trying to replace negative ID
Tim Chen [Fri, 25 Aug 2017 16:13:55 +0000 (09:13 -0700)]
sched/wait: Introduce wakeup boomark in wake_up_page_bit
Now that we have added breaks in the wait queue scan and allow bookmark
on scan position, we put this logic in the wake_up_page_bit function.
We can have very long page wait lists in large systems where multiple
pages share the same wait list. We break the wake up walk here to allow
other cpus a chance to access the list, and not to disable the interrupts
when traversing the list for too long. This reduces the interrupt and
rescheduling latency, and excessive page wait queue lock hold time.
[ v2: Remove bookmark_wake_function ]
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Chen [Fri, 25 Aug 2017 16:13:54 +0000 (09:13 -0700)]
sched/wait: Break up long wake list walk
We encountered workloads that have very long wake up lists on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.
We saw page wait lists that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such a list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.
Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.
The page wait list is traversed with interrupts disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.
This patch bookmarks the waker's scan position in the wake list and breaks
the wake up walk, to allow access to the list before the waker resumes
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.
This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.
[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
simplify access to flags. ]
Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
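The idea of breaking up a long wake-up walk can be sketched in userspace as follows. This is a simplification, not the kernel implementation: the list is a plain singly linked list, the bookmark is just a saved pointer, the lock is only indicated by comments, and the names and the batch size are hypothetical. A real wait queue can change while the lock is dropped, so a bare pointer as used here would not be safe there.

```c
#include <assert.h>
#include <stddef.h>

struct waiter {
	int woken;
	struct waiter *next;
};

/*
 * Wake at most `batch` waiters starting at *bookmark, then store the
 * position to resume from back into *bookmark (NULL when done).
 */
static int wake_batch(struct waiter **bookmark, int batch)
{
	struct waiter *pos = *bookmark;
	int woken = 0;

	while (pos && woken < batch) {
		pos->woken = 1;
		pos = pos->next;
		woken++;
	}
	*bookmark = pos;
	return woken;
}

/* Walk the whole list in bounded batches; returns the number of batches. */
static int wake_all(struct waiter *head, int batch)
{
	struct waiter *bookmark = head;
	int rounds = 0;

	assert(batch > 0);
	while (bookmark) {
		/* the list lock would be taken here */
		wake_batch(&bookmark, batch);
		/* and dropped here, so other CPUs can get at the list */
		rounds++;
	}
	return rounds;
}
```

With ten waiters and a batch size of four, the walk is split into three bounded passes instead of one long pass under the lock.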
Christoph Hellwig [Thu, 14 Sep 2017 09:59:30 +0000 (11:59 +0200)]
dmi: Mark all struct dmi_system_id instances const
... and __initconst if applicable.
Based on similar work for an older kernel in the Grsecurity patch.
[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Prakash Gupta [Wed, 13 Sep 2017 23:28:35 +0000 (16:28 -0700)]
mm, page_owner: skip unnecessary stack_trace entries
The page_owner stacktrace always begins as follows:
[<ffffff987bfd48f4>] save_stack+0x40/0xc8
[<ffffff987bfd4da8>] __set_page_owner+0x3c/0x6c
These two entries do not provide any useful information and limit the
available stacktrace depth. The page_owner stacktrace was skipping
caller function from stack entries but this was missed with commit
f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x8(ffffff80085fb714)
PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
[<ffffff9b64989c14>] post_alloc_hook+0x70/0x80
...
[<ffffff9b651216e8>] msm_comm_try_state+0x5f8/0x14f4
[<ffffff9b6512486c>] msm_vidc_open+0x5e4/0x7d0
[<ffffff9b65113674>] msm_v4l2_open+0xa8/0x224
Link: http://lkml.kernel.org/r/1504078343-28754-2-git-send-email-guptap@codeaurora.org
Fixes: f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prakash Gupta [Wed, 13 Sep 2017 23:28:32 +0000 (16:28 -0700)]
arm64: stacktrace: avoid listing stacktrace functions in stacktrace
The stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for
itself. This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.
This was fixed for arch arm by commit 3683f44c42e9 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")
Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
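A minimal sketch of the "skip" behaviour discussed above, in plain userspace C: the recorder drops a fixed number of leading frames that belong to the tracing code itself before honouring the caller's own skip count. The function name, the INTERNAL_FRAMES constant and its value of 2 are assumptions for illustration and do not come from the arm64 code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of frames contributed by the tracing functions themselves. */
#define INTERNAL_FRAMES 2

/* Copy frames into out[], leaving out the internal frames plus `skip` more. */
static size_t record_trace(const unsigned long *frames, size_t nr_frames,
			   size_t skip, unsigned long *out, size_t max_out)
{
	size_t first = INTERNAL_FRAMES + skip;
	size_t n = 0;

	assert(frames != NULL && out != NULL);
	for (size_t i = first; i < nr_frames && n < max_out; i++)
		out[n++] = frames[i];
	return n;
}
```

Keeping the internal frame count constant is what lets a caller-supplied skip value mean the same thing for every caller.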
Michal Hocko [Wed, 13 Sep 2017 23:28:29 +0000 (16:28 -0700)]
mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing callers provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
converting all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect short-term
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This has been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Wed, 13 Sep 2017 23:28:26 +0000 (16:28 -0700)]
IB/mlx4: fix sprintf format warning
gcc-7 points out that a negative port_num value would overflow the
string buffer:
drivers/infiniband/hw/mlx4/sysfs.c: In function 'mlx4_ib_device_register_sysfs':
drivers/infiniband/hw/mlx4/sysfs.c:251:16: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:251:2: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
drivers/infiniband/hw/mlx4/sysfs.c:303:17: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:303:3: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
While we should be able to assume that port_num is positive here, making
the buffer one byte longer has no downsides and avoids the warning.
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: http://lkml.kernel.org/r/20170714120720.906842-23-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Wed, 13 Sep 2017 23:28:23 +0000 (16:28 -0700)]
fscache: fix fscache_objlist_show format processing
gcc points out a minor bug in the handling of unknown cookie types,
which could result in a string overflow when the integer is copied into
a 3-byte string:
fs/fscache/object-list.c: In function 'fscache_objlist_show':
fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
sprintf(_type, "%02u", cookie->def->type);
^~~~~~
fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3
This is currently harmless as no code sets a type other than 0 or 1, but
it makes sense to use snprintf() here to avoid overflowing the array if
that changes.
Link: http://lkml.kernel.org/r/20170714120720.906842-22-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
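The effect of switching to snprintf() can be seen in a small userspace sketch. format_type() is a hypothetical helper, not the fscache code; it relies only on the standard behaviour of snprintf(), which writes at most size - 1 characters plus the terminating NUL and returns the length the full output would have had.

```c
#include <assert.h>
#include <stdio.h>

/* Format a type number as at least two digits without overrunning buf. */
static int format_type(char *buf, size_t size, unsigned int type)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%02u", type);
}
```

With a 3-byte buffer, a type of 1 yields "01", while a three-digit type is truncated to two digits instead of writing past the end. A return value greater than or equal to the buffer size signals that truncation happened.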
Geert Uytterhoeven [Wed, 13 Sep 2017 23:28:20 +0000 (16:28 -0700)]
lib/test_bitmap.c: use ULL suffix for 64-bit constants
With gcc 4.1.2:
lib/test_bitmap.c:189: warning: integer constant is too large for `long' type
lib/test_bitmap.c:190: warning: integer constant is too large for `long' type
lib/test_bitmap.c:194: warning: integer constant is too large for `long' type
lib/test_bitmap.c:195: warning: integer constant is too large for `long' type
Add the missing "ULL" suffix to fix this.
Link: http://lkml.kernel.org/r/1505040523-31230-1-git-send-email-geert@linux-m68k.org
Fixes: 60ef690018b262dd ("bitmap: introduce BITMAP_FROM_U64()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
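Why the suffix matters can be sketched as follows. On targets where long is 32 bits wide, a constant such as 0x0123456789abcdef does not fit in long, which is what the old compiler complains about; the ULL suffix states the 64-bit type explicitly. The split_u64() helper and the TEST_PATTERN value are hypothetical and only mirror the general idea of storing a 64-bit value as two 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The ULL suffix makes the constant unsigned long long on every target. */
#define TEST_PATTERN 0x0123456789abcdefULL

/* Store a 64-bit value as two 32-bit words, low word first. */
static void split_u64(uint64_t value, uint32_t words[2])
{
	assert(words != NULL);
	words[0] = (uint32_t)(value & 0xffffffffULL);
	words[1] = (uint32_t)(value >> 32);
}
```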
Arnd Bergmann [Wed, 13 Sep 2017 23:28:17 +0000 (16:28 -0700)]
procfs: remove unused variable
In NOMMU configurations, we get a warning about a variable that has become
unused:
fs/proc/task_nommu.c: In function 'nommu_vma_show':
fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable]
Link: http://lkml.kernel.org/r/20170911200231.3171415-1-arnd@arndb.de
Fixes: 1240ea0dc3bb ("fs, proc: remove priv argument from is_stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 13 Sep 2017 23:28:14 +0000 (16:28 -0700)]
drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
gcc-4.4.4 has issues with initialization of anonymous unions:
drivers/media/cec/cec-adap.c: In function 'cec_queue_msg_fh':
drivers/media/cec/cec-adap.c:184: error: unknown field 'lost_msgs' specified in initializer
Work around this.
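One common way to avoid designated initializers for anonymous-union members is to initialize the named fields and assign the union member afterwards. The sketch below uses a made-up struct and is not necessarily the workaround this patch applies:

```c
/* Hypothetical event structure with an anonymous union (not the CEC one). */
struct toy_event {
	unsigned int type;
	union {
		unsigned int lost_msgs;
		unsigned int state;
	};
};

struct toy_event make_lost_msgs_event(unsigned int count)
{
	/* Only the ordinary member is named in the initializer. */
	struct toy_event ev = { .type = 2 };

	/* The anonymous-union member is set by plain assignment. */
	ev.lost_msgs = count;
	return ev;
}
```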
Fixes: 6b2bbb08747a5 ("media: cec: rework the cec event handling")
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Biggers [Wed, 13 Sep 2017 23:28:11 +0000 (16:28 -0700)]
idr: remove WARN_ON_ONCE() when trying to replace negative ID
IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").
Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree"). However, it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).
So once again remove the bogus WARN_ON_ONCE().
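The intended behaviour can be sketched in userspace as follows. This is a toy fixed-size table with hypothetical names, not lib/idr.c: an invalid or unallocated ID coming from an untrusted caller is reported as an ordinary error rather than as a warning.

```c
#include <errno.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Hypothetical ID table standing in for an IDR; NULL means "not allocated". */
void *table[TABLE_SIZE];

/*
 * Replace the pointer stored at id and hand back the old one.
 * Negative or out-of-range IDs and unallocated IDs are plain error
 * returns, since id may come directly from userspace.
 */
int table_replace(int id, void *new_ptr, void **old_ptr)
{
	if (id < 0 || id >= TABLE_SIZE)
		return -EINVAL;
	if (!table[id])
		return -ENOENT;
	*old_ptr = table[id];
	table[id] = new_ptr;
	return 0;
}
```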
This bug was found by syzkaller, which encountered the following
warning:
WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
Kernel panic - not syncing: panic_on_warn set ...
CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe
Here is a C reproducer:
    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    int main(void)
    {
            int cardfd = open("/dev/dri/card0", O_RDONLY);

            ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
                  &(struct drm_gem_close) { .handle = -1 } );
    }
Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [v4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 13 Sep 2017 19:24:20 +0000 (12:24 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A handful of tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf stat: Wait for the correct child
perf tools: Support running perf binaries with a dash in their name
perf config: Check not only section->from_system_config but also item's
perf ui progress: Fix progress update
perf ui progress: Make sure we always define step value
perf tools: Open perf.data with O_CLOEXEC flag
tools lib api: Fix make DEBUG=1 build
perf tests: Fix compile when libunwind's unwind.h is available
tools include linux: Guard against redefinition of some macros
Linus Torvalds [Wed, 13 Sep 2017 19:22:32 +0000 (12:22 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Three CPU hotplug related fixes and a debugging improvement"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Add debugfs knob for "sched_debug"
sched/core: WARN() when migrating to an offline CPU
sched/fair: Plug hole between hotplug and active_load_balance()
sched/fair: Avoid newidle balance for !active CPUs
Linus Torvalds [Wed, 13 Sep 2017 18:56:16 +0000 (11:56 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"The main changes are the PCID fixes from Andy, but there's also two
hyperv fixes and two paravirt updates"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition
x86/hyper-V: Allocate the IDT entry early in boot
paravirt: Switch maintainer
x86/paravirt: Remove no longer used paravirt functions
x86/mm/64: Initialize CR4.PCIDE early
x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3
x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()
Linus Torvalds [Wed, 13 Sep 2017 18:52:18 +0000 (11:52 -0700)]
Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
Pull OpenRISC fixlet from Stafford Horne:
"Fix warning for upcoming work to remove linux/vmalloc.h from
asm-generic/io.h"
* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
openrisc: add forward declaration for struct vm_area_struct
Linus Torvalds [Wed, 13 Sep 2017 18:28:19 +0000 (11:28 -0700)]
Merge tag 'modules-for-v4.14' of git://git./linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 4.14 merge window:
- minor code cleanups and fixes
- modpost: avoid building modules that have names that exceed the
size of the name field in struct module"
* tag 'modules-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove const attribute from alias for MODULE_DEVICE_TABLE
module: fix ddebug_remove_module()
modpost: abort if module name is too long
Linus Torvalds [Wed, 13 Sep 2017 18:18:19 +0000 (11:18 -0700)]
Fix up MAINTAINERS file sorting
Another merge window, another MAINTAINERS file disaster.
People have serious problems with the alphabet and sorting, and poor
Jérôme Glisse and Radim Krčmář get their names mangled by locale issues,
turning them into some mangled mess (probably others do too, but those
two stood out when sorting things again).
And we now have two copies of the same 'AS3645A LED FLASH CONTROLLER
DRIVER' in the tree and in the MAINTAINERS file, but that's a separate
issue - the duplication is real, and I left them as two entries for the
same name.
This does not try to sort the actual section pattern entries, although I
may end up doing that later.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 13 Sep 2017 18:04:14 +0000 (11:04 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The diff is dominated by the Allwinner A10/A20 SoCs getting converted
to the sunxi-ng framework. Otherwise, the heavy hitters are various
drivers for SoCs like AT91, Amlogic, Renesas, and Rockchip. There are
some other new clk drivers in here too but overall this is just a
bunch of clk drivers for various different pieces of hardware and a
collection of non-critical fixes for clk drivers.
New Drivers:
- Allwinner R40 SoCs
- Renesas R-Car Gen3 USB 2.0 clock selector PHY
- Atmel AT91 audio PLL
- Uniphier PXs3 SoCs
- ARC HSDK Board PLLs
- AXS10X Board PLLs
- STMicroelectronics STM32H743 SoCs
Removed Drivers:
- Non-compiling mb86s7x support
Updates:
- Allwinner A10/A20 SoCs converted to sunxi-ng framework
- Allwinner H3 CPU clk fixes
- Renesas R-Car D3 SoC
- Renesas V2H and M3-W modules
- Samsung Exynos5420/5422/5800 audio fixes
- Rockchip fractional clk approximation fixes
- Rockchip rk3126 SoC support within the rk3128 driver
- Amlogic gxbb CEC32 and sd_emmc clks
- Amlogic meson8b reset controller support
- IDT VersaClock 5P49V5925/5P49V6901 support
- Qualcomm MSM8996 SMMU clks
- Various 'const' applications for struct clk_ops
- si5351 PLL reset bugfix
- Uniphier audio on LD11/LD20 and ethernet support on LD11/LD20/Pro4/PXs2
- Assorted Tegra clk driver fixes"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (120 commits)
clk: si5351: fix PLL reset
ASoC: atmel-classd: remove aclk clock
ASoC: atmel-classd: remove aclk clock from DT binding
clk: at91: clk-generated: make gclk determine audio_pll rate
clk: at91: clk-generated: create function to find best_diff
clk: at91: add audio pll clock drivers
dt-bindings: clk: at91: add audio plls to the compatible list
clk: at91: clk-generated: remove useless divisor loop
clk: mb86s7x: Drop non-building driver
clk: ti: check for null return in strrchr to avoid null dereferencing
clk: Don't write error code into divider register
clk: uniphier: add video input subsystem clock
clk: uniphier: add audio system clock
clk: stm32h7: Add stm32h743 clock driver
clk: gate: expose clk_gate_ops::is_enabled
clk: nxp: clk-lpc32xx: rename clk_gate_is_enabled()
clk: uniphier: add PXs3 clock data
clk: hi6220: change watchdog clock source
clk: Kconfig: Name RK805 in Kconfig for COMMON_CLK_RK808
clk: cs2000: Add cs2000_set_saved_rate
...
Linus Torvalds [Wed, 13 Sep 2017 17:56:00 +0000 (10:56 -0700)]
Merge tag 'rtc-4.14' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- remove .open() and .release() RTC ops
- constify i2c_device_id
New driver:
- Realtek RTD1295
- Android emulator (goldfish) RTC
Drivers:
- ds1307: Beginning of a huge cleanup
- s35390a: handle invalid RTC time
- sun6i: external oscillator gate support"
* tag 'rtc-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (40 commits)
rtc: ds1307: use octal permissions
rtc: ds1307: fix braces
rtc: ds1307: fix alignments and blank lines
rtc: ds1307: use BIT
rtc: ds1307: use u32
rtc: ds1307: use sizeof
rtc: ds1307: remove regs member
rtc: Add Realtek RTD1295
dt-bindings: rtc: Add Realtek RTD1295
rtc: sun6i: Add support for the external oscillator gate
rtc: goldfish: Add RTC driver for Android emulator
dt-bindings: Add device tree binding for Goldfish RTC driver
rtc: ds1307: add basic support for ds1341 chip
rtc: ds1307: remove member nvram_offset from struct ds1307
rtc: ds1307: factor out offset to struct chip_desc
rtc: ds1307: factor out rtc_ops to struct chip_desc
rtc: ds1307: factor out irq_handler to struct chip_desc
rtc: ds1307: improve irq setup
rtc: ds1307: constify struct chip_desc variables
rtc: ds1307: improve trickle charger initialization
...
Linus Torvalds [Wed, 13 Sep 2017 17:50:06 +0000 (10:50 -0700)]
Merge tag 'sound-fix-4.14-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Most of the commits are trivial cleanup patches, while one commit is a
significant fix for the race at ALSA sequencer that was spotted by
syzkaller"
* tag 'sound-fix-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: seq: Cancel pending autoload work at unbinding device
ALSA: firewire: Use common error handling code in snd_motu_stream_start_duplex()
ALSA: asihpi: Kill BUG_ON() usages
ALSA: core: Use %pS printk format for direct addresses
ALSA: ymfpci: Use common error handling code in snd_ymfpci_create()
ALSA: ymfpci: Use common error handling code in snd_card_ymfpci_probe()
ALSA: 6fire: Use common error handling code in usb6fire_chip_probe()
ALSA: usx2y: Use common error handling code in submit_urbs()
ALSA: us122l: Use common error handling code in us122l_create_card()
ALSA: hdspm: Use common error handling code in snd_hdspm_probe()
ALSA: rme9652: Use common code in hdsp_get_iobox_version()
ALSA: maestro3: Use common error handling code in two functions
Linus Torvalds [Wed, 13 Sep 2017 17:47:14 +0000 (10:47 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A tiny update: one patch corrects a Kconfig problem with the shift of
the SAS SMP code to BSG and the other removes a vestige of user space
target mode"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_transport_sas: select BLK_DEV_BSGLIB
scsi: Remove Scsi_Host.uspace_req_q
Linus Torvalds [Wed, 13 Sep 2017 17:20:41 +0000 (10:20 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Small collection of fixes that would be nice to have in -rc1. This
contains:
- NVMe pull request from Christoph, mostly with fixes for nvme-pci,
host memory buffer in particular.
- Error handling fixup for cgwb_create(), in case allocation of 'wb'
fails. From Christophe Jaillet.
- Ensure that trace_block_getrq() gets the 'dev' in an appropriate
fashion, to avoid a potential NULL deref. From Greg Thelen.
- Regression fix for dm-mq with blk-mq, fixing a problem with
stacking IO schedulers. From me.
- string.h fixup, fixing an issue with memcpy_and_pad(). This
original change came in through an NVMe dependency, which is why
I'm including it here. From Martin Wilck.
- Fix potential int overflow in __blkdev_sectors_to_bio_pages(), from
Mikulas.
- MBR enable fix for sed-opal, from Scott"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: directly insert blk-mq request from blk_insert_cloned_request()
mm/backing-dev.c: fix an error handling path in 'cgwb_create()'
string.h: un-fortify memcpy_and_pad
nvme-pci: implement the HMB entry number and size limitations
nvme-pci: propagate (some) errors from host memory buffer setup
nvme-pci: use appropriate initial chunk size for HMB allocation
nvme-pci: fix host memory buffer allocation fallback
nvme: fix lightnvm check
block: fix integer overflow in __blkdev_sectors_to_bio_pages()
block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled
block: tolerate tracing of NULL bio
Linus Torvalds [Wed, 13 Sep 2017 17:18:34 +0000 (10:18 -0700)]
Merge tag 'docs-4.14' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A cleanup from Mauro that needed to wait for the media pull, plus a
handful of other fixes that wandered in"
* tag 'docs-4.14' of git://git.lwn.net/linux:
kokr/memory-barriers.txt: Apply atomic_t.txt change
kokr/doc: Update memory-barriers.txt for read-to-write dependencies
docs-rst: don't require adjustbox anymore
docs-rst: conf.py: only setup notice box colors if Sphinx < 1.6
docs-rst: conf.py: remove lscape from LaTeX preamble
Linus Torvalds [Wed, 13 Sep 2017 17:10:19 +0000 (10:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"This fixes a regression (spotted by the Sandstorm.io folks) in the pid
namespace handling introduced in 4.12.
There's also a fix for honoring sync/dsync flags for pwritev2()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: getattr cleanup
fuse: honor iocb sync flags on write
fuse: allow server to run in different pid_ns
Linus Torvalds [Wed, 13 Sep 2017 16:11:44 +0000 (09:11 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
"This fixes d_ino correctness in readdir, which brings overlayfs on par
with normal filesystems regarding inode number semantics, as long as
all layers are on the same filesystem.
There are also some bug fixes, one in particular (random ioctl's
shouldn't be able to modify lower layers) that touches some vfs code,
but of course no-op for non-overlay fs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix false positive ESTALE on lookup
ovl: don't allow writing ioctl on lower layer
ovl: fix relatime for directories
vfs: add flags to d_real()
ovl: cleanup d_real for negative
ovl: constant d_ino for non-merge dirs
ovl: constant d_ino across copy up
ovl: fix readdir error value
ovl: check snprintf return
Vitaly Kuznetsov [Mon, 11 Sep 2017 15:06:20 +0000 (17:06 +0200)]
x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition
Commits:
7dcf90e9e032 ("PCI: hv: Use vPCI protocol version 1.2")
628f54cc6451 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
added the same definition and they came in through different trees.
Fix the duplication.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
K. Y. Srinivasan [Fri, 8 Sep 2017 23:15:57 +0000 (16:15 -0700)]
x86/hyper-V: Allocate the IDT entry early in boot
Allocate the hypervisor callback IDT entry early in the boot sequence.
The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juergen Gross [Tue, 5 Sep 2017 14:34:07 +0000 (16:34 +0200)]
paravirt: Switch maintainer
Jeremy Fitzhardinge is stepping down as a paravirt maintainer. I'll
replace him.
While at it, update the file list to the actual pattern.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170905143407.9227-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juergen Gross [Mon, 4 Sep 2017 10:25:27 +0000 (12:25 +0200)]
x86/paravirt: Remove no longer used paravirt functions
With removal of lguest some of the paravirt functions are no longer
needed:
->read_cr4()
->store_idt()
->set_pmd_at()
->set_pud_at()
->pte_update()
Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Mon, 11 Sep 2017 00:48:27 +0000 (17:48 -0700)]
x86/mm/64: Initialize CR4.PCIDE early
cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs. It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.
Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems. First, we'd crash in the trampoline code. That's
fixable, and I tried that. It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.
This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate. This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.
Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().
( Yes, this is ugly. I think we should have improved mmu_cr4_features
to actually control CR4 during secondary bootup, but that would be
fairly intrusive at this stage. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Fri, 8 Sep 2017 05:06:58 +0000 (22:06 -0700)]
x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3
Jiri reported a resume-from-hibernation failure triggered by PCID.
The root cause appears to be rather odd. The hibernation asm
restores a CR3 value that comes from the image header. If the image
kernel has PCID on, it's entirely reasonable for this CR3 value to
have one of the low 12 bits set. The restore code restores it with
CR4.PCIDE=0, which means that those low 12 bits are accepted by the
CPU but are either ignored or interpreted as a caching mode. This
is odd, but still works. We blow up later when the image kernel
restores CR4, though, since changing CR4.PCIDE with CR3[11:0] != 0
is illegal. Boom!
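The masking named in the subject line can be sketched like this, assuming (as the text above says) that the PCID lives in the low 12 bits of CR3. The macro and function names are illustrative, not the kernel's:

```c
#include <stdint.h>

/* Assumption from the text above: the PCID occupies CR3 bits 11:0. */
#define TOY_CR3_PCID_MASK UINT64_C(0xFFF)

/* Return the saved CR3 value with the PCID bits cleared. */
uint64_t cr3_without_pcid(uint64_t cr3)
{
	return cr3 & ~TOY_CR3_PCID_MASK;
}
```

With the low bits cleared, a later change of CR4.PCIDE no longer sees CR3[11:0] != 0.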
FWIW, it's entirely unclear to me what's supposed to happen if a PAE
kernel restores a non-PAE image or vice versa. Ditto for LA57.
Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Link: http://lkml.kernel.org/r/18ca57090651a6341e97083883f9e814c4f14684.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Fri, 8 Sep 2017 05:06:57 +0000 (22:06 -0700)]
x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()
If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.
Make it a warning instead.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 13 Sep 2017 07:25:10 +0000 (09:25 +0200)]
Merge tag 'perf-urgent-for-mingo-4.14-20170912' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix the TUI progress bar when the delta between the new total and that
of the previous update is greater than the progress "step" (one screen
width progress bar block) (Jiri Olsa)
- Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not
to cripple debuginfo, just like tools/perf/ does (Jiri Olsa)
- Avoid leaking the 'perf.data' file to workloads started from the
'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa)
- Fix building when libunwind's 'unwind.h' file is present in the
include path, clashing with tools/perf/util/unwind.h (Milian Wolff)
- Check per .perfconfig section entry flag, not just per section (Taeung Song)
- Support running perf binaries with a dash in their name, needed to
run perf as an AppImage (Milian Wolff)
- Wait for the right child by using waitpid() when running workloads
from 'perf stat', also to fix using perf as an AppImage (Milian Wolff)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 13 Sep 2017 03:05:58 +0000 (20:05 -0700)]
Merge tag 'f2fs-for-4.14' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mostly tuned f2fs to provide better user
experience for Android. Especially, we've worked on atomic write
feature again with SQLite community in order to support it officially.
And we added or modified several facilities to analyze and enhance IO
behaviors.
Major changes include:
- add app/fs io stat
- add inode checksum feature
- support project/journalled quota
- enhance atomic write with new ioctl() which exposes feature set
- enhance background gc/discard/fstrim flows with new gc_urgent mode
- add F2FS_IOC_FS{GET,SET}XATTR
- fix some quota flows"
* tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
f2fs: hurry up to issue discard after io interruption
f2fs: fix to show correct discard_granularity in sysfs
f2fs: detect dirty inode in evict_inode
f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
f2fs: speed up gc_urgent mode with SSR
f2fs: better to wait for fstrim completion
f2fs: avoid race in between read xattr & write xattr
f2fs: make get_lock_data_page to handle encrypted inode
f2fs: use generic terms used for encrypted block management
f2fs: introduce f2fs_encrypted_file for clean-up
Revert "f2fs: add a new function get_ssr_cost"
f2fs: constify super_operations
f2fs: fix to wake up all sleeping flusher
f2fs: avoid race in between atomic_read & atomic_inc
f2fs: remove unneeded parameter of change_curseg
f2fs: update i_flags correctly
f2fs: don't check inode's checksum if it was dirtied or writebacked
f2fs: don't need to update inode checksum for recovery
f2fs: trigger fdatasync for non-atomic_write file
f2fs: fix to avoid race in between aio and gc
...
Linus Torvalds [Wed, 13 Sep 2017 03:03:53 +0000 (20:03 -0700)]
Merge tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The highlights include:
- a large series of fixes and improvements to the snapshot-handling
code (Zheng Yan)
- individual read/write OSD requests passed down to libceph are now
limited to 16M in size to avoid hitting OSD-side limits (Zheng Yan)
- encode MStatfs v2 message to allow for more accurate space usage
reporting (Douglas Fuller)
- switch to the new writeback error tracking infrastructure (Jeff
Layton)"
* tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client: (35 commits)
ceph: stop on-going cached readdir if mds revokes FILE_SHARED cap
ceph: wait on writeback after writing snapshot data
ceph: fix capsnap dirty pages accounting
ceph: ignore wbc->range_{start,end} when write back snapshot data
ceph: fix "range cyclic" mode writepages
ceph: cleanup local variables in ceph_writepages_start()
ceph: optimize pagevec iterating in ceph_writepages_start()
ceph: make writepage_nounlock() invalidate page that beyonds EOF
ceph: properly get capsnap's size in get_oldest_context()
ceph: remove stale check in ceph_invalidatepage()
ceph: queue cap snap only when snap realm's context changes
ceph: handle race between vmtruncate and queuing cap snap
ceph: fix message order check in handle_cap_export()
ceph: fix NULL pointer dereference in ceph_flush_snaps()
ceph: adjust 36 checks for NULL pointers
ceph: delete an unnecessary return statement in update_dentry_lease()
ceph: ENOMEM pr_err in __get_or_create_frag() is redundant
ceph: check negative offsets in ceph_llseek()
ceph: more accurate statfs
ceph: properly set snap follows for cap reconnect
...
Richard Wareing [Tue, 12 Sep 2017 23:09:35 +0000 (09:09 +1000)]
xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
If using a kernel with CONFIG_XFS_RT=y and we set the RTINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.
This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: xfs_blkdev_issue_flush+0xd/0x20
.....
Call Trace:
xfs_file_fsync+0x188/0x1c0
vfs_fsync_range+0x3b/0xa0
do_fsync+0x3d/0x70
SyS_fsync+0x10/0x20
do_syscall_64+0x4d/0xb0
entry_SYSCALL64_slow_path+0x25/0x25
Setting RT inode flags does not require special privileges so any
unprivileged user can cause this oops to occur. To reproduce, confirm
kernel is compiled with CONFIG_XFS_RT=y and run:
# mkfs.xfs -f /dev/pmem0
# mount /dev/pmem0 /mnt/test
# mkdir /mnt/test/foo
# xfs_io -c 'chattr +t' /mnt/test/foo
# xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar
Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.
Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.
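The rule in the subject line can be sketched with toy types (all names below are hypothetical, not the XFS definitions): an inode only counts as realtime if its flag is set and the filesystem actually has a realtime device.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_FLAG_REALTIME 0x1u

/* Hypothetical stand-ins for the mount and inode structures. */
struct toy_mount {
	void *rt_dev;		/* NULL when no realtime device is present */
};

struct toy_inode {
	unsigned int flags;
	struct toy_mount *mnt;
};

/* True only if the flag is set AND a realtime device exists. */
bool toy_is_realtime_inode(const struct toy_inode *ip)
{
	return (ip->flags & TOY_FLAG_REALTIME) && ip->mnt->rt_dev != NULL;
}
```

With a check like this, the fsync path would not try to flush a realtime device that does not exist.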
Fixes: f538d4da8d52 ("[XFS] write barrier support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 12 Sep 2017 20:30:06 +0000 (13:30 -0700)]
Merge tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- removal of the old dma_alloc_noncoherent interface
- remove unused flags to dma_declare_coherent_memory
- restrict OF DMA configuration to specific physical busses
- use the iommu mailing list for dma-mapping questions and patches
* tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping:
dma-coherent: fix dma_declare_coherent_memory() logic error
ARM: imx: mx31moboard: Remove unused 'dma' variable
dma-coherent: remove an unused variable
MAINTAINERS: use the iommu list for the dma-mapping subsystem
dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags
dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flag
of: restrict DMA configuration
dma-mapping: remove dma_alloc_noncoherent and dma_free_noncoherent
i825xx: switch to switch to dma_alloc_attrs
au1000_eth: switch to dma_alloc_attrs
sgiseeq: switch to dma_alloc_attrs
dma-mapping: reduce dma_mapping_error inline bloat
Linus Torvalds [Tue, 12 Sep 2017 20:27:21 +0000 (13:27 -0700)]
Merge tag 'uuid-for-4.14' of git://git.infradead.org/users/hch/uuid
Pull uuid updates from Christoph Hellwig:
"Just a single conversion to the new UUID API for this merge window"
* tag 'uuid-for-4.14' of git://git.infradead.org/users/hch/uuid:
efi: switch to use new generic UUID API
Linus Torvalds [Tue, 12 Sep 2017 20:21:00 +0000 (13:21 -0700)]
Merge tag 'selinux-pr-20170831' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively quiet period for SELinux, 11 patches with only two/three
having any substantive changes.
These noteworthy changes include another tweak to the NNP/nosuid
handling, per-file labeling for cgroups, and an object class fix for
AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or
administrative updates (Stephen's email update explains the file
explosion in the diffstat).
Everything passes the selinux-testsuite"
[ Also a couple of small patches from the security tree from Tetsuo
Handa for Tomoyo and LSM cleanup. The separation of security policy
updates wasn't all that clean - Linus ]
* tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: constify nf_hook_ops
selinux: allow per-file labeling for cgroupfs
lsm_audit: update my email address
selinux: update my email address
MAINTAINERS: update the NetLabel and Labeled Networking information
selinux: use GFP_NOWAIT in the AVC kmem_caches
selinux: Generalize support for NNP/nosuid SELinux domain transitions
selinux: genheaders should fail if too many permissions are defined
selinux: update the selinux info in MAINTAINERS
credits: update Paul Moore's info
selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets
tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst
LSM: Remove security_task_create() hook.
Linus Torvalds [Tue, 12 Sep 2017 18:34:39 +0000 (11:34 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes: dead code removal, plus a SME memory encryption fix on
32-bit kernels that crashed Xen guests"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Remove unused and undefined __generic_processor_info() declaration
x86/mm: Make the SME mask a u64
Linus Torvalds [Tue, 12 Sep 2017 18:30:56 +0000 (11:30 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Three fixes:
- fix a suspend/resume cpusets bug
- fix a !CONFIG_NUMA_BALANCING bug
- fix a kerneldoc warning"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix nuisance kernel-doc warning
sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
sched/fair: Fix wake_affine_llc() balancing rules
Linus Torvalds [Tue, 12 Sep 2017 18:28:13 +0000 (11:28 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling updates from Ingo Molnar:
"Perf tooling updates and fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB
perf stat: Only auto-merge events that are PMU aliases
perf test: Add test case for PERF_SAMPLE_PHYS_ADDR
perf script: Support physical address
perf mem: Support physical address
perf sort: Add sort option for physical address
perf tools: Support new sample type for physical address
perf vendor events powerpc: Remove duplicate events
perf intel-pt: Fix syntax in documentation of config option
perf test powerpc: Fix 'Object code reading' test
perf trace: Support syscall name globbing
perf syscalltbl: Support glob matching on syscall names
perf report: Calculate the average cycles of iterations
Linus Torvalds [Tue, 12 Sep 2017 18:25:56 +0000 (11:25 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"A sparse irq race/locking fix, and a MSI irq domains population fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Make sparse_irq_lock protect what it should protect
genirq/msi: Fix populating multiple interrupts
Chao Yu [Tue, 12 Sep 2017 13:35:12 +0000 (21:35 +0800)]
f2fs: hurry up to issue discard after io interruption
If issuing discards is interrupted by I/O, we delay for a long time before
the next round. If the system is I/O idle during that time, it may lose
the opportunity to issue discards. So this patch hurries up issuing
discards after an I/O interruption.
Besides, this patch also fixes issuing discards accurately at the assigned
rate.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 12 Sep 2017 06:25:35 +0000 (14:25 +0800)]
f2fs: fix to show correct discard_granularity in sysfs
Fix the incorrect display below when reading the discard_granularity sysfs node.
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
$ echo 32 > /sys/fs/f2fs/<device>/discard_granularity
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 12 Sep 2017 06:04:05 +0000 (14:04 +0800)]
f2fs: detect dirty inode in evict_inode
Add a bugon in f2fs_evict_inode to detect inconsistent status between
inode cache and related node page cache.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Milian Wolff [Tue, 12 Sep 2017 15:25:23 +0000 (17:25 +0200)]
perf stat: Wait for the correct child
When packaging the perf userland application into an AppImage, the
wait() call in perf stat returned too early. It turned out that some
other child process exited, but not the one perf stat launched:
$ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
strace: Process 3912 attached
[pid 3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
strace: Process 3914 attached
[pid 3912] +++ exited with 0 +++
[pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 3914] clone(strace: Process 3915 attached
child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
[pid 3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
[pid 3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
strace: Process 3916 attached
[pid 3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
[pid 3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
Performance counter stats for 'sleep 1':
<not counted> task-clock
<not counted> context-switches
<not counted> cpu-migrations
<not counted> page-faults
<not counted> cycles
<not counted> instructions
<not counted> branches
<not counted> branch-misses
0.000047194 seconds time elapsed
[pid 3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
[pid 3916] +++ killed by SIGTERM +++
[pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
[pid 3911] +++ exited with 0 +++
[pid 3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
[pid 3915] +++ exited with 0 +++
+++ exited with 0 +++
This patch uses waitpid instead to ensure the call waits for the
debuggee application launched by 'perf stat'. This fixes 'perf stat'
when launched from an AppImage:
$ ./perf-x86_64.AppImage stat sleep 1
Performance counter stats for 'sleep 1':
0.357235 task-clock (msec) # 0.000 CPUs utilized
1 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
50 page-faults # 0.140 M/sec
1269602 cycles # 3.554 GHz
654278 instructions # 0.52 insn per cycle
129963 branches # 363.803 M/sec
7082 branch-misses # 5.45% of all branches
1.000633420 seconds time elapsed
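The difference between waiting for any child and waiting for one specific
child can be sketched in Python, whose os.wait() and os.waitpid() wrap the
same system calls. This is only an illustration under assumptions, not the
perf code: the helper names spawn() and wait_for_workload() are hypothetical,
and the "stray" child merely stands in for the unrelated process seen in the
strace output above.

```python
import os
import time


def spawn(delay, status):
    """Fork a child that sleeps for `delay` seconds, then exits with `status`."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(status)
    return pid


def wait_for_workload():
    """Reap the intended child even though an unrelated child exits first."""
    stray = spawn(0.0, 0)      # stands in for a child left over from the launcher
    workload = spawn(0.2, 7)   # stands in for the command being measured

    # os.wait() at this point could return `stray`; os.waitpid(workload, 0)
    # only returns once `workload` itself has changed state.
    pid, raw = os.waitpid(workload, 0)
    code = os.WEXITSTATUS(raw) if os.WIFEXITED(raw) else None

    os.waitpid(stray, 0)       # reap the other child so no zombie is left
    return pid == workload, code
```

Passing the pid explicitly is what makes the result independent of how many
other children the process happens to have.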
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Milian Wolff [Mon, 11 Sep 2017 11:14:22 +0000 (13:14 +0200)]
perf tools: Support running perf binaries with a dash in their name
Previously the part behind "perf-" was interpreted as an internal perf
command. If the suffix could not be handled, the execution was stopped.
This makes it impossible to launch perf binaries that got renamed to
have the `perf-` prefix. This is e.g. the case for appimages (e.g.
"perf-x86_64.AppImage"), but would also apply to all other scenarios
where users symlink or rename perf themselves:
Status quo with the broken behavior:
$ ln -s ./perf ./perf-custom-suffix
$ ./perf-custom-suffix list
cannot handle custom-suffix internally$
Also note the missing newline at the end of the error message.
With this patch applied, the above works properly:
$ ./perf-custom-suffix list
List of pre-defined events (to be used in -e):
...
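One way to picture the intended behavior is the following Python sketch. It
is an assumption about the dispatch logic, not the actual perf source: the
BUILTINS set and the function resolve_command() are hypothetical names used
only for illustration.

```python
import os

# Hypothetical stand-in for perf's table of built-in commands.
BUILTINS = {"list", "stat", "record", "report"}


def resolve_command(argv):
    """Return the built-in to run, or None when no command was given.

    A name such as "perf-stat" still selects the built-in "stat", but an
    unknown suffix ("perf-custom-suffix") falls back to normal argument
    parsing instead of aborting.
    """
    name = os.path.basename(argv[0])
    if name.startswith("perf-"):
        suffix = name[len("perf-"):]
        if suffix in BUILTINS:
            return suffix
    # Fall through: treat the binary as plain "perf".
    if len(argv) > 1 and argv[1] in BUILTINS:
        return argv[1]
    return None
```

With this shape, resolve_command(["./perf-custom-suffix", "list"]) selects
"list" rather than failing on the unknown suffix.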
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170911111422.31903-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter Zijlstra [Thu, 7 Sep 2017 15:03:53 +0000 (17:03 +0200)]
sched/debug: Add debugfs knob for "sched_debug"
I'm forever late for editing my kernel cmdline, add a runtime knob to
disable the "sched_debug" thing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.142924283@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 7 Sep 2017 15:03:52 +0000 (17:03 +0200)]
sched/core: WARN() when migrating to an offline CPU
Migrating tasks to offline CPUs is a pretty big fail, warn about it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.094206976@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 7 Sep 2017 15:03:51 +0000 (17:03 +0200)]
sched/fair: Plug hole between hotplug and active_load_balance()
The load balancer applies cpu_active_mask to whatever sched_domains it
finds, however in the case of active_balance there is a hole between
setting rq->{active_balance,push_cpu} and running the stop_machine
work doing the actual migration.
The @push_cpu can go offline in this window, which would result in us
moving a task onto a dead cpu, which is a fairly bad thing.
Double check the active mask before the stop work does the migration.
CPU0 CPU1
<SoftIRQ>
stop_machine(takedown_cpu)
load_balance() cpu_stopper_thread()
... work = multi_cpu_stop
stop_one_cpu_nowait( /* wait for CPU0 */
.func = active_load_balance_cpu_stop
);
</SoftIRQ>
cpu_stopper_thread()
work = multi_cpu_stop
/* sync with CPU1 */
take_cpu_down()
<idle>
play_dead();
work = active_load_balance_cpu_stop
set_task_cpu(p, CPU1); /* oops!! */
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170907150614.044460912@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 7 Sep 2017 15:03:50 +0000 (17:03 +0200)]
sched/fair: Avoid newidle balance for !active CPUs
On CPU hot unplug, when parking the last kthread we'll try and
schedule into idle to kill the CPU. This last schedule can (and does)
trigger newidle balance because at this point the sched domains are
still up because of commit:
77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Obviously pulling tasks to an already offline CPU is a bad idea, and
all balancing operations _should_ be subject to cpu_active_mask, make
it so.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Link: http://lkml.kernel.org/r/20170907150613.994135806@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Taeung Song [Thu, 7 Sep 2017 03:18:45 +0000 (12:18 +0900)]
perf config: Check not only section->from_system_config but also item's
Currently section->from_system_config is being checked multiple times.
item->from_system_config should be checked instead, when iterating thru
the items in a section. Fix it.
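The class of bug described here (testing the container's flag inside a loop
over its members) can be sketched in Python. The Item and Section classes and
the user_items() helper are hypothetical; what perf actually does with the
flag is not stated in the message, so filtering is only an assumed use.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    from_system_config: bool


@dataclass
class Section:
    name: str
    from_system_config: bool
    items: list = field(default_factory=list)


def user_items(section):
    """Names of items that did not come from the system-wide config."""
    # Test the flag on each item. Testing section.from_system_config here
    # would yield the same answer for every item in the loop.
    return [item.name for item in section.items if not item.from_system_config]
```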
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754325-9724-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 8 Sep 2017 12:05:08 +0000 (14:05 +0200)]
perf ui progress: Fix progress update
We currently update the 'next' variable only with a single step value.
But it's possible the 'adv' update is bigger than a single 'step' value.
This would leave the 'next' value undercounted and force unnecessary
ui_progress__ops->update calls.
Calculate the number of steps we need for the 'adv' update and increase
'next' by that number of steps.
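The arithmetic can be sketched in Python as follows. This is a minimal model
under assumptions, not the perf implementation: the Progress class and its
redraws counter are hypothetical, and rounding the step count up is an
assumed choice.

```python
class Progress:
    def __init__(self, total, step):
        self.curr = 0
        self.total = total
        self.step = step
        self.next = step      # threshold for the next redraw
        self.redraws = 0      # stands in for ui_progress__ops->update calls

    def update(self, adv):
        self.curr += adv
        if self.curr >= self.next:
            # Number of steps covered by this advance, rounded up.
            steps = -(-adv // self.step)
            self.next += steps * self.step
            self.redraws += 1
```

With step = 100, an advance of 1000 moves 'next' from 100 to 1100, so a
following advance of 50 does not trigger another redraw. Bumping 'next' by a
single step would have left it at 200 and redrawn again.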
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 8 Sep 2017 12:05:07 +0000 (14:05 +0200)]
perf ui progress: Make sure we always define step value
Unlikely, but ui_progress__init could be called with total < 16, which
would set the next and step variables to 0. That would force unnecessary
ui_progress__ops->update calls because 'next' would never increase.
Force the next and step values to always be > 0.
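A minimal Python sketch of the clamp, assuming the step is derived by integer
division of the total (the divisor 16 is inferred from the "total < 16"
remark above, and initial_step() is a hypothetical name):

```python
def initial_step(total, divisor=16):
    """Step between redraws; never smaller than 1."""
    # Integer division yields 0 for total < divisor, so clamp the result.
    return max(total // divisor, 1)
```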
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 8 Sep 2017 08:46:20 +0000 (10:46 +0200)]
perf tools: Open perf.data with O_CLOEXEC flag
Do not carry the perf.data file descriptor into the workload process and
close it when perf executes the workload.
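The effect of the flag can be checked from Python, which exposes the same
open(2) flag as os.O_CLOEXEC. This is a sketch only: open_output() and demo()
are hypothetical helpers, and Python 3.4 and later already mark new
descriptors non-inheritable by default, so the explicit flag is shown purely
for illustration.

```python
import os
import tempfile


def open_output(path):
    """Open `path` for writing with the close-on-exec flag set explicitly."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_CLOEXEC
    return os.open(path, flags, 0o600)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        fd = open_output(os.path.join(tmp, "perf.data"))
        try:
            # False means the descriptor is closed automatically on exec().
            return os.get_inheritable(fd)
        finally:
            os.close(fd)
```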
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-2-jolsa@kernel.org
[ Add definitions for O_CLOEXEC for older systems ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 8 Sep 2017 08:46:19 +0000 (10:46 +0200)]
tools lib api: Fix make DEBUG=1 build
Do not use -D_FORTIFY_SOURCE=2 for the DEBUG build, as it seems to mess up
debuginfo, which results in a bad gdb experience.
We already do that for tools/perf/.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Milian Wolff [Wed, 6 Sep 2017 15:02:09 +0000 (17:02 +0200)]
perf tests: Fix compile when libunwind's unwind.h is available
When cross compiling perf and I want to link against a self-compiled
libunwind, I usually make the custom path where the libunwind headers
exist visible by adding the libunwind prefix to the include path when
compiling perf, i.e.:
~~~~~
$ ls $HOME/projects/compiled/other/include/
libunwind-coredump.h libunwind.h libunwind-x86_64.h
libunwind-common.h libunwind-dynamic.h libunwind-ptrace.h
unwind.h
$ make EXTRA_CFLAGS="-I$HOME/projects/compiled/other/include/"
~~~~~~
Note the `unwind.h` header from libunwind which leads to compile
errors when compiling tests/dwarf-unwind.c, since it shadows perf's
util/unwind.h:
~~~~~
tests/dwarf-unwind.c:41:32: error: ‘struct unwind_entry’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
static int unwind_entry(struct unwind_entry *entry, void *arg)
^~~~~~~~~~~~
tests/dwarf-unwind.c: In function ‘unwind_entry’:
tests/dwarf-unwind.c:44:22: error: dereferencing pointer to incomplete type ‘struct unwind_entry’
char *symbol = entry->sym ? entry->sym->name : NULL;
^~
tests/dwarf-unwind.c: In function ‘unwind_thread’:
tests/dwarf-unwind.c:92:8: error: implicit declaration of function ‘unwind__get_entries’; did you mean ‘unwind_entry’? [-Werror=implicit-function-declaration]
err = unwind__get_entries(unwind_entry, &cnt, thread,
^~~~~~~~~~~~~~~~~~~
unwind_entry
tests/dwarf-unwind.c:92:8: error: nested extern declaration of ‘unwind__get_entries’ [-Werror=nested-externs]
~~~~~~
Fix this compile error by specifying an explicit include of perf's
unwind.h in the util folder.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170906150209.12579-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 5 Sep 2017 13:52:02 +0000 (10:52 -0300)]
tools include linux: Guard against redefinition of some macros
When cross building to android r15c (and older versions) on Fedora 26
we notice these:
/opt/android-ndk-r15c/platforms/android-24/arch-arm/usr/include/sys/cdefs.h:332:0: note: this is the location of the previous definition
For __aligned, __packed and __noreturn, so guard those with ifdefs to
avoid drowning useful warnings in these.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d7w3fa9c22dtmrwbedos6ie1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Amir Goldstein [Mon, 11 Sep 2017 13:30:15 +0000 (16:30 +0300)]
ovl: fix false positive ESTALE on lookup
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin")
verifies that the origin lower inode stored in the overlayfs inode matches
the inode of a copy up origin dentry found by lookup.
There is a false positive result in that check when lower fs does not
support file handles and copy up origin cannot be followed by file handle
at lookup time.
The false positive happens when finding an overlay inode in cache on a
copied up overlay dentry lookup. The overlay inode still 'remembers' the
copy up origin inode, but the copy up origin dentry is not available for
verification.
Relax the check in case copy up origin dentry is not available.
Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Masahiro Yamada [Sat, 2 Sep 2017 08:05:35 +0000 (17:05 +0900)]
kbuild: buildtar: do not print successful message if tar returns error
The previous commit spotted that "Tarball successfully created ..."
is displayed even if the "tar" command returns error code because
it is followed by "| ${compress}".
Let the build fail instead of printing the successful message since
if the "tar" command fails, the output may not be what users expect.
Avoid the use of the pipe. While we are here, refactor the script
removing the use of sub-shell, ${compress}, ${file_ext}.
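Why the pipe hides the failure can be shown with a short Python sketch that
asks a POSIX shell for the exit status of a pipeline. The commands false, cat
and true are stand-ins for tar and the compressor, and both helper names are
hypothetical.

```python
import subprocess


def pipeline_status(first, second):
    """Exit status a POSIX shell reports for `first | second`."""
    # Without pipefail, the shell reports only the last command's status.
    return subprocess.run(f"{first} | {second}", shell=True).returncode


def sequential_status(first, second):
    """Exit status when `second` runs only if `first` succeeded."""
    return subprocess.run(f"{first} && {second}", shell=True).returncode
```

pipeline_status("false", "cat") reports success even though the first command
failed, whereas sequential_status("false", "true") reports the failure.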
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Sat, 2 Sep 2017 08:05:34 +0000 (17:05 +0900)]
kbuild: buildtar: fix tar error when CONFIG_MODULES is disabled
$tmpdir/lib is created by "make modules_install". It does not exist
if CONFIG_MODULES is disabled, so tar reports the following messages:
tar: lib: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Miklos Szeredi [Tue, 12 Sep 2017 14:57:54 +0000 (16:57 +0200)]
fuse: getattr cleanup
The refreshed argument isn't used by any caller, get rid of it.
Use a helper for just updating the inode (no need to fill in a kstat).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Tue, 12 Sep 2017 14:57:53 +0000 (16:57 +0200)]
fuse: honor iocb sync flags on write
If the IOCB_DSYNC flag is set, a sync is not being performed by
fuse_file_write_iter.
Honor IOCB_DSYNC/IOCB_SYNC by setting O_DSYNC/O_SYNC respectively in the
flags field of the write request.
We don't need to sync data or metadata, since fuse_perform_write() does
write-through and the filesystem is responsible for updating file times.
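The flag translation can be pictured with the Python sketch below. It is an
illustration under assumptions: the IOCB_* bit values are made up for the
example (the kernel's real values are not given here), and
write_request_flags() is a hypothetical helper, not a FUSE API.

```python
import os

# Illustrative bit values only; not the kernel's real IOCB_* constants.
IOCB_DSYNC = 1 << 0
IOCB_SYNC = 1 << 1


def write_request_flags(iocb_flags, open_flags=0):
    """Translate per-I/O sync bits into open-style flags for a write request."""
    flags = open_flags
    if iocb_flags & IOCB_DSYNC:
        flags |= os.O_DSYNC
    if iocb_flags & IOCB_SYNC:
        flags |= os.O_SYNC
    return flags
```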
Original patch by Vitaly Zolotusky.
Reported-by: Nate Clark <nate@neworld.us>
Cc: Vitaly Zolotusky <vitaly@unitc.com>.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Tue, 12 Sep 2017 14:57:53 +0000 (16:57 +0200)]
fuse: allow server to run in different pid_ns
Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke
Sandstorm.io development tools, which have been sending FUSE file
descriptors across PID namespace boundaries since early 2014.
The above patch added a check that prevented I/O on the fuse device file
descriptor if the pid namespace of the reader/writer was different from the
pid namespace of the mounter. With this change passing the device file
descriptor to a different pid namespace simply doesn't work. The check was
added because pids are transferred to/from the fuse userspace server in the
namespace registered at mount time.
To fix this regression, remove the checks and do the following:
1) The pid in the request header (the pid of the task that initiated the
filesystem operation) is translated to the reader's pid namespace. If a
mapping doesn't exist for this pid, then a zero pid is used. Note: even if
a mapping would exist between the initiator task's pid namespace and the
reader's pid namespace the pid will be zero if either mapping from
initiator's to mounter's namespace or mapping from mounter's to reader's
namespace doesn't exist.
2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
Userspace should not interpret this value anyway. Also allow the
setlk/setlkw operations if the pid of the task cannot be represented in the
mounter's namespace (pid being zero in that case).
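The two-hop translation in point 1 can be modeled with a small Python sketch.
This is a deliberately simplified model, not kernel code: namespaces are
plain strings, a task is a dict from namespace name to its pid there, and
pid_for_reader() is a hypothetical helper.

```python
def pid_for_reader(task_pids, mounter_ns, reader_ns):
    """PID to place in the request header, or 0 when it cannot be represented.

    `task_pids` maps a namespace name to the initiating task's PID in that
    namespace; a missing key means the task is not visible there.
    """
    if mounter_ns not in task_pids:       # hop 1: initiator -> mounter
        return 0
    return task_pids.get(reader_ns, 0)    # hop 2: mounter -> reader
```

In this model a task visible to the reader but not to the mounter still
yields 0, which mirrors the note above about either mapping being missing.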
Reported-by: Kenton Varda <kenton@sandstorm.io>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Daniel Drake [Mon, 11 Sep 2017 06:11:56 +0000 (14:11 +0800)]
pinctrl/amd: save pin registers over suspend/resume
The touchpad in the Asus laptop models X505BA/BP and X542BA/BP is
unresponsive after suspend/resume. The following error appears during
resume:
i2c_hid i2c-ELAN1300:00: failed to reset device.
The problem here is that i2c_hid does not notice the interrupt being
generated at this point, because the GPIO is no longer configured
for interrupts.
Fix this by saving pinctrl-amd pin registers during suspend and
restoring them at resume time.
Based on code from pinctrl-intel.
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Tue, 12 Sep 2017 13:10:44 +0000 (06:10 -0700)]
Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
"Low priority fixes and updates for ARM:
- add some missing includes
- efficiency improvements in system call entry code when tracing is
enabled
- ensure ARMv6+ is always built as EABI
- export save_stack_trace_tsk()
- fix fatal signal handling during mm fault
- build translation table base address register from scratch
- appropriately align the .data section to a word boundary where we
rely on that data being word aligned"
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8691/1: Export save_stack_trace_tsk()
ARM: 8692/1: mm: abort uaccess retries upon fatal signal
ARM: 8690/1: lpae: build TTB control register value from scratch in v7_ttb_setup
ARM: align .data section
ARM: always enable AEABI for ARMv6+
ARM: avoid saving and restoring registers unnecessarily
ARM: move PC value into r9
ARM: obtain thread info structure later
ARM: use aliases for registers in entry-common
ARM: 8689/1: scu: add missing errno include
ARM: 8688/1: pm: add missing types include
Linus Torvalds [Tue, 12 Sep 2017 13:01:59 +0000 (06:01 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
"The second patch set for the 4.14 merge window:
- Convert the dasd device driver to the blk-mq interface.
- Provide three zcrypt interfaces for vfio_ap. These will be required
for KVM guest access to the crypto cards attached via the AP bus.
- A couple of memory management bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: blk-mq conversion
s390/mm: use a single lock for the fields in mm_context_t
s390/mm: fix race on mm->context.flush_mm
s390/mm: fix local TLB flushing vs. detach of an mm address space
s390/zcrypt: externalize AP queue interrupt control
s390/zcrypt: externalize AP config info query
s390/zcrypt: externalize test AP queue
s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade]
Takashi Iwai [Tue, 12 Sep 2017 10:41:20 +0000 (12:41 +0200)]
ALSA: seq: Cancel pending autoload work at unbinding device
ALSA sequencer core has a mechanism to load the enumerated devices
automatically, and it's performed in an off-load work. This seems to
cause a race when a sequencer device is removed while the pending
autoload work is running. As syzkaller spotted, it may lead to a
use-after-free:
BUG: KASAN: use-after-free in snd_rawmidi_dev_seq_free+0x69/0x70
sound/core/rawmidi.c:1617
Write of size 8 at addr ffff88006c611d90 by task kworker/2:1/567
CPU: 2 PID: 567 Comm: kworker/2:1 Not tainted 4.13.0+ #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events autoload_drivers
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x192/0x22c lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x230/0x340 mm/kasan/report.c:409
__asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
snd_rawmidi_dev_seq_free+0x69/0x70 sound/core/rawmidi.c:1617
snd_seq_dev_release+0x4f/0x70 sound/core/seq_device.c:192
device_release+0x13f/0x210 drivers/base/core.c:814
kobject_cleanup lib/kobject.c:648 [inline]
kobject_release lib/kobject.c:677 [inline]
kref_put include/linux/kref.h:70 [inline]
kobject_put+0x145/0x240 lib/kobject.c:694
put_device+0x25/0x30 drivers/base/core.c:1799
klist_devices_put+0x36/0x40 drivers/base/bus.c:827
klist_next+0x264/0x4a0 lib/klist.c:403
next_device drivers/base/bus.c:270 [inline]
bus_for_each_dev+0x17e/0x210 drivers/base/bus.c:312
autoload_drivers+0x3b/0x50 sound/core/seq_device.c:117
process_one_work+0x9fb/0x1570 kernel/workqueue.c:2097
worker_thread+0x1e4/0x1350 kernel/workqueue.c:2231
kthread+0x324/0x3f0 kernel/kthread.c:231
ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:425
The fix is simply to ensure that the autoload work is canceled when
removing the device.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Gregory CLEMENT [Thu, 7 Sep 2017 14:54:07 +0000 (16:54 +0200)]
pinctrl: armada-37xx: Fix gpio interrupt setup
Since commit dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs
dynamically"), the irqs for gpio are no longer statically allocated in
gpiochip_irqchip_add.
This driver relied on that static allocation to initialize the mask
associated with each interrupt, which led to a NULL pointer crash in the
kernel:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mem abort info:
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000068
CM = 0, WnR = 1
[0000000000000000] user address but active_mm is swapper
Internal error: Oops: 96000044 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-06657-g3b9f8ed25dbe #576
Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
task: ffff80001d908000 task.stack: ffff000008068000
PC is at armada_37xx_pinctrl_probe+0x5f8/0x670
LR is at armada_37xx_pinctrl_probe+0x5e8/0x670
pc : [<ffff000008e25cdc>] lr : [<ffff000008e25ccc>] pstate: 60000045
sp : ffff00000806bb80
x29: ffff00000806bb80 x28: 0000000000000024
x27: 000000000000000c x26: 0000000000000001
x25: ffff80001efee760 x24: 0000000000000000
x23: ffff80001db6f570 x22: ffff80001db6f438
x21: 0000000000000000 x20: ffff80001d9f4810
x19: ffff80001db6f418 x18: 0000000000000000
x17: 0000000000000001 x16: 0000000000000019
x15: ffffffffffffffff x14: 0140000000000000
x13: 0000000000000000 x12: 0000000000000030
x11: 0101010101010101 x10: 0000000000000040
x9 : ffff000009923580 x8 : ffff80001d400248
x7 : ffff80001d400270 x6 : 0000000000000000
x5 : ffff80001d400248 x4 : ffff80001d400270
x3 : 0000000000000000 x2 : 0000000000000001
x1 : 0000000000000001 x0 : 0000000000000000
Process swapper/0 (pid: 1, stack limit = 0xffff000008068000)
Call trace:
Exception stack(0xffff00000806ba40 to 0xffff00000806bb80)
ba40: 0000000000000000 0000000000000001 0000000000000001 0000000000000000
ba60: ffff80001d400270 ffff80001d400248 0000000000000000 ffff80001d400270
ba80: ffff80001d400248 ffff000009923580 0000000000000040 0101010101010101
baa0: 0000000000000030 0000000000000000 0140000000000000 ffffffffffffffff
bac0: 0000000000000019 0000000000000001 0000000000000000 ffff80001db6f418
bae0: ffff80001d9f4810 0000000000000000 ffff80001db6f438 ffff80001db6f570
bb00: 0000000000000000 ffff80001efee760 0000000000000001 000000000000000c
bb20: 0000000000000024 ffff00000806bb80 ffff000008e25ccc ffff00000806bb80
bb40: ffff000008e25cdc 0000000060000045 ffff00000806bb60 ffff0000081189b8
bb60: ffffffffffffffff ffff00000811cf1c ffff00000806bb80 ffff000008e25cdc
[<ffff000008e25cdc>] armada_37xx_pinctrl_probe+0x5f8/0x670
[<ffff00000859d8c8>] platform_drv_probe+0x58/0xb8
[<ffff00000859bb44>] driver_probe_device+0x22c/0x2d8
[<ffff00000859bcac>] __driver_attach+0xbc/0xc0
[<ffff000008599c84>] bus_for_each_dev+0x4c/0x98
[<ffff00000859b440>] driver_attach+0x20/0x28
[<ffff00000859af90>] bus_add_driver+0x1b8/0x228
[<ffff00000859c648>] driver_register+0x60/0xf8
[<ffff00000859df64>] __platform_driver_probe+0x74/0x130
[<ffff000008e256dc>] armada_37xx_pinctrl_driver_init+0x20/0x28
[<ffff000008083980>] do_one_initcall+0x38/0x128
[<ffff000008e00cf4>] kernel_init_freeable+0x188/0x22c
[<ffff0000089b56e8>] kernel_init+0x10/0x100
[<ffff000008084bb0>] ret_from_fork+0x10/0x18
Code: f9403fa2 12001341 1100075a 9ac12041 (b9000001)
---[ end trace 8b0f4e05e1603208 ]---
This patch moves the initialization of the mask field into the irq_startup
function. However, some callbacks such as irq_set_type and irq_set_wake
could be called before irq_startup. For those functions the mask is
computed at each call, which is not an issue as these functions are not
located in a hot path but are used sporadically for configuration.
Fixes: dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs dynamically")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dan Carpenter [Thu, 7 Sep 2017 11:12:05 +0000 (14:12 +0300)]
pinctrl: sprd: fix off by one bugs
info->groups[] has info->ngroups elements so these comparisons should be
>= instead of >.
Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@spreadtrum.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
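The off-by-one described in the commit above comes down to the bound used before indexing an array of ngroups elements. A minimal sketch, with hypothetical struct and function names standing in for the driver's own:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's data structures. */
struct pin_group {
	const char *name;
};

struct pin_info {
	const struct pin_group *groups;
	unsigned int ngroups;	/* number of elements in groups[] */
};

/* Valid indices are 0 .. ngroups - 1, so the rejection test must be
 * '>='. With '>' a selector equal to ngroups would slip through and
 * read one element past the end of the array. */
static const char *group_name(const struct pin_info *info,
			      unsigned int selector)
{
	assert(info != NULL);
	if (selector >= info->ngroups)
		return NULL;
	return info->groups[selector].name;
}
```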
Dan Carpenter [Thu, 7 Sep 2017 07:29:26 +0000 (10:29 +0300)]
pinctrl: sprd: check for allocation failure
devm_pinctrl_get() could fail with ERR_PTR(-ENOMEM) so I have added a
check for that. I also reversed the other IS_ERR() test because it was
a little confusing to test one way and then the opposite a couple lines
later.
Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
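The kind of check added in the commit above follows the kernel's error-pointer convention. The sketch below re-implements ERR_PTR(), PTR_ERR() and IS_ERR() in userspace purely for illustration; get_handle() and setup() are hypothetical stand-ins for devm_pinctrl_get() and its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel helpers, for illustration:
 * the top 4095 addresses are reserved to encode negative errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_handle;

/* Hypothetical getter that can fail with ERR_PTR(-ENOMEM). */
static void *get_handle(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_handle;
}

/* Test for the error first and bail out, then continue with the
 * valid pointer. */
static long setup(int fail)
{
	void *h = get_handle(fail);

	if (IS_ERR(h))
		return PTR_ERR(h);
	assert(h == &dummy_handle);
	return 0;
}
```

Testing IS_ERR() the same way at every call site, as the commit message suggests, keeps the success path unindented and easier to follow.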
Geert Uytterhoeven [Wed, 6 Sep 2017 16:08:05 +0000 (18:08 +0200)]
pinctrl: sprd: Restrict PINCTRL_SPRD to ARCH_SPRD or COMPILE_TEST
The Spreadtrum pinctrl drivers are only useful when building for a
Spreadtrum platform.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Randy Dunlap [Mon, 4 Sep 2017 15:35:28 +0000 (08:35 -0700)]
pinctrl: sprd: fix build errors and dependencies
Fix build errors when CONFIG_OF is not enabled.
Also, the pinctrl-sprd-sc9860 driver uses functions from the pinctrl-sprd
driver, so the former should depend on the latter driver.
../drivers/pinctrl/sprd/pinctrl-sprd.c: In function 'sprd_dt_node_to_map':
../drivers/pinctrl/sprd/pinctrl-sprd.c:290:2: error: implicit declaration of function 'pinconf_generic_parse_dt_config' [-Werror=implicit-function-declaration]
ret = pinconf_generic_parse_dt_config(np, pctldev, &configs,
^
../drivers/pinctrl/sprd/pinctrl-sprd.c: At top level:
../drivers/pinctrl/sprd/pinctrl-sprd.c:844:44: error: array type has incomplete element type
static const struct pinconf_generic_params sprd_dt_params[] = {
^
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Colin Ian King [Mon, 4 Sep 2017 10:53:22 +0000 (11:53 +0100)]
pinctrl: sprd: make three local functions static
The functions sprd_pmx_get_function_count, sprd_pmx_get_function_name
and sprd_pmx_get_function_groups are local to the source and do not
need to be in global scope, so make them static.
Cleans up sparse warnings:
"symbol 'sprd_pmx_get_function_count' was not declared. Should it be
static?"
"symbol 'sprd_pmx_get_function_name' was not declared. Should it be
static?"
"symbol 'sprd_pmx_get_function_groups' was not declared. Should it be
static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Masahiro Yamada [Sat, 2 Sep 2017 17:26:18 +0000 (02:26 +0900)]
pinctrl: uniphier: include <linux/build_bug.h> instead of <linux/bug.h>
The #include of <linux/bug.h> is here to use BUILD_BUG_ON_ZERO().
Thanks to commit bc6245e5efd7 ("bug: split BUILD_BUG stuff out into
<linux/build_bug.h>"), it is now possible to reduce the number of
headers pulled in.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Markus Elfring [Wed, 6 Sep 2017 11:30:14 +0000 (13:30 +0200)]
ALSA: firewire: Use common error handling code in snd_motu_stream_start_duplex()
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
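The "jump target" pattern mentioned in the commit above is the usual kernel idiom of funnelling several failure checks into one cleanup label. A minimal sketch, assuming a hypothetical struct stream and helper functions rather than the real snd_motu code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical state, standing in for the real stream objects. */
struct stream {
	int steps_done;
	int running;
};

static int start_step(struct stream *s, int fail)
{
	if (fail)
		return -EIO;
	s->steps_done++;
	return 0;
}

static void stop_all(struct stream *s)
{
	s->steps_done = 0;
	s->running = 0;
}

/* Every failure jumps to the same label, so the cleanup call is
 * written once instead of being repeated after each check. */
static int start_duplex(struct stream *s, int fail_at)
{
	int err;

	assert(s != NULL);

	err = start_step(s, fail_at == 1);
	if (err < 0)
		goto stop_streams;

	err = start_step(s, fail_at == 2);
	if (err < 0)
		goto stop_streams;

	s->running = 1;
	return 0;

stop_streams:
	stop_all(s);
	return err;
}
```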
Linus Torvalds [Tue, 12 Sep 2017 05:26:20 +0000 (22:26 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:
- fix resources release in error paths when registering thermal zone.
(Christophe Jaillet)
- introduce a new thermal driver for on-chip PVT (Process, Voltage and
Temperature) monitoring unit implemented on UniPhier SoCs. This
driver supports temperature monitoring and alert function. (Kunihiko
Hayashi)
- Add support for mt2712 chip in the mtk_thermal driver. (Louis Yu)
- Add support for RK3328 SOC in rockchip_thermal driver. (Rocky Hao)
- cleanup a couple of platform thermal drivers to constify
thermal_zone_of_device_ops structures. (Julia Lawall)
- a couple of fixes in int340x and intel_pch_thermal thermal driver.
(Arvind Yadav, Sumeet Pawnikar, Brian Bian, Ed Swierk, Zhang Rui)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (27 commits)
Thermal: int3406_thermal: fix thermal sysfs I/F
thermal: mediatek: minor mtk_thermal.c cleanups
thermal: mediatek: extend calibration data for mt2712 chip
thermal: mediatek: add Mediatek thermal driver for mt2712
dt-bindings: thermal: Add binding document for Mediatek thermal controller
thermal: intel_pch_thermal: Fix enable check on Broadwell-DE
thermal: rockchip: Support the RK3328 SOC in thermal driver
dt-bindings: rockchip-thermal: Support the RK3328 SoC compatible
thermal: bcm2835: constify thermal_zone_of_device_ops structures
thermal: exynos: constify thermal_zone_of_device_ops structures
thermal: zx2967: constify thermal_zone_of_device_ops structures
thermal: rcar_gen3_thermal: constify thermal_zone_of_device_ops structures
thermal: qoriq: constify thermal_zone_of_device_ops structures
thermal: hisilicon: constify thermal_zone_of_device_ops structures
thermal: core: Fix resources release in error paths in thermal_zone_device_register()
thermal: core: Use the new 'thermal_zone_destroy_device_groups()' helper function
thermal: core: Add some new helper functions to free resources
thermal: int3400_thermal: process "thermal table changed" event
thermal: uniphier: add UniPhier thermal driver
dt-bindings: thermal: add binding documentation for UniPhier thermal monitor
...
Linus Torvalds [Tue, 12 Sep 2017 05:01:44 +0000 (22:01 -0700)]
Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable bugfixes:
- Fix mirror allocation in the writeback code to avoid a use after
free
- Fix the O_DSYNC writes to use the correct byte range
- Fix 2 use after free issues in the I/O code
Features:
- Writeback fixes to split up the inode->i_lock in order to reduce
contention
- RPC client receive fixes to reduce the amount of time the
xprt->transport_lock is held when receiving data from a socket into
an XDR buffer.
- Ditto fixes to reduce contention between call side users of the
rdma rb_lock, and its use in rpcrdma_reply_handler.
- Re-arrange rdma stats to reduce false cacheline sharing.
- Various rdma cleanups and optimisations.
- Refactor the NFSv4.1 exchange id code and clean up the code.
- Const-ify all instances of struct rpc_xprt_ops
Bugfixes:
- Fix the NFSv2 'sec=' mount option.
- NFSv4.1: don't use machine credentials for CLOSE when using
'sec=sys'
- Fix the NFSv3 GRANT callback when the port changes on the server.
- Fix livelock issues with COMMIT
- NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when
doing an NFSv4.1 open by filehandle"
* tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()
NFS: Don't hold the group lock when calling nfs_release_request()
NFS: Remove pnfs_generic_transfer_commit_list()
NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock
NFS: Fix 2 use after free issues in the I/O code
NFS: Sync the correct byte range during synchronous writes
lockd: Delete an error message for a failed memory allocation in reclaimer()
NFS: remove jiffies field from access cache
NFS: flush data when locking a file to ensure cache coherence for mmap.
SUNRPC: remove some dead code.
NFS: don't expect errors from mempool_alloc().
xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler
xprtrdma: Re-arrange struct rx_stats
NFS: Fix NFSv2 security settings
NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
SUNRPC: ECONNREFUSED should cause a rebind.
NFS: Remove unused parameter gfp_flags from nfs_pageio_init()
NFSv4: Fix up mirror allocation
SUNRPC: Add a separate spinlock to protect the RPC request receive list
SUNRPC: Cleanup xs_tcp_read_common()
...
Daeho Jeong [Mon, 11 Sep 2017 07:30:28 +0000 (16:30 +0900)]
f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
In a scenario like writing out the first dirty page of the inode
as the inline data, we only cleared the dirty flags of the pages, but
didn't clear the dirty tags of those pages in the radix tree.
If we don't clear the dirty tags of the pages in the radix tree, the
inodes which contain the pages will be marked with I_DIRTY_PAGES again
and again, and writepages() for the inodes will be invoked in every
writeback period. As a result, nothing will be done in every
writepages() for the inodes and it will just consume CPU time
meaninglessly.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
NeilBrown [Mon, 11 Sep 2017 03:15:50 +0000 (13:15 +1000)]
NFS: various changes relating to reporting IO errors.
1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
They aren't used.
2/ Make nfs_context_set_write_error() a "static inline" in internal.h
so we can...
3/ Use nfs_context_set_write_error() instead of mapping_set_error()
if nfs_pageio_add_request() fails before sending any request.
NFS generally keeps errors in the open_context, not the mapping,
so this is more consistent.
4/ If filemap_write_and_wait_range() reports any error, still
check ctx->error. The value in ctx->error is likely to be
more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is
cleared slightly earlier, before nfs_file_fsync_commit() is called,
rather than at the start of that function.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Chuck Lever [Mon, 21 Aug 2017 19:00:49 +0000 (15:00 -0400)]
NFS: Add static NFS I/O tracepoints
Tools like tcpdump and rpcdebug can be very useful. But there are
plenty of environments where they are difficult or impossible to
use. For example, we've had customers report I/O failures during
workloads so heavy that collecting network traffic or enabling
RPC debugging is itself onerous.
The kernel's static tracepoints are lightweight (less likely to
introduce timing changes) and efficient (the trace data is compact).
They also work in scenarios where capturing network traffic is not
possible due to lack of hardware support (some InfiniBand HCAs) or
where data or network privacy is a concern.
Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT
is initiated, and when it completes. Record the arguments and
results of each operation, which are not shown by the existing sunrpc
module's tracepoints.
For instance, the recorded offset and count can be used to match an
"initiate" event to a "done" event. If an NFS READ result returns
fewer bytes than requested or zero, seeing the EOF flag can be
probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication
of a particular class of problems. The timing information attached
to each event record can often be useful as well.
Usage example:
[root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
/sys/kernel/debug/tracing/events/nfs/*initiate*/filter
/sys/kernel/debug/tracing/events/nfs/*done/filter
Hit Ctrl^C to stop recording
^CKernel buffer statistics:
Note: "entries" are the entries left in the kernel ring buffer and are not
recorded in the trace data. They should all be zero.
CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3680
oldest event ts: 78.367422
now ts: 100.124419
dropped events: 0
read events: 74
... and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
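The commit above notes that the recorded offset and count can be used to match an "initiate" event to a "done" event. A post-processing tool might do that roughly as sketched below. The struct io_event layout and match_done() are hypothetical, and the sketch assumes both records carry the same offset and count values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed trace record; the field names are assumptions. */
struct io_event {
	uint64_t offset;	/* file offset of the request */
	uint32_t count;		/* byte count of the request */
	double ts;		/* event timestamp in seconds */
};

/* Return the index of the first "done" record whose offset and count
 * match the given "initiate" record, or -1 if there is none. */
static long match_done(const struct io_event *init,
		       const struct io_event *done, size_t ndone)
{
	size_t i;

	assert(init != NULL);
	for (i = 0; i < ndone; i++) {
		if (done[i].offset == init->offset &&
		    done[i].count == init->count)
			return (long)i;
	}
	return -1;
}
```

Once a pair is matched, subtracting the two timestamps gives the latency of that operation.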
Trond Myklebust [Mon, 11 Sep 2017 17:09:37 +0000 (13:09 -0400)]
pNFS: Use the standard I/O stateid when calling LAYOUTGET
Instead of having a private method for copying the open/delegation stateid,
use the same call that is used for standard I/O through the MDS.
Note that this means we transmit the stateid with a zero seqid, avoiding
issues with NFS4ERR_OLD_STATEID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>