Chuck Lever [Mon, 19 Mar 2018 18:23:16 +0000 (14:23 -0400)]
xprtrdma: Fix corner cases when handling device removal
Michal Kalderon has found some corner cases around device unload
with active NFS mounts that I didn't have the imagination to test
when xprtrdma device removal was added last year.
- The ULP device removal handler is responsible for deallocating
the PD. That wasn't clear to me initially, and my own testing
suggested it was not necessary, but that is incorrect.
- The transport destruction path can no longer assume that there
is a valid ID.
- When destroying a transport, ensure that ib_free_cq() is not
invoked on a CQ that was already released.
Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Layton [Sun, 18 Mar 2018 12:37:03 +0000 (08:37 -0400)]
nfs4: wake any lock waiters on successful RECLAIM_COMPLETE
If we have a RECLAIM_COMPLETE with a populated cl_lock_waitq, then
that implies that a reconnect has occurred. Since we can't expect a
CB_NOTIFY_LOCK callback at that point, just wake up the entire queue
so that all the tasks can re-poll for their locks.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Layton [Sun, 18 Mar 2018 12:37:02 +0000 (08:37 -0400)]
nfs4: don't compare clientid in nfs4_wake_lock_waiter
The task is expected to sleep for a while here, and it's possible that
a new EXCHANGE_ID has occurred in the interim, and we were assigned a
new clientid. Since this is a per-client list, there isn't a lot of
value in vetting the clientid on the incoming request.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Layton [Sun, 18 Mar 2018 12:37:01 +0000 (08:37 -0400)]
nfs4: always reset notified flag to false before repolling for lock
We may get a notification and lose the race to another client. Ensure
that we wait again for a notification in that case.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 16 Mar 2018 14:33:55 +0000 (10:33 -0400)]
sunrpc: Add static trace point to report result of RPC ping
This information can help track down local misconfiguration issues
as well as network partitions and unresponsive servers.
There are several ways to send a ping, and with transport
multiplexing, the exact rpc_xprt that is used is sometimes not known by
the upper layer. The rpc_xprt pointer passed to the trace point
call also has to be RCU-safe.
I found a spot inside the client FSM where an rpc_xprt pointer is
always available and safe to use.
Suggested-by: Bill Baker <Bill.Baker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 16 Mar 2018 14:33:49 +0000 (10:33 -0400)]
sunrpc: Add static trace point to report RPC latency stats
Introduce a low-overhead mechanism to report information about
latencies of individual RPCs. The goal is to enable user space to
filter the trace record for latency outliers, or build histograms,
etc.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 16 Mar 2018 14:33:44 +0000 (10:33 -0400)]
sunrpc: Simplify synopsis of some trace points
Clean up: struct rpc_task carries a pointer to a struct rpc_clnt,
and in fact task->tk_client is always what is passed into trace
points that are already passing @task.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 5 Mar 2018 20:13:13 +0000 (15:13 -0500)]
SUNRPC: Make num_reqs a non-atomic integer
If recording xprt->stat.max_slots is moved into xprt_alloc_slot,
then xprt->num_reqs is never manipulated outside
xprt->reserve_lock. There's no longer a need for xprt->num_reqs to
be atomic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
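The reasoning in the num_reqs change can be illustrated with a small user-space sketch. This is not the SUNRPC code: struct fake_xprt, fake_alloc_slot() and the pthread mutex standing in for xprt->reserve_lock are all hypothetical. It only shows why a counter that is read and written exclusively under one lock can be a plain integer.

```c
#include <pthread.h>

/* Hypothetical stand-in for the transport; not the kernel's struct rpc_xprt. */
struct fake_xprt {
	pthread_mutex_t reserve_lock;	/* plays the role of xprt->reserve_lock */
	unsigned int num_reqs;		/* plain integer: only touched under the lock */
	unsigned int max_reqs;
	unsigned int stat_max_slots;	/* high-water mark, recorded at allocation */
};

/* Returns 1 if a slot was allocated, 0 if the transport is at its limit. */
static int fake_alloc_slot(struct fake_xprt *x)
{
	int ok = 0;

	pthread_mutex_lock(&x->reserve_lock);
	if (x->num_reqs < x->max_reqs) {
		x->num_reqs++;
		/* Recording the maximum here keeps every access inside the lock. */
		if (x->num_reqs > x->stat_max_slots)
			x->stat_max_slots = x->num_reqs;
		ok = 1;
	}
	pthread_mutex_unlock(&x->reserve_lock);
	return ok;
}
```

Because both the increment and the high-water-mark update happen inside the same critical section, no atomic type is needed for either field in this sketch.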
Chuck Lever [Mon, 5 Mar 2018 20:13:07 +0000 (15:13 -0500)]
SUNRPC: Make RTT measurement more precise (Send)
Some RPC transports have more overhead in their send_request
callouts than others. For example, for RPC-over-RDMA:
- Marshaling an RPC often has to DMA map the RPC arguments
- Registration methods perform memory registration as part of
marshaling
To capture just server and network latencies more precisely: when
sending a Call, capture the rq_xtime timestamp _after_ the transport
header has been marshaled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 5 Mar 2018 20:13:02 +0000 (15:13 -0500)]
SUNRPC: Make RTT measurement more precise (Receive)
Some RPC transports have more overhead in their reply handlers
than others. For example, for RPC-over-RDMA:
- RPC completion has to wait for memory invalidation, which is
not a part of the server/network round trip
- Recently a context switch was introduced into the reply handler,
which further artificially inflates the measure of RPC RTT
To capture just server and network latencies more precisely: when
receiving a reply, compute the RTT as soon as the XID is recognized
rather than at RPC completion time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 5 Mar 2018 20:12:57 +0000 (15:12 -0500)]
SUNRPC: Move xprt_update_rtt callsite
Since commit 33849792cbcd ("xprtrdma: Detect unreachable NFS/RDMA
servers more reliably"), the xprtrdma transport now has a ->timer
callout. But xprtrdma does not need to compute RTT data, only UDP
needs that. Move the xprt_update_rtt call into the UDP transport
implementation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:31:05 +0000 (15:31 -0500)]
xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req
Refactor: Both rpcrdma_create_req call sites have to allocate the
buffer where the transport header is built, so just move that
allocation into rpcrdma_create_req.
This buffer is a fixed size. There's no needed information available
in call_allocate that is not also available when the transport is
created.
The original purpose for allocating these buffers on demand was to
reduce the possibility that an allocation failure during transport
creation will hork the mount operation during low memory scenarios.
Some relief for this rare possibility is coming up in the next few
patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:59 +0000 (15:30 -0500)]
xprtrdma: Chain Send to FastReg WRs
With FRWR, the client transport can perform memory registration and
post a Send with just a single ib_post_send.
This reduces contention between the send_request path and the Send
Completion handlers, and reduces the overhead of registering a chunk
that has multiple segments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:54 +0000 (15:30 -0500)]
xprtrdma: "Support" call-only RPCs
RPC-over-RDMA version 1 credit accounting relies on there being a
response message for every RPC Call. This means that RPC procedures
that have no reply will disrupt credit accounting, just in the same
way as a retransmit would (since it is sent because no reply has
arrived). Deal with the "no reply" case the same way.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:49 +0000 (15:30 -0500)]
xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create
Create fewer MRs on average. Many workloads don't need as many as
32 MRs, and the transport can now quickly restock the MR free list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:44 +0000 (15:30 -0500)]
xprtrdma: ->send_request returns -EAGAIN when there are no free MRs
Currently, when the MR free list is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.
With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more MRs have been
created. Creating more MRs typically takes less than a millisecond,
and this waking mechanism is less deadlock-prone.
Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from MR exhaustion is desirable.
This is the same mechanism that the TCP transport utilizes when
handling write buffer space exhaustion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:38 +0000 (15:30 -0500)]
xprtrdma: Remove xprt-specific connect cookie
Clean up: The generic rq_connect_cookie is sufficient to detect RPC
Call retransmission.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 28 Feb 2018 20:30:33 +0000 (15:30 -0500)]
xprtrdma: Remove arbitrary limit on initiator depth
Clean up: We need to check only that the value does not exceed the
range of the u8 field it's going into.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
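A minimal sketch of the kind of range check described above, assuming an out-of-range value is clamped rather than rejected. The helper name clamp_initiator_depth() is hypothetical and this is not the xprtrdma code itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: make a device-reported depth fit into a u8 field.
 * The only limit applied is the range of the destination type.
 */
static uint8_t clamp_initiator_depth(unsigned int requested)
{
	if (requested > UINT8_MAX)
		return UINT8_MAX;
	return (uint8_t)requested;
}
```

For example, a requested depth of 16 passes through unchanged, while 1000 is reduced to 255.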
Chuck Lever [Wed, 28 Feb 2018 20:30:27 +0000 (15:30 -0500)]
xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
With v4.15, on one of my NFS/RDMA clients I measured a near
doubling in the latency of small read and write system calls. There
was no change in server round trip time. The extra latency appears
in the whole RPC execution path.
"git bisect" settled on commit ccede7598588 ("xprtrdma: Spread reply
processing over more CPUs").
After some experimentation, I found that leaving the WQ bound and
allowing the scheduler to pick the dispatch CPU seems to eliminate
the long latencies, and it does not introduce any new regressions.
The fix is implemented by reverting only the part of commit
ccede7598588 ("xprtrdma: Spread reply processing over more
CPUs") that dispatches RPC replies specifically on the CPU where the
matching RPC call was made.
Interestingly, saving the CPU number and later queuing reply
processing there was effective _only_ for NFS READ and WRITE
requests. On my NUMA client, in-kernel RPC reply processing for
asynchronous RPCs was dispatched on the same CPU where the RPC call
was made, as expected. However synchronous RPCs seem to get their
reply dispatched on some other CPU than where the call was placed,
every time.
Fixes: ccede7598588 ("xprtrdma: Spread reply processing over ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Sun, 4 Mar 2018 22:54:11 +0000 (14:54 -0800)]
Linux 4.16-rc4
Linus Torvalds [Sun, 4 Mar 2018 20:12:48 +0000 (12:12 -0800)]
Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Add missing instruction suffixes to assembly code so it can be
compiled by newer GAS versions without warnings.
- Switch refcount WARN exceptions to UD2 as we did in general
- Make the reboot on Intel Edison platforms work
- A small documentation update so text and sample command match"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation, x86, resctrl: Make text and sample command match
x86/platform/intel-mid: Handle Intel Edison reboot correctly
x86/asm: Add instruction suffixes to bitops
x86/entry/64: Add instruction suffix
x86/refcounts: Switch to UD2 for exceptions
Linus Torvalds [Sun, 4 Mar 2018 19:40:16 +0000 (11:40 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/pti fixes from Thomas Gleixner:
"Three fixes related to melted spectrum:
- Sync the cpu_entry_area page table to initial_page_table on 32 bit.
Otherwise suspend/resume fails because resume uses
initial_page_table and triggers a triple fault when accessing the
cpu entry area.
- Zero the SPEC_CTRL MSR on XEN before suspend to address a
shortcoming in the hypervisor.
- Fix another switch table detection issue in objtool"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
objtool: Fix another switch table detection issue
x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
Linus Torvalds [Sun, 4 Mar 2018 19:34:49 +0000 (11:34 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes from the timer department:
- Add a missing timer wheel clock forward when migrating timers off an
unplugged CPU to prevent operating on a stale clock base and
missing timer deadlines.
- Use the proper shift count to extract data from a register value to
prevent evaluating unrelated bits
- Make the error return check in the FSL timer driver work correctly.
Checking an unsigned variable for less than zero does not really
work well.
- Clarify the confusing comments in the ARC timer code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Forward timer base before migrating timers
clocksource/drivers/arc_timer: Update some comments
clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
clocksource/drivers/fsl_ftm_timer: Fix error return checking
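The unsigned-check pitfall mentioned in the timer pull message above can be sketched as follows. The function names and the error value are hypothetical and this is not the FSL timer driver code. The point is only that a negative error code stored in an unsigned variable becomes a large positive number, so a "less than zero" test on it can never fire.

```c
/* Hypothetical lookup: returns a positive IRQ number or a negative error code. */
static int fake_lookup_irq(int fail)
{
	return fail ? -22 : 42;
}

/* Shows what the error code turns into once stored in an unsigned variable. */
static unsigned int stored_as_unsigned(int ret)
{
	return (unsigned int)ret;
}

/*
 * Buggy pattern, shown only as a comment:
 *	unsigned int irq = fake_lookup_irq(fail);
 *	if (irq < 0)	-- never true for an unsigned value
 *		return error;
 *
 * Fixed pattern: keep the signed return value for the error check and
 * convert to unsigned only after the check has passed.
 */
static int probe_fixed(int fail, unsigned int *irq_out)
{
	int ret = fake_lookup_irq(fail);

	if (ret < 0)
		return ret;
	*irq_out = (unsigned int)ret;
	return 0;
}
```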
Linus Torvalds [Sun, 4 Mar 2018 19:33:04 +0000 (11:33 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixlet from Thomas Gleixner:
"Just a documentation update for the missing device tree property of
the R-Car M3N interrupt controller"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings/irqchip/renesas-irqc: Document R-Car M3-N support
Linus Torvalds [Sun, 4 Mar 2018 19:04:27 +0000 (11:04 -0800)]
Merge tag 'for-4.16-rc3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- when NR_CPUS is large, a SRCU structure can significantly inflate
size of the main filesystem structure that would not be possible to
allocate by kmalloc, so the kvalloc fallback is used
- improved error handling
- fix endianness when printing some filesystem attributes via sysfs;
the problem could appear when a filesystem is moved between hosts of
different endianness
- send fixes: the NO_HOLE mode should not send a write operation for a
file hole
- fix log replay for special files followed by file hardlinks
- fix log replay failure after unlink and link combination
- fix max chunk size calculation for DUP allocation
* tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix log replay failure after unlink and link combination
Btrfs: fix log replay failure after linking special file and fsync
Btrfs: send, fix issuing write op when processing hole in no data mode
btrfs: use proper endianness accessors for super_copy
btrfs: alloc_chunk: fix DUP stripe size handling
btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
btrfs: handle failure of add_pending_csums
btrfs: use kvzalloc to allocate btrfs_fs_info
Linus Torvalds [Sat, 3 Mar 2018 22:55:20 +0000 (14:55 -0800)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A driver fix and a documentation fix (which makes dependency handling
for the next cycle easier)"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: octeon: Prevent error message on bus error
dt-bindings: at24: sort manufacturers alphabetically
Linus Torvalds [Sat, 3 Mar 2018 22:32:00 +0000 (14:32 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A 4.16 regression fix, three fixes for -stable, and a cleanup fix:
- During the merge window support for the new ACPI NVDIMM Platform
Capabilities structure disabled support for "deep flush", a
force-unit-access-like mechanism for persistent memory. Restore
that mechanism.
- VFIO, like RDMA, is yet one more memory registration / pinning
interface that is incompatible with Filesystem-DAX. Disable long
term pins of Filesystem-DAX mappings via VFIO.
- The Filesystem-DAX detection to prevent long term pins mistakenly
also disabled Device-DAX pins which are not subject to the same
block-map collision concerns.
- Similar to the setup path, softlockup warnings can trigger in the
shutdown path for large persistent memory namespaces. Teach
for_each_device_pfn() to perform cond_resched() in all cases.
- Boaz noticed that the might_sleep() in dax_direct_access() is stale
as of the v4.15 kernel.
These have received a build success notification from the 0day robot,
and the longterm pin fixes have appeared in -next. However, I recently
rebased the tree to remove some other fixes that need to be reworked
after review feedback."
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
memremap: fix softlockup reports at teardown
libnvdimm: re-enable deep flush for pmem devices via fsync()
vfio: disable filesystem-dax page pinning
dax: fix vma_is_fsdax() helper
dax: ->direct_access does not sleep anymore
Linus Torvalds [Sat, 3 Mar 2018 18:37:01 +0000 (10:37 -0800)]
Merge tag 'kbuild-fixes-v4.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- suppress sparse warnings about unknown attributes
- fix typos and stale comments
- fix build error of arch/sh
- fix wrong use of ld-option vs cc-ldoption
- remove redundant GCC_PLUGINS_CFLAGS assignment
- fix another memory leak of Kconfig
- fix line number in error messages of Kconfig
- do not write confusing CONFIG_DEFCONFIG_LIST out to .config
- add xstrdup() to Kconfig to handle memory shortage errors
- show also a Debian package name if ncurses is missing
* tag 'kbuild-fixes-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
MAINTAINERS: take over Kconfig maintainership
kconfig: fix line number in recursive inclusion error message
Coccinelle: memdup: Fix typo in warning messages
kconfig: Update ncurses package names for menuconfig
kbuild/kallsyms: trivial typo fix
kbuild: test --build-id linker flag by ld-option instead of cc-ldoption
kbuild: drop superfluous GCC_PLUGINS_CFLAGS assignment
kconfig: Don't leak choice names during parsing
sh: fix build error for empty CONFIG_BUILTIN_DTB_SOURCE
kconfig: set SYMBOL_AUTO to the symbol marked with defconfig_list
kconfig: add xstrdup() helper
kbuild: disable sparse warnings about unknown attributes
Makefile: Fix lying comment re. silentoldconfig
Linus Torvalds [Sat, 3 Mar 2018 18:27:14 +0000 (10:27 -0800)]
Merge tag 'media/v4.16-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- some build fixes with randconfigs
- an m88ds3103 fix to prevent an OOPS if the chip doesn't provide the
right version during probe (which can happen if the hardware hangs)
- a potential out of array bounds reference in tvp5150
- some fixes and improvements in the DVB memory mapped API (added for
kernel 4.16)
* tag 'media/v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vb2: Makefile: place vb2-trace together with vb2-core
media: Don't let tvp5150_get_vbi() go out of vbi_ram_default array
media: dvb: update buffer mmaped flags and frame counter
media: dvb: add continuity error indicators for memory mapped buffers
media: dmxdev: Fix the logic that enables DMA mmap support
media: dmxdev: fix error code for invalid ioctls
media: m88ds3103: don't call a non-initalized function
media: au0828: add VIDEO_V4L2 dependency
media: dvb: fix DVB_MMAP dependency
media: dvb: fix DVB_MMAP symbol name
media: videobuf2: fix build issues with vb2-trace
media: videobuf2: Add VIDEOBUF2_V4L2 Kconfig option for VB2 V4L2 part
Linus Torvalds [Sat, 3 Mar 2018 17:59:51 +0000 (09:59 -0800)]
Merge tag 'linux-watchdog-4.16-fixes-1' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- rave-sp: add NVMEM dependency
- build fixes for i6300esb_wdt, xen_wdt and sp5100_tco
* tag 'linux-watchdog-4.16-fixes-1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: sp5100_tco.c: fix potential build failure
watchdog: xen_wdt: fix potential build failure
watchdog: i6300esb: fix build failure
watchdog: rave-sp: add NVMEM dependency
Linus Torvalds [Sat, 3 Mar 2018 03:40:43 +0000 (19:40 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"x86:
- fix NULL dereference when using userspace lapic
- optimize spectre v1 mitigations by allowing guests to use LFENCE
- make microcode revision configurable to prevent guests from
unnecessarily blacklisting spectre v2 mitigation feature"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix vcpu initialization with userspace lapic
KVM: X86: Allow userspace to define the microcode version
KVM: X86: Introduce kvm_get_msr_feature()
KVM: SVM: Add MSR-based feature support for serializing LFENCE
KVM: x86: Add a framework for supporting MSR-based features
Dan Williams [Wed, 7 Feb 2018 03:34:11 +0000 (19:34 -0800)]
memremap: fix softlockup reports at teardown
The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.
Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Fixes: 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()...")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
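The idea of embedding the reschedule point in the iteration helper, rather than open coding it at each call site, can be sketched in user space as below. This is an illustration under stated assumptions, not the memremap code: for_each_pfn(), pfn_next() and count_pfns() are hypothetical names, sched_yield() stands in for cond_resched(), and the interval of 1024 is an arbitrary choice for the sketch.

```c
#include <sched.h>

/*
 * Hypothetical step function: advance to the next pfn and periodically
 * give other work a chance to run.
 */
static inline unsigned long pfn_next(unsigned long pfn)
{
	if (pfn % 1024 == 0)
		sched_yield();	/* user-space stand-in for cond_resched() */
	return pfn + 1;
}

/* Every user of the iterator gets the yield without writing it out. */
#define for_each_pfn(pfn, first, end) \
	for ((pfn) = (first); (pfn) < (end); (pfn) = pfn_next(pfn))

/* Example caller: walks the range and counts the entries visited. */
static unsigned long count_pfns(unsigned long first, unsigned long end)
{
	unsigned long pfn, n = 0;

	for_each_pfn(pfn, first, end)
		n++;
	return n;
}
```

With this shape, a setup path and a teardown path that both iterate with the helper pick up the same behavior, which is the property the commit message is after.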
Dave Jiang [Sat, 3 Mar 2018 03:31:40 +0000 (19:31 -0800)]
libnvdimm: re-enable deep flush for pmem devices via fsync()
Re-enable deep flush so that users always have a way to be sure that a
write makes it all the way out to media. Writes from the PMEM driver
always arrive at the NVDIMM since movnt is used to bypass the cache, and
the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to
flush write buffers on power failure. The Deep Flush mechanism is there
to explicitly write buffers to protect against (rare) ADR failure. This
change prevents a regression in deep flush behavior so that applications
can continue to depend on fsync() as a mechanism to trigger deep flush
in the filesystem-DAX case.
Fixes: 06e8ccdab15f4 ("acpi: nfit: Add support for detect platform CPU cache...")
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Masahiro Yamada [Fri, 2 Mar 2018 13:04:59 +0000 (22:04 +0900)]
MAINTAINERS: take over Kconfig maintainership
I have recently picked up Kconfig patches to my tree without any
declaration. Making it official now.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Sun, 4 Feb 2018 18:34:02 +0000 (10:34 -0800)]
vfio: disable filesystem-dax page pinning
Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding off filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.
RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.
Note that xfs and ext4 still report:
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Sat, 3 Mar 2018 01:44:39 +0000 (17:44 -0800)]
Merge tag 'pci-v4.16-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Update pci.ids location (documentation only) (Randy Dunlap)
- Fix a crash when BIOS didn't assign a BAR and we try to enlarge it
(Christian König)
* tag 'pci-v4.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Allow release of resources that were never assigned
PCI: Update location of pci.ids file
Linus Torvalds [Fri, 2 Mar 2018 21:05:20 +0000 (13:05 -0800)]
Merge branch 'parisc-4.16-1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- a patch to change the ordering of cache and TLB flushes to hopefully
fix the random segfaults we very rarely face (by Dave Anglin).
- a patch to hide the virtual kernel memory layout due to security
reasons.
- two small patches to make the kernel run more smoothly under qemu.
* 'parisc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Reduce irq overhead when run in qemu
parisc: Use cr16 interval timers unconditionally on qemu
parisc: Check if secondary CPUs want own PDC calls
parisc: Hide virtual kernel memory layout
parisc: Fix ordering of cache and TLB flushes
Linus Torvalds [Fri, 2 Mar 2018 18:19:57 +0000 (10:19 -0800)]
Merge tag 'for-linus-4.16a-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Five minor fixes for Xen-specific drivers"
* tag 'for-linus-4.16a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
pvcalls-front: 64-bit align flags
x86/xen: add tty0 and hvc0 as preferred consoles for dom0
xen-netfront: Fix hang on device removal
xen/pirq: fix error path cleanup when binding MSIs
xen/pvcalls: fix null pointer dereference on map->sock
Linus Torvalds [Fri, 2 Mar 2018 18:05:10 +0000 (10:05 -0800)]
Merge tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A cap handling fix from Zhi that ensures that metadata writeback isn't
delayed and three error path memory leak fixups from Chengguang"
* tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client:
ceph: fix potential memory leak in init_caches()
ceph: fix dentry leak when failing to init debugfs
libceph, ceph: avoid memory leak when specifying same option several times
ceph: flush dirty caps of unlinked inode ASAP
Linus Torvalds [Fri, 2 Mar 2018 17:35:36 +0000 (09:35 -0800)]
Merge tag 'for-linus-20180302' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes for this series. This is a little larger than
usual at this time, but that's mainly because I was out on vacation
last week. Nothing in here is major in any way, it's just two weeks of
fixes. This contains:
- NVMe pull from Keith, with a set of fixes from the usual suspects.
- mq-deadline zone unlock fix from Damien, fixing an issue with the
SMR zone locking added for 4.16.
- two bcache fixes sent in by Michael, with changes from Coly and
Tang.
- comment typo fix from Eric for blktrace.
- return-value error handling fix for nbd, from Gustavo.
- fix a direct-io case where we don't defer to a completion handler,
making us sleep from IRQ device completion. From Jan.
- a small series from Jan fixing up holes around handling of bdev
references.
- small set of regression fixes from Jiufei, mostly fixing problems
around the gendisk pointer -> partition index change.
- regression fix from Ming, fixing a boundary issue with the discard
page cache invalidation.
- two-patch series from Ming, fixing both a core blk-mq-sched and
kyber issue around token freeing on a requeue condition"
* tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
block: fix a typo
block: display the correct diskname for bio
block: fix the count of PGPGOUT for WRITE_SAME
mq-deadline: Make sure to always unlock zones
nvmet: fix PSDT field check in command format
nvme-multipath: fix sysfs dangerously created links
nbd: fix return value in error handling path
bcache: fix kcrashes with fio in RAID5 backend dev
bcache: correct flash only vols (check all uuids)
blktrace_api.h: fix comment for struct blk_user_trace_setup
blockdev: Avoid two active bdev inodes for one device
genhd: Fix BUG in blkdev_open()
genhd: Fix use after free in __blkdev_get()
genhd: Add helper put_disk_and_module()
genhd: Rename get_disk() to get_disk_and_module()
genhd: Fix leaked module reference for NVME devices
direct-io: Fix sleep in atomic due to sync AIO
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
block: kyber: fix domain token leak during requeue
blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
...
Linus Torvalds [Fri, 2 Mar 2018 16:44:11 +0000 (08:44 -0800)]
Merge tag 'mmc-v4.16-rc3' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- mmc: core: Avoid hang when claiming host
MMC host:
- dw_mmc: Avoid hang when accessing registers
- dw_mmc: Fix out-of-bounds access for slot's caps
- dw_mmc-k3: Fix out-of-bounds access through DT alias
- sdhci-pci: Fix S0i3 for Intel BYT-based controllers"
* tag 'mmc-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Avoid hanging to claim host for mmc via some nested calls
mmc: dw_mmc: Avoid accessing registers in runtime suspended state
mmc: dw_mmc: Fix out-of-bounds access for slot's caps
mmc: dw_mmc: Factor out dw_mci_init_slot_caps
mmc: dw_mmc-k3: Fix out-of-bounds access through DT alias
mmc: sdhci-pci: Fix S0i3 for Intel BYT-based controllers
Linus Torvalds [Fri, 2 Mar 2018 16:17:49 +0000 (08:17 -0800)]
Merge tag 'pm-4.16-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three issues in cpufreq drivers: one recent regression, one
leftover Kconfig dependency and one old but "stable" material.
Specifics:
- Make the task scheduler load and utilization signals be
frequency-invariant again after recent changes in the SCPI cpufreq
driver (Dietmar Eggemann).
- Drop an unnecessary leftover Kconfig dependency from the SCPI
cpufreq driver (Sudeep Holla).
- Fix the initialization of the s3c24xx cpufreq driver (Viresh
Kumar)"
* tag 'pm-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
cpufreq: scpi: Fix incorrect arm_big_little config dependency
cpufreq: scpi: invoke frequency-invariance setter function
Masahiro Yamada [Fri, 2 Mar 2018 07:05:12 +0000 (16:05 +0900)]
kconfig: fix line number in recursive inclusion error message
When recursive inclusion is detected, the line number of the last
'included from:' is wrong.
[Test Case]
Kconfig:
-------->8--------
source "Kconfig2"
-------->8--------
Kconfig2:
-------->8--------
source "Kconfig3"
-------->8--------
Kconfig3:
-------->8--------
source "Kconfig"
-------->8--------
[Result]
$ make allyesconfig
scripts/kconfig/conf --allyesconfig Kconfig
Kconfig:1: recursive inclusion detected. Inclusion path:
current file : 'Kconfig'
included from: 'Kconfig3:1'
included from: 'Kconfig2:1'
included from: 'Kconfig:3'
scripts/kconfig/Makefile:89: recipe for target 'allyesconfig' failed
make[1]: *** [allyesconfig] Error 1
Makefile:512: recipe for target 'allyesconfig' failed
make: *** [allyesconfig] Error 2
where we expect
current file : 'Kconfig'
included from: 'Kconfig3:1'
included from: 'Kconfig2:1'
included from: 'Kconfig:1'
The 'iter->lineno+1' in the second fprintf() should be 'iter->lineno-1'.
I refactored the code to merge the two fprintf() calls.
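A minimal Python sketch of the corrected reporting, not the actual kconfig C code. The FileEntry class and inclusion_path() are hypothetical names, and the sketch assumes each file's line counter has already advanced past its 'source' line, so every parent is reported as lineno - 1:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileEntry:
    # Hypothetical stand-in for a per-file parser record.
    name: str
    lineno: int  # assumed to point one line past the 'source' statement
    parent: Optional["FileEntry"] = None


def inclusion_path(offending: str, includer: FileEntry) -> List[str]:
    """Build the inclusion path using the same offset for every parent."""
    lines = [f"current file : '{offending}'"]
    it: Optional[FileEntry] = includer
    while it is not None:
        lines.append(f"included from: '{it.name}:{it.lineno - 1}'")
        it = it.parent
    return lines


# The three-file cycle from the test case: Kconfig -> Kconfig2 -> Kconfig3 -> Kconfig
root = FileEntry("Kconfig", 2)
k2 = FileEntry("Kconfig2", 2, root)
k3 = FileEntry("Kconfig3", 2, k2)
print("\n".join(inclusion_path("Kconfig", k3)))
```

Using a single loop with one offset mirrors the idea of merging the two fprintf() calls: there is no separate code path for the last entry that could apply a different adjustment.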
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
Dafna Hirschfeld [Thu, 1 Mar 2018 08:57:21 +0000 (10:57 +0200)]
Coccinelle: memdup: Fix typo in warning messages
Replace 'kmemdep' with 'kmemdup' in warning messages.
Signed-off-by: Dafna Hirschfeld <dafna3@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Nicolas Palix <nicolas.palix@imag.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Jan Glauber [Tue, 27 Feb 2018 15:42:13 +0000 (16:42 +0100)]
i2c: octeon: Prevent error message on bus error
The error message:
[Fri Feb 16 13:42:13 2018] i2c-thunderx 0000:01:09.4: unhandled state: 0
is misleading as state 0 (bus error) is not an unknown state.
Return -EIO as before but avoid printing the message. Also rename
STAT_ERROR to STATE_BUS_ERROR.
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Wolfram Sang [Fri, 2 Mar 2018 10:04:33 +0000 (11:04 +0100)]
Merge tag 'at24-4.16-rc4-for-wolfram' of git://git./linux/kernel/git/brgl/linux into i2c/for-current
Pull in this fixup to get rid of a dependency for the next cycle:
"- sort the manufacturers in DT bindings alphabetically"
Rafael J. Wysocki [Fri, 2 Mar 2018 09:44:44 +0000 (10:44 +0100)]
Merge branch 'cpufreq-scpi'
* cpufreq-scpi:
cpufreq: scpi: Fix incorrect arm_big_little config dependency
cpufreq: scpi: invoke frequency-invariance setter function
Helge Deller [Mon, 12 Feb 2018 20:43:55 +0000 (21:43 +0100)]
parisc: Reduce irq overhead when run in qemu
When run under QEMU, calling mfctl(16) creates some overhead because the
qemu timer has to be scaled and moved into the register. This patch
reduces the number of calls to mfctl(16) by moving the calls out of the
loops.
Additionally, increase the minimal time interval to 8000 cycles instead
of 500 to compensate for possible QEMU delays when delivering interrupts.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
Helge Deller [Fri, 12 Jan 2018 21:44:00 +0000 (22:44 +0100)]
parisc: Use cr16 interval timers unconditionally on qemu
When running on qemu we know that the (emulated) cr16 cpu-internal
clocks are synchronized. So let's use them unconditionally on qemu.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
Helge Deller [Fri, 12 Jan 2018 21:51:22 +0000 (22:51 +0100)]
parisc: Check if secondary CPUs want own PDC calls
The architecture specification says (for 64-bit systems): PDC is a per
processor resource, and operating system software must be prepared to
manage separate pointers to PDCE_PROC for each processor. The address
of PDCE_PROC for the monarch processor is stored in the Page Zero
location MEM_PDC. The address of PDCE_PROC for each non-monarch
processor is passed in gr26 when PDCE_RESET invokes OS_RENDEZ.
Currently we still use one PDC for all CPUs, but in case we face a
machine which is following the specification let's warn about it.
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Fri, 12 Jan 2018 21:57:15 +0000 (22:57 +0100)]
parisc: Hide virtual kernel memory layout
For security reasons do not expose the virtual kernel memory layout to
userspace.
Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # 4.15
Reviewed-by: Kees Cook <keescook@chromium.org>
John David Anglin [Tue, 27 Feb 2018 13:16:07 +0000 (08:16 -0500)]
parisc: Fix ordering of cache and TLB flushes
The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
SMP stalls. The problem is some drivers call these routines with
interrupts disabled. Interrupts need to be enabled for flush_tlb_all()
and flush_cache_all() to work. This version adds checks to ensure
interrupts are not disabled before calling routines that need IPI
interrupts. When interrupts are disabled, we now drop into slower code.
The attached change fixes the ordering of cache and TLB flushes in
several cases. When we flush the cache using the existing PTE/TLB
entries, we need to flush the TLB after doing the cache flush. We don't
need to do this when we flush the entire instruction and data caches as
these flushes don't use the existing TLB entries. The same is true for
tmpalias region flushes.
The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
routines have been updated.
Secondly, we added a new purge_kernel_dcache_range_asm() routine to
pacache.S and use it in invalidate_kernel_vmap_range(). Nominally,
purges are faster than flushes as the cache lines don't have to be
written back to memory.
Hopefully, this is sufficient to resolve the remaining problems due to
cache speculation. So far, testing indicates that this is the case. I
did work up a patch using tmpalias flushes, but there is a performance
hit because we need the physical address for each page, and we also need
to sequence access to the tmpalias flush code. This increases the
probability of stalls.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Arvind Prasanna [Wed, 28 Feb 2018 21:32:19 +0000 (16:32 -0500)]
kconfig: Update ncurses package names for menuconfig
The package name is ncurses-devel for Redhat based distros
and libncurses-dev for Debian based distros.
Signed-off-by: Arvind Prasanna <arvindprasanna@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cao jin [Tue, 27 Feb 2018 08:16:19 +0000 (16:16 +0800)]
kbuild/kallsyms: trivial typo fix
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Fri, 23 Feb 2018 04:56:52 +0000 (13:56 +0900)]
kbuild: test --build-id linker flag by ld-option instead of cc-ldoption
'--build-id' is passed to $(LD), so it should be tested by 'ld-option'.
This seems a kind of misconversion when ld-option was renamed to
cc-ldoption.
Commit f86fd3066052 ("kbuild: rename ld-option to cc-ldoption") renamed
all instances of 'ld-option' to 'cc-ldoption'.
Then, commit 691ef3e7fdc1 ("kbuild: introduce ld-option") re-added
'ld-option' as a new implementation.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cao jin [Wed, 21 Feb 2018 04:25:07 +0000 (12:25 +0800)]
kbuild: drop superfluous GCC_PLUGINS_CFLAGS assignment
GCC_PLUGINS_CFLAGS is already in the environment, so it is superfluous
to add it in commandline of final build of init/.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Tue, 20 Feb 2018 11:40:29 +0000 (20:40 +0900)]
kconfig: Don't leak choice names during parsing
The named choice is not used in the kernel tree, but if it were used,
it would not be freed.
The intention of the named choice can be seen in the log of
commit 5a1aa8a1aff6 ("kconfig: add named choice group").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
Masahiro Yamada [Mon, 19 Feb 2018 17:09:42 +0000 (02:09 +0900)]
sh: fix build error for empty CONFIG_BUILTIN_DTB_SOURCE
If CONFIG_USE_BUILTIN_DTB is enabled, but CONFIG_BUILTIN_DTB_SOURCE
is empty (for example, allmodconfig), it fails to build, like this:
make[2]: *** No rule to make target 'arch/sh/boot/dts/.dtb.o',
needed by 'arch/sh/boot/dts/built-in.o'. Stop.
Surround obj-y with ifneq ... endif.
I replaced $(CONFIG_USE_BUILTIN_DTB) with 'y' since this is always
the case from the following code from arch/sh/Makefile:
core-$(CONFIG_USE_BUILTIN_DTB) += arch/sh/boot/dts/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Fri, 16 Feb 2018 18:38:32 +0000 (03:38 +0900)]
kconfig: set SYMBOL_AUTO to the symbol marked with defconfig_list
The 'defconfig_list' is a weird attribute. If the '.config' is
missing, conf_read_simple() iterates over all visible defaults,
then it uses the first one for which fopen() succeeds.
config DEFCONFIG_LIST
string
depends on !UML
option defconfig_list
default "/lib/modules/$UNAME_RELEASE/.config"
default "/etc/kernel-config"
default "/boot/config-$UNAME_RELEASE"
default "$ARCH_DEFCONFIG"
default "arch/$ARCH/defconfig"
However, like other symbols, the first visible default is always
written out to the .config file. This might be different from what
has been actually used.
For example, on my machine, the third one "/boot/config-$UNAME_RELEASE"
is opened, as follows:
$ rm .config
$ make oldconfig 2>/dev/null
scripts/kconfig/conf --oldconfig Kconfig
#
# using defaults found in /boot/config-4.4.0-112-generic
#
*
* Restart config...
*
*
* IRQ subsystem
*
Expose irq internals in debugfs (GENERIC_IRQ_DEBUGFS) [N/y/?] (NEW)
However, the resulting .config file contains the first one since it is
visible:
$ grep CONFIG_DEFCONFIG_LIST .config
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
In order to stop confusing people, prevent this CONFIG option from
being written to the .config file.
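The mismatch described above can be illustrated with a small Python sketch. It is a model under assumptions, not the conf_read_simple() implementation: first_openable() is a hypothetical helper, and temporary files stand in for the real default paths.

```python
import os
import tempfile
from typing import List, Optional


def first_openable(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that can be opened for reading, else None."""
    for path in candidates:
        try:
            with open(path):
                return path
        except OSError:
            continue
    return None


with tempfile.TemporaryDirectory() as d:
    existing = os.path.join(d, "config-3")
    with open(existing, "w") as f:
        f.write("# stand-in for a distro config\n")

    # Only the third default exists, as in the example above.
    defaults = [
        os.path.join(d, "missing-1"),
        os.path.join(d, "missing-2"),
        existing,
    ]
    used = first_openable(defaults)  # what is actually read
    written = defaults[0]            # first visible default, as recorded before this change

print("recorded value matches file used:", used == written)
```

The value recorded in .config and the file actually read are picked by two different rules, which is why they can disagree.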
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
Linus Torvalds [Thu, 1 Mar 2018 23:56:15 +0000 (15:56 -0800)]
Merge tag 'drm-fixes-for-v4.16-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Pretty much run of the mill drm fixes.
amdgpu:
- power management fixes
- some display fixes
- one ppc 32-bit dma fix
i915:
- two display fixes
- three gem fixes
sun4i:
- display regression fixes
nouveau:
- display regression fix
virtio-gpu:
- dumb airlied ioctl fix"
* tag 'drm-fixes-for-v4.16-rc4' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm/amdgpu: skip ECC for SRIOV in gmc late_init
drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
drm/amdgpu: fix&cleanups for wb_clear
drm/amdgpu: Correct sdma_v4 get_wptr(v2)
drm/amd/powerplay: fix power over limit on Fiji
drm/amdgpu:Fixed wrong emit frame size for enc
drm/amdgpu: move WB_FREE to correct place
drm/amdgpu: only flush hotplug work without DC
drm/amd/display: check for ipp before calling cursor operations
drm/i915: Make global seqno known in i915_gem_request_execute tracepoint
drm/i915: Clear the in-use marker on execbuf failure
drm/i915/cnl: Fix PORT_TX_DW5/7 register address
drm/i915/audio: fix check for av_enc_map overflow
drm/i915: Fix rsvd2 mask when out-fence is returned
virtio-gpu: fix ioctl and expose the fixed status to userspace.
drm/sun4i: Protect the TCON pixel clocks
drm/sun4i: Enable the output on the pins (tcon0)
drm/nouveau: prefer XBGR2101010 for addfb ioctl
drm/radeon: insist on 32-bit DMA for Cedar on PPC64/PPC64LE
drm/amd/display: VGA black screen from s3 when attached to hook
...
Linus Torvalds [Thu, 1 Mar 2018 22:32:23 +0000 (14:32 -0800)]
Merge tag 'arc-4.15-rc4' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- MCIP aka ARconnect fixes for SMP builds [Euginey]
- preventive fix for SLC (L2 cache) flushing [Euginey]
- Kconfig default fix [Ulf Magnusson]
- trailing semicolon fixes [Luis de Bethencourt]
- other assorted minor fixes
* tag 'arc-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: setup cpu possible mask according to possible-cpus dts property
ARC: mcip: update MCIP debug mask when the new cpu came online
ARC: mcip: halt GFRC counter when ARC cores halt
ARCv2: boot log: fix HS48 release number
arc: dts: use 'atmel' as manufacturer for at24 in axs10x_mb
ARC: Fix malformed ARC_EMUL_UNALIGNED default
ARC: boot log: Fix trailing semicolon
ARC: dw2 unwind: Fix trailing semicolon
ARC: Enable fatal signals on boot for dev platforms
ARCv2: Don't pretend we may set L-bit in STATUS32 with kflag instruction
ARCv2: cache: fix slc_entire_op: flush only instead of flush-n-inv
Radim Krčmář [Thu, 1 Mar 2018 14:24:25 +0000 (15:24 +0100)]
KVM: x86: fix vcpu initialization with userspace lapic
Moving the code around broke this rare configuration.
Use this opportunity to finally call lapic reset from vcpu reset.
Reported-by: syzbot+fb7a33a4b6c35007a72b@syzkaller.appspotmail.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 0b2e9904c159 ("KVM: x86: move LAPIC initialization after VMCS creation")
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Wanpeng Li [Wed, 28 Feb 2018 06:03:31 +0000 (14:03 +0800)]
KVM: X86: Allow userspace to define the microcode version
Linux (among the others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.
By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.
Suggested-by: Filippo Sironi <sironi@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Wanpeng Li [Wed, 28 Feb 2018 06:03:30 +0000 (14:03 +0800)]
KVM: X86: Introduce kvm_get_msr_feature()
Introduce kvm_get_msr_feature() to handle the msrs which are supported
by different vendors and sharing the same emulation logic.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Linus Torvalds [Thu, 1 Mar 2018 18:50:01 +0000 (10:50 -0800)]
Merge tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform drivers fixes from Andy Shevchenko:
- fix a regression on laptops like Dell XPS 9360 where keyboard stopped
working.
- correct sysfs wakeup attribute after removal of some drivers to
reflect that they are not able to wake system up anymore.
* tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: wmi: Fix misuse of vsprintf extension %pULL
platform/x86: intel-hid: Reset wakeup capable flag on removal
platform/x86: intel-vbtn: Reset wakeup capable flag on removal
platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's
Linus Torvalds [Thu, 1 Mar 2018 18:08:47 +0000 (10:08 -0800)]
Merge branch 'for-next' of git://git./linux/kernel/git/shli/md
Pull MD bugfixes from Shaohua Li:
- fix raid5-ppl flush request handling hang from Artur
- fix a potential deadlock in raid5/10 reshape from BingJing
- fix a deadlock for dm-raid from Heinz
- fix two md-cluster of raid10 from Lidong and Guoqing
- fix a NULL dereference problem in device removal from Neil
- fix a NULL dereference problem in raid1/raid10 in specific condition
from Yufen
- other cleanup and fixes
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid1: fix NULL pointer dereference
md: fix a potential deadlock of raid5/raid10 reshape
md-cluster: choose correct label when clustered layout is not supported
md: raid5: avoid string overflow warning
raid5-ppl: fix handling flush requests
md raid10: fix NULL deference in handle_write_completed()
md: only allow remove_and_add_spares when no sync_thread running.
md: document lifetime of internal rdev pointer.
md: fix md_write_start() deadlock w/o metadata devices
MD: Free bioset when md_run fails
raid10: change the size of resync window for clustered raid
md-multipath: Use seq_putc() in multipath_status()
md/raid1: Fix trailing semicolon
md/raid5: simplify uninitialization of shrinker
Linus Torvalds [Thu, 1 Mar 2018 18:06:39 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
"Make sure that we wake up userspace loggers. This fixes a race
introduced by the console waiter logic during this merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: Wake klogd when passing console_lock owner
Joe Perches [Thu, 1 Mar 2018 16:08:23 +0000 (08:08 -0800)]
platform/x86: wmi: Fix misuse of vsprintf extension %pULL
%pULL doesn't officially exist but %pUL does.
Miscellanea:
o Add missing newlines to a couple logging messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Tom Lendacky [Fri, 23 Feb 2018 23:18:20 +0000 (00:18 +0100)]
KVM: SVM: Add MSR-based feature support for serializing LFENCE
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked. This patch will add support to allow a guest to
properly make this determination.
Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features. If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see. Support is also added to write (hypervisor only)
and read the MSR value for the guest. A write by the guest will result in
a #GP. A read by the guest will return the value as set by the host. In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.
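As a rough illustration of the rules above, here is a toy Python model; it is not the svm.c code. The constant and class names are hypothetical, and PermissionError merely stands in for the #GP a guest write would receive.

```python
MSR_F10H_DECFG = 0xC0011029        # MSR number from the description above
LFENCE_SERIALIZE_MASK = 1 << 1     # bit 1; the constant name is hypothetical


def lfence_is_serializing(decfg_value: int) -> bool:
    """Check bit 1 of the MSR value."""
    return bool(decfg_value & LFENCE_SERIALIZE_MASK)


class GuestVisibleDecfg:
    """Toy access model: the host sets the value, the guest may only read it."""

    def __init__(self) -> None:
        self.value = 0

    def host_write(self, value: int) -> None:
        self.value = value

    def guest_read(self) -> int:
        return self.value

    def guest_write(self, value: int) -> None:
        # Stand-in for injecting #GP into the guest.
        raise PermissionError("#GP: guest write to MSR_F10H_DECFG")


msr = GuestVisibleDecfg()
msr.host_write(LFENCE_SERIALIZE_MASK)
print(lfence_is_serializing(msr.guest_read()))
```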
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tom Lendacky [Wed, 21 Feb 2018 19:39:51 +0000 (13:39 -0600)]
KVM: x86: Add a framework for supporting MSR-based features
Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Linus Torvalds [Thu, 1 Mar 2018 16:31:23 +0000 (08:31 -0800)]
Merge tag 'sound-4.16-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only core change is the fix for possible memory corruption by ALSA
ctl API since 4.14 kernel due to a thinko.
The rest are all device-specific: in addition to the usual suspects
(HD-audio and USB-audio fixups), a few LPE HDMI audio fixes came in at
this time"
* tag 'sound-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: x86: Fix potential crash at error path
ALSA: x86: Fix missing spinlock and mutex initializations
ALSA: control: Fix memory corruption risk in snd_ctl_elem_read
ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
ALSA: usb-audio: Add a quirck for B&W PX headphones
ALSA: hda: Add a power_save blacklist
ALSA: x86: hdmi: Add single_port option for compatible behavior
Linus Torvalds [Thu, 1 Mar 2018 16:19:10 +0000 (08:19 -0800)]
Merge tag 'pinctrl-v4.16-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Two smallish pin control fixes: one actual code fix for the Meson and
a MAINTAINERS update.
Summary:
- fix a pin group on the Meson
- assign maintainers for Freescale/NXP pin controllers"
* tag 'pinctrl-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
MAINTAINERS: add Freescale pin controllers
pinctrl: meson-axg: adjust uart_ao_b pin group naming
Linus Torvalds [Thu, 1 Mar 2018 16:17:01 +0000 (08:17 -0800)]
Merge tag 'gpio-v4.16-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Fix up device tree properties readout caused by my own refactorings"
* tag 'gpio-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Handle deferred probing in of_find_gpio() properly
gpiolib: Keep returning EPROBE_DEFER when we should
Jiufei Xue [Tue, 27 Feb 2018 12:10:22 +0000 (20:10 +0800)]
block: fix a typo
Fix a typo in pkt_start_recovery.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jiufei Xue [Tue, 27 Feb 2018 12:10:18 +0000 (20:10 +0800)]
block: display the correct diskname for bio
bio_devname uses __bdevname to display the device name, and can
only show the major and minor of part0.
Fix this by using disk_name to display the correct name.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jiufei Xue [Tue, 27 Feb 2018 12:10:03 +0000 (20:10 +0800)]
block: fix the count of PGPGOUT for WRITE_SAME
The vm counters are counted in sectors, so we should do the conversion
in submit_bio.
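A small Python sketch of the unit conversion, assuming the usual 512-byte sector (a shift by 9), which this message does not spell out. The helper names are hypothetical and the sketch is not the submit_bio() code:

```python
SECTOR_SHIFT = 9  # assumption: sectors are 512-byte units


def bytes_to_sectors(nbytes: int) -> int:
    """Convert a byte count to a count of 512-byte sectors."""
    return nbytes >> SECTOR_SHIFT


def write_same_count(logical_block_size: int) -> int:
    # Hypothetical helper: if the amount accounted for a WRITE_SAME request
    # starts out as a size in bytes, convert it before adding it to a
    # counter that is kept in sectors.
    return bytes_to_sectors(logical_block_size)


print(write_same_count(4096))
```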
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengguang Xu [Thu, 1 Mar 2018 06:24:51 +0000 (14:24 +0800)]
ceph: fix potential memory leak in init_caches()
The cache destroy operation for ceph_file_cachep is missing
when fscache registration fails.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Damien Le Moal [Wed, 28 Feb 2018 17:35:29 +0000 (09:35 -0800)]
mq-deadline: Make sure to always unlock zones
In case of a failed write request (all retries failed) and when using
libata, the SCSI error handler calls scsi_finish_command(). In the
case of blk-mq this means that scsi_mq_done() does not get called,
that blk_mq_complete_request() does not get called and also that the
mq-deadline .completed_request() method is not called. This results in
the target zone of the failed write request being left in a locked
state, preventing that any new write requests are issued to the same
zone.
Fix this by replacing the .completed_request() method with the
.finish_request() method as this method is always called whether or
not a request completes successfully. Since the .finish_request()
method is only called by the blk-mq core if a .prepare_request()
method exists, add a dummy .prepare_request() method.
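The difference between the two hooks can be sketched with a toy Python model. It is not the mq-deadline code: ZoneLockModel and run_failed_write() are hypothetical, and the model only encodes the behaviour stated above (the completion hook is skipped on the failed-write path, the finish hook always runs).

```python
class ZoneLockModel:
    """Toy scheduler that allows one in-flight write per zone."""

    def __init__(self) -> None:
        self.locked = set()

    def dispatch_write(self, zone: int) -> bool:
        """Lock the zone and dispatch; refuse if the zone is already locked."""
        if zone in self.locked:
            return False
        self.locked.add(zone)
        return True

    def completed_request(self, zone: int) -> None:
        # Runs only when the request completes through the normal path.
        self.locked.discard(zone)

    def finish_request(self, zone: int) -> None:
        # Runs whether or not the request completed successfully.
        self.locked.discard(zone)


def run_failed_write(sched: ZoneLockModel, zone: int, unlock_in: str) -> bool:
    """Simulate a failed write, then report whether a new write can be issued."""
    sched.dispatch_write(zone)
    # Failed-write path: the completion hook is never invoked, so a scheduler
    # that unlocks there leaves the zone locked. The finish hook still runs.
    if unlock_in == "finish":
        sched.finish_request(zone)
    return sched.dispatch_write(zone)


print(run_failed_write(ZoneLockModel(), 0, "completed"))
print(run_failed_write(ZoneLockModel(), 0, "finish"))
```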
Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[ bvanassche: edited patch description ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masahiro Yamada [Fri, 16 Feb 2018 18:38:31 +0000 (03:38 +0900)]
kconfig: add xstrdup() helper
We already have xmalloc(), xcalloc(), and xrealloc(). Add xstrdup()
as well to save tedious error handling.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Luc Van Oostenryck [Thu, 15 Feb 2018 21:07:50 +0000 (22:07 +0100)]
kbuild: disable sparse warnings about unknown attributes
Currently, sparse issues warnings on code using an attribute
it doesn't know about.
One problem with this is that these warnings have no value for
the developer; they are just noise. At best these
warnings tell something about some deficiencies of sparse itself
but not about a potential problem with the code analyzed.
A second problem with this is that sparse releases are, alas,
less frequent than new attributes are added to GCC.
So, avoid the noise by asking sparse to not warn about
attributes it doesn't know about.
Reference: https://marc.info/?l=linux-sparse&m=151871600016790
Reference: https://marc.info/?l=linux-sparse&m=151871725417322
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Ulf Magnusson [Tue, 13 Feb 2018 07:58:20 +0000 (08:58 +0100)]
Makefile: Fix lying comment re. silentoldconfig
The comment above the silentoldconfig invocation is outdated.
'make oldconfig' updates just .config and doesn't touch the
include/config/ tree.
This came up in https://lkml.org/lkml/2018/2/12/415.
While fixing the comment, make it more informative by explaining the
purpose of the unfortunately named silentoldconfig.
I can't make sense of the comment re. auto.conf.cmd and a cleaned tree.
include/config/auto.conf and include/config/auto.conf.cmd are both
created simultaneously by silentoldconfig (in
scripts/kconfig/confdata.c, by conf_write_autoconf()), and nothing seems
to remove auto.conf.cmd that wouldn't remove auto.conf. Remove that part
of the comment rather than blindly copying it. It might be a leftover
from an older way of doing things.
The include/config/auto.conf.cmd prerequisite might be there to ensure
that silentoldconfig gets rerun if conf_write_autoconf() fails between
writing out auto.conf.cmd and auto.conf (a comment in the function
indicates that auto.conf is deliberately written out last to mark
completion of the operation). It seems the Makefile dependency between
include/config/auto.conf and .config would already take care of that
though, since include/config/auto.conf would still be out of date re.
.config if the operation fails.
Cop out and leave the prerequisite in for now.
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Filipe Manana [Wed, 28 Feb 2018 15:56:10 +0000 (15:56 +0000)]
Btrfs: fix log replay failure after unlink and link combination
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/testdir
$ touch /mnt/testdir/foo
$ ln /mnt/testdir/foo /mnt/testdir/bar
$ sync
$ unlink /mnt/testdir/bar
$ touch /mnt/testdir/bar
$ xfs_io -c "fsync" /mnt/testdir/bar
<power failure>
$ mount /dev/sdb /mnt
mount: mount(2) failed: /mnt: No such file or directory
When replaying the log, for that example, we also see the following in
dmesg/syslog:
[71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
[71813.674204] ------------[ cut here ]------------
[71813.675694] BTRFS: Transaction aborted (error -2)
[71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
[71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1
[71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
[71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
[71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
[71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
[71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
[71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
[71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
[71813.679669] Call Trace:
[71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs]
[71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs]
[71813.679669] add_inode_ref+0x71e/0x851 [btrfs]
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs]
[71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs]
[71813.679669] ? rcu_read_unlock+0x3a/0x57
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs]
[71813.679669] walk_log_tree+0xad/0x188 [btrfs]
[71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
[71813.679669] ? replay_one_extent+0x544/0x544 [btrfs]
[71813.679669] open_ctree+0x1cf6/0x2209 [btrfs]
[71813.679669] btrfs_mount_root+0x368/0x482 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] btrfs_mount+0x13e/0x772 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] do_mount+0x6e5/0x973
[71813.679669] ? memdup_user+0x3e/0x5c
[71813.679669] SyS_mount+0x72/0x98
[71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b
[71813.679669] RIP: 0033:0x7f7cedf150ba
[71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
[71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
[71813.679669] ---[ end trace 83bd473fc5b4663b ]---
[71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
[71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
[71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
[71814.128078] BTRFS error (device dm-0): open_ctree failed
This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") with the one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.
Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.
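The failure and the fix described above can be illustrated with a toy model. This is only a sketch, not the btrfs implementation: the dictionaries `refs` and `dir_index` and the helpers `unlink`, `replay_ref` and `replay` are hypothetical stand-ins for the inode reference items, the parent directory's index keys and the replay code.

```python
import errno

def unlink(refs, dir_index, inode, name):
    # Hypothetical stand-in for btrfs_unlink_inode(): it fails with ENOENT
    # when the inode reference item no longer lists the name.
    if name not in refs.get(inode, set()):
        raise OSError(errno.ENOENT, "No such entry")
    refs[inode].discard(name)
    del dir_index[name]

def replay_ref(refs, dir_index, inode, log_names, fixed):
    if fixed:
        # The fix: drop names present in the subvolume item but absent
        # from the log item before overwriting the subvolume item.
        for name in sorted(refs.get(inode, set()) - set(log_names)):
            unlink(refs, dir_index, inode, name)
    for name in log_names:
        owner = dir_index.get(name)
        if owner is not None and owner != inode:
            # A dir index entry for this name points at another inode.
            unlink(refs, dir_index, owner, name)
    refs[inode] = set(log_names)  # overwrite with the item from the log
    for name in log_names:
        dir_index[name] = inode

def replay(fixed):
    refs = {258: {"foo", "bar"}}              # subvolume tree state
    dir_index = {"foo": 258, "bar": 258}      # parent directory entries
    replay_ref(refs, dir_index, 258, ["foo"], fixed)
    replay_ref(refs, dir_index, 259, ["bar"], fixed)
    return dir_index
```

In this model `replay(fixed=False)` raises an ENOENT error while handling inode 259, because the stale "bar" entry still points at inode 258, and `replay(fixed=True)` completes with "foo" owned by 258 and "bar" owned by 259.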
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 28 Feb 2018 15:55:40 +0000 (15:55 +0000)]
Btrfs: fix log replay failure after linking special file and fsync
If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed, and
when attempting to replay it, an EEXIST error is returned and mounting
the filesystem fails. Example scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ mkfifo /mnt/testdir/foo
# Make sure everything done so far is durably persisted.
$ sync
# Create some unrelated file and fsync it, this is just to create a log
# tree. The file must be in the same directory as our special file.
$ touch /mnt/testdir/f1
$ xfs_io -c "fsync" /mnt/testdir/f1
# Rename our special file and then create a hard link with its old name.
$ mv /mnt/testdir/foo /mnt/testdir/bar
$ ln /mnt/testdir/bar /mnt/testdir/foo
# Create some other unrelated file and fsync it, this is just to persist
# the log tree which was modified by the previous rename and link
# operations. Alternatively we could have modified file f1 and fsync it.
$ touch /mnt/f2
$ xfs_io -c "fsync" /mnt/f2
<power failure>
$ mount /dev/sdc /mnt
mount: mount /dev/sdc on /mnt failed: File exists
This happens because both the log tree and the subvolume's tree have
an entry in the directory "testdir" with the same name, that is, there
is one key (258 INODE_REF 257) in the subvolume tree and another one in
the log tree (where 258 is the inode number of our special file and 257
is the inode for directory "testdir"). Only the data of those two keys
differs: in the subvolume tree the index field of the inode reference has
a value of 3, while in the log tree it has a value of 5. Because the same
key exists in both trees but with a different index, the log replay fails
with an -EEXIST error when attempting to replay the inode reference from
the log tree.
Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.
A new generic test case for fstests was also submitted.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 6 Feb 2018 20:39:20 +0000 (20:39 +0000)]
Btrfs: send, fix issuing write op when processing hole in no data mode
When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.
Trivial reproducer:
$ mkfs.btrfs -f -O no-holes /dev/sdc
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdc /mnt/sdc
$ mount /dev/sdd /mnt/sdd
$ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1
$ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2
$ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
$ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
| btrfs receive -vv /mnt/sdd
Before this change the output of the second receive command is:
receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
utimes
write foobar, offset 8192, len 8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...
After this change it is:
receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
utimes
update_extent foobar: offset=8192, len=8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Thu, 22 Feb 2018 13:58:42 +0000 (21:58 +0800)]
btrfs: use proper endianness accessors for super_copy
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.
Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on hosts of the same endianness, this will not be a
problem.
Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.
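As a rough illustration of why the accessors matter, the sketch below packs a value the way btrfs stores integers on disk (little-endian) and then reads it back both correctly and the way a raw load would see it on a big-endian host. The helper names and the nodesize of 16384 are assumptions chosen for the example; the real kernel code uses the btrfs_super_*() and btrfs_set_super_*() helpers.

```python
import struct

NODESIZE = 16384
# On-disk representation: 32-bit little-endian.
on_disk = struct.pack("<I", NODESIZE)

def read_with_accessor(raw):
    # What an endianness-aware accessor does: always decode little-endian.
    return struct.unpack("<I", raw)[0]

def read_raw_on_big_endian_host(raw):
    # What a plain member read amounts to on a big-endian CPU.
    return struct.unpack(">I", raw)[0]

print(read_with_accessor(on_disk))
print(read_raw_on_big_endian_host(on_disk))
```

The first read yields 16384 on any host, while the second yields 4194304, which is the kind of bogus number the changelog says would show up in sysfs.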
CC: stable@vger.kernel.org
Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Hans van Kranenburg [Mon, 5 Feb 2018 16:45:11 +0000 (17:45 +0100)]
btrfs: alloc_chunk: fix DUP stripe size handling
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.
The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.
Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the intended amount, the check will reduce
the stripe_size to max_chunk_size whenever the correct stripe_size is
more than half of max_chunk_size.
The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.
Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.
Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.
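The arithmetic can be sketched as follows. This is a simplified model under stated assumptions, not the allocator code: `stripe_size_old` and `stripe_size_new` are hypothetical helpers that keep only the max chunk size check and the division by dev_stripes, and the 1 GiB limit is an example value.

```python
GiB = 1 << 30
MiB = 1 << 20

def stripe_size_old(max_avail, dev_stripes, data_stripes, max_chunk_size):
    # Old order: the check sees max_avail, which for DUP covers two stripes.
    stripe_size = max_avail
    if stripe_size * data_stripes > max_chunk_size:
        stripe_size = max_chunk_size // data_stripes
    return stripe_size // dev_stripes  # late "duct tape" division

def stripe_size_new(max_avail, dev_stripes, data_stripes, max_chunk_size):
    # New order: divide first, so the check sees the real stripe size.
    stripe_size = max_avail // dev_stripes
    if stripe_size * data_stripes > max_chunk_size:
        stripe_size = max_chunk_size // data_stripes
    return stripe_size

# DUP: two stripes on one device, one of them holding data.
dup = dict(dev_stripes=2, data_stripes=1, max_chunk_size=1 * GiB)
print(stripe_size_old(4 * GiB, **dup) // MiB)
print(stripe_size_new(4 * GiB, **dup) // MiB)
```

With 4 GiB of unallocated space the old order ends up at 512 MiB and the new order at 1024 MiB, which matches the doubling of the DUP chunk size mentioned in the changelog. With dev_stripes set to 1 both functions return the same value.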
Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.
The real problem was already introduced with the rest of the code in 73c5de0051.
The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 31 Jan 2018 15:14:02 +0000 (17:14 +0200)]
btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Mon, 8 Jan 2018 08:59:43 +0000 (10:59 +0200)]
btrfs: handle failure of add_pending_csums
add_pending_csums was added as part of the new data=ordered
implementation in e6dcd2dc9c48 ("Btrfs: New data=ordered implementation").
Even back then it called btrfs_csum_file_blocks, which can fail, but it
never bothered handling the failure. In an ENOMEM situation this could
lead to the filesystem failing to write the checksums for a particular
extent and not detecting this. On read this could lead to the filesystem
erroring out due to a crc mismatch. Fix it by propagating failures from
add_pending_csums and handling them.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Fri, 16 Feb 2018 03:59:47 +0000 (22:59 -0500)]
btrfs: use kvzalloc to allocate btrfs_fs_info
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.
There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.
As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rafael J. Wysocki [Wed, 28 Feb 2018 11:10:59 +0000 (12:10 +0100)]
platform/x86: intel-hid: Reset wakeup capable flag on removal
The intel-hid device will not be able to wake up the system any more
after removing the notify handler provided by its driver, so make
its sysfs attributes reflect that.
Fixes: ef884112e55c (platform: x86: intel-hid: Wake up the system from suspend-to-idle)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Rafael J. Wysocki [Wed, 28 Feb 2018 11:09:56 +0000 (12:09 +0100)]
platform/x86: intel-vbtn: Reset wakeup capable flag on removal
The intel-vbtn device will not be able to wake up the system any more
after removing the notify handler provided by its driver, so make
its sysfs attributes reflect that.
Fixes: 91f9e850d465 (platform: x86: intel-vbtn: Wake up the system from suspend-to-idle)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Thomas Gleixner [Wed, 28 Feb 2018 20:14:26 +0000 (21:14 +0100)]
x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.
This results in suspend/resume failures because 32bit utilizes initial page
table for resume. The absence of the cpu_entry_area mapping results in a
triple fault, aka. insta reboot.
With PAE enabled this works by chance because the PGD entry which covers
the fixmap and other parts incidentally provides the cpu_entry_area
mapping as well.
Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.
It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Cc: William Grant <william.grant@canonical.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
Stefano Stabellini [Thu, 1 Mar 2018 02:05:34 +0000 (18:05 -0800)]
pvcalls-front: 64-bit align flags
We are using test_and_* operations on the status and flag fields of
struct sock_mapping. However, these functions require the operand to be
64-bit aligned on arm64. Currently, only status is 64-bit aligned.
Make status and flags explicitly 64-bit aligned.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Dave Airlie [Thu, 1 Mar 2018 04:03:14 +0000 (14:03 +1000)]
Merge branch 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few misc fixes for 4.16.
* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: skip ECC for SRIOV in gmc late_init
drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
drm/amdgpu: fix&cleanups for wb_clear
drm/amdgpu: Correct sdma_v4 get_wptr(v2)
drm/amd/powerplay: fix power over limit on Fiji
drm/amdgpu:Fixed wrong emit frame size for enc
drm/amdgpu: move WB_FREE to correct place
drm/amdgpu: only flush hotplug work without DC
drm/amd/display: check for ipp before calling cursor operations
Dave Airlie [Thu, 1 Mar 2018 04:02:32 +0000 (14:02 +1000)]
Merge tag 'drm-misc-fixes-2018-02-28' of git://people.freedesktop.org/drm-misc into drm-fixes
Two regression fixes here: a fb format regression on nouveau and a 4.16-rc1
regression on LVDS with one sun4i device. Plus a sun4i and a virtio-gpu
fix.
* tag 'drm-misc-fixes-2018-02-28' of git://people.freedesktop.org/drm-misc:
virtio-gpu: fix ioctl and expose the fixed status to userspace.
drm/sun4i: Protect the TCON pixel clocks
drm/sun4i: Enable the output on the pins (tcon0)
drm/nouveau: prefer XBGR2101010 for addfb ioctl
Dave Airlie [Thu, 1 Mar 2018 03:59:21 +0000 (13:59 +1000)]
Merge tag 'drm-intel-fixes-2018-02-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- 2 display fixes: audio av_enc_map overflow check, and Cannonlake PLL related register offset.
- 3 gem fixes: Clear for in-fence out-fence, fix for clearing exec_flags on execbuf failure, and add back global seqno to tracepoints that had been removed recently by other fence related patch.
* tag 'drm-intel-fixes-2018-02-28' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Make global seqno known in i915_gem_request_execute tracepoint
drm/i915: Clear the in-use marker on execbuf failure
drm/i915/cnl: Fix PORT_TX_DW5/7 register address
drm/i915/audio: fix check for av_enc_map overflow
drm/i915: Fix rsvd2 mask when out-fence is returned
Linus Torvalds [Thu, 1 Mar 2018 00:11:04 +0000 (16:11 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is the first set of bugfixes for ARM SoCs, fixing a couple of
stability problems, mostly on TI OMAP and Rockchips platforms:
- OMAP2 hwmod clocks must be enabled in the correct order
- OMAP3 Wakeup from resume through PRM IRQ was unreliable
- one regression on OMAP5 caused by a kexec fix
- Rockchip ethernet needs some settings for stable operation on
Rock64
- Rockchip based Chromebook Plus needs another clock setting for
stable display suspend/resume
- Rockchip based phyCORE-RK3288 was able to run at an invalid CPU
clock frequency
- Rockchip MMC link was sometimes unreliable
- multiple fixes to avoid crashes in the Broadcom STB DPFE driver
Other minor changes include:
- Devicetree fixes for incorrect hardware description (rockchip,
omap, Gemini, amlogic)
- some MAINTAINERS file updates to correct email and git addresses
- some fixes addressing 'make W=1' dtc warnings (broadcom, amlogic,
cavium, qualcomm, hisilicon, zx)
- fixes for LTO-compilation (orion, davinci, clps711x)
- one fix for an incorrect Kconfig errata selection
- a memory leak in the OMAP timer driver
- a kernel data leak in OMAP1 debugfs files"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
MAINTAINERS: update entries for ARM/STM32
ARM: dts: bcm283x: Move arm-pmu out of soc node
ARM: dts: bcm283x: Fix unit address of local_intc
ARM: dts: NSP: Fix amount of RAM on BCM958625HR
ARM: dts: Set D-Link DNS-313 SATA to muxmode 0
ARM: omap2: set CONFIG_LIRC=y in defconfig
ARM: dts: imx6dl: Include correct dtsi file for Engicam i.CoreM6 DualLite/Solo RQS
memory: brcmstb: dpfe: support new way of passing data from the DCPU
memory: brcmstb: dpfe: fix type declaration of variable "ret"
memory: brcmstb: dpfe: properly mask vendor error bits
ARM: BCM: dts: Remove leading 0x and 0s from bindings notation
ARM: orion: fix orion_ge00_switch_board_info initialization
ARM: davinci: mark spi_board_info arrays as const
ARM: clps711x: mark clps711x_compat as const
arm: zx: dts: Remove leading 0x and 0s from bindings notation
arm64: dts: Remove leading 0x and 0s from bindings notation
arm64: dts: cavium: fix PCI bus dtc warnings
MAINTAINERS: ARM: at91: update my email address
soc: imx: gpc: de-register power domains only if initialized
ARM: dts: rockchip: Fix DWMMC clocks
...
Linus Torvalds [Wed, 28 Feb 2018 22:55:07 +0000 (14:55 -0800)]
Merge tag 'riscv-for-linus-4.16-rc4_smp_mb' of git://git./linux/kernel/git/palmer/riscv-linux
Pull RISC-V fix from Palmer Dabbelt:
"This week we have a single fix: replacing smp_mb() with __smp_mb().
We were the only architecture with smp_mb() and it appears to just be
clearly wrong, so I think this is a pretty safe patch for an RC"
* tag 'riscv-for-linus-4.16-rc4_smp_mb' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
riscv/barrier: Define __smp_{mb,rmb,wmb}
Lingutla Chandrasekhar [Thu, 18 Jan 2018 11:50:22 +0000 (17:20 +0530)]
timers: Forward timer base before migrating timers
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.
If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.
In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base->clk.
But there is a worse problem than that. The following sequence of events
illustrates it:
- CPU0 timer1 is queued expires = 59969 and base->clk = 59131.
The timer is queued at wheel level 2, with resulting expiry time = 60032
(due to level granularity).
- CPU1 enters idle @60007, with next timer expiry @60020.
- CPU0 is hot-unplugged @60009
- CPU1 exits idle and runs the control thread which migrates the
timers from CPU0
timer1 is now queued in level 0 for immediate handling in the next
softirq because the requested expiry time 59969 is before CPU1 base->clk
60007
- CPU1 runs code which forwards the base clock which succeeds because the
next expiring timer, which was collected at idle entry time, is still set
to 60020.
So it forwards beyond 60007 and therefore misses to expire the migrated
timer1. That timer gets expired when the wheel wraps around again, which
takes between 63 and 630ms depending on the HZ setting.
Address both problems by invoking forward_timer_base() for the control CPU's
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.
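The numbers in the sequence above can be reproduced with a small model of the timer wheel indexing. It is a sketch under assumptions, not the kernel code: it uses the wheel constants from kernel/time/timer.c (64 buckets per level, a level shift of 3 bits), ignores the upper level cap, and the helper names are made up for the example. The assumption that the clock ends up at 60020 after forwarding is taken from the next timer expiry quoted in the changelog.

```python
LVL_CLK_SHIFT = 3
LVL_BITS = 6
LVL_SIZE = 1 << LVL_BITS      # 64 buckets per wheel level
LVL_MASK = LVL_SIZE - 1

def lvl_start(n):
    # First delta (in jiffies) that no longer fits into level n - 1.
    return (LVL_SIZE - 1) << ((n - 1) * LVL_CLK_SHIFT)

def wheel_level(expires, clk):
    delta = expires - clk
    if delta < 0:
        return 0              # already expired: queued for immediate handling
    lvl = 0
    while delta >= lvl_start(lvl + 1):
        lvl += 1
    return lvl

def effective_expiry(expires, clk):
    if expires < clk:
        return clk
    shift = wheel_level(expires, clk) * LVL_CLK_SHIFT
    # Round up to the granularity of the chosen level.
    return ((expires + (1 << shift)) >> shift) << shift

def next_visit(bucket, clk):
    # Next clock value at which a level 0 bucket is processed.
    return clk + ((bucket - clk) % LVL_SIZE)

# timer1 as queued on CPU0.
print(wheel_level(59969, 59131), effective_expiry(59969, 59131))
# timer1 migrated to CPU1 with the stale clock 60007, clock then at 60020.
print(next_visit(60007 & LVL_MASK, 60020))
```

The first line prints level 2 and expiry 60032, as in the changelog. The second prints 60071: the bucket chosen with the stale clock is only visited again after the level 0 wheel wraps, 64 jiffies after 60007. Forwarding the base clock before queueing puts the timer into the bucket of the current clock value instead.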
[ tglx: Massaged comment and changelog ]
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
Arnd Bergmann [Wed, 28 Feb 2018 22:27:21 +0000 (23:27 +0100)]
Merge tag 'arm-soc/for-4.16/drivers-fixes' of https://github.com/Broadcom/stblinux into fixes
Pull "Broadcom drivers fixes for 4.16" from Florian Fainelli:
This pull request contains Broadcom SoCs drivers fixes for 4.16, please
pull the following:
- Markus provides two minor fixes to the Broadcom STB DPFE driver, one
to properly mask bits, and a second one to use the correct type. The
third commit is a consequence of a newer DPFE firmware which would
unfortunately crash without appropriate kernel changes.
* tag 'arm-soc/for-4.16/drivers-fixes' of https://github.com/Broadcom/stblinux:
memory: brcmstb: dpfe: support new way of passing data from the DCPU
memory: brcmstb: dpfe: fix type declaration of variable "ret"
memory: brcmstb: dpfe: properly mask vendor error bits