review.tizen.org Git - platform/kernel/linux-exynos.git/log

i40e: Use smp_rmb rather than read_barrier_depends

commit 52c6912fde0133981ee50ba08808f257829c4c93 upstream.

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40e as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

spi-nor: intel-spi: Fix broken software sequencing codes

commit 9d63f17661e25fd28714dac94bdebc4ff5b75f09 upstream.

There are two bugs in current intel_spi_sw_cycle():

- The 'data byte count' field should be the number of bytes
transferred minus 1
- SSFSTS_CTL is the offset from ispi->sregs, not ispi->base

Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFC: fix device-allocation error return

commit c45e3e4c5b134b081e8af362109905427967eb19 upstream.

A recent change fixing NFC device allocation itself introduced an
error-handling bug by returning an error pointer in case device-id
allocation failed. This is clearly broken as the callers still expected
NULL to be returned on errors as detected by Dan's static checker.

Fix this up by returning NULL in the event that we've run out of memory
when allocating a new device id.

Note that the offending commit is marked for stable (3.8) so this fix
needs to be backported along with it.

Fixes: 20777bc57c34 ("NFC: fix broken device allocation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/core: Only maintain real QPs in the security lists

commit 877add28178a7fa3c68f29c450d050a8e6513f08 upstream.

When modify QP is called on a shared QP update the security context for
the real QP. When security is subsequently enforced the shared QP
handles will be checked as well.

Without this change shared QP handles get added to the port/pkey lists,
which is a bug, because not all shared QP handles will be checked for
access. Also the shared QP security context wouldn't get removed from
the port/pkey lists causing access to free memory and list corruption
when they are destroyed.

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/core: Avoid crash on pkey enforcement failed in received MADs

commit 89548bcafec7ecfeea58c553f0834b5d575a66eb upstream.

Below kernel crash is observed when Pkey security enforcement fails on
received MADs. This issue is reported in [1].

ib_free_recv_mad() accesses the rmpp_list, whose initialization is
needed before accessing it.
When security enformcent fails on received MADs, MAD processing avoided
due to security checks failed.

OpenSM[3770]: SM port is down
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: PGD 0
kernel: P4D 0
kernel:
kernel: Oops: 0002 [#1] SMP
kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
kernel: Call Trace:
kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
kernel:  process_one_work+0x1e9/0x410
kernel:  worker_thread+0x4b/0x410
kernel:  kthread+0x109/0x140
kernel:  ? process_one_work+0x410/0x410
kernel:  ? kthread_create_on_node+0x70/0x70
kernel:  ? SyS_exit_group+0x14/0x20
kernel:  ret_from_fork+0x25/0x30
kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
kernel: CR2: 0000000000000008

[1] : https://www.spinics.net/lists/linux-rdma/msg56190.html

Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/srp: Avoid that a cable pull can trigger a kernel crash

commit 8a0d18c62121d3c554a83eb96e2752861d84d937 upstream.

This patch fixes the following kernel crash:

general protection fault: 0000 [#1] PREEMPT SMP
Workqueue: ib_mad2 timeout_sends [ib_core]
Call Trace:
ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core]
send_handler+0xb2/0xd0 [ib_core]
timeout_sends+0x14d/0x220 [ib_core]
process_one_work+0x200/0x630
worker_thread+0x4e/0x3b0
kthread+0x113/0x150

Fixes: commit aef9ec39c47f ("IB: Add SCSI RDMA Protocol (SRP) initiator")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/hfi1: Fix incorrect available receive user context count

commit d7d626179fb283aba73699071af0df6d00e32138 upstream.

The addition of the VNIC contexts to num_rcv_contexts changes the
meaning of the sysfs value nctxts from available user contexts, to
user contexts + reserved VNIC contexts.

User applications that use nctxts are now broken.

Update the calculation so that VNIC contexts are used only if there are
hardware contexts available, and do not silently affect nctxts.

Update code to use the calculated VNIC context number.

Update the sysfs value nctxts to be available user contexts only.

Fixes: 2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Niranjana Vishwanathapura <Niranjana.Vishwanathapura@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/cm: Fix memory corruption in handling CM request

commit 5a3dc32372439eb9a0d6027c54cbfff64803fce5 upstream.

In recent code, two path record entries are alwasy cleared while
allocated could be either one or two path record entries.
This leads to zero out of unallocated memory.

This fix initializes alternative path record only when alternative path
is set.

While we are at it, path record allocation doesn't check for OPA
alternative path, but rest of the code checks for OPA alternative path.
Path record allocation code doesn't check for OPA alternative LID.
This can further lead to memory corruption when only one path record is
allocated, but there is actually alternative OPA path record present in CM
request.

Fixes: 9fdca4da4d8c ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/srpt: Do not accept invalid initiator port names

commit c70ca38960399a63d5c048b7b700612ea321d17e upstream.

Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.

Fixes: commit a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

svcrdma: Preserve CB send buffer across retransmits

commit 0bad47cada5defba13e98827d22d06f13258dfb3 upstream.

During each NFSv4 callback Call, an RDMA Send completion frees the
page that contains the RPC Call message. If the upper layer
determines that a retransmit is necessary, this is too soon.

One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
callback request, the following BUG fires on the NFS server:

kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2
kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0
kernel: flags: 0x2fffff80000000()
kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: Call Trace:
kernel: dump_stack+0x62/0x80
kernel: bad_page+0xfe/0x11a
kernel: free_pages_check_bad+0x76/0x78
kernel: free_pcppages_bulk+0x364/0x441
kernel: ? ttwu_do_activate.isra.61+0x71/0x78
kernel: free_hot_cold_page+0x1c5/0x202
kernel: __put_page+0x2c/0x36
kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]

This issue exists all the way back to v4.5, but refactoring and code
re-organization prevents this simple patch from applying to kernels
older than v4.12. The fix is the same, however, if someone needs to
backport it.

Reported-by: Ben Coddington <bcodding@redhat.com>
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libnvdimm, namespace: make 'resource' attribute only readable by root

commit c1fb3542074fd0c4d901d778bd52455111e4eb6f upstream.

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for
namespace devices only readable by root. Otherwise we disclose physical
address information.

Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libnvdimm, region : make 'resource' attribute only readable by root

commit b8ff981f88df03c72a4de2f6eaa9ce447a10ac03 upstream.

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for region
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: 802f4be6feee ("libnvdimm: Add 'resource' sysfs attribute to regions")
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libnvdimm, namespace: fix label initialization to use valid seq numbers

commit b18d4b8a25af6fe83d7692191d6ff962ea611c4f upstream.

The set of valid sequence numbers is {1,2,3}. The specification
indicates that an implementation should consider 0 a sign of a critical
error:

    UEFI 2.7: 13.19 NVDIMM Label Protocol

    Software never writes the sequence number 00, so a correctly
    check-summed Index Block with this sequence number probably indicates a
    critical error. When software discovers this case it treats it as an
    invalid Index Block indication.

While the expectation is that the invalid block is just thrown away, the
Robustness Principle says we should fix this to make both sequence
numbers valid.

Fixes: f524bf271a5c ("libnvdimm: write pmem label set")
Reported-by: Juston Li <juston.li@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libnvdimm, pfn: make 'resource' attribute only readable by root

commit 26417ae4fc6108f8db436f24108b08f68bdc520e upstream.

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for pfn
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: f6ed58c70d14 ("libnvdimm, pfn: 'resource'-address and 'size'...")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libnvdimm, dimm: clear 'locked' status on successful DIMM enable

commit d34cb808402898e53b9a9bcbbedd01667a78723b upstream.

If we successfully enable a DIMM then it must not be locked and we can
clear the label-read failure condition. Otherwise, we need to reload the
entire bus provider driver to achieve the same effect, and that can
disrupt unrelated DIMMs and namespaces.

Fixes: 9d62ed965118 ("libnvdimm: handle locked label storage areas")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: ti: dra7-atl-clock: fix child-node lookups

commit 33ec6dbc5a02677509d97fe36cd2105753f0f0ea upstream.

Fix child node-lookup during probe, which ended up searching the whole
device tree depth-first starting at parent rather than just matching on
its children.

Note that the original premature free of the parent node has already
been fixed separately, but that fix was apparently never backported to
stable.

Fixes: 9ac33b0ce81f ("CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)")
Fixes: 660e15519399 ("clk: ti: dra7-atl-clock: Fix of_node reference counting")
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status

commit e9d4bf219c83d09579bc62512fea2ca10f025d93 upstream.

There is no guarantee that either the request or the svc_xprt exist
by the time we get round to printing the trace message.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dax: fix general protection fault in dax_alloc_inode

commit 9f586fff6574f6ecbf323f92d44ffaf0d96225fe upstream.

Don't crash in case of allocation failure in dax_alloc_inode.

    syzkaller hit the following crash on e4880bc5dfb1

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    [..]
    RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348
    Call Trace:
    alloc_inode+0x65/0x180 fs/inode.c:208
    new_inode_pseudo+0x69/0x190 fs/inode.c:890
    new_inode+0x1c/0x40 fs/inode.c:919
    mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261
    mount_pseudo include/linux/fs.h:2137 [inline]
    dax_mount+0x2e/0x40 drivers/dax/super.c:388
    mount_fs+0x66/0x2d0 fs/super.c:1223

Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider...")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dax: fix PMD faults on zero-length files

commit 957ac8c421ad8b5eef9b17fe98e146d8311a541e upstream.

PMD faults on a zero length file on a file system mounted with -o dax
will not generate SIGBUS as expected.

fd = open(...O_TRUNC);
addr = mmap(NULL, 2*1024*1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*addr = 'a';
<expect SIGBUS>

The problem is this code in dax_iomap_pmd_fault:

max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;

If the inode size is zero, we end up with a max_pgoff that is way larger
than 0. :) Fix it by using DIV_ROUND_UP, as is done elsewhere in the
kernel.

I tested this with some simple test code that ensured that SIGBUS was
received where expected.

Fixes: 642261ac995e ("dax: add struct iomap based DAX PMD support")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kvm: vmx: Reinstate support for CPUs without virtual NMI

commit 8a1b43922d0d1279e7936ba85c4c2a870403c95f upstream.

This is more or less a revert of commit 2c82878b0cb3 ("KVM: VMX: require
virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
only had virtual NMIs in some SKUs.

The revert is not trivial because in the meanwhile there have been several
fixes to nested NMI injection.  Therefore, the entire vNMI state is moved
to struct loaded_vmcs.

Another change compared to before the patch is a simplification here:

       if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
           !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
                                       get_vmcs12(vcpu))))) {

The final condition here is always true (because nested_cpu_has_virtual_nmis
is always false) and is removed.

Fixes: 2c82878b0cb38fd516fd612c67852a6bbf282003
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: SVM: obey guest PAT

commit 15038e14724799b8c205beb5f20f9e54896013c3 upstream.

For many years some users of assigned devices have reported worse
performance on AMD processors with NPT than on AMD without NPT,
Intel or bare metal.

The reason turned out to be that SVM is discarding the guest PAT
setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-,
PA3=UC). The guest might be using a different setting, and
especially might want write combining but isn't getting it
(instead getting slow UC or UC- accesses).

Thanks a lot to geoff@hostfission.com for noticing the relation
to the g_pat setting. The patch has been tested also by a bunch
of people on VFIO users forums.

Fixes: 709ddebf81cb40e3c36c6109a7892e8b93a09464
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: nVMX: set IDTR and GDTR limits when loading L1 host state

commit 21f2d551183847bc7fbe8d866151d00cdad18752 upstream.

Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:

"The GDTR and IDTR limits are each set to FFFFH."

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled

commit 00bb6ae5006205e041ce9784c819460562351d47 upstream.

When running a guest on a POWER9 system with the in-kernel XICS
emulation disabled (for example by running QEMU with the parameter
"-machine pseries,kernel_irqchip=off"), the kernel does not pass
the XICS-related hypercalls such as H_CPPR up to userspace for
emulation there as it should.

The reason for this is that the real-mode handlers for these
hypercalls don't check whether a XICS device has been instantiated
before calling the xics-on-xive code. That code doesn't check
either, leading to potential NULL pointer dereferences because
vcpu->arch.xive_vcpu is NULL. Those dereferences won't cause an
exception in real mode but will lead to kernel memory corruption.

This fixes it by adding kvmppc_xics_enabled() checks before calling
the XICS functions.

Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lockd: double unregister of inetaddr notifiers

commit dc3033e16c59a2c4e62b31341258a5786cbcee56 upstream.

lockd_up() can call lockd_unregister_notifiers twice:
inside lockd_start_svc() when it calls lockd_svc_exit_thread()
and then in error path of lockd_up()

Patch forces lockd_start_svc() to unregister notifiers in all error cases
and removes extra unregister in error path of lockd_up().

Fixes: cb7d224f82e4 "lockd: unregister notifier blocks if the service ..."
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip/gic-v3: Fix ppi-partitions lookup

commit 00ee9a1ca5080202bc37b44e998c3b2c74d45817 upstream.

Fix child-node lookup during initialisation, which ended up searching
the whole device tree depth-first starting at the parent rather than
just matching on its children.

To make things worse, the parent gic node was prematurely freed, while
the ppi-partitions node was leaked.

Fixes: e3825ba1af3a ("irqchip/gic-v3: Add support for partitioned PPIs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

genirq: Track whether the trigger type has been set

commit 4f8413a3a799c958f7a10a6310a451e6b8aef5ad upstream.

When requesting a shared interrupt, we assume that the firmware
support code (DT or ACPI) has called irqd_set_trigger_type
already, so that we can retrieve it and check that the requester
is being reasonnable.

Unfortunately, we still have non-DT, non-ACPI systems around,
and these guys won't call irqd_set_trigger_type before requesting
the interrupt. The consequence is that we fail the request that
would have worked before.

We can either chase all these use cases (boring), or address it
in core code (easier). Let's have a per-irq_desc flag that
indicates whether irqd_set_trigger_type has been called, and
let's just check it when checking for a shared interrupt.
If it hasn't been set, just take whatever the interrupt
requester asks.

Fixes: 382bd4de6182 ("genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs")
Reported-and-tested-by: Petr Cvek <petrcvekcz@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

raid1: prevent freeze_array/wait_all_barriers deadlock

commit f6eca2d43ed694ab8124dd24c88277f7eca93b7d upstream.

If freeze_array is attempted in the middle of close_sync/
wait_all_barriers, deadlock can occur.

freeze_array will wait for nr_pending and nr_queued to line up.
wait_all_barriers increments nr_pending for each barrier bucket, one
at a time, but doesn't actually issue IO that could be counted in
nr_queued. So freeze_array is blocked until wait_all_barriers
completes and allow_all_barriers runs. At the same time, when
_wait_barrier sees array_frozen == 1, it stops and waits for
freeze_array to complete.

Prevent the deadlock by making close_sync call _wait_barrier and
_allow_barrier for one bucket at a time, instead of deferring the
_allow_barrier calls until after all _wait_barriers are complete.

Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Fix: fd76863e37fe(RAID1: a new I/O barrier implementation to remove resync window)
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Fix a race between blk_cleanup_queue() and timeout handling

commit 4e9b6f20828ac880dbc1fa2fdbafae779473d1af upstream.

Make sure that if the timeout timer fires after a queue has been
marked "dying" that the affected requests are finished.

Reported-by: chenxiang (M) <chenxiang66@hisilicon.com>
Fixes: commit 287922eb0b18 ("block: defer timeouts to a workqueue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: chenxiang (M) <chenxiang66@hisilicon.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

p54: don't unregister leds when they are not initialized

commit fc09785de0a364427a5df63d703bae9a306ed116 upstream.

ieee80211_register_hw() in p54_register_common() may fail and leds won't
get initialized. Currently p54_unregister_common() doesn't check that and
always calls p54_unregister_leds(). The fix is to check priv->registered
flag before calling p54_unregister_leds().

Found by syzkaller.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
4.14.0-rc1-42251-gebb2c2437d80-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
__lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
__cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
__device_release_driver drivers/base/dd.c:861
device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
device_release_driver+0x1e/0x30 drivers/base/dd.c:918
bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
device_del+0x5c4/0xab0 drivers/base/core.c:1985
usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
hub_port_connect drivers/usb/core/hub.c:4754
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
process_scheduled_works kernel/workqueue.c:2179
worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
kthread+0x3a1/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush sequence

commit a371c10ea4b38a5f120e86d906d404d50a0f4660 upstream.

As-per suggestion from FlexRM HW folks, we have to first set
FlexRM ring flush state and then clear it for FlexRM ring flush
to work properly.

Currently, the FlexRM driver has incomplete FlexRM ring flush
sequence which causes repeated insmod+rmmod of mailbox client
drivers to fail.

This patch fixes FlexRM ring flush sequence in flexrm_shutdown()
as described above.

Fixes: dbc049eee730 ("mailbox: Add driver for Broadcom FlexRM
ring manager")

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: mtk: fix infinite ECC decode IRQ issue

commit 1d2fcdcf33339c7c8016243de0f7f31cf6845e8d upstream.

For MT2701 NAND Controller, there may generate infinite ECC decode IRQ
during long time burn test on some platforms. Once this issue occurred,
the ECC decode IRQ status cannot be cleared in the IRQ handler function,
and threads cannot be scheduled.

ECC HW generates decode IRQ each sector, so there will have more than one
decode IRQ if read one page of large page NAND.

Currently, ECC IRQ handle flow is that we will check whether it is decode
IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear
type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be
cleared at the same time.
Secondly, we will check whether all sectors are decoded by reading the
register ECC_DECDONE. This is because the current IRQ may be not dealed
in time, and the next sectors have been decoded before reading the
register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not
be generated.
Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we
will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by
programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the
next ECC IRQ.

But, there is a timing issue between step one and two. When we read the
reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector,
and the ECC IRQ signal is cleared. But the last sector is decoded before
reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and
it means we will receive one extra ECC IRQ later. In step three, we will
find that all sectors were decoded, then disable ECC IRQ and return.
When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared
anymore. That is because the register ECC_DECIRQ_STA can only be cleared
when the register ECC_IRQ_REG(op) is enabled. But actually we have
disabled ECC IRQ in the previous ECC IRQ handle. So, there will
keep receiving ECC decode IRQ.

Now, we read the register ECC_DECIRQ_STA once again before completing the
ecc done event. This ensures that there will be no extra ECC decode IRQ.

Also, remove writel(0, ecc->regs + ECC_IRQ_REG(op)) from irq handler,
because ECC IRQ is disabled in mtk_ecc_disable(). And clear ECC_DECIRQ_STA
in mtk_ecc_disable() in case there is a timeout to wait decode IRQ.

Fixes: 1d6b1e464950 ("mtd: mediatek: driver for MTK Smart Device")
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: Fix writing mtdoops to nand flash.

commit 30863e38ebeb500a31cecee8096fb5002677dd9b upstream.

When mtdoops calls mtd_panic_write(), it eventually calls
panic_nand_write() in nand_base.c. In order to properly wait for the
nand chip to be ready in panic_nand_wait(), the chip must first be
selected.

When using the atmel nand flash controller, a panic would occur due to
a NULL pointer exception.

Fixes: 2af7c6539931 ("mtd: Add panic_write for NAND flashes")
Signed-off-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: omap2: Fix subpage write

commit 739c64414f01748a36e7d82c8e0611dea94412bd upstream.

Since v4.12, NAND subpage writes were causing a NULL pointer
dereference on OMAP platforms (omap2-nand) using OMAP_ECC_BCH4_CODE_HW,
OMAP_ECC_BCH8_CODE_HW and OMAP_ECC_BCH16_CODE_HW.

This is because for those ECC modes, omap_calculate_ecc_bch()
generates ECC bytes for the entire (multi-sector) page and this can
overflow the ECC buffer provided by nand_write_subpage_hwecc()
as it expects ecc.calculate() to return ECC bytes for just one sector.

However, the root cause of the problem is present since v3.9
but was not seen then as NAND buffers were being allocated
as one big chunk prior to commit 3deb9979c731 ("mtd: nand: allocate
aligned buffers if NAND_OWN_BUFFERS is unset").

Fix the issue by providing a OMAP optimized write_subpage()
implementation.

Fixes: 62116e5171e0 ("mtd: nand: omap2: Support for hardware BCH error correction.")
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: atmel: Actually use the PM ops

commit 1533bfa6f6b6bcca1ea1f172ef4a1c5ce5e7b335 upstream.

commit 6e532afaca8e ("mtd: nand: atmel: Add PM ops") was defining PM
ops but nothing was using/referencing those PM ops.

Fixes: 6e532afaca8e ("mtd: nand: atmel: Add PM ops")
Cc: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Wenyou Yang <wenyou.yang@microchip.com>
Tested-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: nand: Export nand_reset() symbol

commit b9bb98424c51437973b854691aa1e9b2bfd348f5 upstream.

Commit 6e532afaca8e ("mtd: nand: atmel: Add PM ops") started to use the
nand_reset() function which was not yet exported by the NAND framework
(because it was only used internally before that). Export this symbol
to avoid build errors when the driver is enabled as a module.

Fixes: 6e532afaca8e ("mtd: nand: atmel: Add PM ops")
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid

commit 1530578abdac4edce9244c7a1962ded3ffdb58ce upstream.

Commit e8e3edb95ce6 ("mtd: create per-device and module-scope debugfs
entries") tried to make MTD related debugfs stuff consistent across the
MTD framework by creating a root <debugfs>/mtd/ directory containing
one directory per MTD device.

The problem is that, by default, the MTD layer only registers the
master device if no partitions are defined for this master. This
behavior breaks all drivers that expect mtd->dbg.dfs_dir to be filled
correctly after calling mtd_device_register() in order to add their own
debugfs entries.

The only way we can force all MTD masters to be registered no matter if
they expose partitions or not is by enabling the
CONFIG_MTD_PARTITIONED_MASTER option.

In such situations, there's no other solution but to accept skipping
debugfs initialization when dbg.dfs_dir is invalid, and when this
happens, inform the user that he should consider enabling
CONFIG_MTD_PARTITIONED_MASTER.

Fixes: e8e3edb95ce6 ("mtd: create per-device and module-scope debugfs entries")
Cc: Mario J. Rugiero <mrugiero@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK

commit 1c21a48055a67ceb693e9c2587824a8de60a217c upstream.

This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.

Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.

This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed. As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.

To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.

Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: Fix quiese during transport_write_pending_qf endless loop

commit 9574a497df2bbc0a676b609ce0dd24d237cee3a6 upstream.

This patch fixes a potential end-less loop during QUEUE_FULL,
where cmd->se_tfo->write_pending() callback fails repeatedly
but __transport_wait_for_tasks() has already been invoked to
quiese the outstanding se_cmd descriptor.

To address this bug, this patch adds a CMD_T_STOP|CMD_T_ABORTED
check within transport_write_pending_qf() and invokes the
existing se_cmd->t_transport_stop_comp to signal quiese
completion back to __transport_wait_for_tasks().

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: Fix caw_sem leak in transport_generic_request_failure

commit fd2f928b0ddd2fe8876d4f1344df2ace2b715a4d upstream.

With the recent addition of transport_check_aborted_status() within
transport_generic_request_failure() to avoid sending a SCSI status
exception after CMD_T_ABORTED w/ TAS=1 has occured, it introduced
a COMPARE_AND_WRITE early failure regression.

Namely when COMPARE_AND_WRITE fails and se_device->caw_sem has
been taken by sbc_compare_and_write(), if the new check for
transport_check_aborted_status() returns true and exits,
cmd->transport_complete_callback() -> compare_and_write_post()
is skipped never releasing se_device->caw_sem.

This regression was originally introduced by:

  commit e3b88ee95b4e4bf3e9729a4695d695b9c7c296c8
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue Feb 14 16:25:45 2017 -0800

      target: Fix handling of aborted failed commands

To address this bug, move the transport_check_aborted_status()
call after transport_complete_task_attr() and
cmd->transport_complete_callback().

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: Fix QUEUE_FULL + SCSI task attribute handling

commit 1c79df1f349fb6050016cea4ef1dfbc3853a5685 upstream.

This patch fixes a bug during QUEUE_FULL where transport_complete_qf()
calls transport_complete_task_attr() after it's already been invoked
by target_complete_ok_work() or transport_generic_request_failure()
during initial completion, preceeding QUEUE_FULL.

This will result in se_device->simple_cmds, se_device->dev_cur_ordered_id
and/or se_device->dev_ordered_sync being updated multiple times for
a single se_cmd.

To address this bug, clear SCF_TASK_ATTR_SET after the first call
to transport_complete_task_attr(), and avoid updating SCSI task
attribute related counters for any subsequent calls.

Also, when a se_cmd is deferred due to ordered tags and executed
via target_restart_delayed_cmds(), set CMD_T_SENT before execution
matching what target_execute_cmd() does.

Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: fix buffer offset in core_scsi3_pri_read_full_status

commit c58a252beb04cf0e02d6a746b2ed7ea89b6deb71 upstream.

When at least two initiators register pr on the same LUN,
the target returns the exception data due to buffer offset
error, therefore the initiator executes command 'sg_persist -s'
may cause the initiator to appear segfault error.

This fixes a regression originally introduced by:

  commit a85d667e58bddf73be84d1981b41eaac985ed216
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue May 23 16:48:27 2017 -0700

      target: Use {get,put}_unaligned_be*() instead of open coding these functions

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: fix null pointer regression in core_tmr_drain_tmr_list

commit 88fb2fa7db7510bf1078226ab48d162d9854f3d4 upstream.

The target system kernel crash when the initiator executes
the sg_persist -A command,because of the second argument to
be set to NULL when core_tmr_lun_reset is called in
core_scsi3_pro_preempt function.

This fixes a regression originally introduced by:

  commit 51ec502a32665fed66c7f03799ede4023b212536
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue Feb 14 16:25:54 2017 -0800

      target: Delete tmr from list before processing

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iscsi-target: Fix non-immediate TMR reference leak

commit 3fc9fb13a4b2576aeab86c62fd64eb29ab68659c upstream.

This patch fixes a se_cmd->cmd_kref reference leak that can
occur when a non immediate TMR is proceeded our of command
sequence number order, and CMDSN_LOWER_THAN_EXP is returned
by iscsit_sequence_cmd().

To address this bug, call target_put_sess_cmd() during this
special case following what iscsit_process_scsi_cmd() does
upon CMDSN_LOWER_THAN_EXP.

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref

commit ae072726f6109bb1c94841d6fb3a82dde298ea85 upstream.

Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.

To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.

This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: lpfc: Fix oops if nvmet_fc_register_targetport fails

commit e7981a2c725f8e237f749fa1358997707d57e32c upstream.

if nvmet targetport registration fails, the driver encounters a NULL
pointer oops in lpfc_hb_timeout_handler.

To fix: if registration fails, ensure nvmet_support is cleared on the
port structure.

Also enhanced the log message on failure.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: lpfc: Fix FCP hba_wqidx assignment

commit 8e036a9497c5d565baafda4c648f2f372999a547 upstream.

The driver is encountering oops in lpfc_sli_calc_ring.

The driver is setting hba_wqidx for FCP based on the policy in use for
NVME. The two may not be the same. Change to set the wqidx based on the
FCP policy.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: lpfc: Fix crash receiving ELS while detaching driver

commit 1234a6d54fed8a00091968c4eb2fb52e1cbb8e2e upstream.

The driver crashes when attempting to use a freed ndpl pointer.

The pci_remove_one handler runs on a separate kernel thread. The order
of the removal is starting by freeing all of the ndlps and then
disabling interrupts. In between these two events the driver can still
receive an ELS and process it. When it tries to use the ndlp pointer
will be NULL

Change the order of the pci_remove_one vs disable interrupts so that
interrupts are disabled before the ndlp's are freed.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: lpfc: fix pci hot plug crash in list_add call

commit 401bb4169da655f3e5d28d0b208182e1ab60bf2a upstream.

During pci hot plug, the kernel crashes in a list_add_call

The lookup by tag function will return null if the IOCB is out of range
or does not have the on txcmplq flag set.

Fix: Check for null return from lookup by tag.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: lpfc: fix pci hot plug crash in timer management routines

commit 1901762f2ca2747ed269239ca5332a8023ce4e3d upstream.

During pci hot plug, the kernel crashes in timer management code.

The sli4 remove_one handler is not stoping the timers as it starts to
remove the port so that it can be swapped.

Fix: Stop the timers early in the handler routine.

Note: Fix in SLI-4 only. SLI-3 already stopped the timers properly.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

commit 4a109032e3941413d8a029f619543fc5aec1d26d upstream.

The three values starting at byte 8 of the Zoned Block Device
Characteristics VPD page B6h are 32 bits values, not 64bits. So use
get_unaligned_be32() to retrieve the values and not get_unaligned_be64()

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()

commit 8653188763b56e0bcbdcab30cc7b059672c900ac upstream.

Avoid that the following is reported while loading the qla2xxx
kernel module:

BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/783
caller is debug_smp_processor_id+0x17/0x20
CPU: 7 PID: 783 Comm: modprobe Not tainted 4.14.0-rc8-dbg+ #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x8e/0xce
check_preemption_disabled+0xe3/0xf0
debug_smp_processor_id+0x17/0x20
qla2x00_probe_one+0xf43/0x26c0 [qla2xxx]
pci_device_probe+0xca/0x140
driver_probe_device+0x2e2/0x440
__driver_attach+0xa3/0xe0
bus_for_each_dev+0x5f/0x90
driver_attach+0x19/0x20
bus_add_driver+0x1c0/0x260
driver_register+0x5b/0xd0
__pci_register_driver+0x63/0x70
qla2x00_module_init+0x1d6/0x222 [qla2xxx]
do_one_initcall+0x3c/0x163
do_init_module+0x55/0x1eb
load_module+0x20a2/0x2890
SYSC_finit_module+0xd7/0xf0
SyS_finit_module+0x9/0x10
entry_SYSCALL_64_fastpath+0x23/0xc2

Fixes: commit 8abfa9e22683 ("scsi: qla2xxx: Add function call to qpair for door bell")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Quinn Tran <quinn.tran@cavium.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/9p: Switch to wait_event_killable()

commit 9523feac272ccad2ad8186ba4fcc89103754de52 upstream.

Because userspace gets Very Unhappy when calls like stat() and execve()
return -EINTR on 9p filesystem mounts. For instance, when bash is
looking in PATH for things to execute and some SIGCHLD interrupts
stat(), bash can throw a spurious 'command not found' since it doesn't
retry the stat().

In practice, hitting the problem is rare and needs a really
slow/bogged down 9p server.

Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/9p: Compare qid.path in v9fs_test_inode

commit 8ee031631546cf2f7859cc69593bd60bbdd70b46 upstream.

Commit fd2421f54423 ("fs/9p: When doing inode lookup compare qid details
and inode mode bits.") transformed v9fs_qid_iget() to use iget5_locked()
instead of iget_locked(). However, the test() callback is not checking
fid.path at all, which means that a lookup in the inode cache can now
accidentally locate a completely wrong inode from the same inode hash
bucket if the other fields (qid.type and qid.version) match.

Fixes: fd2421f54423 ("fs/9p: When doing inode lookup compare qid details and inode mode bits.")
Reviewed-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9p: Fix missing commas in mount options

commit 61b272c3aa170b3e461b8df636407b29f35f98eb upstream.

Since commit c4fac9100456 ("9p: Implement show_options"), the mount
options of 9p filesystems are printed out with some missing commas
between the individual options:

p9-scratch on /mnt/scratch type 9p (rw,dirsync,loose,access=clienttrans=virtio)

Add them back.

Fixes: c4fac9100456 ("9p: Implement show_options")
Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fix a page leak in vhost_scsi_iov_to_sgl() error recovery

commit 11d49e9d089ccec81be87c2386dfdd010d7f7f6e upstream.

we are advancing sg as we go, so the pages we need to drop in
case of error are *before* the current sg.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mfd: lpc_ich: Avoton/Rangeley uses SPI_BYT method

commit 07d70913dce59f3c8e5d0ca76250861158a9ca6c upstream.

Avoton/Rangeley are based on Silvermount micro-architecture, like
Bay Trail, and uses the INTEL_SPI_BYT method to drive SPI.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: sun8i-codec: Set the BCLK divider

commit 316b7758c998fb13371d14bb6c9e45ab129c19a7 upstream.

While the current code was reporting to be able to work in master mode, it
failed to do so because the BCLK divider wasn't programmed, meaning that
the BCLK would run at the PLL's frequency no matter the sample rate.

It was obviously a bit too fast.

Add support to retrieve the divider to use, and set it. Since our PLL is
not always able to generate a perfect multiple of the sample rate, we'll
have to choose the closest divider that matches our setup.

Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: sun8i-codec: Fix left and right channels inversion

commit 18c1bf35c1c09bca05cf70bc984a4764e0b0372b upstream.

Since its introduction, the codec had an inversion of the left and right
channels. It turned out to be pretty simple as it appears that the codec
doesn't have the same polarity on the LRCK signal than the I2S block.

Fix this by inverting our bit value for the LRCK inversion.

Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: sun8i-codec: Invert Master / Slave condition

commit 560bfe774f058e97596f30ff71cffdac52b72914 upstream.

The current code had the condition backward when checking if the codec
should be running in slave or master mode.

Fix it, and make the comment a bit more readable.

Fixes: 36c684936fae ("ASoC: Add sun8i digital audio codec")
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek - Fix ALC700 family no sound issue

commit 2d7fe6185722b0817bb345f62ab06b76a7b26542 upstream.

It maybe the typo for ALC700 support patch.
To fix the bit value on this patch.

Fixes: 6fbae35a3170 ("ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization

commit d6c0615f510bc1ee26cfb2b9a3343ac99b9c46fb upstream.

The previous fix for addressing the breakage in vmaster slave
initialization, commit a91d66129fb9 ("ALSA: hda - Fix incorrect TLV
callback check introduced during set_fs() removal"), introduced a new
helper to process over each slave kctl.  However, this helper passes
only the original kctl, not the virtual slave kctl.  As a result,
HD-audio driver (which is the only user so far) couldn't initialize
the slave correctly because it's trying to update the value directly
with the original kctl, not with the mapped kctl.

This patch fixes the situation again by passing both the mapped slaved
and original slave kctls to the function.  Luckily there is a single
caller as of now, so changing the call signature is no big matter.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197959
Fixes: a91d66129fb9 ("ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removal")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda: Fix too short HDMI/DP chmap reporting

commit c2432466f583cb719b35a41e757da587d9ab1d00 upstream.

We got a regression report about the HD-audio HDMI chmap, where some
surround channels are reported as UNKNOWN.  The git bisection pointed
the culprit at the commit 9b3dc8aa3fb1 ("ALSA: hda - Register chmap
obj as priv data instead of codec").  The story behind scene is like
this:

- While moving the code out of the legacy HDA to the HDA common place,
  the patch modifies the code to obtain the chmap array indirectly in
  a byte array, and it expands it to kctl value array.
- At the latter operation, the size of the array is wrongly passed by
  sizeof() to the pointer.
- It can be 4 on 32bit arch, thus too short for 6+ channels.
  (And that's the reason why it didn't hit other persons; it's 8 on
  64bit arch, thus it's usually enough.)

The code was further changed meanwhile, but the problem persisted.
Let's fix it by correctly evaluating the array size.

Fixes: 9b3dc8aa3fb1 ("ALSA: hda - Register chmap obj as priv data instead of codec")
Reported-by: VDR User <user.vdr@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek - Fix ALC275 no sound issue

commit 3aabf94c2d95fe465d5fa8590113d1c1f7d8333d upstream.

Sound works after a cold boot but not after a reboot from windows.
This patch will solve this issue. This is relation with Class-D power control.

[ The bug was reported in Bugzilla below for Sony VAIO SVS13A1C5E
-- tiwai]

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197737
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: timer: Remove kernel warning at compat ioctl error paths

commit 3d4e8303f2c747c8540a0a0126d0151514f6468b upstream.

Some timer compat ioctls have NULL checks of timer instance with
snd_BUG_ON() that bring up WARN_ON() when the debug option is set.
Actually the condition can be met in the normal situation and it's
confusing and bad to spew kernel warnings with stack trace there.
Let's remove snd_BUG_ON() invocation and replace with the simple
checks. Also, correct the error code to EBADFD to follow the native
ioctl error handling.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Add sanity checks in v2 clock parsers

commit 0a62d6c966956d77397c32836a5bbfe3af786fc1 upstream.

The helper functions to parse and look for the clock source, selector
and multiplier unit may return the descriptor with a too short length
than required, while there is no sanity check in the caller side.
Add some sanity checks in the parsers, at least, to guarantee the
given descriptor size, for avoiding the potential crashes.

Fixes: 79f920fbff56 ("ALSA: usb-audio: parse clock topology of UAC2 devices")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Fix potential out-of-bound access at parsing SU

commit f658f17b5e0e339935dca23e77e0f3cad591926b upstream.

The usb-audio driver may trigger an out-of-bound access at parsing a
malformed selector unit, as it checks the header length only after
evaluating bNrInPins field, which can be already above the given
length. Fix it by adding the length check beforehand.

Fixes: 99fc86450c43 ("ALSA: usb-mixer: parse descriptors with structs")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Add sanity checks to FE parser

commit d937cd6790a2bef2d07b500487646bd794c039bb upstream.

When the usb-audio descriptor contains the malformed feature unit
description with a too short length, the driver may access
out-of-bounds. Add a sanity check of the header size at the beginning
of parse_audio_feature_unit().

Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: pcm: update tstamp only if audio_tstamp changed

commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 upstream.

commit 3179f6200188 ("ALSA: core: add .get_time_info") had a side effect
of changing the behaviour of the PCM runtime tstamp.  Prior to this
change tstamp was not updated by snd_pcm_update_hw_ptr0() unless the
hw_ptr had moved, after this change tstamp was always updated.

For an application using alsa-lib, doing snd_pcm_readi() followed by
snd_pcm_status() to estimate the age of the read samples by subtracting
status->avail * [sample rate] from status->tstamp this change degraded
the accuracy of the estimate on devices where the pcm hw does not
provide a granular hw_ptr, e.g., devices using
soc-generic-dmaengine-pcm.c and a dma-engine with residue_granularity
DMA_RESIDUE_GRANULARITY_DESCRIPTOR.  The accuracy of the estimate
depended on the latency between the PCM hw completing a period and the
driver called snd_pcm_period_elapsed() to notify ALSA core, typically
determined by interrupt handling latency.  After the change the accuracy
of the estimate depended on the latency between the PCM hw completing a
period and the application calling snd_pcm_status(), determined by the
scheduling of the application process.  The maximum error of the
estimate is one period length in both cases, but the error average and
variance is smaller when it depends on interrupt latency.

Instead of always updating tstamp, update it only if audio_tstamp
changed.

Fixes: 3179f6200188 ("ALSA: core: add .get_time_info")
Suggested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Henrik Eriksson <henrik.eriksson@axis.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: prevent data corruption with journaling + DAX

commit e9072d859df3e0f2c3ba450f0d1739595c2d5d13 upstream.

The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.

I've captured an instance of this data corruption in the following fstest:

https://patchwork.kernel.org/patch/9948377/

Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: prevent data corruption with inline data + DAX

commit 559db4c6d784ceedc2a5418ced4d357cb843e221 upstream.

If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.

Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:

https://www.spinics.net/lists/linux-xfs/msg09859.html

The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:

https://patchwork.kernel.org/patch/9948381/

Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix interaction between i_size, fallocate, and delalloc after a crash

commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream.

If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).

If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:

Inode 12, end of extent exceeds allowed value
(logical block 33, physical block 33441, len 7)

This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size. Since this situation is
quite rare, and the window in which the crash must take place is
typically < 30 seconds, in practice this condition will rarely happen.

Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ata: fixes kernel crash while tracing ata_eh_link_autopsy event

commit f1601113ddc0339a745e702f4fb1ca37d4875e65 upstream.

When tracing ata link error event, the kernel crashes when the disk is
removed due to NULL pointer access by trace_ata_eh_link_autopsy API.
This occurs as the dev is NULL when the disk disappeared. This patch
fixes this crash by calling trace_ata_eh_link_autopsy only if "dev"
is not NULL.

v2 changes:
Removed direct passing "link" pointer instead of "dev" in trace API.

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 255c03d15a29 ("libata: Add tracepoints")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fsnotify: fix pinning group in fsnotify_prepare_user_wait()

commit 9a31d7ad997f55768c687974ce36b759065b49e5 upstream.

Blind increment of group's user_waits is not enough, we could be far enough
in the group's destruction that it isn't taken into account (i.e. grabbing
the mark ref afterwards doesn't guarantee that it was the ref coming from
the _group_ that was grabbed).

Instead we need to check (under lock) that the mark is still attached to
the group after having obtained a ref to the mark. If not, skip it.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fsnotify: pin both inode and vfsmount mark

commit 0d6ec079d6aaa098b978d6395973bb027c752a03 upstream.

We may fail to pin one of the marks in fsnotify_prepare_user_wait() when
dropping the srcu read lock, resulting in use after free at the next
iteration.

Solution is to store both marks in iter_info instead of just the one we'll
be sending the event for.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fsnotify: clean up fsnotify_prepare/finish_user_wait()

commit 24c20305c7fc8959836211cb8c50aab93ae0e54f upstream.

This patch doesn't actually fix any bug, just paves the way for fixing mark
and group pinning.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/bitmap: revert a patch

commit 938b533d479e7428b7fa1b8179283646d2e2c53d upstream.

This reverts commit 8031c3ddc70a. That patches doesn't work well if PAGE_SIZE >
4k. We will fix the original problem with a different approach.

Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
Reported-by: Joshua Kinard <kumba@gentoo.org>
Suggested-by: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: btqcomsmd: Add support for BD address setup

commit 6e518111060c2290427d79c43d4add9600ad852b upstream.

This patch implements the hdev setup function since wcnss-bt does not have
persistent memory to store an allocated BD address. The device is therefore
marked as unconfigured if no BD address has been previously retrieved.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: don't check MD_SB_CHANGE_CLEAN in md_allow_write

commit b90f6ff080c52e2f05364210733df120e3c4e597 upstream.

Only MD_SB_CHANGE_PENDING should be used to wait for transition from
clean to dirty. Checking also MD_SB_CHANGE_CLEAN is unnecessary and can
race with e.g. md_do_sync(). This sporadically causes a hang when
changing consistency policy during resync:

INFO: task mdadm:6183 blocked for more than 30 seconds.
Not tainted 4.14.0-rc3+ #391
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm D12752 6183 6022 0x00000000
Call Trace:
__schedule+0x93f/0x990
schedule+0x6b/0x90
md_allow_write+0x100/0x130 [md_mod]
? do_wait_intr_irq+0x90/0x90
resize_stripes+0x3a/0x5b0 [raid456]
? kernfs_fop_write+0xbe/0x180
raid5_change_consistency_policy+0xa6/0x200 [raid456]
consistency_policy_store+0x2e/0x70 [md_mod]
md_attr_store+0x90/0xc0 [md_mod]
sysfs_kf_write+0x42/0x50
kernfs_fop_write+0x119/0x180
__vfs_write+0x28/0x110
? rcu_sync_lockdep_assert+0x12/0x60
? __sb_start_write+0x15a/0x1c0
? vfs_write+0xa3/0x1a0
vfs_write+0xb4/0x1a0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad

Fixes: 2214c260c72b ("md: don't return -EAGAIN in md_allow_write for external metadata arrays")
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: fix deadlock error in recent patch.

commit d47c8ad261f787af22a220ffcc2d07afba809223 upstream.

A recent patch aimed to cause md_write_start() to fail (rather than
block) when the mddev was suspending, so as to avoid deadlocks.
Unfortunately the test in wait_event() was wrong, and it didn't change
behaviour at all.

We wait_event() must wait until the metadata is written OR the array is
suspending.

Fixes: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()")
Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iwlwifi: fix firmware names for 9000 and A000 series hw

commit c2c48ddfc8b03b9ecb51d2832b586497b37531bc upstream.

iwlwifi 9000 and a0000 series hw contains an extra dash in firmware
file name as seeen in modinfo output for kernel 4.14:

firmware:       iwlwifi-9260-th-b0-jf-b0--34.ucode
firmware:       iwlwifi-9260-th-a0-jf-a0--34.ucode
firmware:       iwlwifi-9000-pu-a0-jf-b0--34.ucode
firmware:       iwlwifi-9000-pu-a0-jf-a0--34.ucode
firmware:       iwlwifi-QuQnj-a0-hr-a0--34.ucode
firmware:       iwlwifi-QuQnj-a0-jf-b0--34.ucode
firmware:       iwlwifi-QuQnj-f0-hr-a0--34.ucode
firmware:       iwlwifi-Qu-a0-jf-b0--34.ucode
firmware:       iwlwifi-Qu-a0-hr-a0--34.ucode

Fix that by dropping the extra adding of '"-"'.

Signed-off-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtlwifi: fix uninitialized rtlhal->last_suspend_sec time

commit 3f2a162fab15aee243178b5308bb5d1206fc4043 upstream.

We set rtlhal->last_suspend_sec to an uninitialized stack variable,
but unfortunately gcc never warned about this, I only found it
while working on another patch. I opened a gcc bug for this.

Presumably the value of rtlhal->last_suspend_sec is not all that
important, but it does get used, so we probably want the
patch backported to stable kernels.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82839
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtlwifi: rtl8192ee: Fix memory leak when loading firmware

commit 519ce2f933fa14acf69d5c8cabcc18711943d629 upstream.

In routine rtl92ee_set_fw_rsvdpagepkt(), the driver allocates an skb, but
never calls rtl_cmd_send_packet(), which will free the buffer. All other
rtlwifi drivers perform this operation correctly.

This problem has been in the driver since it was included in the kernel.
Fortunately, each firmware load only leaks 4 buffers, which likely
explains why it has not previously been detected.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: deal with revoked delegations appropriately

commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream.

If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1
case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply. That can cause the Linux 4.1 client to loop in its
stage manager.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: revalidate "." etc correctly on "open".

commit b688741cb06695312f18b730653d6611e1bad28d upstream.

For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.

Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).

Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.

This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate(). This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.

The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.

Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Avoid RCU usage in tracepoints

commit 3944369db701f075092357b511fd9f5755771585 upstream.

There isn't an obvious way to acquire and release the RCU lock during a
tracepoint, so we can't use the rpc_peeraddr2str() function here.
Instead, rely on the client's cl_hostname, which should have similar
enough information without needing an rcu_dereference().

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfs: Fix ugly referral attributes

commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream.

Before traversing a referral and performing a mount, the mounted-on
directory looks strange:

dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31 1969 dir.0

nfs4_get_referral is wiping out any cached attributes with what was
returned via GETATTR(fs_locations), but the bit mask for that
operation does not request any file attributes.

Retrieve owner and timestamp information so that the memcpy in
nfs4_get_referral fills in more attributes.

Changes since v1:
- Don't request attributes that the client unconditionally replaces
- Request only MOUNTED_ON_FILEID or FILEID attribute, not both
- encode_fs_locations() doesn't use the third bitmask word

Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral")
Suggested-by: Pradeep Thomas <pradeepthomas@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Revert "NFS: Move the flock open mode check into nfs_flock()"

commit fcfa447062b2061e11f68b846d61cbfe60d0d604 upstream.

Commit e12937279c8b "NFS: Move the flock open mode check into nfs_flock()"
changed NFSv3 behavior for flock() such that the open mode must match the
lock type, however that requirement shouldn't be enforced for flock().

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Fix typo in nomigration mount option

commit f02fee227e5f21981152850744a6084ff3fa94ee upstream.

The option was incorrectly masking off all other options.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: expose some sectors to user in inline data or dentry case

commit 5b4267d195dd887c4412e34b5a7365baa741b679 upstream.

If there's some data written through inline data or dentry, we need to shouw
st_blocks. This fixes reporting zero blocks even though there is small written
data.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid link file for quotacheck]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: change how we decide to commit transactions during flushing

commit 996478ca9c460886ac147eb0d00e99841b71d31b upstream.

Nikolay reported that generic/273 was failing currently with ENOSPC.
Turns out this is because we get to the point where the outstanding
reservations are greater than the pinned space on the fs. This is a
mistake, previously we used the current reservation amount in
may_commit_transaction, not the entire outstanding reservation amount.
Fix this to find the minimum byte size needed to make progress in
flushing, and pass that into may_commit_transaction. From there we can
make a smarter decision on whether to commit the transaction or not.
This fixes the failure in generic/273.

From Nikolai, IOW: when we go to the final stage of deciding whether to
do trans commit, instead of passing all the reservations from all
tickets we just pass the reservation for the current ticket. Otherwise,
in case all reservations exceed pinned space, then we don't commit
transaction and fail prematurely. Before we passed num_bytes from
flush_space, where num_bytes was the sum of all pending reserverations,
but now all we do is take the first ticket and commit the trans if we
can satisfy that.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
[ added Nikolai's comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

isofs: fix timestamps beyond 2027

commit 34be4dbf87fc3e474a842305394534216d428f5d upstream.

isofs uses a 'char' variable to load the number of years since
1900 for an inode timestamp. On architectures that use a signed
char type by default, this results in an invalid date for
anything beyond 2027.

This changes the function argument to a 'u8' array, which
is defined the same way on all architectures, and unambiguously
lets us use years until 2155.

This should be backported to all kernels that might still be
in use by that date.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fanotify: fix fsnotify_prepare_user_wait() failure

commit f37650f1c7c71cf5180b43229d13b421d81e7170 upstream.

If fsnotify_prepare_user_wait() fails, we leave the event on the
notification list. Which will result in a warning in
fsnotify_destroy_event() and later use-after-free.

Instead of adding a new helper to remove the event from the list in this
case, I opted to move the prepare/finish up into fanotify_handle_event().

This will allow these to be moved further out into the generic code later,
and perhaps let us move to non-sleeping RCU.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 05f0e38724e8 ("fanotify: Release SRCU lock when waiting for userspace response")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs: guard_bio_eod() needs to consider partitions

commit 67f2519fe2903c4041c0e94394d14d372fe51399 upstream.

guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.

[   60.268688] attempt to access beyond end of device
[   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[   60.268693] buffer_io_error: 2 callbacks suppressed
[   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read

Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bcache: check ca->alloc_thread initialized before wake up it

commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.

In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.

There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.

The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca->alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca->alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca->alloc_thread, checking whether it is allocated, and only
wake up ca->alloc_thread when it is not NULL.

Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Jorg Bornschein <jb@capsec.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libceph: don't WARN() if user tries to add invalid key

commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream.

The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a
user tries to add a key of type "ceph" with an invalid payload as
follows (assuming CONFIG_CEPH_LIB=y):

echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \
| keyctl padd ceph desc @s

This can be hit by fuzzers. As this is merely bad input and not a
kernel bug, replace the WARN_ON() with return -EINVAL.

Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eCryptfs: use after free in ecryptfs_release_messaging()

commit db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 upstream.

We're freeing the list iterator so we should be using the _safe()
version of hlist_for_each_entry().

Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fscrypt: lock mutex before checking for bounce page pool

commit a0b3bc855374c50b5ea85273553485af48caf2f7 upstream.

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix race condition that causes file system corruption

commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream.

There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two.  If this happens the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

autofs: don't fail mount for transient error

commit ecc0c469f27765ed1e2b967be0aa17cee1a60b76 upstream.

Currently if the autofs kernel module gets an error when writing to the
pipe which links to the daemon, then it marks the whole moutpoint as
catatonic, and it will stop working.

It is possible that the error is transient. This can happen if the
daemon is slow and more than 16 requests queue up. If a subsequent
process tries to queue a request, and is then signalled, the write to
the pipe will return -ERESTARTSYS and autofs will take that as total
failure.

So change the code to assess -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole mountpoint.

It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.

Ian said:

: And given the problems with a half dozen (or so) user space applications
: consuming large amounts of CPU under heavy mount and umount activity this
: could happen more easily than we expect.

Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/z3fold.c: use kref to prevent page free/compact race

commit 5d03a6613957785e94af7a4a6212ad4af66aa5c2 upstream.

There is a race in the current z3fold implementation between
do_compact() called in a work queue context and the page release
procedure when page's kref goes to 0.

do_compact() may be waiting for page lock, which is released by
release_z3fold_page_locked right before putting the page onto the
"stale" list, and then the page may be freed as do_compact() modifies
its contents.

The mechanism currently implemented to handle that (checking the
PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref
counter to guarantee that the page is not released if its compaction is
scheduled. It then becomes compaction function's responsibility to
decrease the counter and quit immediately if the page was actually
freed.

Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
Signed-off-by: Vitaly Wool <vitaly.wool@sonymobile.com>
Cc: <Oleksiy.Avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>