Bart Van Assche [Tue, 31 Oct 2017 18:03:17 +0000 (11:03 -0700)]
target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
[ Upstream commit
cfe2b621bb18d86e93271febf8c6e37622da2d14 ]
Avoid that cmd->se_cmd.se_tfo is read after a command has already been
freed.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bart Van Assche [Tue, 31 Oct 2017 18:03:18 +0000 (11:03 -0700)]
target/iscsi: Detect conn_cmd_list corruption early
[ Upstream commit
6eaf69e4ec075f5af236c0c89f75639a195db904 ]
Certain behavior of the initiator can cause the target driver to
send both a reject and a SCSI response. If that happens two
target_put_sess_cmd() calls will occur without the command having
been removed from conn_cmd_list. In other words, conn_cmd_list
will get corrupted once the freed memory is reused. Although the
Linux kernel can detect list corruption if list debugging is
enabled, in this case the context in which list corruption is
detected is not related to the context that caused list corruption.
Hence add WARN_ON() statements that report the context that is
causing list corruption.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kuppuswamy Sathyanarayanan [Sun, 29 Oct 2017 09:49:54 +0000 (02:49 -0700)]
platform/x86: intel_punit_ipc: Fix resource ioremap warning
[ Upstream commit
6cc8cbbc8868033f279b63e98b26b75eaa0006ab ]
For PUNIT device, ISPDRIVER_IPC and GTDDRIVER_IPC resources are not
mandatory. So when PMC IPC driver creates a PUNIT device, if these
resources are not available then it creates dummy resource entries for
these missing resources. But during PUNIT device probe, doing ioremap on
these dummy resources generates following warning messages.
intel_punit_ipc: can't request region for resource [mem 0x00000000]
intel_punit_ipc: can't request region for resource [mem 0x00000000]
intel_punit_ipc: can't request region for resource [mem 0x00000000]
intel_punit_ipc: can't request region for resource [mem 0x00000000]
This patch fixes this issue by adding extra check for resource size
before performing ioremap operation.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tyrel Datwyler [Fri, 29 Sep 2017 00:19:20 +0000 (20:19 -0400)]
powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister
[ Upstream commit
b8f89fea599d91e674497aad572613eb63181f31 ]
When a vdevice is DLPAR removed from the system the vio subsystem
doesn't bother unmapping the virq from the irq_domain. As a result we
have a virq mapped to a hardware irq that is no longer valid for the
irq_domain. A side effect is that we are left with /proc/irq/<irq#>
affinity entries, and attempts to modify the smp_affinity of the irq
will fail.
In the following observed example the kernel log is spammed by
ics_rtas_set_affinity errors after the removal of a VSCSI adapter.
This is a result of irqbalance trying to adjust the affinity every 10
seconds.
rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
This patch fixes the issue by calling irq_dispose_mapping() on the
virq of the viodev on unregister.
Fixes:
f2ab6219969f ("powerpc/pseries: Add PFO support to the VIO bus")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe Leroy [Wed, 18 Oct 2017 09:16:47 +0000 (11:16 +0200)]
powerpc/ipic: Fix status get and status clear
[ Upstream commit
6b148a7ce72a7f87c81cbcde48af014abc0516a9 ]
IPIC Status is provided by register IPIC_SERSR and not by IPIC_SERMR
which is the mask register.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
William A. Kennington III [Fri, 22 Sep 2017 23:58:00 +0000 (16:58 -0700)]
powerpc/opal: Fix EBUSY bug in acquiring tokens
[ Upstream commit
71e24d7731a2903b1ae2bba2b2971c654d9c2aa6 ]
The current code checks the completion map to look for the first token
that is complete. In some cases, a completion can come in but the
token can still be on lease to the caller processing the completion.
If this completed but unreleased token is the first token found in the
bitmap by another tasks trying to acquire a token, then the
__test_and_set_bit call will fail since the token will still be on
lease. The acquisition will then fail with an EBUSY.
This patch reorganizes the acquisition code to look at the
opal_async_token_map for an unleased token. If the token has no lease
it must have no outstanding completions so we should never see an
EBUSY, unless we have leased out too many tokens. Since
opal_async_get_token_inrerruptible is protected by a semaphore, we
will practically never see EBUSY anymore.
Fixes:
8d7248232208 ("powerpc/powernv: Infrastructure to support OPAL async completion")
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KUWAZAWA Takuya [Sun, 15 Oct 2017 11:54:10 +0000 (20:54 +0900)]
netfilter: ipvs: Fix inappropriate output of procfs
[ Upstream commit
c5504f724c86ee925e7ffb80aa342cfd57959b13 ]
Information about ipvs in different network namespace can be seen via procfs.
How to reproduce:
# ip netns add ns01
# ip netns add ns02
# ip netns exec ns01 ip a add dev lo 127.0.0.1/8
# ip netns exec ns02 ip a add dev lo 127.0.0.1/8
# ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80
# ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80
The ipvsadm displays information about its own network namespace only.
# ip netns exec ns01 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.1:80 wlc
# ip netns exec ns02 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.2:80 wlc
But I can see information about other network namespace via procfs.
# ip netns exec ns01 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP
0A010101:0050 wlc
TCP
0A010102:0050 wlc
# ip netns exec ns02 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP
0A010102:0050 wlc
Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gustavo A. R. Silva [Sun, 5 Nov 2017 04:52:54 +0000 (23:52 -0500)]
thunderbolt: tb: fix use after free in tb_activate_pcie_devices
[ Upstream commit
a2e373438f72391493a4425efc1b82030b6b4fd5 ]
Add a ̣̣continue statement in order to avoid using a previously
free'd pointer tunnel in list_add.
Addresses-Coverity-ID: 1415336
Fixes:
9d3cce0b6136 ("thunderbolt: Introduce thunderbolt bus and connection manager")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matthias Brugger [Mon, 30 Oct 2017 11:37:55 +0000 (12:37 +0100)]
iommu/mediatek: Fix driver name
[ Upstream commit
395df08d2e1de238a9c8c33fdcd0e2160efd63a9 ]
There exist two Mediatek iommu drivers for the two different
generations of the device. But both drivers have the same name
"mtk-iommu". This breaks the registration of the second driver:
Error: Driver 'mtk-iommu' is already registered, aborting...
Fix this by changing the name for first generation to
"mtk-iommu-v1".
Fixes:
b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika Westerberg [Fri, 13 Oct 2017 18:35:43 +0000 (21:35 +0300)]
PCI: Do not allocate more buses than available in parent
[ Upstream commit
a20c7f36bd3d20d245616ae223bb9d05dfb6f050 ]
One can ask more buses to be reserved for hotplug bridges by passing
pci=hpbussize=N in the kernel command line. If the parent bus does not
have enough bus space available we incorrectly create child bus with the
requested number of subordinate buses.
In the example below hpbussize is set to one more than we have available
buses in the root port:
pci 0000:07:00.0: [8086:1578] type 01 class 0x060400
pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:07:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:08: busn_res: can not insert [bus 08-ff] under [bus 07-3f] (conflicts with (null) [bus 07-3f])
pci_bus 0000:08: scanning bus
...
pci_bus 0000:0a: bus scan returning with max=40
pci_bus 0000:0a: busn_res: [bus 0a-ff] end is updated to 40
pci_bus 0000:0a: [bus 0a-40] partially hidden behind bridge 0000:07 [bus 07-3f]
pci_bus 0000:08: bus scan returning with max=40
pci_bus 0000:08: busn_res: [bus 08-ff] end is updated to 40
Instead of allowing this, limit the subordinate number to be less than or
equal the maximum subordinate number allocated for the parent bus (if it
has any).
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: remove irrelevant dmesg messages]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shriya [Fri, 13 Oct 2017 04:36:41 +0000 (10:06 +0530)]
powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
[ Upstream commit
cd77b5ce208c153260ed7882d8910f2395bfaabd ]
The call to /proc/cpuinfo in turn calls cpufreq_quick_get() which
returns the last frequency requested by the kernel, but may not
reflect the actual frequency the processor is running at. This patch
makes a call to cpufreq_get() instead which returns the current
frequency reported by the hardware.
Fixes:
fb5153d05a7d ("powerpc: powernv: Implement ppc_md.get_proc_freq()")
Signed-off-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qiang [Thu, 28 Sep 2017 03:54:34 +0000 (11:54 +0800)]
PCI/PME: Handle invalid data when reading Root Status
[ Upstream commit
3ad3f8ce50914288731a3018b27ee44ab803e170 ]
PCIe PME and native hotplug share the same interrupt number, so hotplug
interrupts are also processed by PME. In some cases, e.g., a Link Down
interrupt, a device may be present but unreachable, so when we try to
read its Root Status register, the read fails and we get all ones data
(0xffffffff).
Previously, we interpreted that data as PCI_EXP_RTSTA_PME being set, i.e.,
"some device has asserted PME," so we scheduled pcie_pme_work_fn(). This
caused an infinite loop because pcie_pme_work_fn() tried to handle PME
requests until PCI_EXP_RTSTA_PME is cleared, but with the link down,
PCI_EXP_RTSTA_PME can't be cleared.
Check for the invalid 0xffffffff data everywhere we read the Root Status
register.
1469d17dd341 ("PCI: pciehp: Handle invalid data when reading from
non-existent devices") added similar checks in the hotplug driver.
Signed-off-by: Qiang Zheng <zhengqiang10@huawei.com>
[bhelgaas: changelog, also check in pcie_pme_work_fn(), use "~0" to follow
other similar checks]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Yongjun [Mon, 6 Nov 2017 11:11:28 +0000 (11:11 +0000)]
mlxsw: spectrum: Fix error return code in mlxsw_sp_port_create()
[ Upstream commit
d86fd113ebbb37726ef7c7cc6fd6d5ce377455d6 ]
Fix to return a negative error code from the VID create error handling
case instead of 0, as done elsewhere in this function.
Fixes:
c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Ujfalusi [Wed, 8 Nov 2017 10:02:25 +0000 (12:02 +0200)]
dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
[ Upstream commit
288e7560e4d3e259aa28f8f58a8dfe63627a1bf6 ]
The used 0x1f mask is only valid for am335x family of SoC, different family
using this type of crossbar might have different number of electable
events. In case of am43xx family 0x3f mask should have been used for
example.
Instead of trying to handle each family's mask, just use u8 type to store
the mux value since the event offsets are aligned to byte offset.
Fixes:
42dbdcc6bf965 ("dmaengine: ti-dma-crossbar: Add support for crossbar on AM33xx/AM43xx")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pankaj Bharadiya [Tue, 7 Nov 2017 10:46:19 +0000 (16:16 +0530)]
ASoC: Intel: Skylake: Fix uuid_module memory leak in failure case
[ Upstream commit
f8e066521192c7debe59127d90abbe2773577e25 ]
In the loop that adds the uuid_module to the uuid_list list, allocated
memory is not properly freed in the error path free uuid_list whenever
any of the memory allocation in the loop fails to avoid memory leak.
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Signed-off-by: Guneshwor Singh <guneshwor.o.singh@intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rajat Jain [Tue, 31 Oct 2017 21:44:24 +0000 (14:44 -0700)]
PM / s2idle: Clear the events_check_enabled flag
[ Upstream commit
95b982b45122c57da2ee0b46cce70775e1d987af ]
Problem: This flag does not get cleared currently in the suspend or
resume path in the following cases:
* In case some driver's suspend routine returns an error.
* Successful s2idle case
* etc?
Why is this a problem: What happens is that the next suspend attempt
could fail even though the user did not enable the flag by writing to
/sys/power/wakeup_count. This is 1 use case how the issue can be seen
(but similar use case with driver suspend failure can be thought of):
1. Read /sys/power/wakeup_count
2. echo count > /sys/power/wakeup_count
3. echo freeze > /sys/power/wakeup_count
4. Let the system suspend, and wakeup the system using some wake source
that calls pm_wakeup_event() e.g. power button or something.
5. Note that the combined wakeup count would be incremented due
to the pm_wakeup_event() in the resume path.
6. After resuming the events_check_enabled flag is still set.
At this point if the user attempts to freeze again (without writing to
/sys/power/wakeup_count), the suspend would fail even though there has
been no wake event since the past resume.
Address that by clearing the flag just before a resume is completed,
so that it is always cleared for the corner cases mentioned above.
Signed-off-by: Rajat Jain <rajatja@google.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pixel Ding [Wed, 8 Nov 2017 02:20:01 +0000 (10:20 +0800)]
drm/amdgpu: bypass lru touch for KIQ ring submission
[ Upstream commit
dce1e131dd4dc68099ff1b70aa03cd2d0acf8639 ]
KIQ ring submission is used for register accessing on SRIOV
VF that could happen both in irq enabled and irq disabled cases.
Inversion lock could happen on adev->ring_lru_list_lock, while
this operation is useless and just adds overhead in this use
case.
Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Tue, 7 Nov 2017 10:46:05 +0000 (11:46 +0100)]
scsi: aacraid: use timespec64 instead of timeval
[ Upstream commit
820f188659122602ab217dd80cfa32b3ac0c55c0 ]
aacraid passes the current time to the firmware in one of two ways,
either as year/month/day/... or as 32-bit unsigned seconds.
The first one is broken on 32-bit architectures as it cannot go past
year 2038. Using timespec64 here makes it behave properly on both 32-bit
and 64-bit architectures, and avoids relying on signed integer overflow
to pass times into the second interface.
The interface used in aac_send_hosttime() however is still problematic
in year 2106 when 32-bit seconds overflow. Hopefully we don't have to
worry about aacraid by that time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Philipp Zabel [Tue, 7 Nov 2017 12:12:17 +0000 (13:12 +0100)]
rtc: pcf8563: fix output clock rate
[ Upstream commit
a3350f9c57ffad569c40f7320b89da1f3061c5bb ]
The pcf8563_clkout_recalc_rate function erroneously ignores the
frequency index read from the CLKO register and always returns
32768 Hz.
Fixes:
a39a6405d5f9 ("rtc: pcf8563: add CLKOUT to common clock framework")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe JAILLET [Thu, 9 Nov 2017 17:09:28 +0000 (18:09 +0100)]
video: fbdev: au1200fb: Return an error code if a memory allocation fails
[ Upstream commit
8cae353e6b01ac3f18097f631cdbceb5ff28c7f3 ]
'ret' is known to be 0 at this point.
In case of memory allocation error in 'framebuffer_alloc()', return
-ENOMEM instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe JAILLET [Thu, 9 Nov 2017 17:09:28 +0000 (18:09 +0100)]
video: fbdev: au1200fb: Release some resources if a memory allocation fails
[ Upstream commit
451f130602619a17c8883dd0b71b11624faffd51 ]
We should go through the error handling code instead of returning -ENOMEM
directly.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ladislav Michl [Thu, 9 Nov 2017 17:09:30 +0000 (18:09 +0100)]
video: udlfb: Fix read EDID timeout
[ Upstream commit
c98769475575c8a585f5b3952f4b5f90266f699b ]
While usb_control_msg function expects timeout in miliseconds, a value
of HZ is used. Replace it with USB_CTRL_GET_TIMEOUT and also fix error
message which looks like:
udlfb: Read EDID byte 78 failed err
ffffff92
as error is either negative errno or number of bytes transferred use %d
format specifier.
Returned EDID is in second byte, so return error when less than two bytes
are received.
Fixes:
18dffdf8913a ("staging: udlfb: enhance EDID and mode handling support")
Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Cc: Bernie Thompson <bernie@plugable.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Thu, 9 Nov 2017 17:09:33 +0000 (18:09 +0100)]
fbdev: controlfb: Add missing modes to fix out of bounds access
[ Upstream commit
ac831a379d34109451b3c41a44a20ee10ecb615f ]
Dan's static analysis says:
drivers/video/fbdev/controlfb.c:560 control_setup()
error: buffer overflow 'control_mac_modes' 20 <= 21
Indeed, control_mac_modes[] has only 20 elements, while VMODE_MAX is 22,
which may lead to an out of bounds read when parsing vmode commandline
options.
The bug was introduced in v2.4.5.6, when 2 new modes were added to
macmodes.h, but control_mac_modes[] wasn't updated:
https://kernel.opensuse.org/cgit/kernel/diff/include/video/macmodes.h?h=v2.5.2&id=
29f279c764808560eaceb88fef36cbc35c529aad
Augment control_mac_modes[] with the two new video modes to fix this.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Robert Stonehouse [Tue, 7 Nov 2017 17:30:30 +0000 (17:30 +0000)]
sfc: don't warn on successful change of MAC
[ Upstream commit
cbad52e92ad7f01f0be4ca58bde59462dc1afe3a ]
Fixes:
535a61777f44e ("sfc: suppress handled MCDI failures when changing the MAC address")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sébastien Szymanski [Fri, 10 Nov 2017 09:01:43 +0000 (10:01 +0100)]
HID: cp2112: fix broken gpio_direction_input callback
[ Upstream commit
7da85fbf1c87d4f73621e0e7666a3387497075a9 ]
When everything goes smoothly, ret is set to 0 which makes the function
to return EIO error.
Fixes:
8e9faa15469e ("HID: cp2112: fix gpio-callback error handling")
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guy Levi [Wed, 25 Oct 2017 19:39:35 +0000 (22:39 +0300)]
IB/mlx4: Fix RSS's QPC attributes assignments
[ Upstream commit
108809a0571cd1e1b317c5c083a371e163e1f8f9 ]
In the modify QP handler the base_qpn_udp field in the RSS QPC is
overwrite later by irrelevant value assignment. Hence, ingress packets
which gets to the RSS QP will be steered then to a garbage QPN.
The patch fixes this by skipping the above assignment when a RSS QP is
modified, also, the RSS context's attributes assignments are relocated
just before the context is posted to avoid future issues like this.
Additionally, this patch takes the opportunity to change the code to be
disciplined to the device's manual and assigns the RSS QP context just at
RESET to INIT transition.
Fixes:
3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chandan Rajendra [Mon, 11 Dec 2017 20:00:57 +0000 (15:00 -0500)]
ext4: fix crash when a directory's i_size is too small
commit
9d5afec6b8bd46d6ed821aa1579634437f58ef1f upstream.
On a ppc64 machine, when mounting a fuzzed ext2 image (generated by
fsfuzzer) the following call trace is seen,
VFS: brelse: Trying to free free buffer
WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40
.__brelse.part.6+0x20/0x40 (unreliable)
.ext4_find_entry+0x384/0x4f0
.ext4_lookup+0x84/0x250
.lookup_slow+0xdc/0x230
.walk_component+0x268/0x400
.path_lookupat+0xec/0x2d0
.filename_lookup+0x9c/0x1d0
.vfs_statx+0x98/0x140
.SyS_newfstatat+0x48/0x80
system_call+0x58/0x6c
This happens because the directory that ext4_find_entry() looks up has
inode->i_size that is less than the block size of the filesystem. This
causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not
reading any of the directory file's blocks. This renders the entries in
bh_use[] array to continue to have garbage data. buffer_uptodate() on
bh_use[0] can then return a zero value upon which brelse() function is
invoked.
This commit fixes the bug by returning -ENOENT when the directory file
has no associated blocks.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Theodore Ts'o [Mon, 11 Dec 2017 04:44:11 +0000 (23:44 -0500)]
ext4: add missing error check in __ext4_new_inode()
commit
996fc4477a0ea28226b30d175f053fb6f9a4fa36 upstream.
It's possible for ext4_get_acl() to return an ERR_PTR. So we need to
add a check for this case in __ext4_new_inode(). Otherwise on an
error we can end up oops the kernel.
This was getting triggered by xfstests generic/388, which is a test
which exercises the shutdown code path.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eryu Guan [Mon, 4 Dec 2017 03:52:51 +0000 (22:52 -0500)]
ext4: fix fdatasync(2) after fallocate(2) operation
commit
c894aa97577e47d3066b27b32499ecf899bfa8b0 upstream.
Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2)
then crash, we'll see wrong allocated block number (stat -c %b), the
blocks allocated beyond EOF are all lost. fstests generic/468
exposes this bug.
Commit
67a7d5f561f4 ("ext4: fix fdatasync(2) after extent
manipulation operations") fixed all the other extent manipulation
operation paths such as hole punch, zero range, collapse range etc.,
but forgot the fallocate case.
So similarly, fix it by recording the correct journal tid in ext4
inode in fallocate(2) path, so that ext4_sync_file() will wait for
the right tid to be committed on fdatasync(2).
This addresses the test failure in xfstests test generic/468.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andi Kleen [Mon, 4 Dec 2017 01:38:01 +0000 (20:38 -0500)]
ext4: support fast symlinks from ext3 file systems
commit
fc82228a5e3860502dbf3bfa4a9570cb7093cf7f upstream.
407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks)
broke ~10 years old ext3 file systems created by 2.6.17. Any ELF
executable fails because the /lib/ld-linux.so.2 fast symlink
cannot be read anymore.
The patch assumed fast symlinks were created in a specific way,
but that's not true on these really old file systems.
The new behavior is apparently needed only with the large EA inode
feature.
Revert to the old behavior if the large EA inode feature is not set.
This makes my old VM boot again.
Fixes:
407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees Cook [Tue, 12 Dec 2017 19:28:38 +0000 (11:28 -0800)]
Revert "exec: avoid RLIMIT_STACK races with prlimit()"
commit
779f4e1c6c7c661db40dfebd6dd6bda7b5f88aa3 upstream.
This reverts commit
04e35f4495dd560db30c25efca4eecae8ec8c375.
SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adam Wallis [Mon, 27 Nov 2017 15:45:01 +0000 (10:45 -0500)]
dmaengine: dmatest: move callback wait queue to thread context
commit
6f6a23a213be51728502b88741ba6a10cda2441d upstream.
Commit
adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
introduced a bug (that is in fact documented by the patch commit text)
that leaves behind a dangling pointer. Since the done_wait structure is
allocated on the stack, future invocations to the DMATEST can produce
undesirable results (e.g., corrupted spinlocks).
Commit
a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times
out") attempted to WARN the user that the stack was likely corrupted but
did not fix the actual issue.
This patch fixes the issue by pushing the wait queue and callback
structs into the the thread structure. If a failure occurs due to time,
dmaengine_terminate_all will force the callback to safely call
wake_up_all() without possibility of using a freed pointer.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605
Fixes:
adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
Suggested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Fri, 15 Dec 2017 09:32:03 +0000 (10:32 +0100)]
posix-timer: Properly check sigevent->sigev_notify
commit
cef31d9af908243421258f1df35a4a644604efbe upstream.
timer_create() specifies via sigevent->sigev_notify the signal delivery for
the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
The sanity check in good_sigevent() is only checking the valid combination
for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
not set it accepts any random value.
This has no real effects on the posix timer and signal delivery code, but
it affects show_timer() which handles the output of /proc/$PID/timers. That
function uses a string array to pretty print sigev_notify. The access to
that array has no bound checks, so random sigev_notify cause access beyond
the array bounds.
Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
masking from various code pathes as SIGEV_NONE can never be set in
combination with SIGEV_THREAD_ID.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Lechner [Mon, 4 Dec 2017 01:54:41 +0000 (19:54 -0600)]
eeprom: at24: change nvmem stride to 1
commit
7f6d2ecd3d7acaf205ea7b3e96f9ffc55b92298b upstream.
Trying to read the MAC address from an eeprom that has an offset that
is not a multiple of 4 causes an error currently.
Fix it by changing the nvmem stride to 1.
Signed-off-by: David Lechner <david@lechnology.com>
[Bartosz: tweaked the commit message]
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kirill A. Shutemov [Mon, 4 Dec 2017 12:40:56 +0000 (15:40 +0300)]
x86/boot/compressed/64: Print error if 5-level paging is not supported
commit
6d7e0ba2d2be9e50cccba213baf07e0e183c1b24 upstream.
If the machine does not support the paging mode for which the kernel was
compiled, the boot process cannot continue.
It's not possible to let the kernel detect the mismatch as it does not even
reach the point where cpu features can be evaluted due to a triple fault in
the KASLR setup.
Instead of instantaneous silent reboot, emit an error message which gives
the user the information why the boot fails.
Fixes:
77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kirill A. Shutemov [Mon, 4 Dec 2017 12:40:55 +0000 (15:40 +0300)]
x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
commit
08529078d8d9adf689bf39cc38d53979a0869970 upstream.
Prerequisite for fixing the current problem of instantaneous reboots when a
5-level paging kernel is booted on 4-level paging hardware.
At the same time this change prepares the decompression code to boot-time
switching between 4- and 5-level paging.
[ tglx: Folded the GCC < 5 fix. ]
Fixes:
77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Wise [Mon, 27 Nov 2017 21:16:32 +0000 (13:16 -0800)]
iw_cxgb4: only insert drain cqes if wq is flushed
commit
c058ecf6e455fac7346d46197a02398ead90851f upstream.
Only insert our special drain CQEs to support ib_drain_sq/rq() after
the wq is flushed. Otherwise, existing but not yet polled CQEs can be
returned out of order to the user application. This can happen when the
QP has exited RTS but not yet flushed the QP, which can happen during
a normal close (vs abortive close).
In addition never count the drain CQEs when determining how many CQEs
need to be synthesized during the flush operation. This latter issue
should never happen if the QP is properly flushed before inserting the
drain CQE, but I wanted to avoid corrupting the CQ state. So we handle
it and log a warning once.
Fixes:
4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trond Myklebust [Fri, 15 Dec 2017 02:24:08 +0000 (21:24 -0500)]
SUNRPC: Fix a race in the receive code path
commit
90d91b0cd371193d9dbfa9beacab8ab9a4cb75e0 upstream.
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
race with the call to xprt_complete_rqst().
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
Fixes:
ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
monty_pavel@sina.com [Fri, 24 Nov 2017 17:43:50 +0000 (01:43 +0800)]
dm: fix various targets to dm_register_target after module __init resources created
commit
7e6358d244e4706fe612a77b9c36519a33600ac0 upstream.
A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
processes race to load the dm-thin-pool module:
PID: 25992 TASK:
ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
#0 [
ffff883cd743d600] machine_kexec at
ffffffff81038fa9
0000001 [
ffff883cd743d660] crash_kexec at
ffffffff810c5992
0000002 [
ffff883cd743d730] oops_end at
ffffffff81515c90
0000003 [
ffff883cd743d760] no_context at
ffffffff81049f1b
0000004 [
ffff883cd743d7b0] __bad_area_nosemaphore at
ffffffff8104a1a5
0000005 [
ffff883cd743d800] bad_area at
ffffffff8104a2ce
0000006 [
ffff883cd743d830] __do_page_fault at
ffffffff8104aa6f
0000007 [
ffff883cd743d950] do_page_fault at
ffffffff81517bae
0000008 [
ffff883cd743d980] page_fault at
ffffffff81514f95
[exception RIP: kmem_cache_alloc+108]
RIP:
ffffffff8116ef3c RSP:
ffff883cd743da38 RFLAGS:
00010046
RAX:
0000000000000004 RBX:
ffffffff81121b90 RCX:
ffff881bf1e78cc0
RDX:
0000000000000000 RSI:
00000000000000d0 RDI:
0000000000000000
RBP:
ffff883cd743da68 R8:
ffff881bf1a4eb00 R9:
0000000080042000
R10:
0000000000002000 R11:
0000000000000000 R12:
00000000000000d0
R13:
0000000000000000 R14:
00000000000000d0 R15:
0000000000000246
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
0000009 [
ffff883cd743da70] mempool_alloc_slab at
ffffffff81121ba5
0000010 [
ffff883cd743da80] mempool_create_node at
ffffffff81122083
0000011 [
ffff883cd743dad0] mempool_create at
ffffffff811220f4
0000012 [
ffff883cd743dae0] pool_ctr at
ffffffffa08de049 [dm_thin_pool]
0000013 [
ffff883cd743dbd0] dm_table_add_target at
ffffffffa0005f2f [dm_mod]
0000014 [
ffff883cd743dc30] table_load at
ffffffffa0008ba9 [dm_mod]
0000015 [
ffff883cd743dc90] ctl_ioctl at
ffffffffa0009dc4 [dm_mod]
The race results in a NULL pointer because:
Process A (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target not registered;
c. modprobe dm_thin_pool and wait until end.
Process B (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target registered;
c. table_load->dm_table_add_target->pool_ctr;
d. _new_mapping_cache is NULL and panic.
Note:
1. process A and process B are two concurrent processes.
2. pool_target can be detected by process B but
_new_mapping_cache initialization has not ended.
To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
with the same problem, simply dm_register_target() after all resources
created during module init (as labelled with __init) are finished.
Signed-off-by: monty <monty_pavel@sina.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt [Sat, 2 Dec 2017 18:04:54 +0000 (13:04 -0500)]
sched/rt: Do not pull from current CPU if only one CPU to pull
commit
f73c52a5bcd1710994e53fbccc378c42b97a06b6 upstream.
Daniel Wagner reported a crash on the BeagleBone Black SoC.
This is a single CPU architecture, and does not have a functional
arch_send_call_function_single_ipi() implementation which can crash
the kernel if that is called.
As it only has one CPU, it shouldn't be called, but if the kernel is
compiled for SMP, the push/pull RT scheduling logic now calls it for
irq_work if the one CPU is overloaded, it can use that function to call
itself and crash the kernel.
Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
only has a single CPU. But SCHED_FEAT is a constant if sched debugging
is turned off. Another fix can also be used, and this should also help
with normal SMP machines. That is, do not initiate the pull code if
there's only one RT overloaded CPU, and that CPU happens to be the
current CPU that is scheduling in a lower priority task.
Even on a system with many CPUs, if there's many RT tasks waiting to
run on a single CPU, and that CPU schedules in another RT task of lower
priority, it will initiate the PULL logic in case there's a higher
priority RT task on another CPU that is waiting to run. But if there is
no other CPU with waiting RT tasks, it will initiate the RT pull logic
on itself (as it still has RT tasks waiting to run). This is a wasted
effort.
Not only does this help with SMP code where the current CPU is the only
one with RT overloaded tasks, it should also solve the issue that
Daniel encountered, because it will prevent the PULL logic from
executing, as there's only one CPU on the system, and the check added
here will cause it to exit the RT pull code.
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Fixes:
4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason Yan [Mon, 11 Dec 2017 07:03:33 +0000 (15:03 +0800)]
scsi: libsas: fix length error in sas_smp_handler()
commit
621f6401fdeefe96dfe9eab4b167c7c39f552bb0 upstream.
The return value of smp_execute_task_sg() is the untransferred residual,
but bsg_job_done() requires the length of payload received. This makes
SMP passthrough commands from userland by sg ioctl to libsas get a wrong
response. The userland tools such as smp_utils failed because of these
wrong responses:
~#smp_discover /dev/bsg/expander-2\:13
response too short, len=0
~#smp_discover /dev/bsg/expander-2\:134
response too short, len=0
Fix this by passing the actual received length to bsg_job_done(). And if
smp_execute_task_sg() returns 0, this means received length is exactly
the buffer length.
[mkp: typo]
Fixes:
651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reported-by: chenqilin <chenqilin2@huawei.com>
Tested-by: chenqilin <chenqilin2@huawei.com>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bart Van Assche [Wed, 6 Dec 2017 00:57:51 +0000 (16:57 -0800)]
scsi: core: Fix a scsi_show_rq() NULL pointer dereference
commit
14e3062fb18532175af4d1c4073597999f7a2248 upstream.
Avoid that scsi_show_rq() triggers a NULL pointer dereference if called
after sd_uninit_command(). Swap the NULL pointer assignment and the
mempool_free() call in sd_uninit_command() to make it less likely that
scsi_show_rq() triggers a use-after-free. Note: even with these changes
scsi_show_rq() can trigger a use-after-free but that's a lesser evil
than e.g. suppressing debug information for T10 PI Type 2 commands
completely. This patch fixes the following oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: scsi_format_opcode_name+0x1a/0x1c0
CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
Call Trace:
__scsi_format_command+0x27/0xc0
scsi_show_rq+0x5c/0xc0
__blk_mq_debugfs_rq_show+0x116/0x130
blk_mq_debugfs_rq_show+0xe/0x10
seq_read+0xfe/0x3b0
full_proxy_read+0x54/0x90
__vfs_read+0x37/0x160
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa5
[mkp: added Type 2]
Fixes:
0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Rutland [Wed, 13 Dec 2017 11:45:42 +0000 (11:45 +0000)]
arm64: fix CONFIG_DEBUG_WX address reporting
commit
1d08a044cf12aee37dfd54837558e3295287b343 upstream.
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather
than VA_START) for the init_mm. This means that any reported W&X
addresses are offset by VA_START, which is clearly wrong and can make
them appear like userspace addresses.
Fix this by telling the ptdump code that we're walking init_mm starting
at VA_START. We don't need to update the addr_markers, since these are
still valid bounds regardless.
Fixes:
1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Capper [Mon, 4 Dec 2017 14:13:05 +0000 (14:13 +0000)]
arm64: Initialise high_memory global variable earlier
commit
f24e5834a2c3f6c5f814a417f858226f0a010ade upstream.
The high_memory global variable is used by
cma_declare_contiguous(.) before it is defined.
We don't notice this as we compute __pa(high_memory - 1), and it looks
like we're processing a VA from the direct linear map.
This problem becomes apparent when we flip the kernel virtual address
space and the linear map is moved to the bottom of the kernel VA space.
This patch moves the initialisation of high_memory before it used.
Fixes:
f7426b983a6a ("mm: cma: adjust address limit to avoid hitting low/high memory boundary")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Capper [Fri, 1 Dec 2017 17:22:14 +0000 (17:22 +0000)]
arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
commit
8781bcbc5e69d7da69e84c7044ca0284848d5d01 upstream.
On systems with hardware dirty bit management, the ltp madvise09 unit
test fails due to dirty bit information being lost and pages being
incorrectly freed.
This was bisected to:
arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
Reverting this commit leads to a separate problem, that the unit test
retains pages that should have been dropped due to the function
madvise_free_pte_range(.) not cleaning pte's properly.
Currently pte_mkclean only clears the software dirty bit, thus the
following code sequence can appear:
pte = pte_mkclean(pte);
if (pte_dirty(pte))
// this condition can return true with HW DBM!
This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
clearing both the SW and HW dirty information.
In order for this to function on systems without HW DBM, we need to
also adjust pte_mkdirty to remove the read only bit from writable pte's
to avoid infinite fault loops.
Fixes:
64c26841b349 ("arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()")
Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Mayhew [Fri, 8 Dec 2017 21:00:12 +0000 (16:00 -0500)]
nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
commit
dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 upstream.
If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:
[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task:
c0000005ecd88040 ti:
c00000004cea0000 task.ti:
c00000004cea0000
[ 1917.130813] NIP:
c000000000354178 LR:
c000000000354160 CTR:
c00000000012db80
[ 1917.130816] REGS:
c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR:
8000000100029032 <SF,EE,ME,IR,DR,RI> CR:
22002822 XER:
20000000
[ 1917.130828] CFAR:
c00000000011f594 SOFTE: 1
GPR00:
c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04:
000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08:
0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12:
c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24:
0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28:
d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [
c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [
c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [
c00000004cea39a0] [
c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [
c00000004cea3a30] [
c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [
c00000004cea3af0] [
c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [
c00000004cea3b80] [
d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [
c00000004cea3c00] [
c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [
c00000004cea3c80] [
c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [
c00000004cea3d10] [
c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [
c00000004cea3db0] [
c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [
c00000004cea3e30] [
c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921]
7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927]
694a0060 7d4a0074 794ad182 694a0001 <
0b0a0000>
892d02a4 2f890000 40de0134
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Jurgens [Tue, 5 Dec 2017 20:30:02 +0000 (22:30 +0200)]
IB/core: Don't enforce PKey security on SMI MADs
commit
0fbe8f575b15585eec3326e43708fbbc024e8486 upstream.
Per the infiniband spec an SMI MAD can have any PKey. Checking the pkey
on SMI MADs is not necessary, and it seems that some older adapters
using the mthca driver don't follow the convention of using the default
PKey, resulting in false denials, or errors querying the PKey cache.
SMI MAD security is still enforced, only agents allowed to manage the
subnet are able to receive or send SMI MADs.
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Fixes:
47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Jurgens [Tue, 5 Dec 2017 20:30:01 +0000 (22:30 +0200)]
IB/core: Bound check alternate path port number
commit
4cae8ff136782d77b108cb3a5ba53e60597ba3a6 upstream.
The alternate port number is used as an array index in the IB
security implementation, invalid values can result in a kernel panic.
Fixes:
d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Fri, 8 Dec 2017 16:10:05 +0000 (18:10 +0200)]
xhci: Don't add a virt_dev to the devs array before it's fully allocated
commit
5d9b70f7d52eb14bb37861c663bae44de9521c35 upstream.
Avoid null pointer dereference if some function is walking through the
devs array accessing members of a new virt_dev that is mid allocation.
Add the virt_dev to xhci->devs[i] _after_ the virt_device and all its
members are properly allocated.
issue found by KASAN: null-ptr-deref in xhci_find_slot_id_by_port
"Quick analysis suggests that xhci_alloc_virt_device() is not mutex
protected. If so, there is a time frame where xhci->devs[slot_id] is set
but not fully initialized. Specifically, xhci->devs[i]->udev can be NULL."
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chunfeng Yun [Fri, 8 Dec 2017 16:10:06 +0000 (18:10 +0200)]
usb: xhci: fix TDS for MTK xHCI1.1
commit
72b663a99c074a8d073e7ecdae446cfb024ef551 upstream.
For MTK's xHCI 1.0 or latter, TD size is the number of max
packet sized packets remaining in the TD, not including
this TRB (following spec).
For MTK's xHCI 0.96 and older, TD size is the number of max
packet sized packets remaining in the TD, including this TRB
(not following spec).
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yan, Zheng [Thu, 30 Nov 2017 03:59:22 +0000 (11:59 +0800)]
ceph: drop negative child dentries before try pruning inode's alias
commit
040d786032bf59002d374b86d75b04d97624005c upstream.
Negative child dentry holds reference on inode's alias, it makes
d_prune_aliases() do nothing.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Fritz [Sat, 9 Dec 2017 22:47:55 +0000 (23:47 +0100)]
mmc: core: apply NO_CMD23 quirk to some specific cards
commit
91516a2a4734614d62ee3ed921f8f88acc67c000 upstream.
To get an usdhc Apacer and some ATP SD cards work reliable, CMD23 needs
to be disabled. This has been tested on i.MX6 (sdhci-esdhc) and rk3288
(dw_mmc-rockchip).
Without this patch on i.MX6 (sdhci-esdhc):
$ dd if=/dev/urandom of=/mnt/test bs=1M count=10 conv=fsync
| <mmc0: starting CMD23 arg
00000400 flags
00000015>
| mmc0: starting CMD25 arg
00a71f00 flags
000000b5
| mmc0: blksz 512 blocks 1024 flags
00000100 tsac 3000 ms nsac 0
| mmc0: CMD12 arg
00000000 flags
0000049d
| sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
| mmc0: Timeout waiting for hardware interrupt.
Without this patch on rk3288 (dw_mmc-rockchip):
| mmc1: Card stuck in programming state! mmcblk1 card_busy_detect
| dwmmc_rockchip
ff0c0000.dwmmc: Busy; trying anyway
| mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz,
| actual 400000HZ div = 0)
| mmc1: card never left busy state
| mmc1: tried to reset card, got error -110
| blk_update_request: I/O error, dev mmcblk1, sector 139778
| Buffer I/O error on dev mmcblk1p1, logical block 131586, lost async
| page write
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan [Thu, 7 Dec 2017 21:16:50 +0000 (14:16 -0700)]
usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
commit
be6123df1ea8f01ee2f896a16c2b7be3e4557a5a upstream.
stub_send_ret_submit() handles urb with a potential null transfer_buffer,
when it replays a packet with potential malicious data that could contain
a null buffer. Add a check for the condition when actual_length > 0 and
transfer_buffer is null.
Reported-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan [Thu, 7 Dec 2017 21:16:49 +0000 (14:16 -0700)]
usbip: prevent vhci_hcd driver from leaking a socket pointer address
commit
2f2d0088eb93db5c649d2a5e34a3800a8a935fc5 upstream.
When a client has a USB device attached over IP, the vhci_hcd driver is
locally leaking a socket pointer address via the
/sys/devices/platform/vhci_hcd/status file (world-readable) and in debug
output when "usbip --debug port" is run.
Fix it to not leak. The socket pointer address is not used at the moment
and it was made visible as a convenient way to find IP address from socket
pointer address by looking up /proc/net/{tcp,tcp6}.
As this opens a security hole, the fix replaces socket pointer address with
sockfd.
Reported-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan [Thu, 7 Dec 2017 21:16:48 +0000 (14:16 -0700)]
usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
commit
c6688ef9f29762e65bce325ef4acd6c675806366 upstream.
Harden CMD_SUBMIT path to handle malicious input that could trigger
large memory allocations. Add checks to validate transfer_buffer_length
and number_of_packets to protect against bad input requesting for
unbounded memory allocations. Validate early in get_pipe() and return
failure.
Reported-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan [Thu, 7 Dec 2017 21:16:47 +0000 (14:16 -0700)]
usbip: fix stub_rx: get_pipe() to validate endpoint number
commit
635f545a7e8be7596b9b2b6a43cab6bbd5a88e43 upstream.
get_pipe() routine doesn't validate the input endpoint number
and uses to reference ep_in and ep_out arrays. Invalid endpoint
number can trigger BUG(). Range check the epnum and returning
error instead of calling BUG().
Change caller stub_recv_cmd_submit() to handle the get_pipe()
error return.
Reported-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amir Goldstein [Wed, 29 Nov 2017 05:35:21 +0000 (07:35 +0200)]
ovl: update ctx->pos on impure dir iteration
commit
b02a16e6413a2f782e542ef60bad9ff6bf212f8a upstream.
This fixes a regression with readdir of impure dir in overlayfs
that is shared to VM via 9p fs.
Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Fixes:
4edb83bb1041 ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vivek Goyal [Mon, 27 Nov 2017 15:12:44 +0000 (10:12 -0500)]
ovl: Pass ovl_get_nlink() parameters in right order
commit
08d8f8a5b094b66b29936e8751b4a818b8db1207 upstream.
Right now we seem to be passing index as "lowerdentry" and origin.dentry
as "upperdentry". IIUC, we should pass these parameters in reversed order
and this looks like a bug.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Fixes:
caf70cb2ba5d ("ovl: cleanup orphan index entries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alan Stern [Tue, 12 Dec 2017 19:25:13 +0000 (14:25 -0500)]
USB: core: prevent malicious bNumInterfaces overflow
commit
48a4ff1c7bb5a32d2e396b03132d20d552c0eca7 upstream.
A malicious USB device with crafted descriptors can cause the kernel
to access unallocated memory by setting the bNumInterfaces value too
high in a configuration descriptor. Although the value is adjusted
during parsing, this adjustment is skipped in one of the error return
paths.
This patch prevents the problem by setting bNumInterfaces to 0
initially. The existing code already sets it to the proper value
after parsing is complete.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Kozub [Tue, 5 Dec 2017 21:40:04 +0000 (22:40 +0100)]
USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
commit
62354454625741f0569c2cbe45b2d192f8fd258e upstream.
There is another JMS567-based USB3 UAS enclosure (152d:0578) that fails
with the following error:
[sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[sda] tag#0 Sense Key : Illegal Request [current]
[sda] tag#0 Add. Sense: Invalid field in cdb
The issue occurs both with UAS (occasionally) and mass storage
(immediately after mounting a FS on a disk in the enclosure).
Enabling US_FL_BROKEN_FUA quirk solves this issue.
This patch adds an UNUSUAL_DEV with US_FL_BROKEN_FUA for the enclosure
for both UAS and mass storage.
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changbin Du [Thu, 30 Nov 2017 03:39:43 +0000 (11:39 +0800)]
tracing: Allocate mask_str buffer dynamically
commit
90e406f96f630c07d631a021fd4af10aac913e77 upstream.
The default NR_CPUS can be very large, but actual possible nr_cpu_ids
usually is very small. For my x86 distribution, the NR_CPUS is 8192 and
nr_cpu_ids is 4. About 2 pages are wasted.
Most machines don't have so many CPUs, so define a array with NR_CPUS
just wastes memory. So let's allocate the buffer dynamically when need.
With this change, the mutext tracing_cpumask_update_lock also can be
removed now, which was used to protect mask_str.
Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com
Fixes:
36dfe9252bd4c ("ftrace: make use of tracing_cpumask")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Thu, 14 Dec 2017 23:33:15 +0000 (15:33 -0800)]
mm, oom_reaper: fix memory corruption
commit
4837fe37adff1d159904f0c013471b1ecbcb455e upstream.
David Rientjes has reported the following memory corruption while the
oom reaper tries to unmap the victims address space
BUG: Bad page map in process oom_reaper pte:
6353826300000000 pmd:
00000000
addr:
00007f50cab1d000 vm_flags:
08100073 anon_vma:
ffff9eea335603f0 mapping: (null) index:
7f50cab1d
file: (null) fault: (null) mmap: (null) readpage: (null)
CPU: 2 PID: 1001 Comm: oom_reaper
Call Trace:
unmap_page_range+0x1068/0x1130
__oom_reap_task_mm+0xd5/0x16b
oom_reaper+0xff/0x14c
kthread+0xc1/0xe0
Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient. We only synchronize with the oom reaper if
tsk_is_oom_victim which is not true if the final __mmput is called from
a different context than the oom victim exit path. This can trivially
happen from context of any task which has grabbed mm reference (e.g. to
read /proc/<pid>/ file which requires mm etc.).
The race would look like this
oom_reaper oom_victim task
mmget_not_zero
do_exit
mmput
__oom_reap_task_mm mmput
__mmput
exit_mmap
remove_vma
unmap_page_range
Fix this issue by providing a new mm_is_oom_victim() helper which
operates on the mm struct rather than a task. Any context which
operates on a remote mm struct should use this helper in place of
tsk_is_oom_victim. The flag is set in mark_oom_victim and never cleared
so it is stable in the exit_mmap path.
Debugged by Tetsuo Handa.
Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org
Fixes:
212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thiago Rafael Becker [Thu, 14 Dec 2017 23:33:12 +0000 (15:33 -0800)]
kernel: make groups_sort calling a responsibility group_info allocators
commit
bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Thu, 14 Dec 2017 23:32:38 +0000 (15:32 -0800)]
autofs: fix careless error in recent commit
commit
302ec300ef8a545a7fc7f667e5fd743b091c2eeb upstream.
Commit
ecc0c469f277 ("autofs: don't fail mount for transient error") was
meant to replace an 'if' with a 'switch', but instead added the 'switch'
leaving the case in place.
Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name
Fixes:
ecc0c469f277 ("autofs: don't fail mount for transient error")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Thu, 14 Dec 2017 23:32:34 +0000 (15:32 -0800)]
string.h: workaround for increased stack usage
commit
146734b091430c80d80bb96b1139a96fb4bc830e upstream.
The hardened strlen() function causes rather large stack usage in at
least one file in the kernel, in particular when CONFIG_KASAN is
enabled:
drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init':
drivers/media/usb/em28xx/em28xx-dvb.c:2062:1: error: the frame size of 3256 bytes is larger than 204 bytes [-Werror=frame-larger-than=]
Analyzing this problem led to the discovery that gcc fails to merge the
stack slots for the i2c_board_info[] structures after we strlcpy() into
them, due to the 'noreturn' attribute on the source string length check.
I reported this as a gcc bug, but it is unlikely to get fixed for gcc-8,
since it is relatively easy to work around, and it gets triggered
rarely. An earlier workaround I did added an empty inline assembly
statement before the call to fortify_panic(), which works surprisingly
well, but is really ugly and unintuitive.
This is a new approach to the same problem, this time addressing it by
not calling the 'extern __real_strnlen()' function for string constants
where __builtin_strlen() is a compile-time constant and therefore known
to be safe.
We do this by checking if the last character in the string is a
compile-time constant '\0'. If it is, we can assume that strlen() of
the string is also constant.
As a side-effect, this should also improve the object code output for
any other call of strlen() on a string constant.
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20171205215143.3085755-1-arnd@arndb.de
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: https://patchwork.kernel.org/patch/9980413/
Link: https://patchwork.kernel.org/patch/9974047/
Fixes:
6974f0c4555 ("include/linux/string.h: add the option of fortified string.h functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ronnie Sahlberg [Mon, 20 Nov 2017 22:36:33 +0000 (09:36 +1100)]
cifs: fix NULL deref in SMB2_read
commit
a821df3f1af72aa6a0d573eea94a7dd2613e9f4e upstream.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Tue, 28 Nov 2017 08:46:24 +0000 (00:46 -0800)]
crypto: af_alg - fix NULL pointer dereference in
commit
887207ed9e5812ed9239b6d07185a2d35dda91db upstream.
af_alg_free_areq_sgls()
If allocating the ->tsgl member of 'struct af_alg_async_req' failed,
during cleanup we dereferenced the NULL ->tsgl pointer in
af_alg_free_areq_sgls(), because ->tsgl_entries was nonzero.
Fix it by only freeing the ->tsgl list if it is non-NULL.
This affected both algif_skcipher and algif_aead.
Fixes:
e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes:
d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Wed, 29 Nov 2017 04:56:59 +0000 (20:56 -0800)]
crypto: salsa20 - fix blkcipher_walk API usage
commit
ecaaab5649781c5a0effdaf298a925063020500e upstream.
When asked to encrypt or decrypt 0 bytes, both the generic and x86
implementations of Salsa20 crash in blkcipher_walk_done(), either when
doing 'kfree(walk->buffer)' or 'free_page((unsigned long)walk->page)',
because walk->buffer and walk->page have not been initialized.
The bug is that Salsa20 is calling blkcipher_walk_done() even when
nothing is in 'walk.nbytes'. But blkcipher_walk_done() is only meant to
be called when a nonzero number of bytes have been provided.
The broken code is part of an optimization that tries to make only one
call to salsa20_encrypt_bytes() to process inputs that are not evenly
divisible by 64 bytes. To fix the bug, just remove this "optimization"
and use the blkcipher_walk API the same way all the other users do.
Reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int algfd, reqfd;
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "salsa20",
};
char key[16] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
reqfd = accept(algfd, 0, 0);
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
read(reqfd, key, sizeof(key));
}
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes:
eb6f13eb9f81 ("[CRYPTO] salsa20_generic: Fix multi-page processing")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Wed, 29 Nov 2017 02:01:38 +0000 (18:01 -0800)]
crypto: hmac - require that the underlying hash algorithm is unkeyed
commit
af3ff8045bbf3e32f1a448542e73abb4c8ceb6f1 upstream.
Because the HMAC template didn't check that its underlying hash
algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))"
through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC
being used without having been keyed, resulting in sha3_update() being
called without sha3_init(), causing a stack buffer overflow.
This is a very old bug, but it seems to have only started causing real
problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3)
because the innermost hash's state is ->import()ed from a zeroed buffer,
and it just so happens that other hash algorithms are fine with that,
but SHA-3 is not. However, there could be arch or hardware-dependent
hash algorithms also affected; I couldn't test everything.
Fix the bug by introducing a function crypto_shash_alg_has_setkey()
which tests whether a shash algorithm is keyed. Then update the HMAC
template to require that its underlying hash algorithm is unkeyed.
Here is a reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
int main()
{
int algfd;
struct sockaddr_alg addr = {
.salg_type = "hash",
.salg_name = "hmac(hmac(sha3-512-generic))",
};
char key[4096] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (const struct sockaddr *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
}
Here was the KASAN report from syzbot:
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341 [inline]
BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
Write of size 4096 at addr
ffff8801cca07c40 by task syzkaller076574/3044
CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
memcpy+0x37/0x50 mm/kasan/kasan.c:303
memcpy include/linux/string.h:341 [inline]
sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
crypto_shash_update+0xcb/0x220 crypto/shash.c:109
shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151
crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
hmac_finup+0x182/0x330 crypto/hmac.c:152
crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172
crypto_shash_digest+0xc4/0x120 crypto/shash.c:186
hmac_setkey+0x36a/0x690 crypto/hmac.c:66
crypto_shash_setkey+0xad/0x190 crypto/shash.c:64
shash_async_setkey+0x47/0x60 crypto/shash.c:207
crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200
hash_setkey+0x40/0x90 crypto/algif_hash.c:446
alg_setkey crypto/af_alg.c:221 [inline]
alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254
SYSC_setsockopt net/socket.c:1851 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1830
entry_SYSCALL_64_fastpath+0x1f/0x96
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Mon, 27 Nov 2017 07:16:49 +0000 (23:16 -0800)]
crypto: rsa - fix buffer overread when stripping leading zeroes
commit
d2890c3778b164fde587bc16583f3a1c87233ec5 upstream.
In rsa_get_n(), if the buffer contained all 0's and "FIPS mode" is
enabled, we would read one byte past the end of the buffer while
scanning the leading zeroes. Fix it by checking 'n_sz' before '!*ptr'.
This bug was reachable by adding a specially crafted key of type
"asymmetric" (requires CONFIG_RSA and CONFIG_X509_CERTIFICATE_PARSER).
KASAN report:
BUG: KASAN: slab-out-of-bounds in rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33
Read of size 1 at addr
ffff88003501a708 by task keyctl/196
CPU: 1 PID: 196 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33
asn1_ber_decoder+0x82a/0x1fd0 lib/asn1_decoder.c:328
rsa_set_pub_key+0xd3/0x320 crypto/rsa.c:278
crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline]
pkcs1pad_set_pub_key+0xae/0x200 crypto/rsa-pkcs1pad.c:117
crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline]
public_key_verify_signature+0x270/0x9d0 crypto/asymmetric_keys/public_key.c:106
x509_check_for_self_signed+0x2ea/0x480 crypto/asymmetric_keys/x509_public_key.c:141
x509_cert_parse+0x46a/0x620 crypto/asymmetric_keys/x509_cert_parser.c:129
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Allocated by task 196:
__do_kmalloc mm/slab.c:3711 [inline]
__kmalloc_track_caller+0x118/0x2e0 mm/slab.c:3726
kmemdup+0x17/0x40 mm/util.c:118
kmemdup ./include/linux/string.h:414 [inline]
x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes:
5a7de97309f5 ("crypto: rsa - return raw integers for the ASN.1 parser")
Cc: Tudor Ambarus <tudor-dan.ambarus@nxp.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Tue, 28 Nov 2017 07:23:05 +0000 (23:23 -0800)]
crypto: algif_aead - fix reference counting of null skcipher
commit
b32a7dc8aef1882fbf983eb354837488cc9d54dc upstream.
In the AEAD interface for AF_ALG, the reference to the "null skcipher"
held by each tfm was being dropped in the wrong place -- when each
af_alg_ctx was freed instead of when the aead_tfm was freed. As
discovered by syzkaller, a specially crafted program could use this to
cause the null skcipher to be freed while it is still in use.
Fix it by dropping the reference in the right place.
Fixes:
72548b093ee3 ("crypto: algif_aead - copy AAD from src to dst")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Martin Kaiser [Tue, 17 Oct 2017 20:53:08 +0000 (22:53 +0200)]
mfd: fsl-imx25: Clean up irq settings during removal
commit
18f77393796848e68909e65d692c1d1436f06e06 upstream.
When fsl-imx25-tsadc is compiled as a module, loading, unloading and
reloading the module will lead to a crash.
Unable to handle kernel paging request at virtual address
bf005430
[<
c004df6c>] (irq_find_matching_fwspec)
from [<
c028d5ec>] (of_irq_get+0x58/0x74)
[<
c028d594>] (of_irq_get)
from [<
c01ff970>] (platform_get_irq+0x48/0xc8)
[<
c01ff928>] (platform_get_irq)
from [<
bf00e33c>] (mx25_tsadc_probe+0x220/0x2f4 [fsl_imx25_tsadc])
irq_find_matching_fwspec() loops over all registered irq domains. The
irq domain is still registered from last time the module was loaded but
the pointer to its operations is invalid after the module was unloaded.
Add a removal function which clears the irq handler and removes the irq
domain. With this cleanup in place, it's possible to unload and reload
the module.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sun, 17 Dec 2017 14:08:14 +0000 (15:08 +0100)]
Linux 4.14.7
Mauro Carvalho Chehab [Tue, 7 Nov 2017 13:39:39 +0000 (08:39 -0500)]
dvb_frontend: don't use-after-free the frontend struct
commit
b1cb7372fa822af6c06c8045963571d13ad6348b upstream.
dvb_frontend_invoke_release() may free the frontend struct.
So, the free logic can't update it anymore after calling it.
That's OK, as __dvb_frontend_free() is called only when the
krefs are zeroed, so nobody is using it anymore.
That should fix the following KASAN error:
The KASAN report looks like this (running on kernel
3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (4.14-rc5+)):
==================================================================
BUG: KASAN: use-after-free in __dvb_frontend_free+0x113/0x120
Write of size 8 at addr
ffff880067d45a00 by task kworker/0:1/24
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc5-43687-g06ab8a23e0e6 #545
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351
kasan_report+0x23d/0x350 mm/kasan/report.c:409
__asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
__dvb_frontend_free+0x113/0x120 drivers/media/dvb-core/dvb_frontend.c:156
dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176
dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803
dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340
dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116
dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132
dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295
usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
__device_release_driver drivers/base/dd.c:861
device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893
device_release_driver+0x1e/0x30 drivers/base/dd.c:918
bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
device_del+0x5c4/0xab0 drivers/base/core.c:1985
usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
hub_port_connect drivers/usb/core/hub.c:4754
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
worker_thread+0x221/0x1850 kernel/workqueue.c:2253
kthread+0x363/0x440 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Allocated by task 24:
save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
kmalloc ./include/linux/slab.h:493
kzalloc ./include/linux/slab.h:666
dtt200u_fe_attach+0x4c/0x110 drivers/media/usb/dvb-usb/dtt200u-fe.c:212
dtt200u_frontend_attach+0x35/0x80 drivers/media/usb/dvb-usb/dtt200u.c:136
dvb_usb_adapter_frontend_init+0x32b/0x660 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:286
dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86
dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:162
dvb_usb_device_init+0xf73/0x17f0 drivers/media/usb/dvb-usb/dvb-usb-init.c:277
dtt200u_usb_probe+0xa1/0xe0 drivers/media/usb/dvb-usb/dtt200u.c:155
usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
really_probe drivers/base/dd.c:413
driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
__device_attach+0x26b/0x3c0 drivers/base/dd.c:710
device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
device_add+0xd0b/0x1660 drivers/base/core.c:1835
usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
really_probe drivers/base/dd.c:413
driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
__device_attach+0x26b/0x3c0 drivers/base/dd.c:710
device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
device_add+0xd0b/0x1660 drivers/base/core.c:1835
usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
hub_port_connect drivers/usb/core/hub.c:4903
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
worker_thread+0x221/0x1850 kernel/workqueue.c:2253
kthread+0x363/0x440 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Freed by task 24:
save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459
kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
slab_free_hook mm/slub.c:1390
slab_free_freelist_hook mm/slub.c:1412
slab_free mm/slub.c:2988
kfree+0xf6/0x2f0 mm/slub.c:3919
dtt200u_fe_release+0x3c/0x50 drivers/media/usb/dvb-usb/dtt200u-fe.c:202
dvb_frontend_invoke_release.part.13+0x1c/0x30 drivers/media/dvb-core/dvb_frontend.c:2790
dvb_frontend_invoke_release drivers/media/dvb-core/dvb_frontend.c:2789
__dvb_frontend_free+0xad/0x120 drivers/media/dvb-core/dvb_frontend.c:153
dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176
dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803
dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340
dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116
dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132
dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295
usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
__device_release_driver drivers/base/dd.c:861
device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893
device_release_driver+0x1e/0x30 drivers/base/dd.c:918
bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
device_del+0x5c4/0xab0 drivers/base/core.c:1985
usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
hub_port_connect drivers/usb/core/hub.c:4754
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
worker_thread+0x221/0x1850 kernel/workqueue.c:2253
kthread+0x363/0x440 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
The buggy address belongs to the object at
ffff880067d45500
which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 1280 bytes inside of
2048-byte region [
ffff880067d45500,
ffff880067d45d00)
The buggy address belongs to the page:
page:
ffffea00019f5000 count:1 mapcount:0 mapping: (null)
index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw:
0100000000008100 0000000000000000 0000000000000000 00000001000f000f
raw:
dead000000000100 dead000000000200 ffff88006c002d80 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880067d45900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880067d45980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880067d45a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff880067d45a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880067d45b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes:
ead666000a5f ("media: dvb_frontend: only use kref after initialized")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Matthias Schwarzott <zzam@gentoo.org>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Scheller [Sun, 29 Oct 2017 15:43:22 +0000 (11:43 -0400)]
media: dvb-core: always call invoke_release() in fe_free()
commit
62229de19ff2b7f3e0ebf4d48ad99061127d0281 upstream.
Follow-up to:
ead666000a5f ("media: dvb_frontend: only use kref after initialized")
The aforementioned commit fixed refcount OOPSes when demod driver attaching
succeeded but tuner driver didn't. However, the use count of the attached
demod drivers don't go back to zero and thus couldn't be cleanly unloaded.
Improve on this by calling dvb_frontend_invoke_release() in
__dvb_frontend_free() regardless of fepriv being NULL, instead of returning
when fepriv is NULL. This is safe to do since _invoke_release() will check
for passed pointers being valid before calling the .release() function.
[mchehab@s-opensource.com: changed the logic a little bit to reduce
conflicts with another bug fix patch under review]
Fixes:
ead666000a5f ("media: dvb_frontend: only use kref after initialized")
Signed-off-by: Daniel Scheller <d.scheller@gmx.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reinette Chatre [Fri, 20 Oct 2017 09:16:58 +0000 (02:16 -0700)]
x86/intel_rdt: Fix potential deadlock during resctrl unmount
[ Upstream commit
36b6f9fcb8928c06b6638a4cf91bc9d69bb49aa2 ]
Lockdep warns about a potential deadlock:
[ 66.782842] ======================================================
[ 66.782888] WARNING: possible circular locking dependency detected
[ 66.782937] 4.14.0-rc2-test-test+ #48 Not tainted
[ 66.782983] ------------------------------------------------------
[ 66.783052] umount/336 is trying to acquire lock:
[ 66.783117] (cpu_hotplug_lock.rw_sem){++++}, at: [<
ffffffff81032395>] rdt_kill_sb+0x215/0x390
[ 66.783193]
but task is already holding lock:
[ 66.783244] (rdtgroup_mutex){+.+.}, at: [<
ffffffff810321b6>] rdt_kill_sb+0x36/0x390
[ 66.783305]
which lock already depends on the new lock.
[ 66.783364]
the existing dependency chain (in reverse order) is:
[ 66.783419]
-> #3 (rdtgroup_mutex){+.+.}:
[ 66.783467] __lock_acquire+0x1293/0x13f0
[ 66.783509] lock_acquire+0xaf/0x220
[ 66.783543] __mutex_lock+0x71/0x9b0
[ 66.783575] mutex_lock_nested+0x1b/0x20
[ 66.783610] intel_rdt_online_cpu+0x3b/0x430
[ 66.783649] cpuhp_invoke_callback+0xab/0x8e0
[ 66.783687] cpuhp_thread_fun+0x7a/0x150
[ 66.783722] smpboot_thread_fn+0x1cc/0x270
[ 66.783764] kthread+0x16e/0x190
[ 66.783794] ret_from_fork+0x27/0x40
[ 66.783825]
-> #2 (cpuhp_state){+.+.}:
[ 66.783870] __lock_acquire+0x1293/0x13f0
[ 66.783906] lock_acquire+0xaf/0x220
[ 66.783938] cpuhp_issue_call+0x102/0x170
[ 66.783974] __cpuhp_setup_state_cpuslocked+0x154/0x2a0
[ 66.784023] __cpuhp_setup_state+0xc7/0x170
[ 66.784061] page_writeback_init+0x43/0x67
[ 66.784097] pagecache_init+0x43/0x4a
[ 66.784131] start_kernel+0x3ad/0x3f7
[ 66.784165] x86_64_start_reservations+0x2a/0x2c
[ 66.784204] x86_64_start_kernel+0x72/0x75
[ 66.784241] verify_cpu+0x0/0xfb
[ 66.784270]
-> #1 (cpuhp_state_mutex){+.+.}:
[ 66.784319] __lock_acquire+0x1293/0x13f0
[ 66.784355] lock_acquire+0xaf/0x220
[ 66.784387] __mutex_lock+0x71/0x9b0
[ 66.784419] mutex_lock_nested+0x1b/0x20
[ 66.784454] __cpuhp_setup_state_cpuslocked+0x52/0x2a0
[ 66.784497] __cpuhp_setup_state+0xc7/0x170
[ 66.784535] page_alloc_init+0x28/0x30
[ 66.784569] start_kernel+0x148/0x3f7
[ 66.784602] x86_64_start_reservations+0x2a/0x2c
[ 66.784642] x86_64_start_kernel+0x72/0x75
[ 66.784678] verify_cpu+0x0/0xfb
[ 66.784707]
-> #0 (cpu_hotplug_lock.rw_sem){++++}:
[ 66.784759] check_prev_add+0x32f/0x6e0
[ 66.784794] __lock_acquire+0x1293/0x13f0
[ 66.784830] lock_acquire+0xaf/0x220
[ 66.784863] cpus_read_lock+0x3d/0xb0
[ 66.784896] rdt_kill_sb+0x215/0x390
[ 66.784930] deactivate_locked_super+0x3e/0x70
[ 66.784968] deactivate_super+0x40/0x60
[ 66.785003] cleanup_mnt+0x3f/0x80
[ 66.785034] __cleanup_mnt+0x12/0x20
[ 66.785070] task_work_run+0x8b/0xc0
[ 66.785103] exit_to_usermode_loop+0x94/0xa0
[ 66.786804] syscall_return_slowpath+0xe8/0x150
[ 66.788502] entry_SYSCALL_64_fastpath+0xab/0xad
[ 66.790194]
other info that might help us debug this:
[ 66.795139] Chain exists of:
cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex
[ 66.800035] Possible unsafe locking scenario:
[ 66.803267] CPU0 CPU1
[ 66.804867] ---- ----
[ 66.806443] lock(rdtgroup_mutex);
[ 66.808002] lock(cpuhp_state);
[ 66.809565] lock(rdtgroup_mutex);
[ 66.811110] lock(cpu_hotplug_lock.rw_sem);
[ 66.812608]
*** DEADLOCK ***
[ 66.816983] 2 locks held by umount/336:
[ 66.818418] #0: (&type->s_umount_key#35){+.+.}, at: [<
ffffffff81229738>] deactivate_super+0x38/0x60
[ 66.819922] #1: (rdtgroup_mutex){+.+.}, at: [<
ffffffff810321b6>] rdt_kill_sb+0x36/0x390
When the resctrl filesystem is unmounted the locks should be obtain in the
locks in the same order as was done when the cpus came online:
cpu_hotplug_lock before rdtgroup_mutex.
This also requires to switch the static_branch_disable() calls to the
_cpulocked variant because now cpu hotplug lock is held already.
[ tglx: Switched to cpus_read_[un]lock ]
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Leon Romanovsky [Wed, 25 Oct 2017 20:10:19 +0000 (23:10 +0300)]
RDMA/cxgb4: Annotate r2 and stag as __be32
[ Upstream commit
7d7d065a5eec7e218174d5c64a9f53f99ffdb119 ]
Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.
drivers/infiniband/hw/cxgb4/qp.c:614:16:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] r2
got restricted __be32 [usertype] <noident>
drivers/infiniband/hw/cxgb4/qp.c:615:18:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] stag
got restricted __be32 [usertype] <noident>
Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zdenek Kabelac [Wed, 8 Nov 2017 12:44:56 +0000 (13:44 +0100)]
md: free unused memory after bitmap resize
[ Upstream commit
0868b99c214a3d55486c700de7c3f770b7243e7c ]
When bitmap is resized, the old kalloced chunks just are not released
once the resized bitmap starts to use new space.
This fixes in particular kmemleak reports like this one:
unreferenced object 0xffff8f4311e9c000 (size 4096):
comm "lvm", pid 19333, jiffies
4295263268 (age 528.265s)
hex dump (first 32 bytes):
02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................
02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................
backtrace:
[<
ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
[<
ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
[<
ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
[<
ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
[<
ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
[<
ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
[<
ffffffffa676f1ae>] bitmap_load+0xbe/0x160
[<
ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
[<
ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
[<
ffffffffa6774b52>] dm_resume+0x122/0x140
[<
ffffffffa6779b9f>] dev_suspend+0x18f/0x290
[<
ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
[<
ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
[<
ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
[<
ffffffffa62d7269>] SyS_ioctl+0x79/0x90
[<
ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heinz Mauelshagen [Thu, 2 Nov 2017 18:58:28 +0000 (19:58 +0100)]
dm raid: fix panic when attempting to force a raid to sync
[ Upstream commit
233978449074ca7e45d9c959f9ec612d1b852893 ]
Requesting a sync on an active raid device via a table reload
(see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
skips the super_load() call that defines the superblock size
(rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
is called.
Fix by moving the initialization of the superblock start and size
out of super_load() to the caller (analyse_superblocks).
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Moore [Fri, 1 Sep 2017 13:44:34 +0000 (09:44 -0400)]
audit: ensure that 'audit=1' actually enables audit for PID 1
[ Upstream commit
173743dd99a49c956b124a74c8aacb0384739a4c ]
Prior to this patch we enabled audit in audit_init(), which is too
late for PID 1 as the standard initcalls are run after the PID 1 task
is forked. This means that we never allocate an audit_context (see
audit_alloc()) for PID 1 and therefore miss a lot of audit events
generated by PID 1.
This patch enables audit as early as possible to help ensure that when
PID 1 is forked it can allocate an audit_context if required.
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Grubb [Tue, 17 Oct 2017 22:29:22 +0000 (18:29 -0400)]
audit: Allow auditd to set pid to 0 to end auditing
[ Upstream commit
33e8a907804428109ce1d12301c3365d619cc4df ]
The API to end auditing has historically been for auditd to set the
pid to 0. This patch restores that functionality.
See: https://github.com/linux-audit/audit-kernel/issues/69
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Israel Rukshin [Sun, 5 Nov 2017 08:43:01 +0000 (08:43 +0000)]
nvmet-rdma: update queue list during ib_device removal
[ Upstream commit
43b92fd27aaef0f529c9321cfebbaec1d7b8f503 ]
A NULL deref happens when nvmet_rdma_remove_one() is called more than once
(e.g. while connected via 2 ports).
The first call frees the queues related to the first ib_device but
doesn't remove them from the queue list.
While calling nvmet_rdma_remove_one() for the second ib_device it goes over
the full queue list again and we get the NULL deref.
Fixes:
f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bart Van Assche [Wed, 8 Nov 2017 18:23:45 +0000 (10:23 -0800)]
blk-mq: Avoid that request queue removal can trigger list corruption
[ Upstream commit
aba7afc5671c23beade64d10caf86e24a9105dab ]
Avoid that removal of a request queue sporadically triggers the
following warning:
list_del corruption. next->prev should be
ffff8807d649b970, but was
6b6b6b6b6b6b6b6b
WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0
Call Trace:
process_one_work+0x11b/0x660
worker_thread+0x3d/0x3b0
kthread+0x129/0x140
ret_from_fork+0x27/0x40
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hongxu Jia [Fri, 10 Nov 2017 07:59:17 +0000 (15:59 +0800)]
ide: ide-atapi: fix compile error with defining macro DEBUG
[ Upstream commit
8dc7a31fbce5e2dbbacd83d910da37105181b054 ]
Compile ide-atapi failed with defining macro "DEBUG"
...
|drivers/ide/ide-atapi.c:285:52: error: 'struct request' has
no member named 'cmd'; did you mean 'csd'?
| debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
...
Since we split the scsi_request out of struct request, it missed
do the same thing on debug_log
Fixes:
82ed4db499b8 ("block: split scsi_request out of struct request")
Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keefe Liu [Thu, 9 Nov 2017 12:09:31 +0000 (20:09 +0800)]
ipvlan: fix ipv6 outbound device
[ Upstream commit
ca29fd7cce5a6444d57fb86517589a1a31c759e1 ]
When process the outbound packet of ipv6, we should assign the master
device to output device other than input device.
Signed-off-by: Keefe Liu <liuqifa@huawei.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vaidyanathan Srinivasan [Wed, 23 Aug 2017 18:58:41 +0000 (00:28 +0530)]
powerpc/powernv/idle: Round up latency and residency values
[ Upstream commit
8d4e10e9ed9450e18fbbf6a8872be0eac9fd4999 ]
On PowerNV platforms, firmware provides exit latency and
target residency for each of the idle states in nano
seconds. Cpuidle framework expects the values in micro
seconds. Round up to nearest micro seconds to avoid errors
in cases where the values are defined as fractional micro
seconds.
Default idle state of 'snooze' has exit latency of zero. If
other states have fractional micro second exit latency, they
would get rounded down to zero micro second and make cpuidle
framework choose deeper idle state when snooze loop is the
right choice.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masahiro Yamada [Thu, 12 Oct 2017 09:22:25 +0000 (18:22 +0900)]
kbuild: do not call cc-option before KBUILD_CFLAGS initialization
[ Upstream commit
433dc2ebe7d17dd21cba7ad5c362d37323592236 ]
Some $(call cc-option,...) are invoked very early, even before
KBUILD_CFLAGS, etc. are initialized.
The returned string from $(call cc-option,...) depends on
KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS.
Since they are exported, they are not empty when the top Makefile
is recursively invoked.
The recursion occurs in several places. For example, the top
Makefile invokes itself for silentoldconfig. "make tinyconfig",
"make rpm-pkg" are the cases, too.
In those cases, the second call of cc-option from the same line
runs a different shell command due to non-pristine KBUILD_CFLAGS.
To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS
must be initialized before any call of cc-option. This avoids
garbage data in the .cache.mk file.
Move all calls of cc-option below the config targets because target
compiler flags are unnecessary for Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Thu, 16 Nov 2017 17:58:17 +0000 (17:58 +0000)]
KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table
commit
64afe6e9eb4841f35317da4393de21a047a883b3 upstream.
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes:
33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Tue, 5 Dec 2017 23:27:57 +0000 (23:27 +0000)]
fix kcm_clone()
commit
a5739435b5a3b8c449f8844ecd71a3b1e89f0a33 upstream.
1) it's fput() or sock_release(), not both
2) don't do fd_install() until the last failure exit.
3) not a bug per se, but... don't attach socket to struct file
until it's set up.
Take reserving descriptor into the caller, move fd_install() to the
caller, sanitize failure exits and calling conventions.
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Layton [Tue, 14 Nov 2017 19:42:57 +0000 (14:42 -0500)]
fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
commit
4d2dc2cc766c3b51929658cacbc6e34fc8e242fb upstream.
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes:
94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vincent Pelletier [Sun, 26 Nov 2017 06:52:53 +0000 (06:52 +0000)]
usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
commit
30bf90ccdec1da9c8198b161ecbff39ce4e5a9ba upstream.
Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation:
[ 100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421
[ 100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python
[ 100.867954] 2 locks held by python/1880:
[ 100.867961] #0: (&epfile->mutex){....}, at: [<
f8188627>] ffs_mutex_lock+0x27/0x30 [usb_f_fs]
[ 100.868020] #1: (&(&ffs->eps_lock)->rlock){....}, at: [<
f818ad4b>] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs]
[ 100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118
[ 100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[ 100.868093] Call Trace:
[ 100.868122] dump_stack+0x47/0x62
[ 100.868156] ___might_sleep+0xfd/0x110
[ 100.868182] __might_sleep+0x68/0x70
[ 100.868217] kmem_cache_alloc_trace+0x4b/0x200
[ 100.868248] ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
[ 100.868302] dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
[ 100.868343] usb_ep_alloc_request+0x16/0xc0 [udc_core]
[ 100.868386] ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs]
[ 100.868424] ? _raw_spin_unlock_irqrestore+0x27/0x40
[ 100.868457] ? kiocb_set_cancel_fn+0x57/0x60
[ 100.868477] ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs]
[ 100.868512] ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs]
[ 100.868551] ? security_file_permission+0x9c/0xd0
[ 100.868587] ? rw_verify_area+0xac/0x120
[ 100.868633] aio_read+0x9d/0x100
[ 100.868692] ? __fget+0xa2/0xd0
[ 100.868727] ? __might_sleep+0x68/0x70
[ 100.868763] SyS_io_submit+0x471/0x680
[ 100.868878] do_int80_syscall_32+0x4e/0xd0
[ 100.868921] entry_INT80_32+0x2a/0x2a
[ 100.868932] EIP: 0xb7fbb676
[ 100.868941] EFLAGS:
00000292 CPU: 1
[ 100.868951] EAX:
ffffffda EBX:
b7aa2000 ECX:
00000002 EDX:
b7af8368
[ 100.868961] ESI:
b7fbb660 EDI:
b7aab000 EBP:
bfb6c658 ESP:
bfb6c638
[ 100.868973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masamitsu Yamazaki [Wed, 15 Nov 2017 07:33:14 +0000 (07:33 +0000)]
ipmi: Stop timers before cleaning up the module
commit
4f7f5551a760eb0124267be65763008169db7087 upstream.
System may crash after unloading ipmi_si.ko module
because a timer may remain and fire after the module cleaned up resources.
cleanup_one_si() contains the following processing.
/*
* Make sure that interrupts, the timer and the thread are
* stopped and will not run again.
*/
if (to_clean->irq_cleanup)
to_clean->irq_cleanup(to_clean);
wait_for_timer_and_thread(to_clean);
/*
* Timeouts are stopped, now make sure the interrupts are off
* in the BMC. Note that timers and CPU interrupts are off,
* so no need for locks.
*/
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
poll(to_clean);
schedule_timeout_uninterruptible(1);
}
si_state changes as following in the while loop calling poll(to_clean).
SI_GETTING_MESSAGES
=> SI_CHECKING_ENABLES
=> SI_SETTING_ENABLES
=> SI_GETTING_EVENTS
=> SI_NORMAL
As written in the code comments above,
timers are expected to stop before the polling loop and not to run again.
But the timer is set again in the following process
when si_state becomes SI_SETTING_ENABLES.
=> poll
=> smi_event_handler
=> handle_transaction_done
// smi_info->si_state == SI_SETTING_ENABLES
=> start_getting_events
=> start_new_msg
=> smi_mod_timer
=> mod_timer
As a result, before the timer set in start_new_msg() expires,
the polling loop may see si_state becoming SI_NORMAL
and the module clean-up finishes.
For example, hard LOCKUP and panic occurred as following.
smi_timeout was called after smi_event_handler,
kcs_event and hangs at port_inb()
trying to access I/O port after release.
[exception RIP: port_inb+19]
RIP:
ffffffffc0473053 RSP:
ffff88069fdc3d80 RFLAGS:
00000006
RAX:
ffff8806800f8e00 RBX:
ffff880682bd9400 RCX:
0000000000000000
RDX:
0000000000000ca3 RSI:
0000000000000ca3 RDI:
ffff8806800f8e40
RBP:
ffff88069fdc3d80 R8:
ffffffff81d86dfc R9:
ffffffff81e36426
R10:
00000000000509f0 R11:
0000000000100000 R12:
0000000000]:000000
R13:
0000000000000000 R14:
0000000000000246 R15:
ffff8806800f8e00
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0000
--- <NMI exception stack> ---
To fix the problem I defined a flag, timer_can_start,
as member of struct smi_info.
The flag is enabled immediately after initializing the timer
and disabled immediately before waiting for timer deletion.
Fixes:
0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
Signed-off-by: Yamazaki Masamitsu <m-yamazaki@ah.jp.nec.com>
[Some fairly major changes went into the IPMI driver in 4.15, so this
required a backport as the code had changed and moved to a different
file.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Sun, 26 Nov 2017 12:56:07 +0000 (20:56 +0800)]
sctp: use right member as the param of list_for_each_entry
[ Upstream commit
a8dd397903a6e57157f6265911f7d35681364427 ]
Commit
d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues
when migrating a sock") made a mistake that using 'list' as the param of
list_for_each_entry to traverse the retransmit, sacked and abandoned
queues, while chunks are using 'transmitted_list' to link into these
queues.
It could cause NULL dereference panic if there are chunks in any of these
queues when peeling off one asoc.
So use the chunk member 'transmitted_list' instead in this patch.
Fixes:
d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jakub Kicinski [Mon, 27 Nov 2017 19:11:41 +0000 (11:11 -0800)]
cls_bpf: don't decrement net's refcount when offload fails
[ Upstream commit
25415cec502a1232b19fffc85465882b19a90415 ]
When cls_bpf offload was added it seemed like a good idea to
call cls_bpf_delete_prog() instead of extending the error
handling path, since the software state is fully initialized
at that point. This handling of errors without jumping to
the end of the function is error prone, as proven by later
commit missing that extra call to __cls_bpf_delete_prog().
__cls_bpf_delete_prog() is now expected to be invoked with
a reference on exts->net or the field zeroed out. The call
on the offload's error patch does not fullfil this requirement,
leading to each error stealing a reference on net namespace.
Create a function undoing what cls_bpf_set_parms() did and
use it from __cls_bpf_delete_prog() and the error path.
Fixes:
aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gustavo A. R. Silva [Sat, 25 Nov 2017 19:14:40 +0000 (13:14 -0600)]
net: openvswitch: datapath: fix data type in queue_gso_packets
[ Upstream commit
2734166e89639c973c6e125ac8bcfc2d9db72b70 ]
gso_type is being used in binary AND operations together with SKB_GSO_UDP.
The issue is that variable gso_type is of type unsigned short and
SKB_GSO_UDP expands to more than 16 bits:
SKB_GSO_UDP = 1 << 16
this makes any binary AND operation between gso_type and SKB_GSO_UDP to
be always zero, hence making some code unreachable and likely causing
undesired behavior.
Fix this by changing the data type of variable gso_type to unsigned int.
Addresses-Coverity-ID: 1462223
Fixes:
0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Willem de Bruijn [Tue, 21 Nov 2017 15:22:25 +0000 (10:22 -0500)]
net: accept UFO datagrams from tuntap and packet
[ Upstream commit
0c19f846d582af919db66a5914a0189f9f92c936 ]
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.
Partially revert the UFO removal from
182e0b6b5846~1..
d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.
It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit
939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit
8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").
(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.
Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.
A UFO packet sent from the guest appears on the tap device:
host:
nc -l -p -u 8000 &
tcpdump -n -i tap0
guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt
Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:
./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)
Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure
Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes:
fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Sun, 19 Nov 2017 11:31:04 +0000 (19:31 +0800)]
tun: fix rcu_read_lock imbalance in tun_build_skb
[ Upstream commit
654d573845f35017dc397840fa03610fef3d08b0 ]
rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog
safely, rcu_read_unlock should be done in every return path.
Now I could see one place missing it, where it returns NULL in switch-case
XDP_REDIRECT, another palce using rcu_read_lock wrongly, where it returns
NULL in if (xdp_xmit) chunk.
So fix both in this patch.
Fixes:
761876c857cb ("tap: XDP support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Ahern [Tue, 21 Nov 2017 15:08:57 +0000 (07:08 -0800)]
net: ipv6: Fixup device for anycast routes during copy
[ Upstream commit
98d11291d189cb5adf49694d0ad1b971c0212697 ]
Florian reported a breakage with anycast routes due to commit
4832c30d5458 ("net: ipv6: put host and anycast routes on device with
address"). Prior to this commit anycast routes were added against the
loopback device causing repetitive route entries with no insight into
why they existed. e.g.:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
The point of commit
4832c30d5458 is to add the routes using the device
with the address which is causing the route to be added. e.g.,:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth1 proto kernel metric 0 pref medium
For traffic to work as it did before, the dst device needs to be switched
to the loopback when the copy is created similar to local routes.
Fixes:
4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Xu [Fri, 1 Dec 2017 10:10:37 +0000 (05:10 -0500)]
tun: free skb in early errors
[ Upstream commit
c33ee15b3820a03cf8229ba9415084197b827f8c ]
tun_recvmsg() supports accepting skb by msg_control after
commit
ac77cfd4258f ("tun: support receiving skb through msg_control"),
the skb if presented should be freed no matter how far it can go
along, otherwise it would be leaked.
This patch fixes several missed cases.
Signed-off-by: Wei Xu <wexu@redhat.com>
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Sat, 18 Nov 2017 02:06:14 +0000 (21:06 -0500)]
tcp: when scheduling TLP, time of RTO should account for current ACK
[ Upstream commit
ed66dfaf236c04d414de1d218441296e57fb2bd2 ]
Fix the TLP scheduling logic so that when scheduling a TLP probe, we
ensure that the estimated time at which an RTO would fire accounts for
the fact that ACKs indicating forward progress should push back RTO
times.
After the following fix:
df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
we had an unintentional behavior change in the following kind of
scenario: suppose the RTT variance has been very low recently. Then
suppose we send out a flight of N packets and our RTT is 100ms:
t=0: send a flight of N packets
t=100ms: receive an ACK for N-1 packets
The response before
df92c8394e6e that was:
-> schedule a TLP for now + RTO_interval
The response after
df92c8394e6e is:
-> schedule a TLP for t=0 + RTO_interval
Since RTO_interval = srtt + RTT_variance, this means that we have
scheduled a TLP timer at a point in the future that only accounts for
RTT_variance. If the RTT_variance term is small, this means that the
timer fires soon.
Before
df92c8394e6e this would not happen, because in that code, when
we receive an ACK for a prefix of flight, we did:
1) Near the top of tcp_ack(), switch from TLP timer to RTO
at write_queue_head->paket_tx_time + RTO_interval:
if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
tcp_rearm_rto(sk);
2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
if (flag & FLAG_ACKED) {
tcp_rearm_rto(sk);
3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
to TLP at now + RTO_interval:
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
tcp_schedule_loss_probe(sk);
In
df92c8394e6e we removed that 3-phase dance, and instead directly
set the TLP timer once: we set the TLP timer in cases like this to
write_queue_head->packet_tx_time + RTO_interval. So if the RTT
variance is small, then this means that this is setting the TLP timer
to fire quite soon. This means if the ACK for the tail of the flight
takes longer than an RTT to arrive (often due to delayed ACKs), then
the TLP timer fires too quickly.
Fixes:
df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>