Sean Christopherson [Fri, 7 Feb 2020 17:37:42 +0000 (09:37 -0800)]
KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging
[ Upstream commit
f6ab0107a4942dbf9a5cf0cca3f37e184870a360 ]
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow
paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to
size the arrays that track guest pages table information, i.e. using a
"max levels" of 4 causes KVM to access garbage beyond the end of an
array when querying state for level 5 entries. E.g. FNAME(gpte_changed)
will read garbage and most likely return %true for a level 5 entry,
soft-hanging the guest because FNAME(fetch) will restart the guest
instead of creating SPTEs because it thinks the guest PTE has changed.
Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS
gets to stay "4" for the PTTYPE_EPT case.
Fixes:
855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Chengguang Xu [Wed, 16 Oct 2019 02:25:01 +0000 (10:25 +0800)]
ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project()
[ Upstream commit
57c32ea42f8e802bda47010418e25043e0c9337f ]
Setting softlimit larger than hardlimit seems meaningless
for disk quota but currently it is allowed. In this case,
there may be a bit of comfusion for users when they run
df comamnd to directory which has project quota.
For example, we set 20M softlimit and 10M hardlimit of
block usage limit for project quota of test_dir(project id 123).
[root@hades mnt_ext4]# repquota -P -a
*** Report for project quotas on device /dev/loop0
Block grace time: 7days; Inode grace time: 7days
Block limits File limits
Project used soft hard grace used soft hard grace
----------------------------------------------------------------------
0 -- 13 0 0 2 0 0
123 -- 10237 20480 10240 5 200 100
The result of df command as below:
[root@hades mnt_ext4]# df -h test_dir
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 20M 10M 10M 50% /home/cgxu/test/mnt_ext4
Even though it looks like there is another 10M free space to use,
if we write new data to diretory test_dir(inherit project id),
the write will fail with errno(-EDQUOT).
After this patch, the df result looks like below.
[root@hades mnt_ext4]# df -h test_dir
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 10M 10M 3.0K 100% /home/cgxu/test/mnt_ext4
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191016022501.760-1-cgxu519@mykernel.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
zhangyi (F) [Tue, 18 Feb 2020 10:59:53 +0000 (18:59 +0800)]
jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
[ Upstream commit
c96dceeabf765d0b1b1f29c3bf50a5c01315b820 ]
Commit
904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") set the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction, it indicate the
committing process clear dirty bits when it is done with the buffer. But
it also clear the BH_Mapped flag at the same time, which may trigger
below NULL pointer oops when block_size < PAGE_SIZE.
rmdir 1 kjournald2 mkdir 2
jbd2_journal_commit_transaction
commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
jbd2_journal_commit_transaction
commit transaction N+1
...
clear_buffer_mapped(bh1)
ext4_getblk(bh2 ummapped)
...
grow_dev_page
init_page_buffers
bh1->b_private=NULL
bh2->b_private=NULL
jbd2_journal_put_journal_head(jh1)
__journal_remove_journal_head(hb1)
jh1 is NULL and trigger oops
*) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
already been unmapped.
For the metadata buffer we forgetting, we should always keep the mapped
flag and clear the dirty flags is enough, so this patch pick out the
these buffers and keep their BH_Mapped flag.
Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
Fixes:
904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
zhangyi (F) [Tue, 18 Feb 2020 10:59:52 +0000 (18:59 +0800)]
jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
[ Upstream commit
6a66a7ded12baa6ebbb2e3e82f8cb91382814839 ]
There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().
Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jernej Skrabec [Sun, 26 Jan 2020 06:59:37 +0000 (07:59 +0100)]
Revert "drm/sun4i: drv: Allow framebuffer modifiers in mode config"
commit
cf913e9683273f2640501094fa63a67e29f437b3 upstream.
This reverts commit
9db9c0cf5895e4ddde2814360cae7bea9282edd2.
Setting mode_config.allow_fb_modifiers manually is completely
unnecessary. It is set automatically by drm_universal_plane_init() based
on the fact if modifier list is provided or not. Even more, it breaks
DE2 and DE3 as they don't support any modifiers beside linear. Modifiers
aware applications can be confused by provided empty modifier list - at
least linear modifier should be included, but it's not for DE2 and DE3.
Fixes:
9db9c0cf5895 ("drm/sun4i: drv: Allow framebuffer modifiers in mode config")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126065937.9564-1-jernej.skrabec@siol.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olga Kornievskaia [Wed, 12 Feb 2020 22:32:12 +0000 (17:32 -0500)]
NFSv4.1 make cachethis=no for writes
commit
cd1b659d8ce7697ee9799b64f887528315b9097b upstream.
Turning caching off for writes on the server should improve performance.
Fixes:
fba83f34119a ("NFS: Pass "privileged" value to nfs4_init_sequence()")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kim Phillips [Fri, 7 Feb 2020 23:06:11 +0000 (17:06 -0600)]
perf stat: Don't report a null stalled cycles per insn metric
commit
80cc7bb6c104d733bff60ddda09f19139c61507c upstream.
For data collected on machines with front end stalled cycles supported,
such as found on modern AMD CPU families, commit
146540fb545b ("perf
stat: Always separate stalled cycles per insn") introduces a new line in
CSV output with a leading comma that upsets some automated scripts.
Scripts have to use "-e ex_ret_instr" to work around this issue, after
upgrading to a version of perf with that commit.
We could add "if (have_frontend_stalled && !config->csv_sep)" to the not
(total && avg) else clause, to emphasize that CSV users are usually
scripts, and are written to do only what is needed, i.e., they wouldn't
typically invoke "perf stat" without specifying an explicit event list.
But - let alone CSV output - why should users now tolerate a constant
0-reporting extra line in regular terminal output?:
BEFORE:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
181,110,981 instructions # 0.58 insn per cycle
# 0.00 stalled cycles per insn
309,876,469 cycles
1.
002202582 seconds time elapsed
The user would not like to see the now permanent:
"0.00 stalled cycles per insn"
line fixture, as it gives no useful information.
So this patch removes the printing of the zeroed stalled cycles line
altogether, almost reverting the very original commit
fb4605ba47e7
("perf stat: Check for frontend stalled for metrics"), which seems like
it was written to normalize --metric-only column output of common Intel
machines at the time: modern Intel machines have ceased to support the
genericised frontend stalled metrics AFAICT.
AFTER:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
244,071,432 instructions # 0.69 insn per cycle
355,353,490 cycles
1.
001862516 seconds time elapsed
Output behaviour when stalled cycles is indeed measured is not affected
(BEFORE == AFTER):
$ sudo perf stat --all-cpus -einstructions,cycles,stalled-cycles-frontend -- sleep 1
Performance counter stats for 'system wide':
247,227,799 instructions # 0.63 insn per cycle
# 0.26 stalled cycles per insn
394,745,636 cycles
63,194,485 stalled-cycles-frontend # 16.01% frontend cycles idle
1.
002079770 seconds time elapsed
Fixes:
146540fb545b ("perf stat: Always separate stalled cycles per insn")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200207230613.26709-1-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Upton [Fri, 7 Feb 2020 10:36:04 +0000 (02:36 -0800)]
KVM: x86: Mask off reserved bit from #DB exception payload
commit
307f1cfa269657c63cfe2c932386fcc24684d9dd upstream.
KVM defines the #DB payload as compatible with the 'pending debug
exceptions' field under VMX, not DR6. Mask off bit 12 when applying the
payload to DR6, as it is reserved on DR6 but not the 'pending debug
exceptions' field.
Fixes:
f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Thu, 23 Jan 2020 14:51:12 +0000 (14:51 +0000)]
arm64: dts: fast models: Fix FVP PCI interrupt-map property
commit
3543d7ddd55fe12c37e8a9db846216c51846015b upstream.
The interrupt map for the FVP's PCI node is missing the
parent-unit-address cells for each of the INTx entries, leading to the
kernel code failing to parse the entries correctly.
Add the missing zero cells, which are pretty useless as far as the GIC
is concerned, but that the spec requires. This allows INTx to be usable
on the model, and VFIO to work correctly.
Fixes:
fa083b99eb28 ("arm64: dts: fast models: Add DTS fo Base RevC FVP")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Petr Pavlu [Mon, 10 Feb 2020 09:38:14 +0000 (10:38 +0100)]
cifs: fix mount option display for sec=krb5i
commit
3f6166aaf19902f2f3124b5426405e292e8974dd upstream.
Fix display for sec=krb5i which was wrongly interleaved by cruid,
resulting in string "sec=krb5,cruid=<...>i" instead of
"sec=krb5i,cruid=<...>".
Fixes:
96281b9e46eb ("smb3: for kerberos mounts display the credential uid used")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sara Sharon [Fri, 31 Jan 2020 11:12:51 +0000 (13:12 +0200)]
mac80211: fix quiet mode activation in action frames
commit
2bf973ff9b9aeceb8acda629ae65341820d4b35b upstream.
Previously I intended to ignore quiet mode in probe response, however
I ended up ignoring it instead for action frames. As a matter of fact,
this path isn't invoked for probe responses to start with. Just revert
this patch.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Fixes:
7976b1e9e3bf ("mac80211: ignore quiet mode in probe")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-15-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mike Jones [Tue, 28 Jan 2020 17:59:59 +0000 (10:59 -0700)]
hwmon: (pmbus/ltc2978) Fix PMBus polling of MFR_COMMON definitions.
commit
cf2b012c90e74e85d8aea7d67e48868069cfee0c upstream.
Change 21537dc driver PMBus polling of MFR_COMMON from bits 5/4 to
bits 6/5. This fixs a LTC297X family bug where polling always returns
not busy even when the part is busy. This fixes a LTC388X and
LTM467X bug where polling used PEND and NOT_IN_TRANS, and BUSY was
not polled, which can lead to NACKing of commands. LTC388X and
LTM467X modules now poll BUSY and PEND, increasing reliability by
eliminating NACKing of commands.
Signed-off-by: Mike Jones <michael-a1.jones@analog.com>
Link: https://lore.kernel.org/r/1580234400-2829-2-git-send-email-michael-a1.jones@analog.com
Fixes:
e04d1ce9bbb49 ("hwmon: (ltc2978) Add polling for chips requiring it")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kan Liang [Tue, 21 Jan 2020 19:01:25 +0000 (11:01 -0800)]
perf/x86/intel: Fix inaccurate period in context switch for auto-reload
commit
f861854e1b435b27197417f6f90d87188003cb24 upstream.
Perf doesn't take the left period into account when auto-reload is
enabled with fixed period sampling mode in context switch.
Here is the MSR trace of the perf command as below.
(The MSR trace is simplified from a ftrace log.)
#perf record -e cycles:p -c 2000000 -- ./triad_loop
//The MSR trace of task schedule out
//perf disable all counters, disable PEBS, disable GP counter 0,
//read GP counter 0, and re-enable all counters.
//The counter 0 stops at 0xfffffff82840
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0
write_msr: MSR_P6_EVNTSEL0(186), value
40003003c
rdpmc: 0, value
fffffff82840
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value
f000000ff
//The MSR trace of the same task schedule in again
//perf disable all counters, enable and set GP counter 0,
//enable PEBS, and re-enable all counters.
//0xffffffe17b80 (-2000000) is written to GP counter 0.
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PMC0(4c1), value
ffffffe17b80
write_msr: MSR_P6_EVNTSEL0(186), value
40043003c
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value
f000000ff
When the same task schedule in again, the counter should starts from
previous left. However, it starts from the fixed period -2000000 again.
A special variant of intel_pmu_save_and_restart() is used for
auto-reload, which doesn't update the hwc->period_left.
When the monitored task schedules in again, perf doesn't know the left
period. The fixed period is used, which is inaccurate.
With auto-reload, the counter always has a negative counter value. So
the left period is -value. Update the period_left in
intel_pmu_save_and_restart_reload().
With the patch:
//The MSR trace of task schedule out
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0
write_msr: MSR_P6_EVNTSEL0(186), value
40003003c
rdpmc: 0, value
ffffffe25cbc
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value
f000000ff
//The MSR trace of the same task schedule in again
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PMC0(4c1), value
ffffffe25cbc
write_msr: MSR_P6_EVNTSEL0(186), value
40043003c
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value
f000000ff
Fixes:
d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Boyd [Tue, 21 Jan 2020 18:37:48 +0000 (10:37 -0800)]
spmi: pmic-arb: Set lockdep class for hierarchical irq domains
commit
2d5a2f913b658a7ae984773a63318ed4daadf4af upstream.
I see the following lockdep splat in the qcom pinctrl driver when
attempting to suspend the device.
WARNING: possible recursive locking detected
5.4.11 #3 Tainted: G W
--------------------------------------------
cat/3074 is trying to acquire lock:
ffffff81f49804c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
but task is already holding lock:
ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by cat/3074:
#0:
ffffff81f01d9420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4
#1:
ffffff81bd7d2080 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc
#2:
ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc
#3:
ffffffe411a41d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348
#4:
ffffff81f1c5e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c
#5:
ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
stack backtrace:
CPU: 5 PID: 3074 Comm: cat Tainted: G W 5.4.11 #3
Hardware name: Google Cheza (rev3+) (DT)
Call trace:
dump_backtrace+0x0/0x174
show_stack+0x20/0x2c
dump_stack+0xc8/0x124
__lock_acquire+0x460/0x2388
lock_acquire+0x1cc/0x210
_raw_spin_lock_irqsave+0x64/0x80
__irq_get_desc_lock+0x64/0x94
irq_set_irq_wake+0x40/0x144
qpnpint_irq_set_wake+0x28/0x34
set_irq_wake_real+0x40/0x5c
irq_set_irq_wake+0x70/0x144
pm8941_pwrkey_suspend+0x34/0x44
platform_pm_suspend+0x34/0x60
dpm_run_callback+0x64/0xcc
__device_suspend+0x310/0x41c
dpm_suspend+0xf8/0x298
dpm_suspend_start+0x84/0xb4
suspend_devices_and_enter+0xbc/0x620
pm_suspend+0x210/0x348
state_store+0xb0/0x108
kobj_attr_store+0x14/0x24
sysfs_kf_write+0x4c/0x64
kernfs_fop_write+0x15c/0x1fc
__vfs_write+0x54/0x18c
vfs_write+0xe4/0x1a4
ksys_write+0x7c/0xe4
__arm64_sys_write+0x20/0x2c
el0_svc_common+0xa8/0x160
el0_svc_handler+0x7c/0x98
el0_svc+0x8/0xc
Set a lockdep class when we map the irq so that irq_set_wake() doesn't
warn about a lockdep bug that doesn't exist.
Fixes:
12a9eeaebba3 ("spmi: pmic-arb: convert to v2 irq interfaces to support hierarchical IRQ chips")
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Brian Masney <masneyb@onstation.org>
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: Maulik Shah <mkshah@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200121183748.68662-1-swboyd@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qais Yousef [Tue, 14 Jan 2020 21:09:47 +0000 (21:09 +0000)]
sched/uclamp: Reject negative values in cpu_uclamp_write()
commit
b562d140649966d4daedd0483a8fe59ad3bb465a upstream.
The check to ensure that the new written value into cpu.uclamp.{min,max}
is within range, [0:100], wasn't working because of the signed
comparison
7301 if (req.percent > UCLAMP_PERCENT_SCALE) {
7302 req.ret = -ERANGE;
7303 return req;
7304 }
# echo -1 > cpu.uclamp.min
# cat cpu.uclamp.min
42949671.96
Cast req.percent into u64 to force the comparison to be unsigned and
work as intended in capacity_from_percent().
# echo -1 > cpu.uclamp.min
sh: write error: Numerical result out of range
Fixes:
2480c093130f ("sched/uclamp: Extend CPU's cgroup controller")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nathan Chancellor [Sat, 8 Feb 2020 14:08:59 +0000 (07:08 -0700)]
s390/time: Fix clk type in get_tod_clock
commit
0f8a206df7c920150d2aa45574fba0ab7ff6be4f upstream.
Clang warns:
In file included from ../arch/s390/boot/startup.c:3:
In file included from ../include/linux/elf.h:5:
In file included from ../arch/s390/include/asm/elf.h:132:
In file included from ../include/linux/compat.h:10:
In file included from ../include/linux/time.h:74:
In file included from ../include/linux/time32.h:13:
In file included from ../include/linux/timex.h:65:
../arch/s390/include/asm/timex.h:160:20: warning: passing 'unsigned char
[16]' to parameter of type 'char *' converts between pointers to integer
types with different sign [-Wpointer-sign]
get_tod_clock_ext(clk);
^~~
../arch/s390/include/asm/timex.h:149:44: note: passing argument to
parameter 'clk' here
static inline void get_tod_clock_ext(char *clk)
^
Change clk's type to just be char so that it matches what happens in
get_tod_clock_ext.
Fixes:
57b28f66316d ("[S390] s390_hypfs: Add new attributes")
Link: https://github.com/ClangBuiltLinux/linux/issues/861
Link: http://lkml.kernel.org/r/20200208140858.47970-1-natechancellor@gmail.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Leon Romanovsky [Wed, 12 Feb 2020 08:06:51 +0000 (10:06 +0200)]
RDMA/core: Fix protection fault in get_pkey_idx_qp_list
commit
1dd017882e01d2fcd9c5dbbf1eb376211111c393 upstream.
We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list. The following
crash from Syzkaller revealed it.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
RSP: 0018:
ffffc9000bc6f950 EFLAGS:
00010202
RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
ffffffff82c8bdec
RDX:
0000000000000002 RSI:
ffffc900030a8000 RDI:
0000000000000010
RBP:
ffff888112c8ce80 R08:
0000000000000004 R09:
fffff5200178df1f
R10:
0000000000000001 R11:
fffff5200178df1f R12:
ffff888115dc4430
R13:
ffff888115da8498 R14:
ffff888115dc4410 R15:
ffff888115da8000
FS:
00007f20777de700(0000) GS:
ffff88811b100000(0000)
knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b2f721000 CR3:
00000001173ca002 CR4:
0000000000360ee0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
port_pkey_list_insert+0xd7/0x7c0
ib_security_modify_qp+0x6fa/0xfc0
_ib_modify_qp+0x8c4/0xbf0
modify_qp+0x10da/0x16d0
ib_uverbs_modify_qp+0x9a/0x100
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Message-Id: <
20200212080651.GB679970@unreal>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zhu Yanjun [Wed, 12 Feb 2020 07:26:33 +0000 (09:26 +0200)]
RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
commit
8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c upstream.
When run stress tests with RXE, the following Call Traces often occur
watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
...
Call Trace:
<IRQ>
create_object+0x3f/0x3b0
kmem_cache_alloc_node_trace+0x129/0x2d0
__kmalloc_reserve.isra.52+0x2e/0x80
__alloc_skb+0x83/0x270
rxe_init_packet+0x99/0x150 [rdma_rxe]
rxe_requester+0x34e/0x11a0 [rdma_rxe]
rxe_do_task+0x85/0xf0 [rdma_rxe]
tasklet_action_common.isra.21+0xeb/0x100
__do_softirq+0xd0/0x298
irq_exit+0xc5/0xd0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
...
The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kamal Heib [Wed, 5 Feb 2020 11:05:30 +0000 (13:05 +0200)]
RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_create
commit
8a4f300b978edbbaa73ef9eca660e45eb9f13873 upstream.
Make sure to free the allocated cpumask_var_t's to avoid the following
reported memory leak by kmemleak:
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8897f812d6a8 (size 8):
comm "kworker/1:1", pid 347, jiffies
4294751400 (age 101.703s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[<
00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0
[<
0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1]
[<
0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1]
[<
0000000071be7e52>] init_one+0x25e/0xf10 [hfi1]
[<
000000005483d4c2>] local_pci_probe+0xd4/0x180
[<
000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0
[<
000000001d626905>] process_one_work+0x8f0/0x17b0
[<
000000007e569e7e>] worker_thread+0x536/0xb50
[<
00000000fd39a4a5>] kthread+0x30c/0x3d0
[<
0000000056f2edb3>] ret_from_fork+0x3a/0x50
Fixes:
5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Krishnamraju Eraparaju [Tue, 4 Feb 2020 09:12:30 +0000 (14:42 +0530)]
RDMA/iw_cxgb4: initiate CLOSE when entering TERM
commit
d219face9059f38ad187bde133451a2a308fdb7c upstream.
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.
In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:
Call Trace:
schedule+0x2f/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.5+0x2d0/0x4a0
c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again
c4iw_modify_qp+0x468/0x10d0
rx_data+0x218/0x570 => acquires ep lock
process_work+0x5f/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
kthread+0x112/0x130
ret_from_fork+0x35/0x40
Fixes:
d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avihai Horon [Sun, 26 Jan 2020 17:15:00 +0000 (19:15 +0200)]
RDMA/core: Fix invalid memory access in spec_filter_size
commit
a72f4ac1d778f7bde93dfee69bfc23377ec3d74f upstream.
Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory. The following crash from syzkaller revealed it.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:memchr_inv+0xd3/0x330
Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01
00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
RSP: 0018:
ffffc9000a13fa50 EFLAGS:
00010202
RAX:
dffffc0000000000 RBX:
7fff88810de9d820 RCX:
0ffff11021bd3b04
RDX:
000000000000fff8 RSI:
0000000000000000 RDI:
7fff88810de9d820
RBP:
0000000000000000 R08:
ffff888110d69018 R09:
0000000000000009
R10:
0000000000000001 R11:
ffffed10236267cc R12:
0000000000000004
R13:
0000000000001fff R14:
ffff88810de9d820 R15:
0000000000000040
FS:
00007f9ee0e51700(0000) GS:
ffff88811b100000(0000)
knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
0000000115ea0006 CR4:
0000000000360ee0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
spec_filter_size.part.16+0x34/0x50
ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
ib_uverbs_ex_create_flow+0x9ea/0x1b40
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465b49
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f9ee0e50c58 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
000000000073bf00 RCX:
0000000000465b49
RDX:
00000000000003a0 RSI:
00000000200007c0 RDI:
0000000000000004
RBP:
0000000000000003 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007f9ee0e516bc
R13:
00000000004ca2da R14:
000000000070deb8 R15:
00000000ffffffff
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
Fixes:
94e03f11ad1f ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yonatan Cohen [Wed, 12 Feb 2020 07:26:34 +0000 (09:26 +0200)]
IB/umad: Fix kernel crash while unloading ib_umad
commit
9ea04d0df6e6541c6736b43bff45f1e54875a1db upstream.
When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:
CPU0 CPU1
ib_umad_kill_port() ibdev_show()
port->ib_dev = NULL
dev_name(port->ib_dev)
The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.
kernel stack
PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
RSP: 0018:
ffffc9000097fe40 EFLAGS:
00010282
RAX:
0000000000000000 RBX:
ffffffffa0441120 RCX:
ffff8881df514000
RDX:
ffff8881df514000 RSI:
ffffffffa0441120 RDI:
ffff8881df1e8870
RBP:
ffffffff81caf000 R08:
ffff8881df1e8870 R09:
0000000000000000
R10:
0000000000001000 R11:
0000000000000003 R12:
ffff88822f550b40
R13:
0000000000000001 R14:
ffffc9000097ff08 R15:
ffff8882238bad58
FS:
00007f1437ff3740(0000) GS:
ffff888236940000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000000004e8 CR3:
00000001e0dfc001 CR4:
00000000001606e0
Call Trace:
dev_attr_show+0x15/0x50
sysfs_kf_seq_show+0xb8/0x1a0
seq_read+0x12d/0x350
vfs_read+0x89/0x140
ksys_read+0x55/0xd0
do_syscall_64+0x55/0x1b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9:
Fixes:
cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kaike Wan [Mon, 10 Feb 2020 13:10:40 +0000 (08:10 -0500)]
IB/rdmavt: Reset all QPs when the device is shut down
commit
f92e48718889b3d49cee41853402aa88cac84a6b upstream.
When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000102
IP: [
ffffffff810a65f2] __queue_work+0x32/0x3c0
PGD 0
Oops: 0000 1 SMP
Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.
121720182203 12/17/2018
task:
ffff8808f4ec4f10 ti:
ffff8808f4ed8000 task.ti:
ffff8808f4ed8000
RIP: 0010:[
ffffffff810a65f2] [
ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP: 0018:
ffff88105df43d48 EFLAGS:
00010046
RAX:
0000000000000086 RBX:
0000000000000086 RCX:
0000000000000000
RDX:
ffff880f74e758b0 RSI:
0000000000000000 RDI:
000000000000001f
RBP:
ffff88105df43d80 R08:
ffff8808f3c583c8 R09:
ffff8808f3c58000
R10:
0000000000000002 R11:
ffff88105df43da8 R12:
ffff880f74e758b0
R13:
000000000000001f R14:
0000000000000000 R15:
ffff88105a300000
FS:
0000000000000000(0000) GS:
ffff88105df40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000102 CR3:
00000000019f2000 CR4:
00000000001407e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Stack:
ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
Call Trace:
IRQ
[
ffffffff810a6b85] queue_work_on+0x45/0x50
[
ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[
ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[
ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
[
ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
[
ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[
ffffffff81097316] call_timer_fn+0x36/0x110
[
ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[
ffffffff8109982d] run_timer_softirq+0x22d/0x310
[
ffffffff81090b3f] __do_softirq+0xef/0x280
[
ffffffff816b6a5c] call_softirq+0x1c/0x30
[
ffffffff8102d3c5] do_softirq+0x65/0xa0
[
ffffffff81090ec5] irq_exit+0x105/0x110
[
ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
[
ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
EOI
[
ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
[
ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
[
ffffffff81034fee] arch_cpu_idle+0xe/0x30
[
ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
[
ffffffff81051af6] start_secondary+0x1b6/0x230
Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
RIP [
ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP
ffff88105df43d48
CR2:
0000000000000102
The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.
Fixes:
0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mike Marciniszyn [Mon, 10 Feb 2020 13:10:33 +0000 (08:10 -0500)]
IB/hfi1: Close window for pq and request coliding
commit
be8638344c70bf492963ace206a9896606b6922d upstream.
Cleaning up a pq can result in the following warning and panic:
WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
list_del corruption,
ffff88cb2c6ac068->next is LIST_POISON1 (
dead000000000100)
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
Call Trace:
[<
ffffffff90365ac0>] dump_stack+0x19/0x1b
[<
ffffffff8fc98b78>] __warn+0xd8/0x100
[<
ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80
[<
ffffffff8ff970c3>] __list_del_entry+0x63/0xd0
[<
ffffffff8ff9713d>] list_del+0xd/0x30
[<
ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110
[<
ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[<
ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[<
ffffffff8fe4519c>] __fput+0xec/0x260
[<
ffffffff8fe453fe>] ____fput+0xe/0x10
[<
ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[<
ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[<
ffffffff90379134>] int_signal+0x12/0x17
BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
IP: [<
ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
PGD
2cdab19067 PUD
2f7bfdb067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
task:
ffff88cc26db9040 ti:
ffff88b5393a8000 task.ti:
ffff88b5393a8000
RIP: 0010:[<
ffffffff8fe1f93e>] [<
ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
RSP: 0018:
ffff88b5393abd60 EFLAGS:
00010287
RAX:
0000000000000000 RBX:
ffff88cb2c6ac000 RCX:
0000000000000003
RDX:
0000000000000400 RSI:
0000000000000400 RDI:
ffffffff9095b800
RBP:
ffff88b5393abdb0 R08:
ffffffff9095b808 R09:
ffffffff8ff77c19
R10:
ffff88b73ce1f160 R11:
ffffddecddde9800 R12:
ffff88cb2c6ac000
R13:
000000000000000c R14:
ffff88cf3fdca780 R15:
0000000000000000
FS:
00002aaaaab52500(0000) GS:
ffff88b73ce00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000010 CR3:
0000002d27664000 CR4:
00000000007607e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
[<
ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80
[<
ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110
[<
ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[<
ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[<
ffffffff8fe4519c>] __fput+0xec/0x260
[<
ffffffff8fe453fe>] ____fput+0xe/0x10
[<
ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[<
ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[<
ffffffff90379134>] int_signal+0x12/0x17
Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
RIP [<
ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
RSP <
ffff88b5393abd60>
CR2:
0000000000000010
The panic is the result of slab entries being freed during the destruction
of the pq slab.
The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.
Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.
Fixes:
e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kaike Wan [Mon, 10 Feb 2020 13:10:26 +0000 (08:10 -0500)]
IB/hfi1: Acquire lock to release TID entries when user file is closed
commit
a70ed0f2e6262e723ae8d70accb984ba309eacc2 upstream.
Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.
This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.
Fixes:
3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Zhang [Sun, 26 Jan 2020 17:17:08 +0000 (19:17 +0200)]
IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported
commit
10189e8e6fe8dcde13435f9354800429c4474fb1 upstream.
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.
This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.
Fixes:
d14133dd4161 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Tue, 14 Jan 2020 14:40:31 +0000 (14:40 +0000)]
drivers: ipmi: fix off-by-one bounds check that leads to a out-of-bounds write
commit
e0354d147e5889b5faa12e64fa38187aed39aad4 upstream.
The end of buffer check is off-by-one since the check is against
an index that is pre-incremented before a store to buf[]. Fix this
adjusting the bounds check appropriately.
Addresses-Coverity: ("Out-of-bounds write")
Fixes:
51bd6f291583 ("Add support for IPMB driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <
20200114144031.358003-1-colin.king@canonical.com>
Reviewed-by: Asmaa Mnebhi <asmaa@mellanox.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yi Zhang [Fri, 14 Feb 2020 10:48:02 +0000 (18:48 +0800)]
nvme: fix the parameter order for nvme_get_log in nvme_get_fw_slot_info
commit
f25372ffc3f6c2684b57fb718219137e6ee2b64c upstream.
nvme fw-activate operation will get bellow warning log,
fix it by update the parameter order
[ 113.231513] nvme nvme0: Get FW SLOT INFO log error
Fixes:
0e98719b0e4b ("nvme: simplify the API for getting log pages")
Reported-by: Sujith Pandel <sujith_pandel@dell.com>
Reviewed-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek Behún [Sat, 15 Feb 2020 14:21:30 +0000 (15:21 +0100)]
bus: moxtet: fix potential stack buffer overflow
commit
3bf3c9744694803bd2d6f0ee70a6369b980530fd upstream.
The input_read function declares the size of the hex array relative to
sizeof(buf), but buf is a pointer argument of the function. The hex
array is meant to contain hexadecimal representation of the bin array.
Link: https://lore.kernel.org/r/20200215142130.22743-1-marek.behun@nic.cz
Fixes:
5bc7f990cd98 ("bus: Add support for Moxtet bus")
Signed-off-by: Marek Behún <marek.behun@nic.cz>
Reported-by: sohu0106 <sohu0106@126.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Brezillon [Fri, 29 Nov 2019 13:59:08 +0000 (14:59 +0100)]
drm/panfrost: Make sure the shrinker does not reclaim referenced BOs
commit
7e0cf7e9936c4358b0863357b90aa12afe6489da upstream.
Userspace might tag a BO purgeable while it's still referenced by GPU
jobs. We need to make sure the shrinker does not purge such BOs until
all jobs referencing it are finished.
Fixes:
013b65101315 ("drm/panfrost: Add madvise and shrinker support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129135908.2439529-9-boris.brezillon@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Vetter [Sun, 2 Feb 2020 13:21:33 +0000 (14:21 +0100)]
drm/vgem: Close use-after-free race in vgem_gem_create
commit
4b848f20eda5974020f043ca14bacf7a7e634fc8 upstream.
There's two references floating around here (for the object reference,
not the handle_count reference, that's a different thing):
- The temporary reference held by vgem_gem_create, acquired by
creating the object and released by calling
drm_gem_object_put_unlocked.
- The reference held by the object handle, created by
drm_gem_handle_create. This one generally outlives the function,
except if a 2nd thread races with a GEM_CLOSE ioctl call.
So usually everything is correct, except in that race case, where the
access to gem_object->size could be looking at freed data already.
Which again isn't a real problem (userspace shot its feet off already
with the race, we could return garbage), but maybe someone can exploit
this as an information leak.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+0dc4444774d419e916c8@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Rob Clark <robdclark@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200202132133.1891846-1-daniel.vetter@ffwll.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christian Borntraeger [Mon, 10 Feb 2020 16:27:37 +0000 (11:27 -0500)]
s390/uv: Fix handling of length extensions
commit
27dc0700c3be7c681cea03c5230b93d02f623492 upstream.
The query parameter block might contain additional information and can
be extended in the future. If the size of the block does not suffice we
get an error code of rc=0x100. The buffer will contain all information
up to the specified size and the hypervisor/guest simply do not need the
additional information as they do not know about the new data. That
means that we can (and must) accept rc=0x100 as success.
Cc: stable@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Fixes:
5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Harald Freudenberger [Fri, 31 Jan 2020 11:08:31 +0000 (12:08 +0100)]
s390/pkey: fix missing length of protected key on return
commit
aab73d278d49c718b722ff5052e16c9cddf144d4 upstream.
The pkey ioctl call PKEY_SEC2PROTK updates a struct pkey_protkey
on return. The protected key is stored in, the protected key type
is stored in but the len information was not updated. This patch
now fixes this and so the len field gets an update to refrect
the actual size of the protected key value returned.
Fixes:
efc598e6c8a9 ("s390/zcrypt: move cca misc functions to new code file")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Christian Rund <RUNDC@de.ibm.com>
Suggested-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kim Phillips [Tue, 21 Jan 2020 17:12:31 +0000 (11:12 -0600)]
perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event map
commit
25d387287cf0330abf2aad761ce6eee67326a355 upstream.
Commit
3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"),
claimed L2 misses were unsupported, due to them not being found in its
referenced documentation, whose link has now moved [1].
That old documentation listed PMCx064 unit mask bit 3 as:
"LsRdBlkC: LS Read Block C S L X Change to X Miss."
and bit 0 as:
"IcFillMiss: IC Fill Miss"
We now have new public documentation [2] with improved descriptions, that
clearly indicate what events those unit mask bits represent:
Bit 3 now clearly states:
"LsRdBlkC: Data Cache Req Miss in L2 (all types)"
and bit 0 is:
"IcFillMiss: Instruction Cache Req Miss in L2."
So we can now add support for L2 misses in perf's genericised events as
PMCx064 with both the above unit masks.
[1] The commit's original documentation reference, "Processor Programming
Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors",
originally available here:
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
is now available here:
https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf
[2] "Processor Programming Reference (PPR) for Family 17h Model 31h,
Revision B0 Processors", available here:
https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf
Fixes:
3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h")
Reported-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Christopherson [Fri, 7 Feb 2020 17:37:41 +0000 (09:37 -0800)]
KVM: nVMX: Use correct root level for nested EPT shadow page tables
commit
148d735eb55d32848c3379e460ce365f2c1cbe4b upstream.
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
currently also hardcodes the page walk level for nested EPT to be 4
levels. The L2 guest is all but guaranteed to soft hang on its first
instruction when L1 is using EPT, as KVM will construct 4-level page
tables and then tell hardware to use 5-level page tables.
Fixes:
855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Robert Richter [Wed, 12 Feb 2020 17:25:18 +0000 (18:25 +0100)]
EDAC/mc: Fix use-after-free and memleaks during device removal
commit
216aa145aaf379a50b17afc812db71d893bd6683 upstream.
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
1) Use-after-free:
On 27.11.19 17:07:33, John Garry wrote:
> [ 22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180
The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced with
c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").
The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.
The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.
This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.
2) Memory leaks:
Following memory leaks have been detected:
# grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
1 [<
000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
16 [<
00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels
16 [<
00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
1 [<
00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
34 [<
00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
All leaks are from memory allocated by edac_mc_alloc().
Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.
The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().
There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.
The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).
[ bp: Massage commit message. ]
Fixes:
c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator")
Fixes:
faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
Fixes:
7a623c039075 ("edac: rewrite the sysfs code to use struct device")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Robert Richter [Wed, 12 Feb 2020 12:03:39 +0000 (13:03 +0100)]
EDAC/sysfs: Remove csrow objects on errors
commit
4d59588c09f2a2daedad2a544d4d1b602ab3a8af upstream.
All created csrow objects must be removed in the error path of
edac_create_csrow_objects(). The objects have been added as devices.
They need to be removed by doing a device_del() *and* put_device() call
to also free their memory. The missing put_device() leaves a memory
leak. Use device_unregister() instead of device_del() which properly
unregisters the device doing both.
Fixes:
7adc05d2dc3a ("EDAC/sysfs: Drop device references properly")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ronnie Sahlberg [Thu, 13 Feb 2020 02:14:47 +0000 (12:14 +1000)]
cifs: make sure we do not overflow the max EA buffer size
commit
85db6b7ae65f33be4bb44f1c28261a3faa126437 upstream.
RHBZ: 1752437
Before we add a new EA we should check that this will not overflow
the maximum buffer we have available to read the EAs back.
Otherwise we can get into a situation where the EAs are so big that
we can not read them back to the client and thus we can not list EAs
anymore or delete them.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chuck Lever [Wed, 12 Feb 2020 16:12:30 +0000 (11:12 -0500)]
xprtrdma: Fix DMA scatter-gather list mapping imbalance
commit
ca1c671302825182629d3c1a60363cee6f5455bb upstream.
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
concatenate sg entries, it will return a different nents value than
it was passed.
The bug was exposed by recent changes to the AMD IOMMU driver, which
enabled sg entry concatenation.
Looking all the way back to commit
4143f34e01e9 ("xprtrdma: Port to
new memory registration API") and reviewing other kernel ULPs, it's
not clear that the frwr_map() logic was ever correct for this case.
Reported-by: Andre Tomt <andre@tomt.net>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Thu, 6 Feb 2020 10:42:58 +0000 (10:42 +0000)]
arm64: ssbs: Fix context-switch when SSBS is present on all CPUs
commit
fca3d33d8ad61eb53eca3ee4cac476d1e31b9008 upstream.
When all CPUs in the system implement the SSBS extension, the SSBS field
in PSTATE is the definitive indication of the mitigation state. Further,
when the CPUs implement the SSBS manipulation instructions (advertised
to userspace via an HWCAP), EL0 can toggle the SSBS field directly and
so we cannot rely on any shadow state such as TIF_SSBD at all.
Avoid forcing the SSBS field in context-switch on such a system, and
simply rely on the PSTATE register instead.
Cc: <stable@vger.kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Srinivas Ramana <sramana@codeaurora.org>
Fixes:
cbdf8a189a66 ("arm64: Force SSBS on context switch")
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Thomas [Sat, 25 Jan 2020 22:14:10 +0000 (17:14 -0500)]
gpio: xilinx: Fix bug where the wrong GPIO register is written to
commit
c3afa804c58e5c30ac63858b527fffadc88bce82 upstream.
Care is taken with "index", however with the current version
the actual xgpio_writereg is using index for data but
xgpio_regoffset(chip, i) for the offset. And since i is already
incremented it is incorrect. This patch fixes it so that index
is used for the offset too.
Cc: stable@vger.kernel.org
Signed-off-by: Paul Thomas <pthomas8589@gmail.com>
Link: https://lore.kernel.org/r/20200125221410.8022-1-pthomas8589@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Krzysztof Kozlowski [Thu, 30 Jan 2020 19:55:24 +0000 (20:55 +0100)]
ARM: npcm: Bring back GPIOLIB support
commit
e383e871ab54f073c2a798a9e0bde7f1d0528de8 upstream.
The CONFIG_ARCH_REQUIRE_GPIOLIB is gone since commit
65053e1a7743
("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB") and all platforms
should explicitly select GPIOLIB to have it.
Link: https://lore.kernel.org/r/20200130195525.4525-1-krzk@kernel.org
Cc: <stable@vger.kernel.org>
Fixes:
65053e1a7743 ("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Sterba [Wed, 5 Feb 2020 16:12:28 +0000 (17:12 +0100)]
btrfs: log message when rw remount is attempted with unclean tree-log
commit
10a3a3edc5b89a8cd095bc63495fb1e0f42047d9 upstream.
A remount to a read-write filesystem is not safe when there's tree-log
to be replayed. Files that could be opened until now might be affected
by the changes in the tree-log.
A regular mount is needed to replay the log so the filesystem presents
the consistent view with the pending changes included.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Sterba [Wed, 5 Feb 2020 16:12:16 +0000 (17:12 +0100)]
btrfs: print message when tree-log replay starts
commit
e8294f2f6aa6208ed0923aa6d70cea3be178309a upstream.
There's no logged information about tree-log replay although this is
something that points to previous unclean unmount. Other filesystems
report that as well.
Suggested-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wenwen Wang [Sat, 1 Feb 2020 20:38:38 +0000 (20:38 +0000)]
btrfs: ref-verify: fix memory leaks
commit
f311ade3a7adf31658ed882aaab9f9879fdccef7 upstream.
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and
kmalloc(), respectively. In the following code, if an error occurs, the
execution will be redirected to 'out' or 'out_unlock' and the function will
be exited. However, on some of the paths, 'ref' and 'ra' are not
deallocated, leading to memory leaks. For example, if 'action' is
BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return
value indicates an error, the execution will be redirected to 'out'. But,
'ref' is not deallocated on this path, causing a memory leak.
To fix the above issues, deallocate both 'ref' and 'ra' before exiting from
the function when an error is encountered.
CC: stable@vger.kernel.org # 4.15+
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Fri, 31 Jan 2020 14:06:07 +0000 (14:06 +0000)]
Btrfs: fix race between using extent maps and merging them
commit
ac05ca913e9f3871126d61da275bfe8516ff01ca upstream.
We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as consequence see an inconsistent state of
the extent map - for example sees the new length but has seen the old start
offset.
With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():
$ cat -n fs/btrfs/extent_io.c
3061 static int __do_readpage(struct extent_io_tree *tree,
3062 struct page *page,
(...)
3127 em = __get_extent_map(inode, page, pg_offset, cur,
3128 end - cur + 1, get_extent, em_cached);
3129 if (IS_ERR_OR_NULL(em)) {
3130 SetPageError(page);
3131 unlock_extent(tree, cur, end);
3132 break;
3133 }
3134 extent_offset = cur - em->start;
3135 BUG_ON(extent_map_end(em) <= cur);
(...)
Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().
We have an inode with a size of 8KiB and 2 extent maps:
extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
a previous transaction
extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
persisted but writeback started for it already. The extent map
is pinned since there's writeback and an ordered extent in
progress, so it can not be merged with extent map A yet
The following sequence of steps leads to the BUG_ON():
1) The ordered extent for extent B completes, the respective page gets its
writeback bit cleared and the extent map is unpinned, at that point it
is not yet merged with extent map A because it's in the list of modified
extents;
2) Due to memory pressure, or some other reason, the MM subsystem releases
the page corresponding to extent B - btrfs_releasepage() is called and
returns 1, meaning the page can be released as it's not dirty, not under
writeback anymore and the extent range is not locked in the inode's
iotree. However the extent map is not released, either because we are
not in a context that allows memory allocations to block or because the
inode's size is smaller than 16MiB - in this case our inode has a size
of 8KiB;
3) Task B needs to read extent B and ends up __do_readpage() through the
btrfs_readpage() callback. At __do_readpage() it gets a reference to
extent map B;
4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
while holding the write lock on the inode's extent map tree - this
results in try_merge_map() being called and since it's possible to merge
extent map B with extent map A now (the extent map B was removed from
the list of modified extents), the merging begins - it sets extent map
B's start offset to 0 (was 4KiB), but before it increments the map's
length to 8KiB (4kb + 4KiB), task A is at:
BUG_ON(extent_map_end(em) <= cur);
The call to extent_map_end() sees the extent map has a start of 0
and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
the BUG_ON() is triggered.
So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still using it, and
needs to see a consistent map while using it. Generally this is very rare
since most paths that lookup and use extent maps also have the file range
locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.
Fix this by not allowing an extent map do be merged if if it's being used
by tasks other then the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).
Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Theodore Ts'o [Fri, 14 Feb 2020 23:11:19 +0000 (18:11 -0500)]
ext4: improve explanation of a mount failure caused by a misconfigured kernel
commit
d65d87a07476aa17df2dcb3ad18c22c154315bec upstream.
If CONFIG_QFMT_V2 is not enabled, but CONFIG_QUOTA is enabled, when a
user tries to mount a file system with the quota or project quota
enabled, the kernel will emit a very confusing messsage:
EXT4-fs warning (device vdc): ext4_enable_quotas:5914: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix.
EXT4-fs (vdc): mount failed
We will now report an explanatory message indicating which kernel
configuration options have to be enabled, to avoid customer/sysadmin
confusion.
Link: https://lore.kernel.org/r/20200215012738.565735-1-tytso@mit.edu
Google-Bug-Id:
149093531
Fixes:
7c319d328505b778 ("ext4: make quota as first class supported feature")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shijie Luo [Tue, 11 Feb 2020 01:17:52 +0000 (20:17 -0500)]
ext4: add cond_resched() to ext4_protect_reserved_inode
commit
af133ade9a40794a37104ecbcc2827c0ea373a3c upstream.
When journal size is set too big by "mkfs.ext4 -J size=", or when
we mount a crafted image to make journal inode->i_size too big,
the loop, "while (i < num)", holds cpu too long. This could cause
soft lockup.
[ 529.357541] Call trace:
[ 529.357551] dump_backtrace+0x0/0x198
[ 529.357555] show_stack+0x24/0x30
[ 529.357562] dump_stack+0xa4/0xcc
[ 529.357568] watchdog_timer_fn+0x300/0x3e8
[ 529.357574] __hrtimer_run_queues+0x114/0x358
[ 529.357576] hrtimer_interrupt+0x104/0x2d8
[ 529.357580] arch_timer_handler_virt+0x38/0x58
[ 529.357584] handle_percpu_devid_irq+0x90/0x248
[ 529.357588] generic_handle_irq+0x34/0x50
[ 529.357590] __handle_domain_irq+0x68/0xc0
[ 529.357593] gic_handle_irq+0x6c/0x150
[ 529.357595] el1_irq+0xb8/0x140
[ 529.357599] __ll_sc_atomic_add_return_acquire+0x14/0x20
[ 529.357668] ext4_map_blocks+0x64/0x5c0 [ext4]
[ 529.357693] ext4_setup_system_zone+0x330/0x458 [ext4]
[ 529.357717] ext4_fill_super+0x2170/0x2ba8 [ext4]
[ 529.357722] mount_bdev+0x1a8/0x1e8
[ 529.357746] ext4_mount+0x44/0x58 [ext4]
[ 529.357748] mount_fs+0x50/0x170
[ 529.357752] vfs_kern_mount.part.9+0x54/0x188
[ 529.357755] do_mount+0x5ac/0xd78
[ 529.357758] ksys_mount+0x9c/0x118
[ 529.357760] __arm64_sys_mount+0x28/0x38
[ 529.357764] el0_svc_common+0x78/0x130
[ 529.357766] el0_svc_handler+0x38/0x78
[ 529.357769] el0_svc+0x8/0xc
[ 541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674]
Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Kara [Mon, 10 Feb 2020 14:43:16 +0000 (15:43 +0100)]
ext4: fix checksum errors with indexed dirs
commit
48a34311953d921235f4d7bbd2111690d2e469cf upstream.
DIR_INDEX has been introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because for kernels not understanding indexed dir
format, internal htree nodes appear just as empty directory entries.
Index dir aware kernels then check the htree structure is still
consistent before using the data. This all worked reasonably well until
metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant to DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"
2) When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.
Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
Fixes:
dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Theodore Ts'o [Thu, 6 Feb 2020 22:35:01 +0000 (17:35 -0500)]
ext4: fix support for inode sizes > 1024 bytes
commit
4f97a68192bd33b9963b400759cef0ca5963af00 upstream.
A recent commit,
9803387c55f7 ("ext4: validate the
debug_want_extra_isize mount option at parse time"), moved mount-time
checks around. One of those changes moved the inode size check before
the blocksize variable was set to the blocksize of the file system.
After
9803387c55f7 was set to the minimum allowable blocksize, which
in practice on most systems would be 1024 bytes. This cuased file
systems with inode sizes larger than 1024 bytes to be rejected with a
message:
EXT4-fs (sdXX): unsupported inode size: 4096
Fixes:
9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time")
Link: https://lore.kernel.org/r/20200206225252.GA3673@mit.edu
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Dilger [Sun, 26 Jan 2020 22:03:34 +0000 (15:03 -0700)]
ext4: don't assume that mmp_nodename/bdevname have NUL
commit
14c9ca0583eee8df285d68a0e6ec71053efd2228 upstream.
Don't assume that the mmp_nodename and mmp_bdevname strings are NUL
terminated, since they are filled in by snprintf(), which is not
guaranteed to do so.
Link: https://lore.kernel.org/r/1580076215-1048-1-git-send-email-adilger@dilger.ca
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Tsoy [Wed, 12 Feb 2020 23:54:50 +0000 (02:54 +0300)]
ALSA: usb-audio: Add clock validity quirk for Denon MC7000/MCX8000
commit
9f35a31283775e6f6af73fb2c95c686a4c0acac7 upstream.
It should be safe to ignore clock validity check result if the following
conditions are met:
- only one single sample rate is supported;
- the terminal is directly connected to the clock source;
- the clock type is internal.
This is to deal with some Denon DJ controllers that always reports that
clock is invalid.
Tested-by: Tobias Oszlanyi <toszlanyi@yahoo.de>
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200212235450.697348-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saurav Girepunje [Tue, 29 Oct 2019 17:52:00 +0000 (23:22 +0530)]
ALSA: usb-audio: sound: usb: usb true/false for bool return type
commit
1d4961d9eb1aaa498dfb44779b7e4b95d79112d0 upstream.
Use true/false for bool type return in uac_clock_source_is_valid().
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Link: https://lore.kernel.org/r/20191029175200.GA7320@saurav
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 11 Feb 2020 16:53:52 +0000 (17:53 +0100)]
ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
commit
fdde0ff8590b4c1c41b3227f5ac4265fccccb96b upstream.
If the platform triggers a spurious SCI even though the status bit
is not set for any GPE when the system is suspended to idle, it will
be treated as a genuine wakeup, so avoid that by checking if any GPEs
are active at all before returning 'true' from acpi_s2idle_wake().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206413
Fixes:
56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow")
Reported-by: Tsuchiya Yuto <kitakar@gmail.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 11 Feb 2020 16:52:32 +0000 (17:52 +0100)]
ACPICA: Introduce acpi_any_gpe_status_set()
commit
ea128834dd76f9a72a35d011c651fa96658f06a7 upstream.
Introduce a new helper function, acpi_any_gpe_status_set(), for
checking the status bits of all enabled GPEs in one go.
It is needed to distinguish spurious SCIs from genuine ones when
deciding whether or not to wake up the system from suspend-to-idle.
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 11 Feb 2020 09:11:02 +0000 (10:11 +0100)]
ACPI: PM: s2idle: Avoid possible race related to the EC GPE
commit
e3728b50cd9be7d4b1469447cdf1feb93e3b7adb upstream.
It is theoretically possible for the ACPI EC GPE to be set after the
s2idle_ops->wake() called from s2idle_loop() has returned and before
the subsequent pm_wakeup_pending() check is carried out. If that
happens, the resulting wakeup event will cause the system to resume
even though it may be a spurious one.
To avoid that race, first make the ->wake() callback in struct
platform_s2idle_ops return a bool value indicating whether or not
to let the system resume and rearrange s2idle_loop() to use that
value instad of the direct pm_wakeup_pending() call if ->wake() is
present.
Next, rework acpi_s2idle_wake() to process EC events and check
pm_wakeup_pending() before re-arming the SCI for system wakeup
to prevent it from triggering prematurely and add comments to
that function to explain the rationale for the new code flow.
Fixes:
56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 11 Feb 2020 09:07:43 +0000 (10:07 +0100)]
ACPI: EC: Fix flushing of pending work
commit
f0ac20c3f6137910c8a927953e8a92f5b3716166 upstream.
Commit
016b87ca5c8c ("ACPI: EC: Rework flushing of pending work")
introduced a subtle bug into the flushing of pending EC work while
suspended to idle, which may cause the EC driver to fail to
re-enable the EC GPE after handling a non-wakeup event (like a
battery status change event, for example).
The problem is that the work item flushed by flush_scheduled_work()
in __acpi_ec_flush_work() may disable the EC GPE and schedule another
work item expected to re-enable it, but that new work item is not
flushed, so __acpi_ec_flush_work() returns with the EC GPE disabled
and the CPU running it goes into an idle state subsequently. If all
of the other CPUs are in idle states at that point, the EC GPE won't
be re-enabled until at least one CPU is woken up by another interrupt
source, so system wakeup events that would normally come from the EC
then don't work.
This is reproducible on a Dell XPS13 9360 in my office which
sometimes stops reacting to power button and lid events (triggered
by the EC on that machine) after switching from AC power to battery
power or vice versa while suspended to idle (each of those switches
causes the EC GPE to trigger for several times in a row, but they
are not system wakeup events).
To avoid this problem, it is necessary to drain the workqueue
entirely in __acpi_ec_flush_work(), but that cannot be done with
respect to system_wq, because work items may be added to it from
other places while __acpi_ec_flush_work() is running. For this
reason, make the EC driver use a dedicated workqueue for EC events
processing (let that workqueue be ordered so that EC events are
processed sequentially) and use drain_workqueue() on it in
__acpi_ec_flush_work().
Fixes:
016b87ca5c8c ("ACPI: EC: Rework flushing of pending work")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arvind Sankar [Tue, 11 Feb 2020 16:22:35 +0000 (11:22 -0500)]
ALSA: usb-audio: Apply sample rate quirk for Audioengine D1
commit
93f9d1a4ac5930654c17412e3911b46ece73755a upstream.
The Audioengine D1 (0x2912:0x30c8) does support reading the sample rate,
but it returns the rate in byte-reversed order.
When setting sampling rate, the driver produces these warning messages:
[168840.944226] usb 3-2.2: current rate 4500480 is different from the runtime rate 44100
[168854.930414] usb 3-2.2: current rate 8436480 is different from the runtime rate 48000
[168905.185825] usb 3-2.1.2: current rate 30465 is different from the runtime rate 96000
As can be seen from the hexadecimal conversion, the current rate read
back is byte-reversed from the rate that was set.
44100 == 0x00ac44, 4500480 == 0x44ac00
48000 == 0x00bb80, 8436480 == 0x80bb00
96000 == 0x017700, 30465 == 0x007701
Rather than implementing a new quirk to reverse the order, just skip
checking the rate to avoid spamming the log.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200211162235.1639889-1-nivedita@alum.mit.edu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Wed, 12 Feb 2020 08:10:47 +0000 (09:10 +0100)]
ALSA: hda/realtek - Fix silent output on MSI-GL73
commit
7dafba3762d6c0083ded00a48f8c1a158bc86717 upstream.
MSI-GL73 laptop with ALC1220 codec requires a similar workaround for
Clevo laptops to enforce the DAC/mixer connection path. Set up a
quirk entry for that.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204159
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200212081047.27727-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kailang Yang [Mon, 10 Feb 2020 08:15:14 +0000 (16:15 +0800)]
ALSA: hda/realtek - Add more codec supported Headset Button
commit
2b3b6497c38d123934de68ea82a247b557d95290 upstream.
Add supported Headset Button for ALC215/ALC285/ALC289.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/948f70b4488f4cc2b629a39ce4e4be33@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Tue, 11 Feb 2020 16:05:21 +0000 (17:05 +0100)]
ALSA: usb-audio: Fix UAC2/3 effect unit parsing
commit
d75a170fd848f037a1e28893ad10be7a4c51f8a6 upstream.
We've got a regression report about M-Audio Fast Track C400 device,
and the git bisection resulted in the commit
e0ccdef92653 ("ALSA:
usb-audio: Clean up check_input_term()"). This commit was about the
rewrite of the input terminal parser, and it's not too obvious from
the change what really broke. The answer is: it's the interpretation
of UAC2/3 effect units.
In the original code, UAC2 effect unit is as if through UAC1
processing unit because both UAC1 PU and UAC2/3 EU share the same
number (0x07). The old code went through a complex switch-case
fallthrough, finally bailing out in the middle:
if (protocol == UAC_VERSION_2 &&
hdr[2] == UAC2_EFFECT_UNIT) {
/* UAC2/UAC1 unit IDs overlap here in an
* uncompatible way. Ignore this unit for now.
*/
return 0;
}
... and this special handling was missing in the new code; the new
code treats UAC2/3 effect unit as if it were equivalent with the
processing unit.
Actually, the old code was too confusing. The effect unit has an
incompatible unit description with the processing unit, so we
shouldn't have dealt with EU in the same way.
This patch addresses the regression by changing the effect unit
handling to the own parser function. The own parser function makes
the clear distinct with PU, so it improves the readability, too.
The EU parser just sets the type and the id like the old kernels.
Once when the proper effect unit support is added, we can revisit this
parser function, but for now, let's keep this simple setup as is.
Fixes:
e0ccdef92653 ("ALSA: usb-audio: Clean up check_input_term()")
Cc: <stable@vger.kernel.org>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206147
Link: https://lore.kernel.org/r/20200211160521.31990-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Benjamin Tissoires [Fri, 14 Feb 2020 01:07:47 +0000 (17:07 -0800)]
Input: synaptics - remove the LEN0049 dmi id from topbuttonpad list
commit
5179a9dfa9440c1781816e2c9a183d1d2512dc61 upstream.
The Yoga 11e is using LEN0049, but it doesn't have a trackstick.
Thus, there is no need to create a software top buttons row.
However, it seems that the device works under SMBus, so keep it as part
of the smbus_pnp_ids.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200115013023.9710-1-benjamin.tissoires@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gaurav Agrawal [Fri, 14 Feb 2020 01:06:10 +0000 (17:06 -0800)]
Input: synaptics - enable SMBus on ThinkPad L470
commit
b8a3d819f872e0a3a0a6db0dbbcd48071042fb98 upstream.
Add touchpad LEN2044 to the list, as it is capable of working with
psmouse.synaptics_intertouch=1
Signed-off-by: Gaurav Agrawal <agrawalgaurav@gnome.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CADdtggVzVJq5gGNmFhKSz2MBwjTpdN5YVOdr4D3Hkkv=KZRc9g@mail.gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lyude Paul [Fri, 14 Feb 2020 00:59:15 +0000 (16:59 -0800)]
Input: synaptics - switch T470s to RMI4 by default
commit
bf502391353b928e63096127e5fd8482080203f5 upstream.
This supports RMI4 and everything seems to work, including the touchpad
buttons. So, let's enable this by default.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200204194322.112638-1-lyude@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 14 Feb 2020 21:34:20 +0000 (16:34 -0500)]
Linux 5.4.20
Stephen Smalley [Fri, 22 Nov 2019 17:22:45 +0000 (12:22 -0500)]
selinux: fall back to ref-walk if audit is required
commit
0188d5c025ca8fe756ba3193bd7d150139af5a88 upstream.
commit
bda0be7ad994 ("security: make inode_follow_link RCU-walk aware")
passed down the rcu flag to the SELinux AVC, but failed to adjust the
test in slow_avc_audit() to also return -ECHILD on LSM_AUDIT_DATA_DENTRY.
Previously, we only returned -ECHILD if generating an audit record with
LSM_AUDIT_DATA_INODE since this was only relevant from inode_permission.
Move the handling of MAY_NOT_BLOCK to avc_audit() and its inlined
equivalent in selinux_inode_permission() immediately after we determine
that audit is required, and always fall back to ref-walk in this case.
Fixes:
bda0be7ad994 ("security: make inode_follow_link RCU-walk aware")
Reported-by: Will Deacon <will@kernel.org>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolai Stange [Tue, 14 Jan 2020 10:39:03 +0000 (11:39 +0100)]
libertas: make lbs_ibss_join_existing() return error code on rates overflow
[ Upstream commit
1754c4f60aaf1e17d886afefee97e94d7f27b4cb ]
Commit
e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss
descriptor") introduced a bounds check on the number of supplied rates to
lbs_ibss_join_existing() and made it to return on overflow.
However, the aforementioned commit doesn't set the return value accordingly
and thus, lbs_ibss_join_existing() would return with zero even though it
failed.
Make lbs_ibss_join_existing return -EINVAL in case the bounds check on the
number of supplied rates fails.
Fixes:
e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nicolai Stange [Tue, 14 Jan 2020 10:39:02 +0000 (11:39 +0100)]
libertas: don't exit from lbs_ibss_join_existing() with RCU read lock held
[ Upstream commit
c7bf1fb7ddca331780b9a733ae308737b39f1ad4 ]
Commit
e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss
descriptor") introduced a bounds check on the number of supplied rates to
lbs_ibss_join_existing().
Unfortunately, it introduced a return path from within a RCU read side
critical section without a corresponding rcu_read_unlock(). Fix this.
Fixes:
e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qing Xu [Thu, 2 Jan 2020 02:39:27 +0000 (10:39 +0800)]
mwifiex: Fix possible buffer overflows in mwifiex_cmd_append_vsie_tlv()
[ Upstream commit
b70261a288ea4d2f4ac7cd04be08a9f0f2de4f4d ]
mwifiex_cmd_append_vsie_tlv() calls memcpy() without checking
the destination size may trigger a buffer overflower,
which a local user could use to cause denial of service
or the execution of arbitrary code.
Fix it by putting the length check before calling memcpy().
Signed-off-by: Qing Xu <m1s5p6688@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qing Xu [Thu, 2 Jan 2020 02:39:26 +0000 (10:39 +0800)]
mwifiex: Fix possible buffer overflows in mwifiex_ret_wmm_get_status()
[ Upstream commit
3a9b153c5591548612c3955c9600a98150c81875 ]
mwifiex_ret_wmm_get_status() calls memcpy() without checking the
destination size.Since the source is given from remote AP which
contains illegal wmm elements , this may trigger a heap buffer
overflow.
Fix it by putting the length check before calling memcpy().
Signed-off-by: Qing Xu <m1s5p6688@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Chuhong Yuan [Mon, 9 Dec 2019 08:57:11 +0000 (16:57 +0800)]
dmaengine: axi-dmac: add a check for devm_regmap_init_mmio
commit
a5b982af953bcc838cd198b0434834cc1dff14ec upstream.
The driver misses checking the result of devm_regmap_init_mmio().
Add a check to fix it.
Fixes:
fc15be39a827 ("dmaengine: axi-dmac: add regmap support")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20191209085711.16001-1-hslester96@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jerome Brunet [Fri, 13 Dec 2019 10:33:04 +0000 (11:33 +0100)]
clk: meson: g12a: fix missing uart2 in regmap table
commit
b1b3f0622a9d52ac19a63619911823c89a4d85a4 upstream.
UART2 peripheral is missing from the regmap fixup table of the g12a family
clock controller. As it is, any access to this clock would Oops, which is
not great.
Add the clock to the table to fix the problem.
Fixes:
085a4ea93d54 ("clk: meson: g12a: add peripheral clock controller")
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Tested-by: Dmitry Shmidt <dimitrysh@google.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bartosz Golaszewski [Fri, 3 Jan 2020 11:41:56 +0000 (12:41 +0100)]
mfd: max77650: Select REGMAP_IRQ in Kconfig
commit
cb7a374a5e7a5af3f8c839f74439193add6d0589 upstream.
MAX77650 MFD driver uses regmap_irq API but doesn't select the required
REGMAP_IRQ option in Kconfig. This can cause the following build error
if regmap irq is not enabled implicitly by someone else:
ld: drivers/mfd/max77650.o: in function `max77650_i2c_probe':
max77650.c:(.text+0xcb): undefined reference to `devm_regmap_add_irq_chip'
ld: max77650.c:(.text+0xdb): undefined reference to `regmap_irq_get_domain'
make: *** [Makefile:1079: vmlinux] Error 1
Fix it by adding the missing option.
Fixes:
d0f60334500b ("mfd: Add new driver for MAX77650 PMIC")
Reported-by: Paul Gazzillo <paul@pgazz.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Whitten [Sat, 18 Jan 2020 20:56:24 +0000 (20:56 +0000)]
regmap: fix writes to non incrementing registers
commit
2e31aab08bad0d4ee3d3d890a7b74cb6293e0a41 upstream.
When checking if a register block is writable we must ensure that the
block does not start with or contain a non incrementing register.
Fixes:
8b9f9d4dc511 ("regmap: verify if register is writeable before writing operations")
Signed-off-by: Ben Whitten <ben.whitten@gmail.com>
Link: https://lore.kernel.org/r/20200118205625.14532-1-ben.whitten@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Wed, 18 Dec 2019 19:48:07 +0000 (20:48 +0100)]
pinctrl: sh-pfc: r8a7778: Fix duplicate SDSELF_B and SD1_CLK_B
commit
805f635703b2562b5ddd822c62fc9124087e5dd5 upstream.
The FN_SDSELF_B and FN_SD1_CLK_B enum IDs are used twice, which means
one set of users must be wrong. Replace them by the correct enum IDs.
Fixes:
87f8c988636db0d4 ("sh-pfc: Add r8a7778 pinmux support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20191218194812.12741-2-geert+renesas@glider.be
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Wed, 13 Nov 2019 10:16:53 +0000 (11:16 +0100)]
pinctrl: sh-pfc: r8a77965: Fix DU_DOTCLKIN3 drive/bias control
commit
a34cd9dfd03fa9ec380405969f1d638bc63b8d63 upstream.
R-Car Gen3 Hardware Manual Errata for Rev. 2.00 of October 24, 2019
changed the configuration bits for drive and bias control for the
DU_DOTCLKIN3 pin on R-Car M3-N, to match the same pin on R-Car H3.
Update the driver to reflect this.
After this, the handling of drive and bias control for the various
DU_DOTCLKINx pins is consistent across all of the R-Car H3, M3-W,
M3-W+, and M3-N SoCs.
Fixes:
86c045c2e4201e94 ("pinctrl: sh-pfc: r8a77965: Replace DU_DOTCLKIN2 by DU_DOTCLKIN3")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20191113101653.28428-1-geert+renesas@glider.be
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Smalley [Fri, 17 Jan 2020 20:24:07 +0000 (15:24 -0500)]
selinux: fix regression introduced by move_mount(2) syscall
commit
98aa00345de54b8340dc2ddcd87f446d33387b5e upstream.
commit
2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
introduced a new move_mount(2) system call and a corresponding new LSM
security_move_mount hook but did not implement this hook for any existing
LSM. This creates a regression for SELinux with respect to consistent
checking of mounts; the existing selinux_mount hook checks mounton
permission to the mount point path. Provide a SELinux hook
implementation for move_mount that applies this same check for
consistency. In the future we may wish to add a new move_mount
filesystem permission and check as well, but this addresses
the immediate regression.
Fixes:
2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Smalley [Fri, 22 Nov 2019 17:22:44 +0000 (12:22 -0500)]
selinux: revert "stop passing MAY_NOT_BLOCK to the AVC upon follow_link"
commit
1a37079c236d55fb31ebbf4b59945dab8ec8764c upstream.
This reverts commit
e46e01eebbbc ("selinux: stop passing MAY_NOT_BLOCK
to the AVC upon follow_link"). The correct fix is to instead fall
back to ref-walk if audit is required irrespective of the specific
audit data type. This is done in the next commit.
Fixes:
e46e01eebbbc ("selinux: stop passing MAY_NOT_BLOCK to the AVC upon follow_link")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Coly Li [Thu, 23 Jan 2020 17:01:37 +0000 (01:01 +0800)]
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
commit
2aa8c529387c25606fdc1484154b92f8bfbc5746 upstream.
the commit
91be66e1318f ("bcache: performance improvement for
btree_flush_write()") was an effort to flushing btree node with oldest
btree node faster in following methods,
- Only iterate dirty btree nodes in c->btree_cache, avoid scanning a lot
of clean btree nodes.
- Take c->btree_cache as a LRU-like list, aggressively flushing all
dirty nodes from tail of c->btree_cache util the btree node with
oldest journal entry is flushed. This is to reduce the time of holding
c->bucket_lock.
Guoju Fang and Shuang Li reported that they observe unexptected extra
write I/Os on cache device after applying the above patch. Guoju Fang
provideed more detailed diagnose information that the aggressive
btree nodes flushing may cause 10x more btree nodes to flush in his
workload. He points out when system memory is large enough to hold all
btree nodes in memory, c->btree_cache is not a LRU-like list any more.
Then the btree node with oldest journal entry is very probably not-
close to the tail of c->btree_cache list. In such situation much more
dirty btree nodes will be aggressively flushed before the target node
is flushed. When slow SATA SSD is used as cache device, such over-
aggressive flushing behavior will cause performance regression.
After spending a lot of time on debug and diagnose, I find the real
condition is more complicated, aggressive flushing dirty btree nodes
from tail of c->btree_cache list is not a good solution.
- When all btree nodes are cached in memory, c->btree_cache is not
a LRU-like list, the btree nodes with oldest journal entry won't
be close to the tail of the list.
- There can be hundreds dirty btree nodes reference the oldest journal
entry, before flushing all the nodes the oldest journal entry cannot
be reclaimed.
When the above two conditions mixed together, a simply flushing from
tail of c->btree_cache list is really NOT a good idea.
Fortunately there is still chance to make btree_flush_write() work
better. Here is how this patch avoids unnecessary btree nodes flushing,
- Only acquire c->journal.lock when getting oldest journal entry of
fifo c->journal.pin. In rested locations check the journal entries
locklessly, so their values can be changed on other cores
in parallel.
- In loop list_for_each_entry_safe_reverse(), checking latest front
point of fifo c->journal.pin. If it is different from the original
point which we get with locking c->journal.lock, it means the oldest
journal entry is reclaim on other cores. At this moment, all selected
dirty nodes recorded in array btree_nodes[] are all flushed and clean
on other CPU cores, it is unncessary to iterate c->btree_cache any
longer. Just quit the list_for_each_entry_safe_reverse() loop and
the following for-loop will skip all the selected clean nodes.
- Find a proper time to quit the list_for_each_entry_safe_reverse()
loop. Check the refcount value of orignial fifo front point, if the
value is larger than selected node number of btree_nodes[], it means
more matching btree nodes should be scanned. Otherwise it means no
more matching btee nodes in rest of c->btree_cache list, the loop
can be quit. If the original oldest journal entry is reclaimed and
fifo front point is updated, the refcount of original fifo front point
will be 0, then the loop will be quit too.
- Not hold c->bucket_lock too long time. c->bucket_lock is also required
for space allocation for cached data, hold it for too long time will
block regular I/O requests. When iterating list c->btree_cache, even
there are a lot of maching btree nodes, in order to not holding
c->bucket_lock for too long time, only BTREE_FLUSH_NR nodes are
selected and to flush in following for-loop.
With this patch, only btree nodes referencing oldest journal entry
are flushed to cache device, no aggressive flushing for unnecessary
btree node any more. And in order to avoid blocking regluar I/O
requests, each time when btree_flush_write() called, at most only
BTREE_FLUSH_NR btree nodes are selected to flush, even there are more
maching btree nodes in list c->btree_cache.
At last, one more thing to explain: Why it is safe to read front point
of c->journal.pin without holding c->journal.lock inside the
list_for_each_entry_safe_reverse() loop ?
Here is my answer: When reading the front point of fifo c->journal.pin,
we don't need to know the exact value of front point, we just want to
check whether the value is different from the original front point
(which is accurate value because we get it while c->jouranl.lock is
held). For such purpose, it works as expected without holding
c->journal.lock. Even the front point is changed on other CPU core and
not updated to local core, and current iterating btree node has
identical journal entry local as original fetched fifo front point, it
is still safe. Because after holding mutex b->write_lock (with memory
barrier) this btree node can be found as clean and skipped, the loop
will quite latter when iterate on next node of list c->btree_cache.
Fixes:
91be66e1318f ("bcache: performance improvement for btree_flush_write()")
Reported-by: Guoju Fang <fangguoju@gmail.com>
Reported-by: Shuang Li <psymon@bonuscloud.io>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Beniamin Bia [Tue, 14 Jan 2020 13:24:01 +0000 (15:24 +0200)]
dt-bindings: iio: adc: ad7606: Fix wrong maxItems value
commit
a6c4f77cb3b11f81077b53c4a38f21b92d41f21e upstream.
This patch set the correct value for oversampling maxItems. In the
original example, appears 3 items for oversampling while the maxItems
is set to 1, this patch fixes those issues.
Fixes:
416f882c3b40 ("dt-bindings: iio: adc: Migrate AD7606 documentation to yaml")
Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gustavo A. R. Silva [Tue, 22 Oct 2019 13:25:22 +0000 (15:25 +0200)]
media: i2c: adv748x: Fix unsafe macros
commit
0d962e061abcf1b9105f88fb850158b5887fbca3 upstream.
Enclose multiple macro parameters in parentheses in order to
make such macros safer and fix the Clang warning below:
drivers/media/i2c/adv748x/adv748x-afe.c:452:12: warning: operator '?:'
has lower precedence than '|'; '|' will be evaluated first
[-Wbitwise-conditional-parentheses]
ret = sdp_clrset(state, ADV748X_SDP_FRP, ADV748X_SDP_FRP_MASK, enable
? ctrl->val - 1 : 0);
Fixes:
3e89586a64df ("media: i2c: adv748x: add adv748x driver")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe Roullier [Fri, 22 Nov 2019 13:22:46 +0000 (14:22 +0100)]
drivers: watchdog: stm32_iwdg: set WDOG_HW_RUNNING at probe
commit
85fdc63fe256b595f923a69848cd99972ff446d8 upstream.
If the watchdog hardware is already enabled during the boot process,
when the Linux watchdog driver loads, it should start/reset the watchdog
and tell the watchdog framework. As a result, ping can be generated from
the watchdog framework (if CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED is set),
until the userspace watchdog daemon takes over control
Fixes:
4332d113c66a ("watchdog: Add STM32 IWDG driver")
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20191122132246.8473-1-christophe.roullier@st.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Horia Geantă [Mon, 13 Jan 2020 08:54:35 +0000 (10:54 +0200)]
crypto: caam/qi2 - fix typo in algorithm's driver name
commit
53146d152510584c2034c62778a7cbca25743ce9 upstream.
Fixes:
8d818c105501 ("crypto: caam/qi2 - add DPAA2-CAAM driver")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Tue, 31 Dec 2019 03:19:33 +0000 (21:19 -0600)]
crypto: atmel-sha - fix error handling when setting hmac key
commit
b529f1983b2dcc46354f311feda92e07b6e9e2da upstream.
HMAC keys can be of any length, and atmel_sha_hmac_key_set() can only
fail due to -ENOMEM. But atmel_sha_hmac_setkey() incorrectly treated
any error as a "bad key length" error. Fix it to correctly propagate
the -ENOMEM error code and not set any tfm result flags.
Fixes:
81d8750b2b59 ("crypto: atmel-sha - add support to hmac(shaX)")
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Tue, 31 Dec 2019 03:19:32 +0000 (21:19 -0600)]
crypto: artpec6 - return correct error code for failed setkey()
commit
b828f905904cd76424230c69741a4cabb0174168 upstream.
->setkey() is supposed to retun -EINVAL for invalid key lengths, not -1.
Fixes:
a21eb94fc4d3 ("crypto: axis - add ARTPEC-6/7 crypto accelerator driver")
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Sun, 1 Dec 2019 21:53:26 +0000 (13:53 -0800)]
crypto: testmgr - don't try to decrypt uninitialized buffers
commit
eb455dbd02cb1074b37872ffca30a81cb2a18eaa upstream.
Currently if the comparison fuzz tests encounter an encryption error
when generating an skcipher or AEAD test vector, they will still test
the decryption side (passing it the uninitialized ciphertext buffer)
and expect it to fail with the same error.
This is sort of broken because it's not well-defined usage of the API to
pass an uninitialized buffer, and furthermore in the AEAD case it's
acceptable for the decryption error to be EBADMSG (meaning "inauthentic
input") even if the encryption error was something else like EINVAL.
Fix this for skcipher by explicitly initializing the ciphertext buffer
on error, and for AEAD by skipping the decryption test on error.
Reported-by: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
Fixes:
d435e10e67be ("crypto: testmgr - fuzz skciphers against their generic implementation")
Fixes:
40153b10d91c ("crypto: testmgr - fuzz AEADs against their generic implementation")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
YueHaibing [Mon, 30 Dec 2019 03:29:45 +0000 (11:29 +0800)]
mtd: sharpslpart: Fix unsigned comparison to zero
commit
f33113b542219448fa02d77ca1c6f4265bd7f130 upstream.
The unsigned variable log_num is being assigned a return value
from the call to sharpsl_nand_get_logical_num that can return
-EINVAL.
Detected using Coccinelle:
./drivers/mtd/parsers/sharpslpart.c:207:6-13: WARNING: Unsigned expression compared with zero: log_num > 0
Fixes:
8a4580e4d298 ("mtd: sharpslpart: Add sharpslpart partition parser")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nathan Chancellor [Mon, 9 Dec 2019 21:44:23 +0000 (14:44 -0700)]
mtd: onenand_base: Adjust indentation in onenand_read_ops_nolock
commit
0e7ca83e82d021c928dadf4c13c137d57337540d upstream.
Clang warns:
../drivers/mtd/nand/onenand/onenand_base.c:1269:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
while (!ret) {
^
../drivers/mtd/nand/onenand/onenand_base.c:1266:2: note: previous
statement is here
if (column + thislen > writesize)
^
1 warning generated.
This warning occurs because there is a space before the tab of the while
loop. There are spaces at the beginning of a lot of the lines in this
block, remove them so that the indentation is consistent with the Linux
kernel coding style and clang no longer warns.
Fixes:
a8de85d55700 ("[MTD] OneNAND: Implement read-while-load")
Link: https://github.com/ClangBuiltLinux/linux/issues/794
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki K Poulose [Mon, 13 Jan 2020 23:30:23 +0000 (23:30 +0000)]
arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanly
commit
52f73c383b2418f2d31b798e765ae7d596c35021 upstream.
We detect the absence of FP/SIMD after an incapable CPU is brought up,
and by then we have kernel threads running already with TIF_FOREIGN_FPSTATE set
which could be set for early userspace applications (e.g, modprobe triggered
from initramfs) and init. This could cause the applications to loop forever in
do_nofity_resume() as we never clear the TIF flag, once we now know that
we don't support FP.
Fix this by making sure that we clear the TIF_FOREIGN_FPSTATE flag
for tasks which may have them set, as we would have done in the normal
case, but avoiding touching the hardware state (since we don't support any).
Also to make sure we handle the cases seemlessly we categorise the
helper functions to two :
1) Helpers for common core code, which calls into take appropriate
actions without knowing the current FPSIMD state of the CPU/task.
e.g fpsimd_restore_current_state(), fpsimd_flush_task_state(),
fpsimd_save_and_flush_cpu_state().
We bail out early for these functions, taking any appropriate actions
(e.g, clearing the TIF flag) where necessary to hide the handling
from core code.
2) Helpers used when the presence of FP/SIMD is apparent.
i.e, save/restore the FP/SIMD register state, modify the CPU/task
FP/SIMD state.
e.g,
fpsimd_save(), task_fpsimd_load() - save/restore task FP/SIMD registers
fpsimd_bind_task_to_cpu() \
- Update the "state" metadata for CPU/task.
fpsimd_bind_state_to_cpu() /
fpsimd_update_current_state() - Update the fp/simd state for the current
task from memory.
These must not be called in the absence of FP/SIMD. Put in a WARNING
to make sure they are not invoked in the absence of FP/SIMD.
KVM also uses the TIF_FOREIGN_FPSTATE flag to manage the FP/SIMD state
on the CPU. However, without FP/SIMD support we trap all accesses and
inject undefined instruction. Thus we should never "load" guest state.
Add a sanity check to make sure this is valid.
Fixes:
82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD")
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandru Elisei [Mon, 27 Jan 2020 10:36:52 +0000 (10:36 +0000)]
KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
commit
4a267aa707953a9a73d1f5dc7f894dd9024a92be upstream.
According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32]
RES0 [1]. When reading the register, the value is truncated to the least
significant 32 bits [2], and on writes, TimerValue is treated as a signed
32-bit integer [1, 2].
When the guest behaves correctly and writes 32-bit values, treating TVAL
as an unsigned 64 bit register works as expected. However, things start
to break down when the guest writes larger values, because
(u64)0x1_ffff_ffff =
8589934591. but (s32)0x1_ffff_ffff = -1, and the
former will cause the timer interrupt to be asserted in the future, but
the latter will cause it to be asserted now. Let's treat TVAL as a
signed 32-bit register on writes, to match the behaviour described in
the architecture, and the behaviour experimentally exhibited by the
virtual timer on a non-vhe host.
[1] Arm DDI 0487E.a, section D13.8.18
[2] Arm DDI 0487E.a, section D11.2.4
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
[maz: replaced the read-side mask with lower_32_bits]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes:
8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation")
Link: https://lore.kernel.org/r/20200127103652.2326-1-alexandru.elisei@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Auger [Fri, 24 Jan 2020 14:25:34 +0000 (15:25 +0100)]
KVM: arm64: pmu: Fix chained SW_INCR counters
commit
aa76829171e98bd75a0cc00b6248eca269ac7f4f upstream.
At the moment a SW_INCR counter always overflows on 32-bit
boundary, independently on whether the n+1th counter is
programmed as CHAIN.
Check whether the SW_INCR counter is a 64b counter and if so,
implement the 64b logic.
Fixes:
80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-4-eric.auger@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Auger [Fri, 24 Jan 2020 14:25:32 +0000 (15:25 +0100)]
KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
commit
3837407c1aa1101ed5e214c7d6041e7a23335c6e upstream.
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1
if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR.
For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to
be set for the corresponding event counter but we also need
the PMCR.E bit to be set.
Fixes:
7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-2-eric.auger@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Morse [Tue, 21 Jan 2020 12:33:56 +0000 (12:33 +0000)]
KVM: arm: Make inject_abt32() inject an external abort instead
commit
21aecdbd7f3ab02c9b82597dc733ee759fb8b274 upstream.
KVM's inject_abt64() injects an external-abort into an aarch64 guest.
The KVM_CAP_ARM_INJECT_EXT_DABT is intended to do exactly this, but
for an aarch32 guest inject_abt32() injects an implementation-defined
exception, 'Lockdown fault'.
Change this to external abort. For non-LPAE we now get the documented:
| Unhandled fault: external abort on non-linefetch (0x008) at 0x9c800f00
and for LPAE:
| Unhandled fault: synchronous external abort (0x210) at 0x9c800f00
Fixes:
74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection")
Reported-by: Beata Michalska <beata.michalska@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200121123356.203000-3-james.morse@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Morse [Tue, 21 Jan 2020 12:33:55 +0000 (12:33 +0000)]
KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests
commit
018f22f95e8a6c3e27188b7317ef2c70a34cb2cd upstream.
Beata reports that KVM_SET_VCPU_EVENTS doesn't inject the expected
exception to a non-LPAE aarch32 guest.
The host intends to inject DFSR.FS=0x14 "IMPLEMENTATION DEFINED fault
(Lockdown fault)", but the guest receives DFSR.FS=0x04 "Fault on
instruction cache maintenance". This fault is hooked by
do_translation_fault() since ARMv6, which goes on to silently 'handle'
the exception, and restart the faulting instruction.
It turns out, when TTBCR.EAE is clear DFSR is split, and FS[4] has
to shuffle up to DFSR[10].
As KVM only does this in one place, fix up the static values. We
now get the expected:
| Unhandled fault: lock abort (0x404) at 0x9c800f00
Fixes:
74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection")
Reported-by: Beata Michalska <beata.michalska@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200121123356.203000-2-james.morse@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gavin Shan [Tue, 21 Jan 2020 05:56:59 +0000 (16:56 +1100)]
KVM: arm/arm64: Fix young bit from mmu notifier
commit
cf2d23e0bac9f6b5cd1cba8898f5f05ead40e530 upstream.
kvm_test_age_hva() is called upon mmu_notifier_test_young(), but wrong
address range has been passed to handle_hva_to_gpa(). With the wrong
address range, no young bits will be checked in handle_hva_to_gpa().
It means zero is always returned from mmu_notifier_test_young().
This fixes the issue by passing correct address range to the underly
function handle_hva_to_gpa(), so that the hardware young (access) bit
will be visited.
Fixes:
35307b9a5f7e ("arm/arm64: KVM: Implement Stage-2 page aging")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200121055659.19560-1-gshan@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki K Poulose [Mon, 13 Jan 2020 23:30:21 +0000 (23:30 +0000)]
arm64: ptrace: nofpsimd: Fail FP/SIMD regset operations
commit
c9d66999f064947e6b577ceacc1eb2fbca6a8d3c upstream.
When fp/simd is not supported on the system, fail the operations
of FP/SIMD regsets.
Fixes:
82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD")
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki K Poulose [Mon, 13 Jan 2020 23:30:20 +0000 (23:30 +0000)]
arm64: cpufeature: Set the FP/SIMD compat HWCAP bits properly
commit
7559950aef1ab8792c50797c6c5c7c5150a02460 upstream.
We set the compat_elf_hwcap bits unconditionally on arm64 to
include the VFP and NEON support. However, the FP/SIMD unit
is optional on Arm v8 and thus could be missing. We already
handle this properly in the kernel, but still advertise to
the COMPAT applications that the VFP is available. Fix this
to make sure we only advertise when we really have them.
Fixes:
82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD")
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki K Poulose [Mon, 13 Jan 2020 23:30:19 +0000 (23:30 +0000)]
arm64: cpufeature: Fix the type of no FP/SIMD capability
commit
449443c03d8cfdacf7313e17779a2594ebf87e6d upstream.
The NO_FPSIMD capability is defined with scope SYSTEM, which implies
that the "absence" of FP/SIMD on at least one CPU is detected only
after all the SMP CPUs are brought up. However, we use the status
of this capability for every context switch. So, let us change
the scope to LOCAL_CPU to allow the detection of this capability
as and when the first CPU without FP is brought up.
Also, the current type allows hotplugged CPU to be brought up without
FP/SIMD when all the current CPUs have FP/SIMD and we have the userspace
up. Fix both of these issues by changing the capability to
BOOT_RESTRICTED_LOCAL_CPU_FEATURE.
Fixes:
82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD")
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qais Yousef [Tue, 24 Dec 2019 11:54:04 +0000 (11:54 +0000)]
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
commit
7226017ad37a888915628e59a84a2d1e57b40707 upstream.
When a new cgroup is created, the effective uclamp value wasn't updated
with a call to cpu_util_update_eff() that looks at the hierarchy and
update to the most restrictive values.
Fix it by ensuring to call cpu_util_update_eff() when a new cgroup
becomes online.
Without this change, the newly created cgroup uses the default
root_task_group uclamp values, which is 1024 for both uclamp_{min, max},
which will cause the rq to to be clamped to max, hence cause the
system to run at max frequency.
The problem was observed on Ubuntu server and was reproduced on Debian
and Buildroot rootfs.
By default, Ubuntu and Debian create a cpu controller cgroup hierarchy
and add all tasks to it - which creates enough noise to keep the rq
uclamp value at max most of the time. Imitating this behavior makes the
problem visible in Buildroot too which otherwise looks fine since it's a
minimal userspace.
Fixes:
0b60ba2dd342 ("sched/uclamp: Propagate parent clamps")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Doug Smythies <dsmythies@telus.net>
Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olof Johansson [Wed, 18 Dec 2019 00:18:49 +0000 (01:18 +0100)]
ARM: 8949/1: mm: mark free_memmap as __init
commit
31f3010e60522ede237fb145a63b4af5a41718c2 upstream.
As of commit
ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly"), free_memmap() might not always be inlined, and thus is
triggering a section warning:
WARNING: vmlinux.o(.text.unlikely+0x904): Section mismatch in reference from the function free_memmap() to the function .meminit.text:memblock_free()
Mark it as __init, since the faller (free_unused_memmap) already is.
Fixes:
ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>