Linus Torvalds [Fri, 30 Nov 2018 17:32:34 +0000 (09:32 -0800)]
Merge tag 'trace-v4.20-rc3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"While rewriting the function graph tracer, I discovered a design flaw
that was introduced by a patch that tried to fix one bug, but by doing
so created another bug.
As both bugs corrupt the output (but they do not crash the kernel), I
decided to fix the design such that it could have both bugs fixed. The
original fix, fixed time reporting of the function graph tracer when
doing a max_depth of one. This was code that can test how much the
kernel interferes with userspace. But in doing so, it could corrupt
the time keeping of the function profiler.
The issue is that the curr_ret_stack variable was being used for two
different meanings. One was to keep track of the stack pointer on the
ret_stack (shadow stack used by the function graph tracer), and the
other use case was the graph call depth. Although, the two may be
closely related, where they got updated was the issue that lead to the
two different bugs that required the two use cases to be updated
differently.
The big issue with this fix is that it requires changing each
architecture. The good news is, I was able to remove a lot of code
that was duplicated within the architectures and place it into a
single location. Then I could make the fix in one place.
I pushed this code into linux-next to let it settle over a week, and
before doing so, I cross compiled all the affected architectures to
make sure that they built fine.
In the mean time, I also pulled in a patch that fixes the sched_switch
previous tasks state output, that was not actually correct"
* tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
sched, trace: Fix prev_state output in sched_switch tracepoint
function_graph: Have profiler use curr_ret_stack and not depth
function_graph: Reverse the order of pushing the ret_stack and the callback
function_graph: Move return callback before update of curr_ret_stack
function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
function_graph: Make ftrace_push_return_trace() static
sparc/function_graph: Simplify with function_graph_enter()
sh/function_graph: Simplify with function_graph_enter()
s390/function_graph: Simplify with function_graph_enter()
riscv/function_graph: Simplify with function_graph_enter()
powerpc/function_graph: Simplify with function_graph_enter()
parisc: function_graph: Simplify with function_graph_enter()
nds32: function_graph: Simplify with function_graph_enter()
MIPS: function_graph: Simplify with function_graph_enter()
microblaze: function_graph: Simplify with function_graph_enter()
arm64: function_graph: Simplify with function_graph_enter()
ARM: function_graph: Simplify with function_graph_enter()
x86/function_graph: Simplify with function_graph_enter()
function_graph: Create function_graph_enter() to consolidate architecture code
Linus Torvalds [Fri, 30 Nov 2018 17:10:29 +0000 (09:10 -0800)]
Merge tag 'drm-fixes-2018-11-30' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This weeks instalment of fixes. Looks fairly like business as usual
and everything seems to rolling along. There was one MST fix applied
and reverted in the misc tree, but otherwise nothing too strange in
here.
core:
- incorrect master setting on error fix
i915:
- only GVT fixes this week:
* one MOCS register load
* rpm lock fix
* use after free
rcar-du:
- regression fix for group start
amdgpu:
- DP MST fix
- GPUVM fix for huge pages
- RLC fix for vega20
ast:
- fix EDID reading stability
- ioreg free fix
meson:
- sleep in irq fix
- vblank fixes
- array boundary fix"
* tag 'drm-fixes-2018-11-30' of git://anongit.freedesktop.org/drm/drm:
drm/ast: fixed reading monitor EDID not stable issue
drm/ast: Fix incorrect free on ioregs
Revert "drm/dp_mst: Skip validating ports during destruction, just ref"
drm/amdgpu: Add delay after enable RLC ucode
drm/amdgpu: Avoid endless loop in GPUVM fragment processing
drm/amdgpu: Cast to uint64_t before left shift
drm/meson: add support for 1080p25 mode
drm/meson: Fix OOB memory accesses in meson_viu_set_osd_lut()
drm/meson: Enable fast_io in meson_dw_hdmi_regmap_config
drm/meson: Fixes for drm_crtc_vblank_on/off support
drm: set is_master to 0 upon drm_new_set_master() failure
drm/dp_mst: Skip validating ports during destruction, just ref
drm: rcar-du: Fix DU3 start/stop on M3-N
drm/amd/dm: Understand why attaching path/tile properties are needed
drm/amd/dm: Don't forget to attach MST encoders
drm/i915/gvt: Avoid use-after-free iterating the gtt list
drm/i915/gvt: ensure gpu is powered before do i915_gem_gtt_insert
drm/i915/gvt: not to touch undefined MOCS registers
Linus Torvalds [Fri, 30 Nov 2018 17:03:15 +0000 (09:03 -0800)]
Merge tag 'pstore-v4.20-rc5' of git://git./linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
"Fix corrupted compression due to unlucky size choice with ECC"
* tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Correctly calculate usable PRZ bytes
Linus Torvalds [Fri, 30 Nov 2018 16:57:31 +0000 (08:57 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is a bit later than usual for our first -rc but I'm not seeing
anything worry-some in the RDMA tree right now. Quiet so far this -rc
cycle, only a few internal driver related bugs and a small series
fixing ODP bugs found by more advanced testing.
A set of small driver and core code fixes:
- Small series fixing longtime user triggerable bugs in the ODP
processing inside mlx5 and core code
- Various small driver malfunctions and crashes (use after, free,
error unwind, implementation bugs)
- A misfunction of the RDMA GID cache that can be triggered by the
administrator"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Initialize return variable in case pagefault was skipped
IB/mlx5: Fix page fault handling for MW
IB/umem: Set correct address to the invalidation function
IB/mlx5: Skip non-ODP MR when handling a page fault
RDMA/hns: Bugfix pbl configuration for rereg mr
iser: set sector for ambiguous mr status errors
RDMA/rdmavt: Fix rvt_create_ah function signature
IB/mlx5: Avoid load failure due to unknown link width
IB/mlx5: Fix XRC QP support after introducing extended atomic
RDMA/bnxt_re: Avoid accessing the device structure after it is freed
RDMA/bnxt_re: Fix system hang when registration with L2 driver fails
RDMA/core: Add GIDs while changing MAC addr only for registered ndev
RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR
net/mlx5: Fix XRC SRQ umem valid bits
Linus Torvalds [Thu, 29 Nov 2018 23:54:12 +0000 (15:54 -0800)]
Merge tag 'acpi-4.20-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a recent regression in ACPICA releted to the Generic Serial Bus
protocol handling and causing it to read or write too little or too
much data in some cases, so incorrect data may be written to hardware
as a result (Hans de Goede)"
* tag 'acpi-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Fix handling of buffer-size in acpi_ex_write_data_to_field()
Linus Torvalds [Thu, 29 Nov 2018 23:07:30 +0000 (15:07 -0800)]
Merge tag 'pm-4.20-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two issues in the operating performance points (OPP)
framework.
Specifics:
- Fix the handling of the "operating-points-v2" property to avoid
failures if multiple phandles are present in it which is legitimate
(Viresh Kumar).
- Drop the unnecessary static initialization of the .owner field in
the ti_opp_supply_driver structure (YueHaibing)"
* tag 'pm-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
OPP: Fix parsing of multiple phandles in "operating-points-v2" property
opp: ti-opp-supply: Fix platform_no_drv_owner.cocci warnings
Leon Romanovsky [Thu, 29 Nov 2018 10:25:29 +0000 (12:25 +0200)]
RDMA/mlx5: Initialize return variable in case pagefault was skipped
Pagefaults occurred in non-ODP MR are completely valid events, so
initialize return variable to 0.
Fixes:
4d5422a309de ("IB/mlx5: Skip non-ODP MR when handling a page fault")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kees Cook [Thu, 1 Nov 2018 23:17:22 +0000 (16:17 -0700)]
pstore/ram: Correctly calculate usable PRZ bytes
The actual number of bytes stored in a PRZ is smaller than the
bytes requested by platform data, since there is a header on each
PRZ. Additionally, if ECC is enabled, there are trailing bytes used
as well. Normally this mismatch doesn't matter since PRZs are circular
buffers and the leading "overflow" bytes are just thrown away. However, in
the case of a compressed record, this rather badly corrupts the results.
This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
Any stored crashes would not be uncompressable (producing a pstorefs
"dmesg-*.enc.z" file), and triggering errors at boot:
[ 2.790759] pstore: crypto_comp_decompress failed, ret = -22!
Backporting this depends on commit
70ad35db3321 ("pstore: Convert console
write to use ->write_buf")
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Fixes:
b0aad7a99c1d ("pstore: Add compression support to pstore")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Rafael J. Wysocki [Thu, 29 Nov 2018 20:21:39 +0000 (21:21 +0100)]
Merge branch 'acpica-fixes'
* acpica-fixes:
ACPICA: Fix handling of buffer-size in acpi_ex_write_data_to_field()
Linus Torvalds [Thu, 29 Nov 2018 18:15:06 +0000 (10:15 -0800)]
Merge tag 'selinux-pr-
20181129' of git://git./linux/kernel/git/pcmoore/selinux
Pull SELinux fix from Paul Moore:
"One more SELinux fix for v4.20: add some missing netlink message to
SELinux permission mappings. The netlink messages were added in v4.19,
but unfortunately we didn't catch it then because the mechanism to
catch these things was bypassed.
In addition to adding the mappings, we're adding some comments to the
code to hopefully prevent bypasses in the future"
* tag 'selinux-pr-
20181129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: add support for RTM_NEWCHAIN, RTM_DELCHAIN, and RTM_GETCHAIN
Linus Torvalds [Thu, 29 Nov 2018 18:10:42 +0000 (10:10 -0800)]
Merge tag 's390-4.20-3' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Add two missing kfree calls on error paths in the vfio-ccw code
- Make sure that all data structures of a mediated vfio-ccw device are
initialized before registering it
- Fix a sparse warning in vfio-ccw
- A followup patch for the pgtable_bytes accounting, the page table
downgrade for compat processes missed a mm_dec_nr_pmds()
- Reject sampling requests in the PMU init function of the CPU
measurement counter facility
- With the vfio AP driver an AP queue needs to be reset on every device
probe as the alternative driver could have modified the device state
* tag 's390-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: correct pgtable_bytes on page table downgrade
s390/zcrypt: reinit ap queue state machine during device probe
s390/cpum_cf: Reject request for sampling in event initialization
s390/cio: Fix cleanup when unsupported IDA format is used
s390/cio: Fix cleanup of pfn_array alloc failure
vfio: ccw: Register mediated device once all structures are initialized
s390/cio: make vfio_ccw_io_region static
Linus Torvalds [Thu, 29 Nov 2018 18:03:42 +0000 (10:03 -0800)]
Merge tag 'sound-4.20-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"As a usual pattern, we've got relatively large updates at rc5:
- A fix for races in ALSA control user elements
- ASoC DAPM regression due to component refactoring
- A fix in error handling of ASoC iteration macro
- ASoC Intel SST Skylake kconfig fix; a new Kconfig will appear as a
consequence, but in the end it's a good cleanup
- HD-audio and USB-audio quirks as always
- Assort of ASoC driver fixes (pcm186x, Intel cht, rockchip, pcm3060,
rsnd, omap, wm_adsp, qcom, sunxi, stm32)"
* tag 'sound-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (34 commits)
ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock
ALSA: hda/realtek - Support ALC300
ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop
ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops
ALSA: control: Fix race between adding and removing a user element
ALSA: sparc: Fix invalid snd_free_pages() at error path
ALSA: wss: Fix invalid snd_free_pages() at error path
ALSA: hda/realtek - fix headset mic detection for MSI MS-B171
ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist
ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
ASoC: omap-mcbsp: Fix latency value calculation for pm_qos
ASoC: acpi: fix: continue searching when machine is ignored
ASoC: Intel: Skylake: fix Kconfigs, make HDaudio codec optional
MAINTAINERS: add ASoC maintainers for sound dt-bindings
ASoC: pcm186x: Fix device reset-registers trigger value
ASoC: dapm: Recalculate audio map forcely when card instantiated
ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing
ASoC: pcm3060: Rename output widgets
...
Linus Torvalds [Thu, 29 Nov 2018 17:56:00 +0000 (09:56 -0800)]
Merge tag 'fixes_for_v4.20-rc5' of git://git./linux/kernel/git/jack/linux-fs
Pull ext2 and udf fixes from Jan Kara:
"Three small ext2 and udf fixes"
* tag 'fixes_for_v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: fix potential use after free
ext2: initialize opts.s_mount_opt as zero before using it
udf: Allow mounting volumes with incorrect identification strings
Paul Moore [Wed, 28 Nov 2018 17:57:33 +0000 (12:57 -0500)]
selinux: add support for RTM_NEWCHAIN, RTM_DELCHAIN, and RTM_GETCHAIN
Commit
32a4f5ecd738 ("net: sched: introduce chain object to uapi")
added new RTM_* definitions without properly updating SELinux, this
patch adds the necessary SELinux support.
While there was a BUILD_BUG_ON() in the SELinux code to protect from
exactly this case, it was bypassed in the broken commit. In order to
hopefully prevent this from happening in the future, add additional
comments which provide some instructions on how to resolve the
BUILD_BUG_ON() failures.
Fixes:
32a4f5ecd738 ("net: sched: introduce chain object to uapi")
Cc: <stable@vger.kernel.org> # 4.19
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Dave Airlie [Thu, 29 Nov 2018 00:11:02 +0000 (10:11 +1000)]
Merge tag 'drm-misc-fixes-2018-11-28-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- mst: Don't try to validate ports while destroying them (Lyude)
- Revert: Don't try to validate ports while destroying them (Lyude)
- core: Don't set device to master unless set_master succeeds (Sergio)
- meson: Do vblank_on/off on enable/disable (Neil)
- meson: Use fast_io regmap option to avoid sleeping in irq ctx (Lyude)
- meson: Don't walk off the end of the OSD EOTF LUTs (Lyude)
Cc: Lyude Paul <lyude@redhat.com>
Cc: Sergio Correia <sergio@correia.cc>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20181128212936.GA21379@art_vandelay
Dave Airlie [Thu, 29 Nov 2018 00:07:33 +0000 (10:07 +1000)]
Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Fixes for 4.20. Nothing major.
- DC DP MST fix
- GPUVM fix for huge page mapping
- RLC fix for vega20
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181128195905.2897-1-alexander.deucher@amd.com
Dave Airlie [Wed, 28 Nov 2018 23:59:12 +0000 (09:59 +1000)]
Merge tag 'drm-intel-fixes-2018-11-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Just gvt-fixes-2018-11-26
""One to correct MOCS registers load on engine list, one for rpm lock
warning fix, and another for use-after-free fix for partial ggtt
list destroy. "
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181128180648.GA17585@jlahtine-desk.ger.corp.intel.com
Dave Airlie [Wed, 28 Nov 2018 23:56:29 +0000 (09:56 +1000)]
Merge tag 'du-fixes-
20181126' of git://linuxtv.org/pinchartl/media into drm-fixes
R-Car DU v4.20 regression fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8134504.ZSXK7gKU4H@avalon
Y.C. Chen [Thu, 22 Nov 2018 03:56:28 +0000 (11:56 +0800)]
drm/ast: fixed reading monitor EDID not stable issue
v1: over-sample data to increase the stability with some specific monitors
v2: refine to avoid infinite loop
v3: remove un-necessary "volatile" declaration
[airlied: fix two checkpatch warnings]
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1542858988-1127-1-git-send-email-yc_chen@aspeedtech.com
Sam Bobroff [Mon, 5 Nov 2018 05:57:47 +0000 (16:57 +1100)]
drm/ast: Fix incorrect free on ioregs
If the platform has no IO space, ioregs is placed next to the already
allocated regs. In this case, it should not be separately freed.
This prevents a kernel warning from __vunmap "Trying to vfree()
nonexistent vm area" when unloading the driver.
Fixes:
0dd68309b9c5 ("drm/ast: Try to use MMIO registers when PIO isn't supported")
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Lyude Paul [Wed, 28 Nov 2018 21:00:05 +0000 (16:00 -0500)]
Revert "drm/dp_mst: Skip validating ports during destruction, just ref"
This reverts commit:
c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
ugh.
In drm_dp_destroy_connector_work(), we have a pretty good chance of
freeing the actual struct drm_dp_mst_port. However, after destroying
things we send a hotplug through (*mgr->cbs->hotplug)(mgr) which is
where the problems start.
For i915, this calls all the way down to the fbcon probing helpers,
which start trying to access the port in a modeset.
[ 45.062001] ==================================================================
[ 45.062112] BUG: KASAN: use-after-free in ex_handler_refcount+0x146/0x180
[ 45.062196] Write of size 4 at addr
ffff8882b4b70968 by task kworker/3:1/53
[ 45.062325] CPU: 3 PID: 53 Comm: kworker/3:1 Kdump: loaded Tainted: G O 4.20.0-rc4Lyude-Test+ #3
[ 45.062442] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
[ 45.062554] Workqueue: events drm_dp_destroy_connector_work [drm_kms_helper]
[ 45.062641] Call Trace:
[ 45.062685] dump_stack+0xbd/0x15a
[ 45.062735] ? dump_stack_print_info.cold.0+0x1b/0x1b
[ 45.062801] ? printk+0x9f/0xc5
[ 45.062847] ? kmsg_dump_rewind_nolock+0xe4/0xe4
[ 45.062909] ? ex_handler_refcount+0x146/0x180
[ 45.062970] print_address_description+0x71/0x239
[ 45.063036] ? ex_handler_refcount+0x146/0x180
[ 45.063095] kasan_report.cold.5+0x242/0x30b
[ 45.063155] __asan_report_store4_noabort+0x1c/0x20
[ 45.063313] ex_handler_refcount+0x146/0x180
[ 45.063371] ? ex_handler_clear_fs+0xb0/0xb0
[ 45.063428] fixup_exception+0x98/0xd7
[ 45.063484] ? raw_notifier_call_chain+0x20/0x20
[ 45.063548] do_trap+0x6d/0x210
[ 45.063605] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
[ 45.063732] do_error_trap+0xc0/0x170
[ 45.063802] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
[ 45.063929] do_invalid_op+0x3b/0x50
[ 45.063997] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
[ 45.064103] invalid_op+0x14/0x20
[ 45.064162] RIP: 0010:_GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
[ 45.064274] Code: 00 48 c7 c7 80 fe 53 a0 48 89 e5 e8 5b 6f 26 e1 5d c3 48 8d 0e 0f 0b 48 8d 0b 0f 0b 48 8d 0f 0f 0b 48 8d 0f 0f 0b 49 8d 4d 00 <0f> 0b 49 8d 0e 0f 0b 48 8d 08 0f 0b 49 8d 4d 00 0f 0b 48 8d 0b 0f
[ 45.064569] RSP: 0018:
ffff8882b789ee10 EFLAGS:
00010282
[ 45.064637] RAX:
ffff8882af47ae70 RBX:
ffff8882af47aa60 RCX:
ffff8882b4b70968
[ 45.064723] RDX:
ffff8882af47ae70 RSI:
0000000000000008 RDI:
ffff8882b788bdb8
[ 45.064808] RBP:
ffff8882b789ee28 R08:
ffffed1056f13db4 R09:
ffffed1056f13db3
[ 45.064894] R10:
ffffed1056f13db3 R11:
ffff8882b789ed9f R12:
ffff8882af47ad28
[ 45.064980] R13:
ffff8882b4b70968 R14:
ffff8882acd86728 R15:
ffff8882b4b75dc8
[ 45.065084] drm_dp_mst_reset_vcpi_slots+0x12/0x80 [drm_kms_helper]
[ 45.065225] intel_mst_disable_dp+0xda/0x180 [i915]
[ 45.065361] intel_encoders_disable.isra.107+0x197/0x310 [i915]
[ 45.065498] haswell_crtc_disable+0xbe/0x400 [i915]
[ 45.065622] ? i9xx_disable_plane+0x1c0/0x3e0 [i915]
[ 45.065750] intel_atomic_commit_tail+0x74e/0x3e60 [i915]
[ 45.065884] ? intel_pre_plane_update+0xbc0/0xbc0 [i915]
[ 45.065968] ? drm_atomic_helper_swap_state+0x88b/0x1d90 [drm_kms_helper]
[ 45.066054] ? kasan_check_write+0x14/0x20
[ 45.066165] ? i915_gem_track_fb+0x13a/0x330 [i915]
[ 45.066277] ? i915_sw_fence_complete+0xe9/0x140 [i915]
[ 45.066406] ? __i915_sw_fence_complete+0xc50/0xc50 [i915]
[ 45.066540] intel_atomic_commit+0x72e/0xef0 [i915]
[ 45.066635] ? drm_dev_dbg+0x200/0x200 [drm]
[ 45.066764] ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
[ 45.066898] ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
[ 45.067001] drm_atomic_commit+0xc4/0xf0 [drm]
[ 45.067074] restore_fbdev_mode_atomic+0x562/0x780 [drm_kms_helper]
[ 45.067166] ? drm_fb_helper_debug_leave+0x690/0x690 [drm_kms_helper]
[ 45.067249] ? kasan_check_read+0x11/0x20
[ 45.067324] restore_fbdev_mode+0x127/0x4b0 [drm_kms_helper]
[ 45.067364] ? kasan_check_read+0x11/0x20
[ 45.067406] drm_fb_helper_restore_fbdev_mode_unlocked+0x164/0x200 [drm_kms_helper]
[ 45.067462] ? drm_fb_helper_hotplug_event+0x30/0x30 [drm_kms_helper]
[ 45.067508] ? kasan_check_write+0x14/0x20
[ 45.070360] ? mutex_unlock+0x22/0x40
[ 45.073748] drm_fb_helper_set_par+0xb2/0xf0 [drm_kms_helper]
[ 45.075846] drm_fb_helper_hotplug_event.part.33+0x1cd/0x290 [drm_kms_helper]
[ 45.078088] drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
[ 45.082614] intel_fbdev_output_poll_changed+0x9f/0x140 [i915]
[ 45.087069] drm_kms_helper_hotplug_event+0x67/0x90 [drm_kms_helper]
[ 45.089319] intel_dp_mst_hotplug+0x37/0x50 [i915]
[ 45.091496] drm_dp_destroy_connector_work+0x510/0x6f0 [drm_kms_helper]
[ 45.093675] ? drm_dp_update_payload_part1+0x1220/0x1220 [drm_kms_helper]
[ 45.095851] ? kasan_check_write+0x14/0x20
[ 45.098473] ? kasan_check_read+0x11/0x20
[ 45.101155] ? strscpy+0x17c/0x530
[ 45.103808] ? __switch_to_asm+0x34/0x70
[ 45.106456] ? syscall_return_via_sysret+0xf/0x7f
[ 45.109711] ? read_word_at_a_time+0x20/0x20
[ 45.113138] ? __switch_to_asm+0x40/0x70
[ 45.116529] ? __switch_to_asm+0x34/0x70
[ 45.119891] ? __switch_to_asm+0x40/0x70
[ 45.123224] ? __switch_to_asm+0x34/0x70
[ 45.126540] ? __switch_to_asm+0x34/0x70
[ 45.129824] process_one_work+0x88d/0x15d0
[ 45.133172] ? pool_mayday_timeout+0x850/0x850
[ 45.136459] ? pci_mmcfg_check_reserved+0x110/0x128
[ 45.139739] ? wake_q_add+0xb0/0xb0
[ 45.143010] ? check_preempt_wakeup+0x652/0x1050
[ 45.146304] ? worker_enter_idle+0x29e/0x740
[ 45.149589] ? __schedule+0x1ec0/0x1ec0
[ 45.152937] ? kasan_check_read+0x11/0x20
[ 45.156179] ? _raw_spin_lock_irq+0xa3/0x130
[ 45.159382] ? _raw_read_unlock_irqrestore+0x30/0x30
[ 45.162542] ? kasan_check_write+0x14/0x20
[ 45.165657] worker_thread+0x1a5/0x1470
[ 45.168725] ? set_load_weight+0x2e0/0x2e0
[ 45.171755] ? process_one_work+0x15d0/0x15d0
[ 45.174806] ? __switch_to_asm+0x34/0x70
[ 45.177645] ? __switch_to_asm+0x40/0x70
[ 45.180323] ? __switch_to_asm+0x34/0x70
[ 45.182936] ? __switch_to_asm+0x40/0x70
[ 45.185539] ? __switch_to_asm+0x34/0x70
[ 45.188100] ? __switch_to_asm+0x40/0x70
[ 45.190628] ? __schedule+0x7d4/0x1ec0
[ 45.193143] ? save_stack+0xa9/0xd0
[ 45.195632] ? kasan_check_write+0x10/0x20
[ 45.198162] ? kasan_kmalloc+0xc4/0xe0
[ 45.200609] ? kmem_cache_alloc_trace+0xdd/0x190
[ 45.203046] ? kthread+0x9f/0x3b0
[ 45.205470] ? ret_from_fork+0x35/0x40
[ 45.207876] ? unwind_next_frame+0x43/0x50
[ 45.210273] ? __save_stack_trace+0x82/0x100
[ 45.212658] ? deactivate_slab.isra.67+0x3d4/0x580
[ 45.215026] ? default_wake_function+0x35/0x50
[ 45.217399] ? kasan_check_read+0x11/0x20
[ 45.219825] ? _raw_spin_lock_irqsave+0xae/0x140
[ 45.222174] ? __lock_text_start+0x8/0x8
[ 45.224521] ? replenish_dl_entity.cold.62+0x4f/0x4f
[ 45.226868] ? __kthread_parkme+0x87/0xf0
[ 45.229200] kthread+0x2f7/0x3b0
[ 45.231557] ? process_one_work+0x15d0/0x15d0
[ 45.233923] ? kthread_park+0x120/0x120
[ 45.236249] ret_from_fork+0x35/0x40
[ 45.240875] Allocated by task 242:
[ 45.243136] save_stack+0x43/0xd0
[ 45.245385] kasan_kmalloc+0xc4/0xe0
[ 45.247597] kmem_cache_alloc_trace+0xdd/0x190
[ 45.249793] drm_dp_add_port+0x1e0/0x2170 [drm_kms_helper]
[ 45.252000] drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
[ 45.254389] drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
[ 45.256803] drm_dp_mst_link_probe_work+0x6f/0xb0 [drm_kms_helper]
[ 45.259200] process_one_work+0x88d/0x15d0
[ 45.261597] worker_thread+0x1a5/0x1470
[ 45.264038] kthread+0x2f7/0x3b0
[ 45.266371] ret_from_fork+0x35/0x40
[ 45.270937] Freed by task 53:
[ 45.273170] save_stack+0x43/0xd0
[ 45.275382] __kasan_slab_free+0x139/0x190
[ 45.277604] kasan_slab_free+0xe/0x10
[ 45.279826] kfree+0x99/0x1b0
[ 45.282044] drm_dp_free_mst_port+0x4a/0x60 [drm_kms_helper]
[ 45.284330] drm_dp_destroy_connector_work+0x43e/0x6f0 [drm_kms_helper]
[ 45.286660] process_one_work+0x88d/0x15d0
[ 45.288934] worker_thread+0x1a5/0x1470
[ 45.291231] kthread+0x2f7/0x3b0
[ 45.293547] ret_from_fork+0x35/0x40
[ 45.298206] The buggy address belongs to the object at
ffff8882b4b70968
which belongs to the cache kmalloc-2k of size 2048
[ 45.303047] The buggy address is located 0 bytes inside of
2048-byte region [
ffff8882b4b70968,
ffff8882b4b71168)
[ 45.308010] The buggy address belongs to the page:
[ 45.310477] page:
ffffea000ad2dc00 count:1 mapcount:0 mapping:
ffff8882c080cf40 index:0x0 compound_mapcount: 0
[ 45.313051] flags: 0x8000000000010200(slab|head)
[ 45.315635] raw:
8000000000010200 ffffea000aac2808 ffffea000abe8608 ffff8882c080cf40
[ 45.318300] raw:
0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ 45.320966] page dumped because: kasan: bad access detected
[ 45.326312] Memory state around the buggy address:
[ 45.329085]
ffff8882b4b70800: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 45.331845]
ffff8882b4b70880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 45.334584] >
ffff8882b4b70900: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
[ 45.337302] ^
[ 45.340061]
ffff8882b4b70980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 45.342910]
ffff8882b4b70a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 45.345748] ==================================================================
So, this definitely isn't a fix that we want. This being said; there's
no real easy fix for this problem because of some of the catch-22's of
the MST helpers current design. For starters; we always need to validate
a port with drm_dp_get_validated_port_ref(), but validation relies on
the lifetime of the port in the actual topology. So once the port is
gone, it can't be validated again.
If we were to try to make the payload helpers not use port validation,
then we'd cause another problem: if the port isn't validated, it could
be freed and we'd just start causing more KASAN issues. There are
already hacks that attempt to workaround this in
drm_dp_mst_destroy_connector_work() by re-initializing the kref so that
it can be used again and it's memory can be freed once the VCPI helpers
finish removing the port's respective payloads. But none of these really
do anything helpful since the port still can't be validated since it's
gone from the topology. Also, that workaround is immensely confusing to
read through.
What really needs to be done in order to fix this is to teach DRM how to
track the lifetime of the structs for MST ports and branch devices
separately from their lifetime in the actual topology. Simply put; this
means having two different krefs-one that removes the port/branch device
from the topology, and one that finally calls kfree(). This would let us
simplify things, since we'd now be able to keep ports around without
having to keep them in the topology at the same time, which is exactly
what we need in order to teach our VCPI helpers to only validate ports
when it's actually necessary without running the risk of trying to use
unallocated memory.
Such a fix is on it's way, but for now let's play it safe and just
revert this. If this bug has been around for well over a year, we can
wait a little while to get an actual proper fix here.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes:
c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: stable@vger.kernel.org # v4.6+
Acked-by: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20181128210005.24434-1-lyude@redhat.com
Linus Torvalds [Wed, 28 Nov 2018 20:53:48 +0000 (12:53 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) ARM64 JIT fixes for subprog handling from Daniel Borkmann.
2) Various sparc64 JIT bug fixes (fused branch convergance, frame
pointer usage detection logic, PSEODU call argument handling).
3) Fix to use BH locking in nf_conncount, from Taehee Yoo.
4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein.
5) Handle return value of TX NAPI completion properly in lan743x
driver, from Bryan Whitehead.
6) MAC filter deletion in i40e driver clears wrong state bit, from
Lihong Yang.
7) Fix use after free in rionet driver, from Pan Bian.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
s390/qeth: fix length check in SNMP processing
net: hisilicon: remove unexpected free_netdev
rapidio/rionet: do not free skb before reading its length
i40e: fix kerneldoc for xsk methods
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
i40e: Fix deletion of MAC filters
igb: fix uninitialized variables
netfilter: nf_tables: deactivate expressions in rule replecement routine
lan743x: Enable driver to work with LAN7431
tipc: fix lockdep warning during node delete
lan743x: fix return value for lan743x_tx_napi_poll
net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
qed: fix spelling mistake "attnetion" -> "attention"
net: thunderx: fix NULL pointer dereference in nic_remove
sctp: increase sk_wmem_alloc when head->truesize is increased
firestream: fix spelling mistake: "Inititing" -> "Initializing"
net: phy: add workaround for issue where PHY driver doesn't bind to the device
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
sparc: Adjust bpf JIT prologue for PSEUDO calls.
bpf, doc: add entries of who looks over which jits
...
Linus Torvalds [Wed, 28 Nov 2018 20:51:10 +0000 (12:51 -0800)]
Merge tag 'xtensa-
20181128' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes from Max Filippov:
- fix kernel exception on userspace access to a currently disabled
coprocessor
- fix coprocessor data saving/restoring in configurations with multiple
coprocessors
- fix ptrace access to coprocessor data on configurations with multiple
coprocessors with high alignment requirements
* tag 'xtensa-
20181128' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix coprocessor part of ptrace_{get,set}xregs
xtensa: fix coprocessor context offset definitions
xtensa: enable coprocessors that are being flushed
shaoyunl [Thu, 22 Nov 2018 16:45:24 +0000 (11:45 -0500)]
drm/amdgpu: Add delay after enable RLC ucode
Driver shouldn't try to access any GFX registers until RLC is idle.
During the test, it took 12 seconds for RLC to clear the BUSY bit
in RLC_GPM_STAT register which is un-acceptable for driver.
As per RLC engineer, it would take RLC Ucode less than 10,000 GFXCLK
cycles to finish its critical section. In a lowest 300M enginer clock
setting(default from vbios), 50 us delay is enough.
This commit fix the hang when RLC introduce the work around for XGMI
which requires more cycles to setup more registers than normal
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Sun, 25 Nov 2018 04:25:04 +0000 (23:25 -0500)]
drm/amdgpu: Avoid endless loop in GPUVM fragment processing
Don't bounce back to the root level for fragment processing, because
huge pages are not supported at that level. This is unlikely to happen
with the default VM size on Vega, but can be exposed by limiting the
VM size with the amdgpu.vm_size module parameter.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Sun, 25 Nov 2018 03:46:23 +0000 (22:46 -0500)]
drm/amdgpu: Cast to uint64_t before left shift
Avoid potential integer overflows with left shift in huge-page mapping
code by casting the operand to uin64_t first.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David S. Miller [Wed, 28 Nov 2018 19:33:35 +0000 (11:33 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2018-11-28
This series contains fixes to igb, ixgbe and i40e.
Yunjian Wang from Huawei resolves a variable that could potentially be
NULL before it is used.
Lihong fixes an i40e issue which goes back to 4.17 kernels, where
deleting any of the MAC filters was causing the incorrect syncing for
the PF.
Josh Elsasser caught that there were missing enum values in the link
capabilities for x550 devices, which was preventing link for 1000BaseLX
SFP modules.
Jan fixes the function header comments for XSK methods.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 28 Nov 2018 15:20:50 +0000 (16:20 +0100)]
s390/qeth: fix length check in SNMP processing
The response for a SNMP request can consist of multiple parts, which
the cmd callback stages into a kernel buffer until all parts have been
received. If the callback detects that the staging buffer provides
insufficient space, it bails out with error.
This processing is buggy for the first part of the response - while it
initially checks for a length of 'data_len', it later copies an
additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
Fix the calculation of 'data_len' for the first part of the response.
This also nicely cleans up the memcpy code.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Nov 2018 19:02:45 +0000 (11:02 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Disable BH while holding list spinlock in nf_conncount, from
Taehee Yoo.
2) List corruption in nf_conncount, also from Taehee.
3) Fix race that results in leaving around an empty list node in
nf_conncount, from Taehee Yoo.
4) Proper chain handling for inactive chains from the commit path,
from Florian Westphal. This includes a selftest for this.
5) Do duplicate rule handles when replacing rules, also from Florian.
6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee.
7) Possible use-after-free in nft_compat when releasing extensions.
From Florian.
8) Memory leak in xt_hashlimit, from Taehee.
9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long.
10) Fix cttimeout with udplite and gre, from Florian.
11) Preserve oif for IPv6 link-local generated traffic from mangle
table, from Alin Nastac.
12) Missing error handling in masquerade notifiers, from Taehee Yoo.
13) Use mutex to protect registration/unregistration of masquerade
extensions in order to prevent a race, from Taehee.
14) Incorrect condition check in tree_nodes_free(), also from Taehee.
15) Fix chain counter leak in rule replacement path, from Taehee.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pan Bian [Wed, 28 Nov 2018 07:30:24 +0000 (15:30 +0800)]
net: hisilicon: remove unexpected free_netdev
The net device ndev is freed via free_netdev when failing to register
the device. The control flow then jumps to the error handling code
block. ndev is used and freed again. Resulting in a use-after-free bug.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pan Bian [Wed, 28 Nov 2018 06:53:19 +0000 (14:53 +0800)]
rapidio/rionet: do not free skb before reading its length
skb is freed via dev_kfree_skb_any, however, skb->len is read then. This
may result in a use-after-free bug.
Fixes:
e6161d64263 ("rapidio/rionet: rework driver initialization and removal")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Sokolowski [Tue, 27 Nov 2018 17:35:35 +0000 (09:35 -0800)]
i40e: fix kerneldoc for xsk methods
One method, xsk_umem_setup, had an incorrect kernel doc
description, which has been corrected.
Also fixes small typos found in the comments.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Wed, 28 Nov 2018 16:38:20 +0000 (08:38 -0800)]
Merge tag 'for-4.20-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Some of these bugs are being hit during testing so we'd like to get
them merged, otherwise there are usual stability fixes for stable
trees"
* tag 'for-4.20-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: relocation: set trans to be NULL after ending transaction
Btrfs: fix race between enabling quotas and subvolume creation
Btrfs: send, fix infinite loop due to directory rename dependencies
Btrfs: ensure path name is null terminated at btrfs_control_ioctl
Btrfs: fix rare chances for data loss when doing a fast fsync
btrfs: Always try all copies when reading extent buffers
Linus Torvalds [Wed, 28 Nov 2018 16:33:55 +0000 (08:33 -0800)]
Merge tag 'spi-fix-v4.20-rc4' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few driver specific fixes here, nothing big or that stands out for
anyone other than the driver users.
The omap2-mcspi fix is for issues that started showing up with a
change in defconfig in this release to make cpuidle get turned on by
default"
* tag 'spi-fix-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: omap2-mcspi: Add missing suspend and resume calls
spi: mediatek: use correct mata->xfer_len when in fifo transfer
spi: uniphier: fix incorrect property items
Josh Elsasser [Sat, 24 Nov 2018 20:57:33 +0000 (12:57 -0800)]
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
Add the two 1000BaseLX enum values to the X550's check for 1Gbps modules,
allowing the core driver code to establish a link over this SFP type.
This is done by the out-of-tree driver but the fix wasn't in mainline.
Fixes:
e23f33367882 ("ixgbe: Fix 1G and 10G link stability for X550EM_x SFP+”)
Fixes:
6a14ee0cfb19 ("ixgbe: Add X550 support function pointers")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Wed, 28 Nov 2018 16:29:18 +0000 (08:29 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, many of them reported by syzkaller and mostly predating the
merge window"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
kvm: mmu: Fix race in emulated page table writes
KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS
KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD
KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
x86/kvm/vmx: fix old-style function declaration
KVM: x86: fix empty-body warnings
KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes
KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once
svm: Add mutex_lock to protect apic_access_page_done on AMD systems
KVM: X86: Fix scan ioapic use-before-initialization
KVM: LAPIC: Fix pv ipis use-before-initialization
KVM: VMX: re-add ple_gap module parameter
KVM: PPC: Book3S HV: Fix handling for interrupted H_ENTER_NESTED
Lihong Yang [Wed, 21 Nov 2018 17:15:37 +0000 (09:15 -0800)]
i40e: Fix deletion of MAC filters
In __i40e_del_filter function, the flag __I40E_MACVLAN_SYNC_PENDING for
the PF state is wrongly set for the VSI. Deleting any of the MAC filters
has caused the incorrect syncing for the PF. Fix it by setting this state
flag to the intended PF.
CC: stable <stable@vger.kernel.org>
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Yunjian Wang [Tue, 6 Nov 2018 08:27:12 +0000 (16:27 +0800)]
igb: fix uninitialized variables
This patch fixes the variable 'phy_word' may be used uninitialized.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Hui Wang [Wed, 28 Nov 2018 09:11:26 +0000 (17:11 +0800)]
ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock
Like the Dell WD15 Dock, the WD19 Dock (0bda:402e) doens't provide
useful string for the vendor and product names too. In order to share
the UCM with WD15, here we keep the profile_name same as the WD15.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Taehee Yoo [Wed, 28 Nov 2018 02:27:28 +0000 (11:27 +0900)]
netfilter: nf_tables: deactivate expressions in rule replecement routine
There is no expression deactivation call from the rule replacement path,
hence, chain counter is not decremented. A few steps to reproduce the
problem:
%nft add table ip filter
%nft add chain ip filter c1
%nft add chain ip filter c1
%nft add rule ip filter c1 jump c2
%nft replace rule ip filter c1 handle 3 accept
%nft flush ruleset
<jump c2> expression means immediate NFT_JUMP to chain c2.
Reference count of chain c2 is increased when the rule is added.
When rule is deleted or replaced, the reference counter of c2 should be
decreased via nft_rule_expr_deactivate() which calls
nft_immediate_deactivate().
Splat looks like:
[ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[ 214.398983] Modules linked in: nf_tables nfnetlink
[ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
[ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
[ 214.398983] RSP: 0018:
ffff8881152874e8 EFLAGS:
00010202
[ 214.398983] RAX:
0000000000000001 RBX:
ffff88810ef9fc28 RCX:
ffff8881152876f0
[ 214.398983] RDX:
dffffc0000000000 RSI:
1ffff11022a50ede RDI:
ffff88810ef9fc78
[ 214.398983] RBP:
1ffff11022a50e9d R08:
0000000080000000 R09:
0000000000000000
[ 214.398983] R10:
0000000000000000 R11:
0000000000000000 R12:
1ffff11022a50eba
[ 214.398983] R13:
ffff888114446e08 R14:
ffff8881152876f0 R15:
ffffed1022a50ed6
[ 214.398983] FS:
0000000000000000(0000) GS:
ffff888116400000(0000) knlGS:
0000000000000000
[ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 214.398983] CR2:
00007fab9bb5f868 CR3:
000000012aa16000 CR4:
00000000001006e0
[ 214.398983] Call Trace:
[ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
[ 214.398983] ? __kasan_slab_free+0x145/0x180
[ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
[ 214.398983] ? kfree+0xdb/0x280
[ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
[ ... ]
Fixes:
bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions")
Reported by: Christoph Anton Mitterer <calestyo@scientia.net>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pavankumar Kondeti [Tue, 30 Oct 2018 06:54:33 +0000 (12:24 +0530)]
sched, trace: Fix prev_state output in sched_switch tracepoint
commit
3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout")
tried to fix the problem introduced by a previous commit
efb40f588b43
("sched/tracing: Fix trace_sched_switch task-state printing"). However
the prev_state output in sched_switch is still broken.
task_state_index() uses fls() which considers the LSB as 1. Left
shifting 1 by this value gives an incorrect mapping to the task state.
Fix this by decrementing the value returned by __get_task_state()
before shifting.
Link: http://lkml.kernel.org/r/1540882473-1103-1-git-send-email-pkondeti@codeaurora.org
Cc: stable@vger.kernel.org
Fixes:
3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout")
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Tue, 20 Nov 2018 17:51:07 +0000 (12:51 -0500)]
function_graph: Have profiler use curr_ret_stack and not depth
The profiler uses trace->depth to find its entry on the ret_stack, but the
depth may not match the actual location of where its entry is (if an
interrupt were to preempt the processing of the profiler for another
function, the depth and the curr_ret_stack will be different).
Have it use the curr_ret_stack as the index to find its ret_stack entry
instead of using the depth variable, as that is no longer guaranteed to be
the same.
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Tue, 20 Nov 2018 17:40:25 +0000 (12:40 -0500)]
function_graph: Reverse the order of pushing the ret_stack and the callback
The function graph profiler uses the ret_stack to store the "subtime" and
reuse it by nested functions and also on the return. But the current logic
has the profiler callback called before the ret_stack is updated, and it is
just modifying the ret_stack that will later be allocated (it's just lucky
that the "subtime" is not touched when it is allocated).
This could also cause a crash if we are at the end of the ret_stack when
this happens.
By reversing the order of the allocating the ret_stack and then calling the
callbacks attached to a function being traced, the ret_stack entry is no
longer used before it is allocated.
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Mon, 19 Nov 2018 20:18:40 +0000 (15:18 -0500)]
function_graph: Move return callback before update of curr_ret_stack
In the past, curr_ret_stack had two functions. One was to denote the depth
of the call graph, the other is to keep track of where on the ret_stack the
data is used. Although they may be slightly related, there are two cases
where they need to be used differently.
The one case is that it keeps the ret_stack data from being corrupted by an
interrupt coming in and overwriting the data still in use. The other is just
to know where the depth of the stack currently is.
The function profiler uses the ret_stack to save a "subtime" variable that
is part of the data on the ret_stack. If curr_ret_stack is modified too
early, then this variable can be corrupted.
The "max_depth" option, when set to 1, will record the first functions going
into the kernel. To see all top functions (when dealing with timings), the
depth variable needs to be lowered before calling the return hook. But by
lowering the curr_ret_stack, it makes the data on the ret_stack still being
used by the return hook susceptible to being overwritten.
Now that there's two variables to handle both cases (curr_ret_depth), we can
move them to the locations where they can handle both cases.
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Mon, 19 Nov 2018 13:07:12 +0000 (08:07 -0500)]
function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
Currently, the depth of the ret_stack is determined by curr_ret_stack index.
The issue is that there's a race between setting of the curr_ret_stack and
calling of the callback attached to the return of the function.
Commit
03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
trace return callback") moved the calling of the callback to after the
setting of the curr_ret_stack, even stating that it was safe to do so, when
in fact, it was the reason there was a barrier() there (yes, I should have
commented that barrier()).
Not only does the curr_ret_stack keep track of the current call graph depth,
it also keeps the ret_stack content from being overwritten by new data.
The function profiler, uses the "subtime" variable of ret_stack structure
and by moving the curr_ret_stack, it allows for interrupts to use the same
structure it was using, corrupting the data, and breaking the profiler.
To fix this, there needs to be two variables to handle the call stack depth
and the pointer to where the ret_stack is being used, as they need to change
at two different locations.
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Mon, 19 Nov 2018 12:40:39 +0000 (07:40 -0500)]
function_graph: Make ftrace_push_return_trace() static
As all architectures now call function_graph_enter() to do the entry work,
no architecture should ever call ftrace_push_return_trace(). Make it static.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:37:40 +0000 (17:37 -0500)]
sparc/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have sparc use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:35:37 +0000 (17:35 -0500)]
sh/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have superh use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:33:17 +0000 (17:33 -0500)]
s390/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have s390 use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:31:44 +0000 (17:31 -0500)]
riscv/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have riscv use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Greentime Hu <greentime@andestech.com>
Cc: Alan Kao <alankao@andestech.com>
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:28:53 +0000 (17:28 -0500)]
powerpc/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have powerpc use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:27:43 +0000 (17:27 -0500)]
parisc: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have parisc use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:26:35 +0000 (17:26 -0500)]
nds32: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have nds32 use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Greentime Hu <greentime@andestech.com>
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:25:18 +0000 (17:25 -0500)]
MIPS: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have MIPS use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:23:30 +0000 (17:23 -0500)]
microblaze: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have microblaze use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Michal Simek <monstr@monstr.eu>
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:21:51 +0000 (17:21 -0500)]
arm64: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have arm64 use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:19:26 +0000 (17:19 -0500)]
ARM: function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have ARM use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Sun, 18 Nov 2018 22:14:10 +0000 (17:14 -0500)]
x86/function_graph: Simplify with function_graph_enter()
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have x86 use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: stable@kernel.org
Fixes:
03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Bryan Whitehead [Mon, 26 Nov 2018 17:27:10 +0000 (12:27 -0500)]
lan743x: Enable driver to work with LAN7431
This driver was designed to work with both LAN7430 and LAN7431.
The only difference between the two is the LAN7431 has support
for external phy.
This change adds LAN7431 to the list of recognized devices
supported by this driver.
Updates for v2:
changed 'fixes' tag to match defined format
fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 26 Nov 2018 17:26:14 +0000 (12:26 -0500)]
tipc: fix lockdep warning during node delete
We see the following lockdep warning:
[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ #42 Tainted: G E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604]
00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604]
00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
[ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604] call_timer_fn+0xa1/0x280
[ 2284.078604] run_timer_softirq+0x1f2/0x4d0
[ 2284.078604] __do_softirq+0xfc/0x413
[ 2284.078604] irq_exit+0xb5/0xc0
[ 2284.078604] smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604] apic_timer_interrupt+0xf/0x20
[ 2284.078604] default_idle+0x1c/0x140
[ 2284.078604] do_idle+0x1bc/0x280
[ 2284.078604] cpu_startup_entry+0x19/0x20
[ 2284.078604] start_secondary+0x187/0x1c0
[ 2284.078604] secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
[ 2284.078604] del_timer_sync+0x34/0xa0
[ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604] tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604] tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604] ops_exit_list.isra.8+0x36/0x70
[ 2284.078604] unregister_pernet_operations+0x87/0xd0
[ 2284.078604] unregister_pernet_subsys+0x1d/0x30
[ 2284.078604] tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604] __x64_sys_delete_module+0x1df/0x240
[ 2284.078604] do_syscall_64+0x66/0x460
[ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604] Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604] CPU0 CPU1
[ 2284.078604] ---- ----
[ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604] lock((&n->timer)#2);
[ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604] lock((&n->timer)#2);
[ 2284.078604]
[ 2284.078604] *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604] #0:
000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604] #1:
0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604] #2:
00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...}
The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter is calling del_timer_sync() inside the same lock, we
have a potential deadlock.
We fix this letting the timer cleanup function use spin_trylock()
instead of just spin_lock(), and when it fails to grab the lock it
just returns so that the timer handler can terminate its execution.
This is safe to do, since tipc_node_stop() anyway is about to
delete both the timer and the node instance.
Fixes:
6a939f365bdb ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Mon, 26 Nov 2018 17:04:57 +0000 (12:04 -0500)]
lan743x: fix return value for lan743x_tx_napi_poll
The lan743x driver, when under heavy traffic load, has been noticed
to sometimes hang, or cause a kernel panic.
Debugging reveals that the TX napi poll routine was returning
the wrong value, 'weight'. Most other drivers return 0.
And call napi_complete, instead of napi_complete_done.
Additionally when creating the tx napi poll routine.
Changed netif_napi_add, to netif_tx_napi_add.
Updates for v3:
changed 'fixes' tag to match defined format
Updates for v2:
use napi_complete, instead of napi_complete_done in
lan743x_tx_napi_poll
use netif_tx_napi_add, instead of netif_napi_add for
registration of tx napi poll routine
fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 26 Nov 2018 15:34:01 +0000 (15:34 +0000)]
net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
The text in array velocity_gstrings contains a spelling mistake,
rename rx_frame_alignement_errors to rx_frame_alignment_errors.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 26 Nov 2018 15:53:54 +0000 (15:53 +0000)]
qed: fix spelling mistake "attnetion" -> "attention"
The text in array s_igu_fifo_error_strs contains a spelling mistake,
fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Mon, 26 Nov 2018 14:07:16 +0000 (15:07 +0100)]
net: thunderx: fix NULL pointer dereference in nic_remove
Fix a possible NULL pointer dereference in nic_remove routine
removing the nicpf module if nic_probe fails.
The issue can be triggered with the following reproducer:
$rmmod nicvf
$rmmod nicpf
[ 521.412008] Unable to handle kernel access to user memory outside uaccess routines at virtual address
0000000000000014
[ 521.422777] Mem abort info:
[ 521.425561] ESR = 0x96000004
[ 521.428624] Exception class = DABT (current EL), IL = 32 bits
[ 521.434535] SET = 0, FnV = 0
[ 521.437579] EA = 0, S1PTW = 0
[ 521.440730] Data abort info:
[ 521.443603] ISV = 0, ISS = 0x00000004
[ 521.447431] CM = 0, WnR = 0
[ 521.450417] user pgtable: 4k pages, 48-bit VAs, pgdp =
0000000072a3da42
[ 521.457022] [
0000000000000014] pgd=
0000000000000000
[ 521.461916] Internal error: Oops:
96000004 [#1] SMP
[ 521.511801] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[ 521.518664] pstate:
80400005 (Nzcv daif +PAN -UAO)
[ 521.523451] pc : nic_remove+0x24/0x88 [nicpf]
[ 521.527808] lr : pci_device_remove+0x48/0xd8
[ 521.532066] sp :
ffff000013433cc0
[ 521.535370] x29:
ffff000013433cc0 x28:
ffff810f6ac50000
[ 521.540672] x27:
0000000000000000 x26:
0000000000000000
[ 521.545974] x25:
0000000056000000 x24:
0000000000000015
[ 521.551274] x23:
ffff8007ff89a110 x22:
ffff000001667070
[ 521.556576] x21:
ffff8007ffb170b0 x20:
ffff8007ffb17000
[ 521.561877] x19:
0000000000000000 x18:
0000000000000025
[ 521.567178] x17:
0000000000000000 x16:
000000000000010ffc33ff98 x8 :
0000000000000000
[ 521.593683] x7 :
0000000000000000 x6 :
0000000000000001
[ 521.598983] x5 :
0000000000000002 x4 :
0000000000000003
[ 521.604284] x3 :
ffff8007ffb17184 x2 :
ffff8007ffb17184
[ 521.609585] x1 :
ffff000001662118 x0 :
ffff000008557be0
[ 521.614887] Process rmmod (pid: 1897, stack limit = 0x00000000859535c3)
[ 521.621490] Call trace:
[ 521.623928] nic_remove+0x24/0x88 [nicpf]
[ 521.627927] pci_device_remove+0x48/0xd8
[ 521.631847] device_release_driver_internal+0x1b0/0x248
[ 521.637062] driver_detach+0x50/0xc0
[ 521.640628] bus_remove_driver+0x60/0x100
[ 521.644627] driver_unregister+0x34/0x60
[ 521.648538] pci_unregister_driver+0x24/0xd8
[ 521.652798] nic_cleanup_module+0x14/0x111c [nicpf]
[ 521.657672] __arm64_sys_delete_module+0x150/0x218
[ 521.662460] el0_svc_handler+0x94/0x110
[ 521.666287] el0_svc+0x8/0xc
[ 521.669160] Code:
aa1e03e0 9102c295 d503201f f9404eb3 (
b9401660)
Fixes:
4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 26 Nov 2018 06:52:44 +0000 (14:52 +0800)]
sctp: increase sk_wmem_alloc when head->truesize is increased
I changed to count sk_wmem_alloc by skb truesize instead of 1 to
fix the sk_wmem_alloc leak caused by later truesize's change in
xfrm in Commit
02968ccf0125 ("sctp: count sk_wmem_alloc by skb
truesize in sctp_packet_transmit").
But I should have also increased sk_wmem_alloc when head->truesize
is increased in sctp_packet_gso_append() as xfrm does. Otherwise,
sctp gso packet will cause sk_wmem_alloc underflow.
Fixes:
02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Sun, 25 Nov 2018 23:21:08 +0000 (23:21 +0000)]
firestream: fix spelling mistake: "Inititing" -> "Initializing"
There are spelling mistakes in debug messages, fix them.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 23 Nov 2018 18:41:29 +0000 (19:41 +0100)]
net: phy: add workaround for issue where PHY driver doesn't bind to the device
After switching the r8169 driver to use phylib some user reported that
their network is broken. This was caused by the genphy PHY driver being
used instead of the dedicated PHY driver for the RTL8211B. Users
reported that loading the Realtek PHY driver module upfront fixes the
issue. See also this mail thread:
https://marc.info/?t=
154279781800003&r=1&w=2
The issue is quite weird and the root cause seems to be somewhere in
the base driver core. The patch works around the issue and may be
removed once the actual issue is fixed.
The Fixes tag refers to the first reported occurrence of the issue.
The issue itself may have been existing much longer and it may affect
users of other network chips as well. Users typically will recognize
this issue only if their PHY stops working when being used with the
genphy driver.
Fixes:
f1e911d5d0df ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bernd Eckstein [Fri, 23 Nov 2018 12:51:26 +0000 (13:51 +0100)]
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
The bug is not easily reproducable, as it may occur very infrequently
(we had machines with 20minutes heavy downloading before it occurred)
However, on a virual machine (VMWare on Windows 10 host) it occurred
pretty frequently (1-2 seconds after a speedtest was started)
dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
before it is set.
This causes the following problems:
- double free of the skb or potential memory leak
- in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
general protection fault
Example dmesg output:
[ 134.841986] ------------[ cut here ]------------
[ 134.841987] recvmsg bug: copied
9C24A555 seq
9C24B557 rcvnxt
9C25A6B3 fl 0
[ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
[ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
[ 134.842048] RSP: 0018:
ffffa6630422bcc8 EFLAGS:
00010286
[ 134.842049] RAX:
0000000000000000 RBX:
ffff997616f4f200 RCX:
0000000000000006
[ 134.842049] RDX:
0000000000000007 RSI:
0000000000000082 RDI:
ffff9976257d6490
[ 134.842050] RBP:
ffffa6630422bd98 R08:
0000000000000001 R09:
000000000004bba4
[ 134.842050] R10:
0000000001e00c6f R11:
000000000004bba4 R12:
ffff99760dee3000
[ 134.842051] R13:
0000000000000000 R14:
ffff99760dee3514 R15:
0000000000000000
[ 134.842051] FS:
00007fe332347700(0000) GS:
ffff9976257c0000(0000) knlGS:
0000000000000000
[ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 134.842053] CR2:
0000000001e41000 CR3:
000000020e9b4006 CR4:
00000000003606e0
[ 134.842055] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 134.842055] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 134.842057] Call Trace:
[ 134.842060] ? aa_sk_perm+0x53/0x1a0
[ 134.842064] inet_recvmsg+0x51/0xc0
[ 134.842066] sock_recvmsg+0x43/0x50
[ 134.842070] SYSC_recvfrom+0xe4/0x160
[ 134.842072] ? __schedule+0x3de/0x8b0
[ 134.842075] ? ktime_get_ts64+0x4c/0xf0
[ 134.842079] SyS_recvfrom+0xe/0x10
[ 134.842082] do_syscall_64+0x73/0x130
[ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842086] RIP: 0033:0x7fe331f5a81d
[ 134.842088] RSP: 002b:
00007ffe8da98398 EFLAGS:
00000246 ORIG_RAX:
000000000000002d
[ 134.842090] RAX:
ffffffffffffffda RBX:
ffffffffffffffff RCX:
00007fe331f5a81d
[ 134.842094] RDX:
00000000000003fb RSI:
0000000001e00874 RDI:
0000000000000003
[ 134.842095] RBP:
00007fe32f642c70 R08:
0000000000000000 R09:
0000000000000000
[ 134.842097] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe332347698
[ 134.842099] R13:
0000000001b7e0a0 R14:
0000000001e00874 R15:
0000000000000000
[ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
[ 134.842126] ---[ end trace
b7138fc08c83147f ]---
[ 134.842144] general protection fault: 0000 [#1] SMP PTI
[ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440
[ 134.842165] RSP: 0018:
ffffa6630422bde8 EFLAGS:
00010202
[ 134.842167] RAX:
0000000000000000 RBX:
ffff99760dee3000 RCX:
0000000180400034
[ 134.842168] RDX:
5c4afd407207a6c4 RSI:
ffffe868495bd300 RDI:
ffff997616f4f200
[ 134.842169] RBP:
ffffa6630422be08 R08:
0000000016f4d401 R09:
0000000180400034
[ 134.842169] R10:
ffffa6630422bd98 R11:
0000000000000000 R12:
000000000000600c
[ 134.842170] R13:
0000000000000000 R14:
ffff99760dee30c8 R15:
ffff9975bd44fe00
[ 134.842171] FS:
00007fe332347700(0000) GS:
ffff9976257c0000(0000) knlGS:
0000000000000000
[ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 134.842174] CR2:
0000000001e41000 CR3:
000000020e9b4006 CR4:
00000000003606e0
[ 134.842177] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 134.842178] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 134.842179] Call Trace:
[ 134.842181] inet_release+0x42/0x70
[ 134.842183] __sock_release+0x42/0xb0
[ 134.842184] sock_close+0x15/0x20
[ 134.842187] __fput+0xea/0x220
[ 134.842189] ____fput+0xe/0x10
[ 134.842191] task_work_run+0x8a/0xb0
[ 134.842193] exit_to_usermode_loop+0xc4/0xd0
[ 134.842195] do_syscall_64+0xf4/0x130
[ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842197] RIP: 0033:0x7fe331f5a560
[ 134.842198] RSP: 002b:
00007ffe8da982e8 EFLAGS:
00000246 ORIG_RAX:
0000000000000003
[ 134.842200] RAX:
0000000000000000 RBX:
00007fe32f642c70 RCX:
00007fe331f5a560
[ 134.842201] RDX:
00000000008f5320 RSI:
0000000001cd4b50 RDI:
0000000000000003
[ 134.842202] RBP:
00007fe32f6500f8 R08:
000000000000003c R09:
00000000009343c0
[ 134.842203] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe32f6500d0
[ 134.842204] R13:
00000000008f5320 R14:
00000000008f5320 R15:
0000000001cd4770
[ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
[ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP:
ffffa6630422bde8
[ 134.842227] ---[ end trace
b7138fc08c831480 ]---
The proposed patch eliminates a potential racing condition.
Before, usb_submit_urb was called and _after_ that, the skb was attached
(dev->tx_skb). So, on a callback it was possible, however unlikely that the
skb was freed before it was set. That way (because dev->tx_skb was not set
to NULL after it was freed), it could happen that a skb from a earlier
transmission was freed a second time (and the skb we should have freed did
not get freed at all)
Now we free the skb directly in ipheth_tx(). It is not passed to the
callback anymore, eliminating the posibility of a double free of the same
skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
respectively dev_consume_skb_any() to free the skb.
Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com>
Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Nov 2018 17:52:05 +0000 (09:52 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-11-27
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix several bugs in BPF sparc JIT, that is, convergence for fused
branches, initialization of frame pointer register, and moving all
arguments into output registers from input registers in prologue
to fix BPF to BPF calls, from David.
2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where
they are not guaranteed to fit into imm field and therefore must be
retrieved through prog aux data, from Daniel.
3) Explicitly add all JITs to MAINTAINERS file with developers able to
help out in feature development, fixes, review, etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Tue, 27 Nov 2018 15:06:42 +0000 (16:06 +0100)]
Merge tag 'asoc-v4.20-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.20
Lots of fixes here, the majority of which are driver specific but
there's a couple of core things and one notable driver specific one:
- A core fix for a DAPM regression introduced during the component
refactoring, we'd lost the code that forced a reevaluation of the
DAPM graph after probe (which we suppress during init to save lots
of recalcuation) and have now restored it.
- A core fix for error handling using the newly added
for_each_rtd_codec_dai_rollback() macro.
- A fix for the names of widgets in the newly introduced pcm3060
driver, merged as a fix so we don't have a release with legacy names.
Martin Schwidefsky [Tue, 27 Nov 2018 13:04:04 +0000 (14:04 +0100)]
s390/mm: correct pgtable_bytes on page table downgrade
The downgrade of a page table from 3 levels to 2 levels for a 31-bit compat
process removes a pmd table which has to be counted against pgtable_bytes.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jim Mattson [Tue, 22 May 2018 16:54:20 +0000 (09:54 -0700)]
kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
Previously, we only called indirect_branch_prediction_barrier on the
logical CPU that freed a vmcb. This function should be called on all
logical CPUs that last loaded the vmcb in question.
Fixes:
15d45071523d ("KVM/x86: Add IBPB support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Junaid Shahid [Wed, 31 Oct 2018 21:53:57 +0000 (14:53 -0700)]
kvm: mmu: Fix race in emulated page table writes
When a guest page table is updated via an emulated write,
kvm_mmu_pte_write() is called to update the shadow PTE using the just
written guest PTE value. But if two emulated guest PTE writes happened
concurrently, it is possible that the guest PTE and the shadow PTE end
up being out of sync. Emulated writes do not mark the shadow page as
unsync-ed, so this inconsistency will not be resolved even by a guest TLB
flush (unless the page was marked as unsync-ed at some other point).
This is fixed by re-reading the current value of the guest PTE after the
MMU lock has been acquired instead of just using the value that was
written prior to calling kvm_mmu_pte_write().
Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Tue, 13 Nov 2018 15:44:46 +0000 (17:44 +0200)]
KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS
vmcs12 represents the per-CPU cache of L1 active vmcs12.
This cache can be loaded by one of the following:
1) Guest making a vmcs12 active by exeucting VMPTRLD
2) Guest specifying eVMCS in VP assist page and executing
VMLAUNCH/VMRESUME.
Either way, vmcs12 should have revision_id of VMCS12_REVISION.
Which is not equal to eVMCS revision_id which specifies used
VersionNumber of eVMCS struct (e.g. KVM_EVMCS_VERSION).
Specifically, this causes an issue in restoring a nested VM state
because vmx_set_nested_state() verifies that vmcs12->revision_id
is equal to VMCS12_REVISION which was not true in case vmcs12
was populated from an eVMCS by vmx_get_nested_state() which calls
copy_enlightened_to_vmcs12().
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Thu, 1 Nov 2018 08:57:39 +0000 (10:57 +0200)]
KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD
According to TLFS section 16.11.2 Enlightened VMCS, the first u32
field of eVMCS should specify eVMCS VersionNumber.
This version should be in the range of supported eVMCS versions exposed
to guest via CPUID.0x4000000A.EAX[0:15].
The range which KVM expose to guest in this CPUID field should be the
same as the value returned in vmcs_version by nested_enable_evmcs().
According to the above, eVMCS VMPTRLD should verify that version specified
in given eVMCS is in the supported range. However, current code
mistakenly verfies this field against VMCS12_REVISION.
One can also see that when KVM use eVMCS, it makes sure that
alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION.
Obvious fix should just change eVMCS VMPTRLD to verify first u32 field
of eVMCS is equal to KVM_EVMCS_VERSION.
However, it turns out that Microsoft Hyper-V fails to comply to their
own invented interface: When Hyper-V use eVMCS, it just sets first u32
field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our
case: VMCS12_REVISION). Instead of used eVMCS version number which is
one of the supported versions specified in CPUID.0x4000000A.EAX[0:15].
To overcome Hyper-V bug, we accept either a supported eVMCS version
or VMCS12_REVISION as valid values for first u32 field of eVMCS.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Leonid Shatz [Tue, 6 Nov 2018 10:14:25 +0000 (12:14 +0200)]
KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
Since commit
e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to
represent the running guest"), vcpu->arch.tsc_offset meaning was
changed to always reflect the tsc_offset value set on active VMCS.
Regardless if vCPU is currently running L1 or L2.
However, above mentioned commit failed to also change
kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
This is because vmx_write_tsc_offset() could set the tsc_offset value
in active VMCS to given offset parameter *plus vmcs12->tsc_offset*.
However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset
to given offset parameter. Without taking into account the possible
addition of vmcs12->tsc_offset. (Same is true for SVM case).
Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return
actually set tsc_offset in active VMCS and modify
kvm_vcpu_write_tsc_offset() to set returned value in
vcpu->arch.tsc_offset.
In addition, rename write_tsc_offset() callback to write_l1_tsc_offset()
to make it clear that it is meant to set L1 TSC offset.
Fixes:
e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yi Wang [Thu, 8 Nov 2018 03:22:21 +0000 (11:22 +0800)]
x86/kvm/vmx: fix old-style function declaration
The inline keyword which is not at the beginning of the function
declaration may trigger the following build warnings, so let's fix it:
arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yi Wang [Thu, 8 Nov 2018 08:48:36 +0000 (16:48 +0800)]
KVM: x86: fix empty-body warnings
We get the following warnings about empty statements when building
with 'W=1':
arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
Rework the debug helper macro to get rid of these warnings.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Tue, 20 Nov 2018 16:03:25 +0000 (18:03 +0200)]
KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes
When guest transitions from/to long-mode by modifying MSR_EFER.LMA,
the list of shared MSRs to be saved/restored on guest<->host
transitions is updated (See vmx_set_efer() call to setup_msrs()).
On every entry to guest, vcpu_enter_guest() calls
vmx_prepare_switch_to_guest(). This function should also take care
of setting the shared MSRs to be saved/restored. However, the
function does nothing in case we are already running with loaded
guest state (vmx->loaded_cpu_state != NULL).
This means that even when guest modifies MSR_EFER.LMA which results
in updating the list of shared MSRs, it isn't being taken into account
by vmx_prepare_switch_to_guest() because it happens while we are
running with loaded guest state.
To fix above mentioned issue, add a flag to mark that the list of
shared MSRs has been updated and modify vmx_prepare_switch_to_guest()
to set shared MSRs when running with host state *OR* list of shared
MSRs has been updated.
Note that this issue was mistakenly introduced by commit
678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's
kernel_gs_base") because previously vmx_set_efer() always called
vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to
set shared MSRs.
Fixes:
678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base")
Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Wed, 7 Nov 2018 22:43:06 +0000 (00:43 +0200)]
KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
kvm_pv_clock_pairing() allocates local var
"struct kvm_clock_pairing clock_pairing" on stack and initializes
all it's fields besides padding (clock_pairing.pad[]).
Because clock_pairing var is written completely (including padding)
to guest memory, failure to init struct padding results in kernel
info-leak.
Fix the issue by making sure to also init the padding with zeroes.
Fixes:
55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Sat, 17 Nov 2018 12:05:06 +0000 (14:05 +0200)]
KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once
Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice:
1) kvm_vcpu_ioctl_enable_cap() is called to enable
KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills
vmcs_version which is then copied to userspace.
3) kvm_vcpu_ioctl_enable_cap() is called again to enable
KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
4) This time nested_enable_evmcs() just returns 0 as
enlightened_vmcs_enabled is already true. *Without filling
vmcs_version*.
5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies
*uninitialized* vmcs_version to userspace which leads to kernel info-leak.
Fix this issue by simply changing nested_enable_evmcs() to always fill
vmcs_version output argument. Even when enlightened_vmcs_enabled is
already set to true.
Note that SVM's nested_enable_evmcs() should not be modified because it
always returns a non-zero value (-ENODEV) which results in
kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to
userspace (as it should).
Fixes:
57b119da3594 ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability")
Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wei Wang [Mon, 12 Nov 2018 12:23:14 +0000 (12:23 +0000)]
svm: Add mutex_lock to protect apic_access_page_done on AMD systems
There is a race condition when accessing kvm->arch.apic_access_page_done.
Due to it, x86_set_memory_region will fail when creating the second vcpu
for a svm guest.
Add a mutex_lock to serialize the accesses to apic_access_page_done.
This lock is also used by vmx for the same purpose.
Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Amadeusz Juskowiak <ajusk@amazon.de>
Signed-off-by: Julian Stecklina <jsteckli@amazon.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 20 Nov 2018 08:34:18 +0000 (16:34 +0800)]
KVM: X86: Fix scan ioapic use-before-initialization
Reported by syzkaller:
BUG: unable to handle kernel NULL pointer dereference at
00000000000001c8
PGD
80000003ec4da067 P4D
80000003ec4da067 PUD
3f7bfa067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16
RIP: 0010:__lock_acquire+0x1a6/0x1990
Call Trace:
lock_acquire+0xdb/0x210
_raw_spin_lock+0x38/0x70
kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
vcpu_enter_guest+0x167e/0x1910 [kvm]
kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
do_vfs_ioctl+0xa5/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x83/0x6e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.
This can be triggered by the following program:
#define _GNU_SOURCE
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
long res = 0;
memcpy((void*)0x20000040, "/dev/kvm", 9);
res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
if (res != -1)
r[0] = res;
res = syscall(__NR_ioctl, r[0], 0xae01, 0);
if (res != -1)
r[1] = res;
res = syscall(__NR_ioctl, r[1], 0xae41, 0);
if (res != -1)
r[2] = res;
memcpy(
(void*)0x20000080,
"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
106);
syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
syscall(__NR_ioctl, r[2], 0xae80, 0);
return 0;
}
This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
kernel.
Reported-by: Wei Wu <ww9210@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wei Wu <ww9210@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 20 Nov 2018 01:39:30 +0000 (09:39 +0800)]
KVM: LAPIC: Fix pv ipis use-before-initialization
Reported by syzkaller:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000014
PGD
800000040410c067 P4D
800000040410c067 PUD
40410d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 2567 Comm: poc Tainted: G OE 4.19.0-rc5 #16
RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
Call Trace:
kvm_emulate_hypercall+0x3cc/0x700 [kvm]
handle_vmcall+0xe/0x10 [kvm_intel]
vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
vcpu_enter_guest+0x9fb/0x1910 [kvm]
kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
do_vfs_ioctl+0xa5/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x83/0x6e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the apic map has not yet been initialized, the testcase
triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
is dereferenced. This patch fixes it by checking whether or not apic map is
NULL and bailing out immediately if that is the case.
Fixes:
4180bf1b65 (KVM: X86: Implement "send IPI" hypercall)
Reported-by: Wei Wu <ww9210@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wei Wu <ww9210@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Luiz Capitulino [Fri, 23 Nov 2018 17:02:14 +0000 (12:02 -0500)]
KVM: VMX: re-add ple_gap module parameter
Apparently, the ple_gap parameter was accidentally removed
by commit
c8e88717cfc6b36bedea22368d97667446318291. Add it
back.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: stable@vger.kernel.org
Fixes:
c8e88717cfc6b36bedea22368d97667446318291
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kailang Yang [Thu, 8 Nov 2018 08:36:15 +0000 (16:36 +0800)]
ALSA: hda/realtek - Support ALC300
This patch will enable ALC300.
[ It's almost equivalent with other ALC269-compatible ones, and
apparently has no loopback mixer -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Girija Kumar Kasinadhuni [Mon, 26 Nov 2018 18:40:46 +0000 (13:40 -0500)]
ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop
This device makes a loud buzzing sound when a headphone is inserted while
playing audio at full volume through the speaker.
Fixes:
bbf8ff6b1d2a ("ALSA: hda/realtek - Fixup for HP x360 laptops with B&O speakers")
Signed-off-by: Girija Kumar Kasinadhuni <gkumar@neverware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Harald Freudenberger [Fri, 9 Nov 2018 13:59:24 +0000 (14:59 +0100)]
s390/zcrypt: reinit ap queue state machine during device probe
Until the vfio-ap driver came into live there was a well known
agreement about the way how ap devices are initialized and their
states when the driver's probe function is called.
However, the vfio device driver when receiving an ap queue device does
additional resets thereby removing the registration for interrupts for
the ap device done by the ap bus core code. So when later the vfio
driver releases the device and one of the default zcrypt drivers takes
care of the device the interrupt registration needs to get
renewed. The current code does no renew and result is that requests
send into such a queue will never see a reply processed - the
application hangs.
This patch adds a function which resets the aq queue state machine for
the ap queue device and triggers the walk through the initial states
(which are reset and registration for interrupts). This function is
now called before the driver's probe function is invoked.
When the association between driver and device is released, the
driver's remove function is called. The current implementation calls a
ap queue function ap_queue_remove(). This invokation has been moved to
the ap bus function to make the probe / remove pair for ap bus and
drivers more symmetric.
Fixes:
7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pan Bian [Sun, 25 Nov 2018 00:58:02 +0000 (08:58 +0800)]
ext2: fix potential use after free
The function ext2_xattr_set calls brelse(bh) to drop the reference count
of bh. After that, bh may be freed. However, following brelse(bh),
it reads bh->b_data via macro HDR(bh). This may result in a
use-after-free bug. This patch moves brelse(bh) after reading field.
CC: stable@vger.kernel.org
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jan Kara <jack@suse.cz>
xingaopeng [Sat, 24 Nov 2018 11:21:59 +0000 (19:21 +0800)]
ext2: initialize opts.s_mount_opt as zero before using it
We need to initialize opts.s_mount_opt as zero before using it, else we
may get some unexpected mount options.
Fixes:
088519572ca8 ("ext2: Parse mount options into a dedicated structure")
CC: stable@vger.kernel.org
Signed-off-by: xingaopeng <xingaopeng@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
David Miller [Tue, 27 Nov 2018 05:55:09 +0000 (21:55 -0800)]
sparc: Adjust bpf JIT prologue for PSEUDO calls.
Move all arguments into output registers from input registers.
This path is exercised by test_verifier.c's "calls: two calls with
args" test. Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed.
Let's also make the prologue length a constant size regardless of
the combination of ->saw_frame_pointer and ->saw_tail_call
settings.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Max Filippov [Tue, 27 Nov 2018 02:06:01 +0000 (18:06 -0800)]
xtensa: fix coprocessor part of ptrace_{get,set}xregs
Layout of coprocessor registers in the elf_xtregs_t and
xtregs_coprocessor_t may be different due to alignment. Thus it is not
always possible to copy data between the xtregs_coprocessor_t structure
and the elf_xtregs_t and get correct values for all registers.
Use a table of offsets and sizes of individual coprocessor register
groups to do coprocessor context copying in the ptrace_getxregs and
ptrace_setxregs.
This fixes incorrect coprocessor register values reading from the user
process by the native gdb on an xtensa core with multiple coprocessors
and registers with high alignment requirements.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Max Filippov [Mon, 26 Nov 2018 23:18:26 +0000 (15:18 -0800)]
xtensa: fix coprocessor context offset definitions
Coprocessor context offsets are used by the assembly code that moves
coprocessor context between the individual fields of the
thread_info::xtregs_cp structure and coprocessor registers.
This fixes coprocessor context clobbering on flushing and reloading
during normal user code execution and user process debugging in the
presence of more than one coprocessor in the core configuration.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Max Filippov [Mon, 26 Nov 2018 21:29:41 +0000 (13:29 -0800)]
xtensa: enable coprocessors that are being flushed
coprocessor_flush_all may be called from a context of a thread that is
different from the thread being flushed. In that case contents of the
cpenable special register may not match ti->cpenable of the target
thread, resulting in unhandled coprocessor exception in the kernel
context.
Set cpenable special register to the ti->cpenable of the target register
for the duration of the flush and restore it afterwards.
This fixes the following crash caused by coprocessor register inspection
in native gdb:
(gdb) p/x $w0
Illegal instruction in kernel: sig: 9 [#1] PREEMPT
Call Trace:
___might_sleep+0x184/0x1a4
__might_sleep+0x41/0xac
exit_signals+0x14/0x218
do_exit+0xc9/0x8b8
die+0x99/0xa0
do_illegal_instruction+0x18/0x6c
common_exception+0x77/0x77
coprocessor_flush+0x16/0x3c
arch_ptrace+0x46c/0x674
sys_ptrace+0x2ce/0x3b4
system_call+0x54/0x80
common_exception+0x77/0x77
note: gdb[100] exited with preempt_count 1
Killed
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Matias Bjørling [Mon, 26 Nov 2018 19:27:03 +0000 (11:27 -0800)]
ia64: export node_distance function
The numa_slit variable used by node_distance is available to a
module as long as it is linked at compile-time. However, it is
not available to loadable modules. Leading to errors such as:
ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
The error above is caused by the nvme multipath code that makes
use of node_distance for its path calculation. When the patch was
added, the lightnvm subsystem would select nvme and always compile
it in, leading to the node_distance call to always succeed.
However, when this requirement was removed, nvme could be compiled
in as a module, which exposed this bug.
This patch extracts node_distance to a function and exports it.
Since ACPI is depending on node_distance being a simple lookup to
numa_slit, the previous behavior is exposed as slit_distance and its
users updated.
Fixes:
f333444708f82 "nvme: take node locality into account when selecting a path"
Fixes:
73569e11032f "lightnvm: remove dependencies on BLK_DEV_NVME and PCI"
Signed-off-by: Matias Bjøring <mb@lightnvm.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann [Tue, 27 Nov 2018 00:21:00 +0000 (01:21 +0100)]
bpf, doc: add entries of who looks over which jits
Make the high-level BPF JIT entry a general 'catch-all' and add
architecture specific entries to make it more clear who actively
maintains which BPF JIT compiler. The list (L) address implies
that this eventually lands in the bpf patchwork bucket. Goal is
that this set of responsible developers listed here is always up
to date and a point of contact for helping out in e.g. feature
development, fixes, review or testing patches in order to help
long-term in ensuring quality of the BPF JITs and therefore BPF
core under a given architecture. Every new JIT in future /must/
have an entry here as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Paul Burton <paul.burton@mips.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David Miller [Mon, 26 Nov 2018 22:52:18 +0000 (14:52 -0800)]
sparc: Correct ctx->saw_frame_pointer logic.
We need to initialize the frame pointer register not just if it is
seen as a source operand, but also if it is seen as the destination
operand of a store or an atomic instruction (which effectively is a
source operand).
This is exercised by test_verifier's "non-invalid fp arithmetic"
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David Miller [Mon, 26 Nov 2018 21:03:46 +0000 (13:03 -0800)]
sparc: Fix JIT fused branch convergance.
On T4 and later sparc64 cpus we can use the fused compare and branch
instruction.
However, it can only be used if the branch destination is in the range
of a signed 10-bit immediate offset. This amounts to 1024
instructions forwards or backwards.
After the commit referenced in the Fixes: tag, the largest possible
size program seen by the JIT explodes by a significant factor.
As a result of this convergance takes many more passes since the
expanded "BPF_LDX | BPF_MSH | BPF_B" code sequence, for example,
contains several embedded branch on condition instructions.
On each pass, as suddenly new fused compare and branch instances
become valid, this makes thousands more in range for the next pass.
And so on and so forth.
This is most greatly exemplified by "BPF_MAXINSNS: exec all MSH" which
takes 35 passes to converge, and shrinks the image by about 64K.
To decrease the cost of this number of convergance passes, do the
convergance pass before we have the program image allocated, just like
other JITs (such as x86) do.
Fixes:
e0cea7ce988c ("bpf: implement ld_abs/ld_ind in native bpf")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Tue, 27 Nov 2018 01:34:25 +0000 (17:34 -0800)]
Merge branch 'arm64-jit-fixes'
Daniel Borkmann says:
====================
This set contains a fix for arm64 BPF JIT. First patch generalizes
ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core
code and uses the same on arm64 in second patch. Tested on both arm64
and ppc64.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Mon, 26 Nov 2018 13:05:39 +0000 (14:05 +0100)]
bpf, arm64: fix getting subprog addr from aux for calls
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
to BPF call offset can be too far away from core kernel in that relative
encoding into imm is not sufficient and could potentially be truncated,
see also
fd045f6cd98e ("arm64: add support for module PLTs") which adds
spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
Therefore, use the recently added bpf_jit_get_func_addr() helper for
properly fetching the address through prog->aux->func[off]->bpf_func
instead. This also has the benefit to optimize normal helper calls since
their address can use the optimized emission. Tested on Cavium ThunderX
CN8890.
Fixes:
db496944fdaa ("bpf: arm64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>