Heiko Carstens [Wed, 5 Apr 2023 14:34:52 +0000 (16:34 +0200)]
proc/stat: remove arch_idle_time()
The last (only) architecture specific arch_idle_time() implementation was
removed with commit
be76ea614460 ("s390/idle: remove arch_cpu_idle_time()
and corresponding code").
Therefore remove the now dead code in fs/proc/stat.c as well.
Link: https://lkml.kernel.org/r/20230405143452.2677172-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthieu Baerts [Mon, 3 Apr 2023 16:23:50 +0000 (18:23 +0200)]
checkpatch: check for misuse of the link tags
"Link:" and "Closes:" tags have to be used with public URLs.
It is difficult to make sure the link is public but at least we can verify
the tag is followed by 'http(s)://'.
With that, we avoid such a tag that is not allowed [1]:
Closes: <number>
Now that we check the "link" tags are followed by a URL, we can relax the
check linked to "Reported-by being followed by a link tag" to only verify
if a "link" tag is present after the "Reported-by" one.
Link: https://lore.kernel.org/linux-doc/CAHk-=wh0v1EeDV3v8TzK81nDC40=XuTdY2MCr0xy3m3FiBV3+Q@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-5-d26d1fa66f9f@tessares.net
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthieu Baerts [Mon, 3 Apr 2023 16:23:49 +0000 (18:23 +0200)]
checkpatch: allow Closes tags with links
As a follow-up of a previous patch modifying the documentation to allow
using the "Closes:" tag, checkpatch.pl is updated accordingly.
checkpatch.pl now no longer complain when the "Closes:" tag is used by
itself:
commit
76f381bb77a0 ("checkpatch: warn when unknown tags are used for links")
... or after the "Reported-by:" tag:
commit
d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/373
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-4-d26d1fa66f9f@tessares.net
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthieu Baerts [Mon, 3 Apr 2023 16:23:48 +0000 (18:23 +0200)]
checkpatch: use a list of "link" tags
The following commit will allow the use of a similar "link" tag.
Because there is a possibility that other similar tags will be added in
the future and to reduce the number of places where the code will be
modified to allow this new tag, a list with all these "link" tags is now
used.
Two variables are created from it: one to search for such tags and one to
print all tags in a warning message.
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-3-d26d1fa66f9f@tessares.net
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthieu Baerts [Mon, 3 Apr 2023 16:23:47 +0000 (18:23 +0200)]
checkpatch: don't print the next line if not defined
When checking if "Reported-by" tag is followed by "Link:", there is no
need to print the next line if there is no next line.
While at it, also mention in this case that the "Link:" tag should be
followed by a URL, similar to the next warning.
By doing that, the code is now similar to what is done above when checking
if the Co-developed-by tag is properly used.
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-2-d26d1fa66f9f@tessares.net
Fixes:
d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthieu Baerts [Mon, 3 Apr 2023 16:23:46 +0000 (18:23 +0200)]
docs: process: allow Closes tags with links
Since v6.3, checkpatch.pl now complains about the use of "Closes:" tags
followed by a link [1]. It also complains if a "Reported-by:" tag is
followed by a "Closes:" one [2].
As detailed in the first patch, this "Closes:" tag is used for a bit of
time, mainly by DRM and MPTCP subsystems. It is used by some bug trackers
to automate the closure of issues when a patch is accepted. It is even
planned to use this tag with bugzilla.kernel.org [3].
The first patch updates the documentation to explain what is this
"Closes:" tag and how/when to use it. The second patch modifies
checkpatch.pl to stop complaining about it.
The DRM maintainers and their mailing list have been added in Cc as they
are probably interested by these two patches as well.
[1] https://lore.kernel.org/all/
3b036087d80b8c0e07a46a1dbaaf4ad0d018f8d5.
1674217480.git.linux@leemhuis.info/
[2] https://lore.kernel.org/all/
bb5dfd55ea2026303ab2296f4a6df3da7dd64006.
1674217480.git.linux@leemhuis.info/
[3] https://lore.kernel.org/linux-doc/
20230315181205.f3av7h6owqzzw64p@meerkat.local/
This patch (of 5):
Making sure a bug tracker is up to date is not an easy task. For example,
a first version of a patch fixing a tracked issue can be sent a long time
after having created the issue. But also, it can take some time to have
this patch accepted upstream in its final form. When it is done, someone
-- probably not the person who accepted the patch -- has to remember about
closing the corresponding issue.
This task of closing and tracking the patch can be done automatically by
bug trackers like GitLab [1], GitHub [2] and hopefully soon [3]
bugzilla.kernel.org when the appropriated tag is used. The two first ones
accept multiple tags but it is probably better to pick one.
According to commit
76f381bb77a0 ("checkpatch: warn when unknown tags are used for links"),
the "Closes" tag seems to have been used in the past by a few people and
it is supported by popular bug trackers. Here is how it has been used in
the past:
$ git log --no-merges --format=email -P --grep='^Closes: http' | \
grep '^Closes: http' | cut -d/ -f3-5 | sort | uniq -c | sort -rn
391 gitlab.freedesktop.org/drm/intel
79 github.com/multipath-tcp/mptcp_net-next
8 gitlab.freedesktop.org/drm/msm
3 gitlab.freedesktop.org/drm/amd
2 gitlab.freedesktop.org/mesa/mesa
1 patchwork.freedesktop.org/series/73320
1 gitlab.freedesktop.org/lima/linux
1 gitlab.freedesktop.org/drm/nouveau
1 github.com/ClangBuiltLinux/linux
1 bugzilla.netfilter.org/show_bug.cgi?id=1579
1 bugzilla.netfilter.org/show_bug.cgi?id=1543
1 bugzilla.netfilter.org/show_bug.cgi?id=1436
1 bugzilla.netfilter.org/show_bug.cgi?id=1427
1 bugs.debian.org/625804
Likely here, the "Closes" tag was only properly used with GitLab and
GitHub. We can also see that it has been used quite a few times (and
still used recently) and this is then not a "random tag that makes no
sense" like it was the case with "BugLink" recently [4]. It has also been
misused but that was a long time ago, when it was common to use many
different random tags.
checkpatch.pl script should then stop complaining about this "Closes" tag.
As suggested by Thorsten [5], if this tag is accepted, it should first be
described in the documentation. This is what is done here in this patch.
To avoid confusion, the "Closes" should be used with any public bug
report. No need to check if the underlying bug tracker supports
automations. Having this tag with any kind of public bug reports allows
bots like regzbot to clearly identify patches fixing a specific bug and
avoid false-positives, e.g. patches mentioning it is related to an issue
but not fixing it. As suggested by Thorsten [6] again, if we follow the
same logic, the "Closes" tag should then be used after a "Reported-by"
one.
Note that thanks to this "Closes" tag, the mentioned bug trackers can also
locate where a patch has been applied in different branches and
repositories. If only the "Link" tag is used, the tracking can also be
done but the ticket will not be closed and a manual operation will be
needed. Also, these bug trackers have some safeguards: the closure is
only done if a commit having the "Closes:" tag is applied in a specific
branch. It will then not be closed if a random commit having the same tag
is published elsewhere. Also in case of closure, a notification is sent
to the owners.
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-0-d26d1fa66f9f@tessares.net
Link: https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#default-closing-pattern
Link: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests
Link: https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/
Link: https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
Link: https://lore.kernel.org/all/688cd6cb-90ab-6834-a6f5-97080e39ca8e@leemhuis.info/
Link: https://lore.kernel.org/linux-doc/2194d19d-f195-1a1e-41fc-7827ae569351@leemhuis.info/
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/373
Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-1-d26d1fa66f9f@tessares.net
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Thorsten Leemhuis <linux@leemhuis.info>
Acked-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Liu [Tue, 21 Mar 2023 06:20:04 +0000 (14:20 +0800)]
scripts/gdb: fix lx-timerlist for HRTIMER_MAX_CLOCK_BASES printing
HRTIMER_MAX_CLOCK_BASES is of enum type hrtimer_base_type. To print it as
an integer, HRTIMER_MAX_CLOCK_BASES should be converted first.
Link: https://lkml.kernel.org/r/TYCP286MB214640FF0E7F04AC3926A39EC6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Liu [Tue, 21 Mar 2023 06:19:29 +0000 (14:19 +0800)]
scripts/gdb: fix lx-timerlist for Python3
Below incompatibilities between Python2 and Python3 made lx-timerlist fail
to run under Python3.
o xrange() is replaced by range() in Python3
o bytes and str are different types in Python3
o the return value of Inferior.read_memory() is memoryview object in
Python3
akpm: cc stable so that older kernels are properly debuggable under newer
Python.
Link: https://lkml.kernel.org/r/TYCP286MB2146EE1180A4D5176CBA8AB2C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Liu [Tue, 21 Mar 2023 06:18:43 +0000 (14:18 +0800)]
scripts/gdb: fix lx-timerlist for struct timequeue_head change
commit
511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next
timer") changed struct timerqueue_head, and so print_active_timers()
should be changed accordingly with its way to interpret the structure.
Link: https://lkml.kernel.org/r/TYCP286MB21463BD277330B26DDC18903C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Mon, 27 Mar 2023 14:26:04 +0000 (17:26 +0300)]
lib/test-string_helpers: replace UNESCAPE_ANY by UNESCAPE_ALL_MASK
When we get a random number to generate a flag in the valid range of
UNESCAPE flags, use UNESCAPE_ALL_MASK, It's more correct and prevents from
missed updates of the test coverage in the future if any.
Link: https://lkml.kernel.org/r/20230327142604.48213-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florian Fainelli [Thu, 23 Mar 2023 23:16:57 +0000 (16:16 -0700)]
scripts/gdb: bail early if there are no generic PD
Avoid generating an exception if there are no generic power domain(s)
registered:
(gdb) lx-genpd-summary
domain status children
/device runtime status
----------------------------------------------------------------------
Python Exception <class 'gdb.error'>: No symbol "gpd_list" in current context.
Error occurred in Python: No symbol "gpd_list" in current context.
(gdb) quit
[f.fainelli@gmail.com: correctly invoke gdb_eval_or_none]
Link: https://lkml.kernel.org/r/20230327185746.3856407-1-f.fainelli@gmail.com
Link: https://lkml.kernel.org/r/20230323231659.3319941-1-f.fainelli@gmail.com
Fixes:
8207d4a88e1e ("scripts/gdb: add lx-genpd-summary command")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florian Fainelli [Thu, 23 Mar 2023 22:52:45 +0000 (15:52 -0700)]
scripts/gdb: bail early if there are no clocks
Avoid generating an exception if there are no clocks registered:
(gdb) lx-clk-summary
enable prepare protect
clock count count count rate
------------------------------------------------------------------------
Python Exception <class 'gdb.error'>: No symbol "clk_root_list" in
current context.
Error occurred in Python: No symbol "clk_root_list" in current context.
Link: https://lkml.kernel.org/r/20230323225246.3302977-1-f.fainelli@gmail.com
Fixes:
d1e9710b63d8 ("scripts/gdb: initial clk support: lx-clk-summary")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bjorn Helgaas [Tue, 7 Mar 2023 22:44:16 +0000 (16:44 -0600)]
kexec: remove unnecessary arch_kexec_kernel_image_load()
arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and
there are no arch-specific implementations.
Remove the unnecessary arch_kexec_kernel_image_load() and make
kexec_image_load_default() static.
No functional change intended.
Link: https://lkml.kernel.org/r/20230307224416.907040-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bjorn Helgaas [Tue, 7 Mar 2023 22:44:15 +0000 (16:44 -0600)]
x86/kexec: remove unnecessary arch_kexec_kernel_image_load()
Patch series "kexec: Remove unnecessary arch hook", v2.
There are no arch-specific things in arch_kexec_kernel_image_load(), so
remove it and just use the generic version.
This patch (of 2):
The x86 implementation of arch_kexec_kernel_image_load() is functionally
identical to the generic arch_kexec_kernel_image_load():
arch_kexec_kernel_image_load # x86
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
return image->fops->load(image, image->kernel_buf, ...)
arch_kexec_kernel_image_load # generic
kexec_image_load_default
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
return image->fops->load(image, image->kernel_buf, ...)
Remove the x86-specific version and use the generic
arch_kexec_kernel_image_load(). No functional change intended.
Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org
Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cai Huoqing [Thu, 23 Mar 2023 11:37:09 +0000 (19:37 +0800)]
rapidio/tsi721: remove redundant pci_clear_master
Remove pci_clear_master to simplify the code, the bus-mastering is also
cleared in do_pci_disable_device, like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Link: https://lkml.kernel.org/r/20230323113711.10523-1-cai.huoqing@linux.dev
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Thu, 23 Mar 2023 15:50:29 +0000 (17:50 +0200)]
kernel.h: split the hexadecimal related helpers to hex.h
For the sake of cleaning up the kernel.h split the hexadecimal related
helpers to own header called 'hex.h'.
Link: https://lkml.kernel.org/r/20230323155029.40000-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paolo Abeni [Wed, 22 Mar 2023 16:57:02 +0000 (17:57 +0100)]
epoll: use refcount to reduce ep_mutex contention
We are observing huge contention on the epmutex during an http
connection/rate test:
83.17% 0.25% nginx [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
[...]
|--66.96%--__fput
|--60.04%--eventpoll_release_file
|--58.41%--__mutex_lock.isra.6
|--56.56%--osq_lock
The application is multi-threaded, creates a new epoll entry for
each incoming connection, and does not delete it before the
connection shutdown - that is, before the connection's fd close().
Many different threads compete frequently for the epmutex lock,
affecting the overall performance.
To reduce the contention this patch introduces explicit reference counting
for the eventpoll struct. Each registered event acquires a reference,
and references are released at ep_remove() time.
The eventpoll struct is released by whoever - among EP file close() and
and the monitored file close() drops its last reference.
Additionally, this introduces a new 'dying' flag to prevent races between
the EP file close() and the monitored file close().
ep_eventpoll_release() marks, under f_lock spinlock, each epitem as dying
before removing it, while EP file close() does not touch dying epitems.
The above is needed as both close operations could run concurrently and
drop the EP reference acquired via the epitem entry. Without the above
flag, the monitored file close() could reach the EP struct via the epitem
list while the epitem is still listed and then try to put it after its
disposal.
An alternative could be avoiding touching the references acquired via
the epitems at EP file close() time, but that could leave the EP struct
alive for potentially unlimited time after EP file close(), with nasty
side effects.
With all the above in place, we can drop the epmutex usage at disposal time.
Overall this produces a significant performance improvement in the
mentioned connection/rate scenario: the mutex operations disappear from
the topmost offenders in the perf report, and the measured connections/rate
grows by ~60%.
To make the change more readable this additionally renames ep_free() to
ep_clear_and_put(), and moves the actual memory cleanup in a separate
ep_free() helper.
Link: https://lkml.kernel.org/r/4a57788dcaf28f5eb4f8dfddcc3a8b172a7357bb.1679504153.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Xiumei Mu <xmu@redhiat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guilherme G. Piccoli [Tue, 14 Mar 2023 20:00:58 +0000 (17:00 -0300)]
notifiers: add tracepoints to the notifiers infrastructure
Currently there is no way to show the callback names for registered,
unregistered or executed notifiers. This is very useful for debug
purposes, hence add this functionality here in the form of notifiers'
tracepoints, one per operation.
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20230314200058.1326909-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Guilherme G. Piccoli <kernel@gpiccoli.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tom Rix [Sun, 12 Mar 2023 16:46:45 +0000 (12:46 -0400)]
kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static
smatch reports several warnings
kernel/hung_task.c:31:19: warning:
symbol 'sysctl_hung_task_check_count' was not declared. Should it be static?
kernel/hung_task.c:50:29: warning:
symbol 'sysctl_hung_task_check_interval_secs' was not declared. Should it be static?
kernel/hung_task.c:52:19: warning:
symbol 'sysctl_hung_task_warnings' was not declared. Should it be static?
kernel/hung_task.c:75:28: warning:
symbol 'sysctl_hung_task_panic' was not declared. Should it be static?
These variables are only used in hung_task.c, so they should be static
Link: https://lkml.kernel.org/r/20230312164645.471259-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Cc: Ben Dooks <ben.dooks@sifive.com>
Cc: fuyuanli <fuyuanli@didiglobal.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lukas Bulwahn [Wed, 8 Mar 2023 15:06:25 +0000 (16:06 +0100)]
MAINTAINERS: remove the obsolete section EMBEDDED LINUX
By now, many developers are working on Linux for embedded systems. There
is no need to point out single developers. The linux-embedded mailing
list has only little traffic, and most of it is just spam.
Remove this obsolete section.
Link: https://lkml.kernel.org/r/20230308150625.28732-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Cc: Olivia Mackall <olivia@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gerhard Engleder [Wed, 4 Jan 2023 20:15:24 +0000 (21:15 +0100)]
checkpatch: ignore ETHTOOL_LINK_MODE_ enum values
Since commit
4104a20646 ("checkpatch: ignore generated CamelCase defines
and enum values") enum values like ETHTOOL_LINK_MODE_Asym_Pause_BIT are
ignored. But there are other enums like
ETHTOOL_LINK_MODE_1000baseT_Full_BIT, which are not ignored because of the
not matching '1000baseT' substring.
Add regex to match all ETHTOOL_LINK_MODE enums.
Link: https://lkml.kernel.org/r/20230104201524.28078-1-gerhard@engleder-embedded.com
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Gerhard Engleder <gerhard@engleder-embedded.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Mon, 6 Mar 2023 21:32:53 +0000 (13:32 -0800)]
scripts/link-vmlinux.sh: fix error message presentation
This comes out as
Try make KALLSYMS_EXTRA_PASS=1 as a workaround
but we want quotes:
Try "make KALLSYMS_EXTRA_PASS=1" as a workaround
Link: https://lkml.kernel.org/r/202303042034.Cjc7JTd0-lkp@intel.com
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Tue, 28 Feb 2023 12:14:17 +0000 (15:14 +0300)]
ELF: fix all "Elf" typos
ELF is acronym and therefore should be spelled in all caps.
I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like
being written in the first person.
Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Tue, 28 Feb 2023 15:34:38 +0000 (18:34 +0300)]
mm: uninline kstrdup()
gcc inlines kstrdup into kstrdup_const() but it can very efficiently tail
call into it instead:
$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-84 (-84)
Function old new delta
kstrdup_const 119 35 -84
Link: https://lkml.kernel.org/r/Y/4fDlbIhTLNLFHz@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Glenn Washburn [Mon, 27 Feb 2023 04:06:00 +0000 (22:06 -0600)]
scripts/gdb: support getting current task struct in UML
A running x86 UML kernel reports with architecture "i386:x86-64" as it is
a sub-architecture. However, a difference with bare-metal x86 kernels is
in how it manages tasks and the current task struct. To identify that the
inferior is a UML kernel and not bare-metal, check for the existence of
the UML specific symbol "cpu_tasks" which contains the current task
struct.
Link: https://lkml.kernel.org/r/b839d611e2906ccef2725c34d8e353fab35fe75e.1677469905.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Glenn Washburn [Mon, 27 Feb 2023 04:05:59 +0000 (22:05 -0600)]
scripts/gdb: correct indentation in get_current_task
Patch series "scripts/gdb: Support getting current task struct in UML",
v3.
A running x86 UML kernel reports with architecture "i386:x86-64" as it is
a sub-architecture. However, a difference with bare-metal x86 kernels is
in how it manages tasks and the current task struct. To identify that the
inferior is a UML kernel and not bare-metal, check for the existence of
the UML specific symbol "cpu_tasks" which contains the current task
struct.
This patch (of 3):
There is an extra space in a couple blocks in get_current_task. Though
python does not care, let's make the spacing consistent. Also, format
better an if expression, removing unneeded parenthesis.
Link: https://lkml.kernel.org/r/cover.1677469905.git.development@efficientek.com
Link: https://lkml.kernel.org/r/2e117b82240de6893f27cb6507242ce455ed7b5b.1677469905.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Mon, 27 Feb 2023 10:06:12 +0000 (13:06 +0300)]
dca: delete unnecessary variable
It's more readable to just pass NULL directly instead of using a variable
for that.
Link: https://lkml.kernel.org/r/Y/yAlDytLH0ZNLNz@kili
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Mon, 27 Feb 2023 17:17:03 +0000 (18:17 +0100)]
kcov: improve documentation
Improve KCOV documentation:
- Use KCOV instead of kcov, as the former is more widely-used.
- Mention Clang in compiler requirements.
- Use ``annotations`` for inline code.
- Rework remote coverage collection documentation for better clarity.
- Various smaller changes.
[andreyknvl@google.com: v2]
Link: https://lkml.kernel.org/r/583f41c49eef15210fa813e8229730d11427efa7.1677614637.git.andreyknvl@google.com
[andreyknvl@google.com: fix ``annotation`` for KCOV_REMOTE_ENABLE]
Link: https://lkml.kernel.org/r/72be5c215c275f35891229b90622ed859f196a46.1677684837.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/0b5efd70e31bba7912cf9a6c951f0e76a8df27df.1677517724.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uros Bizjak [Wed, 30 Nov 2022 15:12:31 +0000 (16:12 +0100)]
nfs: remove empty if statement from nfs3_prepare_get_acl
Remove empty if statement from nfs3_prepare_get_acl and update comment to
follow the one from the referred fs/posix_acl.c:get_acl().
No functional change intended.
Link: https://lkml.kernel.org/r/20221130151231.3654-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chao Yu [Tue, 31 Jan 2023 15:08:40 +0000 (23:08 +0800)]
proc: remove mark_inode_dirty() in .setattr()
procfs' .setattr() has updated i_uid, i_gid and i_mode into proc dirent,
we don't need to call mark_inode_dirty() for delayed update, remove it.
Link: https://lkml.kernel.org/r/20230131150840.34726-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Thu, 23 Feb 2023 03:43:09 +0000 (19:43 -0800)]
ia64: salinfo: placate defined-but-not-used warning
When CONFIG_PROC_FS is not set, proc_salinfo_show() is not used. Mark the
function as __maybe_unused to quieten the warning message.
../arch/ia64/kernel/salinfo.c:584:12: warning: 'proc_salinfo_show' defined but not used [-Wunused-function]
584 | static int proc_salinfo_show(struct seq_file *m, void *v)
| ^~~~~~~~~~~~~~~~~
Link: https://lkml.kernel.org/r/20230223034309.13375-1-rdunlap@infradead.org
Fixes:
3f3942aca6da ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Thu, 23 Feb 2023 03:42:58 +0000 (19:42 -0800)]
ia64: mm/contig: fix section mismatch warning/error
alloc_per_cpu_data() is called by find_memory(), which is marked as
__init. Therefore alloc_per_cpu_data() can also be marked as __init to
remedy this modpost problem.
WARNING: modpost: vmlinux.o: section mismatch in reference: alloc_per_cpu_data (section: .text) -> memblock_alloc_try_nid (section: .init.text)
Link: https://lkml.kernel.org/r/20230223034258.12917-1-rdunlap@infradead.org
Fixes:
4b9ddc7cf272 ("[IA64] Fix section mismatch in contig.c version of per_cpu_init()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wang Yong [Mon, 13 Feb 2023 06:08:08 +0000 (14:08 +0800)]
delayacct: improve the average delay precision of getdelay tool to microsecond
Improve the average delay precision of getdelay tool to microsecond. When
using the getdelay tool, it is sometimes found that the average delay
except CPU is not 0, but display is 0, because the precison is too low.
For example, see delay average of SWAP below when using ZRAM.
print delayacct stats ON
PID 32915
CPU count real total virtual total delay total delay average
339202
2793871936 9233585504 7951112 0.000ms
IO count delay total delay average
41
419296904 10ms
SWAP count delay total delay average
242589
1045792384 0ms
This wrong display is misleading, so improve the millisecond precision of
the average delay to microsecond just like CPU. Then user would get more
accurate information of delay time.
Link: https://lkml.kernel.org/r/202302131408087983857@zte.com.cn
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Fri, 7 Apr 2023 20:53:16 +0000 (13:53 -0700)]
Merge tag 'gpio-fixes-for-v6.3-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix irq handling in gpio-davinci
- fix Kconfig dependencies for gpio-regmap
* tag 'gpio-fixes-for-v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: davinci: Add irq chip flag to skip set wake
gpio: davinci: Do not clear the bank intr enable bit in save_context
gpio: GPIO_REGMAP: select REGMAP instead of depending on it
Linus Torvalds [Fri, 7 Apr 2023 20:32:54 +0000 (13:32 -0700)]
Merge tag 'acpi-6.3-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix the ACPI backlight override mechanism for the cases when
acpi_backlight=video is set through the kernel command line or a DMI
quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and
Lenovo ThinkPad W530 (Hans de Goede)"
* tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
ACPI: video: Make acpi_backlight=video work independent from GPU driver
ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
Linus Torvalds [Fri, 7 Apr 2023 20:27:02 +0000 (13:27 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix uninitialised variable warning (from smatch) in the arm64 compat
alignment fixup code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Work around uninitialized variable warning
Linus Torvalds [Fri, 7 Apr 2023 20:10:23 +0000 (13:10 -0700)]
Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
"Four fixes, three for stable:
- slab out of bounds fix
- lock cancellation fix
- minor cleanup to address clang warning
- fix for xfstest 551 (wrong parms passed to kvmalloc)"
* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
ksmbd: delete asynchronous work from list
ksmbd: remove unused is_char_allowed function
ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
Linus Torvalds [Thu, 6 Apr 2023 18:39:07 +0000 (11:39 -0700)]
Merge tag 'net-6.3-rc6-2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
Linus Torvalds [Thu, 6 Apr 2023 18:34:18 +0000 (11:34 -0700)]
Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"One single fix to mount_setattr_test build failure"
* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests mount: Fix mount_setattr_test builds failed
Linus Torvalds [Thu, 6 Apr 2023 18:27:21 +0000 (11:27 -0700)]
Merge tag 'for-linus-iommufd' of git://git./linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- An invalid VA range can be be put in a pages and eventually trigger
WARN_ON, reject it early
- Use of the wrong start index value when doing the complex batch carry
scheme
- Wrong store ordering resulting in corrupting data used in a later
calculation that corrupted the batch structure during carry
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd: Fix unpinning of pages when an access is present
iommufd: Check for uptr overflow
Linus Torvalds [Thu, 6 Apr 2023 18:08:03 +0000 (11:08 -0700)]
Merge tag 'pwm/for-6.3-rc6' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fixes from Thierry Reding:
"These are some fixes to make sure the PWM state structure is always
initialized to a known state.
Prior to this it could happen in some situations that random data from
the stack would leak into the data structure and cause subtle bugs"
* tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Zero-initialize the pwm_state passed to driver's .get_state()
pwm: meson: Explicitly set .polarity in .get_state()
pwm: sprd: Explicitly set .polarity in .get_state()
pwm: iqs620a: Explicitly set .polarity in .get_state()
pwm: cros-ec: Explicitly set .polarity in .get_state()
pwm: hibvt: Explicitly set .polarity in .get_state()
Linus Torvalds [Thu, 6 Apr 2023 17:25:27 +0000 (10:25 -0700)]
Merge tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
accounting, gt reset vs huc loading.
And a few individual driver fixes: ivpu dma fence&suspend, panfrost
mmap, nouveau color depth"
* tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm:
accel/ivpu: Fix S3 system suspend when not idle
accel/ivpu: Add dma fence to command buffers only
drm/i915: Fix context runtime accounting
drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
drm/i915: Use compressed bpp when calculating m/n value for DP MST DSC
drm/i915/huc: Cancel HuC delayed load timer on reset.
drm/i915/ttm: fix sparse warning
drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
drm/nouveau/disp: Support more modes by checking with lower bpc
Linus Torvalds [Thu, 6 Apr 2023 17:19:30 +0000 (10:19 -0700)]
Merge tag 'sound-6.3-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The majority of changes here are various fixes for Intel drivers,
and there is a change in ASoC PCM core for the format constraints.
In addition, a workaround for HD-audio HDMI regressions and usual
HD-audio quirks are found"
* tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
ALSA: hda/realtek: Add quirk for Clevo X370SNW
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
ASoC: SOF: avoid a NULL dereference with unsupported widgets
ASoC: da7213.c: add missing pm_runtime_disable()
ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
ASoC: codecs: lpass: fix the order or clks turn off during suspend
ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx (8A22)
ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
Linus Torvalds [Thu, 6 Apr 2023 17:13:23 +0000 (10:13 -0700)]
Merge tag 'platform-drivers-x86-v6.3-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- more think-lmi fixes
- one DMI quirk addition
* tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
platform/x86: think-lmi: Clean up display of current_value on Thinkstation
platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
platform/x86: think-lmi: Fix memory leak when showing current settings
Linus Torvalds [Thu, 6 Apr 2023 16:51:04 +0000 (09:51 -0700)]
Merge tag 'asm-generic-fixes-6.3' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are minor fixes to address false-positive build warnings:
Some of the less common I/O accessors are missing __force casts and
cause sparse warnings for their implied byteswap, and a recent change
to __generic_cmpxchg_local() causes a warning about constant integer
truncation"
* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: avoid __generic_cmpxchg_local warnings
asm-generic/io.h: suppress endianness warnings for relaxed accessors
asm-generic/io.h: suppress endianness warnings for readq() and writeq()
Michael Sit Wei Hong [Thu, 6 Apr 2023 02:45:41 +0000 (10:45 +0800)]
net: stmmac: check fwnode for phy device before scanning for phy
Some DT devices already have phy device configured in the DT/ACPI.
Current implementation scans for a phy unconditionally even though
there is a phy listed in the DT/ACPI and already attached.
We should check the fwnode if there is any phy device listed in
fwnode and decide whether to scan for a phy to attach to.
Fixes:
fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY")
Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shahab Vahedi <shahab@synopsys.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Song Yoong Siang [Tue, 4 Apr 2023 04:48:23 +0000 (12:48 +0800)]
net: stmmac: Add queue reset into stmmac_xdp_open() function
Queue reset was moved out from __init_dma_rx_desc_rings() and
__init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit
and receive packet after XDP prog setup.
This commit adds the missing queue reset into stmmac_xdp_open() function.
Fixes:
f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions")
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 4 Apr 2023 07:24:11 +0000 (15:24 +0800)]
selftests: net: rps_default_mask.sh: delete veth link specifically
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.
Before this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
RTNETLINK answers: File exists
changing rps_default_mask in child ns don't affect the main one[ ok ]
cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
[fail] expected 1 found
changing rps_default_mask in child ns don't affect existing devices[ ok ]
After this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
changing rps_default_mask in child ns don't affect the main one[ ok ]
changing rps_default_mask in child ns affects new childns devices[ ok ]
changing rps_default_mask in child ns don't affect existing devices[ ok ]
Fixes:
3a7d84eae03b ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Greg Ungerer [Tue, 4 Apr 2023 05:22:07 +0000 (15:22 +1000)]
net: fec: make use of MDIO C45 quirk
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).
Commit
8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.
Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).
(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.
Fixes:
8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 6 Apr 2023 00:24:26 +0000 (17:24 -0700)]
Merge tag 'wireless-2023-04-05' of git://git./linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.3
mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.
* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
wifi: mt76: mt7921: fix fw used for offload check for mt7922
wifi: mt76: mt7921: Fix use-after-free in fw features query.
wifi: brcmfmac: Fix SDIO suspend/resume regression
====================
Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 6 Apr 2023 00:22:06 +0000 (17:22 -0700)]
Merge tag 'linux-can-fixes-for-6.3-
20230405' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-04-05
The first patch is by Oleksij Rempel and fixes a out-of-bounds memory
access in the j1939 protocol.
The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp
fixes the ISOTP protocol to pass information about dropped PDUs to the
user space via control messages. Michal Sojka's patch fixes poll() to
not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race
condition between isotp_sendsmg() and isotp_release().
* tag 'linux-can-fixes-for-6.3-
20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
====================
Link: https://lore.kernel.org/r/20230405092444.1802340-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 6 Apr 2023 00:10:32 +0000 (17:10 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-04-04 (ice)
This series contains updates to ice driver only.
Simei adjusts error path on adding VF Flow Director filters that were
not releasing all resources.
Lingyu adds setting/resetting of VF Flow Director filters counters
during initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
====================
Link: https://lore.kernel.org/r/20230404172306.450880-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Vetter [Wed, 5 Apr 2023 19:06:27 +0000 (21:06 +0200)]
Merge tag 'drm-misc-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
* ivpu: DMA fence and suspend fixes
* nouveau: Color-depth fixes
* panfrost: Fix mmap error handling
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230405182855.GA1551@linux-uq9g
Hans de Goede [Tue, 4 Apr 2023 11:02:49 +0000 (13:02 +0200)]
ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.
This is caused by commit
3dbc80a3e4c5 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").
After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.
I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.
I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...
Fixes:
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hans de Goede [Tue, 4 Apr 2023 11:02:48 +0000 (13:02 +0200)]
ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().
(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)
Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.
Fixes:
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hans de Goede [Tue, 4 Apr 2023 11:02:47 +0000 (13:02 +0200)]
ACPI: video: Make acpi_backlight=video work independent from GPU driver
Commit
3dbc80a3e4c5 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")
Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.
This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.
This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].
Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).
[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.
Fixes:
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hans de Goede [Tue, 4 Apr 2023 11:02:46 +0000 (13:02 +0200)]
ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes:
5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ard Biesheuvel [Tue, 4 Apr 2023 10:36:25 +0000 (12:36 +0200)]
arm64: compat: Work around uninitialized variable warning
Dan reports that smatch complains about a potential uninitialized
variable being used in the compat alignment fixup code.
The logic is not wrong per se, but we do end up using an uninitialized
variable if reading the instruction that triggered the alignment fault
from user space faults, even if the fault ensures that the uninitialized
value doesn't propagate any further.
Given that we just give up and return 1 if any fault occurs when reading
the instruction, let's get rid of the 'success handling' pattern that
captures the fault in a variable and aborts later, and instead, just
return 1 immediately if any of the get_user() calls result in an
exception.
Fixes:
3fc24ef32d3b ("arm64: compat: Implement misalignment fixups for multiword loads")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202304021214.gekJ8yRc-lkp@intel.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230404103625.2386382-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Wed, 5 Apr 2023 16:11:08 +0000 (09:11 -0700)]
Merge tag 'trace-v6.3-rc5' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix timerlat notification, as it was not triggering the notify to
users when a new max latency was hit.
- Do not trigger max latency if the tracing is off.
When tracing is off, the ring buffer is not updated, it does not make
sense to notify when there's a new max latency detected by the
tracer, as why that latency happened is not available. The tracing
logic still runs when the ring buffer is disabled, but it should not
be triggering notifications.
- Fix race on freeing the synthetic event "last_cmd" variable by adding
a mutex around it.
- Fix race between reader and writer of the ring buffer by adding
memory barriers. When the writer is still on the reader page it must
have its content visible on the buffer before it moves the commit
index that the reader uses to know how much content is on the page.
- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
_RET_IP_, which gets broken if it is not inlined.
- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
The field formats of trace events are calculated by using
sizeof(type) and other means by what is passed into the structure
macros like __field(). The __field() macro is only meant for atom
types like int, long, short, pointer, etc. It is not meant for
arrays.
The code will currently compile with arrays, but then the format
produced will be inaccurate, and user space parsing tools will break.
Two bugs have already been fixed, now add code that will make the
kernel fail to build if another trace event includes this buggy field
format.
- Fix boot up snapshot code:
Boot snapshots were triggering when not even asked for on the kernel
command line. This was caused by two bugs:
1) It would trigger a snapshot on any instance if one was created
from the kernel command line.
2) The error handling would only affect the top level instance.
So the fact that a snapshot was done on a instance that didn't
allocate a buffer triggered a warning written into the top level
buffer, and worse yet, disabled the top level buffer.
- Fix memory leak that was caused when an error was logged in a trace
buffer instance, and then the buffer instance was removed.
The allocated error log messages still needed to be freed.
* tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Free error logs of tracing instances
tracing: Fix ftrace_boot_snapshot command line logic
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
tracing: Error if a trace event has an array for a __field()
tracing/osnoise: Fix notify new tracing_max_latency
tracing/timerlat: Notify new max thread latency
ftrace: Mark get_lock_parent_ip() __always_inline
ring-buffer: Fix race while reader and writer are on the same page
tracing/synthetic: Fix races on freeing last_cmd
Steven Rostedt (Google) [Tue, 4 Apr 2023 23:45:04 +0000 (19:45 -0400)]
tracing: Free error logs of tracing instances
When a tracing instance is removed, the error messages that hold errors
that occurred in the instance needs to be freed. The following reports a
memory leak:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger
# cat instances/foo/error_log
[ 117.404795] hist:sched:sched_switch: error: Couldn't find field
Command: hist:keys=x
^
# rmdir instances/foo
Then check for memory leaks:
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810d8ec700 (size 192):
comm "bash", pid 869, jiffies
4294950577 (age 215.752s)
hex dump (first 32 bytes):
60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff `.ha....`.ha....
a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00 .0......&.......
backtrace:
[<
00000000dae26536>] kmalloc_trace+0x2a/0xa0
[<
00000000b2938940>] tracing_log_err+0x277/0x2e0
[<
000000004a0e1b07>] parse_atom+0x966/0xb40
[<
0000000023b24337>] parse_expr+0x5f3/0xdb0
[<
00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
[<
00000000293a9645>] trigger_process_regex+0x135/0x1a0
[<
000000005c22b4f2>] event_trigger_write+0x87/0xf0
[<
000000002cadc509>] vfs_write+0x162/0x670
[<
0000000059c3b9be>] ksys_write+0xca/0x170
[<
00000000f1cddc00>] do_syscall_64+0x3e/0xc0
[<
00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff888170c35a00 (size 32):
comm "bash", pid 869, jiffies
4294950577 (age 215.752s)
hex dump (first 32 bytes):
0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74 . Command: hist
3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00 :keys=x.........
backtrace:
[<
000000006a747de5>] __kmalloc+0x4d/0x160
[<
000000000039df5f>] tracing_log_err+0x29b/0x2e0
[<
000000004a0e1b07>] parse_atom+0x966/0xb40
[<
0000000023b24337>] parse_expr+0x5f3/0xdb0
[<
00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
[<
00000000293a9645>] trigger_process_regex+0x135/0x1a0
[<
000000005c22b4f2>] event_trigger_write+0x87/0xf0
[<
000000002cadc509>] vfs_write+0x162/0x670
[<
0000000059c3b9be>] ksys_write+0xca/0x170
[<
00000000f1cddc00>] do_syscall_64+0x3e/0xc0
[<
00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
The problem is that the error log needs to be freed when the instance is
removed.
Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.hr/
Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Fixes:
2f754e771b1a6 ("tracing: Have the error logs show up in the proper instances")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Oliver Hartkopp [Fri, 31 Mar 2023 13:19:35 +0000 (15:19 +0200)]
can: isotp: fix race between isotp_sendsmg() and isotp_release()
As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
function in isotp.c might get into a race condition when restoring the
former tx.state from the old_state.
Remove the old_state concept and implement proper locking for the
ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
simplification idea from Hillf Danton.
Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
mechanism from isotp_release() which resolves a potential race between
isotp_sendsmg() and isotp_release().
[1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet
v1: https://lore.kernel.org/all/
20230331102114.15164-1-socketcan@hartkopp.net
v2: https://lore.kernel.org/all/
20230331123600.3550-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_release()
v3: https://lore.kernel.org/all/
20230331130654.9886-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in the wait_tx_done case
v4: https://lore.kernel.org/all/
20230331131935.21465-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in ALL cases
Cc: Dae R. Jeong <threeearcat@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes:
4f027cba8216 ("can: isotp: split tx timer into transmission and timeout")
Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Daniel Vetter [Wed, 5 Apr 2023 09:14:18 +0000 (11:14 +0200)]
Merge tag 'drm-intel-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.3-rc6:
- Fix DP MST DSC M/N calculation to use compressed bpp
- Fix racy use-after-free in perf ioctl
- Fix context runtime accounting
- Fix handling of GT reset during HuC loading
- Fix use of unsigned vm_fault_t for error values
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zg7mzomz.fsf@intel.com
Michal Sojka [Fri, 31 Mar 2023 12:55:11 +0000 (14:55 +0200)]
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
When using select()/poll()/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, a false EPOLLOUT event
is sometimes returned. This can happen at least after sending a
message which must be split to multiple CAN frames.
The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.
This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.
v2: https://lore.kernel.org/all/
20230302092812.320643-1-michal.sojka@cvut.cz
v1: https://lore.kernel.org/all/
20230224010659.48420-1-michal.sojka@cvut.cz
https://lore.kernel.org/all/
b53a04a2-ba1f-3858-84c1-
d3eb3301ae15@hartkopp.net
Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes:
e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Thu, 30 Mar 2023 17:02:48 +0000 (19:02 +0200)]
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.
Fixes:
e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oleksij Rempel [Tue, 4 Apr 2023 07:31:28 +0000 (09:31 +0200)]
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
Fixes:
9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jacek Lawrynowicz [Fri, 31 Mar 2023 11:36:03 +0000 (13:36 +0200)]
accel/ivpu: Fix S3 system suspend when not idle
Wait for VPU to be idle in ivpu_pm_suspend_cb() before powering off
the device, so jobs are not lost and TDRs are not triggered after
resume.
Fixes:
852be13f3bd3 ("accel/ivpu: Add PM support")
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-3-stanislaw.gruszka@linux.intel.com
Karol Wachowski [Fri, 31 Mar 2023 11:36:02 +0000 (13:36 +0200)]
accel/ivpu: Add dma fence to command buffers only
Currently job->done_fence is added to every BO handle within a job. If job
handle (command buffer) is shared between multiple submits, KMD will add
the fence in each of them. Then bo_wait_ioctl() executed on command buffer
will exit only when all jobs containing that handle are done.
This creates deadlock scenario for user mode driver in case when job handle
is added as dependency of another job, because bo_wait_ioctl() of first job
will wait until second job finishes, and second job can not finish before
first one.
Having fences added only to job buffer handle allows user space to execute
bo_wait_ioctl() on the job even if it's handle is submitted with other job.
Fixes:
cd7272215c44 ("accel/ivpu: Add command buffer submission logic")
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-2-stanislaw.gruszka@linux.intel.com
Steven Rostedt (Google) [Wed, 5 Apr 2023 02:21:15 +0000 (22:21 -0400)]
tracing: Fix ftrace_boot_snapshot command line logic
The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...
The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.
When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.
Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.
Link: https://lkml.kernel.org/r/20230405022341.895334039@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes:
9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Wed, 5 Apr 2023 02:21:14 +0000 (22:21 -0400)]
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes:
2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Shailend Chand [Mon, 3 Apr 2023 17:28:09 +0000 (10:28 -0700)]
gve: Secure enough bytes in the first TX desc for all TCP pkts
Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.
Fixes:
893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 3 Apr 2023 21:46:43 +0000 (21:46 +0000)]
netlink: annotate lockless accesses to nlk->max_recvmsg_len
syzbot reported a data-race in data-race in netlink_recvmsg() [1]
Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.
[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0x0000000000001000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Fixes:
9063e21fb026 ("netlink: autosize skb lengthes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andy Roulin [Mon, 3 Apr 2023 21:20:53 +0000 (14:20 -0700)]
ethtool: reset #lanes when lanes is omitted
If the number of lanes was forced and then subsequently the user
omits this parameter, the ksettings->lanes is reset. The driver
should then reset the number of lanes to the device's default
for the specified speed.
However, although the ksettings->lanes is set to 0, the mod variable
is not set to true to indicate the driver and userspace should be
notified of the changes.
The consequence is that the same ethtool operation will produce
different results based on the initial state.
If the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: on
then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
While if the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
This patch fixes this behavior. Omitting lanes will always results in
the driver choosing the default lane width for the chosen speed. In this
scenario, regardless of the initial state, the end state will be, e.g.,
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
Fixes:
012ce4dd3102 ("ethtool: Extend link modes settings uAPI with lanes")
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 5 Apr 2023 01:56:59 +0000 (18:56 -0700)]
Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp'
Kuniyuki Iwashima says:
====================
raw/ping: Fix locking in /proc/net/{raw,icmp}.
The first patch fixes a NULL deref for /proc/net/raw and second one fixes
the same issue for ping sockets.
The first patch also converts hlist_nulls to hlist, but this is because
the current code uses sk_nulls_for_each() for lockless readers, instead
of sk_nulls_for_each_rcu() which adds memory barrier, but raw sockets
does not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.
OTOH, the ping sockets already uses sk_nulls_for_each_rcu(), and such
conversion can be posted later for net-next.
====================
Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 3 Apr 2023 19:49:59 +0000 (12:49 -0700)]
ping: Fix potentail NULL deref for /proc/net/icmp.
After commit
dbca1596bbb0 ("ping: convert to RCU lookups, get rid
of rwlock"), we use RCU for ping sockets, but we should use spinlock
for /proc/net/icmp to avoid a potential NULL deref mentioned in
the previous patch.
Let's go back to using spinlock there.
Note we can convert ping sockets to use hlist instead of hlist_nulls
because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
Fixes:
dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 3 Apr 2023 19:49:58 +0000 (12:49 -0700)]
raw: Fix NULL deref in raw_get_next().
Dae R. Jeong reported a NULL deref in raw_get_next() [0].
It seems that the repro was running these sequences in parallel so
that one thread was iterating on a socket that was being freed in
another netns.
unshare(0x40060200)
r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
socket$inet_icmp_raw(0x2, 0x3, 0x1)
pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
After commit
0daf07e52709 ("raw: convert raw sockets to RCU"), we
use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
sockets. However, we should use spinlock for slow paths to avoid
the NULL deref.
Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
is not reused during iteration in the grace period. In fact, the
lockless readers do not check the nulls marker with get_nulls_value().
So, SOCK_RAW should use hlist instead of hlist_nulls.
Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
fast paths and sk_for_each() and spinlock for /proc/net/raw.
[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:
ffffc9001154f9b0 EFLAGS:
00010206
RAX:
0000000000000005 RBX:
1ffff1100302c8fd RCX:
0000000000000000
RDX:
0000000000000028 RSI:
ffffc9001154f988 RDI:
ffffc9000f77a338
RBP:
0000000000000029 R08:
ffffffff8a50ffb4 R09:
fffffbfff24b6bd9
R10:
fffffbfff24b6bd9 R11:
0000000000000000 R12:
ffff88801db73b78
R13:
fffffffffffffff9 R14:
dffffc0000000000 R15:
0000000000000030
FS:
00007f843ae8e700(0000) GS:
ffff888063700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055bb9614b35f CR3:
000000003c672000 CR4:
00000000003506e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
seq_read+0x224/0x320 fs/seq_file.c:162
pde_read fs/proc/inode.c:316 [inline]
proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
vfs_read+0x31e/0xd30 fs/read_write.c:468
ksys_pread64 fs/read_write.c:665 [inline]
__do_sys_pread64 fs/read_write.c:675 [inline]
__se_sys_pread64 fs/read_write.c:672 [inline]
__x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x478d29
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f843ae8dbe8 EFLAGS:
00000246 ORIG_RAX:
0000000000000011
RAX:
ffffffffffffffda RBX:
0000000000791408 RCX:
0000000000478d29
RDX:
000000000000000a RSI:
0000000020000000 RDI:
0000000000000003
RBP:
00000000f477909a R08:
0000000000000000 R09:
0000000000000000
R10:
000010000000007f R11:
0000000000000246 R12:
0000000000791740
R13:
0000000000791414 R14:
0000000000791408 R15:
00007ffc2eb48a50
</TASK>
Modules linked in:
---[ end trace
0000000000000000 ]---
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:
ffffc9001154f9b0 EFLAGS:
00010206
RAX:
0000000000000005 RBX:
1ffff1100302c8fd RCX:
0000000000000000
RDX:
0000000000000028 RSI:
ffffc9001154f988 RDI:
ffffc9000f77a338
RBP:
0000000000000029 R08:
ffffffff8a50ffb4 R09:
fffffbfff24b6bd9
R10:
fffffbfff24b6bd9 R11:
0000000000000000 R12:
ffff88801db73b78
R13:
fffffffffffffff9 R14:
dffffc0000000000 R15:
0000000000000030
FS:
00007f843ae8e700(0000) GS:
ffff888063700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f92ff166000 CR3:
000000003c672000 CR4:
00000000003506e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Fixes:
0daf07e52709 ("raw: convert raw sockets to RCU")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Dae R. Jeong <threeearcat@gmail.com>
Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 4 Apr 2023 18:29:37 +0000 (11:29 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"PPC:
- Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
s390:
- Fix handling of external interrupts in protected guests
x86:
- Resample the pending state of IOAPIC interrupts when unmasking them
- Fix usage of Hyper-V "enlightened TLB" on AMD
- Small fixes to real mode exceptions
- Suppress pending MMIO write exits if emulator detects exception
Documentation:
- Fix rST syntax"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
docs: kvm: x86: Fix broken field list
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
KVM: s390: pv: fix external interruption loop not always detected
KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
KVM: x86: Suppress pending MMIO write exits if emulator detects exception
KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
KVM: irqfd: Make resampler_list an RCU list
KVM: SVM: Flush Hyper-V TLB when required
Linus Torvalds [Tue, 4 Apr 2023 18:20:55 +0000 (11:20 -0700)]
Merge tag 'nfsd-6.3-5' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a crash and a resource leak in NFSv4 COMPOUND processing
- Fix issues with AUTH_SYS credential handling
- Try again to address an NFS/NFSD/SUNRPC build dependency regression
* tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: callback request does not use correct credential for AUTH_SYS
NFS: Remove "select RPCSEC_GSS_KRB5
sunrpc: only free unix grouplist after RCU settles
nfsd: call op_release, even when op_func returns an error
NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
Takahiro Itazuri [Fri, 31 Mar 2023 09:31:16 +0000 (10:31 +0100)]
docs: kvm: x86: Fix broken field list
Add a missing ":" to fix a broken field list.
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Fixes:
ba7bb663f554 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")
Message-Id: <
20230331093116.99820-1-itazur@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Arnd Bergmann [Thu, 2 Mar 2023 08:53:31 +0000 (09:53 +0100)]
asm-generic: avoid __generic_cmpxchg_local warnings
Code that passes a 32-bit constant into cmpxchg() produces a harmless
sparse warning because of the truncation in the branch that is not taken:
fs/erofs/zdata.c: note: in included file (through /home/arnd/arm-soc/arch/arm/include/asm/cmpxchg.h, /home/arnd/arm-soc/arch/arm/include/asm/atomic.h, /home/arnd/arm-soc/include/linux/atomic.h, ...):
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (
5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (
5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (
5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:30:42: warning: cast truncates bits from constant value (
5f0edead becomes ad)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (
5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:34:44: warning: cast truncates bits from constant value (
5f0edead becomes dead)
This was reported as a regression to Matt's recent __generic_cmpxchg_local
patch, though this patch only added more warnings on top of the ones
that were already there.
Rewording the truncation to use an explicit bitmask instead of a cast
to a smaller type avoids the warning but otherwise leaves the code
unchanged.
I had another look at why the cast is even needed for atomic_cmpxchg(),
and as Matt describes the problem here is that atomic_t contains a
signed 'int', but cmpxchg() takes an 'unsigned long' argument, and
converting between the two leads to a 64-bit sign-extension of
negative 32-bit atomics.
I checked the other implementations of arch_cmpxchg() and did not find
any others that run into the same problem as __generic_cmpxchg_local(),
but it's easy to be on the safe side here and always convert the
signed int into an unsigned int when calling arch_cmpxchg(), as this
will work even when any of the arch_cmpxchg() implementations run
into the same problem.
Fixes:
624654152284 ("locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value")
Reviewed-by: Matt Evans <mev@rivosinc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Vladimir Oltean [Mon, 9 Jan 2023 13:11:53 +0000 (15:11 +0200)]
asm-generic/io.h: suppress endianness warnings for relaxed accessors
Copy the forced type casts from the normal MMIO accessors to suppress
the sparse warnings that point out __raw_readl() returns a native endian
word (just like readl()).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Vladimir Oltean [Mon, 9 Jan 2023 13:11:52 +0000 (15:11 +0200)]
asm-generic/io.h: suppress endianness warnings for readq() and writeq()
Commit
c1d55d50139b ("asm-generic/io.h: Fix sparse warnings on
big-endian architectures") missed fixing the 64-bit accessors.
Arnd explains in the attached link why the casts are necessary, even if
__raw_readq() and __raw_writeq() do not take endian-specific types.
Link: https://lore.kernel.org/lkml/9105d6fc-880b-4734-857d-e3d30b87ccf6@app.fastmail.com/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Lingyu Liu [Tue, 28 Mar 2023 10:49:11 +0000 (10:49 +0000)]
ice: Reset FDIR counter in FDIR init stage
Reset the FDIR counters when FDIR inits. Without this patch,
when VF initializes or resets, all the FDIR counters are not
cleaned, which may cause unexpected behaviors for future FDIR
rule create (e.g., rule conflict).
Fixes:
1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Simei Su [Wed, 22 Mar 2023 02:24:15 +0000 (10:24 +0800)]
ice: fix wrong fallback logic for FDIR
When adding a FDIR filter, if ice_vc_fdir_set_irq_ctx returns failure,
the inserted fdir entry will not be removed and if ice_vc_fdir_write_fltr
returns failure, the fdir context info for irq handler will not be cleared
which may lead to inconsistent or memory leak issue. This patch refines
failure cases to resolve this issue.
Fixes:
1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Simei Su <simei.su@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dai Ngo [Sat, 1 Apr 2023 20:22:08 +0000 (13:22 -0700)]
NFSD: callback request does not use correct credential for AUTH_SYS
Currently callback request does not use the credential specified in
CREATE_SESSION if the security flavor for the back channel is AUTH_SYS.
Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error:
DELEG5 st_delegation.testCBSecParms : FAILURE
expected callback with uid, gid == 17, 19, got 0, 0
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes:
8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Tue, 28 Mar 2023 17:47:58 +0000 (13:47 -0400)]
NFS: Remove "select RPCSEC_GSS_KRB5
If CONFIG_CRYPTO=n (e.g. arm/shmobile_defconfig):
WARNING: unmet direct dependencies detected for RPCSEC_GSS_KRB5
Depends on [n]: NETWORK_FILESYSTEMS [=y] && SUNRPC [=y] && CRYPTO [=n]
Selected by [y]:
- NFS_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFS_FS [=y]
As NFSv4 can work without crypto enabled, remove the RPCSEC_GSS_KRB5
dependency altogether.
Trond says:
> It is possible to use the NFSv4.1 client with just AUTH_SYS, and
> in fact there are plenty of people out there using only that. The
> fact that RFC5661 gets its knickers in a twist about RPCSEC_GSS
> support is largely irrelevant to those people.
>
> The other issue is that ’select’ enforces the strict dependency
> that if the NFS client is compiled into the kernel, then the
> RPCSEC_GSS and kerberos code needs to be compiled in as well: they
> cannot exist as modules.
Fixes:
e57d06527738 ("NFS & NFSD: Update GSS dependencies")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Thu, 30 Mar 2023 18:24:27 +0000 (14:24 -0400)]
sunrpc: only free unix grouplist after RCU settles
While the unix_gid object is rcu-freed, the group_info list that it
contains is not. Ensure that we only put the group list reference once
we are really freeing the unix_gid object.
Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2183056
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes:
fd5d2f78261b ("SUNRPC: Make server side AUTH_UNIX use lockless lookups")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Corinna Vinschen [Mon, 3 Apr 2023 12:11:20 +0000 (14:11 +0200)]
net: stmmac: fix up RX flow hash indirection table when setting channels
stmmac_reinit_queues() fails to fix up the RX hash. Even if the number
of channels gets restricted, the output of `ethtool -x' indicates that
all RX queues are used:
$ ethtool -l enp0s29f2
Channel parameters for enp0s29f2:
Pre-set maximums:
RX: 8
TX: 8
Other: n/a
Combined: n/a
Current hardware settings:
RX: 8
TX: 8
Other: n/a
Combined: n/a
$ ethtool -x enp0s29f2
RX flow hash indirection table for enp0s29f2 with 8 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 0 1 2 3 4 5 6 7
[...]
$ ethtool -L enp0s29f2 rx 3
$ ethtool -x enp0s29f2
RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 0 1 2 3 4 5 6 7
[...]
Fix this by setting the indirection table according to the number
of specified queues. The result is now as expected:
$ ethtool -L enp0s29f2 rx 3
$ ethtool -x enp0s29f2
RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
0: 0 1 2 0 1 2 0 1
8: 2 0 1 2 0 1 2 0
[...]
Tested on Intel Elkhart Lake.
Fixes:
0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason Gunthorpe [Fri, 31 Mar 2023 15:32:26 +0000 (12:32 -0300)]
iommufd: Do not corrupt the pfn list when doing batch carry
If batch->end is 0 then setting npfns[0] before computing the new value of
pfns will fail to adjust the pfn and result in various page accounting
corruptions. It should be ordered after.
This seems to result in various kinds of page meta-data corruption related
failures:
WARNING: CPU: 1 PID: 527 at mm/gup.c:75 try_grab_folio+0x503/0x740
Modules linked in:
CPU: 1 PID: 527 Comm: repro Not tainted 6.3.0-rc2-
eeac8ede1755+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:try_grab_folio+0x503/0x740
Code: e3 01 48 89 de e8 6d c1 dd ff 48 85 db 0f 84 7c fe ff ff e8 4f bf dd ff 49 8d 47 ff 48 89 45 d0 e9 73 fe ff ff e8 3d bf dd ff <0f> 0b 31 db e9 d0 fc ff ff e8 2f bf dd ff 48 8b 5d c8 31 ff 48 89
RSP: 0018:
ffffc90000f37908 EFLAGS:
00010046
RAX:
0000000000000000 RBX:
00000000fffffc02 RCX:
ffffffff81504c26
RDX:
0000000000000000 RSI:
ffff88800d030000 RDI:
0000000000000002
RBP:
ffffc90000f37948 R08:
000000000003ca24 R09:
0000000000000008
R10:
000000000003ca00 R11:
0000000000000023 R12:
ffffea000035d540
R13:
0000000000000001 R14:
0000000000000000 R15:
ffffea000035d540
FS:
00007fecbf659740(0000) GS:
ffff88807dd00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000200011c3 CR3:
000000000ef66006 CR4:
0000000000770ee0
PKRU:
55555554
Call Trace:
<TASK>
internal_get_user_pages_fast+0xd32/0x2200
pin_user_pages_fast+0x65/0x90
pfn_reader_user_pin+0x376/0x390
pfn_reader_next+0x14a/0x7b0
pfn_reader_first+0x140/0x1b0
iopt_area_fill_domain+0x74/0x210
iopt_table_add_domain+0x30e/0x6e0
iommufd_device_selftest_attach+0x7f/0x140
iommufd_test+0x10ff/0x16f0
iommufd_fops_ioctl+0x206/0x330
__x64_sys_ioctl+0x10e/0x160
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Cc: <stable@vger.kernel.org>
Fixes:
f394576eb11d ("iommufd: PFN handling for iopt_pages")
Link: https://lore.kernel.org/r/3-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Fri, 31 Mar 2023 15:32:25 +0000 (12:32 -0300)]
iommufd: Fix unpinning of pages when an access is present
syzkaller found that the calculation of batch_last_index should use
'start_index' since at input to this function the batch is either empty or
it has already been adjusted to cross any accesses so it will start at the
point we are unmapping from.
Getting this wrong causes the unmap to run over the end of the pages
which corrupts pages that were never mapped. In most cases this triggers
the num pinned debugging:
WARNING: CPU: 0 PID: 557 at drivers/iommu/iommufd/pages.c:294 __iopt_area_unfill_domain+0x152/0x560
Modules linked in:
CPU: 0 PID: 557 Comm: repro Not tainted 6.3.0-rc2-
eeac8ede1755 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__iopt_area_unfill_domain+0x152/0x560
Code: d2 0f ff 44 8b 64 24 54 48 8b 44 24 48 31 ff 44 89 e6 48 89 44 24 38 e8 fc d3 0f ff 45 85 e4 0f 85 eb 01 00 00 e8 0e d2 0f ff <0f> 0b e8 07 d2 0f ff 48 8b 44 24 38 89 5c 24 58 89 18 8b 44 24 54
RSP: 0018:
ffffc9000108baf0 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
00000000ffffffff RCX:
ffffffff821e3f85
RDX:
0000000000000000 RSI:
ffff88800faf0000 RDI:
0000000000000002
RBP:
ffffc9000108bd18 R08:
000000000003ca25 R09:
0000000000000014
R10:
000000000003ca00 R11:
0000000000000024 R12:
0000000000000004
R13:
0000000000000801 R14:
00000000000007ff R15:
0000000000000800
FS:
00007f3499ce1740(0000) GS:
ffff88807dc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000020000243 CR3:
00000000179c2001 CR4:
0000000000770ef0
PKRU:
55555554
Call Trace:
<TASK>
iopt_area_unfill_domain+0x32/0x40
iopt_table_remove_domain+0x23f/0x4c0
iommufd_device_selftest_detach+0x3a/0x90
iommufd_selftest_destroy+0x55/0x70
iommufd_object_destroy_user+0xce/0x130
iommufd_destroy+0xa2/0xc0
iommufd_fops_ioctl+0x206/0x330
__x64_sys_ioctl+0x10e/0x160
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Also add some useful WARN_ON sanity checks.
Cc: <stable@vger.kernel.org>
Fixes:
8d160cd4d506 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Fri, 31 Mar 2023 15:32:24 +0000 (12:32 -0300)]
iommufd: Check for uptr overflow
syzkaller found that setting up a map with a user VA that wraps past zero
can trigger WARN_ONs, particularly from pin_user_pages weirdly returning 0
due to invalid arguments.
Prevent creating a pages with a uptr and size that would math overflow.
WARNING: CPU: 0 PID: 518 at drivers/iommu/iommufd/pages.c:793 pfn_reader_user_pin+0x2e6/0x390
Modules linked in:
CPU: 0 PID: 518 Comm: repro Not tainted 6.3.0-rc2-
eeac8ede1755+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:pfn_reader_user_pin+0x2e6/0x390
Code: b1 11 e9 25 fe ff ff e8 28 e4 0f ff 31 ff 48 89 de e8 2e e6 0f ff 48 85 db 74 0a e8 14 e4 0f ff e9 4d ff ff ff e8 0a e4 0f ff <0f> 0b bb f2 ff ff ff e9 3c ff ff ff e8 f9 e3 0f ff ba 01 00 00 00
RSP: 0018:
ffffc90000f9fa30 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
0000000000000000 RCX:
ffffffff821e2b72
RDX:
0000000000000000 RSI:
ffff888014184680 RDI:
0000000000000002
RBP:
ffffc90000f9fa78 R08:
00000000000000ff R09:
0000000079de6f4e
R10:
ffffc90000f9f790 R11:
ffff888014185418 R12:
ffffc90000f9fc60
R13:
0000000000000002 R14:
ffff888007879800 R15:
0000000000000000
FS:
00007f4227555740(0000) GS:
ffff88807dc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000020000043 CR3:
000000000e748005 CR4:
0000000000770ef0
PKRU:
55555554
Call Trace:
<TASK>
pfn_reader_next+0x14a/0x7b0
? interval_tree_double_span_iter_update+0x11a/0x140
pfn_reader_first+0x140/0x1b0
iopt_pages_rw_slow+0x71/0x280
? __this_cpu_preempt_check+0x20/0x30
iopt_pages_rw_access+0x2b2/0x5b0
iommufd_access_rw+0x19f/0x2f0
iommufd_test+0xd11/0x16f0
? write_comp_data+0x2f/0x90
iommufd_fops_ioctl+0x206/0x330
__x64_sys_ioctl+0x10e/0x160
? __pfx_iommufd_fops_ioctl+0x10/0x10
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Cc: <stable@vger.kernel.org>
Fixes:
8d160cd4d506 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/1-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Siddharth Vadapalli [Mon, 3 Apr 2023 09:03:21 +0000 (14:33 +0530)]
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
In the am65_cpsw_nuss_probe() function's cleanup path, the call to
of_platform_device_destroy() for the common->mdio_dev device is invoked
unconditionally. It is possible that either the MDIO node is not present
in the device-tree, or the MDIO node is disabled in the device-tree. In
both these cases, the MDIO device is not created, resulting in a NULL
pointer dereference when the of_platform_device_destroy() function is
invoked on the common->mdio_dev device on the cleanup path.
Fix this by ensuring that the common->mdio_dev device exists, before
attempting to invoke of_platform_device_destroy().
Fixes:
a45cfcc69a25 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Mon, 3 Apr 2023 16:41:24 +0000 (09:41 -0700)]
Merge tag 'vfs.misc.fixes.v6.3-rc6' of git://git./linux/kernel/git/vfs/idmapping
Pull vfs fix from Christian Brauner:
"When a mount or mount tree is made shared the vfs allocates new peer
group ids for all mounts that have no peer group id set. Only mounts
that aren't marked with MNT_SHARED are relevant here as MNT_SHARED
indicates that the mount has fully transitioned to a shared mount. The
peer group id handling is done with namespace lock held.
On failure, the peer group id settings of mounts for which a new peer
group id was allocated need to be reverted and the allocated peer
group id freed. The cleanup_group_ids() helper can identify the mounts
to cleanup by checking whether a given mount has a peer group id set
but isn't marked MNT_SHARED. The deallocation always needs to happen
with namespace lock held to protect against concurrent modifications
of the propagation settings.
This fixes the one place where the namespace lock was dropped before
calling cleanup_group_ids()"
* tag 'vfs.misc.fixes.v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: drop peer group ids under namespace lock
Linus Torvalds [Mon, 3 Apr 2023 16:34:08 +0000 (09:34 -0700)]
Merge tag 'hyperv-fixes-signed-
20230402' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix a bug in channel allocation for VMbus (Mohammed Gamal)
- Do not allow root partition functionality in CVM (Michael Kelley)
* tag 'hyperv-fixes-signed-
20230402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Block root partition functionality in a Confidential VM
Drivers: vmbus: Check for channel allocation before looking up relids
Steven Rostedt (Google) [Fri, 10 Mar 2023 03:13:02 +0000 (22:13 -0500)]
tracing: Error if a trace event has an array for a __field()
A __field() in the TRACE_EVENT() macro is used to set up the fields of the
trace event data. It is for single storage units (word, char, int,
pointer, etc) and not for complex structures or arrays. Unfortunately,
there's nothing preventing the build from accepting:
__field(int, arr[5]);
from building. It will turn into a array value. This use to work fine, as
the offset and size use to be determined by the macro using the field name,
but things have changed and the offset and size are now determined by the
type. So the above would only be size 4, and the next field will be
located 4 bytes from it (instead of 20).
The proper way to declare static arrays is to use the __array() macro.
Instead of __field(int, arr[5]) it should be __array(int, arr, 5).
Add some macro tricks to the building of a trace event from the
TRACE_EVENT() macro such that __field(int, arr[5]) will fail to build. A
comment by the failure will explain why the build failed.
Link: https://lore.kernel.org/lkml/20230306122549.236561-1-douglas.raillard@arm.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230309221302.642e82d9@gandalf.local.home
Reported-by: Douglas RAILLARD <douglas.raillard@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Daniel Bristot de Oliveira [Wed, 29 Mar 2023 15:50:16 +0000 (17:50 +0200)]
tracing/osnoise: Fix notify new tracing_max_latency
osnoise/timerlat tracers are reporting new max latency on instances
where the tracing is off, creating inconsistencies between the max
reported values in the trace and in the tracing_max_latency. Thus
only report new tracing_max_latency on active tracing instances.
Link: https://lkml.kernel.org/r/ecd109fde4a0c24ab0f00ba1e9a144ac19a91322.1680104184.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Fixes:
dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Bristot de Oliveira [Wed, 29 Mar 2023 15:50:15 +0000 (17:50 +0200)]
tracing/timerlat: Notify new max thread latency
timerlat is not reporting a new tracing_max_latency for the thread
latency. The reason is that it is not calling notify_new_max_latency()
function after the new thread latency is sampled.
Call notify_new_max_latency() after computing the thread latency.
Link: https://lkml.kernel.org/r/16e18d61d69073d0192ace07bf61e405cca96e9c.1680104184.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Fixes:
dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
John Keeping [Mon, 27 Mar 2023 17:36:46 +0000 (18:36 +0100)]
ftrace: Mark get_lock_parent_ip() __always_inline
If the compiler decides not to inline this function then preemption
tracing will always show an IP inside the preemption disabling path and
never the function actually calling preempt_{enable,disable}.
Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@metanate.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes:
f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()")
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Zheng Yejian [Sat, 25 Mar 2023 02:12:47 +0000 (10:12 +0800)]
ring-buffer: Fix race while reader and writer are on the same page
When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.
Call trace:
rb_get_reader_page+0x248/0x1300
rb_buffer_peek+0x34/0x160
ring_buffer_peek+0xbc/0x224
peek_next_entry+0x98/0xbc
__find_next_entry+0xc4/0x1c0
trace_find_next_entry_inc+0x30/0x94
tracing_read_pipe+0x198/0x304
vfs_read+0xb4/0x1e0
ksys_read+0x74/0x100
__arm64_sys_read+0x24/0x30
el0_svc_common.constprop.0+0x7c/0x1bc
do_el0_svc+0x2c/0x94
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
tail_page == commit_page == reader_page == {
.write = 0x100d20,
.read = 0x8f9f4805, // Far greater than 0xd20, obviously abnormal!!!
.entries = 0x10004c,
.real_end = 0x0,
.page = {
.time_stamp = 0x857257416af0,
.commit = 0xd20, // This page hasn't been full filled.
// .data[0...0xd20] seems normal.
}
}
The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.
To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit
a0fcaaed0c46 ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.
Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
77ae365eca89 ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tze-nan Wu [Tue, 21 Mar 2023 11:04:43 +0000 (19:04 +0800)]
tracing/synthetic: Fix races on freeing last_cmd
Currently, the "last_cmd" variable can be accessed by multiple processes
asynchronously when multiple users manipulate synthetic_events node
at the same time, it could lead to use-after-free or double-free.
This patch add "lastcmd_mutex" to prevent "last_cmd" from being accessed
asynchronously.
================================================================
It's easy to reproduce in the KASAN environment by running the two
scripts below in different shells.
script 1:
while :
do
echo -n -e '\x88' > /sys/kernel/tracing/synthetic_events
done
script 2:
while :
do
echo -n -e '\xb0' > /sys/kernel/tracing/synthetic_events
done
================================================================
double-free scenario:
process A process B
------------------- ---------------
1.kstrdup last_cmd
2.free last_cmd
3.free last_cmd(double-free)
================================================================
use-after-free scenario:
process A process B
------------------- ---------------
1.kstrdup last_cmd
2.free last_cmd
3.tracing_log_err(use-after-free)
================================================================
Appendix 1. KASAN report double-free:
BUG: KASAN: double-free in kfree+0xdc/0x1d4
Free of addr ***** by task sh/4879
Call trace:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x60/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Allocated by task 4879:
...
kstrdup+0x5c/0x98
create_or_delete_synth_event+0x6c/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Freed by task 5464:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x60/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
================================================================
Appendix 2. KASAN report use-after-free:
BUG: KASAN: use-after-free in strlen+0x5c/0x7c
Read of size 1 at addr ***** by task sh/5483
sh: CPU: 7 PID: 5483 Comm: sh
...
__asan_report_load1_noabort+0x34/0x44
strlen+0x5c/0x7c
tracing_log_err+0x60/0x444
create_or_delete_synth_event+0xc4/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Allocated by task 5483:
...
kstrdup+0x5c/0x98
create_or_delete_synth_event+0x80/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Freed by task 5480:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x74/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Link: https://lore.kernel.org/linux-trace-kernel/20230321110444.1587-1-Tze-nan.Wu@mediatek.com
Fixes:
27c888da9867 ("tracing: Remove size restriction on synthetic event cmd error logging")
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: "Tom Zanussi" <zanussi@kernel.org>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Dhruva Gole [Mon, 3 Apr 2023 07:24:43 +0000 (12:54 +0530)]
gpio: davinci: Add irq chip flag to skip set wake
Add the IRQCHIP_SKIP_SET_WAKE flag since there are no special IRQ Wake
bits that can be set to enable wakeup IRQ.
Fixes:
3d9edf09d452 ("[ARM] 4457/2: davinci: GPIO support")
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>