Hyeonggon Yoo [Sun, 12 Dec 2021 06:52:41 +0000 (06:52 +0000)]
mm/slob: Remove unnecessary page_mapcount_reset() function call
After commit
401fb12c68c2 ("mm/sl*b: Differentiate struct slab fields by
sl*b implementations"), we can reorder fields of struct slab depending
on slab allocator.
For now, page_mapcount_reset() is called because page->_mapcount and
slab->units have same offset. But this is not necessary for struct slab.
Use unused field for units instead.
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20211212065241.GA886691@odroid
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:48 +0000 (14:46 +0100)]
bootmem: Use page->index instead of page->freelist
page->freelist is for the use of slab. Using page->index is the same
set of bits as page->freelist, and by using an integer instead of a
pointer, we can avoid casts.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:47 +0000 (14:46 +0100)]
zsmalloc: Stop using slab fields in struct page
The ->freelist and ->units members of struct page are for the use of
slab only. I'm not particularly familiar with zsmalloc, so generate the
same code by using page->index to store 'page' (page->index and
page->freelist are at the same offset in struct page). This should be
cleaned up properly at some point by somebody who is familiar with
zsmalloc.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Vlastimil Babka [Wed, 10 Nov 2021 13:12:45 +0000 (14:12 +0100)]
mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabled
The fields 'next' and 'slabs' are only used when CONFIG_SLUB_CPU_PARTIAL
is enabled. We can put their definition to #ifdef to prevent accidental
use when disabled.
Currenlty show_slab_objects() and slabs_cpu_partial_show() contain code
accessing the slabs field that's effectively dead with
CONFIG_SLUB_CPU_PARTIAL=n through the wrappers slub_percpu_partial() and
slub_percpu_partial_read_once(), but to prevent a compile error, we need
to hide all this code behind #ifdef.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Wed, 10 Nov 2021 11:57:43 +0000 (12:57 +0100)]
mm/slub: Simplify struct slab slabs field definition
Before commit
b47291ef02b0 ("mm, slub: change percpu partial accounting
from objects to pages") we had to fit two integer fields into a native
word size, so we used short int on 32-bit and int on 64-bit via #ifdef.
After that commit there is only one integer field, so we can simply
define it as int everywhere.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Vlastimil Babka [Thu, 4 Nov 2021 10:30:58 +0000 (11:30 +0100)]
mm/sl*b: Differentiate struct slab fields by sl*b implementations
With a struct slab definition separate from struct page, we can go
further and define only fields that the chosen sl*b implementation uses.
This means everything between __page_flags and __page_refcount
placeholders now depends on the chosen CONFIG_SL*B. Some fields exist in
all implementations (slab_list) but can be part of a union in some, so
it's simpler to repeat them than complicate the definition with ifdefs
even more.
The patch doesn't change physical offsets of the fields, although it
could be done later - for example it's now clear that tighter packing in
SLOB could be possible.
This should also prevent accidental use of fields that don't exist in
given implementation. Before this patch virt_to_cache() and
cache_from_obj() were visible for SLOB (albeit not used), although they
rely on the slab_cache field that isn't set by SLOB. With this patch
it's now a compile error, so these functions are now hidden behind
an #ifndef CONFIG_SLOB.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Marco Elver <elver@google.com> # kfence
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
Vlastimil Babka [Wed, 3 Nov 2021 17:19:48 +0000 (18:19 +0100)]
mm/kfence: Convert kfence_guarded_alloc() to struct slab
The function sets some fields that are being moved from struct page to
struct slab so it needs to be converted.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:46 +0000 (14:46 +0100)]
mm/kasan: Convert to struct folio and struct slab
KASAN accesses some slab related struct page fields so we need to
convert it to struct slab. Some places are a bit simplified thanks to
kasan_addr_to_slab() encapsulating the PageSlab flag check through
virt_to_slab(). When resolving object address to either a real slab or
a large kmalloc, use struct folio as the intermediate type for testing
the slab flag to avoid unnecessary implicit compound_head().
[ vbabka@suse.cz: use struct folio, adjust to differences in previous
patches ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeongogn Yoo <42.hyeyoo@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:43 +0000 (14:46 +0100)]
mm/slob: Convert SLOB to use struct slab and struct folio
Use struct slab throughout the slob allocator. Where non-slab page can
appear use struct folio instead of struct page.
[ vbabka@suse.cz: don't introduce wrappers for PageSlobFree in mm/slab.h
just for the single callers being wrappers in mm/slob.c ]
[ Hyeonggon Yoo <42.hyeyoo@gmail.com>: fix NULL pointer deference ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Tue, 2 Nov 2021 21:42:04 +0000 (22:42 +0100)]
mm/memcg: Convert slab objcgs from struct page to struct slab
page->memcg_data is used with MEMCG_DATA_OBJCGS flag only for slab pages
so convert all the related infrastructure to struct slab. Also use
struct folio instead of struct page when resolving object pointers.
This is not just mechanistic changing of types and names. Now in
mem_cgroup_from_obj() we use folio_test_slab() to decide if we interpret
the folio as a real slab instead of a large kmalloc, instead of relying
on MEMCG_DATA_OBJCGS bit that used to be checked in page_objcgs_check().
Similarly in memcg_slab_free_hook() where we can encounter
kmalloc_large() pages (here the folio slab flag check is implied by
virt_to_slab()). As a result, page_objcgs_check() can be dropped instead
of converted.
To avoid include cycles, move the inline definition of slab_objcgs()
from memcontrol.h to mm/slab.h.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <cgroups@vger.kernel.org>
Vlastimil Babka [Tue, 2 Nov 2021 14:42:04 +0000 (15:42 +0100)]
mm: Convert struct page to struct slab in functions used by other subsystems
KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
functions nearest_obj(), obj_to_index() and objs_per_slab() that use
struct page as parameter. This patch converts it to struct slab
including all callers, through a coccinelle semantic patch.
// Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace
@@
@@
-objs_per_slab_page(
+objs_per_slab(
...
)
{ ... }
@@
@@
-objs_per_slab_page(
+objs_per_slab(
...
)
@@
identifier fn =~ "obj_to_index|objs_per_slab";
@@
fn(...,
- const struct page *page
+ const struct slab *slab
,...)
{
<...
(
- page_address(page)
+ slab_address(slab)
|
- page
+ slab
)
...>
}
@@
identifier fn =~ "nearest_obj";
@@
fn(...,
- struct page *page
+ const struct slab *slab
,...)
{
<...
(
- page_address(page)
+ slab_address(slab)
|
- page
+ slab
)
...>
}
@@
identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
expression E;
@@
fn(...,
(
- slab_page(E)
+ E
|
- virt_to_page(E)
+ virt_to_slab(E)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ page_slab(page)
)
,...)
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <kasan-dev@googlegroups.com>
Cc: <cgroups@vger.kernel.org>
Vlastimil Babka [Tue, 2 Nov 2021 12:26:56 +0000 (13:26 +0100)]
mm/slab: Finish struct page to struct slab conversion
Change cache_free_alien() to use slab_nid(virt_to_slab()). Otherwise
just update of comments and some remaining variable names.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Vlastimil Babka [Tue, 2 Nov 2021 12:23:10 +0000 (13:23 +0100)]
mm/slab: Convert most struct page to struct slab by spatch
The majority of conversion from struct page to struct slab in SLAB
internals can be delegated to a coccinelle semantic patch. This includes
renaming of variables with 'page' in name to 'slab', and similar.
Big thanks to Julia Lawall and Luis Chamberlain for help with
coccinelle.
// Options: --include-headers --no-includes --smpl-spacing mm/slab.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
// embedded script
// build list of functions for applying the next rule
@initialize:ocaml@
@@
let ok_function p =
not (List.mem (List.hd p).current_element ["kmem_getpages";"kmem_freepages"])
// convert the type in selected functions
@@
position p : script:ocaml() { ok_function p };
@@
- struct page@p
+ struct slab
@@
@@
-PageSlabPfmemalloc(page)
+slab_test_pfmemalloc(slab)
@@
@@
-ClearPageSlabPfmemalloc(page)
+slab_clear_pfmemalloc(slab)
@@
@@
obj_to_index(
...,
- page
+ slab_page(slab)
,...)
// for all functions, change any "struct slab *page" parameter to "struct slab
// *slab" in the signature, and generally all occurences of "page" to "slab" in
// the body - with some special cases.
@@
identifier fn;
expression E;
@@
fn(...,
- struct slab *page
+ struct slab *slab
,...)
{
<...
(
- int page_node;
+ int slab_node;
|
- page_node
+ slab_node
|
- page_slab(page)
+ slab
|
- page_address(page)
+ slab_address(slab)
|
- page_size(page)
+ slab_size(slab)
|
- page_to_nid(page)
+ slab_nid(slab)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ slab
)
...>
}
// rename a function parameter
@@
identifier fn;
expression E;
@@
fn(...,
- int page_node
+ int slab_node
,...)
{
<...
- page_node
+ slab_node
...>
}
// functions converted by previous rules that were temporarily called using
// slab_page(E) so we want to remove the wrapper now that they accept struct
// slab ptr directly
@@
identifier fn =~ "index_to_obj";
expression E;
@@
fn(...,
- slab_page(E)
+ E
,...)
// functions that were returning struct page ptr and now will return struct
// slab ptr, including slab_page() wrapper removal
@@
identifier fn =~ "cache_grow_begin|get_valid_first_slab|get_first_slab";
expression E;
@@
fn(...)
{
<...
- slab_page(E)
+ E
...>
}
// rename any former struct page * declarations
@@
@@
struct slab *
-page
+slab
;
// all functions (with exceptions) with a local "struct slab *page" variable
// that will be renamed to "struct slab *slab"
@@
identifier fn !~ "kmem_getpages|kmem_freepages";
expression E;
@@
fn(...)
{
<...
(
- page_slab(page)
+ slab
|
- page_to_nid(page)
+ slab_nid(slab)
|
- kasan_poison_slab(page)
+ kasan_poison_slab(slab_page(slab))
|
- page_address(page)
+ slab_address(slab)
|
- page_size(page)
+ slab_size(slab)
|
- page->pages
+ slab->slabs
|
- page = virt_to_head_page(E)
+ slab = virt_to_slab(E)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ slab
)
...>
}
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Vlastimil Babka [Fri, 29 Oct 2021 15:54:55 +0000 (17:54 +0200)]
mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
These functions sit at the boundary to page allocator. Also use folio
internally to avoid extra compound_head() when dealing with page flags.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Mon, 15 Nov 2021 15:55:15 +0000 (16:55 +0100)]
mm/slub: Finish struct page to struct slab conversion
Update comments mentioning pages to mention slabs where appropriate.
Also some goto labels.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Vlastimil Babka [Wed, 3 Nov 2021 14:39:59 +0000 (15:39 +0100)]
mm/slub: Convert most struct page to struct slab by spatch
The majority of conversion from struct page to struct slab in SLUB
internals can be delegated to a coccinelle semantic patch. This includes
renaming of variables with 'page' in name to 'slab', and similar.
Big thanks to Julia Lawall and Luis Chamberlain for help with
coccinelle.
// Options: --include-headers --no-includes --smpl-spacing include/linux/slub_def.h mm/slub.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
// embedded script
// build list of functions to exclude from applying the next rule
@initialize:ocaml@
@@
let ok_function p =
not (List.mem (List.hd p).current_element ["nearest_obj";"obj_to_index";"objs_per_slab_page";"__slab_lock";"__slab_unlock";"free_nonslab_page";"kmalloc_large_node"])
// convert the type from struct page to struct page in all functions except the
// list from previous rule
// this also affects struct kmem_cache_cpu, but that's ok
@@
position p : script:ocaml() { ok_function p };
@@
- struct page@p
+ struct slab
// in struct kmem_cache_cpu, change the name from page to slab
// the type was already converted by the previous rule
@@
@@
struct kmem_cache_cpu {
...
-struct slab *page;
+struct slab *slab;
...
}
// there are many places that use c->page which is now c->slab after the
// previous rule
@@
struct kmem_cache_cpu *c;
@@
-c->page
+c->slab
@@
@@
struct kmem_cache {
...
- unsigned int cpu_partial_pages;
+ unsigned int cpu_partial_slabs;
...
}
@@
struct kmem_cache *s;
@@
- s->cpu_partial_pages
+ s->cpu_partial_slabs
@@
@@
static void
- setup_page_debug(
+ setup_slab_debug(
...)
{...}
@@
@@
- setup_page_debug(
+ setup_slab_debug(
...);
// for all functions (with exceptions), change any "struct slab *page"
// parameter to "struct slab *slab" in the signature, and generally all
// occurences of "page" to "slab" in the body - with some special cases.
@@
identifier fn !~ "free_nonslab_page|obj_to_index|objs_per_slab_page|nearest_obj";
@@
fn(...,
- struct slab *page
+ struct slab *slab
,...)
{
<...
- page
+ slab
...>
}
// similar to previous but the param is called partial_page
@@
identifier fn;
@@
fn(...,
- struct slab *partial_page
+ struct slab *partial_slab
,...)
{
<...
- partial_page
+ partial_slab
...>
}
// similar to previous but for functions that take pointer to struct page ptr
@@
identifier fn;
@@
fn(...,
- struct slab **ret_page
+ struct slab **ret_slab
,...)
{
<...
- ret_page
+ ret_slab
...>
}
// functions converted by previous rules that were temporarily called using
// slab_page(E) so we want to remove the wrapper now that they accept struct
// slab ptr directly
@@
identifier fn =~ "slab_free|do_slab_free";
expression E;
@@
fn(...,
- slab_page(E)
+ E
,...)
// similar to previous but for another pattern
@@
identifier fn =~ "slab_pad_check|check_object";
@@
fn(...,
- folio_page(folio, 0)
+ slab
,...)
// functions that were returning struct page ptr and now will return struct
// slab ptr, including slab_page() wrapper removal
@@
identifier fn =~ "allocate_slab|new_slab";
expression E;
@@
static
-struct slab *
+struct slab *
fn(...)
{
<...
- slab_page(E)
+ E
...>
}
// rename any former struct page * declarations
@@
@@
struct slab *
(
- page
+ slab
|
- partial_page
+ partial_slab
|
- oldpage
+ oldslab
)
;
// this has to be separate from previous rule as page and page2 appear at the
// same line
@@
@@
struct slab *
-page2
+slab2
;
// similar but with initial assignment
@@
expression E;
@@
struct slab *
(
- page
+ slab
|
- flush_page
+ flush_slab
|
- discard_page
+ slab_to_discard
|
- page_to_unfreeze
+ slab_to_unfreeze
)
= E;
// convert most of struct page to struct slab usage inside functions (with
// exceptions), including specific variable renames
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
expression E;
@@
fn(...)
{
<...
(
- int pages;
+ int slabs;
|
- int pages = E;
+ int slabs = E;
|
- page
+ slab
|
- flush_page
+ flush_slab
|
- partial_page
+ partial_slab
|
- oldpage->pages
+ oldslab->slabs
|
- oldpage
+ oldslab
|
- unsigned int nr_pages;
+ unsigned int nr_slabs;
|
- nr_pages
+ nr_slabs
|
- unsigned int partial_pages = E;
+ unsigned int partial_slabs = E;
|
- partial_pages
+ partial_slabs
)
...>
}
// this has to be split out from the previous rule so that lines containing
// multiple matching changes will be fully converted
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
@@
fn(...)
{
<...
(
- slab->pages
+ slab->slabs
|
- pages
+ slabs
|
- page2
+ slab2
|
- discard_page
+ slab_to_discard
|
- page_to_unfreeze
+ slab_to_unfreeze
)
...>
}
// after we simply changed all occurences of page to slab, some usages need
// adjustment for slab-specific functions, or use slab_page() wrapper
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
@@
fn(...)
{
<...
(
- page_slab(slab)
+ slab
|
- kasan_poison_slab(slab)
+ kasan_poison_slab(slab_page(slab))
|
- page_address(slab)
+ slab_address(slab)
|
- page_size(slab)
+ slab_size(slab)
|
- PageSlab(slab)
+ folio_test_slab(slab_folio(slab))
|
- page_to_nid(slab)
+ slab_nid(slab)
|
- compound_order(slab)
+ slab_order(slab)
)
...>
}
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:40 +0000 (14:46 +0100)]
mm/slub: Convert pfmemalloc_match() to take a struct slab
Preparatory for mass conversion. Use the new slab_test_pfmemalloc()
helper. As it doesn't do VM_BUG_ON(!PageSlab()) we no longer need the
pfmemalloc_match_unsafe() variant.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Fri, 29 Oct 2021 10:18:24 +0000 (12:18 +0200)]
mm/slub: Convert __free_slab() to use struct slab
__free_slab() is on the boundary of distinguishing struct slab and
struct page so start with struct slab but convert to folio for working
with flags and folio_page() to call functions that require struct page.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Tue, 26 Oct 2021 14:35:02 +0000 (16:35 +0200)]
mm/slub: Convert alloc_slab_page() to return a struct slab
Preparatory, callers convert back to struct page for now.
Also move setting page flags to alloc_slab_page() where we still operate
on a struct page. This means the page->slab_cache pointer is now set
later than the PageSlab flag, which could theoretically confuse some pfn
walker assuming PageSlab means there would be a valid cache pointer. But
as the code had no barriers and used __set_bit() anyway, it could have
happened already, so there shouldn't be such a walker.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:46:34 +0000 (14:46 +0100)]
mm/slub: Convert print_page_info() to print_slab_info()
Improve the type safety and prepare for further conversion. For flags
access, convert to folio internally.
[ vbabka@suse.cz: access flags via folio_flags() ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Tue, 26 Oct 2021 11:39:14 +0000 (13:39 +0200)]
mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab
These functions operate on the PG_locked page flag, but make them accept
struct slab to encapsulate this implementation detail.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:59 +0000 (14:45 +0100)]
mm/slub: Convert kfree() to use a struct slab
Convert kfree(), kmem_cache_free() and ___cache_free() to resolve object
addresses to struct slab, using folio as intermediate step where needed.
Keep passing the result as struct page for now in preparation for mass
conversion of internal functions.
[ vbabka@suse.cz: Use folio as intermediate step when checking for
large kmalloc pages, and when freeing them - rename
free_nonslab_page() to free_large_kmalloc() that takes struct folio ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:58 +0000 (14:45 +0100)]
mm/slub: Convert detached_freelist to use a struct slab
This gives us a little bit of extra typesafety as we know that nobody
called virt_to_page() instead of virt_to_head_page().
[ vbabka@suse.cz: Use folio as intermediate step when filtering out
large kmalloc pages ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:56 +0000 (14:45 +0100)]
mm: Convert check_heap_object() to use struct slab
Ensure that we're not seeing a tail page inside __check_heap_object() by
converting to a slab instead of a page. Take the opportunity to mark
the slab as const since we're not modifying it. Also move the
declaration of __check_heap_object() to mm/slab.h so it's not available
to the wider kernel.
[ vbabka@suse.cz: in check_heap_object() only convert to struct slab for
actual PageSlab pages; use folio as intermediate step instead of page ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:55 +0000 (14:45 +0100)]
mm: Use struct slab in kmem_obj_info()
All three implementations of slab support kmem_obj_info() which reports
details of an object allocated from the slab allocator. By using the
slab type instead of the page type, we make it obvious that this can
only be called for slabs.
[ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:54 +0000 (14:45 +0100)]
mm: Convert __ksize() to struct slab
In SLUB, use folios, and struct slab to access slab_cache field.
In SLOB, use folios to properly resolve pointers beyond
PAGE_SIZE offset of the object.
[ vbabka@suse.cz: use folios, and only convert folio_test_slab() == true
folios to struct slab ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:53 +0000 (14:45 +0100)]
mm: Convert virt_to_cache() to use struct slab
This function is entirely self-contained, so can be converted from page
to slab.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:52 +0000 (14:45 +0100)]
mm: Convert [un]account_slab_page() to struct slab
Convert the parameter of these functions to struct slab instead of
struct page and drop _page from the names. For now their callers just
convert page to slab.
[ vbabka@suse.cz: replace existing functions instead of calling them ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Matthew Wilcox (Oracle) [Mon, 4 Oct 2021 13:45:51 +0000 (14:45 +0100)]
mm: Split slab into its own type
Make struct slab independent of struct page. It still uses the
underlying memory in struct page for storing slab-specific data, but
slab and slub can now be weaned off using struct page directly. Some of
the wrapper functions (slab_address() and slab_order()) still need to
cast to struct folio, but this is a significant disentanglement.
[ vbabka@suse.cz: Rebase on folios, use folio instead of page where
possible.
Do not duplicate flags field in struct slab, instead make the related
accessors go through slab_folio(). For testing pfmemalloc use the
folio_*_active flag accessors directly so the PageSlabPfmemalloc
wrappers can be removed later.
Make folio_slab() expect only folio_test_slab() == true folios and
virt_to_slab() return NULL when folio_test_slab() == false.
Move struct slab to mm/slab.h.
Don't represent with struct slab pages that are not true slab pages,
but just a compound page obtained directly rom page allocator (with
large kmalloc() for SLUB and SLOB). ]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Vlastimil Babka [Tue, 26 Oct 2021 16:22:44 +0000 (18:22 +0200)]
mm/slub: Make object_err() static
There are no callers outside of mm/slub.c anymore.
Move freelist_corrupted() that calls object_err() to avoid a need for
forward declaration.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Vlastimil Babka [Mon, 1 Nov 2021 16:02:30 +0000 (17:02 +0100)]
mm/slab: Dissolve slab_map_pages() in its caller
The function no longer does what its name and comment suggests, and just
sets two struct page fields, which can be done directly in its sole
caller.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Vlastimil Babka [Thu, 25 Nov 2021 17:15:37 +0000 (18:15 +0100)]
mm: add virt_to_folio() and folio_address()
These two wrappers around their respective struct page variants will be
useful in the following patches.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Linus Torvalds [Sun, 19 Dec 2021 22:14:33 +0000 (14:14 -0800)]
Linux 5.16-rc6
Linus Torvalds [Sun, 19 Dec 2021 20:44:03 +0000 (12:44 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two small fixes, one of which was being worked around in selftests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Retry page fault if MMU reload is pending and root has no sp
KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
Linus Torvalds [Sun, 19 Dec 2021 20:38:53 +0000 (12:38 -0800)]
Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block
Pull block revert from Jens Axboe:
"It turns out that the fix for not hammering on the delayed work timer
too much caused a performance regression for BFQ, so let's revert the
change for now.
I've got some ideas on how to fix it appropriately, but they should
wait for 5.17"
* tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block:
Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
Linus Torvalds [Sun, 19 Dec 2021 20:28:46 +0000 (12:28 -0800)]
Merge tag 'irq_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it
is restored to its reset state
- Mask MSI-X vectors late on the init path in order to handle
out-of-spec Marvell NVME devices which apparently look at the MSI-X
mask even when MSI-X is disabled
* tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
PCI/MSI: Mask MSI-X vectors only on success
Linus Torvalds [Sun, 19 Dec 2021 20:23:18 +0000 (12:23 -0800)]
Merge tag 'timers_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never
positive
* tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Really make sure wall_to_monotonic isn't positive
Linus Torvalds [Sun, 19 Dec 2021 20:17:26 +0000 (12:17 -0800)]
Merge tag 'locking_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Fix the rtmutex condition checking when the optimistic spinning of a
waiter needs to be terminated
* tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Linus Torvalds [Sun, 19 Dec 2021 19:46:54 +0000 (11:46 -0800)]
Merge tag 'core_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull signal handlign fix from Borislav Petkov:
- Prevent lock contention on the new sigaltstack lock on the
common-case path, when no changes have been made to the alternative
signal stack.
* tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signal: Skip the altstack update when not needed
Linus Torvalds [Sun, 19 Dec 2021 19:40:11 +0000 (11:40 -0800)]
Merge tag 'mips-fixes_5.16_3' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
- only enable pci_remap_iospace() for Ralink devices
* tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Only define pci_remap_iospace() for Ralink
Linus Torvalds [Sun, 19 Dec 2021 19:31:14 +0000 (11:31 -0800)]
Merge tag 'powerpc-5.16-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix a recently introduced oops at boot on 85xx in some configurations.
Fix crashes when loading some livepatch modules with
STRICT_MODULE_RWX.
Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"
* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/module_64: Fix livepatching for RO modules
powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
Linus Torvalds [Sun, 19 Dec 2021 19:23:02 +0000 (11:23 -0800)]
Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Two cifs/smb3 fixes, one fscache related, and one mount parsing
related for stable"
* tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: sanitize multiple delimiters in prepath
cifs: ignore resource_id while getting fscache super cookie
Sean Christopherson [Thu, 9 Dec 2021 06:05:46 +0000 (06:05 +0000)]
KVM: x86: Retry page fault if MMU reload is pending and root has no sp
Play nice with a NULL shadow page when checking for an obsolete root in
the page fault handler by flagging the page fault as stale if there's no
shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
Invalidating memslots, which is the only case where _all_ roots need to
be reloaded, requests all vCPUs to reload their MMUs while holding
mmu_lock for lock.
The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
backed by a shadow page. Running with TDP disabled or with nested NPT
explodes spectaculary due to dereferencing a NULL shadow page pointer.
Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
root. Zapping shadow pages in response to guest activity, e.g. when the
guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen
with a completely valid root shadow page. This is a bit of a moot point
as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
be cleaned up in the future.
Fixes:
a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211209060552.2956723-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:12 +0000 (17:52 +0100)]
KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend
on guest visible CPUIDs and (incorrect) KVM logic implementing it is
about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden
and causes test to fail.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes:
feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211216165213.338923-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:13 +0000 (17:52 +0100)]
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
not depend on guest visible CPUID entries, even if just to allow
creating/restoring guest MSRs and CPUIDs in any sequence.
Fixes:
27461da31089 ("KVM: x86/pmu: Support full width counting")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211216165213.338923-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jens Axboe [Sun, 19 Dec 2021 14:58:44 +0000 (07:58 -0700)]
Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
This reverts commit
cb2ac2912a9ca7d3d26291c511939a41361d2d83.
Alex and the kernel test robot report that this causes a significant
performance regression with BFQ. I can reproduce that result, so let's
revert this one as we're close to -rc6 and we there's no point in trying
to rush a fix.
Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/
Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 18 Dec 2021 21:23:55 +0000 (13:23 -0800)]
Merge tag 'tty-5.16-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two small tty/serial fixes for 5.16-rc6. They include:
- n_hdlc fix for syzbot reported problem that you were previously
copied on.
- 8250_fintek driver fix that resolved a console problem by removing
a previous change.
Both have been in linux-next with no reported issues"
* tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_fintek: Fix garbled text for console
tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
Linus Torvalds [Sat, 18 Dec 2021 21:16:43 +0000 (13:16 -0800)]
Merge tag 'usb-5.16-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported problems.
They include:
- dwc2 driver fixes
- xhci driver fixes
- cdnsp driver fixes
- typec driver fix
- gadget u_ether driver fix
- new quirk additions
- usb gadget endpoint calculation fix
- usb serial new device ids
- revert of a xhci-dbg change that broke early debug booting
All changes, except for the revert, have been in linux-next with no
reported problems. The revert was from yesterday, and it was reported
by the developers affected that it resolved their problem"
* tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: early: convert to readl_poll_timeout_atomic()"
usb: typec: tcpm: fix tcpm unregister port but leave a pending timer
usb: cdnsp: Fix lack of spin_lock_irqsave/spin_lock_restore
USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
USB: gadget: bRequestType is a bitfield, not a enum
USB: serial: option: add Telit FN990 compositions
USB: serial: cp210x: fix CP2105 GPIO registration
usb: cdnsp: Fix incorrect status for control request
usb: cdnsp: Fix issue in cdnsp_log_ep trace event
usb: cdnsp: Fix incorrect calling of cdnsp_died function
usb: xhci-mtk: fix list_del warning when enable list debug
usb: gadget: u_ether: fix race in setting MAC address in setup phase
Linus Torvalds [Sat, 18 Dec 2021 19:53:14 +0000 (11:53 -0800)]
Merge tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix segfaults in 'perf inject' related to usage of unopened files
- The return value of hashmap__new() should be checked using IS_ERR()
* tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf inject: Fix segfault due to perf_data__fd() without open
perf inject: Fix segfault due to close without open
perf expr: Fix missing check for return value of hashmap__new()
Adrian Hunter [Mon, 13 Dec 2021 08:48:29 +0000 (10:48 +0200)]
perf inject: Fix segfault due to perf_data__fd() without open
The fixed commit attempts to get the output file descriptor even if the
file was never opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
__GI___fileno (fp=0x0) at fileno.c:35
35 fileno.c: No such file or directory.
(gdb) bt
#0 __GI___fileno (fp=0x0) at fileno.c:35
#1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
#2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
#3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
#4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
#5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
(gdb)
Fixes:
0ae03893623dd1dd ("perf tools: Pass a fd to perf_file_header__read_pipe()")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 13 Dec 2021 08:48:28 +0000 (10:48 +0200)]
perf inject: Fix segfault due to close without open
The fixed commit attempts to close inject.output even if it was never
opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
48 iofclose.c: No such file or directory.
(gdb) bt
#0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
#1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376
#2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085
#3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313
#4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539
(gdb)
Fixes:
02e6246f5364d526 ("perf inject: Close inject.output on exit")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Miaoqian Lin [Sun, 12 Dec 2021 06:25:02 +0000 (06:25 +0000)]
perf expr: Fix missing check for return value of hashmap__new()
The hashmap__new() function may return ERR_PTR(-ENOMEM) when malloc()
fails, add IS_ERR() checking for ctx->ids.
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20211212062504.25841-1-linmq006@gmail.com
[ s/kfree()/free()/ and add missing linux/err.h include ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zqiang [Fri, 17 Dec 2021 07:42:07 +0000 (15:42 +0800)]
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Optimistic spinning needs to be terminated when the spinning waiter is not
longer the top waiter on the lock, but the condition is negated. It
terminates if the waiter is the top waiter, which is defeating the whole
purpose.
Fixes:
c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless")
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
Linus Torvalds [Sat, 18 Dec 2021 01:24:53 +0000 (17:24 -0800)]
Merge tag 'libata-5.16-rc6' of git://git./linux/kernel/git/dlemoal/libata
Pull libata fix from Damien Le Moal:
"A single fix for this cycle:
- Check that ATA16 passthrough commands that do not transfer any data
have a DMA direction set to DMA_NONE (From George)"
* tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
Linus Torvalds [Sat, 18 Dec 2021 01:19:51 +0000 (17:19 -0800)]
Merge tag 'zonefs-5.16-rc6' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
"One fix and one trivial update for rc6:
- Add MODULE_ALIAS_FS to get automatic module loading on mount
(Naohiro)
- Update Damien's email address in the MAINTAINERS file (me)"
* tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
MAITAINERS: Change zonefs maintainer email address
zonefs: add MODULE_ALIAS_FS
Thiago Rafael Becker [Fri, 17 Dec 2021 18:20:22 +0000 (15:20 -0300)]
cifs: sanitize multiple delimiters in prepath
mount.cifs can pass a device with multiple delimiters in it. This will
cause rename(2) to fail with ENOENT.
V2:
- Make sanitize_path more readable.
- Fix multiple delimiters between UNC and prepath.
- Avoid a memory leak if a bad user starts putting a lot of delimiters
in the path on purpose.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200
Fixes:
24e0a1eff9e2 ("cifs: switch to new mount api")
Cc: stable@vger.kernel.org # 5.11+
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Shyam Prasad N [Wed, 8 Dec 2021 16:33:19 +0000 (16:33 +0000)]
cifs: ignore resource_id while getting fscache super cookie
We have a cyclic dependency between fscache super cookie
and root inode cookie. The super cookie relies on
tcon->resource_id, which gets populated from the root inode
number. However, fetching the root inode initializes inode
cookie as a child of super cookie, which is yet to be populated.
resource_id is only used as auxdata to check the validity of
super cookie. We can completely avoid setting resource_id to
remove the circular dependency. Since vol creation time and
vol serial numbers are used for auxdata, we should be fine.
Additionally, there will be auxiliary data check for each
inode cookie as well.
Fixes:
5bf91ef03d98 ("cifs: wait for tcon resource_id before getting fscache super")
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Yu Liao [Mon, 13 Dec 2021 13:57:27 +0000 (21:57 +0800)]
timekeeping: Really make sure wall_to_monotonic isn't positive
Even after commit
e1d7ba873555 ("time: Always make sure wall_to_monotonic
isn't positive") it is still possible to make wall_to_monotonic positive
by running the following code:
int main(void)
{
struct timespec time;
clock_gettime(CLOCK_MONOTONIC, &time);
time.tv_nsec = 0;
clock_settime(CLOCK_REALTIME, &time);
return 0;
}
The reason is that the second parameter of timespec64_compare(), ts_delta,
may be unnormalized because the delta is calculated with an open coded
substraction which causes the comparison of tv_sec to yield the wrong
result:
wall_to_monotonic = { .tv_sec = -10, .tv_nsec =
900000000 }
ts_delta = { .tv_sec = -9, .tv_nsec = -
900000000 }
That makes timespec64_compare() claim that wall_to_monotonic < ts_delta,
but actually the result should be wall_to_monotonic > ts_delta.
After normalization, the result of timespec64_compare() is correct because
the tv_sec comparison is not longer misleading:
wall_to_monotonic = { .tv_sec = -10, .tv_nsec =
900000000 }
ts_delta = { .tv_sec = -10, .tv_nsec =
100000000 }
Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the
issue.
Fixes:
e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
Linus Torvalds [Fri, 17 Dec 2021 21:55:03 +0000 (13:55 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One driver fix: the pm8001 has never actually worked on a system with
an IOMMU and this fixes that use case"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pm8001: Fix phys_to_virt() usage on dma_addr_t
Linus Torvalds [Fri, 17 Dec 2021 21:50:58 +0000 (13:50 -0800)]
Merge tag 'for-5.16-rc5-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes, almost all error handling one-liners and for stable.
- regression fix in directory logging items
- regression fix of extent buffer status bits handling after an error
- fix memory leak in error handling path in tree-log
- fix freeing invalid anon device number when handling errors during
subvolume creation
- fix warning when freeing leaf after subvolume creation failure
- fix missing blkdev put in device scan error handling
- fix invalid delayed ref after subvolume creation failure"
* tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix missing blkdev_put() call in btrfs_scan_one_device()
btrfs: fix warning when freeing leaf after subvolume creation failure
btrfs: fix invalid delayed ref after subvolume creation failure
btrfs: check WRITE_ERR when trying to read an extent buffer
btrfs: fix missing last dir item offset update when logging directory
btrfs: fix double free of anon_dev after failure to create subvolume
btrfs: fix memory leak in __add_inode_ref()
Linus Torvalds [Fri, 17 Dec 2021 20:03:14 +0000 (12:03 -0800)]
Merge tag 'selinux-pr-
20211217' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"Another small SELinux fix for v5.16 to ensure that we don't block on
memory allocations while holding a spinlock.
This passes all our tests without problem"
* tag 'selinux-pr-
20211217' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix sleeping function called from invalid context
Linus Torvalds [Fri, 17 Dec 2021 19:55:35 +0000 (11:55 -0800)]
Merge tag 'riscv-for-linus-5.16-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A handful of DT updates for the SiFive HiFive Unmatched, that fix the
regulator handling. These should stop some warning spew.
- A pair of fixes for both the SiFive Hifive Unleashed and Unmatched,
that correctly hook up the MMC card detect signal.
* tag 'riscv-for-linus-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: sifive unmatched: Link the tmp451 with its power supply
riscv: dts: sifive unmatched: Fix regulator for board rev3
riscv: dts: sifive unmatched: Expose the PMIC sub-functions
riscv: dts: sifive unmatched: Expose the board ID eeprom
riscv: dts: sifive unmatched: Name gpio lines
riscv: dts: unmatched: Add gpio card detect to mmc-spi-slot
riscv: dts: unleashed: Add gpio card detect to mmc-spi-slot
Linus Torvalds [Fri, 17 Dec 2021 19:46:07 +0000 (11:46 -0800)]
Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for hammering on the delayed run queue timer (me)
- bcache regression fix for this merge window (Lin)
- Fix a divide-by-zero in the blk-iocost code (Tejun)
* tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block:
bcache: fix NULL pointer reference in cached_dev_detach_finish
block: reduce kblockd_mod_delayed_work_on() CPU consumption
iocost: Fix divide-by-zero on donation from low hweight cgroup
Linus Torvalds [Fri, 17 Dec 2021 19:31:46 +0000 (11:31 -0800)]
Merge tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix, fixing an issue with the worker creation change
that was merged last week"
* tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block:
io-wq: drop wqe lock before creating new worker
Linus Torvalds [Fri, 17 Dec 2021 19:25:50 +0000 (11:25 -0800)]
Merge tag 'dmaengine-fix-5.16' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A bunch of driver fixes, notably:
- uninit variable fix for dw-axi-dmac driver
- return value check dw-edma driver
- calling wq quiesce inside spinlock and missed completion for idxd
driver
- mod alias fix for st_fdma driver"
* tag 'dmaengine-fix-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: st_fdma: fix MODULE_ALIAS
dmaengine: idxd: fix missed completion on abort path
dmaengine: ti: k3-udma: Fix smatch warnings
dmaengine: idxd: fix calling wq quiesce inside spinlock
dmaengine: dw-edma: Fix return value check for dma_set_mask_and_coherent()
dmaengine: dw-axi-dmac: Fix uninitialized variable in axi_chan_block_xfer_start()
Linus Torvalds [Fri, 17 Dec 2021 19:07:13 +0000 (11:07 -0800)]
Merge tag 'drm-fixes-2021-12-17-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Mostly amdgpu fixes this week scattered around the driver, otherwise
one i915, one ast, one simpledrm. There is a revert in the fb-helper
for places userspace was using a string that we tried to change.
i915:
- Fix a bound check in the DMC fw load.
ast:
- NULL ptr deref fix
simpledrm:
- pixel clock units fix
fb-helper:
- userspace regression revert
amdgpu:
- Fix RLC register offset
- GMC fix
- Properly cache SMU FW version on Yellow Carp
- Fix missing callback on DCN3.1
- Reset DMCUB before HW init
- Fix for GMC powergating on PCO
- Fix a possible memory leak in GPU metrics table handling on RN"
* tag 'drm-fixes-2021-12-17-1' of git://anongit.freedesktop.org/drm/drm:
drm/amd/pm: fix a potential gpu_metrics_table memory leak
drm/amdgpu: correct the wrong cached state for GMC on PICASSO
drm/amd/display: Reset DMCUB before HW init
drm/amd/display: Set exit_optimized_pwr_state for DCN31
drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC
drm/amdgpu: don't override default ECO_BITs setting
drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
drm/i915/display: Fix an unsigned subtraction which can never be negative.
drm/ast: potential dereference of null pointer
drm: simpledrm: fix wrong unit with pixel clock
Revert "drm/fb-helper: improve DRM fbdev emulation device names"
Greg Kroah-Hartman [Fri, 17 Dec 2021 15:24:30 +0000 (16:24 +0100)]
Revert "usb: early: convert to readl_poll_timeout_atomic()"
This reverts commit
796eed4b2342c9d6b26c958e92af91253a2390e1.
This change causes boot lockups when using "arlyprintk=xdbc" because
ktime can not be used at this point in time in the boot process. Also,
it is not needed for very small delays like this.
Reported-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Fixes:
796eed4b2342 ("usb: early: convert to readl_poll_timeout_atomic()")
Link: https://lore.kernel.org/r/c2b5c9bb-1b75-bf56-3754-b5b18812d65e@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 17 Dec 2021 09:37:02 +0000 (10:37 +0100)]
Merge tag 'usb-serial-5.16-rc6' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.16-rc6
Here's a fix for a reported problem in the cp210x gpio-registration code
and some more modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.16-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit FN990 compositions
USB: serial: cp210x: fix CP2105 GPIO registration
Damien Le Moal [Fri, 17 Dec 2021 07:41:17 +0000 (16:41 +0900)]
MAITAINERS: Change zonefs maintainer email address
Update my email address from damien.lemoal@wdc.com to
damien.lemoal@opensource.wdc.com.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Naohiro Aota [Fri, 17 Dec 2021 06:15:45 +0000 (15:15 +0900)]
zonefs: add MODULE_ALIAS_FS
Add MODULE_ALIAS_FS() to load the module automatically when you do "mount
-t zonefs".
Fixes:
8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable <stable@vger.kernel.org> # 5.6+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:42 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Link the tmp451 with its power supply
Fixes the following probe warning:
lm90 0-004c: Looking up vcc-supply from device tree
lm90 0-004c: Looking up vcc-supply property in node /soc/i2c@
10030000/temperature-sensor@4c failed
lm90 0-004c: supply vcc not found, using dummy regulator
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:41 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Fix regulator for board rev3
The existing values are rejected by the da9063 regulator driver, as they
are unachievable with the declared chip setup (non-merged vcore and bmem
are unable to provide the declared curent).
Fix voltages to match rev3 schematics, which also matches their boot-up
configuration within the chip's available precision.
Declare bcore1/bcore2 and bmem/bio as merged.
Set ldo09 and ldo10 as always-on as their consumers are not declared but
exist.
Drop ldo current limits as there is no current limit feature for these
regulators in the DA9063. Fixes warnings like:
DA9063_LDO3: Operation of current configuration missing
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:39 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Expose the PMIC sub-functions
These sub-functions are available in the chip revision on this board, so
expose them.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:38 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Expose the board ID eeprom
Mark it as read-only as it is factory-programmed with identifying
information, and no executable nor configuration:
- eth MAC address
- board model (PCB version, BoM version)
- board serial number
Accidental modification would cause misidentification which could brick
the board, so marking read-only seem like both a safe and non-constraining
choice.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:37 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Name gpio lines
Follow the pin descriptions given in the version 3 of the board schematics.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dave Airlie [Fri, 17 Dec 2021 05:01:00 +0000 (15:01 +1000)]
Merge tag 'amd-drm-fixes-5.16-2021-12-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-12-15:
amdgpu:
- Fix RLC register offset
- GMC fix
- Properly cache SMU FW version on Yellow Carp
- Fix missing callback on DCN3.1
- Reset DMCUB before HW init
- Fix for GMC powergating on PCO
- Fix a possible memory leak in GPU metrics table handling on RN
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216035239.5787-1-alexander.deucher@amd.com
Dave Airlie [Fri, 17 Dec 2021 04:16:57 +0000 (14:16 +1000)]
Merge tag 'drm-misc-fixes-2021-12-16-1' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes
One null pointer dereference fix for ast, a pixel clock unit fix for
simpledrm and a user-space regression revert for fb-helper
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216082603.pm6yzlckmxvwnqyv@houat
George Kennedy [Tue, 14 Dec 2021 14:45:10 +0000 (09:45 -0500)]
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
Avoid data corruption by rejecting pass-through commands where
T_LENGTH is zero (No data is transferred) and the dma direction
is not DMA_NONE.
Cc: <stable@vger.kernel.org>
Reported-by: syzkaller<syzkaller@googlegroups.com>
Signed-off-by: George Kennedy<george.kennedy@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Thu, 16 Dec 2021 23:24:46 +0000 (15:24 -0800)]
Merge tag 'audit-pr-
20211216' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single patch to fix a problem where the audit queue could grow
unbounded when the audit daemon is forcibly stopped"
* tag 'audit-pr-
20211216' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: improve robustness of the audit queue handling
Linus Torvalds [Thu, 16 Dec 2021 23:02:14 +0000 (15:02 -0800)]
Merge tag 'net-5.16-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from mac80211, wifi, bpf.
Relatively large batches of fixes from BPF and the WiFi stack, calm in
general networking.
Current release - regressions:
- dpaa2-eth: fix buffer overrun when reporting ethtool statistics
Current release - new code bugs:
- bpf: fix incorrect state pruning for <8B spill/fill
- iavf:
- add missing unlocks in iavf_watchdog_task()
- do not override the adapter state in the watchdog task (again)
- mlxsw: spectrum_router: consolidate MAC profiles when possible
Previous releases - regressions:
- mac80211 fixes:
- rate control, avoid driver crash for retransmitted frames
- regression in SSN handling of addba tx
- a memory leak where sta_info is not freed
- marking TX-during-stop for TX in in_reconfig, prevent stall
- cfg80211: acquire wiphy mutex on regulatory work
- wifi drivers: fix build regressions and LED config dependency
- virtio_net: fix rx_drops stat for small pkts
- dsa: mv88e6xxx: unforce speed & duplex in mac_link_down()
Previous releases - always broken:
- bpf fixes:
- kernel address leakage in atomic fetch
- kernel address leakage in atomic cmpxchg's r0 aux reg
- signed bounds propagation after mov32
- extable fixup offset
- extable address check
- mac80211:
- fix the size used for building probe request
- send ADDBA requests using the tid/queue of the aggregation
session
- agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid
deadlocks
- validate extended element ID is present
- mptcp:
- never allow the PM to close a listener subflow (null-defer)
- clear 'kern' flag from fallback sockets, prevent crash
- fix deadlock in __mptcp_push_pending()
- inet_diag: fix kernel-infoleak for UDP sockets
- xsk: do not sleep in poll() when need_wakeup set
- smc: avoid very long waits in smc_release()
- sch_ets: don't remove idle classes from the round-robin list
- netdevsim:
- zero-initialize memory for bpf map's value, prevent info leak
- don't let user space overwrite read only (max) ethtool parms
- ixgbe: set X550 MDIO speed before talking to PHY
- stmmac:
- fix null-deref in flower deletion w/ VLAN prio Rx steering
- dwmac-rk: fix oob read in rk_gmac_setup
- ice: time stamping fixes
- systemport: add global locking for descriptor life cycle"
* tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits)
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
selftest/bpf: Add a test that reads various addresses.
bpf: Fix extable address check.
bpf: Fix extable fixup offset.
bpf, selftests: Add test case trying to taint map value pointer
bpf: Make 32->64 bounds propagation slightly more robust
bpf: Fix signed bounds propagation after mov32
sit: do not call ipip6_dev_free() from sit_init_net()
net: systemport: Add global locking for descriptor lifecycle
net/smc: Prevent smc_release() from long blocking
net: Fix double 0x prefix print in SKB dump
virtio_net: fix rx_drops stat for small pkts
dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED
sfc_ef100: potential dereference of null pointer
net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup
net: usb: lan78xx: add Allied Telesis AT29M2-AF
net/packet: rx_owner_map depends on pg_vec
netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
dpaa2-eth: fix ethtool statistics
ixgbe: set X550 MDIO speed before talking to PHY
...
Linus Torvalds [Thu, 16 Dec 2021 22:48:57 +0000 (14:48 -0800)]
Merge tag 'soc-fixes-5.16-3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of DT fixes, mostly for mistakes found through
static checking of the dts files again, as well as a couple of minor
changes to address incorrect DT settings.
For i.MX, there is yet another series of devitree changes to update
RGMII delay settings for ethernet, which is an ongoing problem after
some driver changes.
For SoC specific device drivers, a number of smaller fixes came up:
- i.MX SoC identification was incorrectly registered non-i.MX
machines when the driver is built-in
- One fix on imx8m-blk-ctrl driver to get i.MX8MM MIPI reset work
properly
- a few compile fixes for warnings that get in the way of -Werror
- a string overflow in the scpi firmware driver
- a boot failure with FORTIFY_SOURCE on Rockchips machines
- broken error handling in the AMD TEE driver
- a revert for a tegra reset driver commit that broke HDA"
* tag 'soc-fixes-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
soc/tegra: fuse: Fix bitwise vs. logical OR warning
firmware: arm_scpi: Fix string overflow in SCPI genpd driver
soc: imx: Register SoC device only on i.MX boards
soc: imx: imx8m-blk-ctrl: Fix imx8mm mipi reset
ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
arm64: dts: imx8mq: remove interconnect property from lcdif
ARM: socfpga: dts: fix qspi node compatible
arm64: dts: apple: add #interrupt-cells property to pinctrl nodes
dt-bindings: i2c: apple,i2c: allow multiple compatibles
arm64: meson: remove COMMON_CLK
arm64: meson: fix dts for JetHub D1
tee: amdtee: fix an IS_ERR() vs NULL bug
arm64: dts: apple: change ethernet0 device type to ethernet
arm64: dts: ten64: remove redundant interrupt declaration for gpio-keys
arm64: dts: rockchip: fix poweroff on helios64
arm64: dts: rockchip: fix audio-supply for Rock Pi 4
arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge
ARM: rockchip: Use memcpy_toio instead of memcpy on smp bring-up
...
Scott Mayhew [Wed, 15 Dec 2021 21:28:40 +0000 (16:28 -0500)]
selinux: fix sleeping function called from invalid context
selinux_sb_mnt_opts_compat() is called via sget_fc() under the sb_lock
spinlock, so it can't use GFP_KERNEL allocations:
[ 868.565200] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:230
[ 868.568246] in_atomic(): 1, irqs_disabled(): 0,
non_block: 0, pid: 4914, name: mount.nfs
[ 868.569626] preempt_count: 1, expected: 0
[ 868.570215] RCU nest depth: 0, expected: 0
[ 868.570809] Preemption disabled at:
[ 868.570810] [<
0000000000000000>] 0x0
[ 868.571848] CPU: 1 PID: 4914 Comm: mount.nfs Kdump: loaded
Tainted: G W 5.16.0-rc5.
2585cf9dfa #1
[ 868.573273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.14.0-4.fc34 04/01/2014
[ 868.574478] Call Trace:
[ 868.574844] <TASK>
[ 868.575156] dump_stack_lvl+0x34/0x44
[ 868.575692] __might_resched.cold+0xd6/0x10f
[ 868.576308] slab_pre_alloc_hook.constprop.0+0x89/0xf0
[ 868.577046] __kmalloc_track_caller+0x72/0x420
[ 868.577684] ? security_context_to_sid_core+0x48/0x2b0
[ 868.578569] kmemdup_nul+0x22/0x50
[ 868.579108] security_context_to_sid_core+0x48/0x2b0
[ 868.579854] ? _nfs4_proc_pathconf+0xff/0x110 [nfsv4]
[ 868.580742] ? nfs_reconfigure+0x80/0x80 [nfs]
[ 868.581355] security_context_str_to_sid+0x36/0x40
[ 868.581960] selinux_sb_mnt_opts_compat+0xb5/0x1e0
[ 868.582550] ? nfs_reconfigure+0x80/0x80 [nfs]
[ 868.583098] security_sb_mnt_opts_compat+0x2a/0x40
[ 868.583676] nfs_compare_super+0x113/0x220 [nfs]
[ 868.584249] ? nfs_try_mount_request+0x210/0x210 [nfs]
[ 868.584879] sget_fc+0xb5/0x2f0
[ 868.585267] nfs_get_tree_common+0x91/0x4a0 [nfs]
[ 868.585834] vfs_get_tree+0x25/0xb0
[ 868.586241] fc_mount+0xe/0x30
[ 868.586605] do_nfs4_mount+0x130/0x380 [nfsv4]
[ 868.587160] nfs4_try_get_tree+0x47/0xb0 [nfsv4]
[ 868.587724] vfs_get_tree+0x25/0xb0
[ 868.588193] do_new_mount+0x176/0x310
[ 868.588782] __x64_sys_mount+0x103/0x140
[ 868.589388] do_syscall_64+0x3b/0x90
[ 868.589935] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 868.590699] RIP: 0033:0x7f2b371c6c4e
[ 868.591239] Code: 48 8b 0d dd 71 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00
00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 71
0e 00 f7 d8 64 89 01 48
[ 868.593810] RSP: 002b:
00007ffc83775d88 EFLAGS:
00000246
ORIG_RAX:
00000000000000a5
[ 868.594691] RAX:
ffffffffffffffda RBX:
00007ffc83775f10 RCX:
00007f2b371c6c4e
[ 868.595504] RDX:
0000555d517247a0 RSI:
0000555d51724700 RDI:
0000555d51724540
[ 868.596317] RBP:
00007ffc83775f10 R08:
0000555d51726890 R09:
0000555d51726890
[ 868.597162] R10:
0000000000000000 R11:
0000000000000246 R12:
0000555d51726890
[ 868.598005] R13:
0000000000000003 R14:
0000555d517246e0 R15:
0000555d511ac925
[ 868.598826] </TASK>
Cc: stable@vger.kernel.org
Fixes:
69c4a42d72eb ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[PM: cleanup/line-wrap the backtrace]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jakub Kicinski [Thu, 16 Dec 2021 21:06:49 +0000 (13:06 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2021-12-16
We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 434 insertions(+), 30 deletions(-).
The main changes are:
1) Fix incorrect verifier state pruning behavior for <8B register spill/fill,
from Paul Chaignon.
2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer
is an ERR_PTR(), from Alexei Starovoitov.
3) Fix 3 different possibilities that BPF verifier missed where unprivileged
could leak kernel addresses, from Daniel Borkmann.
4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson.
5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump,
from Kumar Kartikeya Dwivedi.
6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
selftest/bpf: Add a test that reads various addresses.
bpf: Fix extable address check.
bpf: Fix extable fixup offset.
bpf, selftests: Add test case trying to taint map value pointer
bpf: Make 32->64 bounds propagation slightly more robust
bpf: Fix signed bounds propagation after mov32
bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg
bpf, selftests: Add test case for atomic fetch on spilled pointer
bpf: Fix kernel address leakage in atomic fetch
selftests/bpf: Fix OOB write in test_verifier
xsk: Do not sleep in poll() when need_wakeup set
selftests/bpf: Tests for state pruning with u32 spill/fill
bpf: Fix incorrect state pruning for <8B spill/fill
====================
Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau [Thu, 16 Dec 2021 19:16:30 +0000 (11:16 -0800)]
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
The libbpf CI reported occasional failure in btf_skc_cls_ingress:
test_syncookie:FAIL:Unexpected syncookie states gen_cookie:
80326634 recv_cookie:0
bpf prog error at line 97
"error at line 97" means the bpf prog cannot find the listening socket
when the final ack is received. It then skipped processing
the syncookie in the final ack which then led to "recv_cookie:0".
The problem is the userspace program did not do accept() and went
ahead to close(listen_fd) before the kernel (and the bpf prog) had
a chance to process the final ack.
The fix is to add accept() call so that the userspace will wait for
the kernel to finish processing the final ack first before close()-ing
everything.
Fixes:
9a856cae2217 ("bpf: selftest: Add test_btf_skc_cls_ingress")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211216191630.466151-1-kafai@fb.com
Alexei Starovoitov [Wed, 15 Dec 2021 20:35:34 +0000 (12:35 -0800)]
selftest/bpf: Add a test that reads various addresses.
Add a function to bpf_testmod that returns invalid kernel and user addresses.
Then attach an fexit program to that function that tries to read
memory through these addresses.
This logic checks that bpf_probe_read_kernel and BPF_PROBE_MEM logic is sane.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Wed, 15 Dec 2021 03:25:13 +0000 (19:25 -0800)]
bpf: Fix extable address check.
The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL,
but it cannot distinguish IS_ERR pointer from valid one.
When offset is added to IS_ERR pointer it may become small positive
value which is a user address that is not handled by extable logic
and has to be checked for at the runtime.
Tighten BPF_PROBE_MEM pointer check code to prevent this case.
Fixes:
4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Reported-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Thu, 16 Dec 2021 02:38:30 +0000 (18:38 -0800)]
bpf: Fix extable fixup offset.
The prog - start_of_ldx is the offset before the faulting ldx to the location
after it, so this will be used to adjust pt_regs->ip for jumping over it and
continuing, and with old temp it would have been fixed up to the wrong offset,
causing crash.
Fixes:
4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linus Torvalds [Thu, 16 Dec 2021 19:48:59 +0000 (11:48 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"A single fix for the clk framework that needed some more bake time in
linux-next.
The problem is that two clks being registered at the same time can
lead to a busted clk tree if the parent isn't fully registered by the
time the child finds the parent. We rejigger the place where we mark
the parent as fully registered so that the child can't find the parent
until things are proper"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Don't parent clks until the parent is fully registered
Daniel Borkmann [Wed, 15 Dec 2021 23:48:54 +0000 (23:48 +0000)]
bpf, selftests: Add test case trying to taint map value pointer
Add a test case which tries to taint map value pointer arithmetic into a
unknown scalar with subsequent export through the map.
Before fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 24 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
#1186/p map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 8 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
After fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg OK
#1186/p map access: trying to leak tained dst reg OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 15 Dec 2021 22:28:48 +0000 (22:28 +0000)]
bpf: Make 32->64 bounds propagation slightly more robust
Make the bounds propagation in __reg_assign_32_into_64() slightly more
robust and readable by aligning it similarly as we did back in the
__reg_combine_64_into_32() counterpart. Meaning, only propagate or
pessimize them as a smin/smax pair.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 15 Dec 2021 22:02:19 +0000 (22:02 +0000)]
bpf: Fix signed bounds propagation after mov32
For the case where both s32_{min,max}_value bounds are positive, the
__reg_assign_32_into_64() directly propagates them to their 64 bit
counterparts, otherwise it pessimises them into [0,u32_max] universe and
tries to refine them later on by learning through the tnum as per comment
in mentioned function. However, that does not always happen, for example,
in mov32 operation we call zext_32_to_64(dst_reg) which invokes the
__reg_assign_32_into_64() as is without subsequent bounds update as
elsewhere thus no refinement based on tnum takes place.
Thus, not calling into the __update_reg_bounds() / __reg_deduce_bounds() /
__reg_bound_offset() triplet as we do, for example, in case of ALU ops via
adjust_scalar_min_max_vals(), will lead to more pessimistic bounds when
dumping the full register state:
Before fix:
0: (b4) w0 = -1
1: R0_w=invP4294967295
(id=0,imm=
ffffffff,
smin_value=
4294967295,smax_value=
4294967295,
umin_value=
4294967295,umax_value=
4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
1: (bc) w0 = w0
2: R0_w=invP4294967295
(id=0,imm=
ffffffff,
smin_value=0,smax_value=
4294967295,
umin_value=
4294967295,umax_value=
4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
Technically, the smin_value=0 and smax_value=
4294967295 bounds are not
incorrect, but given the register is still a constant, they break assumptions
about const scalars that smin_value == smax_value and umin_value == umax_value.
After fix:
0: (b4) w0 = -1
1: R0_w=invP4294967295
(id=0,imm=
ffffffff,
smin_value=
4294967295,smax_value=
4294967295,
umin_value=
4294967295,umax_value=
4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
1: (bc) w0 = w0
2: R0_w=invP4294967295
(id=0,imm=
ffffffff,
smin_value=
4294967295,smax_value=
4294967295,
umin_value=
4294967295,umax_value=
4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
Without the smin_value == smax_value and umin_value == umax_value invariant
being intact for const scalars, it is possible to leak out kernel pointers
from unprivileged user space if the latter is enabled. For example, when such
registers are involved in pointer arithmtics, then adjust_ptr_min_max_vals()
will taint the destination register into an unknown scalar, and the latter
can be exported and stored e.g. into a BPF map value.
Fixes:
3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Kuee K1r0a <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Thu, 16 Dec 2021 18:44:20 +0000 (10:44 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix missing error code on kexec failure path"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()
Linus Torvalds [Thu, 16 Dec 2021 18:05:49 +0000 (10:05 -0800)]
Merge tag 'for-5.16/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix use after free in DM btree remove's rebalance_children()
- Fix DM integrity data corruption, introduced during 5.16 merge, due
to improper use of bvec_kmap_local()
* tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix data corruption due to improper use of bvec_kmap_local
dm btree remove: fix use after free in rebalance_children()
Lakshmi Ramasubramanian [Fri, 10 Dec 2021 01:01:21 +0000 (17:01 -0800)]
arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()
Since commit
ac10be5cdbfa ("arm64: Use common
of_kexec_alloc_and_setup_fdt()"), smatch reports the following warning:
arch/arm64/kernel/machine_kexec_file.c:152 load_other_segments()
warn: missing error code 'ret'
Return code is not set to an error code in load_other_segments() when
of_kexec_alloc_and_setup_fdt() call returns a NULL dtb. This results
in status success (return code set to 0) being returned from
load_other_segments().
Set return code to -EINVAL if of_kexec_alloc_and_setup_fdt() returns
NULL dtb.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20211210010121.101823-1-nramas@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
David Howells [Tue, 14 Dec 2021 09:22:12 +0000 (09:22 +0000)]
afs: Fix mmap
Fix afs_add_open_map() to check that the vnode isn't already on the list
when it adds it. It's possible that afs_drop_open_mmap() decremented
the cb_nr_mmap counter, but hadn't yet got into the locked section to
remove it.
Also vnode->cb_mmap_link should be initialised, so fix that too.
Fixes:
6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes")
Reported-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/686465.1639435380@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Thu, 16 Dec 2021 11:17:41 +0000 (03:17 -0800)]
sit: do not call ipip6_dev_free() from sit_init_net()
ipip6_dev_free is sit dev->priv_destructor, already called
by register_netdevice() if something goes wrong.
Alternative would be to make ipip6_dev_free() robust against
multiple invocations, but other drivers do not implement this
strategy.
syzbot reported:
dst_release underflow
WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
Modules linked in:
CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
RSP: 0018:
ffffc9000aa5faa0 EFLAGS:
00010246
RAX:
d6894a925dd15a00 RBX:
00000000ffffffff RCX:
0000000000040000
RDX:
ffffc90005e19000 RSI:
000000000003ffff RDI:
0000000000040000
RBP:
0000000000000000 R08:
ffffffff816a1f42 R09:
ffffed1017344f2c
R10:
ffffed1017344f2c R11:
0000000000000000 R12:
0000607f462b1358
R13:
1ffffffff1bfd305 R14:
ffffe8ffffcb1358 R15:
dffffc0000000000
FS:
00007f66c71a2700(0000) GS:
ffff8880b9a00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f88aaed5058 CR3:
0000000023e0f000 CR4:
00000000003506f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
ipip6_dev_free net/ipv6/sit.c:1414 [inline]
sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
ops_init+0x313/0x430 net/core/net_namespace.c:140
setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
__do_sys_unshare kernel/fork.c:3146 [inline]
__se_sys_unshare kernel/fork.c:3144 [inline]
__x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f66c882ce99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f66c71a2168 EFLAGS:
00000246 ORIG_RAX:
0000000000000110
RAX:
ffffffffffffffda RBX:
00007f66c893ff60 RCX:
00007f66c882ce99
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000048040200
RBP:
00007f66c8886ff1 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
00007fff6634832f R14:
00007f66c71a2300 R15:
0000000000022000
</TASK>
Fixes:
cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Wed, 15 Dec 2021 20:24:49 +0000 (12:24 -0800)]
net: systemport: Add global locking for descriptor lifecycle
The descriptor list is a shared resource across all of the transmit queues, and
the locking mechanism used today only protects concurrency across a given
transmit queue between the transmit and reclaiming. This creates an opportunity
for the SYSTEMPORT hardware to work on corrupted descriptors if we have
multiple producers at once which is the case when using multiple transmit
queues.
This was particularly noticeable when using multiple flows/transmit queues and
it showed up in interesting ways in that UDP packets would get a correct UDP
header checksum being calculated over an incorrect packet length. Similarly TCP
packets would get an equally correct checksum computed by the hardware over an
incorrect packet length.
The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges
when the driver produces a new descriptor anytime it writes to the
WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to
re-organize its descriptors and it is possible that concurrent TX queues
eventually break this internal allocation scheme to the point where the
length/status part of the descriptor gets used for an incorrect data buffer.
The fix is to impose a global serialization for all TX queues in the short
section where we are writing to the WRITE_PORT_{HI,LO} registers which solves
the corruption even with multiple concurrent TX queues being used.
Fixes:
80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211215202450.4086240-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
D. Wythe [Wed, 15 Dec 2021 12:29:21 +0000 (20:29 +0800)]
net/smc: Prevent smc_release() from long blocking
In nginx/wrk benchmark, there's a hung problem with high probability
on case likes that: (client will last several minutes to exit)
server: smc_run nginx
client: smc_run wrk -c 10000 -t 1 http://server
Client hangs with the following backtrace:
0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
2 [ffffa7ce8Of3bcaO] schedule_timeout at
ffffffff9f9e3f3c
3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
8 [ffffa7ce8øf3be48] __fput at
ffffffff9f334f93
9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
13 [ffffa7ce8Of3bf4O] do_syscall_64 at
ffffffff9f9d4373
14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at
ffffffff9fa0007c
This issue dues to flush_work(), which is used to wait for
smc_connect_work() to finish in smc_release(). Once lots of
smc_connect_work() was pending or all executing work dangling,
smc_release() has to block until one worker comes to free, which
is equivalent to wait another smc_connnect_work() to finish.
In order to fix this, There are two changes:
1. For those idle smc_connect_work(), cancel it from the workqueue; for
executing smc_connect_work(), waiting for it to finish. For that
purpose, replace flush_work() with cancel_work_sync().
2. Since smc_connect() hold a reference for passive closing, if
smc_connect_work() has been cancelled, release the reference.
Fixes:
24ac3a08e658 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Thu, 16 Dec 2021 14:02:25 +0000 (15:02 +0100)]
Merge tag 'tegra-for-5.16-soc-fixes' of git://git./linux/kernel/git/tegra/linux into arm/fixes
soc/tegra: Fixes for v5.16-rc6
This contains a single build fix without which ARM allmodconfig builds
are broken if -Werror is enabled.
* tag 'tegra-for-5.16-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: fuse: Fix bitwise vs. logical OR warning
Link: https://lore.kernel.org/r/20211215162618.3568474-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Gal Pressman [Thu, 16 Dec 2021 09:28:25 +0000 (11:28 +0200)]
net: Fix double 0x prefix print in SKB dump
When printing netdev features %pNF already takes care of the 0x prefix,
remove the explicit one.
Fixes:
6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>