Shakeel Butt [Thu, 5 Apr 2018 23:21:50 +0000 (16:21 -0700)]
slab, slub: remove size disparity on debug kernel
I have noticed on debug kernel with SLAB, the size of some non-root
slabs were larger than their corresponding root slabs.
e.g. for radix_tree_node:
$cat /proc/slabinfo | grep radix
name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
radix_tree_node 15052 15075 4096 1 1 ...
$cat /cgroup/memory/temp/memory.kmem.slabinfo | grep radix
name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
radix_tree_node 1581 158 4120 1 2 ...
However for SLUB in debug kernel, the sizes were same. On further
inspection it is found that SLUB always use kmem_cache.object_size to
measure the kmem_cache.size while SLAB use the given kmem_cache.size.
In the debug kernel the slab's size can be larger than its object_size.
Thus in the creation of non-root slab, the SLAB uses the root's size as
base to calculate the non-root slab's size and thus non-root slab's size
can be larger than the root slab's size. For SLUB, the non-root slab's
size is measured based on the root's object_size and thus the size will
remain same for root and non-root slab.
This patch makes slab's object_size the default base to measure the
slab's size.
Link: http://lkml.kernel.org/r/20180313165428.58699-1-shakeelb@google.com
Fixes:
794b1248be4e ("memcg, slab: separate memcg vs root cache creation paths")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:46 +0000 (16:21 -0700)]
slab: use 32-bit arithmetic in freelist_randomize()
SLAB doesn't support 4GB+ of objects per slab, therefore randomization
doesn't need size_t.
Link: http://lkml.kernel.org/r/20180305200730.15812-25-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:43 +0000 (16:21 -0700)]
slub: make size_from_object() return unsigned int
Function returns size of the object without red zone which can't be
negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-24-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:39 +0000 (16:21 -0700)]
slub: make struct kmem_cache_order_objects::x unsigned int
struct kmem_cache_order_objects is for mixing order and number of
objects, and orders aren't big enough to warrant 64-bit width.
Propagate unsignedness down so that everything fits.
!!! Patch assumes that "PAGE_SIZE << order" doesn't overflow. !!!
Link: http://lkml.kernel.org/r/20180305200730.15812-23-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:35 +0000 (16:21 -0700)]
slub: make slab_index() return unsigned int
slab_index() returns index of an object within a slab which is at most
u15 (or u16?).
Iterators additionally guarantee that "p >= addr".
Link: http://lkml.kernel.org/r/20180305200730.15812-22-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:31 +0000 (16:21 -0700)]
slab: make usercopy region 32-bit
If kmem case sizes are 32-bit, then usecopy region should be too.
Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:28 +0000 (16:21 -0700)]
kasan: make kasan_cache_create() work with 32-bit slab cache sizes
If SLAB doesn't support 4GB+ kmem caches (it never did), KASAN should
not do it as well.
Link: http://lkml.kernel.org/r/20180305200730.15812-20-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:24 +0000 (16:21 -0700)]
slab: make kmem_cache_flags accept 32-bit object size
Now that all sizes are properly typed, propagate "unsigned int" down the
callgraph.
Link: http://lkml.kernel.org/r/20180305200730.15812-19-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:20 +0000 (16:21 -0700)]
slub: make ->size unsigned int
Linux doesn't support negative length objects (including meta data).
Link: http://lkml.kernel.org/r/20180305200730.15812-18-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:17 +0000 (16:21 -0700)]
slub: make ->object_size unsigned int
Linux doesn't support negative length objects.
Link: http://lkml.kernel.org/r/20180305200730.15812-17-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:13 +0000 (16:21 -0700)]
slub: make ->offset unsigned int
->offset is free pointer offset from the start of the object, can't be
negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-16-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:10 +0000 (16:21 -0700)]
slub: make ->cpu_partial unsigned int
/*
* cpu_partial determined the maximum number of objects
* kept in the per cpu partial lists of a processor.
*/
Can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:06 +0000 (16:21 -0700)]
slub: make ->inuse unsigned int
->inuse is "the number of bytes in actual use by the object",
can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-14-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:21:02 +0000 (16:21 -0700)]
slub: make ->align unsigned int
Kmem cache alignment can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-13-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:58 +0000 (16:20 -0700)]
slub: make ->reserved unsigned int
->reserved is either 0 or sizeof(struct rcu_head), can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-12-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:55 +0000 (16:20 -0700)]
slub: make ->red_left_pad unsigned int
Padding length can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-11-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:51 +0000 (16:20 -0700)]
slub: make ->max_attr_size unsigned int
->max_attr_size is maximum length of every SLAB memcg attribute
ever written. VFS limits those to INT_MAX.
Link: http://lkml.kernel.org/r/20180305200730.15812-10-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:48 +0000 (16:20 -0700)]
slub: make ->remote_node_defrag_ratio unsigned int
->remote_node_defrag_ratio is in range 0..1000.
This also adds a check and modifies the behavior to return an error
code. Before this patch invalid values were ignored.
Link: http://lkml.kernel.org/r/20180305200730.15812-9-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:44 +0000 (16:20 -0700)]
slab: make size_index_elem() unsigned int
size_index_elem() always works with small sizes (kmalloc caches are
32-bit) and returns small indexes.
Link: http://lkml.kernel.org/r/20180305200730.15812-8-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:40 +0000 (16:20 -0700)]
slab: make size_index[] array u8
All those small numbers are reverse indexes into kmalloc caches array
and can't be negative.
On x86_64 "unsigned int = fls()" can drop CDQE instruction:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-2 (-2)
Function old new delta
kmalloc_slab 101 99 -2
Link: http://lkml.kernel.org/r/20180305200730.15812-7-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:37 +0000 (16:20 -0700)]
slab: make kmem_cache_create() work with 32-bit sizes
struct kmem_cache::size and ::align were always 32-bit.
Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
kmem_cache_create(1UL<<32+1) created 1-byte cache as expected.
size_t doesn't work and never did.
Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:33 +0000 (16:20 -0700)]
slab: make create_boot_cache() work with 32-bit sizes
struct kmem_cache::size has always been "int", all those
"size_t size" are fake.
Link: http://lkml.kernel.org/r/20180305200730.15812-5-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:29 +0000 (16:20 -0700)]
slab: make create_kmalloc_cache() work with 32-bit sizes
KMALLOC_MAX_CACHE_SIZE is 32-bit so is the largest kmalloc cache size.
Christoph said:
:
: Ok SLABs maximum allocation size is limited to 32M (see
: include/linux/slab.h:
:
: #define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
: (MAX_ORDER + PAGE_SHIFT - 1) : 25)
:
: And SLUB/SLOB pass all larger requests to the page allocator anyways.
Link: http://lkml.kernel.org/r/20180305200730.15812-4-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:26 +0000 (16:20 -0700)]
slab: make kmalloc_size() return "unsigned int"
kmalloc_size() derives size of kmalloc cache from internal index, which
can't be negative.
Propagate unsignedness a bit.
Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:22 +0000 (16:20 -0700)]
slab: make kmalloc_index() return "unsigned int"
kmalloc_index() return index into an array of kmalloc kmem caches,
therefore should be unsigned.
Space savings with SLUB on trimmed down .config:
add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472)
Function old new delta
calculate_sizes 924 983 +59
on_freelist 589 604 +15
init_cache_random_seq 122 127 +5
ext4_mb_init 1206 1210 +4
slab_pad_check.part 270 271 +1
cpu_partial_store 112 113 +1
usersize_show 28 27 -1
...
new_slab 1871 1837 -34
slab_order 204 - -204
This patch start a series of converting SLUB (mostly) to "unsigned int".
1) Most integers in the code are in fact unsigned entities: array
indexes, lengths, buffer sizes, allocation orders. It is therefore
better to use unsigned variables
2) Some integers in the code are either "size_t" or "unsigned long" for
no reason.
size_t usually comes from people trying to maintain type correctness
and figuring out that "sizeof" operator returns size_t or
memset/memcpy takes size_t so should everything passed to it.
However the number of 4GB+ objects in the kernel is very small. Most,
if not all, dynamically allocated objects with kmalloc() or
kmem_cache_create() aren't actually big. Maintaining wide types
doesn't do anything.
64-bit ops are bigger than 32-bit on our beloved x86_64,
so try to not use 64-bit where it isn't necessary
(read: everywhere where integers are integers not pointers)
3) in case of SLAB allocators, there are additional limitations
*) page->inuse, page->objects are only 16-/15-bit,
*) cache size was always 32-bit
*) slab orders are small, order 20 is needed to go 64-bit on x86_64
(PAGE_SIZE << order)
Basically everything is 32-bit except kmalloc(1ULL<<32) which gets
shortcut through page allocator.
Christoph said:
:
: That changes with large base page size on power and ARM64 f.e. but then
: we do not want to encourage larger allocations through slab anyways.
Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:18 +0000 (16:20 -0700)]
slab: fixup calculate_alignment() argument type
Link: http://lkml.kernel.org/r/20180305200730.15812-1-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chintan Pandya [Thu, 5 Apr 2018 23:20:15 +0000 (16:20 -0700)]
mm/slub.c: use jitter-free reference while printing age
When SLUB_DEBUG catches some issues, it prints all the required debug
info. However, in a few cases where allocation and free of the object
has happened in a very short time, 'age' might be misleading. See the
example below:
=============================================================================
BUG kmalloc-256 (Tainted: G W O ): Poison overwritten
-----------------------------------------------------------------------------
...
INFO: Allocated in binder_transaction+0x4b0/0x2448 age=731 cpu=3 pid=5314
...
INFO: Freed in binder_free_transaction+0x2c/0x58 age=735 cpu=6 pid=2079
...
Object
fffffff14956a870: 6b 6b 6b 6b 6b 6b 6b 6b 67 6b 6b 6b 6b 6b 6b a5 kkkkkkkkgkkkk
In this case, object got freed later but 'age' shows otherwise. This
could be because, while printing this info, we print allocation traces
first and free traces thereafter. In between, if we get schedule out or
jiffies increment, (jiffies - t->when) could become meaningless.
Use the jitter free reference to calculate age.
New output will exactly be same. 'age' is still staying with single
jiffies ref in both prints.
Change-Id: I0846565807a4229748649bbecb1ffb743d71fcd8
Link: http://lkml.kernel.org/r/1520492010-19389-1-git-send-email-cpandya@codeaurora.org
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 5 Apr 2018 23:20:11 +0000 (16:20 -0700)]
mm/slab_common.c: mark kmalloc machinery as __ro_after_init
kmalloc caches aren't relocated after being set up neither does
"size_index" array.
Link: http://lkml.kernel.org/r/20180226203519.GA6886@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shunki-fujita [Thu, 5 Apr 2018 23:20:07 +0000 (16:20 -0700)]
fs: don't flush pagecache when expanding block device
When changing the size of a block device, its all caches are freed.
It's necessary on shrinking to prevent spurious I/Os to the disappeared
region. However, on expanding, such kind of I/Os doesn't happen.
Similar things can be considered for btrfs filesystem resize and
resize2fs, but they are designed not to drop caches when expanding.
Therefore this patch removes unnecessary cache drop.
Link: http://lkml.kernel.org/r/1521457240-153390-1-git-send-email-shunki-fujita@cybozu.co.jp
Signed-off-by: Shunki Fujita <shunki-fujita@cybozu.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Thu, 5 Apr 2018 23:20:01 +0000 (16:20 -0700)]
net/9p/client.c: fix potential refcnt problem of trans module
When specifying trans_mod multiple times in a mount, it will cause an
inaccurate refcount of the trans module. Also, in the error case of
option parsing, we should put the trans module if we have already got
it.
Link: http://lkml.kernel.org/r/1522154942-57339-1-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yiwen Jiang [Thu, 5 Apr 2018 23:19:57 +0000 (16:19 -0700)]
fs/9p: don't set SB_NOATIME by default
When the user uses some syscall, for example mmap(v9fs_file_mmap), it
will not update atime even if user's was set mnt_flags without
MNT_NOATIME, because v9fs defaults to settine SB_NOATIME in
v9fs_set_super.
For supporting access time updating when the user mounts with relatime,
we should not set SB_NOATIME by default.
Link: http://lkml.kernel.org/r/5AB9A377.6080906@huawei.com
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Thu, 5 Apr 2018 23:19:53 +0000 (16:19 -0700)]
9p: check memory allocation result for cachetag
Check memory allocation result for cachetag in mount option parsing and
fix potential memory leak in the error case.
Link: http://lkml.kernel.org/r/1521614889-73446-1-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eryu Guan [Thu, 5 Apr 2018 23:19:49 +0000 (16:19 -0700)]
9p: don't maintain dir i_nlink if the exported fs doesn't either
If the exported filesystem dir on 9p server doesn't maintain accurate
i_nlink count, e.g. always reports i_nlink as 1, then 9p should not
maintain nlink count either, otherwise drop_link would report warning
with i_nlink being zero.
For example:
- overlayfs sets nlink to 1 for merged dir
- ext4 (with dir_nlink feature enabled) sets nlink to 1 if a dir has
more than EXT4_LINK_MAX (65000) links.
In this case, everytime a stat(2) call (getattr) on such exported dirs
on 9p client side, the i_nlink gets reset to 1, then operations like
rmdir(2), unlink(2) and rename(2) would cause the dir nlink to go to
zero (then negative), which results in warnings in drop_nlink() and/or
inc_nlink() calls.
This can be reproduced easily as the following steps:
- export a merged overlayfs dir via qemu virtfs to guest
- mount the exported virtfs in guest
- create two sub-directories in the root dir of the mounted 9pfs
- stat the root dir of 9pfs, this resets nlink to 1
- remove all subdirs, the second unlink/rmdir would trigger warning
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1284 at fs/inode.c:282 drop_nlink+0x3e/0x50
...
Call Trace:
dump_stack+0x63/0x81
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
drop_nlink+0x3e/0x50
v9fs_remove+0xaa/0x130 [9p]
v9fs_vfs_rmdir+0x13/0x20 [9p]
vfs_rmdir+0xb7/0x130
do_rmdir+0x1b8/0x230
SyS_unlinkat+0x22/0x30
do_syscall_64+0x67/0x180
---[ end trace
43758d8ba91e603b ]---
Fix it by leaving i_nlink to be 1 and don't drop nlink if a directory
has nlink <= 2, which indicates that the underlying exported fs doesn't
maintain nlink count accurately. This follows what ext4 does in
ext4_dec_count().
Link: http://lkml.kernel.org/r/20180312053829.4367-1-eguan@linux.alibaba.com
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Tested-by: Roman Kapl <code@rkapl.cz>
Cc: Caspar Zhang <caspar@linux.alibaba.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Kurz [Thu, 5 Apr 2018 23:19:44 +0000 (16:19 -0700)]
net/9p: avoid -ERESTARTSYS leak to userspace
If it was interrupted by a signal, the 9p client may need to send some
more requests to the server for cleanup before returning to userspace.
To avoid such a last minute request to be interrupted right away, the
client memorizes if a signal is pending, clears TIF_SIGPENDING, handles
the request and calls recalc_sigpending() before returning.
Unfortunately, if the transmission of this cleanup request fails for any
reason, the transport returns an error and the client propagates it
right away, without calling recalc_sigpending().
This ends up with -ERESTARTSYS from the initially interrupted request
crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
request. The specific signal handling code, which is responsible for
converting -ERESTARTSYS to -EINTR is not called, and userspace receives
the confusing errno value:
open: Unknown error 512 (512)
This is really hard to hit in real life. I discovered the issue while
working on hot-unplug of a virtio-9p-pci device with an instrumented
QEMU allowing to control request completion.
Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
error path actually. Their code flow is a bit obscure and the best
thing to do would probably be a full rewrite: to really ensure this
situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
never happen.
But given the general lack of interest for the 9p code, I won't risk
breaking more things. So this patch simply fixes the buggy paths in
both functions with a trivial label+goto.
Thanks to Laurent Dufour for his help and suggestions on how to find the
root cause and how to fix it.
Link: http://lkml.kernel.org/r/152062809886.10599.7361006774123053312.stgit@bahia.lan
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: David Miller <davem@davemloft.net>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:19:40 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused variable in dlm_process_recovery_data
Link: http://lkml.kernel.org/r/1522734135-7933-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:19:36 +0000 (16:19 -0700)]
ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio
We need to check len for bio_add_page() to make sure the bio has been
set up correctly, otherwise we may submit incorrect data to device.
Link: http://lkml.kernel.org/r/5ABC3EBE.5020807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Thu, 5 Apr 2018 23:19:32 +0000 (16:19 -0700)]
ocfs2: add duplicated ino number check
Add duplicated ino number check, to avoid adding a file into the file
check list when this file is being checked.
Link: http://lkml.kernel.org/r/1495611866-27360-5-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Thu, 5 Apr 2018 23:19:29 +0000 (16:19 -0700)]
ocfs2: add kobject for online file check
Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data, meanwhile, reduce the code lines and make the code logic
clear. The changed code is based on Goldwyn Rodrigues's patches and
ext4 fs code.
Link: http://lkml.kernel.org/r/1495611866-27360-4-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Thu, 5 Apr 2018 23:19:25 +0000 (16:19 -0700)]
ocfs2: fix some small problems
First, move setting fe_done = 1 in spin lock, avoid bring any potential
race condition.
Second, tune mlog message level from ERROR to NOTICE, since the message
should not belong to error message.
Third, tune errno to -EAGAIN when file check queue is full, this errno
is more appropriate in the case.
Link: http://lkml.kernel.org/r/1495611866-27360-3-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Thu, 5 Apr 2018 23:19:22 +0000 (16:19 -0700)]
ocfs2: move some definitions to header file
Patch series "ocfs2: use kobject for online file check", v3.
Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data. The changed code is based on Goldwyn Rodrigues's patches
and ext4 fs code, there is not any new features added, except some very
small fixes during this code refactoring. Second, the code change does
not affect the underlying file check code. Thank Goldwyn very much.
Compare with second version, add more comments in the patch
descriptions, to make sure each modification is mentioned. Compare with
first version, split the code change into four patches, make sure each
patch will not bring ocfs2 kernel modules compiling errors.
This patch (of 3):
Move some definitions to header file, which will be referenced by other
source files when kobject mechanism is introduced.
Link: http://lkml.kernel.org/r/1495611866-27360-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:19:18 +0000 (16:19 -0700)]
ocfs2: correct spelling mistake for migratable for all
Inspired by the ocfs2 patch to fix the spelling of migrateable to
migratable, I checked all ocfs2 files and found more spelling mistakes.
So correct them all.
Link: http://lkml.kernel.org/r/1521525734-19576-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Thu, 5 Apr 2018 23:19:14 +0000 (16:19 -0700)]
ocfs2: fix spelling mistake: "Migrateable" -> "Migratable"
Trivial fix to spelling mistake in mlog message text
Link: http://lkml.kernel.org/r/20180319114101.2051-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:19:11 +0000 (16:19 -0700)]
ocfs2/dlm: wait for dlm recovery done when migrating all lock resources
Wait for dlm recovery done when migrating all lock resources in case that
new lock resource left after leaving dlm domain. And the left lock
resource will cause other nodes BUG.
NodeA NodeB NodeC
umount:
dlm_unregister_domain()
dlm_migrate_all_locks()
NodeB down
do recovery for NodeB
and collect a new lockres
form other live nodes:
dlm_do_recovery
dlm_remaster_locks
dlm_request_all_locks:
dlm_mig_lockres_handler
dlm_new_lockres
__dlm_insert_lockres
at last NodeA become the
master of the new lockres
and leave domain:
dlm_leave_domain()
mount:
dlm_join_domain()
touch file and request
for the owner of the new
lockres, but all the
other nodes said 'NO',
so NodeC decide to be
the owner, and send do
assert msg to other
nodes:
dlmlock()
dlm_get_lock_resource()
dlm_do_assert_master()
other nodes receive the msg
and found two masters exist.
at last cause BUG in
dlm_assert_master_handler()
-->BUG();
Link: http://lkml.kernel.org/r/5AAA6E25.7090303@huawei.com
Fixes:
bc9838c4d44a ("dlm: allow dlm do recovery during shutdown")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:19:07 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused stack variable in dlm_do_local_ast()
Link: http://lkml.kernel.org/r/1521116681-14602-2-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:19:03 +0000 (16:19 -0700)]
ocfs2/dlm: clean up unused argument for dlm_destroy_recovery_area()
Link: http://lkml.kernel.org/r/1521116681-14602-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:19:00 +0000 (16:19 -0700)]
fs/ocfs2/dlm/dlmrecovery.c: remove unrelated comment
Obviously, the comment before dlm_do_local_recovery_cleanup() has nothing
to do with it. So remove it.
Link: http://lkml.kernel.org/r/1519371054-4648-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 5 Apr 2018 23:18:56 +0000 (16:18 -0700)]
ocfs2: remove two unused functions from suballoc.c
The two functions are no longer used.
Link: http://lkml.kernel.org/r/1519609595-26229-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:18:52 +0000 (16:18 -0700)]
ocfs2: remove unnecessary null pointer check before kmem_cache_destroy()
As kmem_cache_destroy() already handles null pointers, so we can remove
the conditional test entirely.
Link: http://lkml.kernel.org/r/5A9EB21D.3000209@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jun Piao [Thu, 5 Apr 2018 23:18:48 +0000 (16:18 -0700)]
ocfs2/dlm: don't handle migrate lockres if already in shutdown
We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain. At last other nodes will get stuck into infinite loop when
requsting lock from us.
The problem is caused by concurrency umount between nodes. Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target. So N2 will continue sending lockres to N1 even though
N1 has left domain.
N1 N2 (owner)
touch file
access the file,
and get pr lock
begin leave domain and
pick up N1 as new owner
begin leave domain and
migrate all lockres done
begin migrate lockres to N1
end leave domain, but
the lockres left
unexpectedly, because
migrate task has passed
[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jia Guo [Thu, 5 Apr 2018 23:18:45 +0000 (16:18 -0700)]
ocfs2: keep the trace point consistent with the function name
Keep the trace point consistent with the function name.
Link: http://lkml.kernel.org/r/02609aba-84b2-a22d-3f3b-bc1944b94260@huawei.com
Fixes:
3ef045c3d8ae ("ocfs2: switch to ->write_iter()")
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Acked-by: Gang He <ghe@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:18:41 +0000 (16:18 -0700)]
ocfs2: remove some unused function declarations
Remove some unused function declarations in dlmcommon.h.
Link: http://lkml.kernel.org/r/5A7D1034.7050807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:18:37 +0000 (16:18 -0700)]
ocfs2: use 'oi' instead of 'OCFS2_I()'
We could use 'oi' instead of 'OCFS2_I()' to make code more elegant.
Link: http://lkml.kernel.org/r/5A7020FE.5050906@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 5 Apr 2018 23:18:33 +0000 (16:18 -0700)]
ocfs2: use 'osb' instead of 'OCFS2_SB()'
We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant.
Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changbin Du [Thu, 5 Apr 2018 23:18:29 +0000 (16:18 -0700)]
scripts/faddr2line: show the code context
Inspired by gdb command 'list', show the code context of target lines.
Here is a example:
$ scripts/faddr2line vmlinux native_write_msr+0x6
native_write_msr+0x6/0x20:
arch_static_branch at arch/x86/include/asm/msr.h:105
100 return EAX_EDX_VAL(val, low, high);
101 }
102
103 static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high)
104 {
105 asm volatile("1: wrmsr\n"
106 "2:\n"
107 _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
108 : : "c" (msr), "a"(low), "d" (high) : "memory");
109 }
110
(inlined by) static_key_false at include/linux/jump_label.h:142
137 #define JUMP_TYPE_LINKED 2UL
138 #define JUMP_TYPE_MASK 3UL
139
140 static __always_inline bool static_key_false(struct static_key *key)
141 {
142 return arch_static_branch(key, false);
143 }
144
145 static __always_inline bool static_key_true(struct static_key *key)
146 {
147 return !arch_static_branch(key, true);
(inlined by) native_write_msr at arch/x86/include/asm/msr.h:150
145 static inline void notrace
146 native_write_msr(unsigned int msr, u32 low, u32 high)
147 {
148 __wrmsr(msr, low, high);
149
150 if (msr_tracepoint_active(__tracepoint_write_msr))
151 do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
152 }
153
154 /* Can be uninlined because referenced by paravirt */
155 static inline int notrace
Link: http://lkml.kernel.org/r/1521444205-2259-1-git-send-email-changbin.du@intel.com
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yury Norov [Thu, 5 Apr 2018 23:18:25 +0000 (16:18 -0700)]
lib: fix stall in __bitmap_parselist()
syzbot is catching stalls at __bitmap_parselist()
(https://syzkaller.appspot.com/bug?id=
ad7e0351fbc90535558514a71cd3edc11681997a).
The trigger is
unsigned long v = 0;
bitmap_parselist("7:,", &v, BITS_PER_LONG);
which results in hitting infinite loop at
while (a <= b) {
off = min(b - a + 1, used_size);
bitmap_set(maskp, a, off);
a += group_size;
}
due to used_size == group_size == 0.
Link: http://lkml.kernel.org/r/20180404162647.15763-1-ynorov@caviumnetworks.com
Fixes:
0a5ce0831d04382a ("lib/bitmap.c: make bitmap_parselist() thread-safe and much faster")
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+6887cbb011c8054e8a3d@syzkaller.appspotmail.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Thu, 5 Apr 2018 23:18:21 +0000 (16:18 -0700)]
hugetlbfs: fix bug in pgoff overflow checking
This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup. The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.
On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset. Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).
Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com
Fixes:
63489f8e8211 ("hugetlbfs: check for pgoff value overflow")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nic Losby <blurbdust@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huacai Chen [Thu, 5 Apr 2018 23:18:18 +0000 (16:18 -0700)]
zboot: fix stack protector in compressed boot phase
Calling __stack_chk_guard_setup() in decompress_kernel() is too late
that stack checking always fails for decompress_kernel() itself. So
remove __stack_chk_guard_setup() and initialize __stack_chk_guard before
we call decompress_kernel().
Original code comes from ARM but also used for MIPS and SH, so fix them
together. If without this fix, compressed booting of these archs will
fail because stack checking is enabled by default (>=4.16).
Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
Fixes:
8779657d29c0 ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Acked-by: James Hogan <jhogan@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Rich Felker <dalias@libc.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 5 Apr 2018 03:07:20 +0000 (20:07 -0700)]
Merge tag 'char-misc-4.17-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
"Here is the big set of char/misc driver patches for 4.17-rc1.
There are a lot of little things in here, nothing huge, but all
important to the different hardware types involved:
- thunderbolt driver updates
- parport updates (people still care...)
- nvmem driver updates
- mei updates (as always)
- hwtracing driver updates
- hyperv driver updates
- extcon driver updates
- ... and a handful of even smaller driver subsystem and individual
driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
hwtracing: Add HW tracing support menu
intel_th: Add ACPI glue layer
intel_th: Allow forcing host mode through drvdata
intel_th: Pick up irq number from resources
intel_th: Don't touch switch routing in host mode
intel_th: Use correct method of finding hub
intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
stm class: Make dummy's master/channel ranges configurable
stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
hv: add SPDX license id to Kconfig
hv: add SPDX license to trace
Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
/dev/mem: Avoid overwriting "err" in read_mem()
eeprom: at24: use SPDX identifier instead of GPL boiler-plate
eeprom: at24: simplify the i2c functionality checking
eeprom: at24: fix a line break
eeprom: at24: tweak newlines
eeprom: at24: refactor at24_probe()
...
Linus Torvalds [Thu, 5 Apr 2018 02:41:45 +0000 (19:41 -0700)]
Merge tag 'driver-core-4.17-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core patches for 4.17-rc1.
There's really not much here, just a bunch of firmware code
refactoring from Luis as he attempts to wrangle that codebase into
something that is managable, along with a bunch of userspace tests for
it. Other than that, a handful of small bugfixes and reverts of things
that didn't work out.
Full details are in the shortlog, it's not all that much.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
drivers: base: remove check for callback in coredump_store()
mt7601u: use firmware_request_cache() to address cache on reboot
firmware: add firmware_request_cache() to help with cache on reboot
firmware: fix typo on pr_info_once() when ignore_sysfs_fallback is used
firmware: explicitly include vmalloc.h
firmware: ensure the firmware cache is not used on incompatible calls
test_firmware: modify custom fallback tests to use unique files
firmware: add helper to check to see if fw cache is setup
firmware: fix checking for return values for fw_add_devm_name()
rename: _request_firmware_load() fw_load_sysfs_fallback()
test_firmware: test three firmware kernel configs using a proc knob
test_firmware: expand on library with shared helpers
firmware: enable to force disable the fallback mechanism at run time
firmware: enable run time change of forcing fallback loader
firmware: move firmware loader into its own directory
firmware: split firmware fallback functionality into its own file
firmware: move loading timeout under struct firmware_fallback_config
firmware: use helpers for setting up a temporary cache timeout
firmware: simplify CONFIG_FW_LOADER_USER_HELPER_FALLBACK further
drivers: base: add description for .coredump() callback
...
Linus Torvalds [Thu, 5 Apr 2018 01:56:27 +0000 (18:56 -0700)]
Merge tag 'staging-4.17-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big set of Staging/IIO driver patches for 4.17-rc1.
It is a lot, over 500 changes, but not huge by previous kernel release
standards. We deleted more lines than we added again (27k added vs.
91k remvoed), thanks to finally being able to delete the IRDA drivers
and networking code.
We also deleted the ccree crypto driver, but that's coming back in
through the crypto tree to you, in a much cleaned-up form.
Added this round is at lot of "mt7621" device support, which is for an
embedded device that Neil Brown cares about, and of course a handful
of new IIO drivers as well.
And finally, the fsl-mc core code moved out of the staging tree to the
"real" part of the kernel, which is nice to see happen as well.
Full details are in the shortlog, which has all of the tiny cleanup
patches described.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (579 commits)
staging: rtl8723bs: Remove yield call, replace with cond_resched()
staging: rtl8723bs: Replace yield() call with cond_resched()
staging: rtl8723bs: Remove unecessary newlines from 'odm.h'.
staging: rtl8723bs: Rework 'struct _ODM_Phy_Status_Info_' coding style.
staging: rtl8723bs: Rework 'struct _ODM_Per_Pkt_Info_' coding style.
staging: rtl8723bs: Replace NULL pointer comparison with '!'.
staging: rtl8723bs: Factor out rtl8723bs_recv_tasklet() sections.
staging: rtl8723bs: Fix function signature that goes over 80 characters.
staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().
staging: rtl8723bs: Remove unnecessary blank lines in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Change camel case to snake case in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Add missing braces in else statement.
staging: rtl8723bs: Add spaces around ternary operators.
staging: rtl8723bs: Fix lines with trailing open parentheses.
staging: rtl8723bs: Remove unnecessary length #define's.
staging: rtl8723bs: Fix IEEE80211 authentication algorithm constants.
staging: rtl8723bs: Fix alignment in rtw_wx_set_auth().
staging: rtl8723bs: Remove braces from single statement conditionals.
staging: rtl8723bs: Remove unecessary braces from switch statement.
staging: rtl8723bs: Fix newlines in rtw_wx_set_auth().
...
Linus Torvalds [Thu, 5 Apr 2018 01:43:49 +0000 (18:43 -0700)]
Merge tag 'tty-4.17-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver patches for 4.17-rc1
Not all that big really, most are just small fixes and additions to
existing drivers. There's a bunch of work on the imx serial driver
recently for some reason, and a new embedded serial driver added as
well.
Full details are in the shortlog.
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
serial: expose buf_overrun count through proc interface
serial: mvebu-uart: fix tx lost characters
tty: serial: msm_geni_serial: Fix return value check in qcom_geni_serial_probe()
tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
8250-men-mcb: add support for 16z025 and 16z057
powerpc: Mark the variable earlycon_acpi_spcr_enable maybe_unused
serial: stm32: fix initialization of RS485 mode
ARM: dts: STi: Remove "console=ttyASN" from bootargs for STi boards
vt: change SGR 21 to follow the standards
serdev: Fix typo in serdev_device_alloc
ARM: dts: STi: Fix aliases property name for STi boards
tty: st-asc: Update tty alias
serial: stm32: add support for RS485 hardware control mode
dt-bindings: serial: stm32: add RS485 optional properties
selftests: add devpts selftests
devpts: comment devpts_mntget()
devpts: resolve devpts bind-mounts
devpts: hoist out check for DEVPTS_SUPER_MAGIC
serial: 8250: Add Nuvoton NPCM UART
serial: mxs-auart: disable clks of Alphascale ASM9260
...
Linus Torvalds [Thu, 5 Apr 2018 00:55:35 +0000 (17:55 -0700)]
Merge tag 'usb-4.17-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY updates from Greg KH:
"Here is the big set of USB and PHY driver patches for 4.17-rc1.
Lots of USB typeC work happened this round, with code moving from the
staging directory into the "real" part of the kernel, as well as new
infrastructure being added to be able to handle the different types of
"roles" that typeC requires.
There is also the normal huge set of USB gadget controller and driver
updates, along with XHCI changes, and a raft of other tiny fixes all
over the USB tree. And the PHY driver updates are merged in here as
well as they interacted with the USB drivers in some places.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (250 commits)
Revert "USB: serial: ftdi_sio: add Id for Physik Instrumente E-870"
usb: musb: gadget: misplaced out of bounds check
usb: chipidea: imx: Fix ULPI on imx53
usb: chipidea: imx: Cleanup ci_hdrc_imx_platform_flag
usb: chipidea: usbmisc: small clean up
usb: chipidea: usbmisc: evdo can be set e/o reset
usb: chipidea: usbmisc: evdo is only specific to OTG port
USB: serial: ftdi_sio: add Id for Physik Instrumente E-870
usb: dwc3: gadget: never call ->complete() from ->ep_queue()
usb: gadget: udc: core: update usb_ep_queue() documentation
usb: host: Remove the deprecated ATH79 USB host config options
usb: roles: Fix return value check in intel_xhci_usb_probe()
USB: gadget: f_midi: fixing a possible double-free in f_midi
usb: core: Add USB_QUIRK_DELAY_CTRL_MSG to usbcore quirks
usb: core: Copy parameter string correctly and remove superfluous null check
USB: announce bcdDevice as well as idVendor, idProduct.
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: hub: Reduce warning to notice on power loss
USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator
USB: serial: cp210x: add ELDAT Easywave RX09 id
...
Linus Torvalds [Thu, 5 Apr 2018 00:42:38 +0000 (17:42 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"This fixes some fallout from the net-next merge the other day, plus
some non-merge-window-related bug fixes:
1) Fix sparse warnings in bcmgenet, systemport, b53, and mt7530
(Florian Fainelli)
2) pptp does a bogus dst_release() on a route we have a single
refcount on, and attached to a socket, which needs that refcount
(Eric Dumazet)
3) UDP connected sockets on ipv6 can race with route update handling,
resulting in a pre-PMTU update route still stuck on the socket and
thus continuing to get ICMPV6_PKT_TOOBIG errors. We end up never
seeing the updated route. (Alexey Kodanev)
4) Missing list initializer(s) in TIPC (Jon Maloy)
5) Connect phy early to prevent crashes in lan78xx driver (Alexander
Graf)
6) Fix build with modular NVMEM (Arnd Bergmann)
7) netdevsim canot mark nsim_devlink_net_ops and nsim_fib_net_ops as
__net_initdata, as these are references from module unload
unconditionally (Arnd Bergmann)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
netdevsim: remove incorrect __net_initdata annotations
sfc: remove ctpio_dmabuf_start from stats
inet: frags: fix ip6frag_low_thresh boundary
tipc: Fix namespace violation in tipc_sk_fill_sock_diag
net: avoid unneeded atomic operation in ip*_append_data()
nvmem: disallow modular CONFIG_NVMEM
net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
nfp: use full 40 bits of the NSP buffer address
lan78xx: Connect phy early
nfp: add a separate counter for packets with CHECKSUM_COMPLETE
tipc: Fix missing list initializations in struct tipc_subscription
ipv6: udp: set dst cache for a connected sk if current not valid
ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()
ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
ipv6: add a wrapper for ip6_dst_store() with flowi6 checks
net: phy: marvell10g: add thermal hwmon device
pptp: remove a buggy dst release in pptp_connect()
net: dsa: mt7530: Use NULL instead of plain integer
net: dsa: b53: Fix sparse warnings in b53_mmap.c
af_unix: remove redundant lockdep class
...
Linus Torvalds [Thu, 5 Apr 2018 00:11:08 +0000 (17:11 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- add AEAD support to crypto engine
- allow batch registration in simd
Algorithms:
- add CFB mode
- add speck block cipher
- add sm4 block cipher
- new test case for crct10dif
- improve scheduling latency on ARM
- scatter/gather support to gcm in aesni
- convert x86 crypto algorithms to skcihper
Drivers:
- hmac(sha224/sha256) support in inside-secure
- aes gcm/ccm support in stm32
- stm32mp1 support in stm32
- ccree driver from staging tree
- gcm support over QI in caam
- add ks-sa hwrng driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (212 commits)
crypto: ccree - remove unused enums
crypto: ahash - Fix early termination in hash walk
crypto: brcm - explicitly cast cipher to hash type
crypto: talitos - don't leak pointers to authenc keys
crypto: qat - don't leak pointers to authenc keys
crypto: picoxcell - don't leak pointers to authenc keys
crypto: ixp4xx - don't leak pointers to authenc keys
crypto: chelsio - don't leak pointers to authenc keys
crypto: caam/qi - don't leak pointers to authenc keys
crypto: caam - don't leak pointers to authenc keys
crypto: lrw - Free rctx->ext with kzfree
crypto: talitos - fix IPsec cipher in length
crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array()
crypto: doc - clarify hash callbacks state machine
crypto: api - Keep failed instances alive
crypto: api - Make crypto_alg_lookup static
crypto: api - Remove unused crypto_type lookup function
crypto: chelsio - Remove declaration of static function from header
crypto: inside-secure - hmac(sha224) support
crypto: inside-secure - hmac(sha256) support
..
Linus Torvalds [Wed, 4 Apr 2018 23:43:47 +0000 (16:43 -0700)]
Merge tag 'riscv-for-linus-4.17-mw0' of git://git./linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains the new features we'd like to incorporate into the
RISC-V port for 4.17. We might have a bit more stuff land later in the
merge window, but I wanted to get this out earlier just so everyone
can see where we currently stand.
A short summary of the changes is:
- We've added support for dynamic ftrace on RISC-V targets.
- There have been a handful of cleanups to our atomic and locking
routines. They now more closely match the released RISC-V memory
model draft.
- Our module loading support has been cleaned up and is now enabled
by default, despite some limitations still existing.
- A patch to define COMMANDLINE_FORCE instead of COMMANDLINE_OVERRIDE
so the generic device tree code picks up handling all our command
line stuff.
There's more information in the merge commits for each patch set"
* tag 'riscv-for-linus-4.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (21 commits)
RISC-V: Rename CONFIG_CMDLINE_OVERRIDE to CONFIG_CMDLINE_FORCE
RISC-V: Add definition of relocation types
RISC-V: Enable module support in defconfig
RISC-V: Support SUB32 relocation type in kernel module
RISC-V: Support ADD32 relocation type in kernel module
RISC-V: Support ALIGN relocation type in kernel module
RISC-V: Support RVC_BRANCH/JUMP relocation type in kernel modulewq
RISC-V: Support HI20/LO12_I/LO12_S relocation type in kernel module
RISC-V: Support CALL relocation type in kernel module
RISC-V: Support GOT_HI20/CALL_PLT relocation type in kernel module
RISC-V: Add section of GOT.PLT for kernel module
RISC-V: Add sections of PLT and GOT for kernel module
riscv/atomic: Strengthen implementations with fences
riscv/spinlock: Strengthen implementations with fences
riscv/barrier: Define __smp_{store_release,load_acquire}
riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support
riscv/ftrace: Add DYNAMIC_FTRACE_WITH_REGS support
riscv/ftrace: Add ARCH_SUPPORTS_FTRACE_OPS support
riscv/ftrace: Add dynamic function graph tracer support
riscv/ftrace: Add dynamic function tracer support
...
Linus Torvalds [Wed, 4 Apr 2018 23:01:43 +0000 (16:01 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
Linus Torvalds [Wed, 4 Apr 2018 22:19:26 +0000 (15:19 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual pile of boring changes:
- Consolidate tasklet functions to share code instead of duplicating
it
- The first step for making the low level entry handler management on
multi-platform kernels generic
- A new sysfs file which allows to retrieve the wakeup state of
interrupts.
- Ensure that the interrupt thread follows the effective affinity and
not the programmed affinity to avoid cross core wakeups.
- Two new interrupt controller drivers (Microsemi Ocelot and Qualcomm
PDC)
- Fix the wakeup path clock handling for Reneasas interrupt chips.
- Rework the boot time register reset for ARM GIC-V2/3
- Better suspend/resume support for ARM GIV-V3/ITS
- Add missing locking to the ARM GIC set_type() callback
- Small fixes for the irq simulator code
- SPDX identifiers for the irq core code and removal of boiler plate
- Small cleanups all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
openrisc: Set CONFIG_MULTI_IRQ_HANDLER
arm64: Set CONFIG_MULTI_IRQ_HANDLER
genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
irqchip/gic: Take lock when updating irq type
irqchip/gic: Update supports_deactivate static key to modern api
irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling
irqchip: Add a driver for the Microsemi Ocelot controller
dt-bindings: interrupt-controller: Add binding for the Microsemi Ocelot interrupt controller
irqchip/gic-v3: Probe for SCR_EL3 being clear before resetting AP0Rn
irqchip/gic-v3: Don't try to reset AP0Rn
irqchip/gic-v3: Do not check trigger configuration of partitionned LPIs
genirq: Remove license boilerplate/references
genirq: Add missing SPDX identifiers
genirq/matrix: Cleanup SPDX identifier
genirq: Cleanup top of file comments
genirq: Pass desc to __irq_free instead of irq number
irqchip/gic-v3: Loudly complain about the use of IRQ_TYPE_NONE
irqchip/gic: Loudly complain about the use of IRQ_TYPE_NONE
RISC-V: Move to the new GENERIC_IRQ_MULTI_HANDLER handler
genirq: Add CONFIG_GENERIC_IRQ_MULTI_HANDLER
...
Linus Torvalds [Wed, 4 Apr 2018 21:50:29 +0000 (14:50 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull time(r) updates from Thomas Gleixner:
"A small set of updates for timers and timekeeping:
- The most interesting change is the consolidation of clock MONOTONIC
and clock BOOTTIME.
Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
not longer ignore the time spent in suspend. A new clock
MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
kernels before this change. This allows applications to
programmatically check for the clock MONOTONIC behaviour.
As discussed in the review thread, this has the potential of
breaking user space and we might have to revert this. Knock on wood
that we can avoid that exercise.
- Updates to the NTP mechanism to improve accuracy
- A new kernel internal data structure to aid the ongoing Y2038 work.
- Cleanups and simplifications of the clocksource code.
- Make the alarmtimer code play nicely with debugobjects"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Init nanosleep alarm timer on stack
y2038: Introduce struct __kernel_old_timeval
tracing: Unify the "boot" and "mono" tracing clocks
hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Remove boot time specific code
Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
timekeeping/ntp: Determine the multiplier directly from NTP tick length
timekeeping/ntp: Don't align NTP frequency adjustments to ticks
clocksource: Use ATTRIBUTE_GROUPS
clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
clocksource: Don't walk the clocksource list for empty override
Linus Torvalds [Wed, 4 Apr 2018 21:31:53 +0000 (14:31 -0700)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random
Pull /dev/random updates from Ted Ts'o:
"A few random (cough, cough) cleanups for the /dev/random driver"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
drivers/char/random.c: remove unused dont_count_entropy
random: optimize add_interrupt_randomness
random: always fill buffer in get_random_bytes_wait
random: use a tighter cap in credit_entropy_bits_safe()
Linus Torvalds [Wed, 4 Apr 2018 21:19:24 +0000 (14:19 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Cleanups and bugfixes for ext4, including some fixes to make ext4 more
robust against maliciously crafted file system images.
(I still don't recommend that container folks hold any delusions that
mounting arbitary images that can be crafted by malicious attackers
should be considered sane thing to do, though!)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
ext4: force revalidation of directory pointer after seekdir(2)
ext4: add extra checks to ext4_xattr_block_get()
ext4: add bounds checking to ext4_xattr_find_entry()
ext4: move call to ext4_error() into ext4_xattr_check_block()
ext4: don't show data=<mode> option if defaulted
ext4: omit init_itable=n in procfs when disabled
ext4: show more binary mount options in procfs
ext4: simplify kobject usage
ext4: remove unused parameters in sysfs code
ext4: null out kobject* during sysfs cleanup
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
ext4: always initialize the crc32c checksum driver
ext4: fail ext4_iget for root directory if unallocated
ext4: limit xattr size to INT_MAX
ext4: add validity checks for bitmap block numbers
ext4: fix comments in ext4_swap_extents()
ext4: use generic_writepages instead of __writepage/write_cache_pages
ext4: don't complain about incorrect features when probing
ext4: remove EXT4_STATE_DIOREAD_LOCK flag
ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()
...
Linus Torvalds [Wed, 4 Apr 2018 21:09:27 +0000 (14:09 -0700)]
Merge tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"Includes SMB3.11 security improvements, as well as various fixes for
stable and some debugging improvements"
* tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add minor debug message during negprot
smb3: Fix root directory when server returns inode number of zero
cifs: fix sparse warning on previous patch in a few printks
cifs: add server->vals->header_preamble_size
cifs: smbd: disconnect transport on RDMA errors
cifs: smbd: avoid reconnect lockup
Don't log confusing message on reconnect by default
Don't log expected error on DFS referral request
fs: cifs: Replace _free_xid call in cifs_root_iget function
SMB3.1.1 dialect is no longer experimental
Tree connect for SMB3.1.1 must be signed for non-encrypted shares
fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
CIFS: fix sha512 check in cifs_crypto_secmech_release
CIFS: implement v3.11 preauth integrity
CIFS: add sha512 secmech
CIFS: refactor crypto shash/sdesc allocation&free
Update README file for cifs.ko
Update TODO list for cifs.ko
cifs: fix memory leak in SMB2_open()
CIFS: SMBD: fix spelling mistake: "faield" and "legnth"
Linus Torvalds [Wed, 4 Apr 2018 20:09:42 +0000 (13:09 -0700)]
Merge tag 'gfs2-4.17.fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've only got nine GFS2 patches for this merge window:
- report journal recovery times more accurately during journal replay
(Abhi Das)
- fix fallocate chunk size (Andreas Gruenbacher)
- correctly dirty inodes during rename (Andreas Gruenbacher)
- improve the comment for function gfs2_block_map (Andreas
Gruenbacher)
- improve kernel trace point iomap end: The physical block address
was added (Andreas Gruenbacher)
- fix a nasty file system corruption bug that surfaced in xfstests
476 in punch-hole/truncate (Andreas Gruenbacher)
- fix a problem Christoph Helwig pointed out, namely, that GFS2 was
misusing the IOMAP_ZERO flag. The zeroing of new blocks was moved
to the proper fallocate code (Andreas Gruenbacher)
- declare function gfs2_remove_from_ail as static (Bob Peterson)
- only set PageChecked for jdata page writes (Bob Peterson)"
* tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: time journal recovery steps accurately
gfs2: Zero out fallocated blocks in fallocate_chunk
gfs2: Check for the end of metadata in punch_hole
gfs2: gfs2_iomap_end tracepoint: log block address
gfs2: Improve gfs2_block_map comment
GFS2: Only set PageChecked for jdata pages
GFS2: Make function gfs2_remove_from_ail static
gfs2: Dirty source inode during rename
gfs2: Fix fallocate chunk size
Linus Torvalds [Wed, 4 Apr 2018 20:03:38 +0000 (13:03 -0700)]
Merge tag 'for-4.17-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"There are a several user visible changes, the rest is mostly invisible
and continues to clean up the whole code base.
User visible changes:
- new mount option nossd_spread (pair for ssd_spread)
- mount option subvolid will detect junk after the number and fail
the mount
- add message after cancelled device replace
- direct module dependency on libcrc32, removed own crc wrappers
- removed user space transaction ioctls
- use lighter locking when reading /proc/self/mounts, RCU instead of
mutex to avoid unnecessary contention
Enhancements:
- skip writeback of last page when truncating file to same size
- send: do not issue unnecessary truncate operations
- mount option token specifiers: use %u for unsigned values, more
validation
- selftests: more tree block validations
qgroups:
- preparatory work for splitting reservation types for data and
metadata, this should allow for more accurate tracking and fix some
issues with underflows or do further enhancements
- split metadata reservations for started and joined transaction so
they do not get mixed up and are accounted correctly at commit time
- with the above, it's possible to revert patch that potentially
deadlocks when trying to make more space by explicitly committing
when the quota limit is hit
- fix root item corruption when multiple same source snapshots are
created with quota enabled
RAID56:
- make sure target is identical to source when raid56 rebuild fails
after dev-replace
- faster rebuild during scrub, batch by stripes and not
block-by-block
- make more use of cached data when rebuilding from a missing device
Fixes:
- null pointer deref when device replace target is missing
- fix fsync after hole punching when using no-holes feature
- fix lockdep splat when allocating percpu data with wrong GFP flags
Cleanups, refactoring, core changes:
- drop redunant parameters from various functions
- kill and opencode trivial helpers
- __cold/__exit function annotations
- dead code removal
- continued audit and documentation of memory barriers
- error handling: handle removal from uuid tree
- error handling: remove handling of impossible condtitons
- more debugging or error messages
- updated tracepoints
- one VLA use removal (and one still left)"
* tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
btrfs: lift errors from add_extent_changeset to the callers
Btrfs: print error messages when failing to read trees
btrfs: user proper type for btrfs_mask_flags flags
btrfs: split dev-replace locking helpers for read and write
btrfs: remove stale comments about fs_mutex
btrfs: use RCU in btrfs_show_devname for device list traversal
btrfs: update barrier in should_cow_block
btrfs: use lockdep_assert_held for mutexes
btrfs: use lockdep_assert_held for spinlocks
btrfs: Validate child tree block's level and first key
btrfs: tests/qgroup: Fix wrong tree backref level
Btrfs: fix copy_items() return value when logging an inode
Btrfs: fix fsync after hole punching when using no-holes feature
btrfs: use helper to set ulist aux from a qgroup
Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
btrfs: qgroup: Update trace events for metadata reservation
btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
btrfs: qgroup: Use separate meta reservation type for delalloc
btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
...
Linus Torvalds [Wed, 4 Apr 2018 19:44:02 +0000 (12:44 -0700)]
Merge tag 'xfs-4.17-merge-1' of git://git./fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"Here's the first round of fixes for XFS for 4.17.
The biggest new features this time around are the addition of lazytime
support, further enhancement of the on-disk inode metadata verifiers,
and a patch to smooth over some of the AGFL padding problems that have
intermittently plagued users since 4.5. I forsee sending a second pull
request next week with further bug fixes and speedups in the online
scrub code and elsewhere.
This series has been run through a full xfstests run over the weekend
and through a quick xfstests run against this morning's master, with
no major failures reported.
Summary of changes for this release:
- Various cleanups and code fixes
- Implement lazytime as a mount option
- Convert various on-disk metadata checks from asserts to -EFSCORRUPTED
- Fix accounting problems with the rmap per-ag reservations
- Refactorings and cleanups for xfs_log_force
- Various bugfixes for the reflink code
- Work around v5 AGFL padding problems to prevent fs shutdowns
- Establish inode fork verifiers to inspect on-disk metadata
correctness
- Various online scrub fixes
- Fix v5 swapext blowing up on deleted inodes"
* tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (49 commits)
xfs: do not log/recover swapext extent owner changes for deleted inodes
xfs: clean up xfs_mount allocation and dynamic initializers
xfs: remove dead inode version setting code
xfs: catch inode allocation state mismatch corruption
xfs: xfs_scrub_iallocbt_xref_rmap_inodes should use xref_set_corrupt
xfs: flag inode corruption if parent ptr doesn't get us a real inode
xfs: don't accept inode buffers with suspicious unlinked chains
xfs: move inode extent size hint validation to libxfs
xfs: record inode buf errors as a xref error in inobt scrubber
xfs: remove xfs_buf parameter from inode scrub methods
xfs: inode scrubber shouldn't bother with raw checks
xfs: bmap scrubber should do rmap xref with bmap for sparse files
xfs: refactor inode buffer verifier error logging
xfs: refactor inode verifier error logging
xfs: refactor bmap record validation
xfs: sanity-check the unused space before trying to use it
xfs: detect agfl count corruption and reset agfl
xfs: unwind the try_again loop in xfs_log_force
xfs: refactor xfs_log_force_lsn
xfs: minor cleanup for xfs_reflink_end_cow
...
Linus Torvalds [Wed, 4 Apr 2018 19:05:25 +0000 (12:05 -0700)]
Merge branch 'work.dcache' of git://git./linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
"Part of this is what the trylock loop elimination series has turned
into, part making d_move() preserve the parent (and thus the path) of
victim, plus some general cleanups"
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
d_genocide: move export to definition
fold dentry_lock_for_move() into its sole caller and clean it up
make non-exchanging __d_move() copy ->d_parent rather than swap them
oprofilefs: don't oops on allocation failure
lustre: get rid of pointless casts to struct dentry *
debugfs_lookup(): switch to lookup_one_len_unlocked()
fold lookup_real() into __lookup_hash()
take out orphan externs (empty_string/slash_string)
split d_path() and friends into a separate file
dcache.c: trim includes
fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
get rid of trylock loop around dentry_kill()
handle move to LRU in retain_dentry()
dput(): consolidate the "do we need to retain it?" into an inlined helper
split the slow part of lock_parent() off
now lock_parent() can't run into killed dentry
get rid of trylock loop in locking dentries on shrink list
d_delete(): get rid of trylock loop
fs/dcache: Move dentry_kill() below lock_parent()
fs/dcache: Remove stale comment from dentry_kill()
...
Arnd Bergmann [Wed, 4 Apr 2018 12:12:39 +0000 (14:12 +0200)]
netdevsim: remove incorrect __net_initdata annotations
The __net_initdata section cannot currently be used for structures that
get cleaned up in an exitcall using unregister_pernet_operations:
WARNING: vmlinux.o(.text+0x868c34): Section mismatch in reference from the function nsim_devlink_exit() to the (unknown reference) .init.data:(unknown)
The function nsim_devlink_exit() references
the (unknown reference) __initdata (unknown).
This is often because nsim_devlink_exit lacks a __initdata
annotation or the annotation of (unknown) is wrong.
WARNING: vmlinux.o(.text+0x868c64): Section mismatch in reference from the function nsim_devlink_init() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x8692bc): Section mismatch in reference from the function nsim_fib_exit() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x869300): Section mismatch in reference from the function nsim_fib_init() to the (unknown reference) .init.data:(unknown)
As that warning tells us, discarding the structure after a module is
loaded would lead to a undefined behavior when that module is removed.
It might be possible to change that annotation so it has no effect for
loadable modules, but I have not figured out exactly how to do that, and
we want this to be fixed in -rc1.
This just removes the annotations, just like we do for all other such
modules.
Fixes:
37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Wed, 4 Apr 2018 15:40:30 +0000 (16:40 +0100)]
sfc: remove ctpio_dmabuf_start from stats
The ctpio_dmabuf_start entry is not actually a stat and shouldn't
be exposed to ethtool.
Fixes:
2c0b6ee837db ("sfc: expose CTPIO stats on NICs that support them")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Apr 2018 15:35:10 +0000 (08:35 -0700)]
inet: frags: fix ip6frag_low_thresh boundary
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.
ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.
Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.
Fixes:
6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GhantaKrishnamurthy MohanKrishna [Wed, 4 Apr 2018 12:49:47 +0000 (14:49 +0200)]
tipc: Fix namespace violation in tipc_sk_fill_sock_diag
To fetch UID info for socket diagnostics, we determine the
namespace of user context using tipc socket instance. This
may cause namespace violation, as the kernel will remap based
on UID.
We fix this by fetching namespace info using the calling userspace
netlink socket.
Fixes:
c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 4 Apr 2018 12:30:01 +0000 (14:30 +0200)]
net: avoid unneeded atomic operation in ip*_append_data()
After commit
694aba690de0 ("ipv4: factorize sk_wmem_alloc updates
done by __ip_append_data()") and commit
1f4c6eb24029 ("ipv6:
factorize sk_wmem_alloc updates done by __ip6_append_data()"),
when transmitting sub MTU datagram, an addtional, unneeded atomic
operation is performed in ip*_append_data() to update wmem_alloc:
in the above condition the delta is 0.
The above cause small but measurable performance regression in UDP
xmit tput test with packet size below MTU.
This change avoids such overhead updating wmem_alloc only if
wmem_alloc_delta is non zero.
The error path is left intentionally unmodified: it's a slow path
and simplicity is preferred to performances.
Fixes:
694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
Fixes:
1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 4 Apr 2018 10:38:40 +0000 (12:38 +0200)]
nvmem: disallow modular CONFIG_NVMEM
The new of_get_nvmem_mac_address() helper function causes a link error
with CONFIG_NVMEM=m:
drivers/of/of_net.o: In function `of_get_nvmem_mac_address':
of_net.c:(.text+0x168): undefined reference to `of_nvmem_cell_get'
of_net.c:(.text+0x19c): undefined reference to `nvmem_cell_read'
of_net.c:(.text+0x1a8): undefined reference to `nvmem_cell_put'
I could not come up with a good solution for this, as the code is always
built-in. Using an #if IS_REACHABLE() check around it would solve the
link time issue but then stop it from working in that configuration.
Making of_nvmem_cell_get() an inline function could also solve that, but
seems a bit ugly since it's somewhat larger than most inline functions,
and it would just bring that problem into the callers. Splitting the
function into a separate file might be an alternative.
This uses the big hammer by making CONFIG_NVMEM itself a 'bool' symbol,
which avoids the problem entirely but makes the vmlinux larger for anyone
that might use NVMEM support but doesn't need it built-in otherwise.
Fixes:
9217e566bdee ("of_net: Implement of_get_nvmem_mac_address helper")
Cc: Mike Looijmans <mike.looijmans@topic.nl>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mike Looijmans
Signed-off-by: David S. Miller <davem@davemloft.net>
Tan Xiaojun [Wed, 4 Apr 2018 09:40:48 +0000 (17:40 +0800)]
net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length is u16, it will overflow. So change it
to u32.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Wed, 4 Apr 2018 00:24:23 +0000 (17:24 -0700)]
nfp: use full 40 bits of the NSP buffer address
The NSP default buffer is a piece of NFP memory where additional
command data can be placed. Its format has been copied from
host buffer, but the PCIe selection bits do not make sense in
this case. If those get masked out from a NFP address - writes
to random place in the chip memory may be issued and crash the
device.
Even in the general NSP buffer case, it doesn't make sense to have the
PCIe selection bits there anymore. These are unused at the moment, and
when it becomes necessary, the PCIe selection bits should rather be
moved to another register to utilise more bits for the buffer address.
This has never been an issue because the buffer used to be
allocated in memory with less-than-38-bit-long address but that
is about to change.
Fixes:
1a64821c6af7 ("nfp: add support for service processor access")
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Graf [Tue, 3 Apr 2018 22:19:35 +0000 (00:19 +0200)]
lan78xx: Connect phy early
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:
Unable to handle kernel NULL pointer dereference at virtual address
0000039c
pgd =
ffff800035b30000
[
0000039c] *pgd=
0000000000000000
Internal error: Oops:
96000004 [#1] SMP
Modules linked in: [...]
Supported: Yes
CPU: 3 PID: 638 Comm: wickedd Tainted: G E 4.12.14-0-default #1
Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
task:
ffff800035e74180 task.stack:
ffff800036718000
PC is at phy_ethtool_ksettings_get+0x20/0x98
LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
pc : [<
ffff0000086f7f30>] lr : [<
ffff000000dcca84>] pstate:
20000005
sp :
ffff80003671bb20
x29:
ffff80003671bb20 x28:
ffff800035e74180
x27:
ffff000008912000 x26:
000000000000001d
x25:
0000000000000124 x24:
ffff000008f74d00
x23:
0000004000114809 x22:
0000000000000000
x21:
ffff80003671bbd0 x20:
0000000000000000
x19:
ffff80003671bbd0 x18:
000000000000040d
x17:
0000000000000001 x16:
0000000000000000
x15:
0000000000000000 x14:
ffffffffffffffff
x13:
0000000000000000 x12:
0000000000000020
x11:
0101010101010101 x10:
fefefefefefefeff
x9 :
7f7f7f7f7f7f7f7f x8 :
fefefeff31677364
x7 :
0000000080808080 x6 :
ffff80003671bc9c
x5 :
ffff80003671b9f8 x4 :
ffff80002c296190
x3 :
0000000000000000 x2 :
0000000000000000
x1 :
ffff80003671bbd0 x0 :
ffff80003671bc00
Process wickedd (pid: 638, stack limit = 0xffff800036718000)
Call trace:
Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
b9e0:
ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
ba00:
ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
ba20:
fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
ba40:
0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
ba60:
0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
ba80:
0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
baa0:
ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
bac0:
ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
bae0:
ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
bb00:
0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
[<
ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
[<
ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
[<
ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
[<
ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
[<
ffff0000087e5008>] dev_ioctl+0x400/0x630
[<
ffff00000879dd00>] sock_do_ioctl+0x70/0x88
[<
ffff00000879f5f8>] sock_ioctl+0x208/0x368
[<
ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
[<
ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
bec0:
0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
bee0:
0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
bf00:
000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
bf20:
0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
bf40:
0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
bf60:
0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
bf80:
0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
bfa0:
0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
bfc0:
0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
bfe0:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.
The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.
With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 2 Apr 2018 20:31:20 +0000 (13:31 -0700)]
nfp: add a separate counter for packets with CHECKSUM_COMPLETE
We are currently counting packets with CHECKSUM_COMPLETE as
"hw_rx_csum_ok". This is confusing. Add a new counter.
To make sure it fits in the same cacheline move the less used
error counter to a different location.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Tue, 3 Apr 2018 17:11:19 +0000 (19:11 +0200)]
tipc: Fix missing list initializations in struct tipc_subscription
When an item of struct tipc_subscription is created, we fail to
initialize the two lists aggregated into the struct. This has so far
never been a problem, since the items are just added to a root
object by list_add(), which does not require the addee list to be
pre-initialized. However, syzbot is provoking situations where this
addition fails, whereupon the attempted removal if the item from
the list causes a crash.
This problem seems to always have been around, despite that the code
for creating this object was rewritten in commit
242e82cc95f6 ("tipc:
collapse subscription creation functions"), which is still in net-next.
We fix this for that commit by initializing the two lists properly.
Fixes:
242e82cc95f6 ("tipc: collapse subscription creation functions")
Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Apr 2018 15:31:58 +0000 (11:31 -0400)]
Merge branch 'ipv6-udp-set-dst-cache-for-a-connected-sk-if-current-not-valid'
Alexey Kodanev says:
====================
ipv6: udp: set dst cache for a connected sk if current not valid
A new RTF_CACHE route can be created with the socket's dst cache
update between the below calls in udpv6_sendmsg(), when datagram
sending results to ICMPV6_PKT_TOOBIG error:
dst = ip6_sk_dst_lookup_flow(...)
...
release_dst:
if (dst) {
if (connected) {
ip6_dst_store(sk, dst)
Therefore, the new socket's dst cache reset to the old one on
"release_dst:".
The first three patches prepare the code to store dst cache
with ip6_sk_dst_lookup_flow():
* the first patch adds ip6_sk_dst_store_flow() function with
commonly used source and destiantion addresses checks using
the flow information.
* the second patch adds a new argument to ip6_sk_dst_lookup_flow()
and ability to store dst in the socket's cache. Also, the two
users of the function are updated without enabling the new
behavior: pingv6_sendmsg() and udpv6_sendmsg().
* the third patch makes 'connected' variable in udpv6_sendmsg()
to be consistent with ip6_sk_dst_store_flow(), changes its type
from int to bool.
The last patch contains the actual fix that removes sk dst cache
update in the end of udpv6_sendmsg(), and allows to do it in
ip6_sk_dst_lookup_flow().
v6: * use bool type for a new parameter in ip_sk_dst_lookup_flow()
* add one more patch to convert 'connected' variable in
udpv6_sendmsg() from int to bool type. If it shouldn't be
here I will resend it when the net-next is opened.
v5: * relocate ip6_sk_dst_store_flow() to net/ipv6/route.c and
rename ip6_dst_store_flow() to ip6_sk_dst_store_flow() as
suggested by Martin
v4: * fix the error in the build of ip_dst_store_flow() reported by
kbuild test robot due to missing checks for CONFIG_IPV6: add
new function to ip6_output.c instead of ip6_route.h
* add 'const' to struct flowi6 in ip6_dst_store_flow()
* minor commit messages fixes
v3: * instead of moving ip6_dst_store() above udp_v6_send_skb(),
update socket's dst cache inside ip6_sk_dst_lookup_flow()
if the current one is invalid
* the issue not reproduced in 4.1, but starting from 4.2. Add
one more 'Fixes:' commit that creates new RTF_CACHE route.
Though, it is also mentioned in the first one
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Tue, 3 Apr 2018 12:00:10 +0000 (15:00 +0300)]
ipv6: udp: set dst cache for a connected sk if current not valid
A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow()
and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending
results to ICMPV6_PKT_TOOBIG error:
udp_v6_send_skb(), for example with vti6 tunnel:
vti6_xmit(), get ICMPV6_PKT_TOOBIG error
skb_dst_update_pmtu(), can create a RTF_CACHE clone
icmpv6_send()
...
udpv6_err()
ip6_sk_update_pmtu()
ip6_update_pmtu(), can create a RTF_CACHE clone
...
ip6_datagram_dst_update()
ip6_dst_store()
And after commit
33c162a980fe ("ipv6: datagram: Update dst cache of
a connected datagram sk during pmtu update"), the UDPv6 error handler
can update socket's dst cache, but it can happen before the update in
the end of udpv6_sendmsg(), preventing getting the new dst cache on
the next udpv6_sendmsg() calls.
In order to fix it, save dst in a connected socket only if the current
socket's dst cache is invalid.
The previous patch prepared ip6_sk_dst_lookup_flow() to do that with
the new argument, and this patch enables it in udpv6_sendmsg().
Fixes:
33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update")
Fixes:
45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Tue, 3 Apr 2018 12:00:09 +0000 (15:00 +0300)]
ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()
This should make it consistent with ip6_sk_dst_lookup_flow()
that is accepting the new 'connected' parameter of type bool.
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Tue, 3 Apr 2018 12:00:08 +0000 (15:00 +0300)]
ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
Add 'connected' parameter to ip6_sk_dst_lookup_flow() and update
the cache only if ip6_sk_dst_check() returns NULL and a socket
is connected.
The function is used as before, the new behavior for UDP sockets
in udpv6_sendmsg() will be enabled in the next patch.
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Tue, 3 Apr 2018 12:00:07 +0000 (15:00 +0300)]
ipv6: add a wrapper for ip6_dst_store() with flowi6 checks
Move commonly used pattern of ip6_dst_store() usage to a separate
function - ip6_sk_dst_store_flow(), which will check the addresses
for equality using the flow information, before saving them.
There is no functional changes in this patch. In addition, it will
be used in the next patch, in ip6_sk_dst_lookup_flow().
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 3 Apr 2018 09:31:45 +0000 (10:31 +0100)]
net: phy: marvell10g: add thermal hwmon device
Add a thermal monitoring device for the Marvell 88x3310, which updates
once a second. We also need to hook into the suspend/resume mechanism
to ensure that the thermal monitoring is reconfigured when we resume.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 3 Apr 2018 01:48:37 +0000 (18:48 -0700)]
pptp: remove a buggy dst release in pptp_connect()
Once dst has been cached in socket via sk_setup_caps(),
it is illegal to call ip_rt_put() (or dst_release()),
since sk_setup_caps() did not change dst refcount.
We can still dereference it since we hold socket lock.
Caugth by syzbot :
BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185
Write of size 4 at addr
ffff8801c54dc040 by task syz-executor4/20088
CPU: 1 PID: 20088 Comm: syz-executor4 Not tainted 4.16.0+ #376
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x1a7/0x27d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
dst_release+0x27/0xa0 net/core/dst.c:185
sk_dst_set include/net/sock.h:1812 [inline]
sk_dst_reset include/net/sock.h:1824 [inline]
sock_setbindtodevice net/core/sock.c:610 [inline]
sock_setsockopt+0x431/0x1b20 net/core/sock.c:707
SYSC_setsockopt net/socket.c:1845 [inline]
SyS_setsockopt+0x2ff/0x360 net/socket.c:1828
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4552d9
RSP: 002b:
00007f4878126c68 EFLAGS:
00000246 ORIG_RAX:
0000000000000036
RAX:
ffffffffffffffda RBX:
00007f48781276d4 RCX:
00000000004552d9
RDX:
0000000000000019 RSI:
0000000000000001 RDI:
0000000000000013
RBP:
000000000072bea0 R08:
0000000000000010 R09:
0000000000000000
R10:
00000000200010c0 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000000526 R14:
00000000006fac30 R15:
0000000000000000
Allocated by task 20088:
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3542
dst_alloc+0x11f/0x1a0 net/core/dst.c:104
rt_dst_alloc+0xe9/0x540 net/ipv4/route.c:1520
__mkroute_output net/ipv4/route.c:2265 [inline]
ip_route_output_key_hash_rcu+0xa49/0x2c60 net/ipv4/route.c:2493
ip_route_output_key_hash+0x20b/0x370 net/ipv4/route.c:2322
__ip_route_output_key include/net/route.h:126 [inline]
ip_route_output_flow+0x26/0xa0 net/ipv4/route.c:2577
ip_route_output_ports include/net/route.h:163 [inline]
pptp_connect+0xa84/0x1170 drivers/net/ppp/pptp.c:453
SYSC_connect+0x213/0x4a0 net/socket.c:1639
SyS_connect+0x24/0x30 net/socket.c:1620
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Freed by task 20082:
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
__cache_free mm/slab.c:3486 [inline]
kmem_cache_free+0x83/0x2a0 mm/slab.c:3744
dst_destroy+0x266/0x380 net/core/dst.c:140
dst_destroy_rcu+0x16/0x20 net/core/dst.c:153
__rcu_reclaim kernel/rcu/rcu.h:178 [inline]
rcu_do_batch kernel/rcu/tree.c:2675 [inline]
invoke_rcu_callbacks kernel/rcu/tree.c:2930 [inline]
__rcu_process_callbacks kernel/rcu/tree.c:2897 [inline]
rcu_process_callbacks+0xd6c/0x17b0 kernel/rcu/tree.c:2914
__do_softirq+0x2d7/0xb85 kernel/softirq.c:285
The buggy address belongs to the object at
ffff8801c54dc000
which belongs to the cache ip_dst_cache of size 168
The buggy address is located 64 bytes inside of
168-byte region [
ffff8801c54dc000,
ffff8801c54dc0a8)
The buggy address belongs to the page:
page:
ffffea0007153700 count:1 mapcount:0 mapping:
ffff8801c54dc000 index:0x0
flags: 0x2fffc0000000100(slab)
raw:
02fffc0000000100 ffff8801c54dc000 0000000000000000 0000000100000010
raw:
ffffea0006b34b20 ffffea0006b6c1e0 ffff8801d674a1c0 0000000000000000
page dumped because: kasan: bad access detected
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 2 Apr 2018 23:24:14 +0000 (16:24 -0700)]
net: dsa: mt7530: Use NULL instead of plain integer
We would be passing 0 instead of NULL as the rsp argument to
mt7530_fdb_cmd(), fix that.
Fixes:
b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 2 Apr 2018 23:17:01 +0000 (16:17 -0700)]
net: dsa: b53: Fix sparse warnings in b53_mmap.c
sparse complains about the following warnings:
drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in
initializer (different address spaces)
drivers/net/dsa/b53/b53_mmap.c:33:31: expected unsigned char
[noderef] [usertype] <asn:2>*regs
drivers/net/dsa/b53/b53_mmap.c:33:31: got void *priv
and indeed, while what we are doing is functional, we are dereferencing
a void * pointer into a void __iomem * which is not great. Just use the
defined b53_mmap_priv structure which holds our register base and use
that.
Fixes:
967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 2 Apr 2018 18:01:27 +0000 (11:01 -0700)]
af_unix: remove redundant lockdep class
After commit
581319c58600 ("net/socket: use per af lockdep classes for sk queues")
sock queue locks now have per-af lockdep classes, including unix socket.
It is no longer necessary to workaround it.
I noticed this while looking at a syzbot deadlock report, this patch
itself doesn't fix it (this is why I don't add Reported-by).
Fixes:
581319c58600 ("net/socket: use per af lockdep classes for sk queues")
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Apr 2018 15:07:21 +0000 (11:07 -0400)]
Merge branch 'net-Broadcom-drivers-sparse-fixes'
Florian Fainelli says:
====================
net: Broadcom drivers sparse fixes
This patch series fixes the same warning reported by sparse in bcmsysport and
bcmgenet in the code that deals with inserting the TX checksum pointers:
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: expected unsigned short [unsigned] [usertype] val
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: got restricted __be16 [usertype] protocol
This patch fixes both issues by using the same construct and not swapping
skb->protocol but instead the values we are checking against.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 2 Apr 2018 22:58:56 +0000 (15:58 -0700)]
net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()
skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.
Fixes:
80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 2 Apr 2018 22:58:55 +0000 (15:58 -0700)]
net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.
Fixes:
1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Mon, 2 Apr 2018 22:51:39 +0000 (23:51 +0100)]
rxrpc: Fix undefined packet handling
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
should just be discarded rather than being aborted like other undefined
packet types.
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>