radix tree: fix sibling entry handling in radix_tree_descend()
authorLinus Torvalds <torvalds@linux-foundation.org>
Sun, 25 Sep 2016 20:32:46 +0000 (13:32 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Sun, 25 Sep 2016 20:32:46 +0000 (13:32 -0700)
The fixes to the radix tree test suite show that the multi-order case is
broken.  The basic reason is that the radix tree code uses tagged
pointers with the "internal" bit in the low bits, and calculating the
pointer indices was supposed to mask off those bits.  But gcc will
notice that we then use the index to re-create the pointer, and will
avoid doing the arithmetic and use the tagged pointer directly.

This cleans the code up, using the existing is_sibling_entry() helper to
validate the sibling pointer range (instead of open-coding it), and
using entry_to_node() to mask off the low tag bit from the pointer.  And
once you do that, you might as well just use the now cleaned-up pointer
directly.

[ Side note: the multi-order code isn't actually ever used in the kernel
  right now, and the only reason I didn't just delete all that code is
  that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
     It also converts shmem-with-huge-pages and hugetlb to them.

     I'm okay with converting it to other mechanism, but I need
     something.  (I looked into Konstantin's RFC patchset[2].  It looks
     okay, but I don't feel myself qualified to review it as I don't
     know much about radix-tree internals.)"

  [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
  [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Cedric Blancher <cedric.blancher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/radix-tree.c

index 1b7bf73141418f5f1427b14e806c9a9d4c6e7582..91f0727e3cada11bd721afdeeb112dd196546637 100644 (file)
@@ -105,10 +105,10 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
        if (radix_tree_is_internal_node(entry)) {
-               unsigned long siboff = get_slot_offset(parent, entry);
-               if (siboff < RADIX_TREE_MAP_SIZE) {
-                       offset = siboff;
-                       entry = rcu_dereference_raw(parent->slots[offset]);
+               if (is_sibling_entry(parent, entry)) {
+                       void **sibentry = (void **) entry_to_node(entry);
+                       offset = get_slot_offset(parent, sibentry);
+                       entry = rcu_dereference_raw(*sibentry);
                }
        }
 #endif