David Howells [Wed, 3 Aug 2016 16:57:39 +0000 (17:57 +0100)]
cachefiles: Fix race between inactivating and culling a cache object
There's a race between cachefiles_mark_object_inactive() and
cachefiles_cull():
(1) cachefiles_cull() can't delete a backing file until the cache object
is marked inactive, but as soon as that's the case it's fair game.
(2) cachefiles_mark_object_inactive() marks the object as being inactive
and *only then* reads the i_blocks on the backing inode - but
cachefiles_cull() might've managed to delete it by this point.
Fix this by making sure cachefiles_mark_object_inactive() gets any data it
needs from the backing inode before deactivating the object.
Without this, the following oops may occur:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000098
IP: [<
ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task:
ffff880035edaf10 ti:
ffff8800b77c0000 task.ti:
ffff8800b77c0000
RIP: 0010:[<
ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:
ffff8800b77c3d70 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff8800bf6cc400 RCX:
0000000000000034
RDX:
0000000000000000 RSI:
ffff880090ffc710 RDI:
ffff8800bf761ef8
RBP:
ffff8800b77c3d88 R08:
2000000000000000 R09:
0090ffc710000000
R10:
ff51005d2ff1c400 R11:
0000000000000000 R12:
ffff880090ffc600
R13:
ffff8800bf6cc520 R14:
ffff8800bf6cc400 R15:
ffff8800bf6cc498
FS:
0000000000000000(0000) GS:
ffff8800bb8c0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
0000000000000098 CR3:
00000000019ba000 CR4:
00000000000007e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Stack:
ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
[<
ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
[<
ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
[<
ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
[<
ffffffff810a605b>] process_one_work+0x17b/0x470
[<
ffffffff810a6e96>] worker_thread+0x126/0x410
[<
ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
[<
ffffffff810ae64f>] kthread+0xcf/0xe0
[<
ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
[<
ffffffff81695418>] ret_from_fork+0x58/0x90
[<
ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
The oopsing code shows:
callq 0xffffffff810af6a0 <wake_up_bit>
mov 0xf8(%r12),%rax
mov 0x30(%rax),%rax
mov 0x98(%rax),%rax <---- oops here
lock add %rax,0x130(%rbx)
where this is:
d_backing_inode(object->dentry)->i_blocks
Fixes:
a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 3 Aug 2016 17:31:51 +0000 (13:31 -0400)]
Merge branch 'for-viro' of git://git./linux/kernel/git/mszeredi/vfs into for-linus
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:
- move the suppressing of the __builtin_return_address >0 warning to
the tracing directory only.
- metag recordmcount fix for newer glibc's
- two tracing histogram fixes that were reported by KASAN"
* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix use-after-free in hist_register_trigger()
tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
ftrace/recordmcount: Work around for addition of metag magic but not relocations
Geert Uytterhoeven [Wed, 3 Aug 2016 15:33:32 +0000 (17:33 +0200)]
fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2
With gcc < 4.2 (e.g. 4.1.2):
CC fs/proc/task_mmu.o
cc1: error: unrecognized command line option "-Wno-override-init"
To fix this, only enable the compiler option when it is actually
supported by the compiler.
Fixes:
ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heinz Mauelshagen [Wed, 3 Aug 2016 15:47:04 +0000 (17:47 +0200)]
dm raid: constructor fails on non-zero incompat_features
When lvm2 userspace requests a RaidLV repair, it sets the rebuild
constructor flag on the new replacement DataLVs but does not clear the
respective MetaLVs. Hence the superblock that is loaded from such new
MetaLVs may have a non-zero incompat_features member and the constructor
will fail with false-positive on incompat_features.
Solve by initializing the incompat_features member properly.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Al Viro [Wed, 3 Aug 2016 15:12:12 +0000 (11:12 -0400)]
9p: use clone_fid()
in a bunch of places it cleans the things up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 3 Aug 2016 15:02:48 +0000 (11:02 -0400)]
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
In v9fs_vfs_rename() we need to clone the parents' fids, not just
find them.
Spotted-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Heinz Mauelshagen [Wed, 27 Jul 2016 21:34:01 +0000 (23:34 +0200)]
dm raid: fix processing of max_recovery_rate constructor flag
__CTR_FLAG_MIN_RECOVERY_RATE was used instead of __CTR_FLAG_MAX_RECOVERY_RATE
thus causing max_recovery_rate to be rejected in case min_recovery_rate
was already set.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Takashi Iwai [Wed, 3 Aug 2016 13:13:00 +0000 (15:13 +0200)]
ALSA: hda: Fix krealloc() with __GFP_ZERO usage
krealloc() doesn't work always properly with __GFP_ZERO flag as
expected. For clearing the reallocated area, we need to clear
explicitly instead.
Reported-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: make dentry_needs_remove_privs() internal
Only used by the vfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: remove file_needs_remove_privs()
This function is now unused.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: fix deadlock in file_remove_privs() on overlayfs
file_remove_privs() is called with inode lock on file_inode(), which
proceeds to calling notify_change() on file->f_path.dentry. Which triggers
the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.
Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes:
07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Cc: <stable@vger.kernel.org>
Maruthi Srinivas Bayyavarapu [Wed, 3 Aug 2016 11:16:39 +0000 (16:46 +0530)]
ALSA: hda: add AMD Bonaire AZ PCI ID with proper driver caps
This commit fixes garbled audio on Bonaire HDMI
Signed-off-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Wed, 3 Aug 2016 11:26:11 +0000 (07:26 -0400)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix several cases of missing of_node_put() calls in various
networking drivers. From Peter Chen.
2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
Mintz.
3) Unbalanced locking in TIPC error handling, from Wei Yongjun.
4) Fix lockups in CPDMA driver, from Grygorii Strashko.
5) More MACSEC refcount et al fixes, from Sabrina Dubroca.
6) Fix MAC address setting in r8169 during runtime suspend, from
Chun-Hao Lin.
7) Various printf format specifier fixes, from Heinrich Schuchardt.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
qed: Fail driver load in 100g MSI mode.
ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
ethernet: altera: add missing of_node_put
8139too: fix system hang when there is a tx timeout event.
qed: Fix error return code in qed_resc_alloc()
net: qlcnic: avoid superfluous assignement
dsa: b53: remove redundant if
...
Ralf Baechle [Wed, 3 Aug 2016 10:55:49 +0000 (12:55 +0200)]
Merge branch '4.7-fixes' into mips-for-linux-next
Mika Penttilä [Wed, 3 Aug 2016 06:52:32 +0000 (23:52 -0700)]
Input: add driver for SiS 9200 family I2C touchscreen controllers
This is a driver for SiS 9200 family touchscreen controllers using I2C bus.
Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
Acked-by: Tammy Tseng <tammy_tseng@sis.com>
Acked-by: Yuger Yu <yuger_yu@sis.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Matt Redfearn [Tue, 14 Jun 2016 13:59:38 +0000 (14:59 +0100)]
MIPS: mm: Fix definition of R6 cache instruction
Commit
a168b8f1cde6 ("MIPS: mm: Add MIPS R6 instruction encodings") added
an incorrect definition of the redefined MIPSr6 cache instruction.
Executing any kernel code including this instuction results in a
reserved instruction exception and kernel panic.
Fix the instruction definition.
Fixes:
a168b8f1cde6588ff7a67699fa11e01bc77a5ddd
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: <stable@vger.kernel.org> # 4.x-
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13663/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Harvey Hunt [Thu, 16 Jun 2016 15:35:39 +0000 (16:35 +0100)]
MIPS: tools: Fix relocs tool compiler warnings
When using clang as HOSTCC, the following warnings appear:
In file included from arch/mips/boot/tools/relocs_64.c:27:0:
arch/mips/boot/tools/relocs.c: In function ‘read_relocs’:
arch/mips/boot/tools/relocs.c:397:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
ELF_R_SYM(rel->r_info) = elf32_to_cpu(ELF_R_SYM(rel->r_info));
^~~~~~~~~
arch/mips/boot/tools/relocs.c:397:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
arch/mips/boot/tools/relocs.c: In function ‘walk_relocs’:
arch/mips/boot/tools/relocs.c:491:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
Elf_Sym *sym = &sh_symtab[ELF_R_SYM(rel->r_info)];
^~~~~~~
arch/mips/boot/tools/relocs.c: In function ‘do_reloc’:
arch/mips/boot/tools/relocs.c:502:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
unsigned r_type = ELF_R_TYPE(rel->r_info);
^~~~~~~~
arch/mips/boot/tools/relocs.c: In function ‘do_reloc_info’:
arch/mips/boot/tools/relocs.c:641:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
rel_type(ELF_R_TYPE(rel->r_info)),
^~~~~~~~
Fix them by making Elf64_Mips_Rela a union
Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Acked-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13683/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Dmitry Torokhov [Tue, 2 Aug 2016 17:31:43 +0000 (10:31 -0700)]
Input: ili210x - fix permissions on "calibrate" attribute
"calibrate" attribute does not provide "show" methods and thus we should
not mark it as readable.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
KT Liao [Wed, 13 Jul 2016 18:12:12 +0000 (11:12 -0700)]
Input: elan_i2c - properly wake up touchpad on ASUS laptops
Some ASUS laptops were shipped with touchpads that require to be woken up
first, before trying to switch them into absolute reporting mode, otherwise
touchpad would fail to work while flooding the logs with:
elan_i2c i2c-ELAN1000:00: invalid report id data (1)
Among affected devices are Asus E202SA, N552VW, X456UF, UX305CA, and
others. We detect such devices by checking the IC type and product ID
numbers and adjusting order of operations accordingly.
Signed-off-by: KT Liao <kt.liao@emc.com.tw>
Reported-by: Chris Chiu <chiu@endlessm.com>
Reported-by: Vlad Glagolev <stealth@vaygr.net>
Tested-by: Vlad Glagolev <stealth@vaygr.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Robert Dolca [Thu, 28 Jul 2016 21:28:46 +0000 (14:28 -0700)]
Input: add driver for Silead touchscreens
This driver adds support for Silead touchscreens. It has been tested
with GSL1680 and GSL3680 touch panels.
It supports ACPI and device tree enumeration. Screen resolution,
the maximum number of fingers supported and firmware name are
configurable.
Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Daniel Jansen <djaniboe@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Benjamin Tissoires [Thu, 28 Jul 2016 18:22:13 +0000 (11:22 -0700)]
Input: elantech - fix debug dump of the current packet
The use of mixed psmouse_printk() and printk creates 2 lines in the log,
while the use of %*ph solves everything.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Andrea Gelmini [Sat, 21 May 2016 11:59:39 +0000 (13:59 +0200)]
MIPS: Cobalt: Fix typo
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: trivial@kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13316/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Andrea Gelmini [Sat, 21 May 2016 11:59:32 +0000 (13:59 +0200)]
MIPS: Octeon: Fix typo
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: trivial@kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13315/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Sudip Mukherjee [Thu, 16 Jun 2016 20:46:08 +0000 (21:46 +0100)]
MIPS: Lantiq: Fix build failure
Some configs of mips like xway_defconffig are failing with the error:
arch/mips/lantiq/irq.c:209:2: error: initialization from incompatible
pointer type [-Werror]
"icu",
^
arch/mips/lantiq/irq.c:209:2: error: (near initialization for
'ltq_irq_type.parent_device') [-Werror]
arch/mips/lantiq/irq.c:219:2: error: initialization from incompatible
pointer type [-Werror]
"eiu",
^
arch/mips/lantiq/irq.c:219:2: error: (near initialization for
'ltq_eiu_type.parent_device') [-Werror]
The first member of the "struct irq" is no longer a pointer for the
name.
Fixes:
be45beb2df69 ("genirq: Add runtime power management support for IRQ chips")
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: John Crispin <john@phrozen.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13684/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Benjamin Herrenschmidt [Tue, 2 Aug 2016 05:53:01 +0000 (15:53 +1000)]
powerpc/32: Fix early access to cpu_spec relocation
Commit
9402c6846131 ("powerpc: Factor do_feature_fixup calls")
introduced a subtle bug on 32-bit. When reading the cpu spec from the
global, we not only need to do a pointer relocation on the global
address but also on the pointer we read from it.
This fixes crashes reported on MPC5200 based machines.
Fixes:
9402c6846131 ("powerpc: Factor do_feature_fixup calls")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dean Luick [Thu, 28 Jul 2016 19:21:27 +0000 (15:21 -0400)]
IB/hfi1: Add cache evict LRU list
The original code used a LRU list to evict nodes which were least
recently used. For correctness the evict code was moved under the
handler->lock, now add back the LRU list.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 19:21:26 +0000 (15:21 -0400)]
IB/hfi1: Fix memory leak during unexpected shutdown
During an unexpected shutdown, references to tid_rb_node were NULL'ed out
without properly being released.
Fix this by calling clear_tid_node in the mmu notifier remove callback
rather than after these callbacks are called.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:25 +0000 (15:21 -0400)]
IB/hfi1: Remove unneeded mm argument in remove function
The reworked mmu_rb interface allows the unused mm argument to be removed.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:24 +0000 (15:21 -0400)]
IB/hfi1: Consistently call ops->remove outside spinlock
The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock. In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages()
This had 2 problems. First this would attempt to acquire the mmap_sem
under a spin lock. Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.
Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock. This also allows the correct
mm to be used in the remove callback without fear of deadlock.
Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.
Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.
Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:23 +0000 (15:21 -0400)]
IB/hfi1: Use evict mmu rb operation
Use the new cache evict operation in the SDMA code. This allows the cache
to properly coordinate evicts and removes, preventing any race. With this
change, the separate list, lock, and race flag are not needed.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:22 +0000 (15:21 -0400)]
IB/hfi1: Add evict operation to the mmu rb handler
Allow users to clear nodes from the rb tree based on their evict callback.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:21 +0000 (15:21 -0400)]
IB/hfi1: Fix TID caching actions
Per file descriptor TID caching actions depend on a global that can
change midway through the lifetime of that file descriptor.
Make the use of caching consistent for the life of the file descriptor
by using the presence of the cache handler to decide when to use the cache
functions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:20 +0000 (15:21 -0400)]
IB/hfi1: Make the cache handler own its rb tree root
The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.
Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.
This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 19:21:19 +0000 (15:21 -0400)]
IB/hfi1: Make use of mm consistent
The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed. The driver
incorrectly assumes that the close will always happen from the same
context as the open. In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context. In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.
Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd. Note: this
patch leaves 1 use of current->mm in place. This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.
If registration fails, there is no reason to keep the handler object
around. Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.
Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:18 +0000 (15:21 -0400)]
IB/hfi1: Fix user SDMA racy user request claim
The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made. Move the request in-use flag into its own bit
array and use that for atomic claims. This cleans up the claim code and
removes any race possibility.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:17 +0000 (15:21 -0400)]
IB/hfi1: Fix error condition that needs to clean up
If input validation fails, properly free the request before returning.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:16 +0000 (15:21 -0400)]
IB/hfi1: Release node on insert failure
If unable to insert node into the RB tree cache, node will be freed
before returning from the function. Null out iovec's pointer to node
so iovec does not try to free it later.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:15 +0000 (15:21 -0400)]
IB/hfi1: Validate SDMA user iovector count
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:14 +0000 (15:21 -0400)]
IB/hfi1: Validate SDMA user request index
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 19:21:13 +0000 (15:21 -0400)]
IB/hfi1: Use the same capability state for all shared contexts
Save the current capability state at user context creation
time. Report this saved value for all shared contexts.
Also get rid of unnecessary hfi1_get_base_kinfo function.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 19:21:12 +0000 (15:21 -0400)]
IB/hfi1: Prevent null pointer dereference
If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:37 +0000 (12:27 -0400)]
IB/hfi1: Rename TID mmu_rb_* functions
Clarify the names of the TID mmu functions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:36 +0000 (12:27 -0400)]
IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:35 +0000 (12:27 -0400)]
IB/hfi1: Restructure hfi1_file_open
Rearrange the file open call in prep for new changes.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:34 +0000 (12:27 -0400)]
IB/hfi1: Make iovec loop index easy to understand
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:33 +0000 (12:27 -0400)]
IB/hfi1: Use "false" not 0
For bool parameters "false" should be used
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:32 +0000 (12:27 -0400)]
IB/hfi1: Remove unused sub-context parameter
subctxt is not used, just remove it.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:31 +0000 (12:27 -0400)]
IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove
__mmu_rb_remove was called in only 1 place which was a very simple
call site. Combine this function into its caller.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:30 +0000 (12:27 -0400)]
IB/hfi1: Always expect ops functions
Remove, insert, and invalidate are always provided. No
need to test.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:29 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to callback declarations
This makes it more clear what these functions are
operating on.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:28 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to function declarations
Parameter names to function declarations make it more clear
what those parameters do.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:27 +0000 (12:27 -0400)]
IB/hfi1: Remove unused function hfi1_mmu_rb_search
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 28 Jul 2016 16:27:26 +0000 (12:27 -0400)]
IB/hfi1: Remove unused uctxt->subpid and uctxt->pid
These are no longer needed.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 16:27:25 +0000 (12:27 -0400)]
IB/hfi1: Fix minor format error
Brackets should be on the next line of a function
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:09:40 +0000 (21:09 -0400)]
IB/hfi1: Expand reported serial number
Expand the serial number space by using more bits
from the GUID.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:08:42 +0000 (21:08 -0400)]
IB/hfi1: Allow for non-double word multiple message sizes for user SDMA
The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:07:36 +0000 (21:07 -0400)]
IB/rdmavt: Eliminate redundant opcode test in mr ref clear
The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.
The overly specific test hinders the use of implementation
specific operations.
The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:06:15 +0000 (21:06 -0400)]
IB/hfi1: Handle kzalloc failure in init_pervl_scs
Checking the return value of the memory allocation call in
init_pervl_scs() was missed. Recently the kmalloc() was changed to
kzalloc() which identified the problem.
While fixing this issue 2 other bugs were noticed. First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated. Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.
Fixes:
35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO")
Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:31:07 +0000 (12:31 +1000)]
xfs: move (and rename) the deferred bmap-free tracepoints
Rename the deferred bmap-free to extent_free and make them only
trigger when we're really running deferred ops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:30:31 +0000 (12:30 +1000)]
xfs: collapse single use static functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:29:32 +0000 (12:29 +1000)]
xfs: remove unnecessary parentheses from log redo item recovery functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:28:43 +0000 (12:28 +1000)]
xfs: remove the extents array from the rmap update done log item
Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:26:22 +0000 (12:26 +1000)]
xfs: in btree_lshift, only allocate temporary cursor when needed
We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree. Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.
Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:22:45 +0000 (12:22 +1000)]
xfs: remove unnecesary lshift/rshift key initialization
In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer. The
update_keys functions figure out the key for a block on their own.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:22:12 +0000 (12:22 +1000)]
xfs: remove the get*keys and update_keys btree ops pointers
These are internal btree functions; we don't need them to be
dispatched via function pointers. Make them static again and
just check the overlapped flag to figure out what we need to
do. The strategy behind this patch was suggested by Christoph.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:20:57 +0000 (12:20 +1000)]
xfs: enable the rmap btree functionality
Originally-From: Dave Chinner <dchinner@redhat.com>
Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:19:53 +0000 (12:19 +1000)]
xfs: don't update rmapbt when fixing agfl
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist. xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:18:07 +0000 (12:18 +1000)]
xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.
This functionality will be provided in a (much) later patch, using
some of the reflink deferred block remapping functionality to
accomlish extent swapping with rmap updates.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:17:11 +0000 (12:17 +1000)]
xfs: add rmap btree block detection to log recovery
Originally-From: Dave Chinner <dchinner@redhat.com>
So such blocks can be correctly identified and have their operations
structures attached to validate recovery has not resulted in a
correct block.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:16:44 +0000 (12:16 +1000)]
xfs: add rmap btree geometry feature flag
Originally-From: Dave Chinner <dchinner@redhat.com>
So xfs_info and other userspace utilities know the filesystem is
using this feature.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:16:05 +0000 (12:16 +1000)]
xfs: propagate bmap updates to rmapbt
When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a respective update in the rmapbt. Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true as we now have ability to make interval
queries against the rmapbt.
We use the deferred operations code to handle redo operations
atomically and deadlock free. This plumbs in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first three
now for file data, and reflink will want the last two. We also add
an error injection site to test log recovery.
Finally, we need to fix the bmap shift extent code to adjust the
rmaps correctly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:11:01 +0000 (12:11 +1000)]
xfs: enable the xfs_defer mechanism to process rmaps to update
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:09:48 +0000 (12:09 +1000)]
xfs: log rmap intent items
Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:04:45 +0000 (12:04 +1000)]
xfs: create rmap update intent log items
Create rmap update intent/done log items to record redo information in
the log. Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction. This mechanism enables log recovery to finish what
was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:03:58 +0000 (12:03 +1000)]
xfs: add rmap btree insert and delete helpers
Add a couple of helper functions to encapsulate rmap btree insert and
delete operations. Add tracepoints to the update function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 02:03:19 +0000 (12:03 +1000)]
xfs: convert unwritten status of reverse mappings
Provide a function to convert an unwritten rmap extent to a real one
and vice versa.
[ dchinner: Note that this algorithm and code was derived from the
existing bmapbt unwritten extent conversion code in
xfs_bmap_add_extent_unwritten_real(). ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:45:12 +0000 (11:45 +1000)]
xfs: remove an extent from the rmap btree
Originally-From: Dave Chinner <dchinner@redhat.com>
Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.
[darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:44:21 +0000 (11:44 +1000)]
xfs: add an extent to the rmap btree
Originally-From: Dave Chinner <dchinner@redhat.com>
Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.
[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:43:24 +0000 (11:43 +1000)]
xfs: add tracepoints for the rmap functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:42:39 +0000 (11:42 +1000)]
xfs: teach rmapbt to support interval queries
Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:40:56 +0000 (11:40 +1000)]
xfs: support overlapping intervals in the rmap btree
Now that the generic btree code supports overlapping intervals, plug
in the rmap btree to this functionality. We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:39:05 +0000 (11:39 +1000)]
xfs: add rmap btree operations
Originally-From: Dave Chinner <dchinner@redhat.com>
Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.
Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset]. The expansion of the primary key is crucial
to allowing multiple owners per extent.
[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:38:24 +0000 (11:38 +1000)]
xfs: rmap btree requires more reserved free space
Originally-From: Dave Chinner <dchinner@redhat.com>
The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.
Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.
[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:37:10 +0000 (11:37 +1000)]
xfs: rmap btree transaction reservations
The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.
Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.
[darrick: use rmap_maxlevels when calculating log block resv]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:36:08 +0000 (11:36 +1000)]
xfs: add rmap btree growfs support
Originally-From: Dave Chinner <dchinner@redhat.com>
Now we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.
[darrick.wong@oracle.com: fill out the rmap offset fields]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:36:07 +0000 (11:36 +1000)]
xfs: define the on-disk rmap btree format
Originally-From: Dave Chinner <dchinner@redhat.com>
Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.
[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:33:43 +0000 (11:33 +1000)]
xfs: introduce rmap extent operation stubs
Originally-From: Dave Chinner <dchinner@redhat.com>
Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.
[darrick.wong@oracle.com: Extend the stubs to take full owner info.]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:33:42 +0000 (11:33 +1000)]
xfs: add owner field to extent allocation and freeing
For the rmap btree to work, we have to feed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.
We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.
There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...
While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.
Extend the owner field to include both the owner type and some sort
of index within the owner. The index field will be used to support
reverse mappings when reflink is enabled.
When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.
This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).
[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:31:47 +0000 (11:31 +1000)]
xfs: rmap btree add more reserved blocks
Originally-From: Dave Chinner <dchinner@redhat.com>
XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmbt).
Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs-mount variable directly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:31:11 +0000 (11:31 +1000)]
xfs: add rmap btree stats infrastructure
Originally-From: Dave Chinner <dchinner@redhat.com>
The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:30:32 +0000 (11:30 +1000)]
xfs: introduce rmap btree definitions
Originally-From: Dave Chinner <dchinner@redhat.com>
Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:29:42 +0000 (11:29 +1000)]
xfs: increase XFS_BTREE_MAXLEVELS to fit the rmapbt
By my calculations, a 1,073,741,824 block AG with a 1k block size
can attain a maximum height of 9. Assuming a record size of 24
bytes, a key/ptr size of 44 bytes, and half-full btree nodes, we'd
need 53,687,092 blocks for the records and ~6 million blocks for the
keys. That requires a btree of height 9 based on the following
derivation:
Block size = 1024b
sblock CRC header = 56b
== 1024-56 = 968 bytes for tree data
rmapbt record = 24b
== 40 records per leaf block
rmapbt ptr/key = 44b
== 22 ptr/keys per block
Worst case, each block is half full, so 20 records and 11 ptrs per block.
1073741824 rmap records / 20 records per block
==
53687092 leaf blocks
53687092 leaves / 11 ptrs per block
== 4880645 level 1 blocks
== 443695 level 2 blocks
== 40336 level 3 blocks
== 3667 level 4 blocks
== 334 level 5 blocks
== 31 level 6 blocks
== 3 level 7 blocks
== 1 level 8 block
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:26:33 +0000 (11:26 +1000)]
xfs: add tracepoints and error injection for deferred extent freeing
Add a couple of tracepoints for the deferred extent free operation and
a site for injecting errors while finishing the operation. This makes
it easier to debug deferred ops and test log redo.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:23:49 +0000 (11:23 +1000)]
xfs: refactor redo intent item processing
Refactor the EFI intent item recovery (and cancellation) functions
into a general function that scans the AIL and an intent item type
specific handler. Move the function that recovers a single EFI item
into the extent free item code. We'll want the generalized function
when we start wiring up more redo item types.
Furthermore, ensure that log recovery only replays the redo items
that were in the AIL prior to recovery by checking the item LSN
against the largest LSN seen during log scanning. As written this
should never happen, but we can be defensive anyway.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:19:29 +0000 (11:19 +1000)]
xfs: rename flist/free_list to dfops
Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:18:10 +0000 (11:18 +1000)]
xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*
Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code. No new code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:15:38 +0000 (11:15 +1000)]
xfs: rework xfs_bmap_free callers to use xfs_defer_ops
Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:14:35 +0000 (11:14 +1000)]
xfs: enable the xfs_defer mechanism to process extents to free
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 3 Aug 2016 01:13:47 +0000 (11:13 +1000)]
xfs: clean up typedef usage in the EFI/EFD handling code
Replace structure typedefs with struct xfs_foo_* in the EFI/EFD
handling code in preparation to move it over to deferred ops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>