bpf: Use get_file_rcu() instead of get_file() for task_file iterator
With latest `bpftool prog` command, we observed the following kernel
panic.
BUG: kernel NULL pointer dereference, address:
0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD
dfe894067 P4D
dfe894067 PUD
deb663067 PMD 0
Oops: 0010 [#1] SMP
CPU: 9 PID: 6023 ...
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0000:
ffffc900002b8f18 EFLAGS:
00010286
RAX:
ffff8883a405f400 RBX:
ffff888e46a6bf00 RCX:
000000008020000c
RDX:
0000000000000000 RSI:
0000000000000001 RDI:
ffff8883a405f400
RBP:
ffff888e46a6bf50 R08:
0000000000000000 R09:
ffffffff81129600
R10:
ffff8883a405f300 R11:
0000160000000000 R12:
0000000000002710
R13:
000000e9494b690c R14:
0000000000000202 R15:
0000000000000009
FS:
00007fd9187fe700(0000) GS:
ffff888e46a40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
0000000de5d33002 CR4:
0000000000360ee0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<IRQ>
rcu_core+0x1a4/0x440
__do_softirq+0xd3/0x2c8
irq_exit+0x9d/0xa0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0033:0x47ce80
Code: Bad RIP value.
RSP: 002b:
00007fd9187fba40 EFLAGS:
00000206 ORIG_RAX:
ffffffffffffff13
RAX:
0000000000000002 RBX:
00007fd931789160 RCX:
000000000000010c
RDX:
00007fd9308cdfb4 RSI:
00007fd9308cdfb4 RDI:
00007ffedd1ea0a8
RBP:
00007fd9187fbab0 R08:
000000000000000e R09:
000000000000002a
R10:
0000000000480210 R11:
00007fd9187fc570 R12:
00007fd9316cc400
R13:
0000000000000118 R14:
00007fd9308cdfb4 R15:
00007fd9317a9380
After further analysis, the bug is triggered by
Commit
eaaacd23910f ("bpf: Add task and task/file iterator targets")
which introduced task_file bpf iterator, which traverses all open file
descriptors for all tasks in the current namespace.
The latest `bpftool prog` calls a task_file bpf program to traverse
all files in the system in order to associate processes with progs/maps, etc.
When traversing files for a given task, rcu read_lock is taken to
access all files in a file_struct. But it used get_file() to grab
a file, which is not right. It is possible file->f_count is 0 and
get_file() will unconditionally increase it.
Later put_file() may cause all kind of issues with the above
as one of sympotoms.
The failure can be reproduced with the following steps in a few seconds:
$ cat t.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define N 10000
int fd[N];
int main() {
int i;
for (i = 0; i < N; i++) {
fd[i] = open("./note.txt", 'r');
if (fd[i] < 0) {
fprintf(stderr, "failed\n");
return -1;
}
}
for (i = 0; i < N; i++)
close(fd[i]);
return 0;
}
$ gcc -O2 t.c
$ cat run.sh
#/bin/bash
for i in {1..100}
do
while true; do ./a.out; done &
done
$ ./run.sh
$ while true; do bpftool prog >& /dev/null; done
This patch used get_file_rcu() which only grabs a file if the
file->f_count is not zero. This is to ensure the file pointer
is always valid. The above reproducer did not fail for more
than 30 minutes.
Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/bpf/20200817174214.252601-1-yhs@fb.com