Stanislav Fomichev [Mon, 29 Jul 2019 21:51:11 +0000 (14:51 -0700)]
selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use case
Ignore SOL_TCP:TCP_CONGESTION in getsockopt and always override
SOL_TCP:TCP_CONGESTION with "cubic" in setsockopt hook.
Call setsockopt(SOL_TCP, TCP_CONGESTION) with short optval ("nv")
to make sure BPF program has enough buffer space to replace it
with "cubic".
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Mon, 29 Jul 2019 21:51:10 +0000 (14:51 -0700)]
bpf: always allocate at least 16 bytes for setsockopt hook
Since we always allocate memory, allocate just a little bit more
for the BPF program in case it need to override user input with
bigger value. The canonical example is TCP_CONGESTION where
input string might be too small to override (nv -> bbr or cubic).
16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can
be extended in the future if needed.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Tue, 30 Jul 2019 21:03:00 +0000 (14:03 -0700)]
tools: bpftool: add support for reporting the effective cgroup progs
Takshak said in the original submission:
With different bpf attach_flags available to attach bpf programs specially
with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective
bpf-programs available to any sub-cgroups really needs to be available for
easy debugging.
Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached
bpf-programs to a cgroup but also the inherited ones from parent cgroup.
So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here
to list all the effective bpf-programs available for execution at a specified
cgroup.
Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf:
# ./test_cgroup_attach
With old bpftool:
# bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/
ID AttachType AttachFlags Name
271 egress multi pkt_cntr_1
272 egress multi pkt_cntr_2
Attached new program pkt_cntr_4 in cg2 gives following:
# bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2
ID AttachType AttachFlags Name
273 egress override pkt_cntr_4
And with new "effective" option it shows all effective programs for cg2:
# bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective
ID AttachType AttachFlags Name
273 egress override pkt_cntr_4
271 egress override pkt_cntr_1
272 egress override pkt_cntr_2
Compared to original submission use a local flag instead of global
option.
We need to clear query_flags on every command, in case batch mode
wants to use varying settings.
v2: (Takshak)
- forbid duplicated flags;
- fix cgroup path freeing.
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 30 Jul 2019 18:05:41 +0000 (11:05 -0700)]
selftests/bpf: fix clearing buffered output between tests/subtests
Clear buffered output once test or subtests finishes even if test was
successful. Not doing this leads to accumulation of output from previous
tests and on first failed tests lots of irrelevant output will be
dumped, greatly confusing things.
v1->v2: fix Fixes tag, add more context to patch
Fixes:
3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Wed, 31 Jul 2019 04:03:06 +0000 (21:03 -0700)]
Merge branch 'gen-syn-cookie'
Petar Penkov says:
====================
This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF. Currently, this helper is enabled at both the TC hook and the
XDP hook.
The first two patches in the series add/modify several TCP helper functions to
allow for SKB-less operation, as is the case at the XDP hook.
The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. The return value of
this function contains both the MSS value, encoded in the cookie, and the
cookie itself.
The last three patches sync tools/ and add a test.
Performance evaluation:
I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
carry no options. If I attached a simple program that checks if a packet is
IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
swapping src/dest, recomputing the checksum, and setting the ACK flag, I
observed 10Mpps being sent back out.
Changes since v1:
1/ Added performance numbers to the cover letter
2/ Patch 2: Refactored a bit to fix compilation issues
3/ Patch 3: Changed ENOTSUPP to EOPNOTSUPP at Toke's suggestion
Changes since RFC:
1/ Cookie is returned in host order at Alexei's suggestion
2/ If cookies are not enabled via a sysctl, the helper function returns
-ENOENT instead of -EINVAL at Lorenz's suggestion
3/ Fixed documentation to properly reflect that MSS is 16 bits at
Lorenz's suggestion
4/ BPF helper requires TCP length to match ->doff field, rather than to simply
be no more than 20 bytes at Eric and Alexei's suggestion
5/ Packet type is looked up from the packet version field, rather than from the
socket. v4 packets are rejected on v6-only sockets but should work with
dual stack listeners at Eric's suggestion
6/ Removed unnecessary `net` argument from helper function in patch 2 at
Lorenz's suggestion
7/ Changed test to only pass MSS option so we can convince the verifier that the
memory access is not out of bounds
Note that 7/ below illustrates the verifier might need to be extended to allow
passing a variable tcph->doff to the helper function like below:
__u32 thlen = tcph->doff * 4;
if (thlen < sizeof(*tcph))
return;
__s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen);
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:18 +0000 (09:59 -0700)]
selftests/bpf: add test for bpf_tcp_gen_syncookie
Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.
Additionally, verify that the MSS for that SYN cookie is within
expected range.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:17 +0000 (09:59 -0700)]
selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
Expose bpf_tcp_gen_syncookie to selftests.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:16 +0000 (09:59 -0700)]
bpf: sync bpf.h to tools/
Sync updated documentation for bpf_redirect_map.
Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:15 +0000 (09:59 -0700)]
bpf: add bpf_tcp_gen_syncookie helper
This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:14 +0000 (09:59 -0700)]
tcp: add skb-less helpers to retrieve SYN cookie
This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Petar Penkov [Mon, 29 Jul 2019 16:59:13 +0000 (09:59 -0700)]
tcp: tcp_syn_flood_action read port from socket
This allows us to call this function before an SKB has been
allocated.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Mon, 29 Jul 2019 20:50:55 +0000 (13:50 -0700)]
Merge branch 'devmap_hash'
Toke Høiland-Jørgensen says:
====================
This series adds a new map type, devmap_hash, that works like the existing
devmap type, but using a hash-based indexing scheme. This is useful for the use
case where a devmap is indexed by ifindex (for instance for use with the routing
table lookup helper). For this use case, the regular devmap needs to be sized
after the maximum ifindex number, not the number of devices in it. A hash-based
indexing scheme makes it possible to size the map after the number of devices it
should contain instead.
This was previously part of my patch series that also turned the regular
bpf_redirect() helper into a map-based one; for this series I just pulled out
the patches that introduced the new map type.
Changelog:
v5:
- Dynamically set the number of hash buckets by rounding up max_entries to the
nearest power of two (mirroring the regular hashmap), as suggested by Jesper.
v4:
- Remove check_memlock parameter that was left over from an earlier patch
series.
- Reorder struct members to avoid holes.
v3:
- Rework the split into different patches
- Use spin_lock_irqsave()
- Also add documentation and bash completion definitions for bpftool
v2:
- Split commit adding the new map type so uapi and tools changes are separate.
Changes to these patches since the previous series:
- Rebase on top of the other devmap changes (makes this one simpler!)
- Don't enforce key==val, but allow arbitrary indexes.
- Rename the type to devmap_hash to reflect the fact that it's just a hashmap now.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:58 +0000 (18:06 +0200)]
tools: Add definitions for devmap_hash map type
This adds selftest and bpftool updates for the devmap_hash map type.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:57 +0000 (18:06 +0200)]
tools/libbpf_probes: Add new devmap_hash type
This adds the definition for BPF_MAP_TYPE_DEVMAP_HASH to libbpf_probes.c in
tools/lib/bpf.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:56 +0000 (18:06 +0200)]
tools/include/uapi: Add devmap_hash BPF map type
This adds the devmap_hash BPF map type to the uapi headers in tools/.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:55 +0000 (18:06 +0200)]
xdp: Add devmap_hash map type for looking up devices by hashed index
A common pattern when using xdp_redirect_map() is to create a device map
where the lookup key is simply ifindex. Because device maps are arrays,
this leaves holes in the map, and the map has to be sized to fit the
largest ifindex, regardless of how many devices actually are actually
needed in the map.
This patch adds a second type of device map where the key is looked up
using a hashmap, instead of being used as an array index. This allows maps
to be densely packed, so they can be smaller.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:53 +0000 (18:06 +0200)]
xdp: Refactor devmap allocation code for reuse
The subsequent patch to add a new devmap sub-type can re-use much of the
initialisation and allocation code, so refactor it into separate functions.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Fri, 26 Jul 2019 16:06:52 +0000 (18:06 +0200)]
include/bpf.h: Remove map_insert_ctx() stubs
When we changed the device and CPU maps to use linked lists instead of
bitmaps, we also removed the need for the map_insert_ctx() helpers to keep
track of the bitmaps inside each map. However, it seems I forgot to remove
the function definitions stubs, so remove those here.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Sun, 28 Jul 2019 05:36:20 +0000 (22:36 -0700)]
Merge branch 'revamp-test_progs'
Andrii Nakryiko says:
====================
This patch set makes a number of changes to test_progs selftest, which is
a collection of many other tests (and sometimes sub-tests as well), to provide
better testing experience and allow to start convering many individual test
programs under selftests/bpf into a single and convenient test runner.
Patch #1 fixes issue with Makefile, which makes prog_tests/test.h compiled as
a C code. This fix allows to change how test.h is generated, providing ability
to have more control on what and how tests are run.
Patch #2 changes how test.h is auto-generated, which allows to have test
definitions, instead of just running test functions. This gives ability to do
more complicated test run policies.
Patch #3 adds `-t <test-name>` and `-n <test-num>` selectors to run only
subset of tests.
Patch #4 changes libbpf_set_print() to return previously set print callback,
allowing to temporarily replace current print callback and then set it back.
This is necessary for some tests that want more control over libbpf logging.
Patch #5 sets up and takes over libbpf logging from individual tests to
test_prog runner, adding -vv verbosity to capture debug output from libbpf.
This is useful when debugging failing tests.
Patch #6 furthers test output management and buffers it by default, emitting
log output only if test fails. This give succinct and clean default test
output. It's possible to bypass this behavior with -v flag, which will turn
off test output buffering.
Patch #7 adds support for sub-tests. It also enhances -t and -n selectors to
both support ability to specify sub-test selectors, as well as enhancing
number selector to accept sets of test, instead of just individual test
number.
Patch #8 converts bpf_verif_scale.c test to use sub-test APIs.
Patch #9 converts send_signal.c tests to use sub-test APIs.
v2->v3:
- fix buffered output rare unitialized value bug (Alexei);
- fix buffered output va_list reuse bug (Alexei);
- fix buffered output truncation due to interleaving zero terminators;
v1->v2:
- drop libbpf_swap_print, instead return previous function from
libbpf_set_print (Stanislav);
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:31 +0000 (20:25 -0700)]
selftests/bpf: convert send_signal.c to use subtests
Convert send_signal set of tests to be exposed as three sub-tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:30 +0000 (20:25 -0700)]
selftests/bpf: convert bpf_verif_scale.c to sub-tests API
Expose each BPF verifier scale test as individual sub-test to allow
independent results output and test selection.
Test run results now look like this:
$ sudo ./test_progs -t verif/
#3/1 loop3.o:OK
#3/2 test_verif_scale1.o:OK
#3/3 test_verif_scale2.o:OK
#3/4 test_verif_scale3.o:OK
#3/5 pyperf50.o:OK
#3/6 pyperf100.o:OK
#3/7 pyperf180.o:OK
#3/8 pyperf600.o:OK
#3/9 pyperf600_nounroll.o:OK
#3/10 loop1.o:OK
#3/11 loop2.o:OK
#3/12 strobemeta.o:OK
#3/13 strobemeta_nounroll1.o:OK
#3/14 strobemeta_nounroll2.o:OK
#3/15 test_sysctl_loop1.o:OK
#3/16 test_sysctl_loop2.o:OK
#3/17 test_xdp_loop.o:OK
#3/18 test_seg6_loop.o:OK
#3 bpf_verif_scale:OK
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:29 +0000 (20:25 -0700)]
selftests/bpf: add sub-tests support for test_progs
Allow tests to have their own set of sub-tests. Also add ability to do
test/subtest selection using `-t <test-name>/<subtest-name>` and `-n
<test-nums-set>/<subtest-nums-set>`, as an extension of existing -t/-n
selector options. For the <test-num-set> format: it's a comma-separated
list of either individual test numbers (1-based), or range of test
numbers. E.g., all of the following are valid sets of test numbers:
- 10
- 1,2,3
- 1-3
- 5-10,1,3-4
'/<subtest' part is optional, but has the same format. E.g., to select
test #3 and its sub-tests #10 through #15, use: -t 3/10-15.
Similarly, to select tests by name, use `-t verif/strobe`:
$ sudo ./test_progs -t verif/strobe
#3/12 strobemeta.o:OK
#3/13 strobemeta_nounroll1.o:OK
#3/14 strobemeta_nounroll2.o:OK
#3 bpf_verif_scale:OK
Summary: 1/3 PASSED, 0 FAILED
Example of using subtest API is in the next patch, converting
bpf_verif_scale.c tests to use sub-tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:28 +0000 (20:25 -0700)]
selftests/bpf: abstract away test log output
This patch changes how test output is printed out. By default, if test
had no errors, the only output will be a single line with test number,
name, and verdict at the end, e.g.:
#31 xdp:OK
If test had any errors, all log output captured during test execution
will be output after test completes.
It's possible to force output of log with `-v` (`--verbose`) option, in
which case output won't be buffered and will be output immediately.
To support this, individual tests are required to use helper methods for
logging: `test__printf()` and `test__vprintf()`.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:27 +0000 (20:25 -0700)]
selftest/bpf: centralize libbpf logging management for test_progs
Make test_progs test runner own libbpf logging. Also introduce two
levels of verbosity: -v and -vv. First one will be used in subsequent
patches to enable test log output always. Second one increases verbosity
level of libbpf logging further to include debug output as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:26 +0000 (20:25 -0700)]
libbpf: return previous print callback from libbpf_set_print
By returning previously set print callback from libbpf_set_print, it's
possible to restore it, eventually. This is useful when running many
independent test with one default print function, but overriding log
verbosity for particular subset of tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:25 +0000 (20:25 -0700)]
selftests/bpf: add test selectors by number and name to test_progs
Add ability to specify either test number or test name substring to
narrow down a set of test to run.
Usage:
sudo ./test_progs -n 1
sudo ./test_progs -t attach_probe
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:24 +0000 (20:25 -0700)]
selftests/bpf: revamp test_progs to allow more control
Refactor test_progs to allow better control on what's being run.
Also use argp to do argument parsing, so that it's easier to keep adding
more options.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Sun, 28 Jul 2019 03:25:23 +0000 (20:25 -0700)]
selftests/bpf: prevent headers to be compiled as C code
Apprently listing header as a normal dependency for a binary output
makes it go through compilation as if it was C code. This currently
works without a problem, but in subsequent commits causes problems for
differently generated test.h for test_progs. Marking those headers as
order-only dependency solves the issue.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Fri, 26 Jul 2019 01:00:51 +0000 (18:00 -0700)]
Merge branch 'flow_dissector-input-flags'
Stanislav Fomichev says:
====================
C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible.
BPF flow dissector always parses as deep as possible which is sub-optimal.
Pass input flags to the BPF flow dissector as well so it can make the same
decisions.
Series outline:
* remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
* export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
flow dissector
* add documentation for the export flags
* support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
* sync uapi to tools
* support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
* support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
* support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
Pros:
* makes BPF flow dissector faster by avoiding burning extra cycles
* existing BPF progs continue to work by ignoring the flags and always
parsing as deep as possible
Cons:
* new UAPI which we need to support (OTOH, if we need to deprecate some
flags, we can just stop setting them upon calling BPF programs)
Some numbers (with .repeat = 4000000 in test_flow_dissector):
test_flow_dissector:PASS:ipv4-frag 35 nsec
test_flow_dissector:PASS:ipv4-frag 35 nsec
test_flow_dissector:PASS:ipv4-no-frag 32 nsec
test_flow_dissector:PASS:ipv4-no-frag 32 nsec
test_flow_dissector:PASS:ipv6-frag 39 nsec
test_flow_dissector:PASS:ipv6-frag 39 nsec
test_flow_dissector:PASS:ipv6-no-frag 36 nsec
test_flow_dissector:PASS:ipv6-no-frag 36 nsec
test_flow_dissector:PASS:ipv6-flow-label 36 nsec
test_flow_dissector:PASS:ipv6-flow-label 36 nsec
test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
test_flow_dissector:PASS:ipip-encap 38 nsec
test_flow_dissector:PASS:ipip-encap 38 nsec
test_flow_dissector:PASS:ipip-no-encap 32 nsec
test_flow_dissector:PASS:ipip-no-encap 32 nsec
The improvement is around 10%, but it's in a tight cache-hot
BPF_PROG_TEST_RUN loop.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:31 +0000 (15:52 -0700)]
selftests/bpf: support BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP
Exit as soon as we found that packet is encapped when
BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP is passed.
Add appropriate selftest cases.
v2:
* Subtract sizeof(struct iphdr) from .iph_inner.tot_len (Willem de Bruijn)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:30 +0000 (15:52 -0700)]
bpf/flow_dissector: support ipv6 flow_label and BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
Add support for exporting ipv6 flow label via bpf_flow_keys.
Export flow label from bpf_flow.c and also return early when
BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed.
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:29 +0000 (15:52 -0700)]
selftests/bpf: support BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG
bpf_flow.c: exit early unless BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG is
passed in flags. Also, set ip_proto earlier, this makes sure we have
correct value with fragmented packets.
Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
tests that don't have BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
eth_get_headlen calls flow dissector with
BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
have different set of input flags against it.
v2:
* sefltests -> selftests (Willem de Bruijn)
* Reword a comment about eth_get_headlen flags (Song Liu)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:28 +0000 (15:52 -0700)]
tools/bpf: sync bpf_flow_keys flags
Export bpf_flow_keys flags to tools/libbpf/selftests.
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:27 +0000 (15:52 -0700)]
bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
This will allow us to write tests for those flags.
v2:
* Swap kfree(data) and kfree(user_ctx) (Song Liu)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:26 +0000 (15:52 -0700)]
bpf/flow_dissector: document flags
Describe what each input flag does and who uses it.
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 25 Jul 2019 22:52:25 +0000 (15:52 -0700)]
bpf/flow_dissector: pass input flags to BPF flow dissector program
C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible. Pass
those flags to the BPF flow dissector so it can make the same
decisions. In the next commits I'll add support for those flags to
our reference bpf_flow.c
v3:
* Export copy of flow dissector flags instead of moving (Alexei Starovoitov)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allan Zhang [Wed, 24 Jul 2019 00:07:25 +0000 (17:07 -0700)]
selftests/bpf: Add selftests for bpf_perf_event_output
Software event output is only enabled by a few prog types.
This test is to ensure that all supported types are enabled for
bpf_perf_event_output successfully.
Signed-off-by: Allan Zhang <allanzhang@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allan Zhang [Wed, 24 Jul 2019 00:07:24 +0000 (17:07 -0700)]
bpf: Allow bpf_skb_event_output for a few prog types
Software event output is only enabled by a few prog types right now (TC,
LWT out, XDP, sockops). Many other skb based prog types need
bpf_skb_event_output to produce software event.
Added socket_filter, cg_skb, sk_skb prog types to generate sw event.
Test bpf code is generated from code snippet:
struct TMP {
uint64_t tmp;
} tt;
tt.tmp = 5;
bpf_perf_event_output(skb, &connection_tracking_event_map, 0,
&tt, sizeof(tt));
return 1;
the bpf assembly from llvm is:
0: b7 02 00 00 05 00 00 00 r2 = 5
1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2
2: bf a4 00 00 00 00 00 00 r4 = r10
3: 07 04 00 00 f8 ff ff ff r4 += -8
4: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0ll
6: b7 03 00 00 00 00 00 00 r3 = 0
7: b7 05 00 00 08 00 00 00 r5 = 8
8: 85 00 00 00 19 00 00 00 call 25
9: b7 00 00 00 01 00 00 00 r0 = 1
10: 95 00 00 00 00 00 00 00 exit
Signed-off-by: Allan Zhang <allanzhang@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Tue, 23 Jul 2019 23:05:53 +0000 (16:05 -0700)]
Merge branch 'convert-tests-to-libbpf'
Andrii Nakryiko says:
====================
There were few more tests and samples that were using custom perf buffer setup
code from trace_helpers.h. This patch set gets rid of all the usages of those
and removes helpers themselves. Libbpf provides nicer, but equally powerful
set of APIs to work with perf ring buffers, so let's have all the samples use
v1->v2:
- make logging message one long line instead of two (Song).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:34:45 +0000 (14:34 -0700)]
selftests/bpf: remove perf buffer helpers
libbpf's perf_buffer API supersedes trace_helper.h's helpers.
Remove those helpers after all existing users were already moved to
perf_buffer API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:34:44 +0000 (14:34 -0700)]
samples/bpf: switch trace_output sample to perf_buffer API
Convert trace_output sample to libbpf's perf_buffer API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:34:43 +0000 (14:34 -0700)]
samples/bpf: convert xdp_sample_pkts_user to perf_buffer API
Convert xdp_sample_pkts_user to libbpf's perf_buffer API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:34:42 +0000 (14:34 -0700)]
selftests/bpf: switch test_tcpnotify to perf_buffer API
Switch test_tcpnotify test to use libbpf's perf_buffer API instead of
re-implementing portion of it.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:34:41 +0000 (14:34 -0700)]
selftests/bpf: convert test_get_stack_raw_tp to perf_buffer API
Convert test_get_stack_raw_tp test to new perf_buffer API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 23 Jul 2019 21:11:33 +0000 (14:11 -0700)]
libbpf: provide more helpful message on uninitialized global var
When BPF program defines uninitialized global variable, it's put into
a special COMMON section. Libbpf will reject such programs, but will
provide very unhelpful message with garbage-looking section index.
This patch detects special section cases and gives more explicit error
message.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Roman Mashak [Tue, 23 Jul 2019 19:01:59 +0000 (15:01 -0400)]
tc-testing: added tdc tests for [b|p]fifo qdisc
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Tue, 23 Jul 2019 16:39:43 +0000 (19:39 +0300)]
hv_sock: Use consistent types for UUIDs
The rest of Hyper-V code is using new types for UUID handling.
Convert hv_sock as well.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Jul 2019 20:52:51 +0000 (13:52 -0700)]
Merge branch 'nfp-Offload-MPLS-actions'
John Hurley says:
====================
nfp: Offload MPLS actions
The module act_mpls has recently been added to the kernel. This allows the
manipulation of MPLS headers on packets including push, pop and modify.
Add these new actions and parameters to the intermediate representation
API for hardware offload. Follow this by implementing the offload of these
MPLS actions in the NFP driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 23 Jul 2019 14:34:02 +0000 (15:34 +0100)]
nfp: flower: offload MPLS set action
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS set actions to
firmware. Set actions update the outermost MPLS header. The offload
includes a mask to specify which fields should be set.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 23 Jul 2019 14:34:01 +0000 (15:34 +0100)]
nfp: flower: offload MPLS pop action
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS pop actions to
firmware. The act_mpls TC module enforces that the next protocol is
supplied along with the pop action. Passing this to firmware allows it
to properly rebuild the underlying packet after the pop.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 23 Jul 2019 14:34:00 +0000 (15:34 +0100)]
nfp: flower: offload MPLS push action
Recent additions to the kernel include a TC action module to manipulate
MPLS headers on packets. Such actions are available to offload via the
flow_offload intermediate representation API.
Modify the NFP driver to allow the offload of MPLS push actions to
firmware.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 23 Jul 2019 14:33:59 +0000 (15:33 +0100)]
net: sched: include mpls actions in hardware intermediate representation
A recent addition to TC actions is the ability to manipulate the MPLS
headers on packets.
In preparation to offload such actions to hardware, update the IR code to
accept and prepare the new actions.
Note that no driver currently impliments the MPLS dec_ttl action so this
is not included.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 23 Jul 2019 12:02:26 +0000 (12:02 +0000)]
net/mlx5e: xsk: dynamically allocate mlx5e_channel_param
The structure is too large to put on the stack, resulting in a
warning on 32-bit ARM:
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:59:5: error: stack frame size of 1344 bytes in function
'mlx5e_open_xsk' [-Werror,-Wframe-larger-than=]
Use kvzalloc() instead.
Fixes:
a038e9794541 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 14:16:42 +0000 (22:16 +0800)]
net: jme: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 14:16:24 +0000 (22:16 +0800)]
igb: Use dev_get_drvdata where possible
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 14:15:51 +0000 (22:15 +0800)]
i40e: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 14:15:33 +0000 (22:15 +0800)]
fm10k: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 14:15:13 +0000 (22:15 +0800)]
e1000e: Use dev_get_drvdata where possible
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 13:19:29 +0000 (21:19 +0800)]
net: broadcom: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 13:18:56 +0000 (21:18 +0800)]
net: atheros: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 13:18:44 +0000 (21:18 +0800)]
net: 3com: 3c59x: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 23 Jul 2019 08:13:14 +0000 (16:13 +0800)]
atm: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Jul 2019 18:45:44 +0000 (11:45 -0700)]
ftgmac100: Fix build.
drivers/net/ethernet/faraday/ftgmac100.c:777:13: error: 'skb_frag_t {aka struct bio_vec}' has no member named 'size'
Fallout from the skb_frag_t conversion to bio_vec, simply
use skb_frag_size().
Fixes:
b8b576a16f79 ("net: Rename skb_frag_t size to bv_len")
Reported-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Tue, 23 Jul 2019 06:14:13 +0000 (15:14 +0900)]
qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/
The hardware has been declared EOL by the vendor more than 5 years ago.
What's more relevant to the Linux kernel is that the quality of this driver
is not on par with many other mainline drivers.
Cc: Manish Chopra <manishc@marvell.com>
Message-id: <
20190617074858.32467-1-bpoirier@suse.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Jul 2019 03:47:56 +0000 (20:47 -0700)]
Merge branch 'Convert-skb_frag_t-to-bio_vec'
Matthew Wilcox says:
====================
Convert skb_frag_t to bio_vec
The skb_frag_t and bio_vec are fundamentally the same (page, offset,
length) tuple. This patch series unifies the two, leaving the
skb_frag_t typedef in place. This has the immediate advantage that
we already have iov_iter support for bvecs and don't need to add
support for iterating skbuffs. It enables a long-term plan to use
bvecs more broadly within the kernel and should make network-storage
drivers able to do less work converting between skbuffs and biovecs.
It will consume more memory on 32-bit kernels. If that proves
problematic, we can look at ways of addressing it.
v3: Rebase on latest Linus with net-next merged.
- Reorder the uncontroversial 'Use skb accessors' patches first so you
can apply just those two if you want to hold off on the full
conversion.
- Convert all the users of 'struct skb_frag_struct' to skb_frag_t.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:31 +0000 (20:08 -0700)]
net: Convert skb_frag_t to bio_vec
There are a lot of users of frag->page_offset, so use a union
to avoid converting those users today.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:30 +0000 (20:08 -0700)]
net: Rename skb_frag_t size to bv_len
Improved compatibility with bvec
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:29 +0000 (20:08 -0700)]
net: Rename skb_frag page to bv_page
One step closer to turning the skb_frag_t into a bio_vec.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:28 +0000 (20:08 -0700)]
net: Reorder the contents of skb_frag_t
Match the layout of bio_vec.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:27 +0000 (20:08 -0700)]
net: Increase the size of skb_frag_t
To increase commonality between block and net, we are going to replace
the skb_frag_t with the bio_vec. This patch increases the size of
skb_frag_t on 32-bit machines from 8 bytes to 12 bytes. The size is
unchanged on 64-bit machines.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:26 +0000 (20:08 -0700)]
net: Use skb accessors in network core
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Wilcox (Oracle) [Tue, 23 Jul 2019 03:08:25 +0000 (20:08 -0700)]
net: Use skb accessors in network drivers
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Mon, 22 Jul 2019 07:41:34 +0000 (15:41 +0800)]
net: usb: Merge cpu_to_le32s + memcpy to put_unaligned_le32
Merge the combo uses of cpu_to_le32s and memcpy.
Use put_unaligned_le32 instead.
This simplifies the code.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 22 Jul 2019 20:01:15 +0000 (22:01 +0200)]
r8169: improve rtl_rx
This patch improves few aspects of rtl_rx, no functional change intended.
1. inline rtl8169_try_rx_copy
2. make pkt_size unsigned
3. use constant ETH_FCS_LEN instead of value 4
4. We just created the skb, so we don't need the checks in skb_put.
Also we don't need the return value of skb_put.
Set skb->tail and skb->len directly.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Fri, 19 Jul 2019 09:07:15 +0000 (17:07 +0800)]
ax88179_178a: Merge memcpy + le32_to_cpus to get_unaligned_le32
Merge the combo use of memcpy and le32_to_cpus.
Use get_unaligned_le32 instead.
This simplifies the code.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Fri, 19 Jul 2019 08:27:31 +0000 (16:27 +0800)]
usbnet: smsc75xx: Merge memcpy + le32_to_cpus to get_unaligned_le32
Merge the combo use of memcpy and le32_to_cpus.
Use get_unaligned_le32 instead.
This simplifies the code.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Fri, 19 Jul 2019 07:36:15 +0000 (15:36 +0800)]
net: lan78xx: Merge memcpy + lexx_to_cpus to get_unaligned_lexx
Merge the combo use of memcpy and lexx_to_cpus.
Use get_unaligned_lexx instead.
This simplifies the code.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Fri, 19 Jul 2019 06:30:03 +0000 (23:30 -0700)]
net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier
This is trivial since we already have support for the entirely
identical (from the kernel's point of view) RDNSS and DNSSL that
also contain opaque data that needs to be passed down to userspace.
As specified in RFC7710, Captive Portal option contains a URL.
8-bit identifier of the option type as assigned by the IANA is 37.
This option should also be treated as userland.
Hence, treat ND option 37 as userland (Captive Portal support)
See:
https://tools.ietf.org/html/rfc7710
https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml
Fixes:
e35f30c131a56
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Remin Nguyen Van <reminv@google.com>
Cc: Alexey I. Froloff <raorn@raorn.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 22 Jul 2019 16:30:34 +0000 (09:30 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull preemption Kconfig fix from Thomas Gleixner:
"The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
and PREEMPT_RT.
Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
obviously can't work anymore. oldconfig builds are affected as well,
but it's more obvious as the user gets asked. [old]defconfig silently
fixes it up and selects PREEMPT_NONE.
Unbreak it by undoing the rename and adding a intermediate config
symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
to chase down a few #ifdefs, but it's better than tweaking 114
defconfigs and annoying users"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
Linus Torvalds [Mon, 22 Jul 2019 16:14:19 +0000 (09:14 -0700)]
Merge tag 'for-linus-
20190722' of git://git./linux/kernel/git/brauner/linux
Pull pidfd polling fix from Christian Brauner:
"A fix for pidfd polling. It ensures that the task's exit state is
visible to all waiters"
* tag 'for-linus-
20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
pidfd: fix a poll race when setting exit_state
Linus Torvalds [Mon, 22 Jul 2019 16:08:38 +0000 (09:08 -0700)]
Merge tag 'for-5.3-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fixes for leaks caused by recently merged patches
- one build fix
- a fix to prevent mixing of incompatible features
* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't leak extent_map in btrfs_get_io_geometry()
btrfs: free checksum hash on in close_ctree
btrfs: Fix build error while LIBCRC32C is module
btrfs: inode: Don't compress if NODATASUM or NODATACOW set
Thomas Gleixner [Mon, 22 Jul 2019 15:59:19 +0000 (17:59 +0200)]
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
the preemption mode choice wich defaults to NONE. This also affects
oldconfig builds.
So rather than changing 114 defconfig files and being an annoyance to
users, revert the rename and select a new config symbol PREEMPTION. That
keeps everything working smoothly and the revelant ifdef's are going to be
fixed up step by step.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes:
a50a3f4b6a31 ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Mon, 22 Jul 2019 16:01:47 +0000 (09:01 -0700)]
Merge tag 'media/v5.3-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For two regressions in media core:
- v4l2-subdev: fix regression in check_pad()
- videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
in use"
* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
media: v4l2-subdev: fix regression in check_pad()
Linus Torvalds [Mon, 22 Jul 2019 15:49:22 +0000 (08:49 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Several netfilter fixes including a nfnetlink deadlock fix from
Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
proper block sharing.
3) Fix r8169 PHY init from Thomas Voegtle.
4) Fix memory leak in mac80211, from Lorenzo Bianconi.
5) Missing NULL check on object allocation in cxgb4, from Navid
Emamdoost.
6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
7) Check that there is actually an ip header to access in skb->data in
VRF, from Peter Kosyh.
8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
9) One more tweak the the TCP fragmentation memory limit changes, to be
less harmful to applications setting small SO_SNDBUF values. From
Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
tcp: be more careful in tcp_fragment()
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
vrf: make sure skb->data contains ip header to make routing
connector: remove redundant input callback from cn_dev
qed: Prefer pcie_capability_read_word()
igc: Prefer pcie_capability_read_word()
cxgb4: Prefer pcie_capability_read_word()
be2net: Synchronize be_update_queues with dev_watchdog
bnx2x: Prevent load reordering in tx completion processing
net: phy: sfp: hwmon: Fix scaling of RX power
net: sched: verify that q!=NULL before setting q->flags
chelsio: Fix a typo in a function name
allocate_flower_entry: should check for null deref
net: hns3: typo in the name of a constant
kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
tipc: Fix a typo
mac80211: don't warn about CW params when not using them
mac80211: fix possible memory leak in ieee80211_assign_beacon
nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
nl80211: fix VENDOR_CMD_RAW_DATA
...
Suren Baghdasaryan [Wed, 17 Jul 2019 17:21:00 +0000 (13:21 -0400)]
pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Fixes:
b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
Eric Dumazet [Fri, 19 Jul 2019 18:52:33 +0000 (11:52 -0700)]
tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes:
f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 19 Jul 2019 17:33:51 +0000 (17:33 +0000)]
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.
Fixes:
345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 21 Jul 2019 21:05:38 +0000 (14:05 -0700)]
Linus 5.3-rc1
Peter Kosyh [Fri, 19 Jul 2019 08:11:47 +0000 (11:11 +0300)]
vrf: make sure skb->data contains ip header to make routing
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).
Case:
1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.
So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.
Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 18 Jul 2019 04:26:46 +0000 (07:26 +0300)]
connector: remove redundant input callback from cn_dev
A small cleanup: this callback is never used.
Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
for OpenVZ7 bug OVZ-6877
cc: stanislav.kinsburskiy@gmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:42 +0000 (21:07 -0500)]
qed: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:39 +0000 (21:07 -0500)]
igc: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:36 +0000 (21:07 -0500)]
cxgb4: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Thu, 18 Jul 2019 01:42:18 +0000 (10:42 +0900)]
be2net: Synchronize be_update_queues with dev_watchdog
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.
Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Mon, 15 Jul 2019 21:41:50 +0000 (16:41 -0500)]
bnx2x: Prevent load reordering in tx completion processing
This patch fixes an issue seen on Power systems with bnx2x which results
in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 21 Jul 2019 16:50:08 +0000 (18:50 +0200)]
net: phy: sfp: hwmon: Fix scaling of RX power
The RX power read from the SFP uses units of 0.1uW. This must be
scaled to units of uW for HWMON. This requires a divide by 10, not the
current 100.
With this change in place, sensors(1) and ethtool -m agree:
sff2-isa-0000
Adapter: ISA adapter
in0: +3.23 V
temp1: +33.1 C
power1: 270.00 uW
power2: 200.00 uW
curr1: +0.01 A
Laser output power : 0.2743 mW / -5.62 dBm
Receiver signal average optical power : 0.2014 mW / -6.96 dBm
Reported-by: chris.healy@zii.aero
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes:
1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Sun, 21 Jul 2019 14:44:12 +0000 (17:44 +0300)]
net: sched: verify that q!=NULL before setting q->flags
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:
[ 212.925060] BUG: kernel NULL pointer dereference, address:
0000000000000010
[ 212.925445] #PF: supervisor write access in kernel mode
[ 212.925709] #PF: error_code(0x0002) - not-present page
[ 212.925965] PGD
8000000827923067 P4D
8000000827923067 PUD
827924067 PMD 0
[ 212.926302] Oops: 0002 [#1] SMP KASAN PTI
[ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512
[ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[ 212.928607] RSP: 0018:
ffff88884fd5f5d8 EFLAGS:
00010296
[ 212.928905] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
dffffc0000000000
[ 212.929201] RDX:
0000000000000007 RSI:
0000000000000004 RDI:
0000000000000297
[ 212.929402] RBP:
ffff88886bedd600 R08:
ffffffffb91d4b51 R09:
fffffbfff7616e4d
[ 212.929609] R10:
fffffbfff7616e4c R11:
ffffffffbb0b7263 R12:
ffff88886bc61040
[ 212.929803] R13:
ffff88884fd5f950 R14:
ffffc900039c5000 R15:
ffff88835e927680
[ 212.929999] FS:
00007fe7c50b6480(0000) GS:
ffff88886f980000(0000) knlGS:
0000000000000000
[ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 212.930394] CR2:
0000000000000010 CR3:
000000085bd04002 CR4:
00000000001606e0
[ 212.930588] Call Trace:
[ 212.930682] ? tc_del_tfilter+0xa40/0xa40
[ 212.930811] ? __lock_acquire+0x5b5/0x2460
[ 212.930948] ? find_held_lock+0x85/0xa0
[ 212.931081] ? tc_del_tfilter+0xa40/0xa40
[ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 212.931332] ? rtnl_dellink+0x490/0x490
[ 212.931454] ? lockdep_hardirqs_on+0x260/0x260
[ 212.931589] ? netlink_deliver_tap+0xab/0x5a0
[ 212.931717] ? match_held_lock+0x1b/0x240
[ 212.931844] netlink_rcv_skb+0xd0/0x200
[ 212.931958] ? rtnl_dellink+0x490/0x490
[ 212.932079] ? netlink_ack+0x440/0x440
[ 212.932205] ? netlink_deliver_tap+0x161/0x5a0
[ 212.932335] ? lock_downgrade+0x360/0x360
[ 212.932457] ? lock_acquire+0xe5/0x210
[ 212.932579] netlink_unicast+0x296/0x350
[ 212.932705] ? netlink_attachskb+0x390/0x390
[ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0
[ 212.932976] netlink_sendmsg+0x394/0x600
[ 212.937998] ? netlink_unicast+0x350/0x350
[ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90
[ 212.948115] ? netlink_unicast+0x350/0x350
[ 212.953185] sock_sendmsg+0x96/0xa0
[ 212.958099] ___sys_sendmsg+0x482/0x520
[ 212.962881] ? match_held_lock+0x1b/0x240
[ 212.967618] ? copy_msghdr_from_user+0x250/0x250
[ 212.972337] ? lock_downgrade+0x360/0x360
[ 212.976973] ? rwlock_bug.part.0+0x60/0x60
[ 212.981548] ? __mod_node_page_state+0x1f/0xa0
[ 212.986060] ? match_held_lock+0x1b/0x240
[ 212.990567] ? find_held_lock+0x85/0xa0
[ 212.994989] ? do_user_addr_fault+0x349/0x5b0
[ 212.999387] ? lock_downgrade+0x360/0x360
[ 213.003713] ? find_held_lock+0x85/0xa0
[ 213.007972] ? __fget_light+0xa1/0xf0
[ 213.012143] ? sockfd_lookup_light+0x91/0xb0
[ 213.016165] __sys_sendmsg+0xba/0x130
[ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0
[ 213.023870] ? handle_mm_fault+0x337/0x470
[ 213.027592] ? page_fault+0x8/0x30
[ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100
[ 213.034999] ? mark_held_locks+0x24/0x90
[ 213.038671] ? do_syscall_64+0x1e/0xe0
[ 213.042297] do_syscall_64+0x74/0xe0
[ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 213.049354] RIP: 0033:0x7fe7c527c7b8
[ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[ 213.060269] RSP: 002b:
00007ffc3f7908a8 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
[ 213.064144] RAX:
ffffffffffffffda RBX:
000000005d34716f RCX:
00007fe7c527c7b8
[ 213.068094] RDX:
0000000000000000 RSI:
00007ffc3f790910 RDI:
0000000000000003
[ 213.072109] RBP:
0000000000000000 R08:
0000000000000001 R09:
00007fe7c5340cc0
[ 213.076113] R10:
0000000000404ec2 R11:
0000000000000246 R12:
0000000000000080
[ 213.080146] R13:
0000000000480640 R14:
0000000000000080 R15:
0000000000000000
[ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
\e[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 213.112326] CR2:
0000000000000010
[ 213.117429] ---[ end trace
adb58eb0a4ee6283 ]---
Verify that q pointer is not NULL before setting the 'flags' field.
Fixes:
3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 21 Jul 2019 13:16:05 +0000 (15:16 +0200)]
chelsio: Fix a typo in a function name
It is likely that 'my3216_poll()' should be 'my3126_poll()'. (1 and 2
switched in 3126.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Navid Emamdoost [Sun, 21 Jul 2019 06:37:31 +0000 (01:37 -0500)]
allocate_flower_entry: should check for null deref
allocate_flower_entry does not check for allocation success, but tries
to deref the result. I only moved the spin_lock under null check, because
the caller is checking allocation's status at line 652.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 21 Jul 2019 13:08:31 +0000 (15:08 +0200)]
net: hns3: typo in the name of a constant
All constant in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except
'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched)
s/HLC/HCL/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>