David S. Miller [Tue, 2 May 2017 19:41:23 +0000 (15:41 -0400)]
Merge branch 'thunderx-xdp'
Sunil Goutham says:
====================
net: thunderx: Adds XDP support
This patch series adds support for XDP to ThunderX NIC driver
which is used on CN88xx, CN81xx and CN83xx platforms.
Patches 1-4 are performance improvement and cleanup patches
which are done keeping XDP performance bottlenecks in view.
Rest of the patches adds actual XDP support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:58 +0000 (18:36 +0530)]
net: thunderx: Optimize page recycling for XDP
Driver follows a method of taking one extra reference on the
page for recycling which is fine in usual packet path where
each 64KB page is segmented into multiple receive buffers.
But in XDP mode since there is just one receive buffer per
page taking extra page reference itself becomes big bottleneck
consuming ~50% of CPU cycles due to atomic operations.
This patch adds a internal ref count in pgcache for each
page and additional page references are taken in a batch
instead of just one at a time. Internal i.e 'pgcache->ref_count'
and page's i.e 'page->_refcount' counters are compared to check
page's recyclability.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:57 +0000 (18:36 +0530)]
net: thunderx: Support for XDP header adjustment
When in XDP mode reserve XDP_PACKET_HEADROOM bytes at the start
of receive buffer for XDP program to modify headers and adjust
packet start. Additional code changes done to handle such packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:56 +0000 (18:36 +0530)]
net: thunderx: Add support for XDP_TX
Adds support for XDP_TX i.e transmits packet out of
the XDP TX queue mapped to the corresponding Rx queue
on which packet is received.
Since SQ for XDP TX will be used only on a single cpu i.e
SQ description creation and freeing, using atomic free count
is not necessary and will become a bottleneck. Hence added
a separate 'xdp_free_cnt' used for SQs designated for XDP
to track descriptor free count.
Changes also include
- A new entry 'xdp_page' is added to save transmitted packet's
page pointer for later cleanup.
- XDP Tx SQ's doorbell is ringed once per NAPI instance.
- Retrieving designated SQ for packets being sent out by stack
via 'nicvf_xmit'.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:55 +0000 (18:36 +0530)]
net: thunderx: Add support for XDP_DROP
Adds support for XDP_DROP.
Also since in XDP mode there is just a single buffer per page,
made changes to recycle DMA mapping info as well along with pages.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:54 +0000 (18:36 +0530)]
net: thunderx: Add basic XDP support
Adds basic XDP support i.e attaching a BPF program to an
interface. Also takes care of allocating separate Tx queues
for XDP path and for network stack packet transmission.
This patch doesn't support handling of any of the XDP actions,
all are treated as XDP_PASS i.e packets will be handed over to
the network stack.
Changes also involve allocating one receive buffer per page in XDP
mode and multiple in normal mode i.e when no BPF program is attached.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:53 +0000 (18:36 +0530)]
net: thunderx: Cleanup receive buffer allocation
Get rid of unnecessary double pointer references and type casting
in receive buffer allocation code.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:52 +0000 (18:36 +0530)]
net: thunderx: Optimize CQE_TX handling
Optimized CQE handling with below changes
- Feeing descriptors back to SQ in bulk i.e once per NAPI
instance instead for every CQE_TX, this will reduce number
of atomic updates to 'sq->free_cnt'.
- Checking errors in CQE_TX and CQE_RX before calling appropriate
fn()s to update error stats i.e reduce branching.
Also removed debug messages in packet handling path which otherwise
causes issues if DEBUG is enabled.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:51 +0000 (18:36 +0530)]
net: thunderx: Optimize RBDR descriptor handling
Receive buffer's physical address or iova will anyway not
go beyond 49bits, since it is the max supported HW address.
As per perf, updating bitfields i.e buf_addr:42 in RBDR
descriptor entry consumes lots of cpu cycles, hence changed
it to a 64bit field with alignment requirements taken care of.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Tue, 2 May 2017 13:06:50 +0000 (18:36 +0530)]
net: thunderx: Support for page recycling
Adds support for page recycling for allocating receive buffers
to reduce cost of refilling RBDR ring. Also got rid of using
compound pages when pagesize is 4K, only order-0 pages now.
Only page is recycled, DMA mappings still needs to be done for
every receive buffer allocated due to following constraints
- Cannot have just one receive buffer per 64KB page.
- There is just one buffer ring shared across 8 Rx queues, so
buffers of same page can go to any Rx queue.
- HW gives buffer address where packet has been DMA'ed and not
the index into buffer ring.
This makes it not possible to resue DMA mapping info. So unfortunately
have to go through costly mapping route for every buffer.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 2 May 2017 10:58:53 +0000 (13:58 +0300)]
ipx: call ipxitf_put() in ioctl error path
We should call ipxitf_put() if the copy_to_user() fails.
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 2 May 2017 08:12:00 +0000 (10:12 +0200)]
net: sched: add helpers to handle extended actions
Jump is now the only one using value action opcode. This is going to
change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.
This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
bit check, not a value check.
Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 May 2017 19:33:02 +0000 (15:33 -0400)]
Merge branch 'qed-PTP-fixes'
Sudarsana Reddy Kalluru says:
====================
qed*: PTP bug fixes.
The series addresses couple of issues in the PTP implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sudarsana.kalluru@cavium.com [Tue, 2 May 2017 08:11:03 +0000 (01:11 -0700)]
qed*: Fix issues in the ptp filter config implementation.
PTP hardware filter configuration performed by the driver for a given
user requested config is not correct for some of the PTP modes.
Following changes are needed for PTP config-filter implementation.
1. NIG_REG_TX_PTP_EN register - Bits 0/1/2 respectively enables
TimeSync/"V1 frame format support"/"V2 frame format support" on
the TX side. Set the associated bits based on the user request.
2. ptp4l application fails to operate in Peer Delay mode. Following
changes are needed to fix this,
a. Driver should enable (set to 0) DA #1-related bits for IPv4,
IPv6 and MAC destination addresses in these registers:
NIG_REG_TX_LLH_PTP_RULE_MASK
NIG_REG_LLH_PTP_RULE_MASK
b. NIG_REG_LLH_PTP_PARAM_MASK/NIG_REG_TX_LLH_PTP_PARAM_MASK should
be set to 0x0 in all modes.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sudarsana.kalluru@cavium.com [Tue, 2 May 2017 08:11:02 +0000 (01:11 -0700)]
qede: Fix concurrency issue in PTP Tx path processing.
PTP Tx timestamping data structures are not protected against the
concurrent access in the Tx paths. Protecting the same using atomic
bit locks.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Tue, 2 May 2017 07:58:00 +0000 (09:58 +0200)]
stmmac: Add support for SIMATIC IOT2000 platform
The IOT2000 is industrial controller platform, derived from the Intel
Galileo Gen2 board. The variant IOT2020 comes with one LAN port, the
IOT2040 has two of them. They can be told apart based on the board asset
tag in the DMI table.
Based on patch by Sascha Weisenberger.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timmy Li [Tue, 2 May 2017 02:46:52 +0000 (10:46 +0800)]
net: hns: fix ethtool_get_strings overflow in hns driver
hns_get_sset_count() returns HNS_NET_STATS_CNT and the data space allocated
is not enough for ethtool_get_strings(), which will cause random memory
corruption.
When SLAB and DEBUG_SLAB are both enabled, memory corruptions like the
the following can be observed without this patch:
[ 43.115200] Slab corruption (Not tainted): Acpi-ParseExt start=
ffff801fb0b69030, len=80
[ 43.115206] Redzone: 0x9f911029d006462/0x5f78745f31657070.
[ 43.115208] Last user: [<
5f7272655f746b70>](0x5f7272655f746b70)
[ 43.115214] 010: 70 70 65 31 5f 74 78 5f 70 6b 74 00 6b 6b 6b 6b ppe1_tx_pkt.kkkk
[ 43.115217] 030: 70 70 65 31 5f 74 78 5f 70 6b 74 5f 6f 6b 00 6b ppe1_tx_pkt_ok.k
[ 43.115218] Next obj: start=
ffff801fb0b69098, len=80
[ 43.115220] Redzone: 0x706d655f6f666966/0x9f911029d74e35b.
[ 43.115229] Last user: [<
ffff0000084b11b0>](acpi_os_release_object+0x28/0x38)
[ 43.115231] 000: 74 79 00 6b 6b 6b 6b 6b 70 70 65 31 5f 74 78 5f ty.kkkkkppe1_tx_
[ 43.115232] 010: 70 6b 74 5f 65 72 72 5f 63 73 75 6d 5f 66 61 69 pkt_err_csum_fai
Signed-off-by: Timmy Li <lixiaoping3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 1 May 2017 22:29:48 +0000 (15:29 -0700)]
tcp: fix wraparound issue in tcp_lp
Be careful when comparing tcp_time_stamp to some u32 quantity,
otherwise result can be surprising.
Fixes: 7c106d7e782b ("[TCP]: TCP Low Priority congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 2 May 2017 18:34:54 +0000 (20:34 +0200)]
bpf, arm64: fix jit branch offset related to ldimm64
When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.
Before (ldimm64 test 1):
[...]
00000020:
52800007 mov w7, #0x0 // #0
00000024:
d2800060 mov x0, #0x3 // #3
00000028:
d2800041 mov x1, #0x2 // #2
0000002c:
eb01001f cmp x0, x1
00000030:
54ffff82 b.cs 0x00000020
00000034:
d29fffe7 mov x7, #0xffff // #65535
00000038:
f2bfffe7 movk x7, #0xffff, lsl #16
0000003c:
f2dfffe7 movk x7, #0xffff, lsl #32
00000040:
f2ffffe7 movk x7, #0xffff, lsl #48
00000044:
d29dddc7 mov x7, #0xeeee // #61166
00000048:
f2bdddc7 movk x7, #0xeeee, lsl #16
0000004c:
f2ddddc7 movk x7, #0xeeee, lsl #32
00000050:
f2fdddc7 movk x7, #0xeeee, lsl #48
[...]
After (ldimm64 test 1):
[...]
00000020:
52800007 mov w7, #0x0 // #0
00000024:
d2800060 mov x0, #0x3 // #3
00000028:
d2800041 mov x1, #0x2 // #2
0000002c:
eb01001f cmp x0, x1
00000030:
540000a2 b.cs 0x00000044
00000034:
d29fffe7 mov x7, #0xffff // #65535
00000038:
f2bfffe7 movk x7, #0xffff, lsl #16
0000003c:
f2dfffe7 movk x7, #0xffff, lsl #32
00000040:
f2ffffe7 movk x7, #0xffff, lsl #48
00000044:
d29dddc7 mov x7, #0xeeee // #61166
00000048:
f2bdddc7 movk x7, #0xeeee, lsl #16
0000004c:
f2ddddc7 movk x7, #0xeeee, lsl #32
00000050:
f2fdddc7 movk x7, #0xeeee, lsl #48
[...]
Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.
Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 1 May 2017 00:57:20 +0000 (02:57 +0200)]
bpf, arm64: implement jiting of BPF_XADD
This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore
completes JITing of all BPF instructions, meaning we can thus also remove
the 'notyet' label and do not need to fall back to the interpreter when
BPF_XADD is used in a program!
This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64,
where all current eBPF features are supported.
BPF_W example from test_bpf:
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_W, R10, -40, 0x10),
BPF_STX_XADD(BPF_W, R10, R0, -40),
BPF_LDX_MEM(BPF_W, R0, R10, -40),
BPF_EXIT_INSN(),
},
[...]
00000020:
52800247 mov w7, #0x12 // #18
00000024:
928004eb mov x11, #0xffffffffffffffd8 // #-40
00000028:
d280020a mov x10, #0x10 // #16
0000002c:
b82b6b2a str w10, [x25,x11]
// start of xadd mapping:
00000030:
928004ea mov x10, #0xffffffffffffffd8 // #-40
00000034:
8b19014a add x10, x10, x25
00000038:
f9800151 prfm pstl1strm, [x10]
0000003c:
885f7d4b ldxr w11, [x10]
00000040:
0b07016b add w11, w11, w7
00000044:
880b7d4b stxr w11, w11, [x10]
00000048:
35ffffab cbnz w11, 0x0000003c
// end of xadd mapping:
[...]
BPF_DW example from test_bpf:
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
BPF_STX_XADD(BPF_DW, R10, R0, -40),
BPF_LDX_MEM(BPF_DW, R0, R10, -40),
BPF_EXIT_INSN(),
},
[...]
00000020:
52800247 mov w7, #0x12 // #18
00000024:
928004eb mov x11, #0xffffffffffffffd8 // #-40
00000028:
d280020a mov x10, #0x10 // #16
0000002c:
f82b6b2a str x10, [x25,x11]
// start of xadd mapping:
00000030:
928004ea mov x10, #0xffffffffffffffd8 // #-40
00000034:
8b19014a add x10, x10, x25
00000038:
f9800151 prfm pstl1strm, [x10]
0000003c:
c85f7d4b ldxr x11, [x10]
00000040:
8b07016b add x11, x11, x7
00000044:
c80b7d4b stxr w11, x11, [x10]
00000048:
35ffffab cbnz w11, 0x0000003c
// end of xadd mapping:
[...]
Tested on Cavium ThunderX ARMv8, test suite results after the patch:
No JIT: [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 May 2017 15:46:29 +0000 (11:46 -0400)]
Merge branch 'bpf-test-prog-fixes'
I say:
====================
Fix some bpf program testing framework bugs
This series fixes two issue:
1) Accidental user pointer dereference in bpf_test_finish()
2) The packet data given to the test programs is not aligned correctly
The first issue is fixed simply because we have a kernel side copy
of the datastructure in question already. And the second bug is
a simple matter of applying NET_IP_ALIGN where needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Tue, 2 May 2017 15:36:45 +0000 (11:36 -0400)]
bpf: Align packet data properly in program testing framework.
Make sure we apply NET_IP_ALIGN when reserving headroom for SKB
and XDP test runs, just like a real driver would.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
David Miller [Tue, 2 May 2017 15:36:33 +0000 (11:36 -0400)]
bpf: Do not dereference user pointer in bpf_test_finish().
Instead, pass the kattr in which has a kernel side copy of this
data structure from userspace already.
Fix based upon a suggestion from Alexei Starovoitov.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Tue, 2 May 2017 14:52:01 +0000 (07:52 -0700)]
selftests: bpf: Use bpf_endian.h in test_xdp.c
This fixes the testcase on big-endian.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 1 May 2017 22:53:43 +0000 (15:53 -0700)]
xdp: fix parameter kdoc for extack
Fix kdoc parameter spelling from extact to extack.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 1 May 2017 22:47:09 +0000 (00:47 +0200)]
bpf, samples: fix build warning in cookie_uid_helper_example
Fix the following warnings triggered by
51570a5ab2b7 ("A Sample of
using socket cookie and uid for traffic monitoring"):
In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, uid)),
^
/home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
.off = OFF, \
^
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, packets), 1),
^
/home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
.off = OFF, \
^
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, bytes)),
^
/home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
.off = OFF, \
^
HOSTLD /home/foo/net-next/samples/bpf/per_socket_stats_example
Fixes: 51570a5ab2b7 ("A Sample of using socket cookie and uid for traffic monitoring")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 May 2017 03:26:02 +0000 (20:26 -0700)]
sparc64: Fix BPF JIT wrt. branches and ldimm64 instructions.
Like other JITs, sparc64 maintains an array of instruction offsets but
stores the entries off by one. This is done because jumps to the
exit block are indexed to one past the last BPF instruction.
So if we size the array by the program length, we need to record
the previous instruction in order to stay within the array bounds.
This is explained in ARM JIT commit
8eee539ddea0 ("arm64: bpf: fix
out-of-bounds read in bpf2a64_offset()").
But this scheme requires a little bit of careful handling when
the instruction before the branch destination is a 64-bit load
immediate. It takes up 2 BPF instruction slots.
Therefore, we have to fill in the array entry for the second
half of the 64-bit load immediate instruction rather than for
the one for the beginning of that instruction.
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 1 May 2017 20:18:01 +0000 (22:18 +0200)]
rhashtable: compact struct rhashtable_params
By using smaller datatypes this (rather large) struct shrinks considerably
(80 -> 48 bytes on x86_64).
As this is embedded in other structs, this also rerduces size of several
others, e.g. cls_fl_head or nft_hash.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 19:58:21 +0000 (12:58 -0700)]
bpf: Include bpf_endian.h in test_progs.c too.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 19:43:49 +0000 (12:43 -0700)]
bpf: Move endianness BPF helpers out of bpf_util.h
We do not want to include things like stdio.h and friends into
eBPF program builds. bpf_util.h is for host compiled programs,
so eBPF C-code helpers don't really belong there.
Add a new bpf_endian.h as a quick fix for this for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 19:10:20 +0000 (15:10 -0400)]
ipv6: Need to export ipv6_push_frag_opts for tunneling now.
Since that change also made the nfrag function not necessary
for exports, remove it.
Fixes: 89a23c8b528b ("ip6_tunnel: Fix missing tunnel encapsulation limit option")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 19:03:14 +0000 (15:03 -0400)]
Merge branch 'dsa-mv88e6xxx-802.1s-and-
88E6390-VTU'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: 802.1s and
88E6390 VTU
This patch series adds support for the VLAN Table Unit (a.k.a. the VTU)
to the
88E6390 family of Marvell Ethernet switch chips. The plumbing for
the per VLAN Spanning Tree support is added as a side effect of the
necessary refactoring.
The patchset is split up so that no duplication of code is introduced.
With this patchset applied, the mv88e6xxx driver has 2 new function
pointers for the VTU GetNext and VTU Load/Purge operations (with 3
implementations), both handling programmation of 802.1q and 802.1s.
On a ZII Rev C board (featuring 2 88E6390X chips) with all ports bridged
together, we obtain the following hardware VLAN configuration:
# cat /sys/class/net/br0/bridge/vlan_filtering
1
# cat /sys/class/net/br0/bridge/default_pvid
42
# bridge vlan add dev lan3 vid 666
# bridge vlan show
port vlan ids
lan1 42 PVID Egress Untagged
lan1 42 PVID Egress Untagged
lan2 42 PVID Egress Untagged
lan2 42 PVID Egress Untagged
lan3 42 PVID Egress Untagged
666
lan3 42 PVID Egress Untagged
666
lan4 42 PVID Egress Untagged
lan4 42 PVID Egress Untagged
lan5 42 PVID Egress Untagged
lan5 42 PVID Egress Untagged
lan6 42 PVID Egress Untagged
lan6 42 PVID Egress Untagged
lan7 42 PVID Egress Untagged
lan7 42 PVID Egress Untagged
lan8 42 PVID Egress Untagged
lan8 42 PVID Egress Untagged
br0 42 PVID Egress Untagged
Below are the technical details for the different implementations.
All switch families have up to 3 dedicated VTU Data registers used to
program 802.1q and 802.1s, both using 2-bit values.
On
88E6185 and
88E6352 families, port membership and state are adjacent,
while the
88E6390 family share the same bits:
Bits
88E6185/
88E6352 88E6390
----- ----------------- --------------------------
0-1 Port 0 membership Port 0 membership or state
2-3 Port 0 state Port 1 membership or state
4-5 Port 1 membership Port 2 membership or state
6-7 Port 1 state Port 3 membership or state
8-9 Port 2 membership Port 4 membership or state
10-11 Port 2 state Port 5 membership or state
... ... ...
The
88E6185 family programs all ports membership and state in a single
VTU GetNext or Load/Purge operation.
The
88E6352 family introduced an indirect Spanning Tree Unit table
(a.k.a. STU) which requires additional STU GetNext and Load/Purge
operations to read and write the ports state bits.
The
88E6390 family also has an STU and requires data bits to be accessed
before and after every single VTU or STU operation.
Finally, the
88E6390 family introduced a 13th bit for the VLAN ID, which
must be taken care of regardless the VTU operating mode. This means that
iterating over the VTU now starts or ends with value 8191, not 4095.
Patch 1 adds a max_vid field to the chip info structure.
Patch 2 adds 802.1q and 802.1s data to the generic VTU entry structure.
Patches 3 to 10 move helpers to a dedicated file (later made static).
Patches 11 and 12 abstract handling of the STU behind VTU operations.
Patches 13 and 14 add the new function pointers for VTU operations.
Patches 15 and 18 polish the VTU code and add VTU support for
88E6390.
Changes in v2:
- add Reviewed-by tags
- fix comments in 8/18
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:27 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: add VTU support for
88E6390
The 6390 family of chips use only 2 of the 3 VTU Data registers to pack
the MemberTag and PortState VLAN data. This means that they must be
written or read before or after each VTU/STU operations.
Implement this variant to add support for VTU with such chips. These
chips have a 13th bit for the VID thus set their max_vid to 8191.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:26 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: support the VTU Page bit
Newer chips such as the
88E6390 have a VTU Page bit in the VTU VID
register to specify a 13th bit for the VID. This can be used to support
8K VLANs.
When dumping the whole VTU, all VID bits must be set to one, including
this VTU Page bit. Add support for VID greater than 4095.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:25 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: simplify VTU entry getter
Make the code which fetches or initializes a new VTU entry more concise.
This allows us the get rid of the old underscore prefix naming.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:24 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: make VTU helpers static
Now that we have chip operations for VTU accesses, mark all helpers from
global1_vtu.c as static. Only the various implementations of the
GetNext, LoadPurge and Flush operations need to be exposed.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:23 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: add VTU Load/Purge operation
Add a new vtu_loadpurge operation to the chip info structure to differ
the various implementations of the VTU accesses.
Now that the STU handling is abstracted behind VTU operations, kill the
obsolete MV88E6XXX_FLAG_STU flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:22 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: add VTU GetNext operation
Add a new vtu_getnext operation to the chip info structure to differ the
various implementations of the VTU accesses.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:21 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: load STU entry with VTU entry
Now that the code writes both VTU and STU data when loading a VTU entry,
load the corresponding STU entry at the same time.
This allows us to get rid of the STU management in the
_mv88e6xxx_vtu_new helper and thus remove the separate implementations
of STU Load/Purge and STU GetNext, as well as the unused family checks.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:20 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: get STU entry on VTU GetNext
Now that the code reads both VTU and STU data on VTU GetNext operation,
fetch the STU entry data of a VTU entry at the same time.
The STU data bits are masked with the VTU data bits and they are now all
read at the same time a VTU GetNext operation is issued.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:19 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move STU GetNext operation
Extract the generic portion of code to issue an STU GetNext operation,
which will be used in other implementations.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:18 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU Data accessors
The code to access the VTU Data registers currently only supports the
88E6185 family and alike: 2-bit membership adjacent to 2-bit port state.
Even though the
88E6352 family introduced an indirect table to program
the VLAN Spanning Tree states, the usage of the VTU Data registers
remains the same regardless the VTU or STU operation.
Now that the mv88e6xxx_vtu_entry structure contains both port membership
and states data, factorize the code to access them in global1_vtu.c.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:17 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move generic VTU GetNext
Even though every switch model has a different way to access the VTU
Data bits, the base implementation of the VTU GetNext operation remains
the same: wait, write the first VID to iterate from, start the
operation, and read the next VID.
Move this generic implementation into global1_vtu.c and abstract the
handling of the start VID (similarly to the ATU GetNext implementation),
before introducing a new chip operation for specific chips.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:16 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU VID accessors
Add helpers to access the VTU VID register in the global1_vtu.c file.
At the same time, move mv88e6xxx_g1_vtu_vid_write at the beginning of
_mv88e6xxx_vtu_loadpurge, which adds no functional changes but makes
future patches simpler.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:15 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU SID accessors
Add helpers to access the VTU SID register in the global1_vtu.c file.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:14 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU FID accessors
Add helpers to access the VTU FID register in the global1_vtu.c file.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:13 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU flush
Move the VTU flush operation to global1_vtu.c and call it from a
mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:12 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: move VTU Operation accessors
Move the helper functions to access the Global 1 VTU Operation register
to a new global1_vtu.c file, and get rid of the old underscore prefix
naming convention. This file will be extended will all VTU/STU related
code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:11 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: split VTU entry data member
VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.
Some chips such as
88E6185 use different Data registers offsets for
ports state and membership, and program them in a single operation.
Other chips such as
88E6352 use the same register layout but program
them in distinct operations (an indirect table is used for 802.1s.)
Newer chips such as
88E6390 use the same offsets for both state and
membership in distinct operations, thus require multiple data accesses.
To correctly abstract this, split the "data" structure member of
mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
VTU support for newer chips.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 1 May 2017 18:05:10 +0000 (14:05 -0400)]
net: dsa: mv88e6xxx: add max VID to info
Some chips don't have a VLAN Table Unit, most of them do have a 4K
table, some others as the
88E6390 family has a 13th bit for the VID.
Add a new max_vid member to the info structure, used to check the
presence of a VTU as well as the value used to iterate from in VTU
GetNext operations.
This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Sun, 30 Apr 2017 13:51:19 +0000 (16:51 +0300)]
xfrm: Indicate xfrm_state offload errors
Current code silently ignores driver errors when configuring
IPSec offload xfrm_state, and falls back to host-based crypto.
Fail the xfrm_state creation if the driver has an error, because
the NIC offloading was explicitly requested by the user program.
This will communicate back to the user that there was an error.
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Sun, 30 Apr 2017 13:34:38 +0000 (16:34 +0300)]
net/esp4: Fix invalid esph pointer crash
Both esp_output and esp_xmit take a pointer to the ESP header
and place it in esp_info struct prior to calling esp_output_head.
Inside esp_output_head, the call to esp_output_udp_encap
makes sure to update the pointer if it gets invalid.
However, if esp_output_head itself calls skb_cow_data, the
pointer is not updated and stays invalid, causing a crash
after esp_output_head returns.
Update the pointer if it becomes invalid in esp_output_head
Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek [Wed, 26 Apr 2017 18:37:45 +0000 (14:37 -0400)]
ip6_tunnel: Fix missing tunnel encapsulation limit option
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
generated packets.
The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used. This caused the options to no longer be included in v6
encapsulated packets.
The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 18:34:46 +0000 (14:34 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-04-30
Here's one last batch of Bluetooth patches in the bluetooth-next tree
targeting the 4.12 kernel.
- Remove custom ECDH implementation and use new KPP API instead
- Add protocol checks to hci_ldisc
- Add module license to HCI UART Nokia H4+ driver
- Minor fix for 32bit user space - 64 bit kernel combination
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Liam Beguin [Mon, 1 May 2017 15:02:01 +0000 (11:02 -0400)]
switchdev: documentation: fix whitespace issues
Figure 1 is full of whitespaces; fix it
Signed-off-by: Liam Beguin <lbeguin@tycoint.com>
Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 30 Apr 2017 16:47:14 +0000 (19:47 +0300)]
mlxsw: spectrum_router: Simplify VRF enslavement
When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).
>From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or its
uppers) are enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP was sent for this netdev in the
inetaddr notification chain.
This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.
Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 15:47:10 +0000 (11:47 -0400)]
Merge tag 'mlx5-updates-2017-04-30' of git://git./linux/kernel/git/saeed/linux
mlx5-updates-2017-04-30
Or says:
================
mlx5 neigh update
This series (whose code name is 'neigh update') from Hadar, enhances the
mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
neighbours used in offloaded flows which involved encapsulation.
In order to keep track on the validity state of such neighbours, we register
a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
becomes valid, offload the related flows to HW (the other way around when
neigh becomes invalid) and similarly when a neigh mac addresses changes.
Since this traffic is offloaded from the host OS, the neighbour for the IP
tunnel destination can mistakenly become STALE and deleted by the kernel
since its 'used' value wasn't changed. To address that, we proactively
update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
time stamps generated by the existing driver code for HW flow counters.
We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.
Prior to the core of the series, there's a patch from Saeed that introduces an
extendable vport representor implementation scheme. It provides a separation
between the eswitch to the netdev related aspects of the representors.
We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice
through the long design and review cycles while we struggled to understand and
(hopefully correctly) implement the locking around the different driver flows(..) .
- Or.
=================
Misc Updates:
From Tariq:
Some small performance and trivial code optimization for mlx5 netdev driver
- Optimize poll ICOSQ completion queue
- Use prefetchw when a write is to follow
- Use u8 as ownership type in mlx5e_get_cqe()
From Eran:
- Disable LRO by default on specific setups
From Eli:
- Small cleanup for E-Switch to avoid redundant allocation
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 30 Apr 2017 09:14:44 +0000 (12:14 +0300)]
qed: Prevent warning without CONFIG_RFS_ACCEL
After removing the PTP related initialization from slowpath start,
the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
Otherwise, it leads to a warning due to it being unused.
Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 15:42:16 +0000 (11:42 -0400)]
Merge branch 'qed-RoCE-fixes'
Yuval Mintz says:
====================
qed: RoCE related pseudo-fixes
This series contains multiple small corrections to the RoCE logic
in qed plus some debug information and inter-module parameter
meant to prevent issues further along.
- #1, #6 Share information with protocol driver
[either new or filling missing bits in existing API].
- #2, #3 correct error flows in qed.
- #4 add debug related information.
- #5 fixes a minor issue in the HW configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:10 +0000 (11:49 +0300)]
qed: output the DPM status and WID count
Output to the RDMA driver whether DPM mode is enabled or disabled in
the HW and if so what is the number of WIDs it supports
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:09 +0000 (11:49 +0300)]
qed: align DPI configuration to HW requirements
When calculating doorbell BAR partitioning round up the number of
CPUs to the nearest power of 2 so the size of the DPI (per user
section) configured in the hardware will be stored properly and
not truncated.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:08 +0000 (11:49 +0300)]
qed: verify RoCE resource bitmaps are released
Add mechanism to verify RoCE resources are released prior to freeing the
bitmaps. If this is not the case, print what resources were not released.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:07 +0000 (11:49 +0300)]
qed: add error handling flow to TID deregistratin posting failure
If the posting of the ramrod for the purpose of TID deregistration
fails, abort the deregistration operation without using the FW's
return code.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:06 +0000 (11:49 +0300)]
qed: remove unused SQ error state
The internal RoCE SQE QP state isn't being used. Instead we mark the
QP as in regular error state.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Sun, 30 Apr 2017 08:49:05 +0000 (11:49 +0300)]
qed: configure the RoCE max message size
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Sun, 30 Apr 2017 05:52:42 +0000 (22:52 -0700)]
bpf: enhance verifier to understand stack pointer arithmetic
llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.
Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karim Eshapa [Mon, 1 May 2017 13:58:08 +0000 (15:58 +0200)]
benet: Use time_before_eq for time comparison
Use time_before_eq for time comparison more safe and dealing
with timer wrapping to be future-proof.
Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin LaHaise [Mon, 1 May 2017 13:58:40 +0000 (09:58 -0400)]
flower: check unused bits in MPLS fields
Since several of the the netlink attributes used to configure the flower
classifier's MPLS TC, BOS and Label fields have additional bits which are
unused, check those bits to ensure that they are actually 0 as suggested
by Jamal.
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Cc: David Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 14:46:50 +0000 (10:46 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:
1) Check for ct->status bit instead of using nfct_nat() from IPVS and
Netfilter codebase, patch from Florian Westphal.
2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
4) Introduce nft_is_base_chain() helper function.
5) Enforce expectation limit from userspace conntrack helper,
from Gao Feng.
6) Add nf_ct_remove_expect() helper function, from Gao Feng.
7) NAT mangle helper function return boolean, from Gao Feng.
8) ctnetlink_alloc_expect() should only work for conntrack with
helpers, from Gao Feng.
9) Add nfnl_msg_type() helper function to nfnetlink to build the
netlink message type.
10) Get rid of unnecessary cast on void, from simran singhal.
11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
also from simran singhal.
12) Use list_prev_entry() from nf_tables, from simran signhal.
13) Remove unnecessary & on pointer function in the Netfilter and IPVS
code.
14) Remove obsolete comment on set of rules per CPU in ip6_tables,
no longer true. From Arushi Singhal.
15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
16) Remove unnecessary nested rcu_read_lock() in
__nf_nat_decode_session(). Code running from hooks are already
guaranteed to run under RCU read side.
17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.
18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
also from Aaron.
19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.
20) Don't propagate NF_DROP error to userspace via ctnetlink in
__nf_nat_alloc_null_binding() function, from Gao Feng.
21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
from Gao Feng.
22) Kill the fake untracked conntrack objects, use ctinfo instead to
annotate a conntrack object is untracked, from Florian Westphal.
23) Remove nf_ct_is_untracked(), now obsolete since we have no
conntrack template anymore, from Florian.
24) Add event mask support to nft_ct, also from Florian.
25) Move nf_conn_help structure to
include/net/netfilter/nf_conntrack_helper.h.
26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
Thus, we don't deal with variable conntrack extensions anymore.
Make sure userspace conntrack helper doesn't go over that size.
Remove variable size ct extension infrastructure now this code
got no more clients. From Florian Westphal.
27) Restore offset and length of nf_ct_ext structure to 8 bytes now
that wraparound is not possible any longer, also from Florian.
28) Allow to get rid of unassured flows under stress in conntrack,
this applies to DCCP, SCTP and TCP protocols, from Florian.
29) Shrink size of nf_conntrack_ecache structure, from Florian.
30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
from Gao Feng.
31) Register SYNPROXY hooks on demand, from Florian Westphal.
32) Use pernet hook whenever possible, instead of global hook
registration, from Florian Westphal.
33) Pass hook structure to ebt_register_table() to consolidate some
infrastructure code, from Florian Westphal.
34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
SYNPROXY code, to make sure device stats are not fooled, patch
from Gao Feng.
35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
don't need anymore if we just select a fixed size instead of
expensive runtime time calculation of this. From Florian.
36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
from Florian.
37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
Florian.
38) Attach NAT extension on-demand from masquerade and pptp helper
path, from Florian.
39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
40) Speed up netns by selective calls of synchronize_net(), from
Florian Westphal.
41) Silence stack size warning gcc in 32-bit arch in snmp helper,
from Florian.
42) Inconditionally call nf_ct_ext_destroy(), even if we have no
extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 14:42:38 +0000 (10:42 -0400)]
Merge branch 'bpf-samples-skb_mode-bug-fixes'
Jesper Dangaard Brouer says:
====================
samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching
Two small bugfixes for:
commit
3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 1 May 2017 09:26:20 +0000 (11:26 +0200)]
samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel
The xdp_tx_iptunnel program can be terminated in two ways, after
N-seconds or via Ctrl-C SIGINT. The SIGINT code path does not
handle detatching the correct XDP program, in-case the program
was attached with XDP_FLAGS_SKB_MODE.
Fix this by storing the XDP flags as a global variable, which is
available for the SIGINT handler function.
Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 1 May 2017 09:26:15 +0000 (11:26 +0200)]
samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
samples/bpf/ should use the correct type.
Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 14:35:48 +0000 (10:35 -0400)]
Merge branch 'xdp-netlink-ext-ack'
Jakub Kicinski says:
====================
xdp: use netlink extended ACK reporting
This series is an attempt to make XDP more user friendly by
enabling exploiting the recently added netlink extended ACK
reporting to carry messages to user space.
David Ahern's iproute2 ext ack patches for ip link are sufficient
to show the errors like this:
Error: nfp: MTU too large w/ XDP enabled
Where the message is coming directly from the driver. There could
still be a bit of a leap for a complete novice from the message
above to the right settings, but it's a big improvement over the
standard "Invalid argument" message.
v1/non-rfc:
- add a separate macro in patch 1;
- add KBUILD_MODNAME as part of the message (Daniel);
- don't print the error to logs in patch 1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 1 May 2017 04:46:48 +0000 (21:46 -0700)]
virtio_net: make use of extended ack message reporting
Try to carry error messages to the user via the netlink extended
ack message attribute.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 1 May 2017 04:46:47 +0000 (21:46 -0700)]
nfp: make use of extended ack message reporting
Try to carry error messages to the user via the netlink extended
ack message attribute.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 1 May 2017 04:46:46 +0000 (21:46 -0700)]
xdp: propagate extended ack to XDP setup
Drivers usually have a number of restrictions for running XDP
- most common being buffer sizes, LRO and number of rings.
Even though some drivers try to be helpful and print error
messages experience shows that users don't often consult
kernel logs on netlink errors. Try to use the new extended
ack mechanism to carry the message back to user space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 1 May 2017 04:46:45 +0000 (21:46 -0700)]
netlink: add NULL-friendly helper for setting extended ACK message
As we propagate extended ack reporting throughout various paths in
the kernel it may be that the same function is called with the
extended ack parameter passed as NULL. One place where that happens
is in drivers which have a centralized reconfiguration function
called both from ndos and from ethtool_ops. Add a new helper for
setting the error message in such conditions.
Existing helper is left as is to encourage propagating the ext act
fully wherever possible. It also makes it clear in the code which
messages may be lost due to ext ack being NULL.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Liping Zhang [Sat, 29 Apr 2017 13:59:49 +0000 (21:59 +0800)]
netfilter: nf_ct_ext: invoke destroy even when ext is not attached
For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
then remove it from the nat_bysource_table via nat_extend->destroy.
But now, the nat extension is attached on demand, so if the nat extension
is not attached, we will not be notified when the ct is destroyed, i.e.
we may fail to remove ct from the nat_bysource_table.
So just keep it simple, even if the extension is not attached, we will
still invoke the related ext->destroy. And this will also preserve the
flexibility for the future extension.
Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 1 May 2017 09:45:47 +0000 (11:45 +0200)]
Merge tag 'ipvs3-for-v4.12' of git./linux/kernel/git/horms/ipvs-next
Simon Horman says:
====================
Third Round of IPVS Updates for v4.12
please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.
* Remove unused function
* Correct comparison of unsigned value
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 27 Apr 2017 14:39:43 +0000 (16:39 +0200)]
netfilter: snmp: avoid stack size warning
net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
of 1160 bytes is larger than 1024 bytes
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 24 Apr 2017 13:37:41 +0000 (15:37 +0200)]
netfilter: nf_queue: only call synchronize_net twice if nf_queue is active
nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).
This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.
For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 25 Apr 2017 08:24:03 +0000 (10:24 +0200)]
netfilter: nf_log: don't call synchronize_rcu in nf_log_unset
nf_log_unregister() (which is what gets called in the logger backends
module exit paths) does a (required, module is removed) synchronize_rcu().
But nf_log_unset() is only called from pernet exit handlers. It doesn't
free any memory so there appears to be no need to call synchronize_rcu.
v2: Liping Zhang points out that nf_log_unregister() needs to be called
after pernet unregister, else rmmod would become unsafe.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 24 Apr 2017 13:37:39 +0000 (15:37 +0200)]
netfilter: batch synchronize_net calls during hook unregister
synchronize_net is expensive and slows down netns cleanup a lot.
We have two APIs to unregister a hook:
nf_unregister_net_hook (which calls synchronize_net())
and
nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)
Make nf_unregister_net_hook a wapper around new helper
__nf_unregister_net_hook, which unlinks the hook but does not free it.
Then, we can call that helper in nf_unregister_net_hooks and then
call synchronize_net() only once.
Andrey Konovalov reports this change improves syzkaller fuzzing speed at
least twice.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Abhishek Shah [Sun, 30 Apr 2017 05:34:21 +0000 (11:04 +0530)]
net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay
This patch allows users to enable/disable internal TX and/or RX
clock delay for BCM5481x series PHYs so as to satisfy RGMII timing
specifications.
On a particular platform, whether TX and/or RX clock delay is required
depends on how PHY connected to the MAC IP. This requirement can be
specified through "phy-mode" property in the platform device tree.
Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Sat, 29 Apr 2017 21:38:57 +0000 (22:38 +0100)]
net: sunhme: fix spelling mistakes: "ParityErro" -> "ParityError"
trivial fix to spelling mistakes in printk message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Wood [Sat, 29 Apr 2017 00:17:41 +0000 (19:17 -0500)]
bnx2x: Align RX buffers
The bnx2x driver is not providing proper alignment on the receive buffers it
passes to build_skb(), causing skb_shared_info to be misaligned.
skb_shared_info contains an atomic, and while PPC normally supports
unaligned accesses, it does not support unaligned atomics.
Aligning the size of rx buffers will ensure that page_frag_alloc() returns
aligned addresses.
This can be reproduced on PPC by setting the network MTU to 1450 (or other
non-multiple-of-4) and then generating sufficient inbound network traffic
(one or two large "wget"s usually does it), producing the following oops:
Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656
Faulting instruction address: 0xc00000000080ef8c
Oops: Kernel access of bad area, sig: 7 [#1]
SMP NR_CPUS=2048
NUMA
PowerNV
Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common
CPU: 104 PID: 0 Comm: swapper/104 Not tainted
4.11.0-rc8-00088-g4c761da #2
task:
c00000ffd4892400 task.stack:
c00000ffd4920000
NIP:
c00000000080ef8c LR:
c00000000080eee8 CTR:
c0000000001f8320
REGS:
c00000ffffc33710 TRAP: 0600 Not tainted (
4.11.0-rc8-00088-g4c761da)
MSR:
9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
CR:
24082042 XER:
00000000
CFAR:
c00000000080eea0 DAR:
c00000ffc43af656 DSISR:
00000000 SOFTE: 1
GPR00:
c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100
GPR04:
c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000
GPR08:
0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000
GPR12:
c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f
GPR16:
00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001
GPR20:
c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000
GPR24:
c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a
GPR28:
c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100
NIP [
c00000000080ef8c] __skb_clone+0xdc/0x140
LR [
c00000000080eee8] __skb_clone+0x38/0x140
Call Trace:
[
c00000ffffc33990] [
c00000000080fb74] skb_clone+0x74/0x110 (unreliable)
[
c00000ffffc339c0] [
c000000000907f64] packet_rcv+0x144/0x510
[
c00000ffffc33a40] [
c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80
[
c00000ffffc33b00] [
c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0
[
c00000ffffc33b40] [
c00000000082c49c] napi_gro_receive+0x11c/0x260
[
c00000ffffc33b80] [
d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x]
[
c00000ffffc33d00] [
c00000000082babc] net_rx_action+0x31c/0x480
[
c00000ffffc33e10] [
c0000000000d5a44] __do_softirq+0x164/0x3d0
[
c00000ffffc33f00] [
c0000000000d60a8] irq_exit+0x108/0x120
[
c00000ffffc33f20] [
c000000000015b98] __do_irq+0x98/0x200
[
c00000ffffc33f90] [
c000000000027f14] call_do_irq+0x14/0x24
[
c00000ffd4923a90] [
c000000000015d94] do_IRQ+0x94/0x110
[
c00000ffd4923ae0] [
c000000000008d90] hardware_interrupt_common+0x150/0x160
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Fri, 28 Apr 2017 19:39:07 +0000 (22:39 +0300)]
net: bridge: Fix improper taking over HW learned FDB
Commit
7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb
entries") added the ability to "take over an entry which was previously
learned via HW when it shows up from a SW port".
However, if an entry was learned via HW and then a control packet
(e.g., ARP request) was trapped to the CPU, the bridge driver will
update the entry and remove the externally learned flag, although the
entry is still present in HW. Instead, only clear the externally learned
flag in case of roaming.
Fixes: 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Arkadi Sharashevsky <arkadis@mellanox.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 28 Apr 2017 17:04:29 +0000 (10:04 -0700)]
ipv4: get rid of ip_ra_lock
After commit
1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
we always take RTNL lock for ip_ra_control() which is the only place
we update the list ip_ra_chain, so the ip_ra_lock is no longer needed.
As Eric points out, BH does not need to disable either, RCU readers
don't care.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 28 Apr 2017 14:25:04 +0000 (16:25 +0200)]
samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong
The struct bpf_map_def was extended in commit
fb30d4b71214 ("bpf: Add tests
for map-in-map") with member unsigned int inner_map_idx. This changed the size
of the maps section in the generated ELF _kern.o files.
Unfortunately the loader in bpf_load.c does not detect or handle this. Thus,
older _kern.o files became incompatible, and caused hard-to-debug errors
where the syscall validation rejected BPF_MAP_CREATE request.
This patch only detect the situation and aborts load_bpf_file(). It also
add code comments warning people that read this loader for inspiration
for these pitfalls.
Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 28 Apr 2017 13:03:48 +0000 (16:03 +0300)]
lwtunnel: fix error path in lwtunnel_fill_encap()
We recently added a check to see if nla_nest_start() fails. There are
two issues with that. First, if it fails then I don't think we should
call nla_nest_cancel(). Second, it's slightly convoluted but the
current code returns success but we should return -EMSGSIZE instead.
Fixes: a50fe0ffd76f ("lwtunnel: check return value of nla_nest_start")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 28 Apr 2017 12:57:15 +0000 (15:57 +0300)]
liquidio: silence a locking static checker warning
Presumably we never hit this return, but static checkers complain that
we need to unlock so we may as well fix that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 28 Apr 2017 12:56:09 +0000 (15:56 +0300)]
qed: Unlock on error in qed_vf_pf_acquire()
My static checker complains that we're holding a mutex on this error
path. Let's goto exit instead of returning directly.
Fixes: b0bccb69eba3 ("qed: Change locking scheme for VF channel")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 02:39:25 +0000 (22:39 -0400)]
Merge branch 'hns-deferred-probe'
lipeng says:
====================
net: hns: bug fix for HNS driver
This patchset add support defered dsaf probe when mdio and
mbigen module is not insmod.
For more details, please refer to individual patch.
change log:
V4 - > V5:
1. Float on net-next;
2. Delete patch "net: hns: fixed bug that skb used after kfree"
from this patchset;
V3 -> V4:
1. Delete redundant commit message;
2. Add Reviewed-by: Matthias Brugger <mbrugger@suse.com>;
V2 -> V3:
1. Check return value when platform_get_irq in hns_rcb_get_cfg;
V1 -> V2:
1. Return appropriate errno in hns_mac_register_phy;
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
lipeng [Fri, 28 Apr 2017 06:49:47 +0000 (14:49 +0800)]
net: hns: support deferred probe when no mdio
In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio
module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.
We check for probe deferral in the mac init, so we not init DSAF
when there is no mdio, and free all resource, to later learn that
we need to defer the probe.
Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lipeng [Fri, 28 Apr 2017 06:49:46 +0000 (14:49 +0800)]
net: hns: support deferred probe when can not obtain irq
In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to mbigen hw module.
The mbigen module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.
Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 May 2017 02:37:01 +0000 (22:37 -0400)]
Merge branch 'nfp-XDP_TX-optimizations'
Jakub Kicinski says:
====================
nfp: optimize XDP TX and small fixes
This series optimizes the nfp XDP TX performance a little bit.
I run quick tests on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
Single core/queue performance for both touch and drop and touch and
forward is above 20Mpps @64B packets, drop being 2Mpps faster.
I think this is max for a single queue on the low power NFPs.
There are also a few minor fixes included for code in net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 28 Apr 2017 04:06:20 +0000 (21:06 -0700)]
nfp: provide 256 bytes of XDP headroom in all configurations
For legacy reasons NFP FW may be compiled to DMA packets to a constant
offset into the buffer and use the space before it for metadata. This
ensures that packets data always start at a certain offset regardless of
the amount of preceding metadata.
If rx offset is set to 0 there may still be up to 64 bytes of metadata
but metadata will start at the beginning of the buffer, instead of:
data_start_offset = rx_offset - meta_len
Even though we make the buffers larger to accommodate up to 64 bytes of
metadata, if there is only N bytes of metadata, we will end up with
N bytes of headroom and 64 - N bytes of tailroom. Therefore we can't
rely on that space for XDP headroom. Make sure we always allocate
full 256 bytes. This, unfortunately, means we can't fit the headroom
on an u8 any more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 28 Apr 2017 04:06:19 +0000 (21:06 -0700)]
nfp: don't completely refuse to work with old flashes
Right now the required Service Process ABI version is still tied
to max ID of known commands. For new NSP commands we are adding
we are checking if NSP version is recent enough on command-by-command
basis. The driver doesn't have to force the device to have the
very latest flash, anything newer than 0.8 should do.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 28 Apr 2017 04:06:18 +0000 (21:06 -0700)]
nfp: avoid reading TX queue indexes from the device
Reading TX queue indexes from the device memory on each interrupt
is expensive. It's doubly expensive with XDP running since we have
two TX rings to check there. If the software indexes indicate that
the TX queue is completely empty, however, we don't need to look at
the device completion index at all.
The queuing CPU is doing a wmb() before kicking the device TX so
we should be safe to assume on the CPU handling the completions will
never see old value of the software copy of the index.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 28 Apr 2017 04:06:17 +0000 (21:06 -0700)]
nfp: do simple XDP TX buffer recycling
On the RX path we follow the "drop if allocation of replacement
buffer fails" rule. With XDP we extended that to the TX action,
so if XDP prog returned TX but allocation of replacement RX buffer
failed, we will drop the packet.
To improve our XDP TX performance extend the idea of rings being
always full to XDP TX rings. Pre-fill the XDP TX rings with RX
buffers, and when XDP prog returns TX action swap the RX buffer
with the next buffer from the TX ring.
XDP TX complete will no longer free the buffers but let them
sit on the TX ring and wait for swap with RX buffer, instead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>