Igor Russkikh [Fri, 19 Jan 2018 14:03:26 +0000 (17:03 +0300)]
net: aquantia: Introduce global AQC hardware reset sequence
The detailed reset sequence ensures all HW components are in aligned
state before NIC startup. It also supports cards with signed firmware (RBL)
and checks if their FW is valid.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:25 +0000 (17:03 +0300)]
net: aquantia: Introduce support for new firmware on AQC cards
This defines fw2x operations table and corresponding methods.
Some of the functions are being shared with 1.x firmware
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:24 +0000 (17:03 +0300)]
net: aquantia: Introduce firmware ops callbacks
New AQC cards will have an updated firmware with new binary interface.
This patch extracts firmware specific operations into a separate table
and prepares for the introduction of new fw 2.x and 3.x
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:23 +0000 (17:03 +0300)]
net: aquantia: Change confusing no_ff_addr to more meaningful name
The address to check if HW is not dead/hang could be stored in
capabilities, since it is a constant. Change its name to better reflect
the idea.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:22 +0000 (17:03 +0300)]
net: aquantia: Remove create/destroy from hw ops
These ops are not related to HW and are now implemented in pci module.
Thus, remove these ops pointers and implementation.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:21 +0000 (17:03 +0300)]
net: aquantia: Cleanup pci functions module
Driver contained a dead code of maintaining multiple pci port instances.
That will never be used since for each pci function a separate NIC
instance is created.
Simplify this, making pci module only responsible for pci resource
management.
NIC initialization is also simplified accordingly.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:20 +0000 (17:03 +0300)]
net: aquantia: Convert hw and caps structures to const static pointers
This removes unnecessary structure copying, and prepares the driver for
separate firmware ops table introduction.
We also remove extra copy of capabilities structure (which is const actually)
and also replace it with a const pointer in aq_nic_cfg.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:19 +0000 (17:03 +0300)]
net: aquantia: Introduce new AQC devices and capabilities
A number of new AQC devices is going to be released. To support more
flexible capabilities management a number of static caps instances is now
declared. Devices now are mainly differs by supported speeds, but in future
more parameters will be customized. A set of AQC100 devices have
fibre media, not twisted pair - this is also reflected in
new capabilities definitions.
HW level also now directly exports hw_ops for each of A0/B0 hardware.
PCI configuration now uses a device configuration table where each
device ID is explicitly mapped with hardware OPs and capabilities
structures.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:18 +0000 (17:03 +0300)]
net: aquantia: Introduce new device ids and constants
New set of aquantia devices has an upgraded hardware (B1).
The hardware interface is identical to B0. The difference will
be in firmware which is incompatible with old one.
Reorganized and removed duplicate speed and devid definitions
Introduced explicit flow control configuration defines
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Jan 2018 23:13:23 +0000 (18:13 -0500)]
Merge tag 'mlx5-updates-2018-01-19' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-01-19
From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========
Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.
From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.
From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Sat, 20 Jan 2018 01:54:08 +0000 (17:54 -0800)]
nfp: flower: prioritize stats updates
Previously it was possible to interrupt processing stats updates because
they were handled in a work queue. Interrupting the stats updates could
lead to a situation where we backup the control message queue. This patch
moves the stats update processing out of the work queue to be processed as
soon as hardware sends a request.
Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 21 Jan 2018 13:15:41 +0000 (14:15 +0100)]
net: gemini: Depend on HAS_IOMEM
The zeroday builder notices that since Usermode Linux does not
have IO memory, the build fails for them when selecting everything
it can enable.
As the driver is clearly using memory-mapped registers to access
the network adapter, we add depends on HAS_IOMEM to solve this
problem.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Jan 2018 16:35:34 +0000 (11:35 -0500)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch. More specifically, they
are:
1) Seven patches to remove family abstraction layer (struct nft_af_info)
in nf_tables, this simplifies our codebase and it saves us 64 bytes per
net namespace.
2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
Abdelsalam.
3) Allow to register iptable_raw table before defragmentation, some
people do not want to waste cycles on defragmenting traffic that is
going to be dropped, hence add a new module parameter to enable this
behaviour in iptables and ip6tables. From Subash Abhinov
Kasiviswanathan. This patch needed a couple of follow up patches to
get things tidy from Arnd Bergmann.
4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
patches for this helper to prepare this change are also part of this
patch series.
5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha
Sharma.
6) Remove log message that several netfilter subsystems print at
boot/load time.
7) Restore x_tables module autoloading, that got broken in a previous
patch to allow singleton NAT hook callback registration per hook
spot, from Florian Westphal. Moreover, return EBUSY to report that
the singleton NAT hook slot is already in instead.
8) Several fixes for the new nf_tables flowtable representation,
including incorrect error check after nf_tables_flowtable_lookup(),
missing Kconfig dependencies that lead to build breakage and missing
initialization of priority and hooknum in flowtable object.
9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
target. This is due to recent updates in the core to shrink the hook
array size and compile it out if no specific family is enabled via
.config file. Patch from Florian Westphal.
10) Remove duplicated include header files, from Wei Yongjun.
11) Sparse warning fix for the NFPROTO_INET handling from the core
due to missing static function definition, also from Wei Yongjun.
12) Restore ICMPv6 Parameter Problem error reporting when
defragmentation fails, from Subash Abhinov Kasiviswanathan.
13) Remove obsolete owner field initialization from struct
file_operations, patch from Alexey Dobriyan.
14) Use boolean datatype where needed in the Netfilter codebase, from
Gustavo A. R. Silva.
15) Remove double semicolon in dynset nf_tables expression, from
Luis de Bethencourt.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Jan 2018 03:03:46 +0000 (22:03 -0500)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-19
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) bpf array map HW offload, from Jakub.
2) support for bpf_get_next_key() for LPM map, from Yonghong.
3) test_verifier now runs loaded programs, from Alexei.
4) xdp cpumap monitoring, from Jesper.
5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
6) user space can now retrieve HW JITed program, from Jiong.
Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 20 Jan 2018 03:59:33 +0000 (22:59 -0500)]
Merge git://git./linux/kernel/git/davem/net
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Sat, 20 Jan 2018 02:37:01 +0000 (18:37 -0800)]
Merge branch 'bpf-misc-improvements'
Daniel Borkmann says:
====================
This series adds various misc improvements to BPF: detection
of BPF helper definition misconfiguration for mem/size argument
pairs, csum_diff helper also for XDP, various test cases,
removal of the recently added pure_initcall(), restriction
of the jit sysctls to cap_sys_admin for initns, a minor size
improvement for x86 jit in alu ops, output of complexity limit
to verifier log and last but not least having the event output
more flexible with moving to const_size_or_zero type.
Thanks!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:37 +0000 (01:24 +0100)]
bpf: move event_output to const_size_or_zero for xdp/skb as well
Similar rationale as in
a60dd35d2e39 ("bpf: change bpf_perf_event_output
arg5 type to ARG_CONST_SIZE_OR_ZERO"), change the type to CONST_SIZE_OR_ZERO
such that we can better deal with optimized code. No changes needed in
bpf_event_output() as it can also deal with 0 size entirely (e.g. as only
wake-up signal with empty frame in perf RB, or packet dumps w/o meta data
as another such possibility).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:36 +0000 (01:24 +0100)]
bpf: add upper complexity limit to verifier log
Given the limit could potentially get further adjustments in the
future, add it to the log so it becomes obvious what the current
limit is w/o having to check the source first. This may also be
helpful for debugging complexity related issues on kernels that
backport from upstream.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:35 +0000 (01:24 +0100)]
bpf, x86: small optimization in alu ops with imm
For the BPF_REG_0 (BPF_REG_A in cBPF, respectively), we can use
the short form of the opcode as dst mapping is on eax/rax and
thus save a byte per such operation. Added to add/sub/and/or/xor
for 32/64 bit when K immediate is used. There may be more such
low-hanging fruit to add in future as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:34 +0000 (01:24 +0100)]
bpf: restrict access to core bpf sysctls
Given BPF reaches far beyond just networking these days, it was
never intended to allow setting and in some cases reading those
knobs out of a user namespace root running without CAP_SYS_ADMIN,
thus tighten such access.
Also the bpf_jit_enable = 2 debugging mode should only be allowed
if kptr_restrict is not set since it otherwise can leak addresses
to the kernel log. Dump a note to the kernel log that this is for
debugging JITs only when enabled.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:33 +0000 (01:24 +0100)]
bpf: get rid of pure_initcall dependency to enable jits
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:32 +0000 (01:24 +0100)]
bpf: add couple of test cases for div/mod by zero
Add couple of missing test cases for eBPF div/mod by zero to the
new test_verifier prog runtime feature. Also one for an empty prog
and only exit.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:31 +0000 (01:24 +0100)]
bpf: add couple of test cases for signed extended imms
Add a couple of test cases for interpreter and JIT that are
related to an issue we faced some time ago in Cilium [1],
which is fixed in LLVM with commit
e53750e1e086 ("bpf: fix
bug on silently truncating 64-bit immediate").
Test cases were run-time checking kernel to behave as intended
which should also provide some guidance for current or new
JITs in case they should trip over this. Added for cBPF and
eBPF.
[1] https://github.com/cilium/cilium/pull/2162
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:30 +0000 (01:24 +0100)]
bpf: add csum_diff helper to xdp as well
Useful for porting cls_bpf programs w/o increasing program
complexity limits much at the same time, so add the helper
to XDP as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Sat, 20 Jan 2018 00:24:29 +0000 (01:24 +0100)]
bpf, verifier: detect misconfigured mem, size argument pair
I've seen two patch proposals now for helper additions that used
ARG_PTR_TO_MEM or similar in reg_X but no corresponding ARG_CONST_SIZE
in reg_X+1. Verifier won't complain in such case, but it will omit
verifying the memory passed to the helper thus ending up badly.
Detect such buggy helper function signature and bail out during
verification rather than finding them through review.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jesper Dangaard Brouer [Fri, 19 Jan 2018 16:15:50 +0000 (17:15 +0100)]
samples/bpf: xdp_monitor include cpumap tracepoints in monitoring
The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.
Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints. Model the output format
like xdp_redirect_cpu.
Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events. This resembles part
of the builtin monitoring output from sample xdp_rxq_info. Thus, also
covering part of that sample in an external monitoring tool.
Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead. Thus, they are enabled by default.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Fri, 19 Jan 2018 22:26:41 +0000 (23:26 +0100)]
Merge branch 'bpf-lpm-get-next-key'
Yonghong Song says:
====================
This patch set implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
This command is really useful for key enumeration, and for key deletion
if what keys in the trie are unknown.
Patch #1 implements the functionality in the kernel and patch #2
adds a test case in tools/testing/selftests/bpf.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Thu, 18 Jan 2018 23:08:51 +0000 (15:08 -0800)]
tools/bpf: add a testcase for MAP_GET_NEXT_KEY command of LPM_TRIE map
A test case is added in tools/testing/selftests/bpf/test_lpm_map.c
for MAP_GET_NEXT_KEY command. A four node trie, which
is described in kernel/bpf/lpm_trie.c, is built and the
MAP_GET_NEXT_KEY results are checked.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Thu, 18 Jan 2018 23:08:50 +0000 (15:08 -0800)]
bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map
Current LPM_TRIE map type does not implement MAP_GET_NEXT_KEY
command. This command is handy when users want to enumerate
keys. Otherwise, a different map which supports key
enumeration may be required to store the keys. If the
map data is sparse and all map data are to be deleted without
closing file descriptor, using MAP_GET_NEXT_KEY to find
all keys is much faster than enumerating all key space.
This patch implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
If user provided key pointer is NULL or the key does not have
an exact match in the trie, the first key will be returned.
Otherwise, the next key will be returned.
In this implemenation, key enumeration follows a postorder
traversal of internal trie. More specific keys
will be returned first than less specific ones, given
a sequence of MAP_GET_NEXT_KEY syscalls.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Shuah Khan [Fri, 19 Jan 2018 00:36:24 +0000 (17:36 -0700)]
selftests: bpf: update .gitignore with missing generated files
Update .gitignore with missing generated files.
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Roman Gushchin [Fri, 19 Jan 2018 14:17:45 +0000 (14:17 +0000)]
bpftool: recognize BPF_MAP_TYPE_CPUMAP maps
Add BPF_MAP_TYPE_CPUMAP map type to the list
of map type recognized by bpftool and define
corresponding text representation.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Fri, 19 Jan 2018 20:57:03 +0000 (15:57 -0500)]
Merge branch 'dsa-mv88e6xxx-ATU-VTU-irq-fixes'
Andrew Lunn says:
====================
ATU and VTU irq fixes
Further testing and code review found two sets of bugs.
Core review found a cut/paste error in the irq setup code.
A board which does not have an interrupt line from the switch to the
SoC, and experiancing an EPROBE_DEFER throw a splat when the ATU irq
was freed but never registered.
v2: Fix typ0 chip->chip->vtu_prob_irq to chip->vtu_prob_irq
0-day compile testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 18 Jan 2018 16:42:50 +0000 (17:42 +0100)]
net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq
We only register the ATU and VTU irq when we have a chip level IRQ.
In the error path, we should only attempt to remove the ATU and VTU
irq if we also have a chip level IRQ.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 18 Jan 2018 16:42:49 +0000 (17:42 +0100)]
net: dsa: mv88e6xxx: Return error from irq_find_mapping()
Fix a cut/paste error. When irq_find_mapping() returns an error for
the ATU or VTU interrupt, return that error, not the value of
chip->device_irq.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jan 2018 20:52:52 +0000 (15:52 -0500)]
Merge branch 'net-sched-cls-add-extack-support'
Alexander Aring says:
====================
net: sched: cls: add extack support
this patch adds extack support for TC classifier subsystem. The first
patch fixes some code style issues for this patch series pointed out
by checkpatch. The other patches until the last one prepares extack
handling for the TC classifier subsystem and handle generic extack
errors.
The last patch is an example for u32 classifier to add extack support
inside the callbacks delete and change. There exists a init callback as
well, but most classifier implementation run a kalloc() once to allocate
something. Not necessary _yet_ to add extack support now.
- Alex
Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- fix accidentally move of config option mismatch message in PATCH 2/8
correct one is 4/8, detected by kbuildbot (Thank you)
- Removed patch "net: sched: cls: add extack support for tc_setup_cb_call"
PATCH 7/8 in version v2 as suggested by Jakub Kicinski (Thank you)
- changed NL_SET_ERR_MSG to NL_SET_ERR_MSG_MOD as suggested by Jakub Kicinski
in u32 cls (Thank You)
- Removed text from cover letter that I was waiting for Jiri's Patches as
detected by Jamal Hadi Salim (Thank you).
changes since v2:
- rebased on Jiri's patches (Thank you)
- several spelling fixes pointed out by Cong Wang (Thank you)
- several spelling fixes pointed out by David Ahern (Thank you)
- use David Ahern recommendation if config option is mismatch, but
combine it with Cong Wang recommendation to put config name into it
(Thank you)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:55 +0000 (11:20 -0500)]
net: sched: cls_u32: add extack support
This patch adds extack support for the u32 classifier as example for
delete and init callback.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:54 +0000 (11:20 -0500)]
net: sched: cls: add extack support for tcf_change_indev
This patch adds extack handling for the tcf_change_indev function which
is common used by TC classifier implementations.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:53 +0000 (11:20 -0500)]
net: sched: cls: add extack support for delete callback
This patch adds extack support for classifier delete callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:52 +0000 (11:20 -0500)]
net: sched: cls: add extack support for tcf_exts_validate
The tcf_exts_validate function calls the act api change callback. For
preparing extack support for act api, this patch adds the extack as
parameter for this function which is common used in cls implementations.
Furthermore the tcf_exts_validate will call action init callback which
prepares the TC action subsystem for extack support.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:51 +0000 (11:20 -0500)]
net: sched: cls: add extack support for change callback
This patch adds extack support for classifier change callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:50 +0000 (11:20 -0500)]
net: sched: cls_api: handle generic cls errors
This patch adds extack support for generic cls handling. The extack
will be set deeper to each called function which is not part of netdev
core api.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 18 Jan 2018 16:20:49 +0000 (11:20 -0500)]
net: sched: cls: fix code style issues
This patch changes some code style issues pointed out by checkpatch
inside the TC cls subsystem.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 18 Jan 2018 11:55:23 +0000 (12:55 +0100)]
mlxsw: spectrum: Upper-bound supported FW version
During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jan 2018 20:44:19 +0000 (15:44 -0500)]
Merge branch 'nfp-devlink-capabilities-extensions-and-updates'
Jakub Kicinski says:
====================
nfp: devlink, capabilities extensions and updates
This series starts with an improvement to the usability of the device
memory accessors (CPP transactions). Next few patches are devoted to
fixing the devlink locking. After recent patches for mlxsw the locking
scheme of devlink ops has to be reworked. Following patches improve
NFP code dealing with "representors", and expands the error message
printed when driver has no support for loaded FW.
Second part of the series is focused on vNIC capabilities read from
vNIC control memory (often referred to as "BAR0" for historical reasons).
TLV capability format is established and immediately made use of. The
next patches rework parsing of features for control vNIC which allows
apps to mask out features they don't want enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:06 +0000 (18:51 -0800)]
nfp: bpf: disable all ctrl vNIC capabilities
BPF firmware currently exposes IRQ moderation capability.
The driver will make use of it by default, inserting 50 usec
delay to every control message exchange. This cuts the number
of messages per second we can exchange by almost half.
None of the other capabilities make much sense for BPF control
vNIC, either. Disable them all.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:05 +0000 (18:51 -0800)]
nfp: allow apps to disable ctrl vNIC capabilities
Most vNIC capabilities are netdev related. It makes no sense
to initialize them and waste FW resources. Some are even
counter-productive, like IRQ moderation, which will slow
down exchange of control messages.
Add to nfp_app a mask of enabled control vNIC capabilities
for apps to use. Make flower and BPF enable all capabilities
for now. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:04 +0000 (18:51 -0800)]
nfp: split reading capabilities out of nfp_net_init()
nfp_net_init() is a little long and we are about to add more
code to reading capabilties. Move the capability reading,
parsing and validating out. Only actual initialization
will stay in nfp_net_init().
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:03 +0000 (18:51 -0800)]
nfp: read mailbox address from TLV caps
Allow specifying alternative vNIC mailbox location in TLV caps.
This way we can size the mailbox to the needs and not necessarily
waste 512B of ctrl memory space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:02 +0000 (18:51 -0800)]
nfp: read ME frequency from vNIC ctrl memory
PCIe island clock frequency is used when converting coalescing
parameters from usecs to NFP timestamps. Most chips don't run
at 1200MHz, allow FW to provide us with the real frequency.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:01 +0000 (18:51 -0800)]
nfp: add TLV capabilities to the BAR
NFP is entirely programmable, including the PCI data interface.
Using a fixed control BAR layout certainly makes implementations
easier, but require careful considerations when space is allocated.
Once BAR area is allocated to one feature nothing else can use it.
Allocating space statically also requires it to be sized upfront,
which leads to either unnecessary limitation or wastage.
We currently have a 32bit capability word defined which tells drivers
which application FW features are supported. Most of the bits
are exhausted. The same bits are also reused for enabling specific
features. Bulk of capabilities don't have a need for an enable bit,
however, leading to confusion and wastage.
TLVs seems like a better fit for expressing capabilities of applications
running on programmable hardware.
This patch leaves the front of the BAR as is, and declares a TLV
capability start at offset 0x58. Most of the space up to 0x0d90
is already allocated, but the used space can be wrapped with RESERVED
TLVs. E.g.:
Address Type Length
0x0058 RESERVED 0xe00 /* Wrap basic structures */
0x0e5c FEATURE_A 0x004
0x0e64 FEATURE_B 0x004
0x0e6c RESERVED 0x990 /* Wrap qeueue stats */
0x1800 FEATURE_C 0x100
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:51:00 +0000 (18:51 -0800)]
nfp: improve app not found message
When driver app matching loaded FW is not found users are faced with:
nfp: failed to find app with ID 0x%02x
This message does not properly explain that matching driver code is
either not built into the driver or the driver is too old.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:50:59 +0000 (18:50 -0800)]
nfp: protect each repr pointer individually with RCU
Representors are grouped in sets by type. Currently the whole
sets are under RCU protection, but individual representor pointers
are not. This causes some inconveniences when representors have
to be destroyed, because we have to allocate new sets to remove
any representors. Protect the individual pointers with RCU.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:50:58 +0000 (18:50 -0800)]
nfp: add nfp_reprs_get_locked() helper
The write side of repr tables is always done under pf->lock.
Add a helper to dereference repr table pointers under protection
of that lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:50:57 +0000 (18:50 -0800)]
nfp: register devlink after app is created
Devlink used to have two global locks: devlink lock and port lock,
our lock ordering looked like this:
devlink lock -> driver's pf->lock -> devlink port lock
After recent changes port lock was replaced with per-instance
lock. Unfortunately, new per-instance lock is taken on most
operations now. This means we can only grab the pf->lock from
the port split/unsplit ops. Lock ordering looks like this:
devlink lock -> driver's pf->lock -> devlink instance lock
Since we can't take pf->lock from most devlink ops, make sure
nfp_apps are prepared to service them as soon as devlink is
registered. Locking the pf must be pushed down after
nfp_app_init() callback.
The init order looks like this:
nfp_app_init
devlink_register
nfp_app_start
netdev/port_register
As soon as app_init is done nfp_apps must be ready to service
devlink-related callbacks. apps can only register their own
devlink objects from nfp_app_start.
Fixes:
2406e7e546b2 ("devlink: Add per devlink instance lock")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:50:56 +0000 (18:50 -0800)]
nfp: release global resources only on the remove path
NFP app is currently shut down as soon as all the vNICs are gone.
This means we can't depend on the app existing throughout the
lifetime of the device. Free the app only from PCI remove path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 18 Jan 2018 02:50:55 +0000 (18:50 -0800)]
nfp: core: make scalar CPP helpers fail on short accesses
Currently the helpers for accessing 4 or 8 byte values over
the CPP bus return the length of IO on success. If the IO
was short caller has to deal with error handling. The short
IO for 4/8B values is completely impractical. Make the
helpers return an error if full access was not possible.
Fix the few places which are actually dealing with errors
correctly, most call sites already only deal with negative
return codes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Tue, 9 Jan 2018 12:40:21 +0000 (14:40 +0200)]
net/mlx5e: Add likely to the common RX checksum flow
Most of the packets return true for is_last_ethertype_ip, surround it
with likely compiler hint.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Kamal Heib [Tue, 28 Nov 2017 21:52:13 +0000 (23:52 +0200)]
net/mlx5e: Extend the stats group API to have update_stats()
Extend the stats group API to have an update_stats() callback which
will be used to fetch the hardware or software counters data.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Kamal Heib [Tue, 28 Nov 2017 21:18:13 +0000 (23:18 +0200)]
net/mlx5e: Merge per priority stats groups
Merge the per priority traffic and pfc groups into one group, because
both groups share the same update_stats() callback which will be
introduced in the upcoming patch.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Tue, 19 Dec 2017 12:53:34 +0000 (14:53 +0200)]
net/mlx5e: Add per-channel counters infrastructure, use it upon TX timeout
Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt
recovery actions happened upon TX timeouts.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Tue, 19 Dec 2017 12:52:29 +0000 (14:52 +0200)]
net/mlx5e: Poll event queue upon TX timeout before performing full channels recovery
Up until this patch, on every TX timeout we would try to do channels
recovery. However, in case of a lost interrupt for an EQ, the channel
associated to it cannot be recovered if reopened as it would never get
another interrupt on sent/received traffic, and eventually ends up with
another TX timeout (Restarting the EQ is not part of channel recovery).
This patch adds a mechanism for explicitly polling EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a channels full recovery as usual.
Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This will free some budget in the bql (via calling
netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up
the queue. One of the above actions will move the queue to be ready for
transmit again.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Wed, 13 Dec 2017 11:04:01 +0000 (13:04 +0200)]
net/mlx5e: Add Event Queue meta data info for TX timeout logs
When TX timeout occurs, EQ consumer index and irqn can help in debug for
understanding the SW state of EQ. Add them to the logger prints for the
relevant EQ only.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Wed, 20 Dec 2017 09:31:28 +0000 (11:31 +0200)]
net/mlx5e: Print delta since last transmit per SQ upon TX timeout
When driver callback for TX timeout is being called, it handles all
stopped xmit queues (not only the ones which their timeout expired).
Add usecs since last transmit to TX timeout logs per send queue in order
to monitor if the queue timeout expired.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 4 Jan 2018 10:47:19 +0000 (12:47 +0200)]
net/mlx5e: Set hairpin queue size
For a given hairpin packet buffer size, different queue sizes
(values of log_hairpin_num_packets) determine how the data is broken
to strides on the RQ. Currently the chosen value is set to 64B strides.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 4 Jan 2018 10:26:21 +0000 (12:26 +0200)]
net/mlx5: Enable setting hairpin queue size
Allow to specify the size of the hairpin queues along with the
packet buffer data size from the core setup code.
If the driver doesn't provide this, the FW applies proper value that
matches the provided data size and a FW chosen RQ stride size.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 26 Nov 2017 18:39:12 +0000 (20:39 +0200)]
net/mlx5e: Add RSS support for hairpin
Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs
and RSS TTC table per hairpin instance and steer the related flows
through that table so they are spread between the pairs.
We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there
is one pair and no RSS while for 100Gbs ports two RSSed pairs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 23 Nov 2017 16:19:23 +0000 (18:19 +0200)]
net/mlx5: Vectorize the low level core hairpin object
Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Wed, 1 Nov 2017 13:22:08 +0000 (15:22 +0200)]
net/mlx5e: Enlarge the NIC TC offload steering prio to support two levels
This will allow to be able and set TC rule whose steering dest is
RSS TTC steering table.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Wed, 6 Dec 2017 19:05:01 +0000 (21:05 +0200)]
net/mlx5e: Refactor RSS related objects and code
In order to use RSS for hairpin, we refactor the code that deals with
setup of the TTC steering tables. This is done using an interim ttc
params object that has the flow table attributes, TIR numbers, etc.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Wed, 20 Dec 2017 16:18:10 +0000 (18:18 +0200)]
net/mlx5e: Set per priority hairpin pairs
As part of the QoS model, on xmit, the HW mandates that all packets going
through a given SQ have the same priority. To align hairpin SQs with that,
we use the priority given as part of the matching for the hairpin hash key.
This ensures that flows/packets mapped to different HW priorities will
go through different hairpin instances. If no priority is given for
matching, we treat that as an 8th priority, this is in order not to
harm cases where priority is specified.
Only the PCP priority trust model is supported.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Wed, 20 Dec 2017 15:53:22 +0000 (17:53 +0200)]
net/mlx5e: Use vhca id as the hairpin peer identifier
The peer vhca id spans less bits vs the ifindex and can
well serve for the hairpin hash key, move to use that.
This is a pre-step to put more info into the hairpin hash
key in downstream patch while keeping it at 32 bits.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Fri, 19 Jan 2018 20:39:31 +0000 (15:39 -0500)]
Merge branch 'tcp-min-rtt'
Yuchung Cheng says:
====================
tcp: do not use RTT from delayed ACKs for min-RTT
This patch set prevents TCP sender from using RTT samples from
(suspected) delayed ACKs as the minimum RTT, to avoid unbounded
over-estimation of the network path delay. This issue is common
when a connection has extended periods of one packet chit-chat
beyond the min RTT filter window. The first patch does that for TCP
general min RTT estimation. The second patch addresses specifically
the BBR congestion control's min RTT filter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Wed, 17 Jan 2018 20:11:01 +0000 (12:11 -0800)]
tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR
A persistent connection may send tiny amount of data (e.g. health-check)
for a long period of time. BBR's windowed min RTT filter may only see
RTT samples from delayed ACKs causing BBR to grossly over-estimate
the path delay depending how much the ACK was delayed at the receiver.
This patch skips RTT samples that are likely coming from delayed ACKs. Note
that it is possible the sender never obtains a valid measure to set the
min RTT. In this case BBR will continue to set cwnd to initial window
which seems fine because the connection is thin stream.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Wed, 17 Jan 2018 20:11:00 +0000 (12:11 -0800)]
tcp: avoid min-RTT overestimation from delayed ACKs
This patch avoids having TCP sender or congestion control
overestimate the min RTT by orders of magnitude. This happens when
all the samples in the windowed filter are one-packet transfer
like small request and health-check like chit-chat, which is farily
common for applications using persistent connections. This patch
tries to conservatively labels and skip RTT samples obtained from
this type of workload.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Wed, 17 Jan 2018 15:42:46 +0000 (16:42 +0100)]
tipc: fix race between poll() and setsockopt()
Letting tipc_poll() dereference a socket's pointer to struct tipc_group
entails a race risk, as the group item may be deleted in a concurrent
tipc_sk_join() or tipc_sk_leave() thread.
We now move the 'open' flag in struct tipc_group to struct tipc_sock,
and let the former retain only a pointer to the moved field. This will
eliminate the race risk.
Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Wed, 17 Jan 2018 10:41:20 +0000 (11:41 +0100)]
l2tp: remove switch block in l2tp_nl_cmd_session_create()
Remove the switch block in l2tp_nl_cmd_session_create() that
checks pseudowire-specific parameters since just L2TP_PWTYPE_ETH and
L2TP_PWTYPE_PPP are currently supported and no actual checks are
performed. Moreover the L2TP_PWTYPE_IP/default case presents a harmless
issue in error handling (break instead of goto out_tunnel)
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 17 Jan 2018 07:27:34 +0000 (12:57 +0530)]
cxgb4: IPv6 filter takes 2 tids
on T6, IPv6 filter would occupy 2 tids instead of 4.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jan 2018 20:00:49 +0000 (15:00 -0500)]
Merge branch 'l2tp-set-l2specific_len-based-on-l2specific_type'
Lorenzo Bianconi says:
====================
l2tp: set l2specific_len based on l2specific_type
Do not rely on l2specific_len value provided by userspace but set sublayer
length according to l2specific_type.
Mark L2TP_ATTR_L2SPEC_LEN attribute as not used
Changes since v2:
- drop the patch related to a fix in the switch default case in
l2tp_nl_cmd_session_create()
- use L2SPECTYPE_NONE as default case in l2tp_get_l2specific_len()
Changes since v1:
- remove l2specific_len parameter
- add sanity check on l2specific_type provided by userspace
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 16 Jan 2018 22:01:57 +0000 (23:01 +0100)]
l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 16 Jan 2018 22:01:56 +0000 (23:01 +0100)]
l2tp: remove l2specific_len configurable parameter
Remove l2specific_len configuration parameter since now L2-Specific
Sublayer length is computed according to l2specific_type provided by
userspace.
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 16 Jan 2018 22:01:55 +0000 (23:01 +0100)]
l2tp: remove l2specific_len dependency in l2tp_core
Remove l2specific_len dependency while building l2tpv3 header or
parsing the received frame since default L2-Specific Sublayer is
always four bytes long and we don't need to rely on a user supplied
value.
Moreover in l2tp netlink code there are no sanity checks to
enforce the relation between l2specific_len and l2specific_type,
so sending a malformed netlink message is possible to set
l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even
L2TP_L2SPECTYPE_NONE) and set l2specific_len to a value greater than
4 leaking memory on the wire and sending corrupted frames.
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 16 Jan 2018 22:01:54 +0000 (23:01 +0100)]
l2tp: double-check l2specific_type provided by userspace
Add sanity check on l2specific_type provided by userspace in
l2tp_nl_cmd_session_create() since just L2TP_L2SPECTYPE_DEFAULT and
L2TP_L2SPECTYPE_NONE are currently supported.
Moreover explicitly set l2specific_type to L2TP_L2SPECTYPE_DEFAULT
only if the userspace does not provide a value for it
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jan 2018 19:56:32 +0000 (14:56 -0500)]
Merge branch 'cxgb4-reduce-memory-footprint-for-collecting-firmware-dump'
Rahul Lakkireddy says:
====================
cxgb4: reduce memory footprint for collecting firmware dump
Firmware dump can be large (upto 2 GB). In low memory conditions,
ethtool fails to allocate such large memory. So, use zlib deflate
to compress collected firmware dump.
Patch 1 updates collection logic to use compression.
Patch 2 adds zlib deflate to compress collected firmware dump.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 17 Jan 2018 07:23:47 +0000 (12:53 +0530)]
cxgb4: use zlib deflate to compress firmware dump
Use zlib deflate to compress firmware dump. Collect and compress
as much firmware dump as possible into a 32 MB buffer.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 17 Jan 2018 07:23:46 +0000 (12:53 +0530)]
cxgb4: update dump collection logic to use compression
Update firmware dump collection logic to use compression when available.
Let collection logic attempt to do compression, instead of returning out
of memory early.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Talat Batheesh [Wed, 17 Jan 2018 21:13:16 +0000 (23:13 +0200)]
net/dim: Fix fixpoint divide exception in net_dim_stats_compare
Helmut reported a bug about devision by zero while
running traffic and doing physical cable pull test.
When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is devision by zero.
This patch prevent this division for both ppms and epms.
Fixes:
c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Fixes:
4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 19 Jan 2018 19:38:19 +0000 (11:38 -0800)]
Merge tag 'trace-v4.15-rc4-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more small fixes
- The conversion of enums into their actual numbers to display in the
event format file had an off-by-one bug, that could cause an enum
not to be converted, and break user space parsing tools.
- A fix to a previous fix to bring back the context recursion checks.
The interrupt case checks for NMI, IRQ and softirq, but the softirq
returned the same number regardless if it was set or not, although
the logic would force it to be set if it were hit"
* tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix converting enum's from the map in trace_event_eval_update()
ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
Wei Yongjun [Wed, 17 Jan 2018 03:27:42 +0000 (03:27 +0000)]
devlink: Make some functions static
Fixes the following sparse warnings:
net/core/devlink.c:2297:25: warning:
symbol 'devlink_resource_find' was not declared. Should it be static?
net/core/devlink.c:2322:6: warning:
symbol 'devlink_resource_validate_children' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 19 Jan 2018 19:36:09 +0000 (11:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for use-after-free in Synaptics RMI4 driver
- correction to multitouch contact tracking on certain ALPS touchpads
(which got broken when we tried to fix the 2-finger scrolling)
- touchpad on Lenovo T640p is switched over to SMbus/RMI
- a few device node refcount fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - prevent UAF reported by KASAN
Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
Input: synaptics - Lenovo Thinkpad T460p devices should use RMI
Input: of_touchscreen - add MODULE_LICENSE
Input: 88pm860x-ts - fix child-node lookup
Input: twl6040-vibra - fix child-node lookup
Input: twl4030-vibra - fix sibling-node lookup
Wei Yongjun [Wed, 17 Jan 2018 03:27:33 +0000 (03:27 +0000)]
mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() static
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Wed, 17 Jan 2018 02:59:41 +0000 (21:59 -0500)]
forcedeth: remove unused variable
The variable miistat is not used. So it is removed.
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 19 Jan 2018 19:30:06 +0000 (11:30 -0800)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two bugfixes for the I2C core: Lixing Wang fixed a refcounting problem
with DT nodes. Jeremy Compostella fixed a buffer overflow possibility
when using a 'don't use' ioctl interface directly"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
i2c: core: decrease reference count of device node in i2c_unregister_device
Linus Torvalds [Fri, 19 Jan 2018 19:26:59 +0000 (11:26 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixlet from Tejun Heo:
"This just adds one more entry for liteon optical drives to the device
blacklist for large IOs.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
Linus Torvalds [Fri, 19 Jan 2018 19:25:17 +0000 (11:25 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"cgroup.threads should be delegatable (ie. a container should be able
to write to it from inside) but was missing the flag.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: make cgroup.threads delegatable
Linus Torvalds [Fri, 19 Jan 2018 19:23:39 +0000 (11:23 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixlet from Tejun Heo:
"One patch to add touch_nmi_watchdog() while dumping workqueue debug
messages to avoid triggering the lockup detector spuriously.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: avoid hard lockups in show_workqueue_state()
Linus Torvalds [Fri, 19 Jan 2018 19:21:31 +0000 (11:21 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"We have various small DT fixes, and one important regression fix:
The recent device tree bugfixes that were intended to address issues
that 'dtc' started warning about in 4.15 fixed various USB PHY device
nodes, but it turns out that we had code that depended on those nodes
being incorrect and the probe failing with a particular error code.
With the workaround we can also deal with correct device nodes.
The DT fixes include:
- Allwinner A10 and A20 had the display pipeline set up incorrectly
(introduced in v4.15)
- The Altera PMU lacked an interrupt-parent (never worked)
- Pin muxing on the Openblocks A7 (never worked)
- Clocks might get set up wrong on Armada 7K/8K (4.15 regression)
We now have additional device tree patches to address all the
remaining warnings introduced in 4.15, but decided to queue them for
4.16 instead, to avoid risking another regression like the USB PHY
thing mentioned above.
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
phy: work around 'phys' references to usb-nop-xceiv devices
ARM: sunxi_defconfig: Enable CMA
arm64: dts: socfpga: add missing interrupt-parent
ARM: dts: sun[47]i: Fix display backend 1 output to TCON0 remote endpoint
ARM64: dts: marvell: armada-cp110: Fix clock resources for various node
ARM: dts: da850-lcdk: Remove leading 0x and 0s from unit address
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
Linus Torvalds [Fri, 19 Jan 2018 19:19:11 +0000 (11:19 -0800)]
Merge tag 'powerpc-4.15-8' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"More than we'd like after rc8, but nothing very alarming either, just
tying up loose ends before the release:
Since we changed powernv to use cpufreq_get() from show_cpuinfo(), we
see warnings with PREEMPT enabled. But the preempt_disable() in
show_cpuinfo() doesn't actually prevent CPU hotplug as it suggests, so
remove it.
Two updates to the recently merged RFI flush code. Wire up the generic
sysfs file to report the status, and add a debugfs file to allow
enabling/disabling it at runtime.
Two updates to xmon, one to add the RFI flush related fields to the
paca dump, and another to not use hashed pointers in the paca dump.
And one minor fix to add a missing include of linux/types.h in
asm/hvcall.h, not seen to break the build in upstream, but correct
anyway.
Thanks to: Benjamin Herrenschmidt, Michal Suchanek, Nicholas Piggin"
* tag 'powerpc-4.15-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: include linux/types.h in asm/hvcall.h
powerpc/64s: Allow control of RFI flush via debugfs
powerpc/64s: Wire up cpu_show_meltdown()
powerpc: Don't preempt_disable() in show_cpuinfo()
powerpc/xmon: Don't print hashed pointers in paca dump
powerpc/xmon: Add RFI flush related fields to paca dump
Eric Dumazet [Tue, 16 Jan 2018 23:40:00 +0000 (15:40 -0800)]
ipv6: mcast: remove dead code
Since commit
41033f029e39 ("snmp: Remove duplicate OUTMCAST stat
increment") one line of code became unneeded.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 19 Jan 2018 19:16:01 +0000 (11:16 -0800)]
Merge tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nouveau, i915, vmwgfx and sun4i regression fixes.
The i915 change fixes a display corruption problem introduced in 4.15,
the nouveau changes are for regressions in 4.15, one of the vmwgfx
fixes goes back a little further, the other is a 4.15 regression fix,
the 3 sun4i changes fix blank HDMI output on those devices"
* tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling
drm/nouveau/bar/gk20a: Avoid bar teardown during init
drm/nouveau/drm/nouveau: Pass the proper arguments to nvif_object_map_handle()
drm/vmwgfx: fix memory corruption with legacy/sou connectors
drm/vmwgfx: Fix a boot time warning
drm/i915: Fix deadlock in i830_disable_pipe()
drm/i915: Redo plane sanitation during readout
drm/i915: Add .get_hw_state() method for planes
drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
Arnd Bergmann [Tue, 16 Jan 2018 16:34:00 +0000 (17:34 +0100)]
caif: reduce stack size with KASAN
When CONFIG_KASAN is set, we can use relatively large amounts of kernel
stack space:
net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=]
This adds convenience wrappers around cfpkt_extr_head(), which is responsible
for most of the stack growth. With those wrapper functions, gcc apparently
starts reusing the stack slots for each instance, thus avoiding the
problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>