platform/kernel/linux-rpi.git
6 years agoDocumentation: e100: Update the Intel 10/100 driver doc
Jeff Kirsher [Thu, 10 May 2018 19:20:13 +0000 (12:20 -0700)]
Documentation: e100: Update the Intel 10/100 driver doc

Over the years, several of the links have changed or are no longer valid
so update them.  In addition, the default values were incorrect for a
couple of parameters.

Converted the text file to the reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
6 years agoe1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
Benjamin Poirier [Thu, 10 May 2018 07:28:35 +0000 (16:28 +0900)]
e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoMerge branch 'selftests-net-various'
David S. Miller [Mon, 4 Jun 2018 13:50:12 +0000 (09:50 -0400)]
Merge branch 'selftests-net-various'

Willem de Bruijn says:

====================
selftests/net: various

A few odds and ends to network tests:

- msg_zerocopy: run as part of kselftest
- udp gso:      add missing bounds test for minimal sizes
- psocket_snd:  initial basic conformance test
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: add packet socket packet_snd test
Willem de Bruijn [Thu, 31 May 2018 16:14:40 +0000 (12:14 -0400)]
selftests/net: add packet socket packet_snd test

Add regression tests for PF_PACKET transmission using packet_snd.

The TPACKET ring interface has tests for transmission and reception.
This is an initial stab at the same for the send call based interface.

Packets are sent over loopback, then read twice. The entire packet is
read from another packet socket and compared. The packet is also
verified to arrive at a UDP socket for protocol conformance.

The test sends a packet over loopback, testing the following options
(not the full cross-product):

- SOCK_DGRAM
- SOCK_RAW
- vlan tag
- qdisc bypass
- bind() and sendto()
- virtio_net_hdr
- csum offload (NOT actual csum feature, ignored on loopback)
- gso

Besides these basic functionality tests, the test runs from a set
of bounds checks, positive and negative. Running over loopback, which
has dev->min_header_len, it cannot generate variable length hhlen.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: udpgso: test small gso_size boundary conditions
Willem de Bruijn [Thu, 31 May 2018 16:14:39 +0000 (12:14 -0400)]
selftests/net: udpgso: test small gso_size boundary conditions

Verify that udpgso can generate segments smaller than device mtu, down
to the extreme case of 1B gso_size.

Verify that irrespective of gso_size, udpgso restricts the number of
segments it will generate per call (UDP_MAX_SEGMENTS).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: enable msg_zerocopy test
Willem de Bruijn [Thu, 31 May 2018 16:14:38 +0000 (12:14 -0400)]
selftests/net: enable msg_zerocopy test

The existing msg_zerocopy test takes additional protocol arguments.
Add a variant that takes no arguments and runs all supported variants.
Call this from kselftest.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: virtio: simplify the virtnet_find_vqs
Tonghao Zhang [Thu, 31 May 2018 14:16:32 +0000 (07:16 -0700)]
net: virtio: simplify the virtnet_find_vqs

Use the common free functions while return successfully.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'wireless-drivers-next-for-davem-2018-05-31' of git://git.kernel.org/pub...
David S. Miller [Sun, 3 Jun 2018 15:03:10 +0000 (11:03 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2018-05-31' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.18

Hopefully the last pull request to 4.18 before the merge window.
Nothing major here, we have smaller new features and of course a lots
of fixes.

Major changes:

ath10k

* add memory dump support for QCA9888 and QCA99X0

* add support to configure channel dwell time

* support new DFS host confirmation feature in the firmware

ath

* update various regulatory mappings

wcn36xx

* various fixes to improve reliability

* add Factory Test Mode support

brmfmac

* add debugfs file for reading firmware capabilities

mwifiex

* support sysfs initiated device coredump
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovlan: use non-archaic spelling of failes
Thadeu Lima de Souza Cascardo [Thu, 31 May 2018 12:20:20 +0000 (09:20 -0300)]
vlan: use non-archaic spelling of failes

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: axienet: remove stale comment of axienet_open
YueHaibing [Thu, 31 May 2018 11:51:15 +0000 (19:51 +0800)]
net: axienet: remove stale comment of axienet_open

axienet_open no longer return -ENODEV when PHY cannot be connected to
since commit d7cc3163e026 ("net: axienet: Support phy-less mode of operation")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ncsi: Avoid GFP_KERNEL in response handler
Samuel Mendoza-Jonas [Thu, 31 May 2018 07:02:54 +0000 (17:02 +1000)]
net/ncsi: Avoid GFP_KERNEL in response handler

ncsi_rsp_handler_gc() allocates the filter arrays using GFP_KERNEL in
softirq context, causing the below backtrace. This allocation is only a
few dozen bytes during probing so allocate with GFP_ATOMIC instead.

[   42.813372] BUG: sleeping function called from invalid context at mm/slab.h:416
[   42.820900] in_atomic(): 1, irqs_disabled(): 0, pid: 213, name: kworker/0:1
[   42.827893] INFO: lockdep is turned off.
[   42.832023] CPU: 0 PID: 213 Comm: kworker/0:1 Tainted: G        W       4.13.16-01441-gad99b38 #65
[   42.841007] Hardware name: Generic DT based system
[   42.845966] Workqueue: events ncsi_dev_work
[   42.850251] [<8010a494>] (unwind_backtrace) from [<80107510>] (show_stack+0x20/0x24)
[   42.858046] [<80107510>] (show_stack) from [<80612770>] (dump_stack+0x20/0x28)
[   42.865309] [<80612770>] (dump_stack) from [<80148248>] (___might_sleep+0x230/0x2b0)
[   42.873241] [<80148248>] (___might_sleep) from [<80148334>] (__might_sleep+0x6c/0xac)
[   42.881129] [<80148334>] (__might_sleep) from [<80240d6c>] (__kmalloc+0x210/0x2fc)
[   42.888737] [<80240d6c>] (__kmalloc) from [<8060ad54>] (ncsi_rsp_handler_gc+0xd0/0x170)
[   42.896770] [<8060ad54>] (ncsi_rsp_handler_gc) from [<8060b454>] (ncsi_rcv_rsp+0x16c/0x1d4)
[   42.905314] [<8060b454>] (ncsi_rcv_rsp) from [<804d86c8>] (__netif_receive_skb_core+0x3c8/0xb50)
[   42.914158] [<804d86c8>] (__netif_receive_skb_core) from [<804d96cc>] (__netif_receive_skb+0x20/0x7c)
[   42.923420] [<804d96cc>] (__netif_receive_skb) from [<804de4b0>] (netif_receive_skb_internal+0x78/0x6a4)
[   42.932931] [<804de4b0>] (netif_receive_skb_internal) from [<804df980>] (netif_receive_skb+0x78/0x158)
[   42.942292] [<804df980>] (netif_receive_skb) from [<8042f204>] (ftgmac100_poll+0x43c/0x4e8)
[   42.950855] [<8042f204>] (ftgmac100_poll) from [<804e094c>] (net_rx_action+0x278/0x4c4)
[   42.958918] [<804e094c>] (net_rx_action) from [<801016a8>] (__do_softirq+0xe0/0x4c4)
[   42.966716] [<801016a8>] (__do_softirq) from [<8011cd9c>] (do_softirq.part.4+0x50/0x78)
[   42.974756] [<8011cd9c>] (do_softirq.part.4) from [<8011cebc>] (__local_bh_enable_ip+0xf8/0x11c)
[   42.983579] [<8011cebc>] (__local_bh_enable_ip) from [<804dde08>] (__dev_queue_xmit+0x260/0x890)
[   42.992392] [<804dde08>] (__dev_queue_xmit) from [<804df1f0>] (dev_queue_xmit+0x1c/0x20)
[   43.000689] [<804df1f0>] (dev_queue_xmit) from [<806099c0>] (ncsi_xmit_cmd+0x1c0/0x244)
[   43.008763] [<806099c0>] (ncsi_xmit_cmd) from [<8060dc14>] (ncsi_dev_work+0x2e0/0x4c8)
[   43.016725] [<8060dc14>] (ncsi_dev_work) from [<80133dfc>] (process_one_work+0x214/0x6f8)
[   43.024940] [<80133dfc>] (process_one_work) from [<80134328>] (worker_thread+0x48/0x558)
[   43.033070] [<80134328>] (worker_thread) from [<8013ba80>] (kthread+0x130/0x174)
[   43.040506] [<8013ba80>] (kthread) from [<80102950>] (ret_from_fork+0x14/0x24)

Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: netcp: ethss: remove unnecessary pointer set to NULL
YueHaibing [Thu, 31 May 2018 03:48:48 +0000 (11:48 +0800)]
net: netcp: ethss: remove unnecessary pointer set to NULL

If statement has make sure the 'slave->phy' is NULL

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: fix error return code in smc_setsockopt()
Wei Yongjun [Thu, 31 May 2018 02:31:22 +0000 (02:31 +0000)]
net/smc: fix error return code in smc_setsockopt()

Fix to return error code -EINVAL instead of 0 if optlen is invalid.

Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
Wei Yongjun [Thu, 31 May 2018 02:31:12 +0000 (02:31 +0000)]
net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agohv_netvsc: fix error return code in netvsc_probe()
Wei Yongjun [Thu, 31 May 2018 02:04:43 +0000 (02:04 +0000)]
hv_netvsc: fix error return code in netvsc_probe()

Fix to return a negative error code from the failover register fail
error handling case instead of 0, as done elsewhere in this function.

Fixes: 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 3 Jun 2018 13:31:58 +0000 (09:31 -0400)]
Merge git://git./linux/kernel/git/davem/net

Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: consider PHY_IGNORE_INTERRUPT in state machine PHY_NOLINK handling
Heiner Kallweit [Wed, 30 May 2018 20:13:20 +0000 (22:13 +0200)]
net: phy: consider PHY_IGNORE_INTERRUPT in state machine PHY_NOLINK handling

We can bail out immediately also in case of PHY_IGNORE_INTERRUPT because
phy_mac_interupt() informs us once the link is up.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Sun, 3 Jun 2018 12:24:27 +0000 (08:24 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree:

1) Get rid of nf_sk_is_transparent(), use inet_sk_transparent() instead.
   From Máté Eckl.

2) Move shared tproxy infrastructure to nf_tproxy_ipv4 and nf_tproxy_ipv6.
   Also from Máté.

3) Add hashtable to speed up chain lookups by name, from Florian Westphal.

4) Patch series to add connlimit support reusing part of the
   nf_conncount infrastructure. This includes preparation changes such
   passing context to the object and expression destroy interface;
   garbage collection for expressions embedded into set elements, and
   the introduction of the clone_destroy interface for expressions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sun, 3 Jun 2018 00:35:53 +0000 (17:35 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Infinite loop in _decode_session6(), from Eric Dumazet.

 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
    Dumazet.

 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.

 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
    Makita.

 5) Incorrect idr release in cls_flower, from Paul Blakey.

 6) Probe error handling fix in davinci_emac, from Dan Carpenter.

 7) Memory leak in XPS configuration, from Alexander Duyck.

 8) Use after free with cloned sockets in kcm, from Kirill Tkhai.

 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
    Dichtel.

10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
    from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  bpf: fix uapi hole for 32 bit compat applications
  net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
  ip6_tunnel: remove magic mtu value 0xFFF8
  ip_tunnel: restore binding to ifaces with a large mtu
  net: dsa: b53: Add BCM5389 support
  kcm: Fix use-after-free caused by clonned sockets
  net-sysfs: Fix memory leak in XPS configuration
  ixgbe: fix parsing of TC actions for HW offload
  net: ethernet: davinci_emac: fix error handling in probe()
  net/ncsi: Fix array size in dumpit handler
  cls_flower: Fix incorrect idr release when failing to modify rule
  net/sonic: Use dma_mapping_error()
  xfrm Fix potential error pointer dereference in xfrm_bundle_create.
  vhost_net: flush batched heads before trying to busy polling
  tun: Fix NULL pointer dereference in XDP redirect
  be2net: Fix error detection logic for BE3
  net: qmi_wwan: Add Netgear Aircard 779S
  mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
  atm: zatm: fix memcmp casting
  iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
  ...

6 years agonetfilter: nf_tables: handle chain name lookups via rhltable
Florian Westphal [Sat, 2 Jun 2018 21:41:06 +0000 (23:41 +0200)]
netfilter: nf_tables: handle chain name lookups via rhltable

If there is a significant amount of chains list search is too slow, so
add an rhlist table for this.

This speeds up ruleset loading: for every new rule we have to check if
the name already exists in current generation.

We need to be able to cope with duplicate chain names in case a transaction
drops the nfnl mutex (for request_module) and the abort of this old
transaction is still pending.

The list is kept -- we need a way to iterate chains even if hash resize is
in progress without missing an entry.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: add connlimit support
Pablo Neira Ayuso [Sat, 2 Jun 2018 19:38:51 +0000 (21:38 +0200)]
netfilter: nf_tables: add connlimit support

This features which allows you to limit the maximum number of
connections per arbitrary key. The connlimit expression is stateful,
therefore it can be used from meters to dynamically populate a set, this
provides a mapping to the iptables' connlimit match. This patch also
comes that allows you define static connlimit policies.

This extension depends on the nf_conncount infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 2 Jun 2018 22:54:49 +0000 (15:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
 "Eve of merge window fix: The original code was so bogus as to be
  casting the wrong generic device to an rport and proceeding to take
  actions based on the bogus values it found.

  Fortunately it seems the location that is dereferenced always exists,
  so the code hasn't oopsed yet, but it certainly annoys the memory
  checkers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_srp: Fix shost to rport translation

6 years agoMerge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Sat, 2 Jun 2018 22:24:45 +0000 (15:24 -0700)]
Merge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A few final fixes:

  i915:
   - fix for potential Spectre vector in the new query uAPI
   - fix NULL pointer deref (FDO #106559)
   - DMI fix to hide LVDS for Radiant P845 (FDO #105468)

  amdgpu:
   - suspend/resume DC regression fix
   - underscan flicker fix on fiji
   - gamma setting fix after dpms

  omap:
   - fix oops regression

  core:
   - fix PSR timing

  dw-hdmi:
   - fix oops regression"

* tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm/amd/display: Update color props when modeset is required
  drm/amd/display: Make atomic-check validate underscan changes
  drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
  drm/amd/display: Fix BUG_ON during CRTC atomic check update
  drm/i915/query: nospec expects no more than an unsigned long
  drm/i915/query: Protect tainted function pointer lookup
  drm/i915/lvds: Move acpi lid notification registration to registration phase
  drm/i915: Disable LVDS on Radiant P845
  drm/omap: fix NULL deref crash with SDI displays
  drm/psr: Fix missed entry in PSR setup time table.

6 years agonetfilter: nf_tables: add destroy_clone expression
Pablo Neira Ayuso [Sat, 2 Jun 2018 21:38:50 +0000 (23:38 +0200)]
netfilter: nf_tables: add destroy_clone expression

Before this patch, cloned expressions are released via ->destroy. This
is a problem for the new connlimit expression since the ->destroy path
drop a reference on the conntrack modules and it unregisters hooks. The
new ->destroy_clone provides context that this expression is being
released from the packet path, so it is mirroring ->clone(), where
neither module reference is dropped nor hooks need to be unregistered -
because this done from the control plane path from the ->init() path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: garbage collection for stateful expressions
Pablo Neira Ayuso [Sat, 2 Jun 2018 21:38:49 +0000 (23:38 +0200)]
netfilter: nf_tables: garbage collection for stateful expressions

Use garbage collector to schedule removal of elements based of feedback
from expression that this element comes with. Therefore, the garbage
collector is not guided by timeout expirations in this new mode.

The new connlimit expression sets on the NFT_EXPR_GC flag to enable this
behaviour, the dynset expression needs to explicitly enable the garbage
collector via set->ops->gc_init call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: pass ctx to nf_tables_expr_destroy()
Pablo Neira Ayuso [Sat, 2 Jun 2018 21:38:48 +0000 (23:38 +0200)]
netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()

nft_set_elem_destroy() can be called from call_rcu context. Annotate
netns and table in set object so we can populate the context object.
Moreover, pass context object to nf_tables_set_elem_destroy() from the
commit phase, since it is already available from there.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_conncount: expose connection list interface
Pablo Neira Ayuso [Sat, 2 Jun 2018 21:38:47 +0000 (23:38 +0200)]
netfilter: nf_conncount: expose connection list interface

This patch provides an interface to maintain the list of connections and
the lookup function to obtain the number of connections in the list.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: pass context to object destroy indirection
Pablo Neira Ayuso [Sat, 2 Jun 2018 21:38:46 +0000 (23:38 +0200)]
netfilter: nf_tables: pass context to object destroy indirection

The new connlimit object needs this to properly deal with conntrack
dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: Libify xt_TPROXY
Máté Eckl [Fri, 1 Jun 2018 18:44:56 +0000 (20:44 +0200)]
netfilter: Libify xt_TPROXY

The extracted functions will likely be usefull to implement tproxy
support in nf_tables.

Extrancted functions:
- nf_tproxy_sk_is_transparent
- nf_tproxy_laddr4
- nf_tproxy_handle_time_wait4
- nf_tproxy_get_sock_v4
- nf_tproxy_laddr6
- nf_tproxy_handle_time_wait6
- nf_tproxy_get_sock_v6

(nf_)tproxy_handle_time_wait6 also needed some refactor as its current
implementation was xtables-specific.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: Decrease code duplication regarding transparent socket option
Máté Eckl [Fri, 1 Jun 2018 12:54:07 +0000 (14:54 +0200)]
netfilter: Decrease code duplication regarding transparent socket option

There is a function in include/net/netfilter/nf_socket.h to decide if a
socket has IP(V6)_TRANSPARENT socket option set or not. However this
does the same as inet_sk_transparent() in include/net/tcp.h

include/net/tcp.h:1733
/* This helper checks if socket has IP_TRANSPARENT set */
static inline bool inet_sk_transparent(const struct sock *sk)
{
switch (sk->sk_state) {
case TCP_TIME_WAIT:
return inet_twsk(sk)->tw_transparent;
case TCP_NEW_SYN_RECV:
return inet_rsk(inet_reqsk(sk))->no_srccheck;
}
return inet_sk(sk)->transparent;
}

tproxy_sk_is_transparent has also been refactored to use this function
instead of reimplementing it.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoMerge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Sat, 2 Jun 2018 20:13:57 +0000 (06:13 +1000)]
Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Two last minute DC fixes for 4.17.  A fix for underscan on fiji and
a fix for gamma settings getting after dpms.

* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/display: Update color props when modeset is required
  drm/amd/display: Make atomic-check validate underscan changes

6 years agoMerge tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Sat, 2 Jun 2018 17:12:23 +0000 (10:12 -0700)]
Merge tag 'mips_fixes_4.17_3' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from James Hogan:
 "A final few MIPS fixes for 4.17:

   - drop Lantiq gphy reboot/remove reset (4.14)

   - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0)

   - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)"

* tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs
  MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests
  MIPS: lantiq: gphy: Drop reboot/remove reset asserts

6 years agoMerge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Sat, 2 Jun 2018 17:08:45 +0000 (10:08 -0700)]
Merge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio

Pull VFIO fix from Alex Williamson:
 "Revert a pfn page mapping optimization identified as introducing a bad
  page state regression (Alex Williamson)"

* tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio:
  Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"

6 years agoMerge tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 2 Jun 2018 17:05:45 +0000 (10:05 -0700)]
Merge tag 'char-misc-4.17-rc8' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are four small bugfixes for some char/misc drivers. Well, really
  three fixes and one fix for one of those fixes due to problems found
  by 0-day.

  This resolves some reported issues with the hwtracing drivers, and a
  reported regression for the thunderbolt subsystem. All of these have
  been in linux-next for a while now with no reported problems"

* tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  hwtracing: stm: fix build error on some arches
  intel_th: Use correct device when freeing buffers
  stm class: Use vmalloc for the master map
  thunderbolt: Handle NULL boot ACL entries properly

6 years agoMerge tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 2 Jun 2018 17:02:14 +0000 (10:02 -0700)]
Merge tag 'staging-4.17-rc8' of git://git./linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Greg KH:
 "Here are some old IIO driver fixes that were sitting in my tree for a
  few weeks. Sorry about not getting them to you sooner. They fix a
  number of small IIO driver issues that have been reported.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: adc: select buffer for at91-sama5d2_adc
  iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
  iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
  iio:kfifo_buf: check for uint overflow
  iio:buffer: make length types match kfifo types
  iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
  iio: adc: stm32-dfsdm: fix successive oversampling settings
  iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ

6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Sat, 2 Jun 2018 16:55:44 +0000 (09:55 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Just three small last minute regressions that were found in the last
  week. The Broadcom fix is a bit big for rc7, but since it is fixing
  driver crash regressions that were merged via netdev into rc1, I am
  sending it.

   - bnxt netdev changes merged this cycle caused the bnxt RDMA driver
     to crash under certain situations

   - Arnd found (several, unfortunately) kconfig problems with the
     patches adding INFINIBAND_ADDR_TRANS. Reverting this last part,
     will fix it more fully outside -rc.

   - Subtle change in error code for a uapi function caused breakage in
     userspace. This was bug was subtly introduced cycle"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/core: Fix error code for invalid GID entry
  IB: Revert "remove redundant INFINIBAND kconfig dependencies"
  RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes

6 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 2 Jun 2018 16:52:22 +0000 (09:52 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A documentation bugfix and a MAINTAINERS addition"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: ocores: update HDL sources URL
  i2c: xlp9xx: Add MAINTAINERS entry

6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 2 Jun 2018 16:44:15 +0000 (09:44 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge two fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix the NULL mapping case in __isolate_lru_page()
  mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()

6 years agomm: fix the NULL mapping case in __isolate_lru_page()
Hugh Dickins [Fri, 1 Jun 2018 23:50:50 +0000 (16:50 -0700)]
mm: fix the NULL mapping case in __isolate_lru_page()

George Boole would have noticed a slight error in 4.16 commit
69d763fc6d3a ("mm: pin address_space before dereferencing it while
isolating an LRU page").  Fix it, to match both the comment above it,
and the original behaviour.

Although anonymous pages are not marked PageDirty at first, we have an
old habit of calling SetPageDirty when a page is removed from swap
cache: so there's a category of ex-swap pages that are easily
migratable, but were inadvertently excluded from compaction's async
migration in 4.16.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils
Fixes: 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Ivan Kalvachev <ikalvachev@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
Hugh Dickins [Fri, 1 Jun 2018 23:50:45 +0000 (16:50 -0700)]
mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()

Swapping load on huge=always tmpfs (with khugepaged tuned up to be very
eager, but I'm not sure that is relevant) soon hung uninterruptibly,
waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most
often when "cp -a" was trying to write to a smallish file.  Debug showed
that the page in question was not locked, and page->mapping NULL by now,
but page->index consistent with having been in a huge page before.

Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ede764
("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added
in; but took hours to reproduce on a 4.17 kernel (no idea why).

The culprit proved to be the __ClearPageDirty() on tails beyond i_size in
__split_huge_page(): the non-atomic __bitoperation may have been safe when
4.8's baa355fd3314 ("thp: file pages support for split_huge_page()")
introduced it, but liable to erase PageWaiters after 4.10's 62906027091f
("mm: add PageWaiters indicating tasks are waiting for a page bit").

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils
Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoRevert "vfio/type1: Improve memory pinning process for raw PFN mapping"
Alex Williamson [Sat, 2 Jun 2018 14:41:44 +0000 (08:41 -0600)]
Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"

Bisection by Amadeusz Sławiński implicates this commit leading to bad
page state issues after VM shutdown, likely due to unbalanced page
references.  The original commit was intended only as a performance
improvement, therefore revert for offline rework.

Link: https://lkml.org/lkml/2018/6/2/97
Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
Reported-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Sat, 2 Jun 2018 13:04:21 +0000 (09:04 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, the most relevant things in this batch are:

1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
   Same thing with the redirection support.

2) Abort transaction if early initialization of the commit phase fails.
   Also from Florian.

3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
   Florian.

4) Abort nf_tables batch if fatal signal is pending, from Florian.

5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
   From Florian Westphal.

6) Support to match transparent sockets from nf_tables, from Máté Eckl.

7) Audit support for nf_tables, from Phil Sutter.

8) Validate chain dependencies from commit phase, fall back to fine grain
   validation only in case of errors.

9) Attach dst to skbuff from netfilter flowtable packet path, from
   Jason A. Donenfeld.

10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
    Patch from Kees Cook.

11) Add extension to allow to forward packets through neighbour layer.

12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.

13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mlx5e-updates-2018-06-01' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 2 Jun 2018 12:55:01 +0000 (08:55 -0400)]
Merge tag 'mlx5e-updates-2018-06-01' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-06-01

1) From Tariq, Two patches to Fix IPoIB issues introduced in
   "net/mlx5e: TX, Use actual WQE size for SQ edge fill"

2) From Eran, Additional improvements to mlx5e statistics reporting

3) From Maor, Increase aRFS flow tables size

4) From Adi, Support MTU change for ethernet representors

5) From Ilan and Adi, Handle QP error events in FPGA

6) From Tariq, last 10 patches mainly deals with RX buffer scheme improvements for legacy RQ
   to use only order-0 pages and fragmented SKBs for large MTUs.

-  Tariq starts with some refactoring and removing HW LRO support from traditional
   (legacy) RQ, since it complicates the buffer scheme and removing it makes it smoother
   to move to cyclic descriptor buffer for traditional RQ.

- Use cyclic WQ in legacy RQ, which has many benefits and paves the way for fragmented SKBs
  for large MTUs.

- Enhance legacy Receive Queue memory scheme, such that only order-0 pages are used.
  Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer.
  Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags
  as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4
  frags and 10KB of MTU.

- TX statistics access improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sat, 2 Jun 2018 12:07:52 +0000 (08:07 -0400)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-06-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
   order to fix offsets on 32 bit archs.

This will have a minor merge conflict with net-next which has the
__u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
location. Resolution is to use the gpl_compatible member.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: fix uapi hole for 32 bit compat applications
Daniel Borkmann [Sat, 2 Jun 2018 03:21:59 +0000 (05:21 +0200)]
bpf: fix uapi hole for 32 bit compat applications

In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
case of struct bpf_map_info but also struct bpf_prog_info. In net-next
commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
added a bitfield into it to expose some flags related to programs. Thus,
add an unnamed __u32 bitfield for both so that alignment keeps the same
in both 32 and 64 bit cases, and can be naturally extended from there
as in b85fab0e67b.

Before:

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
__u32                      type;                 /*     0     4 */
__u32                      id;                   /*     4     4 */
__u32                      key_size;             /*     8     4 */
__u32                      value_size;           /*    12     4 */
__u32                      max_entries;          /*    16     4 */
__u32                      map_flags;            /*    20     4 */
char                       name[16];             /*    24    16 */
__u32                      ifindex;              /*    40     4 */
__u64                      netns_dev;            /*    44     8 */
__u64                      netns_ino;            /*    52     8 */

/* size: 64, cachelines: 1, members: 10 */
/* padding: 4 */
  };

After (same as on 64 bit):

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
__u32                      type;                 /*     0     4 */
__u32                      id;                   /*     4     4 */
__u32                      key_size;             /*     8     4 */
__u32                      value_size;           /*    12     4 */
__u32                      max_entries;          /*    16     4 */
__u32                      map_flags;            /*    20     4 */
char                       name[16];             /*    24    16 */
__u32                      ifindex;              /*    40     4 */

/* XXX 4 bytes hole, try to pack */

__u64                      netns_dev;            /*    48     8 */
__u64                      netns_ino;            /*    56     8 */
/* --- cacheline 1 boundary (64 bytes) --- */

/* size: 64, cachelines: 1, members: 10 */
/* sum members: 60, holes: 1, sum holes: 4 */
  };

Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agonet/mlx5e: TX, Separate cachelines of xmit and completion stats
Tariq Toukan [Wed, 18 Apr 2018 10:29:11 +0000 (13:29 +0300)]
net/mlx5e: TX, Separate cachelines of xmit and completion stats

Avoid false sharing of cachelines by separating the cachelines of
TX stats that are dertied in xmit flow and in completion flow.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Always prefer Linear SKB configuration
Tariq Toukan [Tue, 20 Feb 2018 13:17:54 +0000 (15:17 +0200)]
net/mlx5e: RX, Always prefer Linear SKB configuration

Prefer the linear SKB configuration of Legacy RQ over the
non-linear one of Striding RQ.

This implies that ConnectX-4 LX now uses legacy RQ by default,
as it does not support the linear configuration of Striding RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Enhance legacy Receive Queue memory scheme
Tariq Toukan [Wed, 2 May 2018 15:23:58 +0000 (18:23 +0300)]
net/mlx5e: RX, Enhance legacy Receive Queue memory scheme

Enhance the memory scheme of the legacy RQ, such that
only order-0 pages are used.

Whenever possible, prefer using a linear SKB, and build it
wrapping the WQE buffer.

Otherwise (for example, jumbo frames on x86), use non-linear SKB,
with as many frags as needed. In this case, multiple WQE
scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.

This implied to remove support of HW LRO in legacy RQ, as it would
require large number of page allocations and scatter entries per WQE
on archs with PAGE_SIZE = 4KB, yielding bad performance.

In earlier patches, we guaranteed that all completions are in-order,
and that we use a cyclic WQ.
This creates an oppurtunity for a performance optimization:
The mapping between a "struct mlx5e_dma_info", and the
WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
across different cycles of a WQ. This allows initializing
the mapping in the time of RQ creation, and not handle it
in datapath.

A struct mlx5e_dma_info that is shared between different WQEs
is allocated by the first WQE, and freed by the last one.
This implies an important requirement: WQEs that share the same
struct mlx5e_dma_info must be posted within the same NAPI.
Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
point to the new struct mlx5e_dma_info, not the one that was posted
(and the HW wrote to).
This bulking requirement is actually good also for performance reasons,
hence we extend the bulk beyong the minimal requirement above.

With this memory scheme, the RQs memory footprint is reduce by a
factor of 2 on x86, and by a factor of 32 on PowerPC.
Same factors apply for the number of pages in a GRO session.

Performance tests:
ConnectX-4, single core, single RX ring, default MTU.

x86:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet rate (early drop in TC): no degradation
TCP streams: ~5% improvement

PowerPC:
CPU: POWER8 (raw), altivec supported

Packet rate (early drop in TC): 20% gain
TCP streams: 25% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Use cyclic WQ in legacy RQ
Tariq Toukan [Mon, 2 Apr 2018 14:31:31 +0000 (17:31 +0300)]
net/mlx5e: RX, Use cyclic WQ in legacy RQ

Now that LRO is not supported for Legacy RQ, there is no source of
out-of-order completions in the WQ, and we can use a cyclic one.
This has multiple advantages:
- reduces the WQE size (smaller PCI transactions).
- lower overhead in datapath (no handling of 'next' pointers).
- no reserved WQE for the WQ head (was need in linked-list).
- allows using a constant map between frag and dma_info struct, in downstream patch.

Performance tests:
ConnectX-4, single core, single RX ring.
Major gain in packet rate of single ring XDP drop.
Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps).

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Split WQ objects for different RQ types
Tariq Toukan [Mon, 2 Apr 2018 14:23:14 +0000 (17:23 +0300)]
net/mlx5e: RX, Split WQ objects for different RQ types

Replace the common RQ WQ object with two separate ones for the
different RQ types.
This is in preparation for switching to using a cyclic WQ type
in Legacy RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Remove HW LRO support in legacy RQ
Tariq Toukan [Mon, 2 Apr 2018 13:28:10 +0000 (16:28 +0300)]
net/mlx5e: RX, Remove HW LRO support in legacy RQ

Current LRO implementation in Legacy RQ uses high-order pages.
In downstream patches of this series we complete the transition
to using only order-0 pages in RX datapath (which was already done
in Striding RQ).

Unlike the more advanced Striding RQ, Legacy RQ does not make reuse
of any non-consumed buffers of non-full LRO sessions, and combining
it with order-0 pages has many performance drawbacks.

Hence, here we totally remove LRO support in Legacy RQ.
This guarantees having no out-of-order completions, which allows using
a cyclic work queue (instead of a linked-list) in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Dedicate a function for copying SKB header
Tariq Toukan [Mon, 16 Apr 2018 14:11:07 +0000 (17:11 +0300)]
net/mlx5e: RX, Dedicate a function for copying SKB header

Get the logic of copying the packet header into the SKB linear part
into a generic function. Function does copy length alignment
and dma buffer sync.

It is currently called only within the MPWQE flow.
In a downstream patch, it will be called within the legacy RQ flow
as well.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Generalise function of SKB frag addition
Tariq Toukan [Mon, 2 Apr 2018 11:30:34 +0000 (14:30 +0300)]
net/mlx5e: RX, Generalise function of SKB frag addition

Rename it and pass truesize as an extra argument, as it will be used also
in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Generalise name of non-linear SKB head size
Tariq Toukan [Mon, 2 Apr 2018 11:30:34 +0000 (14:30 +0300)]
net/mlx5e: RX, Generalise name of non-linear SKB head size

Make name more generic by dropping MPWRQ from it, as it will be
used also in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: TX, Obsolete maintaining local copies of skb->len/data
Tariq Toukan [Thu, 24 May 2018 10:44:24 +0000 (13:44 +0300)]
net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data

Instead of maintaining a local copy of skb->len/data and updating
it upon every copy to the WQE inline part, just calculate it once
when needed, using the ihs.

This obsoletes the function mlx5e_tx_skb_pull_inline.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: FPGA, Handle QP error event
Ilan Tayari [Tue, 29 May 2018 23:39:04 +0000 (16:39 -0700)]
net/mlx5: FPGA, Handle QP error event

Add handlers for this event to perform graceful teardown of the device.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Support configurable MTU for vport representors
Adi Nissim [Sun, 1 Apr 2018 13:54:27 +0000 (16:54 +0300)]
net/mlx5e: Support configurable MTU for vport representors

The representor MTU was hard coded to 1500 bytes.
Allow setting arbitrary MTU values up to the max supported by the FW.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Increase aRFS flow tables size
Maor Gottlieb [Thu, 3 May 2018 09:40:30 +0000 (12:40 +0300)]
net/mlx5e: Increase aRFS flow tables size

Increase the aRFS flow table size to 64k so it could contain up to 64k
different streams.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Remove redundant active_channels indication
Eran Ben Elisha [Tue, 29 May 2018 08:06:31 +0000 (11:06 +0300)]
net/mlx5e: Remove redundant active_channels indication

Now, when all channels stats are saved regardless of the channel's state
{open, closed}, we can safely remove this indication and the stats spin
lock which protects it.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Present SW stats when state is not opened
Eran Ben Elisha [Tue, 29 May 2018 07:54:47 +0000 (10:54 +0300)]
net/mlx5e: Present SW stats when state is not opened

The driver can present all SW stats even when the state not opened.
Fixed get strings, count and stats to support it.

In addition, fix tc2txq to hold a static mapping which doesn't depend on
the amount of open channels, and cannot have the same value on two
different cells  while moving between configurations.
Example:
- OOB 16 channels
- Change to 2 channels, 8 TCs
- tc2txq[15][0] == tc2txq[1][7] == 15
This will cause multiple appearances of the same TX index in statistics
output.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: IPOIB, Add a missing skb_pull
Tariq Toukan [Thu, 31 May 2018 15:04:23 +0000 (18:04 +0300)]
net/mlx5e: IPOIB, Add a missing skb_pull

A call to mlx5e_tx_skb_pull_inline was mistakenly dropped
in the cited patch. Get it back.

Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: IPOIB, Fix overflowing SQ WQE memset
Tariq Toukan [Thu, 31 May 2018 15:01:31 +0000 (18:01 +0300)]
net/mlx5e: IPOIB, Fix overflowing SQ WQE memset

IPoIB WQE size is larger than a single WQEBB.  Must not fetch the WQE,
and surely not memset it, until it is guaranteed that there are enough
WQEBBs available before getting to SQ/frag edge.

Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agoMerge branch 'hns3-next'
David S. Miller [Fri, 1 Jun 2018 18:23:59 +0000 (14:23 -0400)]
Merge branch 'hns3-next'

Salil Mehta says:

====================
Misc. bug fixes & optimizations for HNS3 driver

This patch-set presents some bug fixes found out during the internal
review and system testing and some small optimizations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Optimize the VF's process of updating multicast MAC
Xi Wang [Fri, 1 Jun 2018 16:52:11 +0000 (17:52 +0100)]
net: hns3: Optimize the VF's process of updating multicast MAC

In the update flow of the new PF driver, if a multicast address is in mta
table, the VF deletion action will not take effect.

This patch adds the VF adaptation according to the new flow of PF'driver.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Optimize the PF's process of updating multicast MAC
Xi Wang [Fri, 1 Jun 2018 16:52:10 +0000 (17:52 +0100)]
net: hns3: Optimize the PF's process of updating multicast MAC

In the current process, the multicast MAC is added to both MAC_VLAN
table and MTA table, this will reduce the utilization of the resource.

This patch improves the process of adding multicast MAC address, the
new process starts using the MTA table to add multicast MAC after the
MAC_VLAN table is full, and the MTA is disable if it is no longer used.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for vxlan tx checksum bug
Yunsheng Lin [Fri, 1 Jun 2018 16:52:09 +0000 (17:52 +0100)]
net: hns3: Fix for vxlan tx checksum bug

when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL
and it is udp packet, which has a dest port as the IANA assigned.
the hardware is expected to do the checksum offload, but the
hardware will not do the checksum offload when udp dest port is
4789.

This patch fixes it by doing the checksum in software.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add missing break in misc_irq_handle
Yunsheng Lin [Fri, 1 Jun 2018 16:52:08 +0000 (17:52 +0100)]
net: hns3: Add missing break in misc_irq_handle

There is a break missing in the switch/case handling in
hclge_misc_irq_handle, which causes the log to output
uncorrectly.

This patch adds the missing break, and change the dev_dbg
to dev_warn in order to better catch the error.

Fixes: c1a81619d73a ("net: hns3: Add mailbox interrupt handling to PF driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for phy not link up problem after resetting
Yunsheng Lin [Fri, 1 Jun 2018 16:52:07 +0000 (17:52 +0100)]
net: hns3: Fix for phy not link up problem after resetting

When resetting, phy_state_machine may be accessing the phy through
firmware if the phy is not stopped or disconnected, which will
cause firemware timeout problem because the firmware is busy
processing the reset request.

This patch fixes it by disabling the phy when resetting.

Fixes: b940aeae0ed6 ("net: hns3: never send command queue message to IMP when reset")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for hclge_reset running repeatly problem
Yunsheng Lin [Fri, 1 Jun 2018 16:52:06 +0000 (17:52 +0100)]
net: hns3: Fix for hclge_reset running repeatly problem

When hardware sends the HCLGE_VECTOR0_EVENT_RST event through
hclge_misc_irq_handle, currently driver enables misc_vector in
the interrupt handle, and hardware generates the same interrupt
for the same reset event again and again until the reset is
complete, which causes hclge_reset running repeatly problem.

This patch fixes by enabling the misc_vector after reset is
complete.

Fixes: 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for service_task not running problem after resetting
Yunsheng Lin [Fri, 1 Jun 2018 16:52:05 +0000 (17:52 +0100)]
net: hns3: Fix for service_task not running problem after resetting

When hclge_ae_stop is called during resetting, it will cancel the
service_task by calling cancel_work_sync, which may cause the
service_task to exit without clearing HCLGE_STATE_SERVICE_SCHED
bit. If this happens, the service_task will never run again.

This patch fixes this problem by clearing it after calling
cancel_work_sync in hclge_ae_stop.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix setting mac address error
Jian Shen [Fri, 1 Jun 2018 16:52:04 +0000 (17:52 +0100)]
net: hns3: Fix setting mac address error

When doing function reset or insmod hns3 dirver after rmmod,
the entries of mac vlan table are not cleared, which may cause
init mac address failed. This patch fixes it by clearing the
old mac address when doing function reset or rmmod hns3 driver.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add repeat address checking for setting mac address
Jian Shen [Fri, 1 Jun 2018 16:52:03 +0000 (17:52 +0100)]
net: hns3: Add repeat address checking for setting mac address

Add checking for new mac address. It doesn't need to config
the mac vlan table if it's already in use.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add support for IFF_ALLMULTI flag
Peng Li [Fri, 1 Jun 2018 16:52:02 +0000 (17:52 +0100)]
net: hns3: Add support for IFF_ALLMULTI flag

This patch adds support for IFF_ALLMULTI flag to HNS3 PF and VF
driver.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Disable vf vlan filter when vf vlan table is full
Yunsheng Lin [Fri, 1 Jun 2018 16:52:01 +0000 (17:52 +0100)]
net: hns3: Disable vf vlan filter when vf vlan table is full

This is only 128 entries for hardware's vf vlan table, when
the vf table is full, the firmware will disable the vf vlan
filter and return a resp_code of HCLGE_VF_VLAN_NO_ENTRY to
driver.

This patch checks the if resp_code from firmware is
HCLGE_VF_VLAN_NO_ENTRY, if yes, then print a warning and
return ok to the caller.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mirror-to-gretap-tests'
David S. Miller [Fri, 1 Jun 2018 18:11:06 +0000 (14:11 -0400)]
Merge branch 'mirror-to-gretap-tests'

Petr Machata says:

====================
Test mirror-to-gretap with bridge in UL

This patchset adds more tests to the mirror-to-gretap suite where bridge
is present in the underlay. Specifically it adds tests for bridge VLAN
handling, FDB, and bridge port STP status.

In patches #1-#3, the codebase is refactored to support the new tests.

In patch #4, an STP test is added to the mirroring library, that will
later be called from bridge tests.

In patches #5-#8, the test for mirror-to-gretap with an 802.1q bridge in
underlay is adapted and more tests are added.

In patch #9, an STP test is added to the test suite for mirror-to-gretap
with an 802.1d bridge in underlay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test
Petr Machata [Thu, 31 May 2018 17:52:47 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test

To test offloading of mirror-to-gretap in mlxsw for cases that a
VLAN-unaware bridge is in underlay packet path, test that the STP status
of bridge egress port is reflected.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests
Petr Machata [Thu, 31 May 2018 17:52:42 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests

Offloading of mirror-to-gretap in mlxsw is tricky especially in cases
when the gretap underlay involves bridges. Add more tests that exercise
the bridge handling code:

- forbidden_egress tests that check vlan removal on bridge port in the
  underlay packet path
- untagged_egress tests that similarly check "egress untagged"
- fdb_roaming tests that check whether learning FDB on a different port
  is reflected
- stp tests for handling port STP status of bridge egress port

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_vlan_bridge_1q: Rename two tests
Petr Machata [Thu, 31 May 2018 17:52:37 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_vlan_bridge_1q: Rename two tests

Rename test_gretap_forbidden() and test_ip6gretap_forbidden() to a more
specific test_gretap_forbidden_cpu() and test_ip6gretap_forbidden_cpu().
This will make it clearer which is which when further down a patch is
introduced that forbids a VLAN on regular bridge port.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_vlan_bridge_1q: Test final config
Petr Machata [Thu, 31 May 2018 17:52:32 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_vlan_bridge_1q: Test final config

After the final change reestablishes the original configuration, make
sure the traffic flows again as it should.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_vlan_bridge_1q: Fix tunnel name
Petr Machata [Thu, 31 May 2018 17:52:26 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix tunnel name

The "ip6gretap" in the test name refers to the tunnel device type that
the test is supposed to be testing. However test_ip6gretap_forbidden()
tests, due to a typo, a gretap tunnel. Fix the typo.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_lib: Add STP test
Petr Machata [Thu, 31 May 2018 17:52:20 +0000 (19:52 +0200)]
selftests: forwarding: mirror_gre_lib: Add STP test

Add a reusable full test that toggles STP state of a given bridge port
and checks that the mirroring reacts appropriately. The test will be
used by bridge tests in follow-up patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_lib: skip_hw the VLAN capture
Petr Machata [Thu, 31 May 2018 17:52:15 +0000 (19:52 +0200)]
selftests: forwarding: mirror_lib: skip_hw the VLAN capture

When the VLAN capture is installed on a front panel device and not a
soft device, the packets are counted twice: once in fast path, and once
after they are trapped to the kernel. Resolve the problem by passing
skip_hw flag to vlan_capture_install().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_lib: Move here do_test_span_vlan_dir_ips()
Petr Machata [Thu, 31 May 2018 17:52:09 +0000 (19:52 +0200)]
selftests: forwarding: mirror_lib: Move here do_test_span_vlan_dir_ips()

Move the function do_test_span_vlan_dir_ips() from mirror_vlan.sh test
to a library file mirror_lib.sh to allow reuse. Fill in other entry
points similar to other testing functions in mirror_lib.sh, they will be
useful in following patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: lib: Move here vlan_capture_{, un}install()
Petr Machata [Thu, 31 May 2018 17:52:02 +0000 (19:52 +0200)]
selftests: forwarding: lib: Move here vlan_capture_{, un}install()

Move vlan_capture_install() and vlan_capture_uninstall() from
mirror_vlan.sh test to lib.sh so that it can be reused in other tests.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: usb: cdc_mbim: add flag FLAG_SEND_ZLP
Daniele Palmas [Thu, 31 May 2018 09:18:29 +0000 (11:18 +0200)]
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP

Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
the modem needs FLAG_SEND_ZLP to properly work, otherwise the cdc
mbim data interface won't be anymore responsive.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tunnel-mtus'
David S. Miller [Fri, 1 Jun 2018 17:56:31 +0000 (13:56 -0400)]
Merge branch 'tunnel-mtus'

Nicolas Dichtel says:

====================
ip[6] tunnels: fix mtu calculations

The first patch restores the possibility to bind an ip4 tunnel to an
interface whith a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.

v2: remove the 0xfff8 in ip_tunnel_newlink()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6_tunnel: remove magic mtu value 0xFFF8
Nicolas Dichtel [Thu, 31 May 2018 08:59:33 +0000 (10:59 +0200)]
ip6_tunnel: remove magic mtu value 0xFFF8

I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip_tunnel: restore binding to ifaces with a large mtu
Nicolas Dichtel [Thu, 31 May 2018 08:59:32 +0000 (10:59 +0200)]
ip_tunnel: restore binding to ifaces with a large mtu

After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):

$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument

dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.

With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.

CC: Petr Machata <petrm@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Fri, 1 Jun 2018 17:25:41 +0000 (13:25 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-05-31

1) Avoid possible overflow of the offset variable
   in  _decode_session6(), this fixes an infinite
   lookp there. From Eric Dumazet.

2) We may use an error pointer in the error path of
   xfrm_bundle_create(). Fix this by returning this
   pointer directly to the caller.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: Split the PPv2 driver to a dedicated directory
Maxime Chevallier [Thu, 31 May 2018 08:07:43 +0000 (10:07 +0200)]
net: mvpp2: Split the PPv2 driver to a dedicated directory

As the mvpp2 driver is growing, move this driver to a dedicated
directory and split it into several files.

Since this driver has a lot of register defines and structure
definitions, it can benefit from having all of this into a dedicated
header file, named mvpp2.h.

A good chunk of the mvpp2 code is dedicated to Header Parser handling, so
we introduce mvpp2_prs.h where all Header Parser definitions are located,
and mvpp2_prs.c containing the related code.

In the same way, mvpp2_cls.h and mvpp2_cls.c are created to contain
Classifier and RSS related code.

The former 'mvpp2.c' file is renamed 'mvpp2_main.c' so that we can keep
the driver binary named 'mvpp2'.

This commit is only about spliting the driver into multiple files and
doesn't introduce any new function, feature or fix besides removing
'static' keywords when needed.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: b53: Add BCM5389 support
Damien Thébault [Thu, 31 May 2018 07:04:01 +0000 (07:04 +0000)]
net: dsa: b53: Add BCM5389 support

This patch adds support for the BCM5389 switch connected through MDIO.

Signed-off-by: Damien Thébault <damien.thebault@vitec.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: split tc_ctl_tfilter into three handlers
Vlad Buslov [Thu, 31 May 2018 06:52:53 +0000 (09:52 +0300)]
net: sched: split tc_ctl_tfilter into three handlers

tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
involves a lot of branching on specific message type because most of the
code is message-specific. This significantly complicates adding new
functionality and doesn't provide much benefit of code reuse.

Split tc_ctl_tfilter to three standalone functions that handle filter new,
delete and get requests.

The only truly protocol independent part of tc_ctl_tfilter is code that
looks up queue, class, and block. Refactor this code to standalone
tcf_block_find function that is used by all three new handlers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agortnetlink: Fix null-ptr-deref in rtnl_newlink
Prashant Bhole [Fri, 1 Jun 2018 08:16:58 +0000 (17:16 +0900)]
rtnetlink: Fix null-ptr-deref in rtnl_newlink

In rtnl_newlink(), NULL check is performed on m_ops however member of
ops is accessed. Fixed by accessing member of m_ops instead of ops.

[  345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
[  345.432629] Read of size 4 at addr 0000000000000088 by task ip/986
[  345.432629]
[  345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
[  345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  345.432629] Call Trace:
[  345.432629]  dump_stack+0xc6/0x150
[  345.432629]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  345.432629]  ? kasan_report+0xb4/0x410
[  345.432629]  kasan_report.cold.4+0x8f/0x91
[  345.432629]  ? rtnl_newlink+0x400/0x1110
[  345.432629]  rtnl_newlink+0x400/0x1110
[...]

Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agokcm: Fix use-after-free caused by clonned sockets
Kirill Tkhai [Fri, 1 Jun 2018 11:30:38 +0000 (14:30 +0300)]
kcm: Fix use-after-free caused by clonned sockets

(resend for properly queueing in patchwork)

kcm_clone() creates kernel socket, which does not take net counter.
Thus, the net may die before the socket is completely destructed,
i.e. kcm_exit_net() is executed before kcm_done().

Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipvs: add ipv6 support to ftp
Julian Anastasov [Fri, 25 May 2018 19:06:25 +0000 (22:06 +0300)]
ipvs: add ipv6 support to ftp

Add support for FTP commands with extended format (RFC 2428):

- FTP EPRT: IPv4 and IPv6, active mode, similar to PORT
- FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV.
EPSV response usually contains only port but we allow real
server to provide different address

We restrict control and data connection to be from same
address family.

Allow the "(" and ")" to be optional in PASV response.

Also, add ipvsh argument to the pkt_in/pkt_out handlers to better
access the payload after transport header.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoipvs: add full ipv6 support to nfct
Julian Anastasov [Fri, 25 May 2018 19:06:24 +0000 (22:06 +0300)]
ipvs: add full ipv6 support to nfct

Prepare NFCT to support IPv6 for FTP:

- Do not restrict the expectation callback to PF_INET

- Split the debug messages, so that the 160-byte limitation
in IP_VS_DBG_BUF is not exceeded when printing many IPv6
addresses. This means no more than 3 addresses in one message,
i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nft_fwd_netdev: allow to forward packets via neighbour layer
Pablo Neira Ayuso [Wed, 30 May 2018 23:58:00 +0000 (01:58 +0200)]
netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer

This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nfnetlink: Remove VLA usage
Kees Cook [Wed, 30 May 2018 19:17:56 +0000 (12:17 -0700)]
netfilter: nfnetlink: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible attrs and adds
sanity-checks at both registration and usage to make sure nothing
gets out of sync.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_flow_table: attach dst to skbs
Jason A. Donenfeld [Wed, 30 May 2018 18:43:15 +0000 (20:43 +0200)]
netfilter: nf_flow_table: attach dst to skbs

Some drivers, such as vxlan and wireguard, use the skb's dst in order to
determine things like PMTU. They therefore loose functionality when flow
offloading is enabled. So, we ensure the skb has it before xmit'ing it
in the offloading path.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: fix chain dependency validation
Pablo Neira Ayuso [Wed, 30 May 2018 18:18:57 +0000 (20:18 +0200)]
netfilter: nf_tables: fix chain dependency validation

The following ruleset:

 add table ip filter
 add chain ip filter input { type filter hook input priority 4; }
 add chain ip filter ap
 add rule ip filter input jump ap
 add rule ip filter ap masquerade

results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.

This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.

This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.

As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.

Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>