platform/kernel/linux-rpi.git
7 years agonet: fix build error in devmap helper calls
John Fastabend [Tue, 18 Jul 2017 04:56:48 +0000 (21:56 -0700)]
net: fix build error in devmap helper calls

Initial patches missed case with CONFIG_BPF_SYSCALL not set.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
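A common shape for this kind of fix is to pair each helper with a static inline stub that is used when the config option is off, so callers build in both configurations. The sketch below is a userspace illustration only: `CONFIG_EXAMPLE_FEATURE`, `example_map_flush()` and `example_caller()` are hypothetical names, not the symbols touched by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig option; define it to build the real helper. */
/* #define CONFIG_EXAMPLE_FEATURE 1 */

struct example_map {
	int pending;
};

#ifdef CONFIG_EXAMPLE_FEATURE
/* Real implementation, compiled only when the feature is enabled. */
int example_map_flush(struct example_map *map)
{
	int flushed = map->pending;

	map->pending = 0;
	return flushed;
}
#else
/* Stub so that callers still compile and link when the feature is disabled. */
static inline int example_map_flush(struct example_map *map)
{
	(void)map;
	return -EOPNOTSUPP;
}
#endif

/* The caller is identical in both configurations. */
int example_caller(struct example_map *map)
{
	assert(map != NULL);
	return example_map_flush(map);
}
```

With the option left undefined, as above, the caller gets `-EOPNOTSUPP` and the map is untouched.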
7 years agomdio_bus: Remove unneeded gpiod NULL check
Fabio Estevam [Mon, 17 Jul 2017 21:09:09 +0000 (18:09 -0300)]
mdio_bus: Remove unneeded gpiod NULL check

The gpiod API checks for NULL descriptors, so there is no need to
duplicate the check in the driver.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
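The idea can be sketched in userspace as follows: when the accessor itself tolerates a NULL descriptor, the caller's guard is redundant. `struct line_desc`, `line_set_value()` and `bus_reset()` are hypothetical names for illustration, not the gpiod or mdio_bus code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an optional resource; not the real gpiod descriptor. */
struct line_desc {
	int value;
};

/* The setter itself accepts NULL and does nothing in that case. */
void line_set_value(struct line_desc *desc, int value)
{
	if (!desc)
		return;
	desc->value = value;
}

/* Caller: no "if (reset_line)" guard is needed around the calls. */
void bus_reset(struct line_desc *reset_line)
{
	line_set_value(reset_line, 1);   /* assert reset */
	line_set_value(reset_line, 0);   /* release reset */
	assert(!reset_line || reset_line->value == 0);
}
```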
7 years agosamples/bpf: add option for native and skb mode for redirect apps
Andy Gospodarek [Mon, 17 Jul 2017 20:14:19 +0000 (16:14 -0400)]
samples/bpf: add option for native and skb mode for redirect apps

When testing with a driver that has both native and generic redirect support:

$ sudo ./samples/bpf/xdp_redirect -N 5 6
input: 5 output: 6
ifindex 6:    4961879 pkt/s
ifindex 6:    6391319 pkt/s
ifindex 6:    6419468 pkt/s

$ sudo ./samples/bpf/xdp_redirect -S 5 6
input: 5 output: 6
ifindex 6:    1845435 pkt/s
ifindex 6:    3882850 pkt/s
ifindex 6:    3893974 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -N 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    2207374 pkt/s
ifindex 6:    6212869 pkt/s
ifindex 6:    6286515 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -S 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    5052528 pkt/s
ifindex 6:    5736631 pkt/s
ifindex 6:    5739962 pkt/s

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ec_bhf: constify pci_device_id.
Arvind Yadav [Mon, 17 Jul 2017 18:12:34 +0000 (23:42 +0530)]
net: ec_bhf: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
   5113     384       0    5497    1579 drivers/net/ethernet/ec_bhf.o

File size After adding 'const':
   text    data     bss     dec     hex filename
   5177     320       0    5497    1579 drivers/net/ethernet/ec_bhf.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
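In the size output above, text grows by 64 bytes (5113 to 5177) and data shrinks by 64 bytes (384 to 320), while the total stays at 5497: the table moved to read-only storage, which `size` counts under text. A minimal sketch of the pattern, using a hypothetical `struct example_device_id` and made-up ID values rather than the real `struct pci_device_id`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device ID table entry. */
struct example_device_id {
	unsigned int vendor;
	unsigned int device;
};

/* const lets the toolchain place the table in read-only data. */
static const struct example_device_id example_ids[] = {
	{ 0x1234, 0x5678 },   /* made-up IDs for illustration */
	{ 0, 0 },             /* terminator */
};

/* Lookup code only ever reads the table, so it takes a const pointer. */
const struct example_device_id *
example_match(const struct example_device_id *ids,
	      unsigned int vendor, unsigned int device)
{
	assert(ids != NULL);
	for (; ids->vendor || ids->device; ids++)
		if (ids->vendor == vendor && ids->device == device)
			return ids;
	return NULL;
}

int example_supported(unsigned int vendor, unsigned int device)
{
	return example_match(example_ids, vendor, device) != NULL;
}
```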
7 years agonet: cadence: macb: constify pci_device_id.
Arvind Yadav [Mon, 17 Jul 2017 18:11:52 +0000 (23:41 +0530)]
net: cadence: macb: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
    791     336       0    1127     467 net/ethernet/cadence/macb_pci.o

File size After adding 'const':
   text    data     bss     dec     hex filename
    855     272       0    1127     467 net/ethernet/cadence/macb_pci.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Revert "net: add function to allocate sk_buff head without data area"
Florian Westphal [Mon, 17 Jul 2017 16:56:54 +0000 (18:56 +0200)]
net: Revert "net: add function to allocate sk_buff head without data area"

It was added for netlink mmap tx; there are no callers in the tree.
The commit also added a check for skb->head != NULL in kfree_skb path,
remove that too -- all skbs ought to have skb->head set.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-ufo-remove'
David S. Miller [Mon, 17 Jul 2017 16:53:05 +0000 (09:53 -0700)]
Merge branch 'net-ufo-remove'

David S. Miller says:

====================
net: Remove UDP Fragmentation Offload support

This is a patch series, based upon some discussions with various
developers, that removes UFO offloading.

Very few devices support this operation, its usefulness is
questionable at best, and it adds a non-trivial amount of
complexity to our data paths.

v2: Delete more code thanks to feedback from Willem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Kill NETIF_F_UFO and SKB_GSO_UDP.
David S. Miller [Mon, 3 Jul 2017 14:31:57 +0000 (07:31 -0700)]
net: Kill NETIF_F_UFO and SKB_GSO_UDP.

No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoinet: Remove software UFO fragmenting code.
David S. Miller [Fri, 7 Jul 2017 09:30:55 +0000 (10:30 +0100)]
inet: Remove software UFO fragmenting code.

Rename udp{4,6}_ufo_fragment() to udp{4,6}_tunnel_segment() and only
handle tunnel segmentation.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Remove all references to SKB_GSO_UDP.
David S. Miller [Mon, 3 Jul 2017 14:29:12 +0000 (07:29 -0700)]
net: Remove all references to SKB_GSO_UDP.

Such packets are no longer possible.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoinet: Stop generating UFO packets.
David S. Miller [Mon, 3 Jul 2017 14:07:18 +0000 (07:07 -0700)]
inet: Stop generating UFO packets.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Remove references to NETIF_F_UFO from ethtool.
David S. Miller [Mon, 3 Jul 2017 14:04:34 +0000 (07:04 -0700)]
net: Remove references to NETIF_F_UFO from ethtool.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Remove references to NETIF_F_UFO in netdev_fix_features().
David S. Miller [Mon, 3 Jul 2017 14:04:22 +0000 (07:04 -0700)]
net: Remove references to NETIF_F_UFO in netdev_fix_features().

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Remove references to NETIF_F_UFO.
David S. Miller [Mon, 3 Jul 2017 13:37:32 +0000 (06:37 -0700)]
virtio_net: Remove references to NETIF_F_UFO.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodummy: Remove references to NETIF_F_UFO.
David S. Miller [Mon, 3 Jul 2017 13:36:07 +0000 (06:36 -0700)]
dummy: Remove references to NETIF_F_UFO.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotun/tap: Remove references to NETIF_F_UFO.
David S. Miller [Mon, 3 Jul 2017 13:35:32 +0000 (06:35 -0700)]
tun/tap: Remove references to NETIF_F_UFO.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomacvlan/macvtap: Remove NETIF_F_UFO advertisement.
David S. Miller [Mon, 3 Jul 2017 13:33:08 +0000 (06:33 -0700)]
macvlan/macvtap: Remove NETIF_F_UFO advertisement.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipvlan: Stop advertising NETIF_F_UFO support.
David S. Miller [Mon, 3 Jul 2017 13:32:14 +0000 (06:32 -0700)]
ipvlan: Stop advertising NETIF_F_UFO support.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomacb: Remove bogus reference to NETIF_F_UFO.
David S. Miller [Mon, 3 Jul 2017 13:31:05 +0000 (06:31 -0700)]
macb: Remove bogus reference to NETIF_F_UFO.

This driver doesn't actually support UFO explicitly, yet
it advertises this in netdev->features.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agos2io: Remove UFO support.
David S. Miller [Mon, 3 Jul 2017 13:28:56 +0000 (06:28 -0700)]
s2io: Remove UFO support.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'xdp-redirect'
David S. Miller [Mon, 17 Jul 2017 16:48:07 +0000 (09:48 -0700)]
Merge branch 'xdp-redirect'

John Fastabend says:

====================
Implement XDP bpf_redirect

This series adds two new XDP helper routines bpf_redirect() and
bpf_redirect_map(). The first variant bpf_redirect() is meant
to be used the same way it is currently being used by the cls_bpf
classifier. An xdp packet will be redirected immediately when this
is called.

The other variant bpf_redirect_map(map, key, flags) uses a new
map type called devmap. A devmap uses integers as keys and
net_devices as values. The user provides key/ifindex pairs to
update the map with new net_devices. This provides two benefits
over the normal variant 'bpf_redirect()'. First the datapath
bpf program is abstracted away from using hard-coded ifindex
values, allowing a single bpf program to be run in many different
environments. Second, and perhaps more important, the map enables
batching packet transmits. The map plus small driver changes
allows for batching all send requests across a NAPI poll loop.
This allows driver writers to optimize the driver xmit path
and only call expensive operations once for a batch of xdp_buffs.

The devmap was designed to support possible future work for
multicast and broadcast as follow-up patches.

To see, in more detail, how to leverage the new helpers and
map from the userspace side please review these two patches,

  xdp: sample program for new bpf_redirect helper
  xdp: bpf redirect with map sample program

Performance numbers provided by Jesper are the following, tested
using the ixgbe driver with CPU E5-1650 v4 @ 3.60GHz:

13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
 7,596,576 pkt/s = xdp_redirect:    XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap

A big thanks to everyone who helped with this series. Jesper
provided fixes, debugging, code review, performance benchmarks!
Daniel provided lots of useful feedback and code review. And last
but not least Andy provided useful feedback related to supporting
additional drivers, generic xdp implementation, testing, etc. Any
other feedback is welcome but I believe at this point these are
ready to be merged!

What's left... get the rest of the driver developers to implement
this in all the drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxdp: bpf redirect with map sample program
John Fastabend [Mon, 17 Jul 2017 16:30:25 +0000 (09:30 -0700)]
xdp: bpf redirect with map sample program

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: add notifier hooks for devmap bpf map
John Fastabend [Mon, 17 Jul 2017 16:30:02 +0000 (09:30 -0700)]
net: add notifier hooks for devmap bpf map

The BPF map devmap holds a refcnt on the net_device structure when
it is in the map. We need to do this to ensure on driver unload we
don't lose a dev reference.

However, it's not very convenient to have to manually unload the map
when destroying a net device so add notifier handlers to do the cleanup
automatically. But this creates a race between update/destroy BPF
syscall and programs and the unregister netdev hook.

Unfortunately, the best I could come up with is either to live with
requiring manual removal of net devices from the map before removing
the net device OR to add a mutex in devmap to ensure the map is not
modified while we are removing a device. The fallout also requires
that BPF programs no longer update/delete the map from the BPF program
side because the mutex may sleep and this can not be done from inside
an rcu critical section.  This is not a real problem though because I
have not come up with any use cases where this is actually useful in
practice. If/when we come up with a compelling user for this we may
need to revisit this.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxdp: Add batching support to redirect map
John Fastabend [Mon, 17 Jul 2017 16:29:40 +0000 (09:29 -0700)]
xdp: Add batching support to redirect map

For performance reasons we want to avoid updating the tail pointer in
the driver tx ring as much as possible. To accomplish this we add
batching support to the redirect path in XDP.

This adds another ndo op "xdp_flush" that is used to inform the driver
that it should bump the tail pointer on the TX ring.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
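The batching idea can be sketched in userspace: frames are queued into descriptors without touching the tail register, and a separate flush step publishes the whole batch with a single tail update. `struct tx_ring`, `ring_xmit()`, `ring_flush()` and `poll_batch()` are hypothetical names for illustration and do not reflect the real ndo signatures or any driver's ring layout.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 64

/* Hypothetical model of a TX ring; not a real driver structure. */
struct tx_ring {
	int desc[RING_SIZE];
	unsigned int next_to_use;   /* software producer index */
	unsigned int hw_tail;       /* last value written to the modelled tail register */
	unsigned int tail_writes;   /* counts the expensive tail updates */
};

/* Queue one frame: fill a descriptor but leave the tail register alone. */
int ring_xmit(struct tx_ring *r, int frame)
{
	assert(r != NULL);
	if (r->next_to_use - r->hw_tail >= RING_SIZE)
		return -1;              /* too many unflushed frames */
	r->desc[r->next_to_use % RING_SIZE] = frame;
	r->next_to_use++;
	return 0;
}

/* Flush: publish everything queued so far with a single tail update. */
void ring_flush(struct tx_ring *r)
{
	if (r->hw_tail == r->next_to_use)
		return;                 /* nothing queued since the last flush */
	r->hw_tail = r->next_to_use;
	r->tail_writes++;
}

/* One poll loop: queue the whole batch, then flush once at the end. */
int poll_batch(struct tx_ring *r, const int *frames, int n)
{
	int sent = 0;

	for (int i = 0; i < n; i++)
		if (ring_xmit(r, frames[i]) == 0)
			sent++;
	ring_flush(r);
	return sent;
}
```

In this model, sending eight frames in one poll costs one tail write instead of eight.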
7 years agobpf: add bpf_redirect_map helper routine
John Fastabend [Mon, 17 Jul 2017 16:29:18 +0000 (09:29 -0700)]
bpf: add bpf_redirect_map helper routine

BPF programs can use the devmap with a bpf_redirect_map() helper
routine to forward packets to a netdevice in the map.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: add devmap, a map for storing net device references
John Fastabend [Mon, 17 Jul 2017 16:28:56 +0000 (09:28 -0700)]
bpf: add devmap, a map for storing net device references

Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to look up a reference to a netdevice.

The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.

Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
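As a rough userspace model of the concept, a devmap can be pictured as an array indexed by an integer key (a virtual port) whose slots hold references to devices. Everything below, including `struct net_dev`, `devmap_update()` and `devmap_lookup()`, is a hypothetical simplification and not the kernel implementation; in particular, real reference counting and RCU are reduced to a plain counter.

```c
#include <assert.h>
#include <stddef.h>

#define DEVMAP_MAX 8

/* Hypothetical model of a network device with a reference count. */
struct net_dev {
	int ifindex;
	int refcnt;
};

struct devmap {
	struct net_dev *slot[DEVMAP_MAX];
};

/* Control path: bind a key to a device (or clear it with NULL),
 * taking a reference on the new device and dropping the old one. */
int devmap_update(struct devmap *m, unsigned int key, struct net_dev *dev)
{
	assert(m != NULL);
	if (key >= DEVMAP_MAX)
		return -1;
	if (dev)
		dev->refcnt++;
	if (m->slot[key])
		m->slot[key]->refcnt--;
	m->slot[key] = dev;
	return 0;
}

/* Datapath: key -> device, or NULL if the key is unset or out of range. */
struct net_dev *devmap_lookup(const struct devmap *m, unsigned int key)
{
	if (key >= DEVMAP_MAX)
		return NULL;
	return m->slot[key];
}
```

The datapath only ever sees keys, which is what decouples a program from hard-coded ifindex values.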
7 years agoxdp: add trace event for xdp redirect
John Fastabend [Mon, 17 Jul 2017 16:28:35 +0000 (09:28 -0700)]
xdp: add trace event for xdp redirect

This adds a trace event for xdp redirect which may help when debugging
XDP programs that use redirect bpf commands.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoixgbe: add initial support for xdp redirect
John Fastabend [Mon, 17 Jul 2017 16:28:12 +0000 (09:28 -0700)]
ixgbe: add initial support for xdp redirect

There are optimizations we can add after the basic feature is
enabled. But, for now keep the patch simple.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: implement XDP_REDIRECT for xdp generic
John Fastabend [Mon, 17 Jul 2017 16:27:50 +0000 (09:27 -0700)]
net: implement XDP_REDIRECT for xdp generic

Add support for redirect to xdp generic creating a fall back for
devices that do not yet have support and allowing test infrastructure
using veth pairs to be built.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxdp: sample program for new bpf_redirect helper
John Fastabend [Mon, 17 Jul 2017 16:27:28 +0000 (09:27 -0700)]
xdp: sample program for new bpf_redirect helper

This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxdp: add bpf_redirect helper function
John Fastabend [Mon, 17 Jul 2017 16:27:07 +0000 (09:27 -0700)]
xdp: add bpf_redirect helper function

This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.

In order to support drivers handling an xdp_buff natively, this patch
uses a new ndo operation, ndo_xdp_xmit(), that pushes an xdp_buff
to the specified device.

If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
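The two failure cases described above can be sketched in userspace. The names here (`do_redirect()`, `struct net_dev`, the `xmit` hook, `enum verdict`) are hypothetical and only model the decision; treating a failed transmit as aborted is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts modelled on the behavior described in the text. */
enum verdict { VERDICT_ABORTED, VERDICT_REDIRECTED };

struct frame {
	int len;
};

struct net_dev {
	int ifindex;
	/* NULL when the driver has no native transmit hook. */
	int (*xmit)(struct net_dev *dev, struct frame *f);
};

static struct net_dev *find_dev(struct net_dev **devs, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i] && devs[i]->ifindex == ifindex)
			return devs[i];
	return NULL;
}

enum verdict do_redirect(struct net_dev **devs, size_t n, int ifindex,
			 struct frame *f)
{
	struct net_dev *dev;

	assert(f != NULL);
	dev = find_dev(devs, n, ifindex);
	if (!dev)                       /* (a) unknown device */
		return VERDICT_ABORTED;
	if (!dev->xmit)                 /* (b) device lacks the operation */
		return VERDICT_ABORTED;
	if (dev->xmit(dev, f) != 0)     /* assumption: transmit failure also aborts */
		return VERDICT_ABORTED;
	return VERDICT_REDIRECTED;
}

/* Example transmit hook that always succeeds. */
int xmit_ok(struct net_dev *dev, struct frame *f)
{
	(void)dev;
	(void)f;
	return 0;
}
```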
7 years agonet: xdp: support xdp generic on virtual devices
John Fastabend [Mon, 17 Jul 2017 16:26:45 +0000 (09:26 -0700)]
net: xdp: support xdp generic on virtual devices

XDP generic allows users to test XDP programs and/or run them with
degraded performance on devices that do not yet support XDP. For
testing I typically test eBPF programs using a set of veth devices.
This allows testing topologies that would otherwise be difficult to
set up, especially in the early stages of development.

This patch adds a xdp generic hook to the netif_rx_internal()
function which is called from dev_forward_skb(). With this addition
attaching XDP programs to veth devices works as expected! Also I
noticed multiple drivers using netif_rx(). These devices will also
benefit and generic XDP will work for them as well.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoixgbe: NULL xdp_tx rings on resource cleanup
John Fastabend [Mon, 17 Jul 2017 16:26:24 +0000 (09:26 -0700)]
ixgbe: NULL xdp_tx rings on resource cleanup

tx_rings and rx_rings are cleaned up on close paths in the ixgbe driver;
however, xdp_rings are not. Set the xdp_rings to NULL here so that
we can use the pointer to indicate if the XDP rings are initialized.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-traps'
David S. Miller [Mon, 17 Jul 2017 16:19:40 +0000 (09:19 -0700)]
Merge branch 'mlxsw-traps'

Jiri Pirko says:

====================
mlxsw: Traps enhancements

Ido says:

The first patch makes sure the driver marks packets that were trapped
in the router and might have already been flooded by the bridge, so that
the bridge driver won't flood them again. This isn't critical at this
point, but will be when Neighbour Discovery traps are introduced as these
are multicast packets that are trapped in the router.

The second and third patches add new traps - for MLD and Router Alert
packets. The last patch takes advantage of that and floods IPv6
unregistered multicast packets only to mrouter ports instead of all ports.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Improve IPv6 unregistered multicast flooding
Arkadi Sharshevsky [Mon, 17 Jul 2017 12:15:32 +0000 (14:15 +0200)]
mlxsw: spectrum: Improve IPv6 unregistered multicast flooding

Up until now IPv6 unregistered multicast traffic would be flooded like
broadcast, even when MLD snooping was enabled on the bridge. This was
intentional as MLD packet traps were missing, preventing the bridge
driver from programming MDB entries to the device.

Previous patch added these traps, so we can now finally flood IPv6
unregistered multicast packets to specific ports via the multicast table
instead of flooding them to all ports via the broadcast table.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add support for IPv6 MLDv1/2 traps
Arkadi Sharshevsky [Mon, 17 Jul 2017 12:15:31 +0000 (14:15 +0200)]
mlxsw: spectrum: Add support for IPv6 MLDv1/2 traps

Add support for IPv6 MLDv1/2 packet trapping.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Trap IPv4 packets with Router Alert option
Ido Schimmel [Mon, 17 Jul 2017 12:15:30 +0000 (14:15 +0200)]
mlxsw: spectrum: Trap IPv4 packets with Router Alert option

In case local sockets have the IP_ROUTER_ALERT socket option set, then
they expect to get packets with the Router Alert option.

Trap such packets, so that the kernel could inspect them and potentially
send them to interested sockets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Mark packets trapped in router
Ido Schimmel [Mon, 17 Jul 2017 12:15:29 +0000 (14:15 +0200)]
mlxsw: spectrum: Mark packets trapped in router

In commit 1c6c6d221e2b ("mlxsw: spectrum: Mirror certain packets to
CPU") we marked packets that were mirrored to the CPU, so that they
won't be flooded again by the bridge driver.

However, certain packets are trapped in the device's router block, after
passing through the bridge block where they were potentially flooded.

Mark all packets coming from L3 traps, so that they won't be potentially
flooded again by the bridge driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-ttl-tos'
David S. Miller [Mon, 17 Jul 2017 16:18:24 +0000 (09:18 -0700)]
Merge branch 'mlxsw-ttl-tos'

Jiri Pirko says:

====================
mlxsw: offloading matches on ip ttl and tos

Or says:

Support offloading matches on ip ttl and tos
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_flower: Add support for ip tos
Or Gerlitz [Mon, 17 Jul 2017 12:07:31 +0000 (14:07 +0200)]
mlxsw: spectrum_flower: Add support for ip tos

Support offloading rules that match on ip tos.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add tos to the ipv4 acl block
Or Gerlitz [Mon, 17 Jul 2017 12:07:30 +0000 (14:07 +0200)]
mlxsw: spectrum: Add tos to the ipv4 acl block

Add ecn and dscp fields to the ipv4 acl block.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: acl: Add ip tos acl element
Or Gerlitz [Mon, 17 Jul 2017 12:07:29 +0000 (14:07 +0200)]
mlxsw: acl: Add ip tos acl element

Define new element for ip tos (ecn, dscp) and place it into scratch area.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_flower: Add support for ip ttl
Or Gerlitz [Mon, 17 Jul 2017 12:07:28 +0000 (14:07 +0200)]
mlxsw: spectrum_flower: Add support for ip ttl

Support offloading rules that match on ip ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add ttl to the ipv4 acl block
Or Gerlitz [Mon, 17 Jul 2017 12:07:27 +0000 (14:07 +0200)]
mlxsw: spectrum: Add ttl to the ipv4 acl block

Add ttl field to the ipv4 acl block.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: acl: Add ip ttl acl element
Or Gerlitz [Mon, 17 Jul 2017 12:07:26 +0000 (14:07 +0200)]
mlxsw: acl: Add ip ttl acl element

Define new element for ip ttl and place it into scratch area.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoinetpeer: remove AVL implementation in favor of RB tree
Eric Dumazet [Mon, 17 Jul 2017 09:56:10 +0000 (02:56 -0700)]
inetpeer: remove AVL implementation in favor of RB tree

As discussed in Faro during Netfilter Workshop 2017, RB trees can be
used with RCU, using a seqlock.

Note that net/rxrpc/conn_service.c is already using this.

This patch converts inetpeer from AVL tree to RB tree, since this allows
removing the private AVL implementation in favor of shared RB code.

$ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
   text    data     bss     dec     hex filename
   3195      40     128    3363     d23 net/ipv4/inetpeer.before
   1562      24       0    1586     632 net/ipv4/inetpeer.after

The same technique can be used to speed up
net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
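The general read-retry pattern behind a sequence counter can be sketched in userspace with C11 atomics. This is an illustration of the technique only: `struct seq_guard` and the four helpers are hypothetical names, not the kernel seqlock API and not the inetpeer code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace sequence counter. */
struct seq_guard {
	atomic_uint seq;
};

/* Writer side: the counter is odd while the tree is being modified. */
void write_begin(struct seq_guard *g)
{
	atomic_fetch_add(&g->seq, 1);
}

void write_end(struct seq_guard *g)
{
	atomic_fetch_add(&g->seq, 1);
}

/* Reader side: snapshot the counter before walking the tree. */
unsigned int read_begin(struct seq_guard *g)
{
	return atomic_load(&g->seq);
}

/* The walk must be retried if a writer was active at the start
 * or completed any step while the reader was walking. */
int read_retry(struct seq_guard *g, unsigned int start)
{
	assert(g != NULL);
	return (start & 1u) || atomic_load(&g->seq) != start;
}
```

A lockless lookup would call `read_begin()`, walk the tree, and repeat the walk while `read_retry()` returns nonzero.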
7 years agonet/unix: drop obsolete fd-recursion limits
David Herrmann [Mon, 17 Jul 2017 09:35:54 +0000 (11:35 +0200)]
net/unix: drop obsolete fd-recursion limits

All unix sockets now account inflight FDs to the respective sender.
This was introduced in:

    commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
    Author: willy tarreau <w@1wt.eu>
    Date:   Sun Jan 10 07:54:56 2016 +0100

        unix: properly account for FDs passed over unix sockets

and further refined in:

    commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
    Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Date:   Wed Feb 3 02:11:03 2016 +0100

        unix: correctly track in-flight fds in sending process user_struct

Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.

Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.

Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A<-B' to
mean `queue-B-on-A', the following sequence circumvents the recursion
level easily:

    A<-B
       B<-C
          C<-D
             ...
               Y<-Z

resulting in:

    A<-B<-C<-...<-Z

With all of this in mind, let's drop the recursion limit. It no longer has
any additional security value. On the contrary, it randomly
confuses message brokers that try to forward file-descriptors, since
any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
maliciously modifies the FD while inflight.

Cc: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
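The queue-order race described above can be modelled in userspace. The sketch assumes a simplified rule in which the outer socket records the inner socket's level plus one at queue time and never refreshes it; `struct sock_model`, `queue_on()`, `real_depth()` and `build_outside_in()` are hypothetical names, not the af_unix code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of sockets that can be queued on one another. */
struct sock_model {
	struct sock_model *queued;   /* socket whose FD sits in our receive queue */
	int recorded_level;          /* depth recorded at queue time only */
};

/* Queue `inner` on `outer` (written outer<-inner in the text). The level
 * is derived from inner's level at this instant and never updated later. */
void queue_on(struct sock_model *outer, struct sock_model *inner)
{
	assert(outer != NULL && inner != NULL);
	outer->queued = inner;
	if (inner->recorded_level + 1 > outer->recorded_level)
		outer->recorded_level = inner->recorded_level + 1;
}

/* Actual nesting depth, found by walking the chain. */
int real_depth(const struct sock_model *s)
{
	int depth = 0;

	for (; s->queued; s = s->queued)
		depth++;
	return depth;
}

/* Build A<-B, then B<-C, and so on, in the order given in the text.
 * Returns the level recorded on the outermost socket. */
int build_outside_in(struct sock_model *socks, size_t n)
{
	for (size_t i = 0; i + 1 < n; i++)
		queue_on(&socks[i], &socks[i + 1]);
	return socks[0].recorded_level;
}
```

With 26 sockets queued in that order, the outermost socket records a level of 1 while the chain is really 25 deep, which is why the check offers no protection in this model.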
7 years agoskbuff: optimize the pull_pages code in __pskb_pull_tail()
linzhang [Mon, 17 Jul 2017 09:25:02 +0000 (17:25 +0800)]
skbuff: optimize the pull_pages code in __pskb_pull_tail()

In the pull_pages code block, if the first frag size > eat,
we can end the loop in advance to avoid extra copy.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
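A userspace model of the optimization: when the first fragment is larger than the number of bytes to consume, it can be trimmed in place and the loop can stop, because every later fragment already sits in its final slot. `struct frag` and `eat_frags()` are hypothetical simplifications, not the `__pskb_pull_tail()` code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a fragment is just an offset and a size. */
struct frag {
	unsigned int off, size;
};

/* Consume `eat` bytes from the front of frags[0..n). Returns the new
 * fragment count; *copies counts how many array slots were rewritten. */
size_t eat_frags(struct frag *frags, size_t n, unsigned int eat, size_t *copies)
{
	size_t k = 0;

	assert(frags != NULL && copies != NULL);
	*copies = 0;
	for (size_t i = 0; i < n; i++) {
		if (frags[i].size <= eat) {
			eat -= frags[i].size;   /* fully consumed, drop it */
			continue;
		}
		if (i == 0) {
			/* First fragment covers `eat`: trim it and stop early. */
			frags[0].off += eat;
			frags[0].size -= eat;
			return n;
		}
		frags[k] = frags[i];            /* shift survivor down */
		(*copies)++;
		if (eat) {
			frags[k].off += eat;
			frags[k].size -= eat;
			eat = 0;
		}
		k++;
	}
	return k;
}
```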
7 years agodt-bindings: net: ravb : Add support for r8a7743 SoC
Biju Das [Mon, 17 Jul 2017 08:33:52 +0000 (09:33 +0100)]
dt-bindings: net: ravb : Add support for r8a7743 SoC

Add a new compatible string for the RZ/G1M (R8A7743) SoC.

Signed-off-by: Biju Das <biju.das@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: axienet: add support for standard phy-mode binding
Alvaro G. M [Mon, 17 Jul 2017 07:12:28 +0000 (09:12 +0200)]
net: axienet: add support for standard phy-mode binding

Keep supporting proprietary "xlnx,phy-type" attribute and add support for
MII connectivity to the PHY.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sctp-typedef-remove-part-2'
David S. Miller [Mon, 17 Jul 2017 03:52:15 +0000 (20:52 -0700)]
Merge branch 'sctp-typedef-remove-part-2'

Xin Long says:

====================
sctp: remove typedefs from structures part 2

As we know, using typedefs is discouraged in the kernel, and checkpatch.pl
warns about them. sctp currently uses them for many structures.

All such typedefs should be removed. This patchset is part 2 and
removes them for another 11 basic structures.

As in part 1, no code logic is changed in these patches; they are
cleanup only.

Note that nothing changed from v1 to v2; v2 exists only because net-next
was closed when v1 was posted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_hmac_algo_param_t
Xin Long [Mon, 17 Jul 2017 03:29:59 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_hmac_algo_param_t

This patch removes the typedef sctp_hmac_algo_param_t and
replaces it with struct sctp_hmac_algo_param wherever the typedef
is used.

It also uses sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
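The sizeof(variable) style mentioned above can be illustrated with a small userspace sketch. `struct example_param` and `example_param_init()` are hypothetical and merely stand in for an SCTP parameter structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter header, standing in for an SCTP parameter struct. */
struct example_param {
	unsigned short type;
	unsigned short length;
};

/* sizeof(*p) follows the variable's type automatically, so renaming or
 * changing the type cannot leave a stale sizeof(struct ...) behind. */
void example_param_init(struct example_param *p, unsigned short type)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->type = type;
	p->length = sizeof(*p);
}
```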
7 years agosctp: remove the typedef sctp_chunks_param_t
Xin Long [Mon, 17 Jul 2017 03:29:58 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_chunks_param_t

This patch removes the typedef sctp_chunks_param_t and
replaces it with struct sctp_chunks_param wherever the typedef
is used.

It also uses sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_random_param_t
Xin Long [Mon, 17 Jul 2017 03:29:57 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_random_param_t

This patch removes the typedef sctp_random_param_t and
replaces it with struct sctp_random_param wherever the typedef
is used.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_supported_ext_param_t
Xin Long [Mon, 17 Jul 2017 03:29:56 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_supported_ext_param_t

This patch removes the typedef sctp_supported_ext_param_t and
replaces it with struct sctp_supported_ext_param wherever the typedef
is used.

It also uses sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_adaptation_ind_param_t
Xin Long [Mon, 17 Jul 2017 03:29:55 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_adaptation_ind_param_t

Remove the typedef sctp_adaptation_ind_param_t and use struct
sctp_adaptation_ind_param in the places that used it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove struct sctp_ecn_capable_param
Xin Long [Mon, 17 Jul 2017 03:29:54 +0000 (11:29 +0800)]
sctp: remove struct sctp_ecn_capable_param

Remove it; there are no places using it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_supported_addrs_param_t
Xin Long [Mon, 17 Jul 2017 03:29:53 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_supported_addrs_param_t

Remove the typedef sctp_supported_addrs_param_t and use struct
sctp_supported_addrs_param in the places that used it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_hostname_param_t
Xin Long [Mon, 17 Jul 2017 03:29:52 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_hostname_param_t

Remove this typedef; there are no places using it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_cookie_preserve_param_t
Xin Long [Mon, 17 Jul 2017 03:29:51 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_cookie_preserve_param_t

Remove the typedef sctp_cookie_preserve_param_t and use struct
sctp_cookie_preserve_param in the places that used it.

Also fix some indents in sctp_sf_do_5_2_6_stale().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_ipv6addr_param_t
Xin Long [Mon, 17 Jul 2017 03:29:50 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_ipv6addr_param_t

Remove the typedef sctp_ipv6addr_param_t and use struct
sctp_ipv6addr_param in the places that used it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove the typedef sctp_ipv4addr_param_t
Xin Long [Mon, 17 Jul 2017 03:29:49 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_ipv4addr_param_t

Remove the typedef sctp_ipv4addr_param_t and use struct
sctp_ipv4addr_param in the places that used it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agords: cancel send/recv work before queuing connection shutdown
Sowmini Varadhan [Sun, 16 Jul 2017 23:43:46 +0000 (16:43 -0700)]
rds: cancel send/recv work before queuing connection shutdown

We could end up executing rds_conn_shutdown before the rds_recv_worker
thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
sock_release and set sock->sk to null, which may interleave in bad
ways with rds_recv_worker, e.g., it could result in:

"BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
    [ffff881769f6fd70] release_sock at ffffffff815f337b
    [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
    [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
    [ffff881769f6fde0] process_one_work at ffffffff810a14c1
    [ffff881769f6fe40] worker_thread at ffffffff810a1940
    [ffff881769f6fec0] kthread at ffffffff810a6b1e

Also, do not enqueue any new shutdown workq items when the connection is
shutting down (this may happen for rds-tcp in softirq mode, if a FIN
or CLOSE is received while the module is in the middle of an unload).

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'atm-constify-atm-pci_device_id'
David S. Miller [Sun, 16 Jul 2017 23:38:03 +0000 (16:38 -0700)]
Merge branch 'atm-constify-atm-pci_device_id'

Arvind Yadav says:

====================
atm: constify atm pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: idt77252: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:40 +0000 (15:02 +0530)]
atm: idt77252: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  27702     468      16   28186    6e1a drivers/atm/idt77252.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  27766     404      16   28186    6e1a drivers/atm/idt77252.o
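
The shift from data to text in the tables above is what one would expect:
a const table can be placed in a read-only section, which size(1) counts
under text. A minimal userspace sketch follows; struct demo_device_id is a
hypothetical stand-in for struct pci_device_id, and demo_match() is an
illustrative helper, not a PCI core function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_device_id (illustration only). */
struct demo_device_id {
	uint32_t vendor;
	uint32_t device;
};

/* 'const' lets the toolchain place the table in read-only data and
 * turns accidental writes into compile-time errors. */
static const struct demo_device_id demo_ids[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0, 0 },	/* terminating all-zero entry */
};

/* The lookup takes a pointer to const, so a const table can be
 * passed without a cast. */
static const struct demo_device_id *
demo_match(const struct demo_device_id *ids, uint32_t vendor, uint32_t device)
{
	for (; ids->vendor || ids->device; ids++)
		if (ids->vendor == vendor && ids->device == device)
			return ids;
	return NULL;
}
```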

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: eni: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:39 +0000 (15:02 +0530)]
atm: eni: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  21565     352      56   21973    55d5 drivers/atm/eni.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  21661     256      56   21973    55d5 drivers/atm/eni.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: firestream: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:38 +0000 (15:02 +0530)]
atm: firestream: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  16884     444      28   17356    43cc drivers/atm/firestream.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  16980     348      28   17356    43cc drivers/atm/firestream.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: zatm: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:37 +0000 (15:02 +0530)]
atm: zatm: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  14350     352      40   14742    3996 drivers/atm/zatm.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  14446     256      40   14742    3996 drivers/atm/zatm.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: lanai: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:36 +0000 (15:02 +0530)]
atm: lanai: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  18074     352       0   18426    47fa drivers/atm/lanai.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  18170     256       0   18426    47fa drivers/atm/lanai.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: solos-pci: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:35 +0000 (15:02 +0530)]
atm: solos-pci: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  16138    4592      24   20754    5112 drivers/atm/solos-pci.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  16218    4528      24   20754    5122 drivers/atm/solos-pci.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: horizon: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:34 +0000 (15:02 +0530)]
atm: horizon: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
   9859     328       6   10193    27d1 drivers/atm/horizon.o

File size After adding 'const':
   text    data     bss     dec     hex filename
   9923     264       6   10193    27d1 drivers/atm/horizon.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: he: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:33 +0000 (15:02 +0530)]
atm: he: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  26514     440      48   27002    697a drivers/atm/he.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  26578     376      48   27002    697a drivers/atm/he.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: nicstar: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:32 +0000 (15:02 +0530)]
atm: nicstar: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  22781     464     128   23373    5b4d drivers/atm/nicstar.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  22845     400     128   23373    5b4d drivers/atm/nicstar.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: fore200e: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:31 +0000 (15:02 +0530)]
atm: fore200e: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  20025     320      16   20361    4f89 drivers/atm/fore200e.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  20089     256      16   20361    4f89 drivers/atm/fore200e.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: ambassador: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:30 +0000 (15:02 +0530)]
atm: ambassador: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  13372     408       4   13784    35d8 drivers/atm/ambassador.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  13484     296       4   13784    35d8 drivers/atm/ambassador.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: iphase: constify pci_device_id.
Arvind Yadav [Sun, 16 Jul 2017 09:32:29 +0000 (15:02 +0530)]
atm: iphase: constify pci_device_id.

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  23536     432     160   24128    5e40 drivers/atm/iphase.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  23632     336     160   24128    5e40 drivers/atm/iphase.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip6: fix PMTU discovery when using /127 subnets
Vincent Bernat [Sat, 15 Jul 2017 17:40:20 +0000 (19:40 +0200)]
ip6: fix PMTU discovery when using /127 subnets

The definition of an "anycast destination address" has been tweaked as a
side-effect of commit 2647a9b07032 ("ipv6: Remove external dependency on
rt6i_gateway and RTF_ANYCAST"). The first address of a point-to-point
/127 subnet is now considered an anycast address. This prevents
ICMPv6 errors from being returned to a sender on such a subnet and
breaks PMTU discovery.
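
To make the address arithmetic concrete, here is a minimal userspace
sketch. It assumes that an address is treated as an anycast destination
when it equals the route's prefix address, and that exempting prefixes of
127 bits or longer is enough to keep both ends of a point-to-point link
unicast. The helper name and signature are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: does 'daddr' equal the prefix address of a route
 * with the given prefix length? Prefixes of 127 bits or longer are
 * exempt, so both addresses of a point-to-point /127 stay unicast. */
static bool demo_anycast_destination(const char *prefix, int plen,
				     const char *daddr)
{
	struct in6_addr p, d;

	/* Treat unparsable input as "not anycast". */
	if (inet_pton(AF_INET6, prefix, &p) != 1 ||
	    inet_pton(AF_INET6, daddr, &d) != 1)
		return false;

	return plen < 127 && memcmp(&p, &d, sizeof(p)) == 0;
}
```

With this check, 2001:db8:1::12 on a /127 is not classified as anycast,
while the all-zero host address of a /64 still is.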

This can be reproduced with:

    ip link add name out6 type veth peer name in6
    ip link add name out7 type veth peer name in7
    ip link set mtu 1400 dev out7
    ip link set mtu 1400 dev in7
    ip netns add next-hop
    ip netns add next-next-hop
    ip link set netns next-hop dev in6
    ip link set netns next-hop dev out7
    ip link set netns next-next-hop dev in7
    ip link set up dev out6
    ip addr add 2001:db8:1::12/127 dev out6
    ip netns exec next-hop ip link set up dev in6
    ip netns exec next-hop ip link set up dev out7
    ip netns exec next-hop ip addr add 2001:db8:1::13/127 dev in6
    ip netns exec next-hop ip addr add 2001:db8:1::14/127 dev out7
    ip netns exec next-hop ip route add default via 2001:db8:1::15
    ip netns exec next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
    ip netns exec next-next-hop ip link set up dev in7
    ip netns exec next-next-hop ip addr add 2001:db8:1::15/127 dev in7
    ip netns exec next-next-hop ip addr add 2001:db8:1::50/128 dev in7
    ip netns exec next-next-hop ip route add default via 2001:db8:1::14
    ip netns exec next-next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
    ip route add 2001:db8:1::48/123 via 2001:db8:1::13
    sleep 4
    ping -M do -s 1452 -c 3 2001:db8:1::50 || true
    ip route get 2001:db8:1::50

Before the patch, we get:

    2001:db8:1::50 from :: via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 1024  pref medium

After the patch, we get:

    2001:db8:1::50 via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 0
        cache  expires 578sec mtu 1400 pref medium

Fixes: 2647a9b07032 ("ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST")
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: hv: ignore a NIC if it has been configured
sixiao@microsoft.com [Fri, 14 Jul 2017 17:47:20 +0000 (10:47 -0700)]
tools: hv: ignore a NIC if it has been configured

Let bondvf.sh ignore this NIC if it has been configured, to prevent
user configuration from being overwritten unexpectedly.

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosunvnet: add support for IPv6 checksum offloads
Shannon Nelson [Thu, 6 Jul 2017 23:57:10 +0000 (16:57 -0700)]
sunvnet: add support for IPv6 checksum offloads

The original code didn't handle non-IPv4 packets very well, so the
offload advertising had to be scaled back down to just IP.  Here we
add the bits needed to support TCP and UDP packets over IPv6 and
turn the offload advertising back on.

Orabug: 26289579

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 13 Jul 2017 02:30:57 +0000 (19:30 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari
   and Arnd Bergmann.

2) Fix race in statistics gathering in bnxt_en driver, from Michael
   Chan.

3) Can't use a mutex in RCU reader protected section on tap driver, from
   Cong WANG.

4) Fix mdb leak in bridging code, from Eduardo Valentin.

5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.

6) Buffer overflow in brcmfmac driver, from Arend van Spriel.

7) ioremap_nocache() return value needs to be checked in smsc911x
   driver, from Alexey Khoroshilov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
  net: stmmac: revert "support future possible different internal phy mode"
  sfc: don't read beyond unicast address list
  datagram: fix kernel-doc comments
  socket: add documentation for missing elements
  smsc911x: Add check for ioremap_nocache() return code
  brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
  net: hns: Bugfix for Tx timeout handling in hns driver
  net: ipmr: ipmr_get_table() returns NULL
  nfp: freeing the wrong variable
  mlxsw: spectrum_switchdev: Check status of memory allocation
  mlxsw: spectrum_switchdev: Remove unused variable
  mlxsw: spectrum_router: Fix use-after-free in route replace
  mlxsw: spectrum_router: Add missing rollback
  samples/bpf: fix a build issue
  bridge: mdb: fix leak on complete_info ptr on fail path
  tap: convert a mutex to a spinlock
  cxgb4: fix BUG() on interrupt deallocating path of ULD
  qed: Fix printk option passed when printing ipv6 addresses
  net: Fix minor code bug in timestamping.txt
  net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer
  ...

7 years agodisable new gcc-7.1.1 warnings for now
Linus Torvalds [Thu, 13 Jul 2017 02:25:47 +0000 (19:25 -0700)]
disable new gcc-7.1.1 warnings for now

I made the mistake of upgrading my desktop to the new Fedora 26 that
comes with gcc-7.1.1.

There's nothing wrong per se that I've noticed, but I now have 1500
lines of warnings, mostly from the new format-truncation warning
triggering all over the tree.

We use 'snprintf()' and friends in a lot of places, and often know that
the numbers are fairly small (ie a controller index or similar), but gcc
doesn't know that, and sees an 'int', and thinks that it could be some
huge number.  And then complains when our buffers are not able to fit
the name for the ten millionth controller.
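
A small sketch of the pattern being described. The buffer size and the
name format are made up for illustration; the point is that the compiler
only sees an int and has to assume the worst case, while the return value
of snprintf() tells the caller whether truncation really happened.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical example: format a controller name into a small buffer.
 * Returns true if the whole name fit, false if it was truncated. */
static bool demo_ctrl_name(char *buf, size_t len, int index)
{
	/* snprintf() returns the length the full string would have had,
	 * so a result >= len means the output was cut short. */
	int n = snprintf(buf, len, "ctrl%d", index);

	return n >= 0 && (size_t)n < len;
}
```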

These warnings aren't necessarily bad per se, and we probably want to
look through them subsystem by subsystem, but at least during the merge
window they just mean that I can't even see if somebody is introducing
any *real* problems when I pull.

So warnings disabled for now.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Thu, 13 Jul 2017 00:22:01 +0000 (17:22 -0700)]
Merge tag 'modules-for-v4.13' of git://git./linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "Summary of modules changes for the 4.13 merge window:

   - Minor code cleanups

   - Avoid accessing mod struct prior to checking module struct version,
     from Kees

   - Fix racy atomic inc/dec logic of kmod_concurrent_max in kmod, from
     Luis"

* tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: make the modinfo name const
  kmod: reduce atomic operations on kmod_concurrent and simplify
  module: use list_for_each_entry_rcu() on find_module_all()
  kernel/module.c: suppress warning about unused nowarn variable
  module: Add module name to modinfo
  module: Pass struct load_info into symbol checks

7 years agonet: stmmac: revert "support future possible different internal phy mode"
LABBE Corentin [Wed, 12 Jul 2017 07:32:34 +0000 (09:32 +0200)]
net: stmmac: revert "support future possible different internal phy mode"

Since internal phy-mode is reserved for non-xMII protocol we cannot use
it with dwmac-sun8i.
Furthermore, all DT patches which came with this patch were cleaned, so
the current state is broken.
This reverts commit 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode")

Fixes: 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc: don't read beyond unicast address list
Bert Kenward [Wed, 12 Jul 2017 16:19:41 +0000 (17:19 +0100)]
sfc: don't read beyond unicast address list

If we have more than 32 unicast MAC addresses assigned to an interface
we will read beyond the end of the address table in the driver when
adding filters. The next 256 entries store multicast addresses, so we
will end up attempting to insert duplicate filters, which is mostly
harmless. If we add more than 288 unicast addresses we will then read
past the multicast address table, which is likely to be more exciting.
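
The layout described above can be sketched as follows. The table sizes
come from the description (32 unicast entries followed by 256 multicast
entries); the struct, field, and function names are hypothetical and do
not mirror the driver, and clamping is only one possible way to guard
the copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UC_MAX 32	/* unicast slots, per the description */
#define DEMO_MC_MAX 256	/* multicast slots that follow them */

struct demo_filter_table {
	uint8_t uc[DEMO_UC_MAX][6];
	uint8_t mc[DEMO_MC_MAX][6];
	size_t uc_count;
};

/* Copy at most DEMO_UC_MAX addresses and report overflow to the caller,
 * so later code never indexes past the end of 'uc'. */
static int demo_set_unicast(struct demo_filter_table *t,
			    const uint8_t (*addrs)[6], size_t n)
{
	size_t keep = n < DEMO_UC_MAX ? n : DEMO_UC_MAX;

	memcpy(t->uc, addrs, keep * sizeof(t->uc[0]));
	t->uc_count = keep;
	return n > DEMO_UC_MAX ? -1 : 0;
}
```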

Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-doc-fixes'
David S. Miller [Wed, 12 Jul 2017 21:39:44 +0000 (14:39 -0700)]
Merge branch 'net-doc-fixes'

Stephen Hemminger says:

====================
minor net kernel-doc fixes

Fix a couple of small errors in kernel-doc for networking
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodatagram: fix kernel-doc comments
stephen hemminger [Wed, 12 Jul 2017 16:29:07 +0000 (09:29 -0700)]
datagram: fix kernel-doc comments

An underscore in the kernel-doc comment section has special meaning
and misuse generates errors.

./net/core/datagram.c:207: ERROR: Unknown target name: "msg".
./net/core/datagram.c:379: ERROR: Unknown target name: "msg".
./net/core/datagram.c:816: ERROR: Unknown target name: "t".

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosocket: add documentation for missing elements
stephen hemminger [Wed, 12 Jul 2017 16:29:06 +0000 (09:29 -0700)]
socket: add documentation for missing elements

Add kernel-doc for the undocumented elements in struct sock.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosmsc911x: Add check for ioremap_nocache() return code
Alexey Khoroshilov [Wed, 12 Jul 2017 20:58:56 +0000 (23:58 +0300)]
smsc911x: Add check for ioremap_nocache() return code

There is no check for the return code of ioremap_nocache()
in smsc911x_drv_probe(). The patch adds one.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Wed, 12 Jul 2017 17:04:56 +0000 (10:04 -0700)]
Merge branch 'i2c/for-4.13' of git://git./linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
 "This pull request contains:

   - i2c core reorganization. One source file became too monolithic. It
     is now split up, yet we still have the same named object as the
     final output. This should ease maintenance.

   - new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX

   - designware driver gained slave mode support

   - xgene-slimpro driver gained ACPI support

   - bigger overhaul for pca-platform driver

   - the algo-bit module now supports messages with enforced STOP

   - slightly bigger than usual set of driver updates and improvements

  and with much appreciated quality assurance from Andy Shevchenko"

* 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
  i2c: Provide a stub for i2c_detect_slave_mode()
  i2c: designware: Let slave adapter support be optional
  i2c: designware: Make HW init functions static
  i2c: designware: fix spelling mistakes
  i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
  i2c: pca-platform: correctly set algo_data.reset_chip
  i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
  i2c: designware: enable SLAVE in platform module
  i2c: designware: add SLAVE mode functions
  i2c: zx2967: drop COMPILE_TEST dependency
  i2c: zx2967: always use the same device when printing errors
  i2c: pca-platform: use dev_warn/dev_info instead of printk
  i2c: pca-platform: use device managed allocations
  i2c: pca-platform: add devicetree awareness
  i2c: pca-platform: switch to struct gpio_desc
  dt-bindings: add bindings for i2c-pca-platform
  i2c: cadance: fix ctrl/addr reg write order
  i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
  dt: bindings: add documentation for zx2967 family i2c controller
  i2c: algo-bit: add support for I2C_M_STOP
  ...

7 years agoMerge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 12 Jul 2017 17:00:04 +0000 (10:00 -0700)]
Merge tag 'iommu-updates-v4.13' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This update comes with:

   - Support for lockless operation in the ARM io-pgtable code.

     This is an important step to solve the scalability problems in the
     common dma-iommu code for ARM

   - Some Errata workarounds for ARM SMMU implementations

   - Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.

     The code suffered from very high flush rates, with the new
     implementation the flush rate is down to ~1% of what it was before

   - Support for amd_iommu=off when booting with kexec.

     The problem here was that the IOMMU driver bailed out early without
     disabling the iommu hardware, if it was enabled in the old kernel

   - The Rockchip IOMMU driver is now available on ARM64

   - Align the return value of the iommu_ops->device_group call-backs to
     not miss error values

   - Preempt-disable optimizations in the Intel VT-d and common IOVA
     code to help Linux-RT

   - Various other small cleanups and fixes"

* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
  iommu/vt-d: Constify intel_dma_ops
  iommu: Warn once when device_group callback returns NULL
  iommu/omap: Return ERR_PTR in device_group call-back
  iommu: Return ERR_PTR() values from device_group call-backs
  iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
  iommu/vt-d: Don't disable preemption while accessing deferred_flush()
  iommu/iova: Don't disable preempt around this_cpu_ptr()
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
  iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
  ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
  iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
  iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
  iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
  iommu/arm-smmu-v3: Remove io-pgtable spinlock
  iommu/arm-smmu: Remove io-pgtable spinlock
  iommu/io-pgtable-arm-v7s: Support lockless operation
  iommu/io-pgtable-arm: Support lockless operation
  iommu/io-pgtable: Introduce explicit coherency
  iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
  ...

7 years agoMerge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Wed, 12 Jul 2017 16:28:55 +0000 (09:28 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "This work from Amir introduces the inodes index feature, which
  provides:

   - hardlinks are not broken on copy up

   - infrastructure for overlayfs NFS export

  This also fixes constant st_ino for samefs case for lower hardlinks"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
  ovl: mark parent impure and restore timestamp on ovl_link_up()
  ovl: document copying layers restrictions with inodes index
  ovl: cleanup orphan index entries
  ovl: persistent overlay inode nlink for indexed inodes
  ovl: implement index dir copy up
  ovl: move copy up lock out
  ovl: rearrange copy up
  ovl: add flag for upper in ovl_entry
  ovl: use struct copy_up_ctx as function argument
  ovl: base tmpfile in workdir too
  ovl: factor out ovl_copy_up_inode() helper
  ovl: extract helper to get temp file in copy up
  ovl: defer upper dir lock to tempfile link
  ovl: hash overlay non-dir inodes by copy up origin
  ovl: cleanup bad and stale index entries on mount
  ovl: lookup index entry for copy up origin
  ovl: verify index dir matches upper dir
  ovl: verify upper root dir matches lower root dir
  ovl: introduce the inodes index dir feature
  ovl: generalize ovl_create_workdir()
  ...

7 years agofix a braino in compat_sys_getrlimit()
Al Viro [Wed, 12 Jul 2017 03:59:45 +0000 (04:59 +0100)]
fix a braino in compat_sys_getrlimit()

Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Fixes: commit d9e968cb9f84 "getrlimit()/setrlimit(): move compat to native"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobrcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
Arend van Spriel [Fri, 7 Jul 2017 20:09:06 +0000 (21:09 +0100)]
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

The lower level nl80211 code in cfg80211 ensures that "len" is between
25 and NL80211_ATTR_FRAME (2304).  We subtract DOT11_MGMT_HDR_LEN (24) from
"len" so that's a max of 2280.  However, the action_frame->data[] buffer is
only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this memcpy() can
overflow.

memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN],
       le16_to_cpu(action_frame->len));
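
One way to guard such a copy is to reject oversized frames before the
memcpy(). The sketch below is a userspace illustration that uses the
sizes quoted above; the macro, struct, and function names are
hypothetical, and the actual driver change may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MGMT_HDR_LEN	24	/* DOT11_MGMT_HDR_LEN in the text */
#define DEMO_ACTION_FRAME_SIZE	1800	/* BRCMF_FIL_ACTION_FRAME_SIZE in the text */

struct demo_action_frame {
	uint16_t len;
	uint8_t data[DEMO_ACTION_FRAME_SIZE];
};

/* Copy the payload that follows the management header, refusing frames
 * whose payload would not fit in 'data'. Returns 0 on success, -1 if
 * the frame is too short or too long. */
static int demo_fill_action_frame(struct demo_action_frame *af,
				  const uint8_t *buf, size_t len)
{
	if (len <= DEMO_MGMT_HDR_LEN ||
	    len - DEMO_MGMT_HDR_LEN > DEMO_ACTION_FRAME_SIZE)
		return -1;

	af->len = (uint16_t)(len - DEMO_MGMT_HDR_LEN);
	memcpy(af->data, &buf[DEMO_MGMT_HDR_LEN], af->len);
	return 0;
}
```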

Cc: stable@vger.kernel.org # 3.9.x
Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.")
Reported-by: "freenerguo(郭大兴)" <freenerguo@tencent.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns: Bugfix for Tx timeout handling in hns driver
Lin Yun Sheng [Wed, 12 Jul 2017 11:09:59 +0000 (19:09 +0800)]
net: hns: Bugfix for Tx timeout handling in hns driver

When hns port type is not debug mode, netif_tx_disable is called
when there is a tx timeout, which requires system reboot to return
to normal state. This patch fixes this problem by resetting the net
dev.

Fixes: b5996f11ea54 ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipmr: ipmr_get_table() returns NULL
Dan Carpenter [Wed, 12 Jul 2017 07:56:47 +0000 (10:56 +0300)]
net: ipmr: ipmr_get_table() returns NULL

The ipmr_get_table() function doesn't return error pointers; it returns
NULL on error.
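
The distinction matters because the two conventions need different
checks. Below is a userspace sketch: demo_is_err() imitates the kernel's
IS_ERR() for illustration, and demo_get_table() is a hypothetical lookup
that, like the function described above, returns NULL on failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitation of IS_ERR(): true only for pointers that encode
 * a negative errno in the top 4095 values of the address space. */
static bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static int demo_table;

/* Hypothetical lookup that signals failure with NULL, not an error
 * pointer. */
static int *demo_get_table(int id)
{
	return id == 0 ? &demo_table : NULL;
}

/* An IS_ERR()-style test never fires for NULL, so the caller has to
 * compare against NULL instead. */
static int demo_lookup(int id)
{
	int *t = demo_get_table(id);

	if (!t)
		return -1;
	return 0;
}
```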

Fixes: 4f75ba6982bc ("net: ipmr: Add ipmr_rtm_getroute")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago nfp: freeing the wrong variable
Dan Carpenter [Wed, 12 Jul 2017 07:42:06 +0000 (10:42 +0300)]
nfp: freeing the wrong variable

We accidentally free a NULL pointer and leak the pointer we want to
free.  Also you can tell from the label name what was intended.  :)

Fixes: abfcdc1de9bf ("nfp: add a stats handler for flower offloads")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago Merge branch 'mlxsw-spectrum-Various-fixes'
David S. Miller [Wed, 12 Jul 2017 15:15:52 +0000 (08:15 -0700)]
Merge branch 'mlxsw-spectrum-Various-fixes'

Jiri Pirko says:

====================
mlxsw: spectrum: Various fixes

First patch adds a missing rollback in error path. Second patch prevents
a use-after-free during IPv4 route replace. Last two patches fix warnings
from static checkers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago mlxsw: spectrum_switchdev: Check status of memory allocation
Ido Schimmel [Wed, 12 Jul 2017 07:12:55 +0000 (09:12 +0200)]
mlxsw: spectrum_switchdev: Check status of memory allocation

We can't rely on kzalloc() always succeeding, so check its return value.

Suppresses the following smatch error:

mlxsw_sp_switchdev_event() error: potential null dereference
'switchdev_work->fdb_info.addr'.  (kzalloc returns null)
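
The pattern smatch asks for can be sketched in user space as follows. The struct, the function names, and the injectable allocator are assumptions made so that the failure path can be exercised. The kernel code uses kzalloc() and its own error handling, which is not shown in this log.

```c
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the work item that carries the FDB address. */
struct fdb_work_sketch {
	unsigned char *addr;
};

/* Allocator is passed in so a failing one can be substituted in tests. */
typedef void *(*alloc_fn)(size_t);

/*
 * Allocate storage for the MAC address and copy it in.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int fdb_work_init(struct fdb_work_sketch *work,
			 const unsigned char *mac, alloc_fn alloc)
{
	work->addr = alloc(ETH_ALEN);
	if (!work->addr)
		return -1;	/* bail out instead of dereferencing NULL */

	memcpy(work->addr, mac, ETH_ALEN);
	return 0;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```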

Fixes: af061378924f ("mlxsw: spectrum_switchdev: Add support for learning FDB through notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago mlxsw: spectrum_switchdev: Remove unused variable
Ido Schimmel [Wed, 12 Jul 2017 07:12:54 +0000 (09:12 +0200)]
mlxsw: spectrum_switchdev: Remove unused variable

Commit 10e23eb299fa ("mlxsw: spectrum: Remove support for bypass bridge
port attributes/vlan set") removed the statements that used 'bridge_vlan',
but didn't remove the variable itself, resulting in the following warning
with W=1:

warning: variable ‘bridge_vlan’ set but not used
[-Wunused-but-set-variable]

Remove the variable and suppress the warning.

Fixes: 10e23eb299fa ("mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago mlxsw: spectrum_router: Fix use-after-free in route replace
Ido Schimmel [Wed, 12 Jul 2017 07:12:53 +0000 (09:12 +0200)]
mlxsw: spectrum_router: Fix use-after-free in route replace

While working on IPv6 route replace I realized we can have a
use-after-free in IPv4 in case the replaced route is offloaded and the
only one using its FIB info.

The problem is that fib_table_insert() drops the reference on the FIB
info of the replaced route, which is eventually freed via call_rcu().
Since the driver doesn't hold a reference on this FIB info, it can cause
a use-after-free when it tries to clear the RTNH_F_OFFLOAD flag stored
in fi->fib_flags.

After running the following commands in a loop for long enough with a
KASAN-enabled kernel, I finally got the trace below.

$ ip route add 192.168.50.0/24 via 192.168.200.1 dev enp3s0np3
$ ip route replace 192.168.50.0/24 dev enp3s0np5
$ ip route del 192.168.50.0/24 dev enp3s0np5

BUG: KASAN: use-after-free in mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
Read of size 4 at addr ffff8803717d9820 by task kworker/u4:2/55
[...]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_router_neighs_update_work+0x1cd0/0x1ce0 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
__asan_load4+0x61/0x80
mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
mlxsw_sp_fib_entry_offload_refresh+0xb6/0x370 [mlxsw_spectrum]
mlxsw_sp_router_fib_event_work+0xd1c/0x2780 [mlxsw_spectrum]
[...]
Freed by task 5131:
 save_stack_trace+0x16/0x20
 save_stack+0x46/0xd0
 kasan_slab_free+0x70/0xc0
 kfree+0x144/0x570
 free_fib_info_rcu+0x2e7/0x410
 rcu_process_callbacks+0x4f8/0xe30
 __do_softirq+0x1d3/0x9e2

Fix this by taking a reference on the FIB info when creating the nexthop
group it represents and dropping it when the group is destroyed.
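
The hold/put pairing described above can be illustrated with a single-threaded user-space sketch. All names here are hypothetical, the counter is a plain int rather than the kernel's atomic reference count, and RCU-deferred freeing is replaced by an immediate free(). The point is only that the group pins the object it later writes to.

```c
#include <stdlib.h>

#define OFFLOAD_FLAG_SKETCH 0x1u	/* hypothetical flag bit */

/* Hypothetical stand-in for the reference-counted FIB info. */
struct fib_info_sketch {
	int refcnt;
	unsigned int flags;
};

/* Hypothetical stand-in for the driver's nexthop group. */
struct nh_group_sketch {
	struct fib_info_sketch *fi;
};

static int freed_count;	/* counts frees, for illustration only */

static void fi_hold(struct fib_info_sketch *fi)
{
	fi->refcnt++;
}

static void fi_put(struct fib_info_sketch *fi)
{
	if (--fi->refcnt == 0) {
		freed_count++;
		free(fi);
	}
}

/* Take a reference when the group is created. */
static struct nh_group_sketch *group_create(struct fib_info_sketch *fi)
{
	struct nh_group_sketch *grp = malloc(sizeof(*grp));

	if (!grp)
		return NULL;
	fi_hold(fi);
	grp->fi = fi;
	return grp;
}

/* Clear the flag while the reference is still held, then drop it. */
static void group_destroy(struct nh_group_sketch *grp)
{
	grp->fi->flags &= ~OFFLOAD_FLAG_SKETCH;
	fi_put(grp->fi);
	free(grp);
}
```

In this sketch, the caller dropping its own reference after group_create() no longer frees the object. It is released only once group_destroy() has finished touching the flags.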

Fixes: 599cf8f95f22 ("mlxsw: spectrum_router: Add support for route replace")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>