review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge branch 'ICMP-flow-improvements'

Matteo Croce says:

====================
ICMP flow improvements

This series improves the flow inspector handling of ICMP packets:
The first two patches just add some comments in the code which would have saved
me a few minutes of time, and refactor a piece of code.
The third one adds to the flow inspector the capability to extract the
Identifier field, if present, so echo requests and replies are classified
as part of the same flow.
The fourth patch uses the function introduced earlier to the bonding driver,
so echo replies can be balanced across bonding slaves.

v1 -> v2:
- remove unused struct members
- add an helper to check for the Id field
- use a local flow_dissector_key in the bonding to avoid
changing behaviour of the flow dissector
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: balance ICMP echoes in layer3+4 mode

The bonding uses the L4 ports to balance flows between slaves. As the ICMP
protocol has no ports, those packets are sent all to the same device:

    # tcpdump -qltnni veth0 ip |sed 's/^/0: /' &
    # tcpdump -qltnni veth1 ip |sed 's/^/1: /' &
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 315, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 315, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 316, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 316, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 317, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 317, seq 1, length 64

But some ICMP packets have an Identifier field which is
used to match packets within sessions, let's use this value in the hash
function to balance these packets between bond slaves:

    # ping -qc1 192.168.0.2
    0: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 303, seq 1, length 64
    0: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 303, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 304, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 304, seq 1, length 64

Aso, let's use a flow_dissector_key which defines FLOW_DISSECTOR_KEY_ICMP,
so we can balance pings encapsulated in a tunnel when using mode encap3+4:

    # ping -q 192.168.1.2 -c1
    0: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 585, seq 1, length 64
    0: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 585, seq 1, length 64
    # ping -q 192.168.1.2 -c1
    1: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 586, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 586, seq 1, length 64

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: extract more ICMP information

The ICMP flow dissector currently parses only the Type and Code fields.
Some ICMP packets (echo, timestamp) have a 16 bit Identifier field which
is used to correlate packets.
Add such field in flow_dissector_key_icmp and replace skb_flow_get_be16()
with a more complex function which populate this field.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: skip the ICMP dissector for non ICMP packets

FLOW_DISSECTOR_KEY_ICMP is checked for every packet, not only ICMP ones.
Even if the test overhead is probably negligible, move the
ICMP dissector code under the big 'switch(ip_proto)' so it gets called
only for ICMP packets.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: add meaningful comments

Documents two piece of code which can't be understood at a glance.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tc-testing: fixed two failing pedit tests

Two pedit tests were failing due to incorrect operation
value in matchPattern, should be 'add' not 'val', so fix it.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add smart nagle feature

We introduce a feature that works like a combination of TCP_NAGLE and
TCP_CORK, but without some of the weaknesses of those. In particular,
we will not observe long delivery delays because of delayed acks, since
the algorithm itself decides if and when acks are to be sent from the
receiving peer.

- The nagle property as such is determined by manipulating a new
  'maxnagle' field in struct tipc_sock. If certain conditions are met,
  'maxnagle' will define max size of the messages which can be bundled.
  If it is set to zero no messages are ever bundled, implying that the
  nagle property is disabled.
- A socket with the nagle property enabled enters nagle mode when more
  than 4 messages have been sent out without receiving any data message
  from the peer.
- A socket leaves nagle mode whenever it receives a data message from
  the peer.

In nagle mode, messages smaller than 'maxnagle' are accumulated in the
socket write queue. The last buffer in the queue is marked with a new
'ack_required' bit, which forces the receiving peer to send a CONN_ACK
message back to the sender upon reception.

The accumulated contents of the write queue is transmitted when one of
the following events or conditions occur.

- A CONN_ACK message is received from the peer.
- A data message is received from the peer.
- A SOCK_WAKEUP pseudo message is received from the link level.
- The write queue contains more than 64 1k blocks of data.
- The connection is being shut down.
- There is no CONN_ACK message to expect. I.e., there is currently
  no outstanding message where the 'ack_required' bit was set. As a
  consequence, the first message added after we enter nagle mode
  is always sent directly with this bit set.

This new feature gives a 50-100% improvement of throughput for small
(i.e., less than MTU size) messages, while it might add up to one RTT
to latency time when the socket is in nagle mode.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Update-firmware-version'

Ido Schimmel says:

====================
mlxsw: Update firmware version

This patch set updates the firmware version for Spectrum-1 and enforces
a firmware version for Spectrum-2.

The version adds support for querying port module type. It will be used
by a followup patch set from Jiri to make port split code more generic.

Patch #1 increases the size of an existing register in order to be
compatible with the new firmware version. In the future the firmware
will assign default values to fields not specified by the driver.

Patch #2 temporarily increases the PCI reset timeout for SN3800 systems.
Note that in normal cases the driver will need to wait no longer than 5
seconds for the device to become ready following reset command.

Patch #3 bumps the firmware version for Spectrum-1.

Patch #4 enforces a minimum firmware version for Spectrum-2.

v2:
* Added patch #2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Enforce firmware version for Spectrum-2

In a similar fashion to Spectrum-1, enforce a specific firmware version
for Spectrum-2 so that the driver and firmware are always in sync with
regards to new features.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Bump firmware version to 13.2000.2308

The version adds support for querying port module type. It will be used
by a followup patch set from Jiri to make port split code more generic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Increase PCI reset timeout for SN3800 systems

SN3800 Spectrum-2 based systems have gearboxes that need to be
initialized by the firmware during its initialization flow. In certain
cases, the firmware might need to flash these gearboxes, which is
currently a time-consuming process.

In newer firmware versions, the firmware will not signal to the driver
that it is ready until the gearboxes are flashed. Increase the PCI reset
timeout for these situations. In normal cases, the driver will need to
wait no longer than 5 seconds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Increase size of MPAR register

In new firmware versions this register is extended with a sampling rate
for Spectrum-2 and future ASICs.

Increase the size of the register to ensure the field is initialized to
0 which means every packet is mirrored.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfc-pn533-add-uart-phy-driver'

Lars Poeschel says:

====================
nfc: pn533: add uart phy driver

The purpose of this patch series is to add a uart phy driver to the
pn533 nfc driver.
It first changes the dt strings and docs. The dt compatible strings
need to change, because I would add "pn532-uart" to the already
existing "pn533-i2c" one. These two are now unified into just
"pn532". Then the neccessary changes to the pn533 core driver are
made. Then the uart phy is added.
As the pn532 chip supports a autopoll, I wanted to use this instead
of the software poll loop in the pn533 core driver. It is added and
activated by the last to patches.
The way to add the autopoll later in seperate patches is chosen, to
show, that the uart phy driver can also work with the software poll
loop, if someone needs that for some reason.
In v11 of this patchseries I address a byte ordering issue reported
by kbuild test robot in patch 5/7.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn532_uart: Make use of pn532 autopoll

This switches the pn532 UART phy driver from manually polling to the new
autopoll mechanism.

Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn533: Add autopoll capability

pn532 devices support an autopoll command, that lets the chip
automatically poll for selected nfc technologies instead of manually
looping through every single nfc technology the user is interested in.
This is faster and less cpu and bus intensive than manually polling.
This adds this autopoll capability to the pn533 driver.

Cc: Johan Hovold <johan@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn533: add UART phy driver

This adds the UART phy interface for the pn533 driver.
The pn533 driver can be used through UART interface this way.
It is implemented as a serdev device.

Cc: Johan Hovold <johan@kernel.org>
Cc: Claudiu Beznea <Claudiu.Beznea@microchip.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn533: Split pn533 init & nfc_register

There is a problem in the initialisation and setup of the pn533: It
registers with nfc too early. It could happen, that it finished
registering with nfc and someone starts using it. But setup of the pn533
is not yet finished. Bad or at least unintended things could happen.
So I split out nfc registering (and unregistering) to seperate functions
that have to be called late in probe then.
i2c requires a bit more love: i2c requests an irq in it's probe
function. 'Commit 32ecc75ded72 ("NFC: pn533: change order operations in
dev registation")' shows, this can not happen too early. An irq can be
served before structs are fully initialized. The way chosen to prevent
this is to request the irq after nfc_alloc_device initialized the
structs, but before nfc_register_device. So there is now this
pn532_i2c_nfc_alloc function.

Cc: Johan Hovold <johan@kernel.org>
Cc: Claudiu Beznea <Claudiu.Beznea@microchip.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn533: Add dev_up/dev_down hooks to phy_ops

This adds hooks for dev_up and dev_down to the phy_ops. They are
optional.
The idea is to inform the phy driver when the nfc chip is really going
to be used. When it is not used, the phy driver can suspend it's
interface to the nfc chip to save some power. The nfc chip is considered
not in use before dev_up and after dev_down.

Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn532: Add uart phy docs and rename it

This adds documentation about the uart phy to the pn532 binding doc. As
the filename "pn533-i2c.txt" is not appropriate any more, rename it to
the more general "pn532.txt".
This also documents the deprecation of the compatible strings ending
with "...-i2c".

Cc: Johan Hovold <johan@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: pn533: i2c: "pn532" as dt compatible string

It is favourable to have one unified compatible string for devices that
have multiple interfaces. So this adds simply "pn532" as the devicetree
binding compatible string and makes a note that the old ones are
deprecated.

Cc: Johan Hovold <johan@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-fdbs-bitops'

Nikolay Aleksandrov says:

====================
net: bridge: convert fdbs to use bitops

We'd like to have a well-defined behaviour when changing fdb flags. The
problem is that we've added new fields which are changed from all
contexts without any locking. We are aware of the bit test/change races
and these are fine (we can remove them later), but it is considered
undefined behaviour to change bitfields from multiple threads and also
on some architectures that can result in unexpected results,
specifically when all fields between the changed ones are also
bitfields. The conversion to bitops shows the intent clearly and
makes them use functions with well-defined behaviour in such cases.
There is no overhead for the fast-path, the bit changing functions are
used only in special cases when learning and in the slow path.
In addition this conversion allows us to simplify fdb flag handling and
avoid bugs for future bits (e.g. a forgetting to clear the new bit when
allocating a new fdb). All bridge selftests passed, also tried all of the
converted bits manually in a VM.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: set flags directly in fdb_create

No need to have separate arguments for each flag, just set the flags to
whatever was passed to fdb_create() before the fdb is published.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert offloaded to use bitops

Convert the offloaded field to a flag and use bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert added_by_external_learn to use bitops

Convert the added_by_external_learn field to a flag and use bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert added_by_user to bitops

Straight-forward convert of the added_by_user field to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert is_sticky to bitops

Straight-forward convert of the is_sticky field to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert is_static to bitops

Convert the is_static to bitops, make use of the combined
test_and_set/clear_bit to simplify expressions in fdb_add_entry.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: fdb: convert is_local to bitops

The patch adds a new fdb flags field in the hole between the two cache
lines and uses it to convert is_local to bitops.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove unneeded include for smc.h

The only smc-related reference in net/sock.h is struct smc_hashinfo.
But just its address is refered to. Thus there is no need for the
include of net/smc.h. Remove it.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: improve throughput between nodes in netns

Currently, TIPC transports intra-node user data messages directly
socket to socket, hence shortcutting all the lower layers of the
communication stack. This gives TIPC very good intra node performance,
both regarding throughput and latency.

We now introduce a similar mechanism for TIPC data traffic across
network namespaces located in the same kernel. On the send path, the
call chain is as always accompanied by the sending node's network name
space pointer. However, once we have reliably established that the
receiving node is represented by a namespace on the same host, we just
replace the namespace pointer with the receiving node/namespace's
ditto, and follow the regular socket receive patch though the receiving
node. This technique gives us a throughput similar to the node internal
throughput, several times larger than if we let the traffic go though
the full network stacks. As a comparison, max throughput for 64k
messages is four times larger than TCP throughput for the same type of
traffic.

To meet any security concerns, the following should be noted.

- All nodes joining a cluster are supposed to have been be certified
and authenticated by mechanisms outside TIPC. This is no different for
nodes/namespaces on the same host; they have to auto discover each
other using the attached interfaces, and establish links which are
supervised via the regular link monitoring mechanism. Hence, a kernel
local node has no other way to join a cluster than any other node, and
have to obey to policies set in the IP or device layers of the stack.

- Only when a sender has established with 100% certainty that the peer
node is located in a kernel local namespace does it choose to let user
data messages, and only those, take the crossover path to the receiving
node/namespace.

- If the receiving node/namespace is removed, its namespace pointer
is invalidated at all peer nodes, and their neighbor link monitoring
will eventually note that this node is gone.

- To ensure the "100% certainty" criteria, and prevent any possible
spoofing, received discovery messages must contain a proof that the
sender knows a common secret. We use the hash mix of the sending
node/namespace for this purpose, since it can be accessed directly by
all other namespaces in the kernel. Upon reception of a discovery
message, the receiver checks this proof against all the local
namespaces'hash_mix:es. If it finds a match, that, along with a
matching node id and cluster id, this is deemed sufficient proof that
the peer node in question is in a local namespace, and a wormhole can
be opened.

- We should also consider that TIPC is intended to be a cluster local
IPC mechanism (just like e.g. UNIX sockets) rather than a network
protocol, and hence we think it can justified to allow it to shortcut the
lower protocol layers.

Regarding traceability, we should notice that since commit 6c9081a3915d
("tipc: add loopback device tracking") it is possible to follow the node
internal packet flow by just activating tcpdump on the loopback
interface. This will be true even for this mechanism; by activating
tcpdump on the involved nodes' loopback interfaces their inter-name
space messaging can easily be tracked.

v2:
- update 'net' pointer when node left/rejoined
v3:
- grab read/write lock when using node ref obj
v4:
- clone traffics between netns to loopback

Suggested-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: do not call sublist_rcv on empty list

syzbot triggered struct net NULL deref in NF_HOOK_LIST:
RIP: 0010:NF_HOOK_LIST include/linux/netfilter.h:331 [inline]
RIP: 0010:ip6_sublist_rcv+0x5c9/0x930 net/ipv6/ip6_input.c:292
ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:328
__netif_receive_skb_list_ptype net/core/dev.c:5274 [inline]

Reason:
void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
                   struct net_device *orig_dev)
[..]
        list_for_each_entry_safe(skb, next, head, list) {
/* iterates list */
                skb = ip6_rcv_core(skb, dev, net);
/* ip6_rcv_core drops skb -> NULL is returned */
                if (skb == NULL)
                        continue;
[..]
}
/* sublist is empty -> curr_net is NULL */
        ip6_sublist_rcv(&sublist, curr_dev, curr_net);

Before the recent change NF_HOOK_LIST did a list iteration before
struct net deref, i.e. it was a no-op in the empty list case.

List iteration now happens after *net deref, causing crash.

Follow the same pattern as the ip(v6)_list_rcv loop and add a list_empty
test for the final sublist dispatch too.

Cc: Edward Cree <ecree@solarflare.com>
Reported-by: syzbot+c54f457cad330e57e967@syzkaller.appspotmail.com
Fixes: ca58fbe06c54 ("netfilter: add and use nf_hook_slow_list()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

broadcom: bnxt: Fix use true/false for bool

Use true/false for bool type in bnxt_timer function.

Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cavium: thunder: Fix use true/false for bool type

use true/false on bool type variables for assignment.

Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-marvell-fix-and-extend-downshift-support'

Heiner Kallweit says:

====================
net: phy: marvell: fix and extend downshift support

This series includes two fixes and two extensions for downshift support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: add PHY tunable support for more PHY versions

More PHY versions are compatible with the existing downshift
implementation, so let's add downshift support for them.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: add downshift support for M88E1111

This patch adds downshift support for M88E1111. This PHY version uses
another register for downshift configuration, reading downshift status
is possible via the same register as for other PHY versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: fix downshift function naming

I got access to the M88E1111 datasheet, and this PHY version uses
another register for downshift configuration. Therefore change prefix
to m88e1011, aligned with constants like MII_M1011_PHY_SCR.

Fixes: a3bdfce7bf9c ("net: phy: marvell: support downshift as PHY tunable")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: fix typo in constant MII_M1011_PHY_SRC_DOWNSHIFT_MASK

Fix typo and use PHY_SCR for PHY-specific Control Register.

Fixes: a3bdfce7bf9c ("net: phy: marvell: support downshift as PHY tunable")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: net-sysfs: describe missing statistics

Sync the ABI description with the interface statistics that are currently
available through sysfs.

CC: Jarod Wilson <jarod@redhat.com>
CC: Jonathan Corbet <corbet@lwn.net>
CC: linux-doc@vger.kernel.org
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: Remove set but not used variable 'sg_desc'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/pensando/ionic/ionic_txrx.c: In function 'ionic_rx_empty':
drivers/net/ethernet/pensando/ionic/ionic_txrx.c:405:28: warning:
variable 'sg_desc' set but not used [-Wunused-but-set-variable]

It is never used, so can be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: support Wake on LAN

This adds WoL support on TI DP83867 for magic, magic secure, unicast and
broadcast.

Signed-off-by: Thomas Haemmerle <thomas.haemmerle@wolfvision.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix error handling in aq_ptp_poll

Fix currenty ignored returned error by properly checking *err* after
calling aq_nic->aq_hw_ops->hw_ring_hwts_rx_fill().

Addresses-Coverity-ID: 1487357 ("Unused value")
Fixes: 04a1839950d9 ("net: aquantia: implement data PTP datapath")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: remove unused including <linux/version.h>

Remove including <linux/version.h> that don't need it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: LAN9303: select REGMAP when LAN9303 enable

When NET_DSA_SMSC_LAN9303=y and NET_DSA_SMSC_LAN9303_MDIO=y,
below errors can be seen:
drivers/net/dsa/lan9303_mdio.c:87:23: error: REGMAP_ENDIAN_LITTLE
undeclared here (not in a function)
.reg_format_endian = REGMAP_ENDIAN_LITTLE,
drivers/net/dsa/lan9303_mdio.c:93:3: error: const struct regmap_config
has no member named reg_read
.reg_read = lan9303_mdio_read,

It should select REGMAP in config NET_DSA_SMSC_LAN9303.

Fixes: dc7005831523 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: make two symbols be static

When using ARCH=mips CROSS_COMPILE=mips-linux-gnu-
to build drivers/net/ethernet/aquantia/atlantic/aq_ptp.o
and drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o,
below errors can be seen:
drivers/net/ethernet/aquantia/atlantic/aq_ptp.c:1378:6:
warning: symbol 'aq_ptp_poll_sync_work_cb' was not declared.
Should it be static?

drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c:1155:5:
warning: symbol 'hw_atl_b0_ts_to_sys_clock' was not declared.
Should it be static?

This patch to make aq_ptp_poll_sync_work_cb and hw_atl_b0_ts_to_sys_clock
be static to fix these warnings.

Fixes: 9c477032f7d0 ("net: aquantia: add support for PIN funcs")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-10-25

This series contains updates to i40e only.  Several are fixes that could
go to 'net', but were intended for 'net-next'.

Sylwia changes how the driver function to read the NVM module data, so
that it is able to read the LLDP agent configuration to allow for
persistent LLDP.

Jaroslaw resolves an issue where the incorrect FEC settings were being
displayed in ethtool, by setting the proper FEC bits.

Piotr moves the hardware flags detection into a separate function, so
that the specific flags can be set based on the MAC and NVM.  Also
extends the PHY access function to include a command flag to let the
firmware know it should not change the page while accessing a OSFP module.
Updates the driver to display the driver and firmware version when in
recovery mode.

Aleksandr refactored the VF MAC filters accounting since an untrusted
VF was able to delete but not add a MAC filter, so refactor the code to
have more consistency and improved logging.

Nicholas updates the driver to use a default interval of 50 usecs,
instead of the current 100 usecs which was causing some regression
performance issues.

Damian resolved LED blinking issues for X710T*L devices by adding
specific flows for these devices in the LED operations.

Navid Emamdoost found where allocated memory is not being properly freed
upon a failure in setting up MAC VLANs, so added the missing kfree().

v2: Dropped patches 2 & 6 from the original series while we wait for the
    author to respond to community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove redundant assignment to pointer bdp

The pointer bdp is being assigned with a value that is never
read, so the assignment is redundant and hence can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: return directly from dsa_to_port

Return directly from within the loop as soon as the port is found,
otherwise we won't return NULL if the end of the list is reached.

Fixes: b96ddf254b09 ("net: dsa: use ports list in dsa_to_port")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix unintention integer overflow on left shift

Shifting the integer value 1 is evaluated using 32-bit
arithmetic and then used in an expression that expects a 64-bit
value, so there is potentially an integer overflow. Fix this
by using the BIT_ULL macro to perform the shift and avoid the
overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 04a1839950d9 ("net: aquantia: implement data PTP datapath")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix spelling mistake: tx_queus -> tx_queues

There is a spelling mistake in a netdev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: remove unneeded semicolon

remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: remove unneeded semicolon

remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: remove unneeded semicolon

remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_buffers: remove unneeded semicolon

Remove excess semicolon after closing parenthesis.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-Allow-config-of-ATU-hash-algorithm'

Andrew Lunn says:

====================
mv88e6xxx: Allow config of ATU hash algorithm

v2:

Pass a pointer for where the hash should be stored, return a plain
errno, or 0.

Document the parameter.

v3:

Document type of parameter, and valid range
Add break statements to default clause of switch
Directly use ctx->val.vu8

v4:

Consistently use devlink, not a mix of devlink and dl.
Fix allocation of devlink priv
Remove upper case from parameter name
Make mask 16 bit wide.

v5:
Back to using the parameter name ATU_hash

v6:
Rebase net-next/master
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.

Some of the marvell switches have bits controlling the hash algorithm
the ATU uses for MAC addresses. In some industrial settings, where all
the devices are from the same manufacture, and hence use the same OUI,
the default hashing algorithm is not optimal. Allow the other
algorithms to be selected via devlink.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add support for devlink device parameters

Add plumbing to allow DSA drivers to register parameters with devlink.

To keep with the abstraction, the DSA drivers pass the ds structure to
these helpers, and the DSA core then translates that to the devlink
structure associated to the device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use helper rtl_hw_aspm_clkreq_enable also in rtl_hw_start_8168g_2

One place in the driver was left where the open-coded functionality
hasn't been replaced with helper rtl_hw_aspm_clkreq_enable yet.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-b53-Add-support-for-MDB'

Florian Fainelli says:

====================
net: dsa: b53: Add support for MDB

This patch series adds support for programming multicast database
entries on b53 and bcm_sf2. This is extracted from a previously
submitted series that added managed mode support, but these patches are
usable in isolation. The larger series still needs to be reworked.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: Wire up MDB operations

Leverage the recently add b53_mdb_{add,del,prepare} functions since they
work as-is for bcm_sf2.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Add support for MDB

In preparation for supporting IGMP snooping with or without the use of
a bridge, add support within b53_common.c to program the ARL entries for
multicast operations. The key difference is that a multicast ARL entry
is comprised of a bitmask of enabled ports, instead of a port number.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-improvements-in-rx-path'

Matteo Croce says:

====================
mvpp2 improvements in rx path

Refactor some code in the RX path to allow prefetching some data from the
packet header. The first patch is only a refactor, the second one
reduces the data synced, while the third one adds the prefetch.

The packet rate improvement with the second patch is very small (1606 => 1620 kpps),
while the prefetch bumps it up by 14%: 1620 => 1853 kpps.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: prefetch frame header

When receiving traffic, eth_type_trans() is high up on the perf top list,
because it's the first function which access the packet data.

Move the DMA unmap a bit higher, and put a prefetch just after it, so we
have more time to load the data into the cache.

The packet rate increase is about 14% with a tc drop test: 1620 => 1853 kpps

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: sync only the received frame

In the RX path we always sync against the maximum frame size for that pool.
Do the DMA sync and the unmap separately, so we can only sync by the
size of the received frame.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: refactor frame drop routine

Move some code down to remove a backward goto.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: hfcsusb: Spelling and grammar fixes

Fix misspellings of "endpoints", "configuration", and "device's".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Spelling s/enpoint/endpoint/

Fix misspelling of "endpoint".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix various misspellings of "connect"

Fix misspellings of "disconnect", "disconnecting", "connections", and
"disconnected".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix misspellings of "configure" and "configuration"

Fix various misspellings of "configuration" and "configure".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Initialize the switch with correct number of ports

Since commit 0394a63acfe2 ("net: dsa: enable and disable all ports")
the dsa core disables all unused ports of a switch. In this case
disabling ports with numbers higher than QCA8K_NUM_PORTS causes that
some switch registers are overwritten with incorrect content.

To fix this, initialize the dsa_switch->num_ports with correct number
of ports.

Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: fix dereference on ds->dev before null check error

Currently ds->dev is dereferenced on the assignments of pdata and
np before ds->dev is null checked, hence there is a potential null
pointer dereference on ds->dev. Fix this by assigning pdata and
np after the ds->dev null pointer sanity check.

Addresses-Coverity: ("Dereference before null check")
Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-10-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 52 non-merge commits during the last 11 day(s) which contain
a total of 65 files changed, 2604 insertions(+), 1100 deletions(-).

The main changes are:

1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF
    assembly code. The work here teaches BPF verifier to recognize
    kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints
    such that verifier allows direct use of bpf_skb_event_output() helper
    used in tc BPF et al (w/o probing memory access) that dumps skb data
    into perf ring buffer. Also add direct loads to probe memory in order
    to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov.

2) Big batch of changes to improve libbpf and BPF kselftests. Besides
    others: generalization of libbpf's CO-RE relocation support to now
    also include field existence relocations, revamp the BPF kselftest
    Makefile to add test runner concept allowing to exercise various
    ways to build BPF programs, and teach bpf_object__open() and friends
    to automatically derive BPF program type/expected attach type from
    section names to ease their use, from Andrii Nakryiko.

3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu.

4) Allow to read BTF as raw data from bpftool. Most notable use case
    is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa.

5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which
    manages to improve "rx_drop" performance by ~4%., from Björn Töpel.

6) Fix to restore the flow dissector after reattach BPF test and also
    fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki.

7) Improve verifier's BTF ctx access for use outside of raw_tp, from
    Martin KaFai Lau.

8) Improve documentation for AF_XDP with new sections and to reflect
    latest features, from Magnus Karlsson.

9) Add back 'version' section parsing to libbpf for old kernels, from
    John Fastabend.

10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(),
    from KP Singh.

11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order
    to improve insn coverage for built BPF progs, from Yonghong Song.

12) Misc minor cleanups and fixes, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tc-testing: list required kernel options for act_ct action

Updated config with required kernel options for conntrac TC action,
so that tdc can run the tests.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
more specifically:

* Updates for ipset:

1) Coding style fix for ipset comment extension, from Jeremy Sowden.

2) De-inline many functions in ipset, from Jeremy Sowden.

3) Move ipset function definition from header to source file.

4) Move ip_set_put_flags() to source, export it as a symbol, remove
   inline.

5) Move range_to_mask() to the source file where this is used.

6) Move ip_set_get_ip_port() to the source file where this is used.

* IPVS selftests and netns improvements:

7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan.

8) Three patches to add selftest script for ipvs, also from
   Haishuang Yan.

* Conntrack updates and new nf_hook_slow_list() function:

9) Document ct ecache extension, from Florian Westphal.

10) Skip ct extensions from ctnetlink dump, from Florian.

11) Free ct extension immediately, from Florian.

12) Skip access to ecache extension from nf_ct_deliver_cached_events()
    this is not correct as reported by Syzbot.

13) Add and use nf_hook_slow_list(), from Florian.

* Flowtable infrastructure updates:

14) Move priority to nf_flowtable definition.

15) Dynamic allocation of per-device hooks in flowtables.

16) Allow to include netdevice only once in flowtable definitions.

17) Rise maximum number of devices per flowtable.

* Netfilter hardware offload infrastructure updates:

18) Add nft_flow_block_chain() helper function.

19) Pass callback list to nft_setup_cb_call().

20) Add nft_flow_cls_offload_setup() helper function.

21) Remove rules for the unregistered device via netdevice event.

22) Support for multiple devices in a basechain definition at the
    ingress hook.

22) Add nft_chain_offload_cmd() helper function.

23) Add nft_flow_block_offload_init() helper function.

24) Rewind in case of failing to bind multiple devices to hook.

25) Typo in IPv6 tproxy module description, from Norman Rasmussen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-aquantia-ptp-followup-fixes'

Igor Russkikh says:

====================
net: aquantia: ptp followup fixes

Here are two sparse warnings, third patch is a fix for
scaled_ppm_to_ppb missing. Eventually I reworked this
to exclude ptp module from build. Please consider it instead
of this patch: https://patchwork.ozlabs.org/patch/1184171/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: disable ptp object build if no config

We do disable aq_ptp module build using inline
stubs when CONFIG_PTP_1588_CLOCK is not declared.

This reduces module size and removes unnecessary code.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix warnings on endianness

fixes to remove sparse warnings:
sparse: sparse: cast to restricted __be64

Fixes: 04a1839950d9 ("net: aquantia: implement data PTP datapath")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix var initialization warning

found by sparse, simply useless local initialization with zero.

Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_tables_offload: unbind if multi-device binding fails

nft_flow_block_chain() needs to unbind in case of error when performing
the multi-device binding.

Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Reported-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_offload: add nft_flow_block_offload_init()

This patch adds the nft_flow_block_offload_init() helper function to
initialize the flow_block_offload object.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_offload: add nft_chain_offload_cmd()

This patch adds the nft_chain_offload_cmd() helper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ecache: don't look for ecache extension on dying/unconfirmed conntracks

syzbot reported following splat:
BUG: KASAN: use-after-free in __nf_ct_ext_exist
include/net/netfilter/nf_conntrack_extend.h:53 [inline]
BUG: KASAN: use-after-free in nf_ct_deliver_cached_events+0x5c3/0x6d0
net/netfilter/nf_conntrack_ecache.c:205
nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:65 [inline]
nf_confirm+0x3d8/0x4d0 net/netfilter/nf_conntrack_proto.c:154
[..]

While there is no reproducer yet, the syzbot report contains one
interesting bit of information:

Freed by task 27585:
[..]
kfree+0x10a/0x2c0 mm/slab.c:3757
nf_ct_ext_destroy+0x2ab/0x2e0 net/netfilter/nf_conntrack_extend.c:38
nf_conntrack_free+0x8f/0xe0 net/netfilter/nf_conntrack_core.c:1418
destroy_conntrack+0x1a2/0x270 net/netfilter/nf_conntrack_core.c:626
nf_conntrack_put include/linux/netfilter/nf_conntrack_common.h:31 [inline]
nf_ct_resolve_clash net/netfilter/nf_conntrack_core.c:915 [inline]
^^^^^^^^^^^^^^^^^^^
__nf_conntrack_confirm+0x21ca/0x2830 net/netfilter/nf_conntrack_core.c:1038
nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:63 [inline]
nf_confirm+0x3e7/0x4d0 net/netfilter/nf_conntrack_proto.c:154

This is whats happening:

1. a conntrack entry is about to be confirmed (added to hash table).
2. a clash with existing entry is detected.
3. nf_ct_resolve_clash() puts skb->nfct (the "losing" entry).
4. this entry now has a refcount of 0 and is freed to SLAB_TYPESAFE_BY_RCU
   kmem cache.

skb->nfct has been replaced by the one found in the hash.
Problem is that nf_conntrack_confirm() uses the old ct:

static inline int nf_conntrack_confirm(struct sk_buff *skb)
{
struct nf_conn *ct = (struct nf_conn *)skb_nfct(skb);
int ret = NF_ACCEPT;

  if (ct) {
    if (!nf_ct_is_confirmed(ct))
       ret = __nf_conntrack_confirm(skb);
    if (likely(ret == NF_ACCEPT))
nf_ct_deliver_cached_events(ct); /* This ct has refcount 0! */
  }
  return ret;
}

As of "netfilter: conntrack: free extension area immediately", we can't
access conntrack extensions in this case.

To fix this, make sure we check the dying bit presence before attempting
to get the eache extension.

Reported-by: syzbot+c7aabc9fe93e7f3637ba@syzkaller.appspotmail.com
Fixes: 2ad9d7747c10d1 ("netfilter: conntrack: free extension area immediately")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'ionic-updates'

Shannon Nelson says:

====================
ionic updates

These are a few of the driver updates we've been working on internally.
These clean up a few mismatched struct comments, add checking for dead
firmware, fix an initialization bug, and change the Rx buffer management.

These are based on net-next v5.4-rc3-709-g985fd98ab5cc.

v2: clear napi->skb in the error case in ionic_rx_frags()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: update driver version

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: implement support for rx sgl

Even out Rx performance across MTU sizes by changing from full
skb allocations to page-based frag allocations. The device
supports a form of scatter-gather in the Rx path, so we can
set up a number of pages for each descriptor, all of which are
easier to alloc and pass around than the standard kzalloc'd
buffer. An skb is wrapped around the pages while processing
the received packets, and pages are recycled as needed, or
left alone if they weren't used in the Rx.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: add a watchdog timer to monitor heartbeat

Add a watchdog to periodically monitor the NIC heartbeat.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: add heartbeat check

Most of our firmware has a heartbeat feature that the driver
can watch for to see if the FW is still alive and likely to
answer a dev_cmd or AdminQ request.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: reverse an interrupt coalesce calculation

Fix the initial interrupt coalesce usec-to-hw setting
to actually be usec-to-hw.

Fixes: 780eded34ccc ("ionic: report users coalesce request")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: fix up struct name comments

Fix up struct names in the ionic_if.h comments

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve rtl8169_rx_fill

We have only one user of the error path, so we can inline it.
In addition the call to rtl8169_make_unusable_by_asic() can be removed
because rtl8169_alloc_rx_data() didn't call rtl8169_mark_to_asic() yet
for the respective index if returning NULL.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: align fix_features callback with vendor driver

This patch aligns the fix_features callback with the vendor driver and
also disables IPv6 HW checksumming and TSO if jumbo packets are used
on RTL8101/RTL8168/RTL8125.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-10-23

Here's the main bluetooth-next pull request for the 5.5 kernel:

- Multiple fixes to hci_qca driver
- Fix for HCI_USER_CHANNEL initialization
- btwlink: drop superseded driver
- Add support for Intel FW download error recovery
- Various other smaller fixes & improvements

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add TCP_INFO status for failed client TFO

The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
or not data-in-SYN was ack'd on both the client and server side. We'd like
to gather more information on the client-side in the failure case in order
to indicate the reason for the failure. This can be useful for not only
debugging TFO, but also for creating TFO socket policies. For example, if
a middle box removes the TFO option or drops a data-in-SYN, we can
can detect this case, and turn off TFO for these connections saving the
extra retransmits.

The newly added tcpi_fastopen_client_fail status is 2 bits and has the
following 4 states:

1) TFO_STATUS_UNSPEC

Catch-all state which includes when TFO is disabled via black hole
detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.

2) TFO_COOKIE_UNAVAILABLE

If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
is available in the cache.

3) TFO_DATA_NOT_ACKED

Data was sent with SYN, we received a SYN/ACK but it did not cover the data
portion. Cookie is not accepted by server because the cookie may be invalid
or the server may be overloaded.

4) TFO_SYN_RETRANSMITTED

Data was sent with SYN, we received a SYN/ACK which did not cover the data
after at least 1 additional SYN was sent (without data). It may be the case
that a middle-box is dropping data-in-SYN packets. Thus, it would be more
efficient to not use TFO on this connection to avoid extra retransmits
during connection establishment.

These new fields do not cover all the cases where TFO may fail, but other
failures, such as SYN/ACK + data being dropped, will result in the
connection not becoming established. And a connection blackhole after
session establishment shows up as a stalled connection.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-dp83867-enable-robust-auto-mdix'

Grygorii Strashko says:

====================
net: phy: dp83867: enable robust auto-mdix

Patch 1 - improves link detection when dp83867 PHY is configured in manual mode
by enabling CFG3[9] Robust Auto-MDIX option.

Patch 2 - is minor optimization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: move dt parsing to probe

Move DT parsing code to probe dp83867_probe() as it's one time operation.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: enable robust auto-mdix

The link detection timeouts can be observed (or link might not be detected
at all) when dp83867 PHY is configured in manual mode (speed/duplex).

CFG3[9] Robust Auto-MDIX option allows to significantly improve link detection
in case dp83867 is configured in manual mode and reduce link detection
time.
As per DM: "If link partners are configured to operational modes that are
not supported by normal Auto MDI/MDIX mode (like Auto-Neg versus Force
100Base-TX or Force 100Base-TX versus Force 100Base-TX), this Robust Auto
MDI/MDIX mode allows MDI/MDIX resolution and prevents deadlock."

Hence, enable this option by default as there are no known reasons
not to do so.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not build with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packages and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the package has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped, an error (-ENOBUFS) is propagated to user space. So that the user
space can slow down the package generation.

On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more then one lost CAN frame per
thousand frames.

As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".

[1] https://github.com/systemd/systemd/issues/9194

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: check the pointer rtl_fw->fw before using it

Fix the pointer rtl_fw->fw would be used before checking in
rtl8152_apply_firmware() that causes the following kernel oops.

Unable to handle kernel NULL pointer dereference at virtual address 00000002
pgd = (ptrval)
[00000002] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 131 Comm: kworker/0:2 Not tainted
5.4.0-rc1-00539-g9370f2d05a2a #6788
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events_long rtl_hw_phy_work_func_t
PC is at rtl8152_apply_firmware+0x14/0x464
LR is at r8153_hw_phy_cfg+0x24/0x17c
pc : [<c064f4e4>]    lr : [<c064fa18>]    psr: a0000013
sp : e75c9e60  ip : 60000013  fp : c11b7614
r10: e883b91c  r9 : 00000000  r8 : fffffffe
r7 : e883b640  r6 : fffffffe  r5 : fffffffe  r4 : e883b640
r3 : 736cfe7c  r2 : 736cfe7c  r1 : 000052f8  r0 : e883b640
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6640006a  DAC: 00000051
Process kworker/0:2 (pid: 131, stack limit = 0x(ptrval))
Stack: (0xe75c9e60 to 0xe75ca000)
...
[<c064f4e4>] (rtl8152_apply_firmware) from [<c064fa18>]
(r8153_hw_phy_cfg+0x24/0x17c)
[<c064fa18>] (r8153_hw_phy_cfg) from [<c064e784>]
(rtl_hw_phy_work_func_t+0x220/0x3e4)
[<c064e784>] (rtl_hw_phy_work_func_t) from [<c0148a74>]
(process_one_work+0x22c/0x7c8)
[<c0148a74>] (process_one_work) from [<c0149054>] (worker_thread+0x44/0x520)
[<c0149054>] (worker_thread) from [<c0150548>] (kthread+0x130/0x164)
[<c0150548>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xe75c9fb0 to 0xe75c9ff8)
...

Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: Fix .gitignore to ignore no_alu32/

When switching to alu32 by default, no_alu32/ subdirectory wasn't added
to .gitignore. Fix it.

Fixes: e13a2fe642bd ("tools/bpf: Turn on llvm alu32 attribute by default")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191025045503.3043427-1-andriin@fb.com

bpftool: Allow to read btf as raw data

The bpftool interface stays the same, but now it's possible
to run it over BTF raw data, like:

  $ bpftool btf dump file /sys/kernel/btf/vmlinux
  [1] INT '(anon)' size=4 bits_offset=0 nr_bits=32 encoding=(none)
  [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  [3] CONST '(anon)' type_id=2

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Link: https://lore.kernel.org/bpf/20191024133025.10691-1-jolsa@kernel.org