review.tizen.org Git - platform/kernel/linux-rpi.git/log

afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr

Keep and pass sockaddr_rxrpc addresses around rather than keeping and
passing in_addr addresses to allow for the use of IPv6 and non-standard
port numbers in future.

This also allows the port and service_id fields to be removed from the
afs_call struct.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Update the cache index structure

Update the cache index structure in the following ways:

(1) Don't use the volume name followed by the volume type as levels in the
     cache index.  Volumes can be renamed.  Use the volume ID instead.

(2) Don't store the VLDB data for a volume in the tree.  If the volume
     database should be cached locally, then it should be done in a separate
     tree.

(3) Expand the volume ID stored in the cache to 64 bits.

(4) Expand the file/vnode ID stored in the cache to 96 bits.

(5) Increment the cache structure version number to 1.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Add some protocol defs

Add some protocol definitions, including max field lengths, flag defs, an
XDR-encoded UUID def, more VL operation IDs and more fileserver abort
codes.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Push the net ns pointer to more places

Push the network namespace pointer to more places in AFS, including the
afs_server structure (which doesn't hold a ref on the netns).

In particular, afs_put_cell() now takes requires a net ns parameter so that
it can safely alter the netns after decrementing the cell usage count - the
cell will be deallocated by a background thread after being cached for a
period, which means that it's not safe to access it after reducing its
usage count.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Note the cell in the superblock info also

Keep a reference to the cell in the superblock info structure in addition
to the volume and net pointers. This will make it easier to clean up in a
future patch in which afs_put_volume() will need the cell pointer.

Whilst we're at it, make the cell and volume getting functions return a
pointer to the object got to make the call sites look neater.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Fix server reaping

Fix server reaping and make sure it's all done before we start trying to
purge cells, given that servers currently pin cells.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Close the rxrpc socket only after purging the servers

Close the rxrpc socket only after we've purged the server records (and also
cell and volume records which might refer to servers) so that we can give
up the callbacks on each server.

Signed-off-by: David Howells <dhowells@redhat.com>

afs: Lay the groundwork for supporting network namespaces

Lay the groundwork for supporting network namespaces (netns) to the AFS
filesystem by moving various global features to a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

(1) Store the netns in the superblock info.  This will be obtained from
     the mounter's nsproxy on a manual mount and inherited from the parent
     superblock on an automount.

(2) The cell list is made per-netns.  It can be viewed through
     /proc/net/afs/cells and also be modified by writing commands to that
     file.

(3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
     This is unset by default.

(4) The 'rootcell' module parameter, which sets a cell and VL server list
     modifies the init net namespace, thereby allowing an AFS root fs to be
     theoretically used.

(5) The volume location lists and the file lock manager are made
     per-netns.

(6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.

Changes still to be made:

(1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
     from the old name.

(2) A per-netns subsys needs to be registered for AFS into which it can
     store its per-netns data.

(3) Rather than the AF_RXRPC socket being opened on module init, it needs
     to be opened on the creation of a superblock in that netns.

(4) The socket needs to be closed when the last superblock using it is
     destroyed and all outstanding client calls on it have been completed.
     This prevents a reference loop on the namespace.

(5) It is possible that several namespaces will want to use AFS, in which
     case each one will need its own UDP port.  These can either be set
     through /proc/net/afs/cm_port or the kernel can pick one at random.
     The init_ns gets 7001 by default.

Other issues that need resolving:

(1) The DNS keyring needs net-namespacing.

(2) Where do upcalls go (eg. DNS request-key upcall)?

(3) Need something like open_socket_in_file_ns() syscall so that AFS
     command line tools attempting to operate on an AFS file/volume have
     their RPC calls go to the right place.

Signed-off-by: David Howells <dhowells@redhat.com>

Pass mode to wait_on_atomic_t() action funcs and provide default actions

Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
extra argument and make it 'unsigned int throughout.

Also, consolidate a bunch of identical action functions into a default
function that can do the appropriate thing for the mode.

Also, change the argument name in the bit_wait*() function declarations to
reflect the fact that it's the mode and not the bit number.

[Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
should be done differently, though he's not immediately sure as to how]

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
cc: Ingo Molnar <mingo@kernel.org>

Merge remote-tracking branch 'tip/timers/core' into afs-next

These AFS patches need the timer_reduce() patch from timers/core.

Signed-off-by: David Howells <dhowells@redhat.com>

Merge branch 'net-improve-the-process-of-redirect-and-toobig-for-ipv6-tunnels'

Xin Long says:

====================
net: improve the process of redirect and toobig for ipv6 tunnels

Now let's say there are 3 kinds of icmp packets to process for tunnels,
toobig(needfrag), redirect, others, their process should be:

- toobig(needfrag)
   update the lower dst's pmtu by route cache, also update sk dst's pmtu
   if possible, or it will be fine if sk dst pmtu will get updated on tx
   path.

- redirect
   update the lower dst's gw by route cache and return, no need to send
   this redirect packet to user sk.

- others
   send the packet to user's sk, or it will also be fine to use err_count
   to count it and report fail link on tx path.

All ipv4 tunnels basically follow this while some of ipv6 tunnels are
doing in different ways, like ip6gre and ip6_tunnels update tnl dev's
mtu instead of updating lower dst pmtu, no redirect process on their
err_handlers, which doesn't make any sense and even causes performance
problems.

This patchset is to improve the process of redirect and toobig for ip6gre
ip4ip6, ip6ip6 tunnels, as in ipv4 tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers

This patch is to remove some useless codes of redirect and fix some
indents on ip4ip6 and ip6ip6's err_handlers.

Note that redirect icmp packet is already processed in ip6_tnl_err,
the old redirect codes in ip4ip6_err actually never worked even
before this patch. Besides, there's no need to send redirect to
user's sk, it's for lower dst, so just remove it in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: process toobig in a better way

The same improvement in "ip6_gre: process toobig in a better way"
is needed by ip4ip6 and ip6ip6 as well.

Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
err_handlers. Like I said before, gre6 could not do this as it's
inner proto is not certain. But for all of them, sk dst pmtu will
be updated in tx path if in need.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: add the process for redirect in ip6_tnl_err

The same process for redirect in "ip6_gre: add the process for redirect
in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_gre: process toobig in a better way

Now ip6gre processes toobig icmp packet by setting gre dev's mtu in
ip6gre_err, which would cause few things not good:

  - It couldn't set mtu with dev_set_mtu due to it's not in user context,
    which causes route cache and idev->cnf.mtu6 not to be updated.

  - It has to update sk dst pmtu in tx path according to gredev->mtu for
    ip6gre, while it updates pmtu again according to lower dst pmtu in
    ip6_tnl_xmit.

  - To change dev->mtu by toobig icmp packet is not a good idea, it should
    only work on pmtu.

This patch is to process toobig by updating the lower dst's pmtu, as later
sk dst pmtu will be updated in ip6_tnl_xmit, the same way as in ip4gre.

Note that gre dev's mtu will not be updated any more, it doesn't make any
sense to change dev's mtu after receiving a toobig packet.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_gre: add the process for redirect in ip6gre_err

This patch is to add redirect icmp packet process for ip6gre by
calling ip6_redirect() in ip6gre_err(), as in vti6_err.

Prior to this patch, there's even no route cache generated after
receiving redirect.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: remove redudant assignments in xmit

In xmit process, the variables are set many times. In fact,
it is enough for these variables to be set once.
After a long time test, the throughput performance is better
than before.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'nfc-next-4.15-1' of git://git./linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.15 pull request

This is the NFC pull request for 4.15. We have:

- A new netlink command for explicitly deactivating NFC targets
- i2c constification for all NFC drivers
- One NFC device allocation error path fix
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Openvswitch-meter-action'

Andy Zhou says:

====================
Openvswitch meter action

This patch series is the first attempt to add openvswitch
meter support. We have previously experimented with adding
metering support in nftables. However 1) It was not clear
how to expose a named nftables object cleanly, and 2)
the logic that implements metering is quite small, < 100 lines
of code.

With those two observations, it seems cleaner to add meter
support in the openvswitch module directly.

---

    v1(RFC)->v2:  remove unused code improve locking
  and other review comments
    v2 -> v3:     rebase
    v3 -> v4:     fix undefined "__udivdi3" references on 32 bit builds.
                  use div_u64() instead.
    v4 -> v5:     rebase
====================

Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add meter action support

Implements OVS kernel meter action support.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add meter infrastructure

OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: export get_dp() API.

Later patches will invoke get_dp() outside of datapath.c. Export it.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add meter netlink definitions

Meter has its own netlink family. Define netlink messages and attributes
for communicating with the user space programs.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-b53-Support-prepended-Broadcom-tags'

Florian Fainelli says:

====================
net: dsa: b53: Support prepended Broadcom tags

This patch series adds support for prepended 4-bytes Broadcom tags that we
already support. This type of tag will typically be used when interfaced to
a SoC like BCM58xx (NorthStar Plus) which supports a Flow Accelerator (WIP).
In that case, we need to support a slightly different tagging format.

The first patch does a bit of re-factoring and passes a port index to
the get_tag_protocol() function since at least two different drivers need
that type of information (mt7530, b53) to support tagging or not.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Support prepended Broadcom tags

On BCM58xx devices (Northstar Plus), there is an accelerator attached to
port 8 which would only work if we use prepended Broadcom tags. Resolve
that difference in our get_tag_protocol() function by setting the
appropriate tagging protocol in that case. We need to change
b53_brcm_hdr_setup() a little bit now since we can deal with two types
of Broadcom tags.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Support prepended Broadcom tag

Add a new type: DSA_TAG_PROTO_PREPEND which allows us to support for the
4-bytes Broadcom tag that we already support, but in a format where it
is pre-pended to the packet instead of located between the MAC SA and
the Ethertyper (DSA_TAG_PROTO_BRCM).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_brcm: Prepare for supporting prepended tag

In preparation for supporting the same Broadcom tag format, but instead
of inserted between the MAC SA and EtherType, prepended to the Ethernet
frame, restructure the code a little bit to make that possible and take
an offset parameter.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Pass a port to get_tag_protocol()

A number of drivers want to check whether the configured CPU port is a
possible configuration for enabling tagging, pass down the CPU port
number so they verify that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issue

gcc-4.4.4 (at lest) has issues with initializers and anonymous unions:

net/sched/sch_red.c: In function 'red_dump_offload':
net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer
net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast
net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer
net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast
net/sched/sch_red.c: In function 'red_dump_stats':
net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer
net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast

Work around this.

Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Cc: Nogah Frankel <nogahf@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devices

Since Mellanox focus is on newer adapters, we would like to have the
ability to disable the support for old gen2 adapters.

This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
We keep it turned on by default.

Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_netlink: ensure that NLMSG_DONE never fails in dumps

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netem-add-nsec-scheduling-and-slot-feature'

Dave Taht says:

====================
netem: add nsec scheduling and slot feature

This patch series converts netem away from the old "ticks" interface and
userspace API, and adds support for a new "slot" feature intended to
emulate bursty macs such as WiFi and LTE better.

Changes since v2:
Use u64 for packet_len_sched_time()
Use simpler max(time_to_send,q->slot.slot_next)

Changes since v1:
Always pass new nanosecond APIs to userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netem: support delivering packets in delayed time slots

Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.

The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
         slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
         limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
         slot 800us 10ms bytes 64k packets 42 limit 512

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netem: add uapi to express delay and jitter in nanoseconds

netem userspace has long relied on a horrible /proc/net/psched hack
to translate the current notion of "ticks" to nanoseconds.

Expressing latency and jitter instead, in well defined nanoseconds,
increases the dynamic range of emulated delays and jitter in netem.

It will also ease a transition where reducing a tick to nsec
equivalence would constrain the max delay in prior versions of
netem to only 4.3 seconds.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netem: convert to qdisc_watchdog_schedule_ns

Upgrade the internal netem scheduler to use nanoseconds rather than
ticks throughout.

Convert to and from the std "ticks" userspace api automatically,
while allowing for finer grained scheduling to take place.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: try not to take rtnl_lock in ip6mr_sk_done

Avoid traversing the list of mr6_tables (which requires the
rtnl_lock) in ip6mr_sk_done(), when we know in advance that
a match will not be found.
This can happen when rawv6_close()/ip6mr_sk_done() is invoked
on non-mroute6 sockets.
This patch helps reduce rtnl_lock contention when destroying
a large number of net namespaces, each having a non-mroute6
raw socket.

v2: same patch, only fixed subject line and expanded comment.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: realtek: r8169: remove redundant assignment to giga_ctrl

The variable giga_ctrl is being assigned to zero however this is
never read and hence the assignment is redundant, so remove it.
Cleans up clang warning:

drivers/net/ethernet/realtek/r8169.c:1978:3: warning: Value stored
to 'giga_ctrl' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: Documentation: Add missing word "Mbps"

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: Fix lan9303_alr_del_port()

Fix embarrassing bug in lan9303_alr_del_port(): Instead of zeroing
entr->mac_addr, I destroyed the next cache entry. Affected .port_fdb_del and
.port_mdb_del.

Fixes: 0620427ea0d6 ("net: dsa: lan9303: Add fdb/mdb manipulation")
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

timers: Add a function to start/reduce a timer

Add a function, similar to mod_timer(), that will start a timer if it isn't
running and will modify it if it is running and has an expiry time longer
than the new time.  If the timer is running with an expiry time that's the
same or sooner, no change is made.

The function looks like:

int timer_reduce(struct timer_list *timer, unsigned long expires);

This can be used by code such as networking code to make it easier to share
a timer for multiple timeouts.  For instance, in upcoming AF_RXRPC code,
the rxrpc_call struct will maintain a number of timeouts:

unsigned long ack_at;
unsigned long resend_at;
unsigned long ping_at;
unsigned long expect_rx_by;
unsigned long expect_req_by;
unsigned long expect_term_by;

each of which is set independently of the others.  With timer reduction
available, when the code needs to set one of the timeouts, it only needs to
look at that timeout and then call timer_reduce() to modify the timer,
starting it or bringing it forward if necessary.  There is no need to refer
to the other timeouts to see which is earliest and no need to take any lock
other than, potentially, the timer lock inside timer_reduce().

Note, that this does not protect against concurrent invocations of any of
the timer functions.

As an example, the expect_rx_by timeout above, which terminates a call if
we don't get a packet from the server within a certain time window, would
be set something like this:

unsigned long now = jiffies;
unsigned long expect_rx_by = now + packet_receive_timeout;
WRITE_ONCE(call->expect_rx_by, expect_rx_by);
timer_reduce(&call->timer, expect_rx_by);

The timer service code (which might, say, be in a work function) would then
check all the timeouts to see which, if any, had triggered, deal with
those:

t = READ_ONCE(call->ack_at);
if (time_after_eq(now, t)) {
cmpxchg(&call->ack_at, t, now + MAX_JIFFY_OFFSET);
set_bit(RXRPC_CALL_EV_ACK, &call->events);
}

and then restart the timer if necessary by finding the soonest timeout that
hasn't yet passed and then calling timer_reduce().

The disadvantage of doing things this way rather than comparing the timers
each time and calling mod_timer() is that you *will* take timer events
unless you can finish what you're doing and delete the timer in time.

The advantage of doing things this way is that you don't need to use a lock
to work out when the next timer should be set, other than the timer's own
lock - which you might not have to take.

[ tglx: Fixed weird formatting and adopted it to pending changes ]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-afs@lists.infradead.org
Link: https://lkml.kernel.org/r/151023090769.23050.1801643667223880753.stgit@warthog.procyon.org.uk

pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()

__getnstimeofday() is a rather odd interface, with a number of quirks:

- The caller may come from NMI context, but the implementation is not NMI safe,
  one way to get there from NMI is

      NMI handler:
        something bad
          panic()
            kmsg_dump()
              pstore_dump()
                 pstore_record_init()
                   __getnstimeofday()

- The calling conventions are different from any other timekeeping functions,
  to deal with returning an error code during suspended timekeeping.

Address the above issues by using a completely different method to get the
time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior
when timekeeping is suspended: it returns the time at which it got
suspended. As Thomas Gleixner explained, this is safe, as
ktime_get_real_fast_ns() does not call into the clocksource driver that
might be suspended.

The result can easily be transformed into a timespec structure. Since
ktime_get_real_fast_ns() was not exported to modules, add the export.

The pstore behavior for the suspended case changes slightly, as it now
stores the timestamp at which timekeeping was suspended instead of storing
a zero timestamp.

This change is not addressing y2038-safety, that's subject to a more
complex follow up patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Colin Cross <ccross@android.com>
Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de

Merge git://git./linux/kernel/git/davem/net

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Use after free in vlan, from Cong Wang.

2) Handle NAPI poll with a zero budget properly in mlx5 driver, from
    Saeed Mahameed.

3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar
    Karmy.

4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
    Bertelsmann.

5) Missing return in mdb and vlan prepare phase of DSA layer, from
    Vivien Didelot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  vlan: fix a use-after-free in vlan_device_event()
  net: dsa: return after vlan prepare phase
  net: dsa: return after mdb prepare phase
  can: ifi: Fix transmitter delay calculation
  tcp: fix tcp_fastretrans_alert warning
  tcp: gso: avoid refcount_t warning from tcp_gso_segment()
  can: peak: Add support for new PCIe/M2 CAN FD interfaces
  can: sun4i: handle overrun in RX FIFO
  can: c_can: don't indicate triple sampling support for D_CAN
  net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
  net/mlx5e: Set page to null in case dma mapping fails
  net/mlx5e: Fix napi poll with zero budget
  net/mlx5: Cancel health poll before sending panic teardown command
  net/mlx5: Loop over temp list to release delay events
  rds: ib: Fix NULL pointer dereference in debug code

Merge tag 'wireless-drivers-next-for-davem-2017-11-11' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.15

Last minute patches before the merge window. Not really anything
special standing out, mostly fixes or cleanup and some minor new
features.

Major changes:

iwlwifi

* some new PCI IDs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove unused skb_shared_info member

ip6_frag_id was only used by UFO, which has been removed.
ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
in-tree callers.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'l2tp-avoid-aliasing-tunnels-socket-pointer'

Guillaume Nault says:

====================
l2tp: avoid aliasing tunnels socket pointer

We don't need to copy the tunnel's socket pointer in the pseudo-wire
specific session structures. This uselessly complicates the code
and hampers evolution.

This series was part of an effort to protect tunnels socket pointer
with RCU. But since it provides nice cleanup, I submit it separately.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove the .tunnel_sock field from struct pppol2tp_session

The last user of .tunnel_sock is pppol2tp_connect() which defensively
uses it to verify internal data consistency.

This check isn't necessary: l2tp_session_get() guarantees that the
returned session belongs to the tunnel passed as parameter. And
.tunnel_sock is never updated, so checking that it still points to
the parent tunnel socket is useless; that test can never fail.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: avoid using ->tunnel_sock for getting session's parent tunnel

Sessions don't need to use l2tp_sock_to_tunnel(xxx->tunnel_sock) for
accessing their parent tunnel. They have the .tunnel field in the
l2tp_session structure for that. Furthermore, in all these cases, the
session is registered, so we're guaranteed that .tunnel isn't NULL and
that the session properly holds a reference on the tunnel.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove .tunnel_sock from struct l2tp_eth

This field has never been used.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-b53-Turn-on-Broadcom-tags'

Florian Fainelli says:

====================
net: dsa: b53: Turn on Broadcom tags

This was long overdue, with this patch series, the b53 driver now
turns on Broadcom tags except for 5325 and 5365 which use an older
format that we do not support yet (TBD).

First patch is necessary in order for bgmac, used on BCM5301X and Northstar
Plus to work correctly and successfully send ARP packets back to the requsester.

Second patch is actually a bug fix, but because net/master and net-next/master
diverge in that area, I am targeting net-next/master here.

Finally, the last patch enables Broadcom tags after checking that the CPU port
selected is either, 5, 7 or 8, since those are the only valid combinations
given currently supported HW.

Changes in v3:

- guarded padding with netdev_uses_dsa() to let the non-DSA use cases
not have a performance hit for smaller packets

- added missing select NET_DSA_TAG_BRCM to drivers/net/dsa/b53/Kconfig

Changes in v2:

- moved a hunk between patch 2 and patch 3 to avoid a bisectability issue
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Turn on Broadcom tags

Enable Broadcom tags for b53 devices, except 5325 and 5365 which use a
different Broadcom tag format not yet supported by net/dsa/tag_brcm.c.

We also make sure that we can turn on Broadcom tags on a CPU port number
that is capable of that: 5, 7 or 8.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Stop using dev->cpu_port incorrectly

dev->cpu_port is the driver local information that should only be used
to look up register offsets for a particular port, when they differ
(e.g: IMP port override), but it should certainly not be used in place
of the DSA configured CPU port.

Since the DSA switch layer calls port_vlan_{add,del}() on the CPU port
as well, we can remove the specific setting of the CPU port within
port_vlan_{add,del}.

Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support")
Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bgmac: Pad packets to a minimum size

In preparation for enabling Broadcom tags with b53, pad packets to a
minimum size of 64 bytes (sans FCS) in order for the Broadcom switch to
accept ingressing frames. Without this, we would typically be able to
DHCP, but not resolve with ARP because packets are too small and get
rejected by the switch.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-4.14-20171110' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2017-11-10

this is a pull request for net/master.

The first patch by Richard Schütz for the c_can driver removes the false
indication to support triple sampling for d_can. Gerhard Bertelsmann's
patch for the sun4i driver improves the RX overrun handling. The patch
by Stephane Grosjean for the peak_canfd driver adds the PCI ids for
various new PCIe/M2 interfaces. Marek Vasut's patch for the ifi driver
fix transmitter delay calculation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-lan9303-IGMP-handling'

Egil Hjelmeland says:

====================
net: dsa: lan9303: IGMP handling

Set up the HW switch to trap IGMP packets to CPU port.
And make sure skb->offload_fwd_mark is cleared for incoming IGMP packets.

skb->offload_fwd_mark calculation is a candidate for consolidation into the
DSA core. The calculation can probably be more polished when done at a point
where DSA has updated skb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: Clear offload_fwd_mark for IGMP

Now that IGMP packets no longer is flooded in HW, we want the SW bridge to
forward packets based on bridge configuration. To make that happen,
IGMP packets must have skb->offload_fwd_mark = 0.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: Set up trapping of IGMP to CPU port

IGMP packets should be trapped to the CPU port. The SW bridge knows
whether to forward to other ports.

With "IGMP snooping for local traffic" merged, IGMP trapping is also
required for stable IGMPv2 operation.

LAN9303 does not trap IGMP packets by default.

Enable IGMP trapping in lan9303_setup.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: collect vpd info directly from hardware

Collect vpd information directly from hardware instead of software
adapter context. Move EEPROM physical address to virtual address
translation logic to t4_hw.c and update relevant files.

Fixes: 6f92a6544f1a ("cxgb4: collect hardware misc dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-fixes-2017-11-08' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-11-08

The following series includes some fixes for mlx5 core and etherent
driver.

Sorry for the late submission but as you can see i have some very
critical fixes below that i would like them merged into this RC.

Please pull and let me know if there is any problem.

For -stable:
('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13

V1->V2:
- Fix Reviewed-by tag of the 2nd patch.
- Drop the FPGA 0 size fix, it needs some more change log info.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: fix a use-after-free in vlan_device_event()

After refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:

RCU_INIT_POINTER(dev->vlan_info, NULL);
call_rcu(&vlan_info->rcu, vlan_info_rcu_free);

However, the pointer 'grp' still points to that memory
since it is set before vlan_vid_del():

        vlan_info = rtnl_dereference(dev->vlan_info);
        if (!vlan_info)
                goto out;
        grp = &vlan_info->grp;

Depends on when that RCU callback is scheduled, we could
trigger a use-after-free in vlan_group_for_each_dev()
right following this vlan_vid_del().

Fix it by moving vlan_vid_del() before setting grp. This
is also symmetric to the vlan_vid_add() we call in
vlan_device_event().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix stats histogram mode

The statistics histogram mode was not being explicitly initialized on
devices other than the 6390 family. Clearing the statistics then
overwrote the default setting, setting the histogram to a reserved
mode.

Explicitly set the histogram mode for all devices. Change the
statistics clear into a read/modify/write, and since it is now more
complex, move it into global1.c.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-broadcast-flooding-in-hardware'

Andrew Lunn says:

====================
mv88e6xxx broadcast flooding in hardware

This patchset makes the mv88e6xxx driver perform flooding in hardware,
rather than let the software bridge perform the flooding. This is a
prerequisite for IGMP snooping on the bridge interface.

In order to make hardware broadcasting work, a few other issues need
fixing or improving. SWITCHDEV_ATTR_ID_PORT_PARENT_ID is broken, which
is apparent when testing on the ZII devel board with multiple
switches.

Some of these patches are taken from a previous RFC patchset of IGMP
support.

Rebased onto net-next, with fixup for Vivien's refactoring.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Flood broadcast frames in hardware

By default, the switch does not flood broadcast frames. Instead the
broadcast address is unknown in the ATU, so the frame gets forwarded
out the cpu port. The software bridge then floods it back to the
individual switch ports which are members of the bridge.

Add an ATU entry in the switch so that it floods broadcast frames out
ports, rather than have the software bridge do it. Also, send a copy
out the cpu port and any dsa ports. Rely on the port vectors to
prevent broadcast frames leaking between bridges, and separated ports.

Additionally, when a VLAN is added, a new FID is allocated. This
represents a new table of ATU entries. A broadcast entry is added to
the new FID.

With offload_fwd_mark being set, the software bridge will not flood
the frames it receives back to the switch.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Move mv88e6xxx_port_db_load_purge()

This function is going to be needed by a soon to be added new
function. Move it earlier so we can avoid a forward declaration.
No functional changes.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Print offending port when vlan check fails

When testing if a VLAN is one more than one bridge, we print an error
message that the VLAN is already in use somewhere else. Print both the
new port which would like the VLAN, and the port which already has it,
to aid debugging.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fixed port netdev check for VLANs

Having the same VLAN on multiple bridges is currently unsupported as
an offload. mv88e6xxx_port_check_hw_vlan() is used to ensure that a
VLAN is not on multiple bridges when adding a VLAN range to a port. It
loops the ports and checks to see if there are ports in a different
bridge with the same VLAN.

While walking all switch ports, the code was checking if the new port
has a netdev slave attached to it. If not, skip checking the port
being walked. This seems like a typ0. If the new port does not have a
slave, how has a VLAN been added to it in the first place, requiring
this check be performed at all? More likely, we should be checking if
the port being walked has a slave. Without the port having a slave, it
cannot have a VLAN on it, so there is no need to check further for
that particular port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: {e}dsa: set offload_fwd_mark on received packets

The software bridge needs to know if a packet has already been bridged
by hardware offload to ports in the same hardware offload, in order
that it does not re-flood them, causing duplicates. This is
particularly true for broadcast and multicast traffic which the host
has requested.

By setting offload_fwd_mark in the skb the bridge will only flood to
ports in other offloads and other netifs. Set this flag in the DSA and
EDSA tag driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID

SWITCHDEV_ATTR_ID_PORT_PARENT_ID is used by the software bridge when
determining which ports to flood a packet out. If the packet
originated from a switch, it assumes the switch has already flooded
the packet out the switches ports, so the bridge should not flood the
packet itself out switch ports. Ports on the same switch are expected
to return the same parent ID when SWITCHDEV_ATTR_ID_PORT_PARENT_ID is
called.

DSA gets this wrong with clusters of switches. As far as the software
bridge is concerned, the cluster is all one switch. A packet from any
switch in the cluster can be assumed to have been flooded as needed
out of all ports of the cluster, not just the switch it originated
from. Hence all ports of a cluster should return the same parent. The
old implementation did not, each switch in the cluster had its own ID.

Also wrong was that the ID was not unique if multiple DSA instances
are in operation.

Use the tree ID as the parent ID, which is the same for all switches
in a cluster and unique across switch clusters.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ieee802154-for-davem-2017-11-09' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: net-next: ieee802154 2017-11-09

A small update on ieee802154 patches for net-next. Nothing dramatic, but simply
housekeeping this time around.
A fix for the correct mask to be applied in the mrf24j40 driver by Gustavo A. R. Silva
Removal of a non existing email user for the ca8210 driver by Harry Morris
A bunch of checkpatch cleanups across the subsystem from myself
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bindings: net: stmmac: correctify note about LPI interrupt

There are two different combined signal for various interrupt events:
In EQOS-CORE and EQOS-MTL configurations, mci_intr_o is the interrupt
signal.
In EQOS-DMA, EQOS-AHB and EQOS-AXI configurations, these interrupt events
are combined with the events in the DMA on the sbd_intr_o signal.

Depending on configuration, the device tree irq "macirq" will refer to
either mci_intr_o or sbd_intr_o.

The databook states:
"The MAC generates the LPI interrupt when the Tx or Rx side enters or exits
the LPI state. The interrupt mci_intr_o (sbd_intr_o in certain
configurations) is asserted when the LPI interrupt status is set.

When the MAC exits the Rx LPI state, then in addition to the mci_intr_o
(sbd_intr_o in certain configurations), the sideband signal lpi_intr_o is
asserted.

If you do not want to gate-off the application clock during the Rx LPI
state, you can leave the lpi_intr_o signal unconnected and use the
mci_intr_o (sbd_intr_o in certain configurations) signal to detect Rx LPI
exit."

Since the "macirq" is always raised when Tx or Rx enters/exits the LPI
state, "eth_lpi" must therefore refer to lpi_intr_o, which is only raised
when Rx exits the LPI state. Update the DT binding description to reflect
reality.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: fix ipv6 outbound device

When process the outbound packet of ipv6, we should assign the master
device to output device other than input device.

Signed-off-by: Keefe Liu <liuqifa@huawei.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: fix double free error

This patch fixes an error in memory allocation/freeing in
ThunderX PF driver.

I moved the allocation to the probe() function and made it managed.

>From the Colin's email:

While running static analysis on linux-next with CoverityScan I found 3
double free errors in the Cavium thunder driver.

The issue occurs on the err_disable_device: label of function nic_probe
when nic_free_lmacmem(nic) is called and a double free occurs on
nic->duplex, nic->link and nic->speed.  This occurs when nic_init_hw()
fails:

        /* Initialize hardware */
        err = nic_init_hw(nic);
        if (err)
                goto err_release_regions;

nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
if any of the allocations fail. This free'ing occurs again by the call
to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderbolt: Clear finished Tx frame bus address in tbnet_tx_callback()

When Thunderbolt network interface is disabled or when the cable is
unplugged the driver releases all allocated buffers by calling
tbnet_free_buffers() for each ring. This function then calls
dma_unmap_page() for each buffer it finds where bus address is non-zero.
Now, we only clear this bus address when the Tx buffer is sent to the
hardware so it is possible that the function finds an entry that has
already been unmapped.

Enabling DMA-API debugging catches this as well:

thunderbolt 0000:06:00.0: DMA-API: device driver tries to free DMA
memory it has not allocated [device address=0x0000000068321000] [size=4096 bytes]

Fix this by clearing the bus address of a Tx frame right after we have
unmapped the buffer.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Remove the global prot_inuse counter.

The per-cpu counter for init_net is prepared in core_initcall.
The patch 7d720c3e ("percpu: add __percpu sparse annotations to net")
and d6d9ca0fe ("net: this_cpu_xxx conversions") optimize the
routines. Then remove the old counter.

Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sfc: remove redundant variable start

Variable start is assigned but never read hence it is redundant
and can be removed. Cleans up clang warning:

drivers/net/ethernet/sfc/ptp.c:655:2: warning: Value stored to 'start'
is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: remove duplicated assignment to mbcp

The assignment to mbcp is identical to the initiatialized value assigned
to mbcp at declaration time a few lines earlier, hence we can remove the
second redundant assignment. Cleans up clang warning:

drivers/net/ethernet/qlogic/qlge/qlge_mpi.c:209:22: warning:
Value stored to 'mbcp' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: x25_asy: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114928
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 3com: 3c574_cs: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114888
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 8390: pcnet_cs: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114891
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: decnet: dn_table: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 115106
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: don't close sessions in l2tp_tunnel_destruct()

Sessions are already removed by the proto ->destroy() handlers, and
since commit f3c66d4e144a ("l2tp: prevent creation of sessions on terminated tunnels"),
we're guaranteed that no new session can be created afterwards.

Furthermore, l2tp_tunnel_closeall() can sleep when there are sessions
left to close. So we really shouldn't call it in a ->sk_destruct()
handler, as it can be used from atomic context.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'FACK-loss-recovery-remove'

Yuchung Cheng says:

====================
remove FACK loss recovery

This patch set removes the forward-acknowledgment (FACK)
packet-based loss and reordering detection. This simplifies TCP
loss recovery since the SACK scoreboard no longer needs to track
the number of pending packets under highest SACKed sequence. FACK
is subsumed by the time-based RACK loss detection which is more
robust under reordering and second order losses.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use sequence distance to detect reordering

Replace the reordering distance measurement in packet unit with
sequence based approach. Previously it trackes the number of "packets"
toward the forward ACK (i.e. highest sacked sequence)in a state
variable "fackets_out".

Precisely measuring reordering degree on packet distance has not much
benefit, as the degree constantly changes by factors like path, load,
and congestion window. It is also complicated and prone to arcane bugs.
This patch replaces with sequence-based approach that's much simpler.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: retire FACK loss detection

FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman_port: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397960
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: bgmac: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397972
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Add vnic client data to login buffer

Update the login buffer to include client data for the vnic driver,
this includes the OS name, LPAR name, and device name. This update
allows this information to be available in the VIOS.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Revert bpf_overrid_function() helper changes.

NACK'd by x86 maintainer.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-Fix-bugs-in-sock_ops-samples'

Lawrence Brakmo says:

====================
bpf: Fix bugs in sock_ops samples

The programs were returning -1 in some cases when they should
only return 0 or 1. Changes in the verifier now catch this
issue and the programs fail to load. This is now fixed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_clamp_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_iw_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_cong_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_bufs_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_rwnd_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Fix tcp_synrto_kern.c sample program

The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: return after vlan prepare phase

The current code does not return after successfully preparing the VLAN
addition on every ports member of a it. Fix this.

Fixes: 1ca4aa9cd4cc ("net: dsa: check VLAN capability of every switch")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: return after mdb prepare phase

The current code does not return after successfully preparing the MDB
addition on every ports member of a multicast group. Fix this.

Fixes: a1a6b7ea7f2d ("net: dsa: add cross-chip multicast support")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: improve link resiliency when rps is activated

Currently, the TIPC RPS dissector is based only on the incoming packets'
source node address, hence steering all traffic from a node to the same
core. We have seen that this makes the links vulnerable to starvation
and unnecessary resets when we turn down the link tolerance to very low
values.

To reduce the risk of this happening, we exempt probe and probe replies
packets from the convergence to one core per source node. Instead, we do
the opposite, - we try to diverge those packets across as many cores as
possible, by randomizing the flow selector key.

To make such packets identifiable to the dissector, we add a new
'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
set both for PROBE and PROBE_REPLY messages, and only for those.

It should be noted that these packets are not part of any flow anyway,
and only constitute a minuscule fraction of all packets sent across a
link. Hence, there is no risk that this will affect overall performance.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macb-next'

Michael Grzeschik says:

====================
net: macb: add error handling on probe and

This series adds more error handling to the macb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add of_node_put to error paths

We add the call of_node_put(bp->phy_node) to all associated error
paths for memory clean up.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>