Daniel Borkmann [Sun, 15 Jan 2017 00:34:25 +0000 (01:34 +0100)]
bpf, trace: make ctx access checks more robust
Make sure that ctx cannot potentially be accessed oob by asserting
explicitly that ctx access size into pt_regs for BPF_PROG_TYPE_KPROBE
programs must be within limits. In case some 32bit archs have pt_regs
not being a multiple of 8, then BPF_DW access could cause such access.
BPF_PROG_TYPE_KPROBE progs don't have a ctx conversion function since
there's no extra mapping needed. kprobe_prog_is_valid_access() didn't
enforce sizeof(long) as the only allowed access size, since LLVM can
generate non BPF_W/BPF_DW access to regs from time to time.
For BPF_PROG_TYPE_TRACEPOINT we don't have a ctx conversion either, so
add a BUILD_BUG_ON() check to make sure that BPF_DW access will not be
a similar issue in future (ctx works on event buffer as opposed to
pt_regs there).
Fixes:
2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Fri, 13 Jan 2017 23:48:30 +0000 (15:48 -0800)]
ipvlan: fix dev_id creation corner case.
In the last patch
da36e13cf65 ("ipvlan: improvise dev_id generation
logic in IPvlan") I missed some part of Dave's suggestion and because
of that the dev_id creation could fail in a corner case scenario. This
would happen when more or less 64k devices have been already created and
several have been deleted. If the devices that are still sticking around
are the last n bits from the bitmap. So in this scenario even if lower
bits are available, the dev_id search is so narrow that it always fails.
Fixes:
da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan")
CC: David Miller <davem@davemloft.org>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 13 Jan 2017 21:20:29 +0000 (21:20 +0000)]
sfc: get PIO buffer size from the NIC
The 8000 series SFC NICs have 4K PIO buffers, rather than the 2K of
the 7000 series. Rather than having a hard-coded PIO buffer size
(ER_DZ_TX_PIOBUF_SIZE), read it from the GET_CAPABILITIES_V2 MCDI
response.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 13 Jan 2017 21:20:14 +0000 (21:20 +0000)]
sfc: allow PIO more often
If an option descriptor has been sent on a queue but not followed by a
packet, there will have been no completion event, so the read and write
counts won't match and we'll think we can't do PIO. This combines with
the fact that we have two TX queues (for en/disable checksum offload),
and that both must be empty for PIO to happen.
This patch adds a separate "packet_write_count" that tracks the most
recent write_count we expect to see a completion event for; this excludes
option descriptors but _includes_ PIO descriptors (even though they look
like option descriptors). This is then used, rather than write_count,
in efx_nic_tx_is_empty().
We only bother to maintain packet_write_count on EF10, since on Siena
(a) there are no option descriptors and it always equals write_count, and
(b) there's no PIO, so we don't need it anyway.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 13 Jan 2017 20:31:15 +0000 (18:31 -0200)]
sctp: remove useless code from sctp_apply_peer_addr_params
sctp_frag_point() doesn't store anything, and thus just calling it
cannot do anything useful.
sctp_apply_peer_addr_params is only called by
sctp_setsockopt_peer_addr_params. When operating on an asoc,
sctp_setsockopt_peer_addr_params will call sctp_apply_peer_addr_params
once for the asoc, and then once for each transport this asoc has,
meaning that the frag_point will be recomputed when updating the
transports and calling it when updating the asoc is not necessary.
IOW, no action is needed here and we can remove this call.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 13 Jan 2017 20:27:33 +0000 (18:27 -0200)]
sctp: remove unused var from sctp_process_asconf
Assigned but not used.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 13 Jan 2017 18:48:20 +0000 (18:48 +0000)]
flow dissector: check if arp_eth is null rather than arp
arp is being checked instead of arp_eth to see if the call to
__skb_header_pointer failed. Fix this by checking arp_eth is
null instead of arp. Also fix to use length hlen rather than
hlen - sizeof(_arp); thanks to Eric Dumazet for spotting
this latter issue.
CoverityScan CID#1396428 ("Logically dead code") on 2nd
arp comparison (which should be arp_eth instead).
Fixes: commit
55733350e5e8b70c5 ("flow disector: ARP support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 Jan 2017 17:11:22 +0000 (09:11 -0800)]
netlink: do not enter direct reclaim from netlink_trim()
In commit
d35c99ff77ecb ("netlink: do not enter direct reclaim from
netlink_dump()") we made sure to not trigger expensive memory reclaim.
Problem is that a bit later, netlink_trim() might be called and
trigger memory reclaim.
netlink_trim() should be best effort, and really as fast as possible.
Under memory pressure, it is fine to not trim this skb.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 13 Jan 2017 16:25:26 +0000 (21:55 +0530)]
cxgb4: Shutdown adapter if firmware times out or errors out
Perform an emergency shutdown of the adapter and stop it from
continuing any further communication on the ports or DMA to the
host. This is typically used when the adapter and/or firmware
have crashed and we want to prevent any further accidental
communication with the rest of the world. This will also force
the port Link Status to go down -- if register writes work --
which should help our peers figure out that we're down.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 13 Jan 2017 14:46:19 +0000 (14:46 +0000)]
afs: Conditionalise a new unused variable
The bulk readpages support introduced a harmless warning:
fs/afs/file.c: In function 'afs_readpages_page_done':
fs/afs/file.c:270:20: error: unused variable 'vnode' [-Werror=unused-variable]
This adds an #ifdef to match the user of that variable. The user of the
variable has to be conditional because it accesses a member of a struct
that is also conditional.
Fixes:
91b467e0a3f5 ("afs: Make afs_readpages() fetch data in bulk")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shyam Saini [Mon, 16 Jan 2017 03:56:21 +0000 (09:26 +0530)]
sfc: Replace memset with eth_zero_addr
Use eth_zero_addr to assign zero address to the given address array
instead of memset when the second argument in memset is address
of zero which makes the code clearer and also add header
file linux/etherdevice.h
Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 12 Jan 2017 18:46:32 +0000 (21:46 +0300)]
stmmac: indent an if statement
The break statement should be indented one more tab.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
jpinto [Thu, 12 Jan 2017 09:56:18 +0000 (09:56 +0000)]
synopsys: remove dwc_eth_qos driver
This driver is no longer necessary since it was merged into stmmac.
Acked-by: Lars Persson <larper@axis.com>
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Jan 2017 17:02:15 +0000 (12:02 -0500)]
Merge tag 'mac80211-next-for-davem-2017-01-13' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
For 4.11, we seem to have more than in the past few releases:
* socket owner support for connections, so when the wifi
manager (e.g. wpa_supplicant) is killed, connections are
torn down - wpa_supplicant is critical to managing certain
operations, and can opt in to this where applicable
* minstrel & minstrel_ht updates to be more efficient (time and space)
* set wifi_acked/wifi_acked_valid for skb->destructor use in the
kernel, which was already available to userspace
* don't indicate new mesh peers that might be used if there's no
room to add them
* multicast-to-unicast support in mac80211, for better medium usage
(since unicast frames can use *much* higher rates, by ~3 orders of
magnitude)
* add API to read channel (frequency) limitations from DT
* add infrastructure to allow randomizing public action frames for
MAC address privacy (still requires driver support)
* many cleanups and small improvements/fixes across the board
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shyam Saini [Sat, 14 Jan 2017 01:17:40 +0000 (06:47 +0530)]
cxgb4: Remove redundant memset before memcpy
The region set by the call to memset, immediately overwritten by
the subsequent call to memcpy and thus makes the memset redundant.
Also remove the memset((&info, 0, sizeof(info)) on line 398 because
info is memcpy()'ed to before being used in the loop and it isn't
used outside of the loop.
Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 13 Jan 2017 09:16:40 +0000 (14:46 +0530)]
cxgb4: Fix misleading packet/frame count stats.
Do not count pause frames as part of general TX/RX frame
counters.
Based on the original work of Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Jan 2017 04:21:31 +0000 (23:21 -0500)]
Merge branch 'bnxt_en-next'
Michael Chan says:
====================
bnxt_en: Misc. updates for net-next.
Miscellaneous updates including firmware spec update, ethtool -p blinking
LED support, RDMA SRIOV config callback, and minor fixes.
v2: Dropped the DCBX RoCE app TLV patch until the ETH_P_IBOE RDMA patch
is merged.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 13 Jan 2017 06:32:04 +0000 (01:32 -0500)]
bnxt_en: Add the ulp_sriov_cfg hooks for bnxt_re RDMA driver.
Add the ulp_sriov_cfg callbacks when the number of VFs is changing. This
allows the RDMA driver to provision RDMA resources for the VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 13 Jan 2017 06:32:03 +0000 (01:32 -0500)]
bnxt_en: Add support for ethtool -p.
Add LED blinking code to support ethtool -p on the PF.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 13 Jan 2017 06:32:02 +0000 (01:32 -0500)]
bnxt_en: Update to firmware interface spec to 1.6.1.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 13 Jan 2017 06:32:01 +0000 (01:32 -0500)]
bnxt_en: Clear TPA flags when BNXT_FLAG_NO_AGG_RINGS is set.
Commit
bdbd1eb59c56 ("bnxt_en: Handle no aggregation ring gracefully.")
introduced the BNXT_FLAG_NO_AGG_RINGS flag. For consistency,
bnxt_set_tpa_flags() should also clear TPA flags when there are no
aggregation rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 13 Jan 2017 06:32:00 +0000 (01:32 -0500)]
bnxt_en: Fix compiler warnings when CONFIG_RFS_ACCEL is not defined.
CC [M] drivers/net/ethernet/broadcom/bnxt/bnxt.o
drivers/net/ethernet/broadcom/bnxt/bnxt.c:4947:21: warning: ‘bnxt_get_max_func_rss_ctxs’ defined but not used [-Wunused-function]
static unsigned int bnxt_get_max_func_rss_ctxs(struct bnxt *bp)
^
CC [M] drivers/net/ethernet/broadcom/bnxt/bnxt.o
drivers/net/ethernet/broadcom/bnxt/bnxt.c:4956:21: warning: ‘bnxt_get_max_func_vnics’ defined but not used [-Wunused-function]
static unsigned int bnxt_get_max_func_vnics(struct bnxt *bp)
^
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Jan 2017 03:37:18 +0000 (22:37 -0500)]
Merge branch 'tcp-RACK-fast-recovery'
Yuchung Cheng says:
====================
tcp: RACK fast recovery
The patch set enables RACK loss detection (draft-ietf-tcpm-rack-01)
to trigger fast recovery with a reordering timer.
Previously RACK has been running in auxiliary mode where it is
used to detect packet losses once the recovery has triggered by
other algorithms (e.g., FACK). By inspecting packet timestamps,
RACK can start ACK-driven repairs timely. A few similar heuristics
are no longer needed and are either removed or disabled to reduce
the complexity of the Linux TCP loss recovery engine:
1. FACK (Forward Acknowledgement)
2. Early Retransmit (RFC5827)
3. thin_dupack (fast recovery on single DUPACK for thin-streams)
4. NCR (Non-Congestion Robustness RFC4653) (RFC4653)
5. Forward Retransmit
After this change, Linux's loss recovery algorithms consist of
1. Conventional DUPACK threshold approach (RFC6675)
2. RACK and Tail Loss Probe (draft-ietf-tcpm-rack-01)
3. RTO plus F-RTO extension (RFC5682)
The patch set has been tested on Google servers extensively and
presented in several IETF meetings. The data suggests that RACK
successfully improves recovery performance:
https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-draft-ietf-tcpm-rack-01.pdf
https://www.ietf.org/proceedings/96/slides/slides-96-tcpm-3.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:42 +0000 (22:11 -0800)]
tcp: disable fack by default
This patch disables FACK by default as RACK is the successor of FACK
(inspired by the insights behind FACK).
FACK[1] in Linux works as follows: a packet P is deemed lost,
if packet Q of higher sequence is s/acked and P and Q are distant
by at least dupthresh number of packets in sequence space.
FACK is more aggressive than the IETF recommened recovery for SACK
(RFC3517 A Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP), because a single SACK may trigger
fast recovery. This obviously won't work well with reordering so
FACK is dynamically disabled upon detecting reordering.
RACK supersedes FACK by using time distance instead of sequence
distance. On reordering, RACK waits for a quarter of RTT receiving
a single SACK before starting recovery. (the timer can be made more
adaptive in the future by measuring reordering distance in time,
but currently RTT/4 seem to work well.) Once the recovery starts,
RACK behaves almost like FACK because it reduces the reodering
window to 1ms, so it fast retransmits quickly. In addition RACK
can detect loss retransmission as it does not care about the packet
sequences (being repeated or not), which is extremely useful when
the connection is going through a traffic policer.
Google server experiments indicate that disabling FACK after enabling
RACK has negligible impact on the overall loss recovery performance
with more reordering events detected. But we still keep the FACK
implementation for backup if RACK has bugs that needs to be disabled.
[1] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining
TCP Congestion Control," In Proceedings of SIGCOMM '96, August 1996.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:41 +0000 (22:11 -0800)]
tcp: remove thin_dupack feature
Thin stream DUPACK is to start fast recovery on only one DUPACK
provided the connection is a thin stream (i.e., low inflight). But
this older feature is now subsumed with RACK. If a connection
receives only a single DUPACK, RACK would arm a reordering timer
and soon starts fast recovery instead of timeout if no further
ACKs are received.
The socket option (THIN_DUPACK) is kept as a nop for compatibility.
Note that this patch does not change another thin-stream feature
which enables linear RTO. Although it might be good to generalize
that in the future (i.e., linear RTO for the first say 3 retries).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:40 +0000 (22:11 -0800)]
tcp: remove RFC4653 NCR
This patch removes the (partial) implementation of the aggressive
limited transmit in RFC4653 TCP Non-Congestion Robustness (NCR).
NCR is a mitigation to the problem created by the dynamic
DUPACK threshold. With the current adaptive DUPACK threshold
(tp->reordering) could cause timeouts by preventing fast recovery.
For example, if the last packet of a cwnd burst was reordered, the
threshold will be set to the size of cwnd. But if next application
burst is smaller than threshold and has drops instead of reorderings,
the sender would not trigger fast recovery but instead resorts to a
timeout recovery.
NCR mitigates this issue by checking the number of DUPACKs against
the current flight size additionally. The techniqueue is similar to
the early retransmit RFC.
With RACK loss detection, this mitigation is not needed, because RACK
does not use DUPACK threshold to detect losses. RACK arms a reordering
timer to fire at most a quarter RTT later to start fast recovery.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:39 +0000 (22:11 -0800)]
tcp: remove early retransmit
This patch removes the support of RFC5827 early retransmit (i.e.,
fast recovery on small inflight with <3 dupacks) because it is
subsumed by the new RACK loss detection. More specifically when
RACK receives DUPACKs, it'll arm a reordering timer to start fast
recovery after a quarter of (min)RTT, hence it covers the early
retransmit except RACK does not limit itself to specific inflight
or dupack numbers.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:38 +0000 (22:11 -0800)]
tcp: remove forward retransmit feature
Forward retransmit is an esoteric feature in RFC3517 (condition(3)
in the NextSeg()). Basically if a packet is not considered lost by
the current criteria (# of dupacks etc), but the congestion window
has room for more packets, then retransmit this packet.
However it actually conflicts with the rest of recovery design. For
example, when reordering is detected we want to be conservative
in retransmitting packets but forward-retransmit feature would
break that to force more retransmission. Also the implementation is
fairly complicated inside the retransmission logic inducing extra
iterations in the write queue. With RACK losses are being detected
timely and this heuristic is no longer necessary. There this patch
removes the feature.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:37 +0000 (22:11 -0800)]
tcp: extend F-RTO to catch more spurious timeouts
Current F-RTO reverts cwnd reset whenever a never-retransmitted
packet was (s)acked. The timeout can be declared spurious because
the packets acknoledged with this ACK was transmitted before the
timeout, so clearly not all the packets are lost to reset the cwnd.
This nice detection does not really depend F-RTO internals. This
patch applies the detection universally. On Google servers this
change detected 20% more spurious timeouts.
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:36 +0000 (22:11 -0800)]
tcp: enable RACK loss detection to trigger recovery
This patch changes two things:
1. Start fast recovery with RACK in addition to other heuristics
(e.g., DUPACK threshold, FACK). Prior to this change RACK
is enabled to detect losses only after the recovery has
started by other algorithms.
2. Disable TCP early retransmit. RACK subsumes the early retransmit
with the new reordering timer feature. A latter patch in this
series removes the early retransmit code.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:35 +0000 (22:11 -0800)]
tcp: check undo conditions before detecting losses
Currently RACK would mark loss before the undo operations in TCP
loss recovery. This could incorrectly identify real losses as
spurious. For example a sender first experiences a delay spike and
then eventually some packets were lost due to buffer overrun.
In this case, the sender should perform fast recovery b/c not all
the packets were lost.
But the sender may first trigger a (spurious) RTO and reset
cwnd to 1. The following ACKs may used to mark real losses by
tcp_rack_mark_lost. Then in tcp_process_loss this ACK could trigger
F-RTO undo condition and unmark real losses and revert the cwnd
reduction. If there are no more ACKs coming back, eventually the
sender would timeout again instead of performing fast recovery.
The patch fixes this incorrect process by always performing
the undo checks before detecting losses.
Fixes:
4f41b1c58a32 ("tcp: use RACK to detect losses")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:34 +0000 (22:11 -0800)]
tcp: use sequence to break TS ties for RACK loss detection
The packets inside a jumbo skb (e.g., TSO) share the same skb
timestamp, even though they are sent sequentially on the wire. Since
RACK is based on time, it can not detect some packets inside the
same skb are lost. However, we can leverage the packet sequence
numbers as extended timestamps to detect losses. Therefore, when
RACK timestamp is identical to skb's timestamp (i.e., one of the
packets of the skb is acked or sacked), we use the sequence numbers
of the acked and unacked packets to break ties.
We can use the same sequence logic to advance RACK xmit time as
well to detect more losses and avoid timeout.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:33 +0000 (22:11 -0800)]
tcp: add reordering timer in RACK loss detection
This patch makes RACK install a reordering timer when it suspects
some packets might be lost, but wants to delay the decision
a little bit to accomodate reordering.
It does not create a new timer but instead repurposes the existing
RTO timer, because both are meant to retransmit packets.
Specifically it arms a timer ICSK_TIME_REO_TIMEOUT when
the RACK timing check fails. The wait time is set to
RACK.RTT + RACK.reo_wnd - (NOW - Packet.xmit_time) + fudge
This translates to expecting a packet (Packet) should take
(RACK.RTT + RACK.reo_wnd + fudge) to deliver after it was sent.
When there are multiple packets that need a timer, we use one timer
with the maximum timeout. Therefore the timer conservatively uses
the maximum window to expire N packets by one timeout, instead of
N timeouts to expire N packets sent at different times.
The fudge factor is 2 jiffies to ensure when the timer fires, all
the suspected packets would exceed the deadline and be marked lost
by tcp_rack_detect_loss(). It has to be at least 1 jiffy because the
clock may tick between calling icsk_reset_xmit_timer(timeout) and
actually hang the timer. The next jiffy is to lower-bound the timeout
to 2 jiffies when reo_wnd is < 1ms.
When the reordering timer fires (tcp_rack_reo_timeout): If we aren't
in Recovery we'll enter fast recovery and force fast retransmit.
This is very similar to the early retransmit (RFC5827) except RACK
is not constrained to only enter recovery for small outstanding
flights.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:32 +0000 (22:11 -0800)]
tcp: record most recent RTT in RACK loss detection
Record the most recent RTT in RACK. It is often identical to the
"ca_rtt_us" values in tcp_clean_rtx_queue. But when the packet has
been retransmitted, RACK choses to believe the ACK is for the
(latest) retransmitted packet if the RTT is over minimum RTT.
This requires passing the arrival time of the most recent ACK to
RACK routines. The timestamp is now recorded in the "ack_time"
in tcp_sacktag_state during the ACK processing.
This patch does not change the RACK algorithm itself. It only adds
the RTT variable to prepare the next main patch.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:31 +0000 (22:11 -0800)]
tcp: new helper for RACK to detect loss
Create a new helper tcp_rack_detect_loss to prepare the upcoming
RACK reordering timer patch.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 13 Jan 2017 06:11:30 +0000 (22:11 -0800)]
tcp: new helper function for RACK loss detection
Create a new helper tcp_rack_mark_skb_lost to prepare the
upcoming RACK reordering timer support.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Satanand Burla [Fri, 13 Jan 2017 00:18:22 +0000 (16:18 -0800)]
liquidio: use fallback for selecting txq
Remove assignment to ndo_select_queue so that fallback is used for
selecting txq. Also remove the now-useless function that used to be
assigned to ndo_select_queue.
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 12 Jan 2017 23:07:16 +0000 (18:07 -0500)]
net: dsa: mv88e6xxx: add EEPROM support to 6390
The Marvell 6352 chip has a 8-bit address/16-bit data EEPROM access.
The Marvell 6390 chip has a 16-bit address/8-bit data EEPROM access.
This patch implements the 8-bit data EEPROM access in the mv88e6xxx
driver and adds its support to chips of the 6390 family.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jouni Malinen [Thu, 12 Jan 2017 23:12:22 +0000 (01:12 +0200)]
cfg80211: Fix documentation for connect result
The function documentation for cfg80211_connect_bss() and
cfg80211_connect_result() was still claiming that they are used only for
a success case while these functions can now be used to report both
success and various failure cases. The actual use cases were already
described in the connect() documentation.
Update the function specific comments to note the failure cases and also
describe how the special status == -1 case is used in
cfg80211_connect_bss() to indicate a connection timeout based on the
internal implementation in cfg80211_connect_timeout().
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[use tabs for indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Purushottam Kushwaha [Thu, 12 Jan 2017 23:12:21 +0000 (01:12 +0200)]
cfg80211: Specify the reason for connect timeout
This enhances the connect timeout API to also carry the reason for the
timeout. These reason codes for the connect time out are represented by
enum nl80211_timeout_reason and are passed to user space through a new
attribute NL80211_ATTR_TIMEOUT_REASON (u32).
Signed-off-by: Purushottam Kushwaha <pkushwah@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[keep gfp_t argument last]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
vamsi krishna [Thu, 12 Jan 2017 23:12:20 +0000 (01:12 +0200)]
cfg80211: Add support to sched scan to report better BSSs
Enhance sched scan to support option of finding a better BSS while in
connected state. Firmware scans the medium and reports when it finds a
known BSS which has better RSSI than the current connected BSS. New
attributes to specify the relative RSSI (compared to the current BSS)
are added to the sched scan to implement this.
Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
vamsi krishna [Thu, 12 Jan 2017 23:12:19 +0000 (01:12 +0200)]
cfg80211: Add support for randomizing TA of Public Action frames
Add support to use a random local address (Address 2 = TA in transmit
and the same address in receive functionality) for Public Action frames
in order to improve privacy of WLAN clients. Applications fill the
random transmit address in the frame buffer in the NL80211_CMD_FRAME
command. This can be used only with the drivers that indicate support
for random local address by setting the new
NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA and/or
NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA_CONNECTED in ext_features.
The driver needs to configure receive behavior to accept frames to the
specified random address during the time the frame exchange is pending
and such frames need to be acknowledged similarly to frames sent to the
local permanent address when this random address functionality is not
used.
Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 13 Jan 2017 08:31:32 +0000 (09:31 +0100)]
wext: uninline stream addition functions
With 78, 111 and 85 bytes respectively (on x86-64), the
functions iwe_stream_add_event(), iwe_stream_add_point()
and iwe_stream_add_value() really shouldn't be inlines.
It appears that at least my compiler already decided
the same, and created a single instance of each one
of them for each file using it, but that's still a
number of instances in the system overall, which this
reduces.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Eric Dumazet [Thu, 12 Jan 2017 16:50:10 +0000 (08:50 -0800)]
ipv6: sr: static percpu allocation for hmac_ring
Current allocations are not NUMA aware, and lack proper
cleanup in case of error.
It is perfectly fine to use static per cpu allocations for 256 bytes
per cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Lebrun <david.lebrun@uclouvain.be>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 12 Jan 2017 14:53:33 +0000 (15:53 +0100)]
ipmr: improve hash scalability
Recently we started using ipmr with thousands of entries and easily hit
soft lockups on smaller devices. The reason is that the hash function
uses the high order bits from the src and dst, but those don't change in
many common cases, also the hash table is only 64 elements so with
thousands it doesn't scale at all.
This patch migrates the hash table to rhashtable, and in particular the
rhl interface which allows for duplicate elements to be chained because
of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple
duplicate entries to be added with different interfaces (IMO wrong, but
it's been in for a long time).
And here are some results from tests I've run in a VM:
mr_table size (default, allocated for all namespaces):
Before After
49304 bytes 2400 bytes
Add 65000 routes (the diff is much larger on smaller devices):
Before After
1m42s 58s
Forwarding 256 byte packets with 65000 routes (test done in a VM):
Before After
3 Mbps / ~1465 pps 122 Mbps / ~59000 pps
As a bonus we no longer see the soft lockups on smaller devices which
showed up even with 2000 entries before.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Jan 2017 02:10:37 +0000 (18:10 -0800)]
secure_seq: fix sparse errors
Fixes following warnings :
net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
(different base types)
net/core/secure_seq.c:125:28: expected unsigned int const [unsigned]
[usertype] a
net/core/secure_seq.c:125:28: got restricted __be32 [usertype] saddr
net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
(different base types)
net/core/secure_seq.c:125:35: expected unsigned int const [unsigned]
[usertype] b
net/core/secure_seq.c:125:35: got restricted __be32 [usertype] daddr
net/core/secure_seq.c:125:43: warning: cast from restricted __be16
net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
integer
Fixes:
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prasad Kanneganti [Thu, 12 Jan 2017 01:40:27 +0000 (17:40 -0800)]
liquidio VF: reduce load time of module
Reduce the load time of the VF driver by decreasing the wait time between
iterations of the loop that polls for a mailbox response from the PF. Also
change the wait time units from jiffies to milliseconds.
Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 12 Jan 2017 01:09:02 +0000 (17:09 -0800)]
liquidio: remove unnecessary code
Remove code that's no longer needed. It used to serve a purpose, which was
to fix a link-related bug. For a while now, the NIC firmware has had a
more elegant fix for that bug.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 11 Jan 2017 22:52:20 +0000 (14:52 -0800)]
tilepro: Fix non-void return from void function
commit
bc1f44709cf2 ("net: make ndo_get_stats64 a void function")
mistakenly used a return value for this void conversion.
Fix it.
Signed-off-by: Joe Perches <joe@perches.com>
cc: stephen hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Jan 2017 20:05:10 +0000 (15:05 -0500)]
Merge branch 'mdio-gpio-next'
Florian Fainelli says:
====================
net: mdio-gpio: Use modern GPIO helpers
This patch series modernizes the mdio-gpio and makes it switch to the
latest and greatest API for manipulating GPIO lines, thus allowing
some simplifications in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 11 Jan 2017 20:59:51 +0000 (12:59 -0800)]
net: mdio-gpio: Use gpio subsystem to handle low-active pins
gpiod functions support handling low-active pins, so we can move
thos code out of this driver into the gpio subsystem and simplify
the code a bit.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 11 Jan 2017 20:59:50 +0000 (12:59 -0800)]
net: mdio-gpio: Convert to use gpiod functions where possible
Using gpiod functions lets us use functionality which is not available
with gpio functions.
There is no gpiod function to match devm_gpio_request_one, so leave it
in place and use gpio_to_desc() to convert absolute pin numbers to gpio
descriptors.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 11 Jan 2017 20:59:49 +0000 (12:59 -0800)]
net: mdio-gpio: Use devm_gpio_request_one instead of devm_gpio_request
Using devm_gpio_request_one lets us request gpio pins with initial state
in one go.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 12 Jan 2017 13:43:47 +0000 (13:43 +0000)]
cdc-ether: usbnet_cdc_zte_status() can be static
Fixes the following sparse warning:
drivers/net/usb/cdc_ether.c:469:6: warning:
symbol 'usbnet_cdc_zte_status' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 12 Jan 2017 13:10:11 +0000 (05:10 -0800)]
tools: psock_lib: harden socket filter used by psock tests
The filter added by sock_setfilter is intended to only permit
packets matching the pattern set up by create_payload(), but
we only check the ip_len, and a single test-character in
the IP packet to ensure this condition.
Harden the filter by adding additional constraints so that we only
permit UDP/IPv4 packets that meet the ip_len and test-character
requirements. Include the bpf_asm src as a comment, in case this
needs to be enhanced in the future
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 12 Jan 2017 14:39:28 +0000 (14:39 +0000)]
lwt_bpf: bpf_lwt_prog_cmp() can be static
Fixes the following sparse warning:
net/core/lwt_bpf.c:355:5: warning:
symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Jan 2017 15:02:45 +0000 (10:02 -0500)]
Merge branch 's390-qeth-next'
Ursula Braun says:
====================
s390: qeth patches
yesterday I came up with 13 qeth patches. Since you have not been
happy with the 13th patch, I want to make sure that at least the
remaining 12 qeth patches can be applied to net-next. Here is the
resend of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 12 Jan 2017 14:48:43 +0000 (15:48 +0100)]
s390/qeth: fix retrieval of vipa and proxy-arp addresses
qeth devices in layer3 mode need a separate handling of vipa and proxy-arp
addresses. vipa and proxy-arp addresses processed by qeth can be read from
userspace. Introduced with commit
5f78e29ceebf ("qeth: optimize IP handling
in rx_mode callback") the retrieval of vipa and proxy-arp addresses is
broken, if more than one vipa or proxy-arp address are set.
The qeth code used local variable "int i" for 2 different purposes. This
patch now spends 2 separate local variables of type "int".
While touching these functions hash_for_each_safe() is converted to
hash_for_each(), since there is no removal of hash entries.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reference-ID: RQM 3524
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:42 +0000 (15:48 +0100)]
s390/qeth: issue STARTLAN as first IPA command
STARTLAN needs to be the first IPA command after MPC initialization
completes.
So move the qeth_send_startlan() call from the layer disciplines
into the core path, right after the MPC handshake.
While at it, replace the magic LAN OFFLINE return code
with the existing enum.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:41 +0000 (15:48 +0100)]
s390/qeth: shuffle MAC management functions around
Move all MAC utility functions in one place, and drop the
forward declarations.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:40 +0000 (15:48 +0100)]
s390/qeth: extract qeth_l2_remove_mac()
This matches qeth_l2_write_mac().
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:39 +0000 (15:48 +0100)]
s390/qeth: consolidate errno translation
Consolidate errno handling for MAC management: Instead of doing this in every
caller, do it in one place.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Suggested-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:38 +0000 (15:48 +0100)]
s390/qeth: don't convert return code twice
qeth_l2_send_groupmac() already translates the return code, so
calling qeth_setdel_makerc() a second time only produces garbage.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:37 +0000 (15:48 +0100)]
s390/qeth: drop qeth_l2_del_all_macs() parameter
The only caller passes del = 0, so remove both the parameter and
the code that handles != 0.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:36 +0000 (15:48 +0100)]
s390/qeth: Remove QETH_IP_HEADER_SIZE
Remove unused define QETH_IP_HEADER_SIZE.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Thu, 12 Jan 2017 14:48:35 +0000 (15:48 +0100)]
s390/qeth: Allow reading hsuid in state DOWN
Accessing the current hsuid via card->options.hsuid is perfectly
fine, even when the card is DOWN.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Richter [Thu, 12 Jan 2017 14:48:34 +0000 (15:48 +0100)]
s390/qeth: display warning for OSA3 RX/TX checksum offloading
When RX/TX checksum offloading is turned on and the adapter is
an OSA 3 card in layer 3 mode, the checksum offloading is only
performed when both peers use different adapters. If both peers
share an OSA 3 card, communication is a memory copy and
checksum offloading is not performed.
This patch adds a warning to inform the administrator.
OSA 3 in layer 2 mode does not offer the RX/TX checksum
offload feature.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Richter [Thu, 12 Jan 2017 14:48:33 +0000 (15:48 +0100)]
s390/qeth: test RX/TX checksum offload reply
Turning on receive and/or transmit checksum offload support
on the OSA card requires 2 commands:
1. start command which replies with available features
2. enable command to turn on selected features.
The current version does not check the reply of the start
command and simply uses the returned value to enable
offload features. When the start command returns zero, this
leads to a situation where no checksum offload
is turned on by the hardware. Even worse no error
indication is returned. The Linux kernel assumes
the OSA card performs RX/TX checksum offload, but the hardware
does not perform any checksum verification at all.
This patch checks the return of the start and enable
command responses from the hardware and turns off
checksum offloading if the commands fails or does not
respond with the correct bit setting.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Richter [Thu, 12 Jan 2017 14:48:32 +0000 (15:48 +0100)]
s390/qeth: rework RX/TX checksum offload
Rework the RX/TX checksum offloading command sequence to use
the provided function call back mechanims to return card
data to the device driver.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Jan 2017 15:00:31 +0000 (10:00 -0500)]
Merge branch 'bpf-cb-access'
Daniel Borkmann says:
====================
More flexible BPF cb access
This patch improves BPF's cb access by allowing b/h/w/dw
access variants on it. For details, please see individual
patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Jan 2017 10:51:33 +0000 (11:51 +0100)]
bpf: allow b/h/w/dw access for bpf's cb in ctx
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Jan 2017 10:51:32 +0000 (11:51 +0100)]
bpf: pass original insn directly to convert_ctx_access
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Jan 2017 14:47:01 +0000 (09:47 -0500)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fix typo and clc-bug
I received 2 bug reports for my new AF_SMC-code. Here are the fixes for them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 12 Jan 2017 13:57:15 +0000 (14:57 +0100)]
smc: ETH_ALEN as memcpy length for mac addresses
When creating an SMC connection, there is a CLC (connection layer control)
handshake to prepare for RDMA traffic. The corresponding code is part of
commit
0cfdd8f92cac ("smc: connection and link group creation").
Mac addresses to be exchanged in the handshake are copied with a wrong
length of 12 instead of 6 bytes. Following code overwrites the wrongly
copied code, but nevertheless the correct length should already be used for
the preceding mac address copying. Use ETH_ALEN for the memcpy length with
mac addresses.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes:
0cfdd8f92cac ("smc: connection and link group creation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Thu, 12 Jan 2017 13:57:14 +0000 (14:57 +0100)]
net: fix AF_SMC related typo
When introducing the new socket family AF_SMC in
commit
ac7138746e14 ("smc: establish new socket family"),
a typo in af_family_clock_key_strings has slipped in.
This patch repairs it.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes:
ac7138746e14 ("smc: establish new socket family")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Thu, 12 Jan 2017 06:53:21 +0000 (12:23 +0530)]
cxgb4: Initialize mbox lock and list for mgmt dev
Initialize mbox lock and list for mgmt dev to avoid NULL pointer
dereference when cxgb_set_vf_mac is called.
And also allocate memory for private data while allocating mgmt
netdev.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 12 Jan 2017 05:13:02 +0000 (21:13 -0800)]
net: core: Make netif_wake_subqueue a wrapper
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Oct 2016 13:02:32 +0000 (15:02 +0200)]
mac80211: set wifi_acked[_valid] bits for transmitted SKBs
There may be situations in which the in-kernel originator of an
SKB cares about its wifi transmission status. To have that, set
the wifi_acked[_valid] bits before freeing/orphaning the SKB if
the destructor is set. The originator can then use it in there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
David Spinadel [Mon, 21 Nov 2016 14:58:40 +0000 (16:58 +0200)]
mac80211: Add RX flag to indicate ICV stripped
Add a flag that indicates that the WEP ICV was stripped from an
RX packet, allowing the device to not transfer that if it's
already checked.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Arnd Bergmann [Wed, 11 Jan 2017 14:35:25 +0000 (15:35 +0100)]
wext: handle NULL extra data in iwe_stream_add_point better
gcc-7 complains that wl3501_cs passes NULL into a function that
then uses the argument as the input for memcpy:
drivers/net/wireless/wl3501_cs.c: In function 'wl3501_get_scan':
include/net/iw_handler.h:559:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(stream + point_len, extra, iwe->u.data.length);
This works fine here because iwe->u.data.length is guaranteed to be 0
and the memcpy doesn't actually have an effect.
Making the length check explicit avoids the warning and should have
no other effect here.
Also check the pointer itself, since otherwise we get warnings
elsewhere in the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Robert Richter [Wed, 11 Jan 2017 17:04:32 +0000 (18:04 +0100)]
net: thunderx: Make hfunc variable const type in nicvf_set_rxfh()
>From struct ethtool_ops:
int (*set_rxfh)(struct net_device *, const u32 *indir,
const u8 *key, const u8 hfunc);
Change function arg of hfunc to const type.
V2: Fixed indentation.
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 11 Jan 2017 16:32:51 +0000 (16:32 +0000)]
net: thunderx: Fix error return code in nicvf_open()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes:
712c31853440 ("net: thunderx: Program LMAC credits based on MTU")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 11 Jan 2017 16:16:12 +0000 (16:16 +0000)]
sfc: efx_get_phys_port_id() can be static
Fixes the following sparse warning:
drivers/net/ethernet/sfc/efx.c:2337:5: warning:
symbol 'efx_get_phys_port_id' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jan 2017 19:43:39 +0000 (14:43 -0500)]
Merge git://git./linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 11 Jan 2017 19:15:15 +0000 (11:15 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"27 fixes.
There are three patches that aren't actually fixes. They're simple
function renamings which are nice-to-have in mainline as ongoing net
development depends on them."
* akpm: (27 commits)
timerfd: export defines to userspace
mm/hugetlb.c: fix reservation race when freeing surplus pages
mm/slab.c: fix SLAB freelist randomization duplicate entries
zram: support BDI_CAP_STABLE_WRITES
zram: revalidate disk under init_lock
mm: support anonymous stable page
mm: add documentation for page fragment APIs
mm: rename __page_frag functions to __page_frag_cache, drop order from drain
mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
mm: don't dereference struct page fields of invalid pages
mailmap: add codeaurora.org names for nameless email commits
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: pmd dirty emulation in page fault handler
ipc/sem.c: fix incorrect sem_lock pairing
lib/Kconfig.debug: fix frv build failure
mm: get rid of __GFP_OTHER_NODE
mm: fix remote numa hits statistics
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
ocfs2: fix crash caused by stale lvb with fsdlm plugin
...
Linus Torvalds [Wed, 11 Jan 2017 17:52:12 +0000 (09:52 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix rtlwifi crash, from Larry Finger.
2) Memory disclosure in appletalk ipddp routing code, from Vlad
Tsyrklevich.
3) r8152 can erroneously split an RX packet into multiple URBs if the
Rx FIFO is not empty when we suspend. Fix this by waiting for the
FIFO to empty before suspending. From Hayes Wang.
4) Two GRO fixes (enter slow path when not enough SKB tail room exists,
disable frag0 optimizations when there are IPV6 extension headers)
from Eric Dumazet and Herbert Xu.
5) A series of mlx5e bug fixes (do source udp port offloading for
tunnels properly, Ip fragment matching fixes, handling firmware
errors properly when installing TC rules, etc.) from Saeed Mahameed,
Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel
Jurgens.
6) Two VRF fixes from David Ahern (don't skip multipath selection for
VRF paths, disallow VRF to be configured with table ID 0).
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
net: vrf: do not allow table id 0
net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
sctp: Fix spelling mistake: "Atempt" -> "Attempt"
net: ipv4: Fix multipath selection with vrf
cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
gro: use min_t() in skb_gro_reset_offset()
net/mlx5: Only cancel recovery work when cleaning up device
net/mlx5e: Remove WARN_ONCE from adaptive moderation code
net/mlx5e: Un-register uplink representor on nic_disable
net/mlx5e: Properly handle FW errors while adding TC rules
net/mlx5e: Fix kbuild warnings for uninitialized parameters
net/mlx5e: Set inline mode requirements for matching on IP fragments
net/mlx5e: Properly get address type of encapsulation IP headers
net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
net/mlx5e: Warn when rejecting offload attempts of IP tunnels
net/mlx5e: Properly handle offloading of source udp port for IP tunnels
gro: Disable frag0 optimization on IPv6 ext headers
gro: Enter slow-path if there is no tailroom
mlx4: Return EOPNOTSUPP instead of ENOTSUPP
net/af_iucv: don't use paged skbs for TX on HiperSockets
...
Linus Torvalds [Wed, 11 Jan 2017 17:28:13 +0000 (09:28 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a regression in aesni that renders it useless if it's
built-in with a modular pcbc configuration"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aesni - Fix failure when built-in with modular pcbc
David S. Miller [Wed, 11 Jan 2017 16:02:48 +0000 (11:02 -0500)]
Merge branch 'cls_flower-ARP'
Simon Horman says:
====================
net/sched: cls_flower: Support matching ARP
Add support for support matching on ARP operation, and hardware and
protocol addresses for Ethernet hardware and IPv4 protocol addresses.
Changes since RFC:
* None other than dropping RFC designation after positive feedback from Jiri
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 11 Jan 2017 13:05:43 +0000 (14:05 +0100)]
net/sched: cls_flower: Support matching on ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.
Example usage:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \
arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \
arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 11 Jan 2017 13:05:42 +0000 (14:05 +0100)]
flow disector: ARP support
Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.
There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keerthy [Wed, 11 Jan 2017 03:33:29 +0000 (09:03 +0530)]
net: netcp: correct netcp_get_stats function signature
Commit:
bc1f44709cf2 - net: make ndo_get_stats64 a void function
and
Commit:
6a8162e99ef3 - net: netcp: store network statistics in 64 bits.
The commit
6a8162e99ef3 adds ndo_get_stats64 function as per old
signature which causes compilation error:
drivers/net/ethernet/ti/netcp_core.c:1951:28: error:
initialization from incompatible pointer type
.ndo_get_stats64 = netcp_get_stats,
Hence correct netcp_get_stats function signature as per
the latest definition.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Fixes:
6a8162e99ef344fc ("net: netcp: store network statistics in 64 bits")
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 10 Jan 2017 23:22:25 +0000 (15:22 -0800)]
net: vrf: do not allow table id 0
Frank reported that vrf devices can be created with a table id of 0.
This breaks many of the run time table id checks and should not be
allowed. Detect this condition at create time and fail with EINVAL.
Fixes:
193125dbd8eb ("net: Introduce VRF device driver")
Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 10 Jan 2017 23:13:45 +0000 (23:13 +0000)]
net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
When an Marvell 88E1512 PHY is connected to a nic in SGMII mode, the
fiber page is used for the SGMII host-side connection. The PHY driver
notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page
for the link status, and ends up reading the MAC-side status instead of
the outgoing (copper) link. This leads to incorrect results reported
via ethtool.
If the PHY is connected via SGMII to the host, ignore the fiber page.
However, continue to allow the existing power management code to
suspend and resume the fiber page.
Fixes:
6cfb3bcc0641 ("Marvell phy: check link status in case of fiber link.")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 10 Jan 2017 22:53:06 +0000 (22:53 +0000)]
sctp: Fix spelling mistake: "Atempt" -> "Attempt"
Trivial fix to spelling mistake in WARN_ONCE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 10 Jan 2017 22:37:35 +0000 (14:37 -0800)]
net: ipv4: Fix multipath selection with vrf
fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.
Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.
Fixes:
613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jan 2017 14:55:55 +0000 (09:55 -0500)]
Merge branch 'dsa-phys_port_name'
Florian Fainelli says:
====================
net: dsa: Implement ndo_get_phys_port_name()
This patch series implements ndo_get_phys_port_name() so we can revert
ndo_get_phys_id() which was (ab)used in the DSA layer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 10 Jan 2017 20:32:37 +0000 (12:32 -0800)]
Revert "net: dsa: Implement ndo_get_phys_port_id"
This reverts commit
3a543ef479868e36c95935de320608a7e41466ca ("net: dsa:
Implement ndo_get_phys_port_id") since it misuses the purpose of
ndo_get_phys_port_id(). We have ndo_get_phys_port_name() to do the
correct thing for us now.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 10 Jan 2017 20:32:36 +0000 (12:32 -0800)]
net: dsa: Implement ndo_get_phys_port_name()
Return the physical port number of a DSA created network device using
ndo_get_phys_port_name().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 10 Jan 2017 12:08:06 +0000 (13:08 +0100)]
cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:
warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)
I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.
Fixes:
483c4933ea09 ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 9 Jan 2017 23:13:51 +0000 (18:13 -0500)]
net: dsa: make "label" property optional for dsa2
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.
Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.
Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This allows to save DTS files from subjective net device names.
If one wants to rename an interface, udev rules can be used as usual.
Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>