review.tizen.org Git - platform/kernel/linux-starfive.git/log

net: ep93xx_eth: Delete unnecessary checks before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
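
As a userspace analogy only: the C library's free() has the same NULL-tolerant contract that the message describes for kfree(), so the same cleanup can be sketched outside the kernel. The struct and function names below are hypothetical and are not taken from the ep93xx_eth driver.

```c
#include <stdlib.h>

/* Hypothetical driver-private state; illustrative only. */
struct demo_priv {
    void *descs;
};

/* Before: the caller guards the release call with its own NULL test. */
static void demo_release_checked(struct demo_priv *p)
{
    if (p->descs)
        free(p->descs);
    p->descs = NULL;
}

/* After: free(NULL) is defined to do nothing, so the guard can go. */
static void demo_release(struct demo_priv *p)
{
    free(p->descs);
    p->descs = NULL;
}
```

Both helpers behave the same for NULL and non-NULL pointers; the second simply has one branch less.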

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: Use static attribute groups for sysfs entries

Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Use static attribute groups for sysfs entries

Instead of manual calls of device_create_file() and
device_remove_files(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlogic: Deletion of unnecessary checks before two function calls

The functions kfree() and vfree() also perform input parameter validation.
Thus the test around their calls is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-3.20-20150204' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-02-04

this is a pull request of 2 patches for net-next/master.

Nicholas Mc Guire contributes a patch for the janz-ican3 driver to fix
a mismatch in an assignment. Ahmed S. Darwish contributes a patch for
the kvaser_usb driver, to make the driver more robust during the
bus-off handling.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti/cpsw-common.c: fix sparse warning

This patch fixes the following sparse warning:

cpsw-common.c:23:5: warning: symbol 'cpsw_am33xx_cm_get_macid' was not declared. Should it be static?

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: Delete an unnecessary check before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Delete unnecessary checks before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: Delete an unnecessary check before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Delete an unnecessary check before the function call "release_firmware"

The release_firmware() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add low latency socket busy_poll support

cxgb_busy_poll, corresponding to ndo_busy_poll, gets called by the socket
waiting for data.

With busy_poll enabled, latency improves, as observed in netperf TCP_RR
measurements.
Below are the latency numbers, with and without busy-poll, in a switched
environment for a particular msg size:
netperf command: netperf -4 -H <ip> -l 30 -t TCP_RR -- -r1,1
Latency without busy-poll: ~16.25 us
Latency with busy-poll : ~08.79 us

Based on original work by Kumar Sanghvi <kumaras@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging"

This reverts commit 9a05dde59a35eee5643366d3d1e1f43fc9069adb.

Requested by Scott Feldman.

Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: fq: better control of DDOS traffic

FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate millions of flow structures (104 bytes each) pretty
easily. Not only can the host OOM, but lookups in RB trees can take
too much cpu and memory resources.

This patch adds a new attribute, orphan_mask, that makes it possible
to use a stochastic hash for orphaned skbs.

Its default value is 1024 slots, to mimic SFQ behavior.
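
A minimal sketch of the masking idea, assuming the 1024 slots correspond to a mask of 1023 and that some 32-bit packet hash is already available. The helper name and the hash input are hypothetical; the real sch_fq code is not reproduced here.

```c
#include <stdint.h>

/* Assumed default: 1024 slots, expressed as a bit mask. */
#define DEMO_ORPHAN_MASK 1023u

/*
 * Fold a 32-bit packet hash into one of (orphan_mask + 1) buckets.
 * Orphaned packets whose hashes agree in the masked bits share a bucket,
 * which bounds the number of flow structures they can create.
 */
static uint32_t demo_orphan_bucket(uint32_t pkt_hash, uint32_t orphan_mask)
{
    return pkt_hash & orphan_mask;
}
```

With this mask, hashes 5 and 1029 (0x405) land in the same bucket, so random traffic can populate at most 1024 such buckets.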

Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.

This patch also handles the specific case of SYNACK messages:

They are attached to the listener socket, and therefore all map
to a single hash bucket. If the listener has set SO_MAX_PACING_RATE,
hoping to have newly accepted sockets inherit this rate, SYNACKs
might be paced and even dropped.

This is very similar to an internal patch Google has used for more
than one year.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/viro/vfs

More iov_iter work from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not pace pure ack packets

When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.

All TCP had to do was to compute sk->pacing_rate using simple formula:

sk->pacing_rate = 2 * cwnd * mss / rtt

It works well for senders (bulk flows), but not very well for receivers
or even RPC :

cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.
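
To put numbers on the receiver case, here is a small worked sketch of the formula above. The unit choices (bytes per second, RTT in microseconds), the 1460-byte MSS and the 66-byte ACK size are assumptions made for illustration; the helper names are hypothetical.

```c
#include <stdint.h>

/* Pacing rate in bytes per second: 2 * cwnd * mss / rtt (rtt in microseconds). */
static uint64_t demo_pacing_rate(uint32_t cwnd, uint32_t mss, uint32_t rtt_us)
{
    if (rtt_us == 0)
        return UINT64_MAX;  /* no RTT sample: treated as unlimited (assumption) */
    return 2ULL * cwnd * mss * 1000000ULL / rtt_us;
}

/* Time in microseconds that one packet of len bytes occupies at the given rate. */
static uint64_t demo_tx_gap_us(uint32_t len, uint64_t rate)
{
    if (rate == 0)
        return UINT64_MAX;  /* a zero rate never sends */
    return (uint64_t)len * 1000000ULL / rate;
}
```

For cwnd = 10, mss = 1460 and rtt = 100 ms this gives 292000 bytes per second, so a 66-byte pure ACK would be spaced about 226 us from the next packet if it were paced.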

Really, only the sender should pace, according to its own logic.

Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.

Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)

This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.

This has no impact when packets are sent to the loopback interface,
as we do not coalesce ack packets (where we would detect the
skb->truesize lie).

In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.

This patch is a combination of two patches we used for about one year at
Google.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rhashtable-next'

Herbert Xu says:

====================
rhashtable: Add iterators and use them

The first patch fixes a potential crash with nft_hash destroying
the table during a shrinking process, while the next patch adds
rhashtable iterators to replace current manual walks used by
netlink and netfilter. The final two patches make use of these
iterators in netlink and netfilter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: Use rhashtable walk iterator

This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Use rhashtable walk iterator

This patch gets rid of the manual rhashtable walk in netlink
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

In fact the existing code was very buggy. Some sockets weren't
shown at all while others were shown more than once.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Introduce rhashtable_walk_*

Some existing rhashtable users get too intimate with it by walking
the buckets directly. This prevents us from easily changing the
internals of rhashtable.

This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.

They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump. The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Fix potential crash on destroy in rhashtable_shrink

The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.

This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

NetCP: Deletion of unnecessary checks before two function calls

The functions cpsw_ale_destroy() and of_dev_put() test whether their argument
is NULL and then return immediately. Thus the test around the call
is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

IBM-EMAC: Delete an unnecessary check before the function call "of_dev_put"

The of_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next'

Amir Vadai says:

====================
Mellanox drivers updates Feb-03-2015

This patchset introduces some small bug fixes and code cleanups in mlx4_core,
mlx4_en and mlx5_core.
I am sending it in parallel to the patchset sent by Or Gerlitz today [1] because
this is the end of the time frame for 3.20. I also checked that there are no
conflicts between those two patchsets (Or's patchset is focused on the bonding
area while this on Mellanox drivers).

The patchset was applied on top of commit 7d37d0c ('net: sctp: Deletion of an
unnecessary check before the function call "kfree"')

[1] - [PATCH 00/10] Add HA and LAG support to mlx4 RoCE and SRIOV services
http://marc.info/?l=linux-netdev&m=142297582610254&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Notify TX Vlan offload change

Notify users when TX vlan offload feature changed with ethtool.
Relevant command - ethtool -K <eth> txvlan on/off.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Adjust RX frag strides to frag sizes

This patch improves memory utilization and therefore the packet rate
for special MTUs. Instead of setting the frag_stride to the maximal
hard coded frag_size, use the actual frag_size that is set according to
the MTU, when setting the stride of the last frag.
So, for example, for MTU 1600, where the frag_size of the 2nd frag is
86, the frag_stride is set to 128 instead of 4096. See below:

Before:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:4096

frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:8

After:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:128

frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:256
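
The figures above can be reproduced with a little arithmetic. This sketch assumes the last stride is the frag size rounded up to a 64-byte cache line, which is not stated in the message but yields the 128 shown; the helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed alignment unit; the message does not state it. */
#define DEMO_CACHE_LINE 64u

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t demo_align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Number of fragments of the given stride that fit in one allocation. */
static size_t demo_frags_per_alloc(size_t alloc_size, size_t stride)
{
    return alloc_size / stride;
}
```

With a 32768-byte allocation, a 4096-byte stride gives 8 frags while a 128-byte stride gives 256, matching the "Before" and "After" listings for frag 1.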

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Print page allocator information

After initialization of page_alloc, print the actual allocated page
size and the number of frags it contains. Printing is done only when the drv
message level is set on the interface.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Move to use hex PCI device IDs

Align the IDs in the code with the output of tools such as modinfo and lspci -n.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix misleading debug print on CQE stride support

We do support cache line sizes of 32 and 64 bytes without activating the
CQE stride feature. Fix a misleading print saying that these cache line
sizes aren't supported.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: mlx4_config_dev_retrieval() - Initialize struct config_dev before using

Initialize struct config_dev before filling and using it.
This fixes the warning:

warning: config_dev.rx_checksum_val may be used uninitialized in this function

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix mpt_entry initialization in mlx4_mr_rereg_mem_write()

a) Previously, mlx4_mr_rereg_write filled the MPT's start
   and length with the old MPT's values.
   Fixing the initialization to take the new start and length.

b) In addition access flags in mpt_status were initialized instead of
   status due to bad boolean operation. Fixing the operation.

c) Initialization of pd_slave caused a protection error.
   Fix - removing this initialization.

d) In resource_tracker.c: Fixing vf encoding to be one-based.

Fixes: e630664c ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next'

Or Gerlitz says:

====================
Add HA and LAG support to mlx4 RoCE and SRIOV services

This series takes advantage of bonding mlx4 Ethernet devices to support
a model of High-Availability and Link Aggregation for more environments.

The mlx4 driver reacts on netdev events generated by bonding when
slave state changes happen by programming a HW V2P (Virt-to-Phys)
port table. Bonding was extended to expose these state changes
through netdev events.

When an mlx4 interface such as the mlx4 IB/RoCE driver is subject to
this policy, QPs are created over virtual ports which are mapped
to one of the two physical ports. When a failure happens, the
re-programming of the V2P table allows traffic to keep flowing.
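
A toy model of the V2P idea, assuming two physical ports and a simple "move everything to the surviving port" policy. The table layout, names and policy are hypothetical; the actual hardware table and the driver's decisions are not shown in this log.

```c
#define DEMO_NUM_PORTS 2

/* Virtual-to-physical port table; index 0 is unused so ports are 1-based. */
struct demo_v2p {
    int phys[DEMO_NUM_PORTS + 1];
};

/* Start with the identity mapping: virtual port N uses physical port N. */
static void demo_v2p_init(struct demo_v2p *t)
{
    t->phys[0] = 0;
    for (int v = 1; v <= DEMO_NUM_PORTS; v++)
        t->phys[v] = v;
}

/* On failure of one physical port, remap its virtual ports to the other one. */
static void demo_v2p_port_down(struct demo_v2p *t, int failed_phys)
{
    int backup = (failed_phys == 1) ? 2 : 1;

    for (int v = 1; v <= DEMO_NUM_PORTS; v++)
        if (t->phys[v] == failed_phys)
            t->phys[v] = backup;
}
```

QPs keep addressing their virtual port; only the table entry changes, which is why traffic can keep flowing after a failure.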

The mlx4 Ethernet driver interfaces are not subject to this
policy and act as usual.

A 2nd use-case for this model would be to add HA and Link Aggregation
support to single ported mlx4 Ethernet VFs. In this case, the PF Ethernet
interfaces are bonded, all the VFs see single port devices (which is
supported already today), and VF QPs are subject to V2P.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Load balance ports in port aggregation mode

When the mlx4 IB (RoCE) device works in link aggregation mode, it
exposes a single port to upper layers. Therefore, applications always
set '1' in port_num attribute when modifying a QP or creating an address handle.

To make sure that a node uses all available ports the mlx4 driver will
override the port_num attribute with a round robin policy.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Create mirror flows in port aggregation mode

In port aggregation mode flows for port #1 (the only port) should be mirrored
on port #2. This is because packets can arrive from either physical port.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Add port aggregation support

Register the interface with the mlx4 core driver with port aggregation support
and check for port aggregation mode when the 'add' function is called.

In this mode, only one physical port is reported to the upper layer
(RoCE/IB core stack and ULPs).

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Reuse mlx4_mac_to_u64()

This function is implemented twice... get rid of one copy.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Port aggregation configuration

Capture NETDEV events generated by the bonding driver and based on that
make decisions of how to configure port aggregation in the mlx4 core driver.

This includes setting the V2P port table and re-creating the interested
interfaces in bonded/non-bonded mode.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Port aggregation upper layer interface

Supply interface functions to bond and unbond ports of mlx4 internal
interfaces. An example of such an interface is the one registered by the
mlx4 IB driver under RoCE.

There are:

1. Functions to go in/out to/from bonded mode
2. Function to remap virtual ports to physical ports

The bond_mutex prevents simultaneous access to data that keep status of
the device in bonded mode.

Upper mlx4 interfaces indicate to the mlx4 core module that they
want to be subject to such bonding by setting the MLX4_INTFF_BONDING
flag. An interface which goes to/from bonded mode is re-created.

The mlx4 Ethernet driver does not set this flag when registering the
interface, the IB driver does.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Port aggregation low level interface

Implement the hardware interface required for port aggregation.

1. Disable RX port check on receive - don't perform the validity check
that matches the QP's port against the port where the packet is received.

2. Virtual to physical port remap - configure virtual to physical port
mapping. Port remap capability for virtual functions.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/bonding: Notify state change on slaves

Use notifier chain to dispatch an event upon a change in slave state.
Event is dispatched with slave specific info.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/bonding: Move slave state changes to a helper function

Move slave state changes to a helper function, this is a pre-step for adding
functionality of dispatching an event when this helper is called.

This commit doesn't add new functionality.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: Add event for a change in slave state

Add event which provides an indication on a change in the state
of a bonding slave. The event handler should cast the pointer to the
appropriate type (struct netdev_bonding_info) in order to get the
full info about the slave.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: some small fixes

During extensive testing and analysis of running dual links between
nodes, we have encountered some issues that potentially may cause
problems. We choose to fix those proactively in this series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: separate link starting event from link timeout event

When a new link instance is created, it is triggered to start by
sending it a TIPC_STARTING_EVT, whereafter a regular link
reset is applied to it.

The starting event is codewise treated as a timeout event, and prompts
a link RESET message to be sent to the peer node, carrying a link
session identifier. The later link_reset() call nudges this session
identifier, whereafter all subsequent RESET messages will be sent out
with the new identifier. The latter session number overrides the former,
causing the peer to unconditionally accept it irrespective of its
current working state.

We don't think that this causes any problem, but it is not in accordance
with the protocol spec, and may cause confusion when debugging TIPC
sessions.

To avoid this, we make the starting event distinct from the
subsequent timeout events, by not allowing the former to send
out any RESET message. This eliminates the described problem.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate race during node creation

Instances of struct tipc_node are created in the function tipc_disc_rcv()
under the assumption that there is no race between received discovery
messages arriving from the same node. This assumption is wrong.
When we use more than one bearer, it is possible that discovery
messages from the same node arrive at the same moment, resulting in
creation of two instances of struct tipc_node. This may later cause
confusion during link establishment, and may result in one of the links
never becoming activated.

We fix this by making lookup and potential creation of nodes atomic.
Instead of first looking up the node, and in case of failure, create it,
we now start with looking up the node inside node_link_create(), and
return a reference to that one if found. Otherwise, we go ahead and
create the node as we did before.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: avoid stale link after aborted failover

During link failover it may happen that the remaining link goes
down while it is still in the process of taking over traffic
from a previously failed link. When this happens, we currently
abort the failover procedure and reset the first failed link to
non-failover mode, so that it will be ready to re-establish
contact with its peer when it becomes available.

However, if the first link goes down because its bearer was manually
disabled, it is not enough to reset it; it must also be deleted;
which is supposed to happen when the failover procedure is finished.
Otherwise it will remain a zombie link: attached to the owner node
structure, in mode LINK_STOPPED, and permanently blocking any re-
establishing of the link to the peer via the interface in question.

We fix this by amending the failover abort procedure. Apart from
resetting the link to non-failover state, we test if the link is
also in LINK_STOPPED mode. If so, we delete it, using the conditional
tipc_link_delete() function introduced in the previous commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add reference count to struct tipc_link

When a bearer is disabled, all pertaining links will be reset and
deleted. However, if there is a second active link towards a killed
link's destination, the delete has to be postponed until the failover
is finished. During this interval, we currently put the link in zombie
mode, i.e., we take it out of traffic, delete its timer, but leave it
attached to the owner node structure until all missing packets have
been received. When this is done, we detach the link from its node
and delete it, assuming that the synchronous timer deletion that was
initiated earlier in a different thread has finished.

This is unsafe, as the failover may finish before del_timer_sync()
has returned in the other thread.

We fix this by adding an atomic reference counter of type kref in
struct tipc_link. The counter keeps track of the references kept
to the link by the owner node and the timer. We then do a conditional
delete, based on the reference counter, both after the failover has
been finished and when the timer expires, if applicable. Whoever
comes last, will actually delete the link. This approach also implies
that we can make the deletion of the timer asynchronous.
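
The "whoever comes last deletes" rule can be sketched in userspace with C11 atomics standing in for kref. The struct, the helper names and the test-hook flag are hypothetical; this is not the tipc_link code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link object with a reference counter. */
struct demo_link {
    atomic_int refcnt;
    bool *freed_flag;   /* observation hook: set when the link is released */
};

/* Create a link holding one reference (the owner node's). */
static struct demo_link *demo_link_create(bool *freed_flag)
{
    struct demo_link *l = malloc(sizeof(*l));

    if (!l)
        return NULL;
    atomic_init(&l->refcnt, 1);
    l->freed_flag = freed_flag;
    return l;
}

/* Take an extra reference, e.g. on behalf of a running timer. */
static void demo_link_get(struct demo_link *l)
{
    atomic_fetch_add(&l->refcnt, 1);
}

/* Drop a reference; returns true if this call released the link. */
static bool demo_link_put(struct demo_link *l)
{
    if (atomic_fetch_sub(&l->refcnt, 1) != 1)
        return false;
    if (l->freed_flag)
        *l->freed_flag = true;
    free(l);
    return true;
}
```

If the owner node and the timer each hold one reference, the first put (say, when the failover finishes) leaves the link alive and the second put (when the timer expires) frees it, regardless of which of the two happens first.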

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git./linux/kernel/git/jberg/mac80211-next

Last round of updates for net-next:
* revert a patch that caused a regression with mesh userspace (Bob)
* fix a number of suspend/resume related races
(from Emmanuel, Luca and myself - we'll look at backporting later)
* add software implementations for new ciphers (Jouni)
* add a new ACPI ID for Broadcom's rfkill (Mika)
* allow using netns FD for wireless (Vadim)
* some other cleanups (various)

Signed-off-by: David S. Miller <davem@davemloft.net>

csiostor: Use firmware version from cxgb4/t4fw_version.h

This patch is to use firmware version macros from t4fw_version.h
and also enables 40g T5 adapter.

Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tlan: msecs_to_jiffies conversion

This is only an API consolidation and should make things more readable.
It replaces var * HZ / 1000 by msecs_to_jiffies(var).

As there is a discrepancy between the code and the comments this is in
a separate patch.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tlan: use msecs_to_jiffies for conversion

This is only an API consolidation and should make things more readable.
It replaces var * HZ / 1000 by msecs_to_jiffies(var).
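
For readers comparing the two forms, a userspace model follows. It assumes a tick rate of 250 and models only the round-up behaviour of the kernel helper for tick rates that divide 1000; clamping of very large values is not modelled, and the names are hypothetical.

```c
/* Assumed tick rate for the example; must divide 1000 in this model. */
#define DEMO_HZ 250u
#define DEMO_MSEC_PER_SEC 1000u

/* The open-coded form from the old code: integer division truncates. */
static unsigned long demo_open_coded(unsigned long ms)
{
    return ms * DEMO_HZ / DEMO_MSEC_PER_SEC;
}

/* Model of the helper: round up so the wait is never shorter than requested. */
static unsigned long demo_msecs_to_jiffies(unsigned long ms)
{
    unsigned long ms_per_tick = DEMO_MSEC_PER_SEC / DEMO_HZ;

    return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

For 10 ms at this tick rate the open-coded form yields 2 ticks and the helper model yields 3; for values that are whole multiples of the tick length, such as 1000 ms, both yield the same result.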

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-02-03

Here's what's likely the last bluetooth-next pull request for 3.20.
Notable changes include:

- xHCI workaround + a new id for the ath3k driver
- Several new ids for the btusb driver
- Support for new Intel Bluetooth controllers
- Minor cleanups to ieee802154 code
- Nested sleep warning fix in socket accept() code path
- Fixes for Out of Band pairing handling
- Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Improvements to data we expose through debugfs
- Proper handling of Hardware Error HCI events

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb functions to process remote checksum offload

This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.

Updated vxlan and gue to use these functions.

Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging

When 'learned_sync' flag is turned on, the offloaded switch
port syncs learned MAC addresses to bridge's FDB via switchdev notifier
(NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
wrongly being deleted by bridge aging logic. This patch ensures that FDB
entries synced from offloaded switch ports are not deleted by bridging logic.
Such entries can only be deleted via switchdev notifier
(NETDEV_SWITCH_FDB_DEL).

Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: Implement NETIF_F_SG feature

Freescale ethernet controllers have the capability to re-assemble fragmented
data into a single ethernet frame. This patch uses this capability and
implements the NETIF_F_SG feature in the fs_enet ethernet driver.

On a MPC885, I get 53% performance improvement on a ftp transfer of a 15Mb file:
* Without the patch : 2.8 Mbps
* With the patch : 4.3 Mbps

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

xps: fix xps for stacked devices

A typical qdisc setup is the following :

bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc

XPS allows spreading packets on specific tx queues, based on the cpu
doing the send.

Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.

CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0

CPUB -> dequeue packet P1 from bond0
enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)

get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.

P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)

Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)

To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.

We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)
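
A sketch of the record-once idea with the shared storage. It assumes the CPU is stored as cpu + 1 so that zero can mean "not yet recorded"; that encoding, the struct and the helper names are illustrative assumptions rather than a copy of the patch.

```c
#include <stdint.h>

/* Hypothetical packet descriptor: one slot shared by the rx and tx paths. */
struct demo_skb {
    union {
        uint32_t napi_id;     /* meaningful on the receive path */
        uint32_t sender_cpu;  /* meaningful on the transmit path; 0 = unset */
    };
};

/*
 * Record the sending CPU only the first time the packet is transmitted,
 * and return the recorded CPU. Later calls from other CPUs (for example
 * when a stacked device requeues the packet) see the original value.
 */
static unsigned int demo_record_sender_cpu(struct demo_skb *skb,
                                           unsigned int this_cpu)
{
    if (skb->sender_cpu == 0)
        skb->sender_cpu = this_cpu + 1;
    return skb->sender_cpu - 1;
}

/* Clear the shared slot when the packet changes context. */
static void demo_scrub(struct demo_skb *skb)
{
    skb->sender_cpu = 0;
}
```

A packet first sent from CPU 3 keeps reporting CPU 3 even if a later transmit step runs on CPU 5, which is what lets queue selection stay stable across the bond0 to eth1/eth2 hop.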

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: kvaser_usb: Ignore spurious error events after a busoff

Sending data at high speed and then introducing a busoff results
in spurious BUS_ERROR events from the USBCan-II firmware directly
_after_ the triggered BUS_OFF event.

In the current CAN state handling code, this will lead to an
invalid can state of ACTIVE, ERROR, or PASSIVE even though the
CAN controller has been already shut down due to the busoff.

Guard the state handling code from such invalid events.

Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: fix type mismatch in assignment

The return type of wait_for_completion_timeout() is unsigned long, not int; this patch
removes the type mismatch by moving the call into the condition.
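
A minimal user-space sketch of the pattern, assuming a hypothetical
fake_wait_for_completion_timeout() that mimics the "0 on timeout, remaining
time otherwise" convention. It is not the driver code; it only shows the
result being tested in the condition instead of being stored in an int.

```c
/* Hypothetical stand-in: returns 0 on timeout, else the remaining time. */
unsigned long fake_wait_for_completion_timeout(unsigned long remaining)
{
    return remaining;
}

/* Returns 0 on success, -1 on timeout. */
int wait_checked(unsigned long remaining)
{
    /* The unsigned long result is tested directly and never narrowed to int. */
    if (!fake_wait_for_completion_timeout(remaining))
        return -1;
    return 0;
}
```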

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

vhost: vhost_scsi_handle_vq() should just use copy_from_user()

it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vhost: don't bother with copying iovec in handle_tx()

just advance the msg.msg_iter and be done with that.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

net: switch sockets to ->read_iter/->write_iter

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

net/socket.c: fold do_sock_{read,write} into callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

crypto: switch af_alg_make_sg() to iov_iter

With that, all ->sendmsg() instances are converted to iov_iter primitives
and are agnostic wrt the kind of iov_iter they are working with.
So's the last remaining ->recvmsg() instance that wasn't kind-agnostic yet.
All ->sendmsg() and ->recvmsg() advance ->msg_iter by the amount actually
copied and none of them modifies the underlying iovec, etc.

Cc: linux-crypto@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

net: bury net/core/iovec.c - nothing in there is used anymore

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

tipc: tipc ->sendmsg() conversion

This one needs to copy the same data from user potentially more than
once. Sadly, MTU changes can trigger that ;-/

Cc: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter()

That takes care of the majority of ->sendmsg() instances - most of them
via memcpy_to_msg() or assorted getfrag() callbacks. One place where we
still keep memcpy_fromiovecend() is tipc - there we potentially read the
same data over and over; separate patch, that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ip: convert tcp_sendmsg() to iov_iter primitives

patch is actually smaller than it seems to be - most of it is unindenting
the inner loop body in tcp_sendmsg() itself...

the bit in tcp_input.c is going to get reverted very soon - that's what
memcpy_from_msg() will become, but not in this commit; let's keep it
reasonably contained...

There's one potentially subtle change here: in case of short copy from
userland, mainline tcp_send_syn_data() discards the skb it has allocated
and falls back to normal path, where we'll send as much as possible after
rereading the same data again. This patch trims SYN+data skb instead -
that way we don't need to copy from the same place twice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ip: stash a pointer to msghdr in struct ping_fakehdr

... instead of storing its ->msg_iter.iov there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rxrpc: make the users of rxrpc_kernel_send_data() set kvec-backed msg_iter properly

Use iov_iter_kvec() there, get rid of set_fs() games - now that
rxrpc_send_data() uses iov_iter primitives, it'll handle ITER_KVEC just
fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rxrpc: switch rxrpc_send_data() to iov_iter primitives

Convert skb_add_data() to iov_iter; this allows us to get rid of the explicit
messing with iovec in its only caller - skb_add_data() will keep advancing
->msg_iter for us, so there's no need to simulate that manually.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vmci: propagate msghdr all way down to __qp_memcpy_to_queue()

Switch from passing msg->msg_iter.iov to passing msg itself

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ipv6: rawv6_send_hdrinc(): pass msghdr

Switch from passing msg->msg_iter.iov to passing msg itself

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ipv4: raw_send_hdrinc(): pass msghdr

Switch from passing msg->msg_iter.iov to passing msg itself

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

netlink: make the check for "send from tx_ring" deterministic

As it is, zero msg_iovlen means that the first iovec in the kernel
array of iovecs is left uninitialized, so checking if its ->iov_base
is NULL is random. Since the real users of that thing are doing
sendto(fd, NULL, 0, ...), they are getting msg_iovlen = 1 and
msg_iov[0] = {NULL, 0}, which is what this test is trying to catch.
As suggested by davem, let's just check that msg_iovlen was 1 and
msg_iov[0].iov_base was NULL - _that_ is well-defined and it catches
what we want to catch.
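
A user-space sketch of the well-defined form of the test, using the
userland struct msghdr rather than the kernel's copy. The helper name
looks_like_tx_ring_send() is hypothetical. The length is checked first, so
msg_iov[0] is never read when no iovec was supplied.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* True only for the sendto(fd, NULL, 0, ...) shape: one iovec, NULL base. */
int looks_like_tx_ring_send(const struct msghdr *msg)
{
    return msg->msg_iovlen == 1 && msg->msg_iov[0].iov_base == NULL;
}
```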

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'netlabel-next'

Markus Elfring says:

====================
netlabel: Deletion of a few unnecessary checks

Further update suggestions were taken into account after patches were applied
from static source code analysis.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: Less function calls in netlbl_mgmt_add_common() after error detection

The functions "cipso_v4_doi_putdef" and "kfree" could be called in some cases
by the netlbl_mgmt_add_common() function during error handling even if the
passed variables still contained a null pointer.

* This implementation detail could be improved by adjustments for jump labels.

* Let us return immediately after the first failed function call according to
the current Linux coding style convention.

* Let us delete also an unnecessary check for the variable "entry" there.
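
The jump label adjustment can be sketched in user-space C as follows. The
struct and the fail_at parameter are hypothetical and exist only to make
each error path reachable; this is not the netlabel code. The first failure
returns immediately, and each later failure jumps to a label that undoes
only what has already been set up.

```c
#include <stdlib.h>

struct entry {
    char *domain;
    int *map;
};

/* fail_at selects which step fails (1, 2 or 3); 0 means no forced failure. */
struct entry *entry_create(int fail_at)
{
    struct entry *e = (fail_at == 1) ? NULL : calloc(1, sizeof(*e));

    if (!e)
        return NULL;            /* nothing allocated yet: return right away */

    e->domain = (fail_at == 2) ? NULL : malloc(16);
    if (!e->domain)
        goto free_entry;        /* only the entry itself needs to be undone */

    e->map = (fail_at == 3) ? NULL : malloc(sizeof(int));
    if (!e->map)
        goto free_domain;

    return e;

free_domain:
    free(e->domain);
free_entry:
    free(e);
    return NULL;
}

void entry_destroy(struct entry *e)
{
    if (!e)
        return;                 /* guard needed here: e is dereferenced below */
    free(e->map);
    free(e->domain);
    free(e);
}
```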

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: Deletion of an unnecessary check before the function call "cipso_v4_doi_free"

The cipso_v4_doi_free() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: Deletion of an unnecessary check before the function call "cipso_v4_doi_putdef"

The cipso_v4_doi_putdef() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: Document supported compatibles

The device tree binding(s) document has fallen out of sync with the
driver code. Update the list of supported devices to reflect current
driver capabilities

Change-Id: I440d8de2ee2d9c3b7b23e69b3da851cab18a4c9a
Signed-off-by: Shruti Kanetkar <Kanetkar.Shruti@gmail.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rfkill: Add Broadcom BCM2E40 bluetooth ACPI ID

This is yet another Broadcom bluetooth chip with ACPI ID BCM2E40.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Bluetooth: Fix potential NULL dereference

The bnep_get_device function may be triggered by an ioctl just after a
connection has gone down. In such a case the respective L2CAP chan->conn
pointer will get set to NULL (by l2cap_chan_del). This patch adds a
missing NULL check for this case in the bnep_get_device() function.

Reported-by: Patrik Flykt <patrik.flykt@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: btusb: Add support for Lite-On (04ca) Broadcom based, BCM43142

Please add support for sub BT chip on the combo card
Broadcom 43142A0 (in Lenovo E145), 04ca:2007

/sys/kernel/debug/usb/devices

T:  Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=2007 Rev= 1.12
S:  Manufacturer=Broadcom Corp
S:  Product=BCM43142A0
S:  SerialNumber=28E347EC73BD
C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=84(I) Atr=02(Bulk) MxPS=  32 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=  32 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)

Firmware for 04ca:2007 can be extracted from the latest Lenovo E145
Bluetooth driver for Windows (driver is however described as BCM20702
but also contains firmware for BCM43142).
Search for BCM43142A0_001.001.011.0122.0153.hex within hex files, then
it must be converted using hex2hcd utility. Rename file to
BCM43142A0-04ca-2007.hcd, then move to /lib/firmware/brcm/.

Signed-off-by: Matej Dubovy <matej.dubovy@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org

net: sctp: Deletion of an unnecessary check before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
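
A user-space analogy, assuming the standard C free() in place of kfree():
free(NULL) is defined to do nothing, so the guard can be dropped in the same
way. The struct ctx and ctx_release() names are hypothetical.

```c
#include <stdlib.h>

struct ctx {
    char *buf;
};

void ctx_release(struct ctx *c)
{
    /* No "if (c->buf)" guard: free(NULL) is a no-op. */
    free(c->buf);
    c->buf = NULL;
}
```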

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udpv6_lockless_send'

Vladislav Yasevich says:

====================
ipv6: Add lockless UDP send path

This series introduces a lockless UDPv6 send path similar to
what Herbert Xu did for IPv4 a while ago.

There are some differences from IPv4. IPv6 caching for the flow
label is a bit different, and it requires another
cork structure that holds the IPv6 ancillary data.

Please take a look.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Allow for partial checksums on non-ufo packets

Currently, if we are not doing UFO on the packet, all UDP
packets will start with CHECKSUM_NONE and thus perform full
checksum computations in software even if the device supports
IPv6 checksum offloading.

Let's start with CHECKSUM_PARTIAL if the device
supports it and we are sending only a single packet at
or below mtu size.
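
The decision described above can be sketched as a small predicate. The enum,
the function name and the parameters are hypothetical and simplified; the
real condition in the IPv6 output path looks at more state than this.

```c
#include <stdbool.h>
#include <stddef.h>

enum csum_mode { CSUM_NONE, CSUM_PARTIAL };

/*
 * Start with partial checksums only when the device can finish them
 * and the whole datagram fits in a single packet of at most mtu bytes.
 */
enum csum_mode pick_csum_mode(bool dev_has_ipv6_csum, size_t length, size_t mtu)
{
    if (dev_has_ipv6_csum && length <= mtu)
        return CSUM_PARTIAL;
    return CSUM_NONE;           /* full checksum is computed in software */
}
```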

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udpv6: Add lockless sendmsg() support

This commit adds the same functionality to IPv6 that
commit 903ab86d195cca295379699299c5fc10beba31c7
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Mar 1 02:36:48 2011 +0000

udp: Add lockless transmit path

added to IPv4.

UDP transmit path can now run without a socket lock,
thus allowing multiple threads to send to a single socket
more efficiently.
This is only used when corking/MSG_MORE is not used.
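
As a rough sketch of that last condition, with hypothetical names: the
lockless path is taken only when neither the cork option on the socket nor
MSG_MORE on this particular send asks for corking.

```c
#include <stdbool.h>

enum send_path { PATH_LOCKLESS, PATH_CORKED };

/* cork_option_set: corking enabled on the socket; msg_more: MSG_MORE on this send. */
enum send_path choose_send_path(bool cork_option_set, bool msg_more)
{
    if (cork_option_set || msg_more)
        return PATH_CORKED;     /* data is appended under the socket lock */
    return PATH_LOCKLESS;       /* one skb is built and sent without the lock */
}
```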

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Introduce udpv6_send_skb()

Now that we can individually construct IPv6 skbs to send, add a
udpv6_send_skb() function to populate the udp header and send the
skb. This allows udp_v6_push_pending_frames() to re-use this
function as well as enables us to add lockless sendmsg() support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: introduce ipv6_make_skb

This commit is very similar to
commit 1c32c5ad6fac8cee1a77449f5abf211e911ff830
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:47 2011 +0000

    inet: Add ip_make_skb and ip_finish_skb

It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.

The job of ip6_make_skb is to collect messages into an ipv6 packet
and populate the ipv6 header.  The job of ip6_finish_skb is to transmit
the generated skb.  Together they replicate the job of
ip6_push_pending_frames() while also providing the capability to be
called independently.  This will be needed to add lockless UDP sendmsg
support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Append sending data to arbitrary queue

Add the ability to append data to arbitrary queue. This
will be needed later to implement lockless UDP sends.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: pull cork initialization into its own function.

Pull IPv6 cork initialization into its own function that
can be re-used.  IPv6 specific cork data did not have an
explicit data structure.  This patch creates one so that
just the ipv6 cork data can be passed as arguments.  Also, since
IPv6 tries to save the flow label into the inet_cork_full
structure, pass the full cork.

Adjust ip6_cork_release() to take cork data structures.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4 : Improve IEEE DCBx support, other minor open-lldp fixes

* Add support for IEEE ets & pfc api.
* Fix bug that resulted in incorrect bandwidth percentage being returned for
CEE peers
* Convert pfc enabled info from firmware format to what dcbnl expects before
returning

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tulip: don't warn about unknown ARM architecture

ARM has 32-byte cache lines, which according to the comment in
the init registers function seems to work best with the default
value of 0x4800 that is also used on sparc and parisc.

This adds ARM to the same list, to use that default but no
longer warn about it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hip04: add missing MODULE_LICENSE

The hip04 ethernet driver causes a new compile-time warning
when built as a loadable module:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/hisilicon/hip04_eth.o
see include/linux/module.h for more information

This adds the license as "GPL", which matches the header of the file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dctcp: loosen requirement to assert ECT(0) during 3WHS

One deployment requirement of DCTCP is to be able to run
in a DC setting along with TCP traffic. As Glenn Judd's
NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls
of TCP in the Datacenter" [1] (tba) explains, one way to
solve this on switch side is to split DCTCP and TCP traffic
in two queues per switch port based on the DSCP: one queue
solely intended for DCTCP traffic and one for non-DCTCP traffic.

For the DCTCP queue, there's the marking threshold K as
explained in commit e3118e8359bb ("net: tcp: add DCTCP congestion
control algorithm") for RED marking ECT(0) packets with CE.
For the non-DCTCP queue, there's f.e. a classic tail drop queue.
As already explained in e3118e8359bb, running DCTCP at scale
when not marking SYN/SYN-ACK packets with ECT(0) has severe
consequences as for non-ECT(0) packets, traversing the RED
marking DCTCP queue will result in a severe reduction of
connection probability.

This is due to the DCTCP queue being dominated by ECT(0) traffic
and switches handle non-ECT traffic in the RED marking queue
after passing K as drops, where K is usually a low watermark
in order to leave enough tailroom for bursts. Splitting DCTCP
traffic among several queues (ECN and non-ECN queue) is being
considered a terrible idea in the network community as it
splits single flows across multiple network paths.

Therefore, commit e3118e8359bb implements this on Linux as
ECT(0) marked traffic, as we argue that marking all packets
of a DCTCP flow is the only viable solution and also doesn't
speak against the draft.

However, a DCTCP implementation for FreeBSD recently also landed in
their mainline kernel [2]. In order to let it play well
together with Linux' DCTCP, we need to loosen the
requirement that ECT(0) has to be asserted during the 3WHS, as
this is not implemented in FreeBSD. This simplifies the ECN test and
lets DCTCP work together with FreeBSD.

Joint work with Daniel Borkmann.

[1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
[2] https://github.com/freebsd/freebsd/commit/8ad879445281027858a7fa706d13e458095b595f

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-timestamp'

Willem de Bruijn says:

====================
net-timestamp: blinding

Changes
  (v2 -> v3)
  - rebase only: v2 did not make it to patchwork / netdev
  (v1 -> v2)
  - fix capability check in patch 2
      this could be moved into net/core/sock.c as sk_capable_nouser()
  (rfc -> v1)
  - dropped patch 4: timestamp batching
      due to complexity, as discussed
  - dropped patch 5: default mode
      because it does not really cover all use cases, as discussed
  - added documentation
  - minor fix, see patch 2

Two issues were raised during recent timestamping discussions:
1. looping full packets on the error queue exposes packet headers
2. TCP timestamping with retransmissions generates many timestamps

This RFC patchset is an attempt at addressing both without breaking
legacy behavior.

Patch 1 reintroduces the "no payload" timestamp option, which loops
timestamps onto an empty skb. This reduces the pressure on SO_RCVBUF
from looping many timestamps. It does not reduce the number of recv()
calls needed to process them. The timestamp cookie mechanism developed
in http://patchwork.ozlabs.org/patch/427213/ did, but this is
considerably simpler.

Patch 2 then gives administrators the power to block all timestamp
requests that contain data by unprivileged users. I proposed this
earlier as a backward compatible workaround in the discussion of

  net-timestamp: pull headers for SOCK_STREAM
  http://patchwork.ozlabs.org/patch/414810/

Patch 3 only updates the txtimestamp example to test this option.
Verified that with option '-n', length is zero in all cases and
option '-I' (PKTINFO) stops working.
====================

Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: no-payload option in txtimestamp test

Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: no-payload only sysctl

Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
option SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamps with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.
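
The policy can be summarised with a small predicate. The function and
parameter names are hypothetical and this is only a sketch of the rule as
described here, not the kernel implementation.

```c
#include <stdbool.h>

/*
 * allow_data:      the sysctl permits looping timestamps together with data
 * tsonly:          the socket asked for the no-payload option
 * has_cap_net_raw: the socket was opened with CAP_NET_RAW
 */
bool may_loop_tx_timestamp(bool allow_data, bool tsonly, bool has_cap_net_raw)
{
    if (allow_data || tsonly)
        return true;            /* nothing sensitive is looped, or policy allows it */
    return has_cap_net_raw;     /* privileged sockets are not affected */
}
```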

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

Changes
  (v1 -> v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -> v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: no-payload option

Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
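
From user space, the option is requested through SO_TIMESTAMPING like the
other SOF_TIMESTAMPING_* flags. The helper name below is hypothetical, and
the choice of software tx timestamps is an assumption of this sketch; it
needs kernel headers that already define SOF_TIMESTAMPING_OPT_TSONLY.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Ask for software tx timestamps that are looped back without payload. */
int enable_tsonly_tx_timestamps(int fd)
{
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE |   /* generate tx timestamps */
                SOF_TIMESTAMPING_SOFTWARE |      /* report software timestamps */
                SOF_TIMESTAMPING_OPT_TSONLY;     /* loop them on an empty skb */

    /* Returns 0 on success, -1 with errno set on failure. */
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

The timestamps are then read from the socket error queue with
recvmsg(..., MSG_ERRQUEUE), where the returned data length is expected to
be zero.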

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: Remove mgmt_rp_read_local_oob_ext_data struct

This extended return parameters struct conflicts with the new Read Local
OOB Extended Data command definition. To avoid the conflict simply
rename the old "extended" version to the normal one and update the code
appropriately to take into account the two possible response PDU sizes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>