platform/kernel/linux-exynos.git
8 years agonet: napi_hash_del() returns a boolean status
Eric Dumazet [Wed, 18 Nov 2015 14:31:02 +0000 (06:31 -0800)]
net: napi_hash_del() returns a boolean status

napi_hash_del() will soon be used from both drivers (if they want)
or core networking stack.

Callers are responsibles to ensure an RCU grace period is respected
before freeing napi structure : napi_hash_del() can signal if
this RCU grace period is needed or not.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: move napi_hash[] into read mostly section
Eric Dumazet [Wed, 18 Nov 2015 14:31:01 +0000 (06:31 -0800)]
net: move napi_hash[] into read mostly section

We do not often add/delete a napi context.
Moving napi_hash[] into read mostly section avoids potential false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add netif_tx_napi_add()
Eric Dumazet [Wed, 18 Nov 2015 14:31:00 +0000 (06:31 -0800)]
net: add netif_tx_napi_add()

netif_tx_napi_add() is a variant of netif_napi_add()

It should be used by drivers that use a napi structure
to exclusively poll TX.

We do not want to add this kind of napi in napi_hash[] in following
patches, adding generic busy polling to all NAPI drivers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: move skb_mark_napi_id() into core networking stack
Eric Dumazet [Wed, 18 Nov 2015 14:30:59 +0000 (06:30 -0800)]
net: move skb_mark_napi_id() into core networking stack

We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.

skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().

Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx4: remove mlx4_en_low_latency_recv()
Eric Dumazet [Wed, 18 Nov 2015 14:30:58 +0000 (06:30 -0800)]
mlx4: remove mlx4_en_low_latency_recv()

Busy polling can now be handled in generic NAPI poll infrastructure.
This removes complexity and fast path overhead :

mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call
in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()

Tested:

Without busy polling :

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    47330.78

With busy polling :

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97643.55

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnx2x: remove bnx2x_low_latency_recv() support
Eric Dumazet [Wed, 18 Nov 2015 14:30:57 +0000 (06:30 -0800)]
bnx2x: remove bnx2x_low_latency_recv() support

Switch to native NAPI polling, as this reduces overhead and complexity.

Normal path is faster, since one cmpxchg() is not anymore requested,
and busy polling with the NAPI polling has same performance.

Tested:
lpk50:~# cat /proc/sys/net/core/busy_read
70
lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    40095.07
16384  87380
IpInReceives                    401062             0.0
IpInDelivers                    401062             0.0
IpOutRequests                   401079             0.0
TcpActiveOpens                  7                  0.0
TcpPassiveOpens                 3                  0.0
TcpAttemptFails                 3                  0.0
TcpEstabResets                  5                  0.0
TcpInSegs                       401036             0.0
TcpOutSegs                      401052             0.0
TcpOutRsts                      38                 0.0
UdpInDatagrams                  26                 0.0
UdpOutDatagrams                 27                 0.0
Ip6OutNoRoutes                  1                  0.0
TcpExtDelayedACKs               1                  0.0
TcpExtTCPPrequeued              98                 0.0
TcpExtTCPDirectCopyFromPrequeue 98                 0.0
TcpExtTCPHPHits                 4                  0.0
TcpExtTCPHPHitsToUser           98                 0.0
TcpExtTCPPureAcks               5                  0.0
TcpExtTCPHPAcks                 101                0.0
TcpExtTCPAbortOnData            6                  0.0
TcpExtBusyPollRxPackets         400832             0.0
TcpExtTCPOrigDataSent           400983             0.0
IpExtInOctets                   21273867           0.0
IpExtOutOctets                  21261254           0.0
IpExtInNoECTPkts                401064             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx5: support napi_complete_done()
Eric Dumazet [Wed, 18 Nov 2015 14:30:56 +0000 (06:30 -0800)]
mlx5: support napi_complete_done()

A NAPI poll handler should return number of RX packets processed,
instead of 0 / budget.

This allows proper busy poll accounting through LINUX_MIB_BUSYPOLLRXPACKETS
SNMP counter.

napi_complete_done() allows /sys/class/net/ethX/gro_flush_timeout
to be used for finer GRO aggregation control.

Tested:

Enabled busy polling, and checked TcpExtBusyPollRxPackets counter is increasing.

echo 70 >/proc/sys/net/core/busy_read
nstat >/dev/null
netperf -H target -t TCP_RR >/dev/null
nstat | grep TcpExtBusyPollRxPackets
TcpExtBusyPollRxPackets         490958             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx5: add busy polling support
Eric Dumazet [Wed, 18 Nov 2015 14:30:55 +0000 (06:30 -0800)]
mlx5: add busy polling support

It is now easy to add busy polling support to a NAPI driver,
with very little impact on normal input path.

This patch serves as a reference implementation.

Note:

A followup patch will add proper napi_complete_done() in mlx5,
so that LINUX_MIB_BUSYPOLLRXPACKETS snmp counter is properly handled.

Tested:

Normal TCP_RR results without busy polling :

lpk51:~# echo 0 >/proc/sys/net/core/busy_read
lpk52:~# echo 0 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    53509.49
16384  87380

Now enable busy polling :

lpk51:~# echo 70 >/proc/sys/net/core/busy_read
lpk52:~# echo 70 >/proc/sys/net/core/busy_read

lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97530.92
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: network drivers no longer need to implement ndo_busy_poll()
Eric Dumazet [Wed, 18 Nov 2015 14:30:54 +0000 (06:30 -0800)]
net: network drivers no longer need to implement ndo_busy_poll()

Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.

Busy polling gains are mainly coming from polling itself,
not on exact details on how we poll the device.

ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.

We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.

Before this is done, a driver still needs to call :

- skb_mark_napi_id() for each skb provided to the stack.
- napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- Make sure RCU grace period is respected after napi_hash_del() before
  memory containing napi structure is freed.

Followup patch implements busy poll for mlx5 driver as an example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: allow BH servicing in sk_busy_loop()
Eric Dumazet [Wed, 18 Nov 2015 14:30:53 +0000 (06:30 -0800)]
net: allow BH servicing in sk_busy_loop()

Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.

This has many benefits.

1) allow tunneled traffic to use busy poll as well as native traffic.
   Tunnels handlers usually call netif_rx() and depend on net_rx_action()
   being run (from sofirq handler)

2) allow RFS/RPS being used (sending IPI to other cpus if needed)

3) use the 'lets burn cpu cycles' budget to do useful work
   (like TX completions, timers, RCU callbacks...)

4) reduce BH latencies, making busy poll a better citizen.

Tested:

Tested with SIT tunnel

lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    37373.93
16384  87380

Now enable busy poll on both hosts

lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read

lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    58314.77
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: un-inline sk_busy_loop()
Eric Dumazet [Wed, 18 Nov 2015 14:30:52 +0000 (06:30 -0800)]
net: un-inline sk_busy_loop()

There is really little gain from inlining this big function.
We'll soon make it even bigger in following patches.

This means we no longer need to export napi_by_id()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx4: mlx4_en_low_latency_recv() called with BH disabled
Eric Dumazet [Wed, 18 Nov 2015 14:30:51 +0000 (06:30 -0800)]
mlx4: mlx4_en_low_latency_recv() called with BH disabled

mlx4_en_low_latency_recv() is called with BH disabled,
as other ndo_busy_poll() methods.

No need for spin_lock_bh()/spin_unlock_bh()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: better skb->sender_cpu and skb->napi_id cohabitation
Eric Dumazet [Wed, 18 Nov 2015 14:30:50 +0000 (06:30 -0800)]
net: better skb->sender_cpu and skb->napi_id cohabitation

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.

This will allow proper busy polling support over tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: remove local variable 'status'
Ivan Vecera [Wed, 18 Nov 2015 13:06:34 +0000 (14:06 +0100)]
be2net: remove local variable 'status'

The lancer_cmd_get_file_len() uses lancer_cmd_read_object() to get
the current size of registers for ethtool registers dump. Returned status
value is stored but not checked. The check itself is not necessary as
the data_read output variable is initialized to 0 and status variable
can be removed.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hisilicon: fix binding document of mdio
huangdaode [Wed, 18 Nov 2015 02:08:00 +0000 (10:08 +0800)]
net: hisilicon: fix binding document of mdio

This patch explains the occasion of "hisilcon,mdio" and
"hisilicon,hns-mdio" according to Arnd's comments.
and reformat it according to comments from Rob<robh@kernel.org>.

Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet ipv4: use preferred log methods
Bastian Stender [Fri, 13 Nov 2015 10:40:34 +0000 (11:40 +0100)]
net ipv4: use preferred log methods

Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.

Added newline to "too small MTU" message.

Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Wed, 18 Nov 2015 16:59:29 +0000 (08:59 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "Assorted bug fixes, the mlock2 system call gets added, and one
  improvement.  The boot from dasd devices is now possible from a wider
  range of devices"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: remove SALIPL loader
  s390: wire up mlock2 system call
  s390: remove g5 elf platform support
  s390: avoid cache aliasing under z/VM and KVM
  s390/sclp: _sclp_wait_int(): retain full PSW mask
  s390/zcrypt: Fix initialisation when zcrypt is built-in
  s390/zcrypt: Fix kernel crash on systems without AP bus support
  s390: add support for ipl devices in subchannel sets > 0
  s390/ipl: fix out of bounds access in scpdata_write
  s390/pci_dma: improve debugging of errors during dma map
  s390/pci_dma: handle dma table failures
  s390/pci_dma: unify label of invalid translation table entries
  s390/syscalls: remove system call number calculation
  s390/cio: simplify css_generate_pgid
  s390/diag: add a s390 prefix to the diagnose trace point
  s390/head: fix error message on unsupported hardware

8 years agoMerge tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 18 Nov 2015 16:43:29 +0000 (08:43 -0800)]
Merge tag 'hwmon-for-linus-v4.4-rc2' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fix build issues in scpi and ina2xx drivers, update scpi driver to
  support recent firmware, and fix an uninitialized variable warning in
  applesmc driver"

* tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (scpi) skip unsupported sensors properly
  hwmon: (scpi) add thermal-of dependency
  hwmon : (applesmc) Fix uninitialized variables warnings
  hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 17 Nov 2015 21:52:59 +0000 (13:52 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix list tests in netfilter ingress support, from Florian Westphal.

 2) Fix reversal of input and output interfaces in ingress hook
    invocation, from Pablo Neira Ayuso.

 3) We have a use after free in r8169, caught by Dave Jones, fixed by
    Francois Romieu.

 4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.

 5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
    a) Don't create clone routes not managed by the fib6 tree
    b) Don't forget to check expiration of DST_NOCACHE routes.
    c) Handle rt->dst.from == NULL properly.

 6) Several AF_PACKET fixes wrt transport header setting and SKB
    protocol setting, from Daniel Borkmann.

 7) Fix thunder driver crash on shutdown, from Pavel Fedin.

 8) Several Mellanox driver fixes (max MTU calculations, use of correct
    DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
    Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.

 9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
    certain registers, etc.) from Neil Armstrong.

10) Make sure to disable preemption while updating per-cpu stats of ip
    tunnels, from Jason A.  Donenfeld.

11) Various ARM64 bpf JIT fixes, from Yang Shi.

12) Flush icache properly in ARM JITs, from Daniel Borkmann.

13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
    Nagai.

14) Fix netdev feature propagation for devices not implementing
    ->ndo_set_features().  From Nikolay Aleksandrov.

15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.

16) RAW socket code increments incorrect SNMP counters, fix from Ben
    Cartwright-Cox.

17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.

18) Fix handling of VLAN headers on stacked devices when REORDER is
    disabled.  From Vlad Yasevich.

19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
    Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  MAINTAINERS: Update Mellanox's Eth NIC driver entries
  net/core: revert "net: fix __netdev_update_features return.." and add comment
  af_unix: take receive queue lock while appending new skb
  rtnetlink: fix frame size warning in rtnl_fill_ifinfo
  net: use skb_clone to avoid alloc_pages failure.
  packet: Use PAGE_ALIGNED macro
  packet: Don't check frames_per_block against negative values
  net: phy: Use interrupts when available in NOLINK state
  phy: marvell: Add support for 88E1540 PHY
  arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
  macvlan: fix leak in macvlan_handle_frame
  ipvlan: fix use after free of skb
  ipvlan: fix leak in ipvlan_rcv_frame
  vlan: Do not put vlan headers back on bridge and macvlan ports
  vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
  via-velocity: unconditionally drop frames with bad l2 length
  ipg: Remove ipg driver
  dl2k: Add support for IP1000A-based cards
  snmp: Remove duplicate OUTMCAST stat increment
  net: thunder: Check for driver data in nicvf_remove()
  ...

8 years agoMAINTAINERS: Update Mellanox's Eth NIC driver entries
Or Gerlitz [Tue, 17 Nov 2015 16:25:07 +0000 (18:25 +0200)]
MAINTAINERS: Update Mellanox's Eth NIC driver entries

Eugenia (Jenny) Emantayev is replacing Amir Vadai as the
mlx4 Ethernet driver maintainer.

Saeed Mahameed is assigned to maintain mlx5 Eth functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/core: revert "net: fix __netdev_update_features return.." and add comment
Nikolay Aleksandrov [Tue, 17 Nov 2015 14:49:06 +0000 (15:49 +0100)]
net/core: revert "net: fix __netdev_update_features return.." and add comment

This reverts commit 00ee59271777 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.

CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoaf_unix: take receive queue lock while appending new skb
Hannes Frederic Sowa [Tue, 17 Nov 2015 14:10:59 +0000 (15:10 +0100)]
af_unix: take receive queue lock while appending new skb

While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.

For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agortnetlink: fix frame size warning in rtnl_fill_ifinfo
Hannes Frederic Sowa [Tue, 17 Nov 2015 13:16:52 +0000 (14:16 +0100)]
rtnetlink: fix frame size warning in rtnl_fill_ifinfo

Fix the following warning:

  CC      net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: use skb_clone to avoid alloc_pages failure.
Martin Zhang [Tue, 17 Nov 2015 12:49:30 +0000 (20:49 +0800)]
net: use skb_clone to avoid alloc_pages failure.

1. new skb only need dst and ip address(v4 or v6).
2. skb_copy may need high order pages, which is very rare on long running server.

Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: Use PAGE_ALIGNED macro
Tobias Klauser [Tue, 17 Nov 2015 09:40:21 +0000 (10:40 +0100)]
packet: Use PAGE_ALIGNED macro

Use PAGE_ALIGNED(...) instead of open-coding it.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: Don't check frames_per_block against negative values
Tobias Klauser [Tue, 17 Nov 2015 09:38:36 +0000 (10:38 +0100)]
packet: Don't check frames_per_block against negative values

rb->frames_per_block is an unsigned int, thus can never be negative.

Also fix spacing in the calculation of frames_per_block.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: Use interrupts when available in NOLINK state
Andrew Lunn [Mon, 16 Nov 2015 22:36:46 +0000 (23:36 +0100)]
net: phy: Use interrupts when available in NOLINK state

The NOLINK state will poll the phy once a second to see if the link
has come up. If the phy has an interrupt line, this polling can be
skipped, since the phy should interrupt when the link returns.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agophy: marvell: Add support for 88E1540 PHY
Andrew Lunn [Mon, 16 Nov 2015 22:34:41 +0000 (23:34 +0100)]
phy: marvell: Add support for 88E1540 PHY

The 88E1540 can be found embedded in the Marvell 88E6352 switch.  It
is compatible with the 88E1510, so add support for it, using the
88E1510 specific functions.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoarm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
Yang Shi [Mon, 16 Nov 2015 22:35:35 +0000 (14:35 -0800)]
arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS

Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.

However, ARM64 JIT used FP (x29) as eBPF fp register, FP is subjected to
change during function call so it may cause the BPF prog stack base address
change too.

Use x25 to replace FP as BPF stack base register (fp). Since x25 is callee
saved register, so it will keep intact during function call.
It is initialized in BPF prog prologue when BPF prog is started to run
everytime. Save and restore x25/x26 in BPF prologue and epilogue to keep
them intact for the outside of BPF. Actually, x26 is unnecessary, but SP
requires 16 bytes alignment.

So, the BPF stack layout looks like:

                                 high
         original A64_SP =>   0:+-----+ BPF prologue
                                |FP/LR|
         current A64_FP =>  -16:+-----+
                                | ... | callee saved registers
                                +-----+
                                |     | x25/x26
         BPF fp register => -80:+-----+
                                |     |
                                | ... | BPF prog stack
                                |     |
                                |     |
         current A64_SP =>      +-----+
                                |     |
                                | ... | Function call stack
                                |     |
                                +-----+
                                  low

CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacvlan: fix leak in macvlan_handle_frame
Sabrina Dubroca [Mon, 16 Nov 2015 21:54:20 +0000 (22:54 +0100)]
macvlan: fix leak in macvlan_handle_frame

Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.

Fixes: 8a4eb5734e8d ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipvlan: fix use after free of skb
Sabrina Dubroca [Mon, 16 Nov 2015 21:44:53 +0000 (22:44 +0100)]
ipvlan: fix use after free of skb

ipvlan_handle_frame is a rx_handler, and when it returns a value other
than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER),
__netif_receive_skb_core expects that the skb still exists and will
process it further, but we just freed it.

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipvlan: fix leak in ipvlan_rcv_frame
Sabrina Dubroca [Mon, 16 Nov 2015 21:34:26 +0000 (22:34 +0100)]
ipvlan: fix leak in ipvlan_rcv_frame

Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a
new skb, we actually use it during further processing.

It's safe to ignore the new skb in the ipvlan_xmit_* functions, because
they call ipvlan_rcv_frame with local == true, so that dev_forward_skb
is called and always takes ownership of the skb.

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'vlan-reorder'
David S. Miller [Tue, 17 Nov 2015 19:38:36 +0000 (14:38 -0500)]
Merge branch 'vlan-reorder'

Vladislav Yasevich says:

====================
Fix issues with vlans without REORDER_HEADER

A while ago Phil Sutter brought up an issue with vlans without
REORDER_HEADER and bridges.  The problem was that if a vlan
without REORDER_HEADER was a port in the bridge, the bridge ended
up forwarding corrupted packets that still contained the vlan header.
The same issue exists for bridge mode macvlan/macvtap devices.

An additional issue with vlans without REORDER_HEADER is that stacking
them also doesn't work.  The reason here is that skb_reorder_vlan_header()
function assumes that it on ETH_HLEN bytes deep into the packet.  That
is not the case, when you a vlan without REORRDER_HEADER flag set.

This series attempts to correct these 2 issues.

1) To solve the stacked vlans problem, the patch simply use
skb->mac_len as an offset to start copying mac addresses that
is part of header reordering.

2) To fix the issue with bridge/macvlan/macvtap, the second patch
simply doesn't write the vlan header back to the packet if the
vlan device is either a bridge or a macvlan port.  This ends up
being the simplest and least performance intrussive solution.

I've considered extending patch 2 to all stacked devices (essentially
checked for the presense of rx_handler), but that feels like a broader
restriction and _may_ break existing uses.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovlan: Do not put vlan headers back on bridge and macvlan ports
Vlad Yasevich [Mon, 16 Nov 2015 20:43:45 +0000 (15:43 -0500)]
vlan: Do not put vlan headers back on bridge and macvlan ports

When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This posses a problem for bridge and macvlan ports.  The packets
passed to those device may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.

With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovlan: Fix untag operations of stacked vlans with REORDER_HEADER off
Vlad Yasevich [Mon, 16 Nov 2015 20:43:44 +0000 (15:43 -0500)]
vlan: Fix untag operations of stacked vlans with REORDER_HEADER off

When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header.  Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).

As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.

To solve this, we use skb->mac_len as the offset.  The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovia-velocity: unconditionally drop frames with bad l2 length
Timo Teräs [Mon, 16 Nov 2015 12:36:32 +0000 (14:36 +0200)]
via-velocity: unconditionally drop frames with bad l2 length

By default the driver allowed incorrect frames to be received. What is
worse the code does not handle very short frames correctly. The FCS
length is unconditionally subtracted, and the underflow can cause
skb_put to be called with large number after implicit cast to unsigned.
And indeed, an skb_over_panic() was observed with via-velocity.

This removes the module parameter as it does not work in it's
current state, and should be implemented via NETIF_F_RXALL if needed.

Suggested-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 17 Nov 2015 18:11:08 +0000 (10:11 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "A fs-cache regression fix, and adding a warning about obnoxiou^W
  moderation of list given in MAINTAINERS"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  MAINTAINERS: linux-cachefs@redhat.com is moderated for non-subscribers
  FS-Cache: Add missing initialization of ret in cachefiles_write_page()

8 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Tue, 17 Nov 2015 17:40:05 +0000 (09:40 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a bug in the qat driver where a user-space pointer is
  dereferenced"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: qat - don't use userspace pointer

8 years agoMAINTAINERS: linux-cachefs@redhat.com is moderated for non-subscribers
Geert Uytterhoeven [Thu, 12 Nov 2015 11:46:33 +0000 (11:46 +0000)]
MAINTAINERS: linux-cachefs@redhat.com is moderated for non-subscribers

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoFS-Cache: Add missing initialization of ret in cachefiles_write_page()
Geert Uytterhoeven [Thu, 12 Nov 2015 11:46:23 +0000 (11:46 +0000)]
FS-Cache: Add missing initialization of ret in cachefiles_write_page()

fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
this function

If the jump to label "error" is taken, "ret" will indeed be
uninitialized, and random stack data may be printed by the debug code.

Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoipg: Remove ipg driver
Ondrej Zary [Sun, 15 Nov 2015 21:36:12 +0000 (22:36 +0100)]
ipg: Remove ipg driver

Now that IP1000A chips are supported by dl2k driver, the buggy ipg
driver can be removed.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodl2k: Add support for IP1000A-based cards
Ondrej Zary [Sun, 15 Nov 2015 21:36:11 +0000 (22:36 +0100)]
dl2k: Add support for IP1000A-based cards

Add support for IP1000A chips to dl2k driver.
IP1000A chip looks like a TC9020 with integrated PHY.

This allows IP1000A chips to work reliably because the ipg driver is
buggy - it loses packets under load and then completely stops
transmitting data.

Tested with Asus NX1101 v2.0 at 10, 100 and 1000Mbps:
vendor=0x13f0 device=0x1023 (rev 0x41)
subsystem vendor=0x1043 device=0x8180

MAC address registers access needed to be changed from 8-bit to 16-bit
because 8-bit does not work on IP1000A. 8-bit access is not even
allowed in the TC9020 datasheet (although it worked). 16-bit access
works on both.

Tested that it does not break D-Link DGE-550T (DL-2000 chip, probably
a rebranded TC9020):
vendor=0x1186 device=0x4000 (rev 0x0c)
subsystem vendor=0x1186 device=0x4000

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosnmp: Remove duplicate OUTMCAST stat increment
Neil Horman [Mon, 16 Nov 2015 18:09:10 +0000 (13:09 -0500)]
snmp: Remove duplicate OUTMCAST stat increment

the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path.  Remove the mcast bump, as its
not needed

Validated by the reporter, with good results

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: thunder: Check for driver data in nicvf_remove()
Pavel Fedin [Mon, 16 Nov 2015 14:51:34 +0000 (17:51 +0300)]
net: thunder: Check for driver data in nicvf_remove()

In some cases the crash is caused by nicvf_remove() being called from
outside. For example, if we try to feed the device to vfio after the
probe has failed for some reason. So, move the check to better place.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/core: use netdev name in warning if no parent
Bjørn Mork [Mon, 16 Nov 2015 18:16:40 +0000 (19:16 +0100)]
net/core: use netdev name in warning if no parent

A recent flaw in the netdev feature setting resulted in warnings
like this one from VLAN interfaces:

 WARNING: CPU: 1 PID: 4975 at net/core/dev.c:2419 skb_warn_bad_offload+0xbc/0xcb()
 : caps=(0x00000000001b5820, 0x00000000001b5829) len=2782 data_len=0 gso_size=1348 gso_type=16 ip_summed=3

The ":" is supposed to be preceded by a driver name, but in this
case it is an empty string since the device has no parent.

There are many types of network devices without a parent. The
anonymous warnings for these devices can be hard to debug.  Log
the network device name instead in these cases to assist further
debugging.

This is mostly similar to how __netdev_printk() handles orphan
devices.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoaf_unix: don't append consumed skbs to sk_receive_queue
Hannes Frederic Sowa [Mon, 16 Nov 2015 15:25:56 +0000 (16:25 +0100)]
af_unix: don't append consumed skbs to sk_receive_queue

In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: switchdev: fix return code of fdb_dump stub
Dragos Tatulea [Mon, 16 Nov 2015 09:52:48 +0000 (10:52 +0100)]
net: switchdev: fix return code of fdb_dump stub

rtnl_fdb_dump always expects an index to be returned by the ndo_fdb_dump op,
but when CONFIG_NET_SWITCHDEV is off, it returns an error.

Fix that by returning the given unmodified idx.

A similar fix was 0890cf6cb6ab ("switchdev: fix return value of
switchdev_port_fdb_dump in case of error") but for the CONFIG_NET_SWITCHDEV=y
case.

Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
Signed-off-by: Dragos Tatulea <dragos@endocode.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnx2x: Fix VLANs null-pointer for 57710, 57711
Yuval Mintz [Sun, 15 Nov 2015 13:02:16 +0000 (15:02 +0200)]
bnx2x: Fix VLANs null-pointer for 57710, 57711

Commit 05cc5a39ddb7 "bnx2x: add vlan filtering offload" introduced
a regression in regard for vlans for 57710, 57711 adapters -
Loading 8021q module on a machine with such an adapter would cause
a null pointer dereference, as the driver mistakenly publishes it
has capabilities for vlan CTAG filtering.

Reported-by: Otto Sabart <osabart@redhat.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoravb: remove unhandle int cause
Masaru Nagai [Sun, 15 Nov 2015 12:34:42 +0000 (21:34 +0900)]
ravb: remove unhandle int cause

This driver does not handle the AVB-DMAC Receive FIFO Warning interrupt
now, so the interrupt should not be enabled.

Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoraw: increment correct SNMP counters for ICMP messages
Ben Cartwright-Cox [Sat, 14 Nov 2015 15:13:58 +0000 (15:13 +0000)]
raw: increment correct SNMP counters for ICMP messages

Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header. This is fixed by adding the IP Header Length to
the casting into a icmphdr struct.

Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosfc: constify pci_error_handlers structures
Julia Lawall [Sat, 14 Nov 2015 10:06:57 +0000 (11:06 +0100)]
sfc: constify pci_error_handlers structures

This pci_error_handlers structure is never modified, like all the other
pci_error_handlers structures, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: cavium: liquidio: constify pci_error_handlers structures
Julia Lawall [Sat, 14 Nov 2015 10:06:53 +0000 (11:06 +0100)]
net: cavium: liquidio: constify pci_error_handlers structures

This pci_error_handlers structure is never modified, like all the other
pci_error_handlers structures, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDriver: Vmxnet3: Fix use of mfTableLen for big endian architectures
Shrikrishna Khare [Fri, 13 Nov 2015 23:42:10 +0000 (15:42 -0800)]
Driver: Vmxnet3: Fix use of mfTableLen for big endian architectures

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reported-by: Masao Uebayashi <uebayasi@gmail.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: usb: cdc_ether: add Dell DW5580 as a mobile broadband adapter
Daniele Palmas [Fri, 13 Nov 2015 17:01:21 +0000 (18:01 +0100)]
net: usb: cdc_ether: add Dell DW5580 as a mobile broadband adapter

Since Dell DW5580 is a 3G modem, this patch adds the device as a
mobile broadband adapter

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fix __netdev_update_features return on ndo_set_features failure
Nikolay Aleksandrov [Fri, 13 Nov 2015 14:20:24 +0000 (15:20 +0100)]
net: fix __netdev_update_features return on ndo_set_features failure

If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.

Fixes: 6cb6a27c45ce ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fix feature changes on devices without ndo_set_features
Nikolay Aleksandrov [Fri, 13 Nov 2015 13:54:01 +0000 (14:54 +0100)]
net: fix feature changes on devices without ndo_set_features

When __netdev_update_features() was updated to ensure some features are
disabled on new lower devices, an error was introduced for devices which
don't have the ndo_set_features() method set. Before we'll just set the
new features, but now we return an error and don't set them. Fix this by
returning the old behaviour and setting err to 0 when ndo_set_features
is not present.

Fixes: e7868a85e1b2 ("net/core: ensure features get disabled on new lower devs")
CC: Jarod Wilson <jarod@redhat.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Dave Young <dyoung@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoswitchdev: bridge: Check return code is not EOPNOTSUPP
Ido Schimmel [Fri, 13 Nov 2015 11:06:12 +0000 (13:06 +0200)]
switchdev: bridge: Check return code is not EOPNOTSUPP

When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP.
In this case we should not emit errors and warnings to the kernel log.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 0bc05d585d38 ("switchdev: allow caller to explicitly request
attr_set as deferred")
Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port
bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: replace hardcoded values with existing define
Ivan Vecera [Fri, 13 Nov 2015 10:36:58 +0000 (11:36 +0100)]
be2net: replace hardcoded values with existing define

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobe2net: remove unused local rsstable array
Ivan Vecera [Fri, 13 Nov 2015 10:36:57 +0000 (11:36 +0100)]
be2net: remove unused local rsstable array

Remove rsstable array and its initialization from be_set_rss_hash_opts().
The array became unused after "e255787 be2net: Support for configurable
RSS hash key". The initial RSS table is now filled and stored for later
usage during Rx queue creation.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoravb: Fix int mask value overwritten issue
Masaru Nagai [Fri, 13 Nov 2015 10:24:49 +0000 (19:24 +0900)]
ravb: Fix int mask value overwritten issue

When RX/TX interrupt for Network Control queue and Best Effort queue
is issued at the same time, the interrupt mask of Network Control
queue will be reset when the mask of Best Effort queue is set.
This patch fixes this problem.

Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: smsc911x: Reset PHY during initialization
Pavel Fedin [Fri, 13 Nov 2015 06:46:59 +0000 (09:46 +0300)]
net: smsc911x: Reset PHY during initialization

On certain hardware after software reboot the chip may get stuck and fail
to reinitialize during reset. This can be fixed by ensuring that PHY is
reset too.

Old PHY resetting method required operational MDIO interface, therefore
the chip should have been already set up. In order to be able to function
during probe, it is changed to use PMT_CTRL register.

The problem could be observed on SMDK5410 board.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, arm64: start flushing icache range from header
Daniel Borkmann [Sat, 14 Nov 2015 00:16:18 +0000 (01:16 +0100)]
bpf, arm64: start flushing icache range from header

While recently going over ARM64's BPF code, I noticed that the icache
range we're flushing should start at header already and not at ctx.image.

Reason is that after b569c1c622c5 ("net: bpf: arm64: address randomize
and write protect JIT code"), we also want to make sure to flush the
random-sized trap in front of the start of the actual program (analogous
to x86). No operational differences from user side.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, arm: start flushing icache range from header
Daniel Borkmann [Sat, 14 Nov 2015 00:26:53 +0000 (01:26 +0100)]
bpf, arm: start flushing icache range from header

During review I noticed that the icache range we're flushing should
start at header already and not at ctx.image.

Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize
and write protect JIT code"), we also want to make sure to flush the
random-sized trap in front of the start of the actual program (analogous
to x86). No operational differences from user side.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nicolas Schichan <nschichan@freebox.fr>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: samples: exclude asm/sysreg.h for arm64
Yang Shi [Thu, 12 Nov 2015 22:07:46 +0000 (14:07 -0800)]
bpf: samples: exclude asm/sysreg.h for arm64

commit 338d4f49d6f7114a017d294ccf7374df4f998edc
("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h
into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is
incompatible with llvm so it will cause BPF samples build failure for ARM64.
Since sysreg.h is useless for BPF samples, just exclude it from Makefile via
defining __ASM_SYSREG_H.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoarm64: bpf: fix JIT frame pointer setup
Yang Shi [Thu, 12 Nov 2015 21:57:00 +0000 (13:57 -0800)]
arm64: bpf: fix JIT frame pointer setup

BPF fp should point to the top of the BPF prog stack. The original
implementation made it point to the bottom incorrectly.
Move A64_SP to fp before reserve BPF prog stack space.

CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: vitesse: add support for VSC8601
Måns Rullgård [Thu, 12 Nov 2015 18:41:12 +0000 (18:41 +0000)]
net: phy: vitesse: add support for VSC8601

This adds support for the Vitesse VSC8601 PHY. Generic functions are
used for everything except interrupt handling.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: at803x: support interrupt on 8030 and 8035
Måns Rullgård [Thu, 12 Nov 2015 17:40:20 +0000 (17:40 +0000)]
net: phy: at803x: support interrupt on 8030 and 8035

Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode"
added interrupt support for the 8031 PHY but left out the other two
chips supported by this driver.

This patch sets the .ack_interrupt and .config_intr functions for the
8030 and 8035 drivers as well.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoip_tunnel: disable preemption when updating per-cpu tstats
Jason A. Donenfeld [Thu, 12 Nov 2015 16:35:58 +0000 (17:35 +0100)]
ip_tunnel: disable preemption when updating per-cpu tstats

Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:

[  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
[  188.435579] caller is debug_smp_processor_id+0x17/0x20
[  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
[  188.435607] Call Trace:
[  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
[  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
[  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20

The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohwmon: (scpi) skip unsupported sensors properly
Sudeep Holla [Wed, 28 Oct 2015 17:17:31 +0000 (17:17 +0000)]
hwmon: (scpi) skip unsupported sensors properly

Currently it's assumed that firmware exports only the class of sensors
supported by the driver. However with newer firmware or SCPI protocol
revision, support for newer classes of sensors can be present.

The driver fails to probe with the following warning if an unsupported
class of sensor is encountered in the firmware.

sysfs: cannot create duplicate filename
'/devices/platform/scpi/scpi:sensors/hwmon/hwmon0/'
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:31
Modules linked in:

CPU: 0 PID: 6 Comm: kworker/u12:0 Not tainted 4.3.0-rc7 #137
Hardware name: ARM Juno development board (r0) (DT)
Workqueue: deferwq deferred_probe_work_func
PC is at sysfs_warn_dup+0x54/0x78
LR is at sysfs_warn_dup+0x54/0x78

This patch fixes the above issue by skipping through the unsupported
class of SCPI sensors.

Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors")
Fixes: ea98b29a05e9 ("hwmon: Support sensors exported via ARM SCP interface")
Cc: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agohwmon: (scpi) add thermal-of dependency
Arnd Bergmann [Mon, 16 Nov 2015 16:56:39 +0000 (17:56 +0100)]
hwmon: (scpi) add thermal-of dependency

The newly added scpi thermal support is broken when the scpi driver
is built-in but the thermal driver is a loadable module:

drivers/built-in.o: In function `scpi_hwmon_probe':
(.text+0x444d70): undefined reference to `thermal_zone_of_sensor_unregister'
(.text+0x444d94): undefined reference to `thermal_zone_of_sensor_register'
drivers/built-in.o: In function `scpi_hwmon_remove':
(text+0x444e6c): undefined reference to `thermal_zone_of_sensor_unregister'

This uses the same Kconfig trick that we have in a couple of other
drivers already to ensure we can only select the driver in valid
configurations when either THERMAL_OF is disabled, or when with a
dependency on CONFIG_THERMAL that can force SCPI to be a loadable
module in the case I was hitting.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agos390: remove SALIPL loader
Heiko Carstens [Fri, 13 Nov 2015 13:17:14 +0000 (14:17 +0100)]
s390: remove SALIPL loader

There is no known user, therefore remove the code.

Acked-by: Rob Van Der Heij <robvdheij@nl.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
8 years agos390: wire up mlock2 system call
Heiko Carstens [Mon, 16 Nov 2015 11:31:33 +0000 (12:31 +0100)]
s390: wire up mlock2 system call

Passes mlock2-tests test case in 64 bit and compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
8 years agos390: remove g5 elf platform support
Heiko Carstens [Fri, 13 Nov 2015 11:45:12 +0000 (12:45 +0100)]
s390: remove g5 elf platform support

Remove dead code, since this could only happen on a 31 bit machine
where the kernel wouldn't IPL.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
8 years agos390: avoid cache aliasing under z/VM and KVM
Martin Schwidefsky [Tue, 10 Nov 2015 11:30:28 +0000 (12:30 +0100)]
s390: avoid cache aliasing under z/VM and KVM

commit 1f6b83e5e4d3 ("s390: avoid z13 cache aliasing") checks for the
machine type to optimize address space randomization and zero page
allocation to avoid cache aliases.

This check might fail under a hypervisor with migration support.
z/VMs "Single System Image and Live Guest Relocation" facility will
"fake" the machine type of the oldest system in the group. For example
in a group of zEC12 and Z13 the guest appears to run on a zEC12
(architecture fencing within the relocation domain)

Remove the machine type detection and always use cache aliasing
rules that are known to work for all machines. These are the z13
aliasing rules.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
8 years agohwmon : (applesmc) Fix uninitialized variables warnings
Shuah Khan [Thu, 5 Nov 2015 22:39:27 +0000 (15:39 -0700)]
hwmon : (applesmc) Fix uninitialized variables warnings

Fix the following "maybe used uninitialized" warnings by
initializing the variables to keep the compiler quiet.
There is no "used uninitialized" in this case.

  CC [M]  drivers/hwmon/applesmc.o
drivers/hwmon/applesmc.c: In function ‘applesmc_init_smcreg’:
drivers/hwmon/applesmc.c:595:43: warning: ‘right_light_sensor’
may be used uninitialized in this function [-Wmaybe-uninitialized]
  s->num_light_sensors = left_light_sensor + right_light_sensor;
                                           ^
drivers/hwmon/applesmc.c:540:26: note: ‘right_light_sensor’ was
declared here
  bool left_light_sensor, right_light_sensor;
                          ^
drivers/hwmon/applesmc.c:595:43: warning: ‘left_light_sensor’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
  s->num_light_sensors = left_light_sensor + right_light_sensor;
                                           ^
drivers/hwmon/applesmc.c:540:7: note: ‘left_light_sensor’ was
declared here
  bool left_light_sensor, right_light_sensor;
       ^

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agohwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C
Li Yang [Thu, 5 Nov 2015 20:18:17 +0000 (14:18 -0600)]
hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C

Since a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap")
the driver requires REGMAP_I2C to build.  Select it by default
in Kconfig.

Reported-by: Guo Chunrong <B40290@freescale.com>
Cc: Marc Titinger <mtitinger@baylibre.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Fixes: a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agoMerge branch 'mv88e6060-fixes'
David S. Miller [Mon, 16 Nov 2015 01:16:26 +0000 (20:16 -0500)]
Merge branch 'mv88e6060-fixes'

Neil Armstrong says:

====================
net: dsa: mv88e6060: cleanup and fix setup

This patchset introduces some fixes and a registers addressing cleanup for
the mv88e6060 DSA driver.

The first patch removes the poll_link as mv88e6xxx.
The 3 following patches fixes the setup in regards of the datasheet.
The 2 last patches introduces a clean header and replaces all magic values.

v2: cleanup InitReady patch, add missing Acked-by and fix header copyright notice
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: replace magic values with register defines
Neil Armstrong [Tue, 10 Nov 2015 15:51:36 +0000 (16:51 +0100)]
net: dsa: mv88e6060: replace magic values with register defines

To align with the mv88e6xxx code, use the register defines to
access all the register addresses and bit fields.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: add register defines header file
Neil Armstrong [Tue, 10 Nov 2015 15:51:42 +0000 (16:51 +0100)]
net: dsa: mv88e6060: add register defines header file

To align with the mv88e6xxx code, add a similar header file
with all the register defines.
The file is based on the mv88e6xxx header for coherency.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: use the correct bit shift for mac0
Neil Armstrong [Tue, 10 Nov 2015 15:51:32 +0000 (16:51 +0100)]
net: dsa: mv88e6060: use the correct bit shift for mac0

According to the mv88e6060 datasheet, the first mac byte must
be at position 9 instead of 8 since the bit 8 is used to select
if the mac address must differ for each port for Pause frames.
Use the correct shift and set the same mac address for all port.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: use the correct MaxFrameSize bit
Neil Armstrong [Tue, 10 Nov 2015 15:51:24 +0000 (16:51 +0100)]
net: dsa: mv88e6060: use the correct MaxFrameSize bit

According to the mv88e6060 datasheet, the MaxFrameSize bit position
is 10 instead of 11 which is reserved.
Use the bit correctly to setup max frame size to 1536.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: use the correct InitReady bit
Neil Armstrong [Tue, 10 Nov 2015 15:51:19 +0000 (16:51 +0100)]
net: dsa: mv88e6060: use the correct InitReady bit

According to the mv88e6060 datasheet, the InitReady bit position
is 11 and the polarity is inverted.
Use the bit correctly to detect the end of initialization.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6060: remove poll_link callback
Neil Armstrong [Tue, 10 Nov 2015 15:51:14 +0000 (16:51 +0100)]
net: dsa: mv88e6060: remove poll_link callback

As of mv88e6xxx remove the poll_link callback since the link
state change polling is now handled by the phylib.

Tested on a mv88e6060 B0 device with a TI DM816X SoC.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoLinux 4.4-rc1 v4.4-rc1
Linus Torvalds [Mon, 16 Nov 2015 01:00:27 +0000 (17:00 -0800)]
Linux 4.4-rc1

8 years agoMerge branch 'mellanox-net-fixes'
David S. Miller [Sun, 15 Nov 2015 23:43:47 +0000 (18:43 -0500)]
Merge branch 'mellanox-net-fixes'

Or Gerlitz says:

====================
Mellanox NIC driver update, Nov 12, 2015

Few small mlx5 and mlx4 fixes from the team... done over
net commit c5a3788 "Merge branch 'akpm' (patches from Andrew)"

Eran's patch needs to go to 4.2 and 4.3 stable kernels.

Tariq's patch need to go to 4.3 stable too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx4_core: Avoid returning success in case of an error flow
Noa Osherovich [Thu, 12 Nov 2015 17:35:30 +0000 (19:35 +0200)]
net/mlx4_core: Avoid returning success in case of an error flow

The err variable wasn't set with the correct error value in some cases.

Fixes: 47605df95398 ('mlx4: Modify proxy/tunnel QP mechanism [..]')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters
Eran Ben Elisha [Thu, 12 Nov 2015 17:35:29 +0000 (19:35 +0200)]
net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters

When cleaning slave's counter resources, we hold a spinlock that
protects the slave's counters list. As part of the clean, we call
__mlx4_clear_if_stat which calls mlx4_alloc_cmd_mailbox which is a
sleepable function.

In order to fix this issue, hold the spinlock, and copy all counter
indices into a temporary array, and release the spinlock. Afterwards,
iterate over this array and free every counter. Repeat this scenario
until the original list is empty (a new counter might have been added
while releasing the counters from the temporary array).

Fixes: b72ca7e96acf ("net/mlx4_core: Reset counters data when freed")
Reported-by: Moni Shoua <monis@mellanox.com>
Tested-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Use the right DMA free function on TX path
Achiad Shochat [Thu, 12 Nov 2015 17:35:28 +0000 (19:35 +0200)]
net/mlx5e: Use the right DMA free function on TX path

On xmit path we use skb_frag_dma_map() which is using dma_map_page(),
while upon completion we dma-unmap the skb fragments using
dma_unmap_single() rather than dma_unmap_page().

To fix this, we now save the dma map type on xmit path and use this
info to call the right dma unmap method upon TX completion.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Max mtu comparison fix
Doron Tsur [Thu, 12 Nov 2015 17:35:27 +0000 (19:35 +0200)]
net/mlx5e: Max mtu comparison fix

On change mtu the driver compares between hardware queried mtu and
software requested mtu. We need to compare between software
representation of the queried mtu and the requested mtu.

Fixes: facc9699f0fe ('net/mlx5e: Fix HW MTU settings')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Added self loopback prevention
Tariq Toukan [Thu, 12 Nov 2015 17:35:26 +0000 (19:35 +0200)]
net/mlx5e: Added self loopback prevention

Prevent outgoing multicast frames from looping back to the RX queue.

By introducing new HW capability self_lb_en_modifiable, which indicates
the support to modify self_lb_en bit in modify_tir command.

When this capability is set we can prevent TIRs from sending back
loopback multicast traffic to their own RQs, by "refreshing TIRs" with
modify_tir command, on every time new channels (SQs/RQs) are created at
device open.
This is needed since TIRs are static and only allocated once on driver
load, and the loopback decision is under their responsibility.

Fixes issues of the kind:
"IPv6: eth2: IPv6 duplicate address fe80::e61d:2dff:fe5c:f2e9 detected!"
The issue is seen since the IPv6 solicitations multicast messages are
loopedback and the network stack thinks they are coming from another host.

Fixes: 5c50368f3831 ("net/mlx5e: Light-weight netdev open/stop")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Fix inline header size calculation
Saeed Mahameed [Thu, 12 Nov 2015 17:35:25 +0000 (19:35 +0200)]
net/mlx5e: Fix inline header size calculation

mlx5e_get_inline_hdr_size didn't take into account the vlan insertion
into the inline WQE segment.
This could lead to max inline violation in cases where
skb_headlen(skb) + VLAN_HLEN >= sq->max_inline.

Fixes: 3ea4891db8d0 ("net/mlx5e: Fix LSO vlan insertion")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipvs: use skb_to_full_sk() helper
Eric Dumazet [Thu, 12 Nov 2015 17:14:12 +0000 (09:14 -0800)]
ipvs: use skb_to_full_sk() helper

SYNACK packets might be attached to request sockets.

Use skb_to_full_sk() helper to avoid illegal accesses to
inet_sk(skb->sk)

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: ensure proper barriers in lockless contexts
Eric Dumazet [Thu, 12 Nov 2015 16:43:18 +0000 (08:43 -0800)]
tcp: ensure proper barriers in lockless contexts

Some functions access TCP sockets without holding a lock and
might output non consistent data, depending on compiler and or
architecture.

tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...

Introduce sk_state_load() and sk_state_store() to fix the issues,
and more clearly document where this lack of locking is happening.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: thunder: Fix crash upon shutdown after failed probe
Pavel Fedin [Thu, 12 Nov 2015 11:55:18 +0000 (14:55 +0300)]
net: thunder: Fix crash upon shutdown after failed probe

If device probe fails, driver remains bound to the PCI device. However,
driver data has been reset to NULL. This causes crash upon dereferencing
it in nicvf_remove()

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: translate host order to network order when setting a hmacid
lucien [Thu, 12 Nov 2015 05:07:07 +0000 (13:07 +0800)]
sctp: translate host order to network order when setting a hmacid

now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.

even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():

if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;

so this wasn't working before and thus it won't break compatibility.

Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'packet-fixes'
David S. Miller [Sun, 15 Nov 2015 23:00:48 +0000 (18:00 -0500)]
Merge branch 'packet-fixes'

Daniel Borkmann says:

====================
packet fixes

Fixes a couple of issues in packet sockets, i.e. on TX ring side. See
individual patches for details.

v2 -> v3:
 - First two patches unchanged, kept Jason's Ack
 - Reworked 3rd patch and split into 3:
  - check for dev type as discussed with Willem
  - infer skb->protocol
  - fix max len for dgram
v1 -> v2:
 - Added patch 2 as suggested by Dave
 - Rest is unchanged from previous submission
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: fix tpacket_snd max frame len
Daniel Borkmann [Wed, 11 Nov 2015 22:25:44 +0000 (23:25 +0100)]
packet: fix tpacket_snd max frame len

Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: infer protocol from ethernet header if unset
Daniel Borkmann [Wed, 11 Nov 2015 22:25:43 +0000 (23:25 +0100)]
packet: infer protocol from ethernet header if unset

In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).

Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.

That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.

Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: only allow extra vlan len on ethernet devices
Daniel Borkmann [Wed, 11 Nov 2015 22:25:42 +0000 (23:25 +0100)]
packet: only allow extra vlan len on ethernet devices

Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.

This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).

Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopacket: always probe for transport header
Daniel Borkmann [Wed, 11 Nov 2015 22:25:41 +0000 (23:25 +0100)]
packet: always probe for transport header

We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>