platform/kernel/linux-starfive.git
7 years agonet: ieee802154: fix net_device reference release too early
Lin Zhang [Tue, 23 May 2017 05:29:39 +0000 (13:29 +0800)]
net: ieee802154: fix net_device reference release too early

This patch fixes the kernel oops when release net_device reference in
advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
problem), there is a race condition between dev_put and dev_queue_xmit
when the device is gong that maybe lead to dev_queue_ximt to see
an illegal net_device pointer.

My test kernel is 3.13.0-32 and because i am not have a real 802154
device, so i change lowpan_newlink function to this:

        /* find and hold real wpan device */
        real_dev = dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
        if (!real_dev)
                return -ENODEV;
//      if (real_dev->type != ARPHRD_IEEE802154) {
//              dev_put(real_dev);
//              return -EINVAL;
//      }
        lowpan_dev_info(dev)->real_dev = real_dev;
        lowpan_dev_info(dev)->fragment_tag = 0;
        mutex_init(&lowpan_dev_info(dev)->dev_list_mtx);

Also, in order to simulate preempt, i change the raw_sendmsg function
to this:

        skb->dev = dev;
        skb->sk  = sk;
        skb->protocol = htons(ETH_P_IEEE802154);
        dev_put(dev);
        //simulate preempt
        schedule_timeout_uninterruptible(30 * HZ);
        err = dev_queue_xmit(skb);
        if (err > 0)
                err = net_xmit_errno(err);

and this is my userspace test code named test_send_data:

int main(int argc, char **argv)
{
        char buf[127];
        int sockfd;
        sockfd = socket(AF_IEEE802154, SOCK_RAW, 0);
        if (sockfd < 0) {
                printf("create sockfd error: %s\n", strerror(errno));
                return -1;
        }
        send(sockfd, buf, sizeof(buf), 0);
        return 0;
}

This is my test case:

root@zhanglin-x-computer:~/develop/802154# uname -a
Linux zhanglin-x-computer 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15
03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@zhanglin-x-computer:~/develop/802154# ip link add link eth0 name
lowpan0 type lowpan
root@zhanglin-x-computer:~/develop/802154#
//keep the lowpan0 device down
root@zhanglin-x-computer:~/develop/802154# ./test_send_data &
//wait a while
root@zhanglin-x-computer:~/develop/802154# ip link del link dev lowpan0
//the device is gone
//oops
[381.303307] general protection fault: 0000 [#1]SMP
[381.303407] Modules linked in: af_802154 6lowpan bnep rfcomm
bluetooth nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
rts5139(C) snd_hda_intel
snd_had_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi snd_req intel_rapl snd_seq_device
coretemp i915 kvm_intel
kvm snd_timer snd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
cypted drm_kms_helper drm i2c_algo_bit soundcore video mac_hid
parport_pc ppdev ip parport hid_generic
usbhid hid ahci r8169 mii libahdi
[381.304286] CPU:1 PID: 2524 Commm: 1 Tainted: G C 0 3.13.0-32-generic
[381.304409] Hardware name: Haier Haier DT Computer/Haier DT Codputer,
BIOS FIBT19H02_X64 06/09/2014
[381.304546] tasks: ffff000096965fc0 ti: ffffB0013779c000 task.ti:
ffffB8013779c000
[381.304659] RIP: 0010:[<ffffffff01621fe1>] [<ffffffff81621fe1>]
__dev_queue_ximt+0x61/0x500
[381.304798] RSP: 0018:ffffB8013779dca0 EFLAGS: 00010202
[381.304880] RAX: 272b031d57565351 RBX: 0000000000000000 RCX: ffff8800968f1a00
[381.304987] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800968f1a00
[381.305095] RBP: ffff8e013773dce0 R08: 0000000000000266 R09: 0000000000000004
[381.305202] R10: 0000000000000004 R11: 0000000000000005 R12: ffff88013902e000
[381.305310] R13: 000000000000007f R14: 000000000000007f R15: ffff8800968f1a00
[381.305418] FS:  00007fc57f50f740(0000) GS: ffff88013fc80000(0000)
knlGS: 0000000000000000
[381.305540] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[381.305627] CR2: 00007fad0841c000 CR3: 00000001368dd000 CR4: 00000000001007e0
[361.905734] Stack:
[381.305768]  00000000002052d0 000000003facb30a ffff88013779dcc0
ffff880137764000
[381.305898]  ffff88013779de70 000000000000007f 000000000000007f
ffff88013902e000
[381.306026]  ffff88013779dcf0 ffffffff81622490 ffff88013779dd39
ffffffffa03af9f1
[381.306155] Call Trace:
[381.306202]  [<ffffffff81622490>] dev_queue_xmit+0x10/0x20
[381.306294]  [<ffffffffa03af9f1>] raw_sendmsg+0x1b1/0x270 [af_802154]
[381.306396]  [<ffffffffa03af054>] ieee802154_sock_sendmsg+0x14/0x20 [af_802154]
[381.306512]  [<ffffffff816079eb>] sock_sendmsg+0x8b/0xc0
[381.306600]  [<ffffffff811d52a5>] ? __d_alloc+0x25/0x180
[381.306687]  [<ffffffff811a1f56>] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[381.306791]  [<ffffffff81607b91>] SYSC_sendto+0x121/0x1c0
[381.306878]  [<ffffffff8109ddf4>] ? vtime_account_user+x54/0x60
[381.306975]  [<ffffffff81020d45>] ? syscall_trace_enter+0x145/0x250
[381.307073]  [<ffffffff816086ae>] SyS_sendto+0xe/0x10
[381.307156]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[381.307233] Code: c6 a1 a4 ff 41 8b 57 78 49 8b 47 20 85 d2 48 8b 80
78 07 00 00 75 21 49 8b 57 18 48 85 d2 74 18 48 85 c0 74 13 8b 92 ac
01 00 00 <3b> 50 10 73 08 8b 44 90 14 41 89 47 78 41 f6 84 24 d5 00 00
00
[381.307801] RIP [<ffffffff81621fe1>] _dev_queue_xmit+0x61/0x500
[381.307901]  RSP <ffff88013779dca0>
[381.347512] Kernel panic - not syncing: Fatal exception in interrupt
[381.347747] drm_kms_helper: panic occurred, switching back to text console

In my opinion, there is always exist a chance that the device is gong
before call dev_queue_xmit.

I think the latest kernel is have the same problem and that
dev_put should be behind of the dev_queue_xmit.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agonet: ieee802154: remove explicit set skb->sk
Lin Zhang [Tue, 23 May 2017 05:21:05 +0000 (13:21 +0800)]
net: ieee802154: remove explicit set skb->sk

Explicit set skb->sk is needless, sock_alloc_send_skb is already set it.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.5 controllers
Jürg Billeter [Tue, 23 May 2017 16:46:25 +0000 (18:46 +0200)]
Bluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.5 controllers

The iBT 3.5 controllers (Intel 8265, Windstorm Peak) need
intel/ibt-12-16.sfi and intel/ibt-12-16.ddc firmware files from
linux-firmware repository.

Signed-off-by: Jürg Billeter <j@bitron.ch>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: btwilink: Fix unexpected skb free
Loic Poulain [Tue, 23 May 2017 09:51:00 +0000 (11:51 +0200)]
Bluetooth: btwilink: Fix unexpected skb free

The caller (hci_core) still owns the skb in case of error, releasing
it inside the send function can lead to use-after-free errors.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_ll: Fix download_firmware() return when __hci_cmd_sync fails
Guodong Xu [Mon, 22 May 2017 13:50:42 +0000 (21:50 +0800)]
Bluetooth: hci_ll: Fix download_firmware() return when __hci_cmd_sync fails

When __hci_cmd_sync() fails, download_firmware() should also fail, and
the same error value should be returned as PTR_ERR(skb).

Without this fix, download_firmware() will return a success when it actually
failed in __hci_cmd_sync().

Fixes: 371805522f87 ("bluetooth: hci_uart: add LL protocol serdev driver support")
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_...
Markus Elfring [Mon, 22 May 2017 06:03:17 +0000 (08:03 +0200)]
ieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_skb_rx()

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_...
Markus Elfring [Mon, 22 May 2017 05:32:46 +0000 (07:32 +0200)]
ieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_probe()

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: Delete error messages for failed memory allocations in two functions
Markus Elfring [Mon, 22 May 2017 06:42:28 +0000 (08:42 +0200)]
Bluetooth: Delete error messages for failed memory allocations in two functions

Omit two extra messages for memory allocation failures in these functions.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoMAINTAINERS: update my mail address
Alexander Aring [Thu, 18 May 2017 18:52:56 +0000 (20:52 +0200)]
MAINTAINERS: update my mail address

I don't own this mail address anymore. This patch change the mail
address to my current one.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_nokia: select BT_HCIUART_H4
Tobias Regnery [Mon, 8 May 2017 09:39:11 +0000 (11:39 +0200)]
Bluetooth: hci_nokia: select BT_HCIUART_H4

We see the following build failure with CONFIG_BT_HCIUART_NOKIA=y and
CONFIG_BT_HCIUART_H4=n:

drivers/bluetooth/hci_nokia.c: In function 'nokia_recv':
drivers/bluetooth/hci_nokia.c:644:18: error: implicit declaration of function 'h4_recv_buf' [-Werror=implicit-function-declaration]
...

Fix this by selecting the BT_HCIUART_H4 symbol like all the other users
of the protocoll.

Fixes: 7bb318680e86 ("Bluetooth: add nokia driver")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Reviewed-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
Dean Jenkins [Fri, 5 May 2017 15:27:06 +0000 (16:27 +0100)]
Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races

When HCI_UART_PROTO_READY is in the set state, the Data Link protocol
layer (proto) is bound to the HCI UART driver. This state allows the
registered proto function pointers to be used by the HCI UART driver.

When unbinding (closing) the Data Link protocol layer, the proto
function pointers much be prevented from being used immediately before
running the proto close function pointer. Otherwise, there is a risk
that a proto non-close function pointer is used during or after the
proto close function pointer is used. The consequences are likely to
be a kernel crash because the proto close function pointer will free
resources used in the Data Link protocol layer.

Therefore, add a reader writer lock (rwlock) solution to prevent the
close proto function pointer from running by using write_lock_irqsave()
whilst the other proto function pointers are protected using
read_lock(). This means HCI_UART_PROTO_READY can safely be cleared
in the knowledge that no proto function pointers are running.

When flag HCI_UART_PROTO_READY is put into the clear state,
proto close function pointer can safely be run. Note
flag HCI_UART_PROTO_SET being in the set state prevents the proto
open function pointer from being run so there is no race condition
between proto open and close function pointers.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: Skip vendor diagnostic configuration for HCI User Channel
Marcel Holtmann [Tue, 2 May 2017 19:43:31 +0000 (12:43 -0700)]
Bluetooth: Skip vendor diagnostic configuration for HCI User Channel

When the HCI User Channel access is requested, then do not try to
undermine it with vendor diagnostic configuration. The exclusive user
is required to configure its own vendor diagnostic in that case and
can not rely on the host stack support.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
7 years agoBluetooth: hci_uart: fix kconfig dependency
Tobias Regnery [Tue, 2 May 2017 13:15:01 +0000 (15:15 +0200)]
Bluetooth: hci_uart: fix kconfig dependency

We see the following link error with CONFIG_BT_HCIUART=y,
CONFIG_BT_HCIUART_LL=y and CONFIG_SERIAL_DEV_BUS=m:

drivers/built-in.o: In function 'll_close':
supp.c:(.text+0x55add4): undefined reference to 'serdev_device_close'
supp.c:(.text+0x55add4): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_close'
drivers/built-in.o: In function 'll_open':
supp.c:(.text+0x55aed0): undefined reference to 'serdev_device_open'
supp.c:(.text+0x55aed0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_open'
drivers/built-in.o: In function `hci_ti_probe':
supp.c:(.text+0x55b00c): undefined reference to 'hci_uart_register_device'
supp.c:(.text+0x55b00c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'hci_uart_register_device'
drivers/built-in.o: In function `ll_setup':
supp.c:(.text+0x55b08c): undefined reference to 'serdev_device_set_flow_control'
supp.c:(.text+0x55b08c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_set_flow_control'
supp.c:(.text+0x55b324): undefined reference to 'serdev_device_set_baudrate'
supp.c:(.text+0x55b324): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_set_baudrate'
drivers/built-in.o: In function 'll_init':
supp.c:(.init.text+0x1b508): undefined reference to '__serdev_device_driver_register'
supp.c:(.init.text+0x1b508): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol '__serdev_device_driver_register'

Fix this by dependig BT_HCIUART_LL on the BT_HCIUART_SERDEV symbol.
This implies a dependency on BT_HCIUART and hci_ll.c is only compiled in
if SERIAl_DEV_BUS is built in or SERIAL_DEV_BUS and BT_HCIUART are
modules.

Fixes: 371805522f87 ("bluetooth: hci_uart: add LL protocol serdev driver support")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: Set LE Default PHY preferences
Marcel Holtmann [Tue, 2 May 2017 06:54:19 +0000 (23:54 -0700)]
Bluetooth: Set LE Default PHY preferences

If the LE Set Default PHY command is supported, the indicate to the
controller that the host has no preferences for transmitter PHY or
receiver PHY selection.

Issuing this command gives the controller a clear indication that other
PHY can be selected if available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
7 years agoBluetooth: Enable LE PHY Update Complete event
Marcel Holtmann [Tue, 2 May 2017 06:54:18 +0000 (23:54 -0700)]
Bluetooth: Enable LE PHY Update Complete event

If either LE Set Default PHY command or LE Set PHY commands is
supported, then enable the LE PHY Update Complete event.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
7 years agoBluetooth: Enable LE Channel Selection Algorithm event
Marcel Holtmann [Tue, 2 May 2017 06:54:17 +0000 (23:54 -0700)]
Bluetooth: Enable LE Channel Selection Algorithm event

If the Channel Selection Algorithm #2 feature is supported, then enable
the new LE Channel Selection Algorithm event.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
7 years agoBluetooth: Set LE Suggested Default Data Length to maximum
Marcel Holtmann [Tue, 2 May 2017 04:43:24 +0000 (21:43 -0700)]
Bluetooth: Set LE Suggested Default Data Length to maximum

When LE Data Packet Length Extension is supported, then actually
increase the suggested default data length to the maximum to enable
higher througput.

< HCI Command: LE Read Maximum Data Length (0x08|0x002f) plen 0
> HCI Event: Command Complete (0x0e) plen 12
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 2120
        Max RX octets: 251
        Max RX time: 2120

< HCI Command: LE Read Suggested Default Data Length (0x08|0x0023) plen 0
> HCI Event: Command Complete (0x0e) plen 8
      LE Read Suggested Default Data Length (0x08|0x0023) ncmd 1
        Status: Success (0x00)
        TX octets: 27
        TX time: 328

< HCI Command: LE Write Suggested Default Data Length (0x08|0x0024) plen 4
        TX octets: 251
        TX time: 2120
> HCI Event: Command Complete (0x0e) plen 4
      LE Write Suggested Default Data Length (0x08|0x0024) ncmd 1
        Status: Success (0x00)

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
7 years agoBluetooth: Add support for Intel Bluetooth device 9460/9560 [8087:0aaa]
Tedd Ho-Jeong An [Mon, 1 May 2017 20:35:12 +0000 (13:35 -0700)]
Bluetooth: Add support for Intel Bluetooth device 9460/9560 [8087:0aaa]

This patch adds support for Intel Bluetooth device 9460/9560 also known
as Jefferson Peak (JfP). The firmware downloading mechanism is same as
previous generation. So include the new USB product identifier and
whitelist the hardware variant.

T:  Bus=01 Lev=01 Prnt=01 Port=09 Cnt=04 Dev#=  5 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=8087 ProdID=0aaa Rev= 0.02
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=1ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:  If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  63 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  63 Ivl=1ms

Bootloader version:
< HCI Command: Intel Read Version (0x3f|0x0005) plen 0
> HCI Event: Command Complete (0x0e) plen 13
      Intel Read Version (0x3f|0x0005) ncmd 32
        Status: Success (0x00)
        Hardware platform: 0x37
        Hardware variant: 0x11
        Hardware revision: 0.0
        Firmware variant: 0x06
        Firmware revision: 0.1
        Firmware build: 42-52.2015
        Firmware patch: 0

Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoMerge branch 'phy-marvell-cleanups'
David S. Miller [Wed, 17 May 2017 20:27:52 +0000 (16:27 -0400)]
Merge branch 'phy-marvell-cleanups'

Andrew Lunn says:

====================
net: phy: marvell: Checkpatch cleanup

I will be contributing a few new features to the Marvell PHY driver
soon. Start by making the code mostly checkpatch clean. There should
not be any functional changes. Just comments set into the correct
format, missing blank lines, turn some comparisons around, and
refactoring to reduce indentation depth.

There is still one camel in the code, but it actually makes sense, so
leave it in piece.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: checkpatch - Fix remaining long lines
Andrew Lunn [Wed, 17 May 2017 01:26:04 +0000 (03:26 +0200)]
net: phy: marvell: checkpatch - Fix remaining long lines

Fold lines longer than 80 characters

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: Add helpers to get/set page
Andrew Lunn [Wed, 17 May 2017 01:26:03 +0000 (03:26 +0200)]
net: phy: marvell: Add helpers to get/set page

Makes the code a bit more readable, and solves quite a few checkpatch
warnings of lines longer than 80 characters.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: Refactor some bigger functions
Andrew Lunn [Wed, 17 May 2017 01:26:02 +0000 (03:26 +0200)]
net: phy: marvell: Refactor some bigger functions

Break big functions up by using a number of smaller helper
function. Solves some of the over 80 lines warnings, by reducing the
indentation level.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: Checkpatch - assignments and comparisons
Andrew Lunn [Wed, 17 May 2017 01:26:01 +0000 (03:26 +0200)]
net: phy: marvell: Checkpatch - assignments and comparisons

Avoid multiple assignments
Comparisons should place the constant on the right side of the test

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: Checkpatch - Missing or extra blank lines
Andrew Lunn [Wed, 17 May 2017 01:26:00 +0000 (03:26 +0200)]
net: phy: marvell: Checkpatch - Missing or extra blank lines

Remove the extra blank lines, add one in where recommended.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: Marvell: checkpatch - Comments
Andrew Lunn [Wed, 17 May 2017 01:25:59 +0000 (03:25 +0200)]
net: phy: Marvell: checkpatch - Comments

Use net style comment blocks, and wrap one block with long lines.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'tcp-TCP-TS-option-use-1-ms-clock'
David S. Miller [Wed, 17 May 2017 20:06:03 +0000 (16:06 -0400)]
Merge branch 'tcp-TCP-TS-option-use-1-ms-clock'

Eric Dumazet says:

====================
tcp: TCP TS option use 1 ms clock

TCP Timestamps option is defined in RFC 7323

Traditionally on linux, it has been tied to the internal
'jiffy' variable, because it had been a cheap and good enough
generator.

Unfortunately some distros use HZ=250 or even HZ=100 leading
to not very useful TCP timestamps.

For TCP flows in the DC, Google has used usec resolution for more
than two years with great success [1].
RCVBUF autotuning is more precise.

This series converts tp->tcp_mstamp to a plain u64 value storing
a 1 usec TCP clock.

This choice will allow us to upstream the 1 usec TS option as
discussed in IETF 97.

Kathleen Nichols [2] and others advocate for 1ms TS clocks for
network analysis. (1ms being the lowest value supported by RFC 7323.)

[1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
[2] http://netseminar.stanford.edu/seminars/02_02_17.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: switch TCP TS option (RFC 7323) to 1ms clock
Eric Dumazet [Tue, 16 May 2017 21:00:14 +0000 (14:00 -0700)]
tcp: switch TCP TS option (RFC 7323) to 1ms clock

TCP Timestamps option is defined in RFC 7323

Traditionally on linux, it has been tied to the internal
'jiffies' variable, because it had been a cheap and good enough
generator.

For TCP flows on the Internet, 1 ms resolution would be much better
than 4ms or 10ms (HZ=250 or HZ=100 respectively)

For TCP flows in the DC, Google has used usec resolution for more
than two years with great success [1]

Receive size autotuning (DRS) is indeed more precise and converges
faster to optimal window size.

This patch converts tp->tcp_mstamp to a plain u64 value storing
a 1 usec TCP clock.

This choice will allow us to upstream the 1 usec TS option as
discussed in IETF 97.

[1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: replace misc tcp_time_stamp to tcp_jiffies32
Eric Dumazet [Tue, 16 May 2017 21:00:13 +0000 (14:00 -0700)]
tcp: replace misc tcp_time_stamp to tcp_jiffies32

After this patch, all uses of tcp_time_stamp will require
a change when we introduce 1 ms and/or 1 us TCP TS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp_lp: cache tcp_time_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:12 +0000 (14:00 -0700)]
tcp_lp: cache tcp_time_stamp

tcp_time_stamp will become slightly more expensive soon,
cache its value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:11 +0000 (14:00 -0700)]
tcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp

This CC does not need 1 ms tcp_time_stamp and can use
the jiffy based 'timestamp'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tcp_jiffies32 in __tcp_oow_rate_limited()
Eric Dumazet [Tue, 16 May 2017 21:00:10 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 in __tcp_oow_rate_limited()

This place wants to use tcp_jiffies32, this is good enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: uses jiffies_32 to feed tp->chrono_start
Eric Dumazet [Tue, 16 May 2017 21:00:09 +0000 (14:00 -0700)]
tcp: uses jiffies_32 to feed tp->chrono_start

tcp_time_stamp will no longer be tied to jiffies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tcp_jiffies32 to feed probe_timestamp
Eric Dumazet [Tue, 16 May 2017 21:00:08 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed probe_timestamp

Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tcp_jiffies32 for rcv_tstamp and lrcvtime
Eric Dumazet [Tue, 16 May 2017 21:00:07 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 for rcv_tstamp and lrcvtime

Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:06 +0000 (14:00 -0700)]
tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp

Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp_bbr: use tcp_jiffies32 instead of tcp_time_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:05 +0000 (14:00 -0700)]
tcp_bbr: use tcp_jiffies32 instead of tcp_time_stamp

Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tcp_jiffies32 to feed tp->snd_cwnd_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:04 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed tp->snd_cwnd_stamp

Use tcp_jiffies32 instead of tcp_time_stamp to feed
tp->snd_cwnd_stamp.

tcp_time_stamp will soon be a litle bit more expensive
than simply reading 'jiffies'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tcp_jiffies32 to feed tp->lsndtime
Eric Dumazet [Tue, 16 May 2017 21:00:03 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed tp->lsndtime

Use tcp_jiffies32 instead of tcp_time_stamp to feed
tp->lsndtime.

tcp_time_stamp will soon be a litle bit more expensive
than simply reading 'jiffies'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodccp: do not use tcp_time_stamp
Eric Dumazet [Tue, 16 May 2017 21:00:02 +0000 (14:00 -0700)]
dccp: do not use tcp_time_stamp

Use our own macro instead of abusing tcp_time_stamp

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: introduce tcp_jiffies32
Eric Dumazet [Tue, 16 May 2017 21:00:01 +0000 (14:00 -0700)]
tcp: introduce tcp_jiffies32

We abuse tcp_time_stamp for two different cases :

1) base to generate TCP Timestamp options (RFC 7323)

2) A 32bit version of jiffies since some TCP fields
   are 32bit wide to save memory.

Since we want in the future to have 1ms TCP TS clock,
regardless of HZ value, we want to cleanup things.

tcp_jiffies32 is the truncated jiffies value,
which will be used only in places where we want a 'host'
timestamp.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: use tp->tcp_mstamp in output path
Eric Dumazet [Tue, 16 May 2017 21:00:00 +0000 (14:00 -0700)]
tcp: use tp->tcp_mstamp in output path

Idea is to later convert tp->tcp_mstamp to a full u64 counter
using usec resolution, so that we can later have fine
grained TCP TS clock (RFC 7323), regardless of HZ value.

We try to refresh tp->tcp_mstamp only when necessary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosch_dsmark: Fix uninitialized variable warning.
David S. Miller [Wed, 17 May 2017 20:03:16 +0000 (16:03 -0400)]
sch_dsmark: Fix uninitialized variable warning.

We still need to initialize err to -EINVAL for
the case where 'opt' is NULL in dsmark_init().

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-sched-multichain-filters'
David S. Miller [Wed, 17 May 2017 19:22:15 +0000 (15:22 -0400)]
Merge branch 'net-sched-multichain-filters'

Jiri Pirko says:

====================
net: sched: introduce multichain support for filters

Currently, each classful qdisc holds one chain of filters.
This chain is traversed and each filter could be matched on, which
may lead to execution of list of actions. One of such action
could be "reclassify", which would "reset" the processing of the
filter chain.

So this filter chain could be looked at as a flat table.
Sometimes it is convenient for user to configure a hierarchy
of tables. Example usecase is encapsulation.

Hierarchy of tables is a common way how it is done in HW pipelines.
So it is much more convenient to offload this.

This patchset contains two major patches:
8/10 - This patch introduces the support for having multiple
       chains of filters.
10/10 - This patch adds new control action to allow going to specified chain

The rest of the patches are smaller or bigger depencies of those 2.
Please see individual patch descriptions for details.

Corresponding iproute2 patches are appended as a reply to this cover letter.

Simple example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
$ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
$ tc filter show dev eth0 root
filter parent ffff: protocol ip pref 33 flower chain 0
filter parent ffff: protocol ip pref 33 flower chain 0 handle 0x1
  dst_mac 52:54:00:3d:c7:6d
  eth_type ipv4
        action order 1: gact action goto chain 11
         random type none pass val 0
         index 2 ref 1 bind 1

filter parent ffff: protocol ip pref 22 flower chain 11
filter parent ffff: protocol ip pref 22 flower chain 11 handle 0x1
  eth_type ipv4
  dst_ip 192.168.40.1
        action order 1: gact action drop
         random type none pass val 0
         index 3 ref 1 bind 1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: add termination action to allow goto chain
Jiri Pirko [Wed, 17 May 2017 09:08:03 +0000 (11:08 +0200)]
net: sched: add termination action to allow goto chain

Introduce new type of termination action called "goto_chain". This allows
user to specify a chain to be processed. This action type is
then processed as a return value in tcf_classify loop in similar
way as "reclassify" is, only it does not reset to the first filter
in chain but rather reset to the first filter of the desired chain.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: push tp down to action init
Jiri Pirko [Wed, 17 May 2017 09:08:02 +0000 (11:08 +0200)]
net: sched: push tp down to action init

Tp pointer will be needed by the next patch in order to get the chain.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce multichain support for filters
Jiri Pirko [Wed, 17 May 2017 09:08:01 +0000 (11:08 +0200)]
net: sched: introduce multichain support for filters

Instead of having only one filter per block, introduce a list of chains
for every block. Create chain 0 by default. UAPI is extended so the user
can specify which chain he wants to change. If the new attribute is not
specified, chain 0 is used. That allows to maintain backward
compatibility. If chain does not exist and user wants to manipulate with
it, new chain is created with specified index. Also, when last filter is
removed from the chain, the chain is destroyed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: push chain dump to a separate function
Jiri Pirko [Wed, 17 May 2017 09:08:00 +0000 (11:08 +0200)]
net: sched: push chain dump to a separate function

Since there will be multiple chains to dump, push chain dumping code to
a separate function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce helpers to work with filter chains
Jiri Pirko [Wed, 17 May 2017 09:07:59 +0000 (11:07 +0200)]
net: sched: introduce helpers to work with filter chains

Introduce struct tcf_chain object and set of helpers around it. Wraps up
insertion, deletion and search in the filter chain.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: move TC_H_MAJ macro call into tcf_auto_prio
Jiri Pirko [Wed, 17 May 2017 09:07:58 +0000 (11:07 +0200)]
net: sched: move TC_H_MAJ macro call into tcf_auto_prio

Call the helper from the function rather than to always adjust the
return value of the function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: replace nprio by a bool to make the function more readable
Jiri Pirko [Wed, 17 May 2017 09:07:57 +0000 (11:07 +0200)]
net: sched: replace nprio by a bool to make the function more readable

The use of "nprio" variable in tc_ctl_tfilter is a bit cryptic and makes
a reader wonder what is going on for a while. So help him to understand
this priority allocation dance a litte bit better.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: rename tcf_destroy_chain helper
Jiri Pirko [Wed, 17 May 2017 09:07:56 +0000 (11:07 +0200)]
net: sched: rename tcf_destroy_chain helper

Make the name consistent with the rest of the helpers around.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce tcf block infractructure
Jiri Pirko [Wed, 17 May 2017 09:07:55 +0000 (11:07 +0200)]
net: sched: introduce tcf block infractructure

Currently, the filter chains are direcly put into the private structures
of qdiscs. In order to be able to have multiple chains per qdisc and to
allow filter chains sharing among qdiscs, there is a need for common
object that would hold the chains. This introduces such object and calls
it "tcf_block".

Helpers to get and put the blocks are provided to be called from
individual qdisc code. Also, the original filter_list pointers are left
in qdisc privs to allow the entry into tcf_block processing without any
added overhead of possible multiple pointer dereference on fast path.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: move tc_classify function to cls_api.c
Jiri Pirko [Wed, 17 May 2017 09:07:54 +0000 (11:07 +0200)]
net: sched: move tc_classify function to cls_api.c

Move tc_classify function to cls_api.c where it belongs, rename it to
fit the namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'dsa-sort'
David S. Miller [Wed, 17 May 2017 19:19:40 +0000 (15:19 -0400)]
Merge branch 'dsa-sort'

Andrew Lunn says:

====================
net: dsa: Sort various lists

As we gain more DSA drivers and tagging protocols, the lists are
getting a bit unruly. Do some sorting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: DSA: Sort drivers
Andrew Lunn [Tue, 16 May 2017 20:40:08 +0000 (22:40 +0200)]
drivers: net: DSA: Sort drivers

With more drivers being added, it is time to sort the drivers to
impose some order.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: Sort DSA tagging protocol drivers
Andrew Lunn [Tue, 16 May 2017 20:40:07 +0000 (22:40 +0200)]
net: dsa: Sort DSA tagging protocol drivers

With more tag protocols being added, regain some order by sorting the
entries in various places.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: fix PF falsely indicating success at setting MAC address of a nonexistent VF
Felix Manlunas [Tue, 16 May 2017 18:28:00 +0000 (11:28 -0700)]
liquidio: fix PF falsely indicating success at setting MAC address of a nonexistent VF

In the function assigned to .ndo_set_vf_mac, check the validity of the
vfidx argument before proceeding to tell the firmware to set the VF MAC
address.

Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: fix insmod failure when multiple NICs are plugged in
Rick Farrington [Tue, 16 May 2017 18:14:50 +0000 (11:14 -0700)]
liquidio: fix insmod failure when multiple NICs are plugged in

When multiple liquidio NICs are plugged in, the first insmod of the PF
driver succeeds.  But after an rmmod, a subsequent insmod fails.  Reason is
during rmmod, the PF driver resets the Octeon of only one of the NICs; it
neglects to reset the Octeons of the other NICs.

Fix the insmod failure by adding the missing Octeon resets at rmmod.  Keep
a per-NIC refcount that indicates the number of active PFs in a given NIC.
When the refcount goes to zero, then reset the Octeon of that NIC.

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: store CPU port pointer in the tree
Vivien Didelot [Tue, 16 May 2017 18:10:33 +0000 (14:10 -0400)]
net: dsa: store CPU port pointer in the tree

A dsa_switch_tree instance holds a dsa_switch pointer and a port index
to identify the switch port to which the CPU is attached.

Now that the DSA layer has a dsa_port structure to hold this data, use
it to point the switch CPU port.

This patch simply substitutes s/dst->cpu_switch/dst->cpu_dp->ds/ and
s/dst->cpu_port/dst->cpu_dp->index/.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-Preparations-for-restructuring'
David S. Miller [Wed, 17 May 2017 18:06:56 +0000 (14:06 -0400)]
Merge branch 'mlxsw-Preparations-for-restructuring'

Jiri Pirko says:

====================
mlxsw: Preparations for restructuring

This patchset doesn't introduce any functional changes and merely meant
to make the code base more receptive for upcoming restructuring.

The first six patches mainly shuffle code in order to reduce the scope of
structs that shouldn't be defined in the main driver header. Most of them
will be later expanded, so it makes sense to correctly place them now.

The last patches mostly simplify bridge-related functions, so that they
could be more easily modified later on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Default ports to non-virtual mode
Ido Schimmel [Tue, 16 May 2017 17:38:35 +0000 (19:38 +0200)]
mlxsw: spectrum: Default ports to non-virtual mode

In virtual mode, packets are classified to FIDs based on their ingress
port and VLAN whereas in non-virtual mode only the VLAN is taken into
account.

Currently ports are initialized to use virtual mode due to the presence
of the PVID vPort. However, we're going to transition ports between both
modes based on the FIDs they use and not merely based on the presence on
a VLAN upper. Therefore, during initialization, no mode will be
explicitly set.

Since the Programmer's Reference Manual (PRM) doesn't specify a default,
explicitly set the port to non-virtual mode and later transition the
port between both modes based on the FIDs it uses.

In a follow-up patchset, this step will be moved to the common FID core
where it logically belongs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Move PVID code to appropriate place
Ido Schimmel [Tue, 16 May 2017 17:38:34 +0000 (19:38 +0200)]
mlxsw: spectrum: Move PVID code to appropriate place

PVID is a port attribute and should therefore reside in the main driver
file and not the switchdev specific one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_switchdev: Don't batch learning operations
Ido Schimmel [Tue, 16 May 2017 17:38:33 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Don't batch learning operations

We no longer batch VLAN operations, so there's no need to set the
learning state for a range of VLANs.

Use a common function to set the learning state for a Port-VLAN, thereby
making the code saner more receptive for upcoming changes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_switchdev: Don't batch STP operations
Ido Schimmel [Tue, 16 May 2017 17:38:32 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Don't batch STP operations

Simplify the code by using the common function that sets an STP state
for a Port-VLAN and remove the existing one that tries to batch it for
several VLANs.

This will help us in a follow-up patchset to introduce a unified
infrastructure for bridge ports, regardless if the bridge is VLAN-aware
or not.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_switchdev: Don't batch VLAN operations
Ido Schimmel [Tue, 16 May 2017 17:38:31 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Don't batch VLAN operations

switchdev's VLAN object has the ability to describe a range of VLAN IDs,
but this is only used when VLAN operations are done using the SELF flag,
which is something we would like to remove as it allows one to bypass
the bridge driver.

Do VLAN operations on a per-VLAN basis, thereby simplifying the code and
preparing it for refactoring in a follow-up patchset.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_switchdev: Remove redundant check
Ido Schimmel [Tue, 16 May 2017 17:38:30 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Remove redundant check

Since commit 97c242902c20 ("switchdev: Execute bridge ndos only for
bridge ports") switchdev code checks that port is bridged, so no need to
perform the same check in the driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Initialize RIFs in a separate function
Ido Schimmel [Tue, 16 May 2017 17:38:29 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Initialize RIFs in a separate function

The router interfaces (RIFs) array is currently initialized together
with the general router configuration. However, in a follow-up patchset
we're going to introduce a common RIF core that will require us to
initialize more RIF constructs, so move the RIF initialization to its
own function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Move FIB notification block to router struct
Ido Schimmel [Tue, 16 May 2017 17:38:28 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Move FIB notification block to router struct

The FIB notification block logically belongs inside the router specific
struct, so move it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Move RIFs array to its rightful place
Ido Schimmel [Tue, 16 May 2017 17:38:27 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Move RIFs array to its rightful place

The router interfaces (RIFs) array is of no interest to code outside the
routing realm, so declare it inside the router specific struct instead
of the chip-wide one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_switchdev: Reduce scope of bridge struct
Ido Schimmel [Tue, 16 May 2017 17:38:26 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Reduce scope of bridge struct

Some attributes in the global chip struct are only relevant for bridge
operation, so encapsulate them in their own struct that isn't exposed to
non-bridge code.

This will also help us later, when we add more bridge-specific
attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Reduce scope of router struct
Ido Schimmel [Tue, 16 May 2017 17:38:25 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Reduce scope of router struct

In a similar fashion to previous patch, the router structure
('mlxsw_sp_router') doesn't need to be accessible to anyone, but the
router code located at spectrum_router.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_buffer: Reduce scope of shared buffer struct
Ido Schimmel [Tue, 16 May 2017 17:38:24 +0000 (19:38 +0200)]
mlxsw: spectrum_buffer: Reduce scope of shared buffer struct

The shared buffer structure ('mlxsw_sp_sb') doesn't need to be
accessible to anyone, but the shared buffer code located at
spectrum_buffers.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add new T5 pci device id
Ganesh Goudar [Tue, 16 May 2017 16:09:05 +0000 (21:39 +0530)]
cxgb4: add new T5 pci device id

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: reduce resource allocation in kdump kernel
Ganesh Goudar [Tue, 16 May 2017 15:47:42 +0000 (21:17 +0530)]
cxgb4: reduce resource allocation in kdump kernel

When is_kdump_kernel() is true, reduce memory footprint of
cxgb4 by using a single "Queue Set".

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: use pcie_flr instead of duplicating it
Christoph Hellwig [Tue, 16 May 2017 14:21:46 +0000 (16:21 +0200)]
liquidio: use pcie_flr instead of duplicating it

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: Remove residual magic from PHY drivers
Andrew Lunn [Tue, 16 May 2017 16:29:11 +0000 (18:29 +0200)]
net: phy: Remove residual magic from PHY drivers

commit fa8cddaf903c ("net phylib: Remove unnecessary condition check in phy")
removed the only place where the PHY flag PHY_HAS_MAGICANEG was
checked. But it left the flag being set in the drivers. Remove the flag.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnx2x: Remove open coded carrier check
Leon Romanovsky [Tue, 16 May 2017 12:20:56 +0000 (15:20 +0300)]
bnx2x: Remove open coded carrier check

There is inline function to test if carrier present,
so it makes open-coded solution redundant.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: internal implementation for pacing
Eric Dumazet [Tue, 16 May 2017 11:24:36 +0000 (04:24 -0700)]
tcp: internal implementation for pacing

BBR congestion control depends on pacing, and pacing is
currently handled by sch_fq packet scheduler for performance reasons,
and also because implemening pacing with FQ was convenient to truly
avoid bursts.

However there are many cases where this packet scheduler constraint
is not practical.
- Many linux hosts are not focusing on handling thousands of TCP
  flows in the most efficient way.
- Some routers use fq_codel or other AQM, but still would like
  to use BBR for the few TCP flows they initiate/terminate.

This patch implements an automatic fallback to internal pacing.

Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option.

If sch_fq happens to be in the egress path, pacing is delegated to
the qdisc, otherwise pacing is done by TCP itself.

One advantage of pacing from TCP stack is to get more precise rtt
estimations, and less work done from TX completion, since TCP Small
queue limits are not generally hit. Setups with single TX queue but
many cpus might even benefit from this.

Note that unlike sch_fq, we do not take into account header sizes.
Taking care of these headers would add additional complexity for
no practical differences in behavior.

Some performance numbers using 800 TCP_STREAM flows rate limited to
~48 Mbit per second on 40Gbit NIC.

If MQ+pfifo_fast is used on the NIC :

$ sar -n DEV 1 5 | grep eth
14:48:44         eth0 725743.00 2932134.00  46776.76 4335184.68      0.00      0.00      1.00
14:48:45         eth0 725349.00 2932112.00  46751.86 4335158.90      0.00      0.00      0.00
14:48:46         eth0 725101.00 2931153.00  46735.07 4333748.63      0.00      0.00      0.00
14:48:47         eth0 725099.00 2931161.00  46735.11 4333760.44      0.00      0.00      1.00
14:48:48         eth0 725160.00 2931731.00  46738.88 4334606.07      0.00      0.00      0.00
Average:         eth0 725290.40 2931658.20  46747.54 4334491.74      0.00      0.00      0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0 259825920  45644 2708324    0    0    21     2  247   98  0  0 100  0  0
 4  0      0 259823744  45644 2708356    0    0     0     0 2400825 159843  0 19 81  0  0
 0  0      0 259824208  45644 2708072    0    0     0     0 2407351 159929  0 19 81  0  0
 1  0      0 259824592  45644 2708128    0    0     0     0 2405183 160386  0 19 80  0  0
 1  0      0 259824272  45644 2707868    0    0     0    32 2396361 158037  0 19 81  0  0

Now use MQ+FQ :

lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc
lpaa23:~# tc qdisc replace dev eth0 root mq

$ sar -n DEV 1 5 | grep eth
14:49:57         eth0 678614.00 2727930.00  43739.13 4033279.14      0.00      0.00      0.00
14:49:58         eth0 677620.00 2723971.00  43674.69 4027429.62      0.00      0.00      1.00
14:49:59         eth0 676396.00 2719050.00  43596.83 4020125.02      0.00      0.00      0.00
14:50:00         eth0 675197.00 2714173.00  43518.62 4012938.90      0.00      0.00      1.00
14:50:01         eth0 676388.00 2719063.00  43595.47 4020171.64      0.00      0.00      0.00
Average:         eth0 676843.00 2720837.40  43624.95 4022788.86      0.00      0.00      0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 259832240  46008 2710912    0    0    21     2  223  192  0  1 99  0  0
 1  0      0 259832896  46008 2710744    0    0     0     0 1702206 198078  0 17 82  0  0
 0  0      0 259830272  46008 2710596    0    0     0     0 1696340 197756  1 17 83  0  0
 4  0      0 259829168  46024 2710584    0    0    16     0 1688472 197158  1 17 82  0  0
 3  0      0 259830224  46024 2710408    0    0     0     0 1692450 197212  0 18 82  0  0

As expected, number of interrupts per second is very different.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'udp-scalability-improvements'
David S. Miller [Tue, 16 May 2017 19:41:31 +0000 (15:41 -0400)]
Merge branch 'udp-scalability-improvements'

Paolo Abeni says:

====================
udp: scalability improvements

This patch series implement an idea suggested by Eric Dumazet to
reduce the contention of the udp sk_receive_queue lock when the socket is
under flood.

An ancillary queue is added to the udp socket, and the socket always
tries first to read packets from such queue. If it's empty, we splice
the content from sk_receive_queue into the ancillary queue.

The first patch introduces some helpers to keep the udp code small, and the
following two implement the ancillary queue strategy. The code is split
to hopefully help the reviewing process.

The measured overall gain under udp flood is up to the 30% depending on
the numa layout and the number of ingress queue used by the relevant nic.

The performance numbers have been gathered using pktgen as sender, with 64
bytes packets, random src port on a host b2b connected via a 10Gbs link
with the dut.

The receiver used the udp_sink program by Jesper [1] and an h/w l4 rx hash on
the ingress nic, so that the number of ingress nic rx queues hit by the udp
traffic could be controlled via ethtool -L.

The udp_sink program was bound to the first idle cpu, to get more
stable numbers.

On a single numa node receiver:

nic rx queues           vanilla                 patched kernel
1                       1820 kpps               1900 kpps
2                       1950 kpps               2500 kpps
16                      1670 kpps               2120 kpps

When using a single nic rx queue, busy polling was also enabled,
elsewhere, in the above scenario, the bh processing becomes the bottle-neck
and this produces large artifacts in the measured performances (e.g.
improving the udp sink run time, decreases the overall tput, since more
action from the scheduler comes into play).

[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

v1 -> v2:
  Patches 1/3 and 2/3 are unchanged, in patch 3/3 the rx_queue_lock_held param
  of udp_rmem_release() is now a bool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoudp: keep the sk_receive_queue held when splicing
Paolo Abeni [Tue, 16 May 2017 09:20:15 +0000 (11:20 +0200)]
udp: keep the sk_receive_queue held when splicing

On packet reception, when we are forced to splice the
sk_receive_queue, we can keep the related lock held, so
that we can avoid re-acquiring it, if fwd memory
scheduling is required.

v1 -> v2:
  the rx_queue_lock_held param in udp_rmem_release() is
  now a bool

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoudp: use a separate rx queue for packet reception
Paolo Abeni [Tue, 16 May 2017 09:20:14 +0000 (11:20 +0200)]
udp: use a separate rx queue for packet reception

under udp flood the sk_receive_queue spinlock is heavily contended.
This patch try to reduce the contention on such lock adding a
second receive queue to the udp sockets; recvmsg() looks first
in such queue and, only if empty, tries to fetch the data from
sk_receive_queue. The latter is spliced into the newly added
queue every time the receive path has to acquire the
sk_receive_queue lock.

The accounting of forward allocated memory is still protected with
the sk_receive_queue lock, so udp_rmem_release() needs to acquire
both locks when the forward deficit is flushed.

On specific scenarios we can end up acquiring and releasing the
sk_receive_queue lock multiple times; that will be covered by
the next patch

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/sock: factor out dequeue/peek with offset code
Paolo Abeni [Tue, 16 May 2017 09:20:13 +0000 (11:20 +0200)]
net/sock: factor out dequeue/peek with offset code

And update __sk_queue_drop_skb() to work on the specified queue.
This will help the udp protocol to use an additional private
rx queue in a later patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'nfp-LSO-checksum-and-XDP-datapath-updates'
David S. Miller [Tue, 16 May 2017 16:59:05 +0000 (12:59 -0400)]
Merge branch 'nfp-LSO-checksum-and-XDP-datapath-updates'

Jakub Kicinski says:

====================
nfp: LSO, checksum and XDP datapath updates

This series introduces a number of refinements to standard features
like LSO and checksum offload.  Three major features are support for
CHECKSUM_COMPLETE, refinement of TSO handling and another small speed
up for XDP TX.  This series also switches from depending on some
app FW<>driver ABI versions to heavier use of capabilities.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: eliminate an if statement in calculation of completed frames
Jakub Kicinski [Tue, 16 May 2017 00:55:23 +0000 (17:55 -0700)]
nfp: eliminate an if statement in calculation of completed frames

Given that our rings are always a power of 2, we can simplify the
calculation of number of completed TX descriptors by using masking
instead of if statement based on whether the index have wrapped
or not.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: add a helper for wrapping descriptor index
Jakub Kicinski [Tue, 16 May 2017 00:55:22 +0000 (17:55 -0700)]
nfp: add a helper for wrapping descriptor index

We have a number of places where we calculate the descriptor
index based on a value which may have overflown.  Create a
macro for masking with the ring size.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: complete the XDP TX ring only when it's full
Jakub Kicinski [Tue, 16 May 2017 00:55:21 +0000 (17:55 -0700)]
nfp: complete the XDP TX ring only when it's full

Since XDP TX ring holds "spare" RX buffers anyway, we don't have to
rush the completion.  We can wait until ring fills up completely
before trying to reclaim buffers.  If RX poll has ended an no
buffer has been queued for XDP TX we have no guarantee we will see
another interrupt, so run the reclaim there as well, to make sure
TX statistics won't become stale.

This should help us reclaim more buffers per single queue controller
register read.

Note that the XDP completion is very trivial, it only adds up
the sizes of transmitted frames for statistics so the latency
spike should be acceptable.  In case user sets the ring sizes
to something crazy, limit the completion to 2k entries.

The check if the ring is empty at the beginning of xdp_complete()
is no longer needed - the callers will perform it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: add CHECKSUM_COMPLETE support
Jakub Kicinski [Tue, 16 May 2017 00:55:20 +0000 (17:55 -0700)]
nfp: add CHECKSUM_COMPLETE support

Introduce NFP_NET_CFG_CTRL_CSUM_COMPLETE capability and implement parsing
of CHECKSUM_COMPLETE metadata.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: version independent support for chained RSS metadata
Edwin Peer [Tue, 16 May 2017 00:55:19 +0000 (17:55 -0700)]
nfp: version independent support for chained RSS metadata

ABI version 4 introduced metadata chaining. Using the ABI version to signal
metadata chaining precludes firmware that advertises new capabilities which
rely on prepended metadata from working on older kernels.

Capability bits are thus better suited to signalling the chained metadata
format. A new version of the RSS capability is introduced to distinguish
between the differing metadata formats for ABI versions other than 4.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't assume RSS and IRQ moderation are always enabled
Jakub Kicinski [Tue, 16 May 2017 00:55:18 +0000 (17:55 -0700)]
nfp: don't assume RSS and IRQ moderation are always enabled

Even if capability for RSS and IRQ moderation are present we may
have not initialized them for control vNIC.  Depend on selected
features mask (ctrl) rather than capabilities (cap) to determine
which features should be enabled.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: support LSO2 capability
Edwin Peer [Tue, 16 May 2017 00:55:17 +0000 (17:55 -0700)]
nfp: support LSO2 capability

Firmware advertising the LSO2 capability exploits driver provided L3 and L4
offsets in order to avoid parsing packet headers in the TX path. The vlan
field in struct nfp_net_tx_desc is repurposed, making TXVLAN a mutually
exclusive configuration to LSO2.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: rename l4_offset in struct nfp_net_tx_desc to lso_hdrlen
Edwin Peer [Tue, 16 May 2017 00:55:16 +0000 (17:55 -0700)]
nfp: rename l4_offset in struct nfp_net_tx_desc to lso_hdrlen

The l4_offset field referred to by NFD is confusingly named. It is not the
offset of the L4 transport header, but rather the L4 payload.

The LSO2 capability supported by alternative device firmware requires
the actual L4 offset, thus the rename seems prudent.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't enable TSO on the device when disabled
Jakub Kicinski [Tue, 16 May 2017 00:55:15 +0000 (17:55 -0700)]
nfp: don't enable TSO on the device when disabled

We advertise TSO to the stack but leave it disabled by default.
Make sure it's not only disabled in the netdev features but
also on the device itself.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: socket: mark socket protocol handler structs as const
linzhang [Mon, 15 May 2017 02:26:47 +0000 (10:26 +0800)]
net: socket: mark socket protocol handler structs as const

Signed-off-by: linzhang <xiaolou4617@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: hv: Add clean up for included files in Ubuntu net config
Haiyang Zhang [Fri, 12 May 2017 19:14:11 +0000 (12:14 -0700)]
tools: hv: Add clean up for included files in Ubuntu net config

The clean up function is updated to cover duplicate config info in
files included by "source" key word in Ubuntu network config.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt: add dma mapping attributes
Shannon Nelson [Wed, 10 May 2017 01:30:12 +0000 (18:30 -0700)]
bnxt: add dma mapping attributes

On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
in our Rx path dma mapping in order to get the expected performance out
of the receive path.  Adding it to the Tx path has little effect, so
that's not a part of this patch.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'xgene-Add-ethtool-stats-and-bug-fixes'
David S. Miller [Tue, 16 May 2017 15:41:11 +0000 (11:41 -0400)]
Merge branch 'xgene-Add-ethtool-stats-and-bug-fixes'

Iyappan Subramanian says:

====================
drivers: net: xgene: Add ethtool stats and bug fixes

This patch set,

- adds ethtool extended statistics support
- addresses errata workarounds
- fixes bugs related to statistics

v2: Address review comments from v1
- Adds lock to protect mdio-xgene indirect MAC access
- Refactors xgene-enet indirect MAC read/write functions
- Uses mdio-xgene MAC access routines, if xgene-enet port
  use the same HW.
v1:
- Initial version

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Quan Nguyen <qnguyen@apm.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: Fix redundant prefetch buffer cleanup
Iyappan Subramanian [Wed, 10 May 2017 20:45:10 +0000 (13:45 -0700)]
drivers: net: xgene: Fix redundant prefetch buffer cleanup

Prefetch buffer cleanup code was called twice, causing EDAC to
report errors during reboot.

[ 1130.972475] xgene-edac 78800000.edac: IOB bridge agent (BA) transaction
error
[ 1130.979584] xgene-edac 78800000.edac: IOB BA write response error
[ 1130.985648] xgene-edac 78800000.edac: IOB BA write access at 0x00.00000000
()
[ 1130.993612] xgene-edac 78800000.edac: IOB BA requestor ID 0x00002400
[ 1131.000242] xgene-edac 78800000.edac: IOB bridge agent (BA) transaction
error
...

This patch fixes the errors by,

- removing the redundant prefetch buffer cleanup from port_ops->shutdown()
- moving port_ops->shutdown() after delete_rings()

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15
Quan Nguyen [Wed, 10 May 2017 20:45:09 +0000 (13:45 -0700)]
drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15

This patch adds workaround for HW errata 10GE_10 and ENET_15:
"HW statistic counters value are duplicated".

- RFCS duplicates RALN counter
- RFLR duplicates RUND counter
- TFCS duplicates TFRG counter
- RALN should be intepreted as 0 in 10G mode

Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11
Quan Nguyen [Wed, 10 May 2017 20:45:08 +0000 (13:45 -0700)]
drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11

This patch adds statistic counter for frames recovered from HW errata
10GE_8 and ENET_11:
"HW reports Length error for valid 64 byte frames with len <46 bytes".

Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: Workaround for HW errata 10GE_4
Quan Nguyen [Wed, 10 May 2017 20:45:07 +0000 (13:45 -0700)]
drivers: net: xgene: Workaround for HW errata 10GE_4

This patch adds workaround for HW errata 10GE_4:
"XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read".

Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>