Mitch Williams [Thu, 13 Aug 2015 22:11:32 +0000 (15:11 -0700)]
i40evf: tweak init timing
This patch tweaks the init timing of the driver just a little bit to
increase stability on load/unload and SR-IOV enable/disable cycles.
First, run the init_task loop a little quicker in order to reduce
overall init time.
Second, stagger the start of the init task based on the device's
PCIe function ID. This lessens the impact on the firmware when a
whole bunch of VFs are initialized simultaneously, e.g. enabling
SR-IOV without the VF driver blacklisted. For single VFs assigned
to VMs this will have no effect as the function ID will always be 0.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 23 Jul 2015 20:54:42 +0000 (16:54 -0400)]
i40e: warn on double free
Down was requesting queue disables, but then exited immediately without
waiting for the queues to actually disable. This could allow any
function called after i40evf_down to run immediately, including
i40evf_up, and causes a memory leak.
This issue has been fixed in a recent refactor of the reset code, but
add a couple WARN_ONs in the slow path to help us recognize if we
reintroduce this issue or if we missed any cases.
Change-ID: I27b6b5c9a79c1892f0ba453129f116bc32647dd0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 23 Jul 2015 20:54:41 +0000 (16:54 -0400)]
i40e: refactor interrupt enable
The interrupt enable function was always making the caller add
the base_vector from the VSI struct which is already passed to
the function. Just collapse the math into the helper function.
Change-ID: I54ef33aa7ceebc3231c3cc48f7b39fd0c3ff5806
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 23 Jul 2015 20:54:40 +0000 (16:54 -0400)]
i40e: Strip VEB stats if they are disabled in HW
Due to performance reasons, VEB stats have been disabled in the hw. This
patch adds code to check for that condition before accumulating these
stats.
Change-ID: I7d805669476fedabb073790403703798ae5d878e
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 23 Jul 2015 20:54:39 +0000 (16:54 -0400)]
i40e/i40evf: add new device id 1588
Add new device id and support for another 20Gb device.
Change-ID: Ib1b61e5bb6201d84953f97cade39a6e3369c2cf2
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 23 Jul 2015 20:54:38 +0000 (16:54 -0400)]
i40e: Remove useless message
Remove a useless message that blathers on whenever a vxlan port is deleted.
Change-ID: If63fb8cf38e56cf433b68e498f11389de51919ba
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 23 Jul 2015 20:54:37 +0000 (16:54 -0400)]
i40e: limit debugfs io ops
Don't let the debugfs register read and write commands try to access
outside of the ioremapped space. While we're at it, remove the use of
a misleading constant.
Change-ID: Ifce2893e232c65c7a76c23532c658f298218a81b
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 23 Jul 2015 20:54:36 +0000 (16:54 -0400)]
i40e: use QOS field consistently
In i40e_ndo_set_vf_port_vlan, we were using the QOS value
inconsistently, sometimes shifting it, sometimes not. Do the shift-and-
or operation correctly, once, and use the result consistently everywhere
in the function.
Change-ID: I46f062f3edc90a8a017ecec9137f4d1ab0ab9e41
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 23 Jul 2015 20:54:35 +0000 (16:54 -0400)]
i40e: count drops in netstat interface
The i40e rx_dropped counter was not showing up in netstat -i.
Add the right counter to be updated with the stats.
Change-ID: I4dd552e9995836099184f9d9a08e90edb591155f
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 23 Jul 2015 20:54:34 +0000 (16:54 -0400)]
i40e/i40evf: fix Tx hang workaround code
The arm writeback (arm_wb) code is used for kicking the Tx ring to
make sure any pending work is completed even if interrupts are
disabled. It was running when it didn't need to, and not clearing
the ring->arm_wb state after it was set. This caused Tx hangs
to still occur occasionally when there really was no hang.
Fix this by resetting the variable right after it was used.
Change-ID: I7bf75d552ba9c4bd203d40615213861a24bb5594
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 23 Jul 2015 20:54:32 +0000 (16:54 -0400)]
i40e: fixup padding issue in get_cee_dcb_cfg_v1_resp
The struct i40e_aqc_get_cee_dcb_cfg_v1_resp was originally defined with
word boundary layout issues, which most compilers deal with by silently
adding padding, making the actual struct larger than designed.
This patch adds an extra byte in fields reserved3 and reserved4 to directly
acknowledge that padding.
Because the struct doesn't actually change in size or layout, this doesn't
constitute a change in the API.
Change-ID: I53fa4741b73fa255621232a85fba000b0e223015
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 23 Jul 2015 20:54:31 +0000 (16:54 -0400)]
i40e: Fix a port VLAN configuration bug
If a port VLAN is set for a given virtual function (VF) before the VF
driver is loaded then a configuration error results in which the port
VLAN is ignored when the VF driver is subsequently loaded. This causes
the VF's MAC/VLAN filters to not use the correct VLAN filter. This
patch ensures that the port VLAN filter is considered at the right time
during configuration of the VF's MAC/VLAN filters.
Change-ID: I28f404cbc21a4c6d70a7980b87c77f13f06685a4
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 23 Jul 2015 20:54:30 +0000 (16:54 -0400)]
i40e/i40evf: fix up type clash in i40e_aq_rc_to_posix conversion
The error code sent into i40e_aq_rc_to_posix() are signed values, so we
really need to treat them as such.
Change-ID: I3d1ae0ee9ae0b1b6f5fc424f8b8cc58b0ea93203
Reported-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasily Averin [Wed, 8 Jul 2015 12:04:26 +0000 (15:04 +0300)]
i40e: rtnl_lock called twice in i40e_pci_error_resume()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasily Averin [Tue, 7 Jul 2015 15:53:38 +0000 (18:53 +0300)]
i40evf: missing rtnl_unlock in i40evf_resume()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Andrews Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 30 Sep 2015 04:46:21 +0000 (21:46 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following pull request contains Netfilter/IPVS updates for net-next
containing 90 patches from Eric Biederman.
The main goal of this batch is to avoid recurrent lookups for the netns
pointer, that happens over and over again in our Netfilter/IPVS code. The idea
consists of passing netns pointer from the hook state to the relevant functions
and objects where this may be needed.
You can find more information on the IPVS updates from Simon Horman's commit
merge message:
c3456026adc0 ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next").
Exceptionally, this time, I'm not posting the patches again on netdev, Eric
already Cc'ed this mailing list in the original submission. If you need me to
make, just let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:38:36 +0000 (12:38 -0400)]
net: dsa: fix preparation of a port STP update
Because of the default 0 value of ret in dsa_slave_port_attr_set, a
driver may return -EOPNOTSUPP from the commit phase of a STP state,
which triggers a WARN() from switchdev.
This happened on a 6185 switch which does not support hardware bridging.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 29 Sep 2015 16:32:03 +0000 (09:32 -0700)]
net: Add support for filtering neigh dump by master device
Add support for filtering neighbor dumps by master device by adding
the NDA_MASTER attribute to the dump request. A new netlink flag,
NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
request and output is filtered as requested.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 30 Sep 2015 04:24:05 +0000 (21:24 -0700)]
tcp: fix tcp_v6_md5_do_lookup prototype
tcp_v6_md5_do_lookup() now takes a const socket, even if
CONFIG_TCP_MD5SIG is not set.
Fixes:
b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
From: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Sep 2015 04:32:00 +0000 (21:32 -0700)]
Merge branch 'switchdev-callback'
Vivien Didelot says:
====================
net: switchdev: use specific switchdev_obj_*
This patchset changes switchdev add, del, dump operations from this:
int (*switchdev_port_obj_add)(struct net_device *dev,
struct switchdev_obj *obj,
struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
struct switchdev_obj *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
struct switchdev_obj *obj);
to something similar to the notifier_call callback of a notifier_block:
int (*switchdev_port_obj_add)(struct net_device *dev,
enum switchdev_obj_id id,
const void *obj,
struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
enum switchdev_obj_id id,
const void *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
enum switchdev_obj_id id, void *obj,
int (*cb)(void *obj));
This allows the caller to pass and expect back a specific switchdev_obj_*
structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.
This will simplify pushing the callback function down to the drivers.
The first 3 patches get rid of the dev parameter of the dump callback, since it
is not always neeeded (e.g. vlan_dump) and some drivers (such as DSA drivers)
may not have easy access to it.
Patches 4 and 5 implement the change in the switchdev operations and its users.
Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
removes this last one.
v2: fix error spotted by kbuild (extra ';' inline switchdev_port_obj_dump).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:18 +0000 (12:07 -0400)]
net: switchdev: extract struct switchdev_obj_*
Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rif of this
outer structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:17 +0000 (12:07 -0400)]
net: switchdev: abstract object in add/del ops
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev add and del operations to:
int switchdev_port_obj_add/del(struct net_device *dev,
enum switchdev_obj_id id, void *obj);
This allows the caller to pass a specific switchdev_obj_* structure
instead of the generic switchdev_obj one.
Drivers implementation of these operations and switchdev have been
changed accordingly.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:16 +0000 (12:07 -0400)]
net: switchdev: pass callback to dump operation
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev dump operation to:
int switchdev_port_obj_dump(struct net_device *dev,
enum switchdev_obj_id id, void *obj,
int (*cb)(void *obj));
This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.
Drivers implementation of dump operation can now expect this specific
structure and call the callback with it. Drivers have been changed
accordingly.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:15 +0000 (12:07 -0400)]
net: switchdev: remove dev from switchdev_obj cb
The net_device associated to a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.
Also some drivers (such as DSA drivers) may not have easy access to it.
This will simplify pushing the callback function down to the drivers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:14 +0000 (12:07 -0400)]
net: switchdev: move dev in switchdev_fdb_dump
The FDB dump callback requires the related net_device so move it to the
struct switchdev_fdb_dump superset instead of using a callback param.
With this done, it'll be simpler to change the dump function signature.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 29 Sep 2015 16:07:13 +0000 (12:07 -0400)]
net: switchdev: remove dev in port_vlan_dump_put
The static switchdev_port_vlan_dump_put function does not need the
net_device parameter, so remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Sep 2015 04:11:13 +0000 (21:11 -0700)]
Merge branch 'm68k-netdev-modular'
Geert Uytterhoeven says:
====================
net: m68k: Allow modular build
This patch series makes the remaining m68k Ethernet drivers modular.
It's an alternative to the last 3 patches of Paul Gortmaker's series
"[PATCH net-next 0/6] make non-modular code explicitly non-modular".
Note that "[PATCH 5/5] net: macmace: Allow modular build" depends on
"[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base
address to modules". Feel free to take the dependency through the netdev
tree to avoid modular build breakage.
This was compile-tested only (mac_defconfig + allmodconfig) due to lack
of hardware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 29 Sep 2015 08:24:06 +0000 (10:24 +0200)]
net: macmace: Allow modular build
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 29 Sep 2015 08:24:05 +0000 (10:24 +0200)]
m68k/mac: Export Peripheral System Controller (PSC) base address to modules
If CONFIG_MACMACE=m:
ERROR: psc [drivers/net/ethernet/apple/macmace.ko] undefined!
Add the missing export to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 29 Sep 2015 08:24:04 +0000 (10:24 +0200)]
net: hplance: Allow modular build
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 29 Sep 2015 08:24:03 +0000 (10:24 +0200)]
net: 7990: Export lance_poll() to modules
If CONFIG_HPLANCE=m and CONFIG_NET_POLL_CONTROLLER=y:
ERROR: "lance_poll" [drivers/net/ethernet/amd/hplance.ko] undefined!
Add the missing export to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 29 Sep 2015 08:24:02 +0000 (10:24 +0200)]
net: mac8390: Allow modular build
The modular driver supports only one card, just like the built-in
driver.
Note that this limitation is a problem which affects all Nubus card
drivers, because they have to do all their own bus matching, because
Nubus still lacks the necessary driver model support.
Suggested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 28 Sep 2015 23:53:48 +0000 (01:53 +0200)]
dsa: mv88e6xxx: Fix unsigned/signed issue
commit
dea870242a9c ("dsa: mv88e6xxx: Allow speed/duplex of port to be
configured") leads to the following static checker warning:
drivers/net/dsa/mv88e6xxx.c:585 mv88e6xxx_adjust_link()
warn: unsigned 'ret' is never less than zero.
drivers/net/dsa/mv88e6xxx.c
573 void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
574 struct phy_device *phydev)
575 {
576 struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
577 u32 ret, reg;
578
579 if (!phy_is_pseudo_fixed_link(phydev))
580 return;
581
582 mutex_lock(&ps->smi_mutex);
583
584 ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
585 if (ret < 0)
Make ret an int, which is the return type for _mv88e6xxx_reg_read()
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Sep 2015 03:41:10 +0000 (20:41 -0700)]
Merge branch 'L3_master_device'
David Ahern says:
====================
net: L3 master device
The VRF device is essentially a Layer 3 master device used to associate
netdevices with a specific routing table and to influence FIB lookups
via 'ip rules' and controlling the oif/iif used for the lookup.
This series generalizes the VRF into L3 master device, l3mdev. Similar
to switchdev it has a Kconfig option and separate set of operations
in net_device allowing it to be completely compiled out if not wanted.
The l3mdev methods rely on the 'master' aspect and use of
netdev_master_upper_dev_get_rcu to retrieve the master device from a
given netdevice if it is enslaved to an L3_MASTER.
The VRF device is converted to use the l3mdev operations. At the end the
vrf_ptr is no longer and removed, as are all direct references to VRF.
The end result is a much simpler implementation for VRF.
Thanks to Nikolay for suggestions (eg., use of the master linkage which
is the key to making this work) and to Roopa, Andy and Shrijeet for
early reviews.
v3
- added license header to l3mdev.c
- export symbols in l3mdev.c for use with GPL modules
- removed netdevice header from l3mdev.h (not needed) and fixed
typo in comment
v2
- rebased to top of net-next
- addressed Niks comments (checking master, removing extra lines, and
flipping the order of patches 1 and 2)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:18 +0000 (20:07 -0700)]
net: Move netif_index_is_l3_master to l3mdev.h
Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:17 +0000 (20:07 -0700)]
net: Remove vrf header file
Move remaining structs to VRF driver and delete the vrf header file.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:16 +0000 (20:07 -0700)]
net: Remove the now unused vrf_ptr
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:15 +0000 (20:07 -0700)]
net: Replace calls to vrf_dev_get_rth
Replace calls to vrf_dev_get_rth with l3mdev_get_rtable.
The check on the flow flags is handled in the l3mdev operation.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:14 +0000 (20:07 -0700)]
net: Replace vrf_dev_table and friends
Replace calls to vrf_dev_table and friends with l3mdev_fib_table
and kin.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:13 +0000 (20:07 -0700)]
net: Replace vrf_master_ifindex{, _rcu} with l3mdev equivalents
Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either
l3mdev_master_ifindex_rcu or l3mdev_master_ifindex.
The pattern:
oif = vrf_master_ifindex(dev) ? : dev->ifindex;
is replaced with
oif = l3mdev_fib_oif(dev);
And remove the now unused vrf macros.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:12 +0000 (20:07 -0700)]
net: Add support for l3mdev ops to VRF driver
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:11 +0000 (20:07 -0700)]
net: Introduce L3 Master device abstraction
L3 master devices allow users of the abstraction to influence FIB lookups
for enslaved devices. Current API provides a means for the master device
to return a specific FIB table for an enslaved device, to return an
rtable/custom dst and influence the OIF used for fib lookups.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 30 Sep 2015 03:07:10 +0000 (20:07 -0700)]
net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER
Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Sep 2015 23:53:10 +0000 (16:53 -0700)]
Merge branch 'listener-refactoring-preparations'
Eric Dumazet says:
====================
tcp: listener refactoring preparations
This patch series makes changes to TCP/DCCP stacks so that
we can switch listener code to lockless mode.
This is done by marking const the listener socket in all
appropriate paths.
FastOpen code had to be changed to not dynamically allocate
a very small structure to make code simpler for following changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:52 +0000 (07:42 -0700)]
tcp: prepare fastopen code for upcoming listener changes
While auditing TCP stack for upcoming 'lockless' listener changes,
I found I had to change fastopen_init_queue() to properly init the object
before publishing it.
Otherwise an other cpu could try to lock the spinlock before it gets
properly initialized.
Instead of adding appropriate barriers, just remove dynamic memory
allocations :
- Structure is 28 bytes on 64bit arches. Using additional 8 bytes
for holding a pointer seems overkill.
- Two listeners can share same cache line and performance would suffer.
If we really want to save few bytes, we would instead dynamically allocate
whole struct request_sock_queue in the future.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:51 +0000 (07:42 -0700)]
tcp: constify tcp_syn_flood_action() socket argument
tcp_syn_flood_action() will soon be called with unlocked socket.
In order to avoid SYN flood warning being emitted multiple times,
use xchg().
Extend max_qlen_log and synflood_warned fields in struct listen_sock
to u32
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:50 +0000 (07:42 -0700)]
tcp: constify tcp_v{4|6}_route_req() sock argument
These functions do not change the listener socket.
Goal is to make sure tcp_conn_request() is not messing with
listener in a racy way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:49 +0000 (07:42 -0700)]
tcp: cookie_init_sequence() cleanups
Some common IPv4/IPv6 code can be factorized.
Also constify cookie_init_sequence() socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:48 +0000 (07:42 -0700)]
tcp/dccp: constify syn_recv_sock() method sock argument
We'll soon no longer hold listener socket lock, these
functions do not modify the socket in any way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:47 +0000 (07:42 -0700)]
tcp: constify tcp_create_openreq_child() socket argument
This method does not touch the listener socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:46 +0000 (07:42 -0700)]
dccp: constify dccp_create_openreq_child() sock argument
socket no longer needs to be read/write
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:45 +0000 (07:42 -0700)]
net: constify sk_gfp_atomic() sock argument
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:44 +0000 (07:42 -0700)]
inet: constify __inet_inherit_port() sock argument
socket is not touched, make it const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:43 +0000 (07:42 -0700)]
inet: constify inet_csk_route_child_sock() socket argument
The socket points to the (shared) listener.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:42 +0000 (07:42 -0700)]
dccp: use inet6_csk_route_req() helper
Before changing dccp_v6_request_recv_sock() sock argument
to const, we need to get rid of security_sk_classify_flow(),
and it seems doable by reusing inet6_csk_route_req() helper.
We need to add a proto parameter to inet6_csk_route_req(),
not assume it is TCP.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:41 +0000 (07:42 -0700)]
tcp: remove tcp_rcv_state_process() tcp_hdr argument
Factorize code to get tcp header from skb. It makes no sense
to duplicate code in callers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:40 +0000 (07:42 -0700)]
tcp: remove unused len argument from tcp_rcv_state_process()
Once we realize tcp_rcv_synsent_state_process() does not use
its 'len' argument and we get rid of it, then it becomes clear
this argument is no longer used in tcp_rcv_state_process()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Sep 2015 14:42:39 +0000 (07:42 -0700)]
tcp/dccp: constify send_synack and send_reset socket argument
None of these functions need to change the socket, make it
const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Sep 2015 23:27:47 +0000 (16:27 -0700)]
Merge branch 'ipv4-routing-cleanups'
Alexander Duyck says:
====================
Minor IPv4 routing cleanups
These patches just contain some minor cleanups to address a few minor
issues. The first and the third mostly just improve readability. The
second patch should improve the performance for multicast destination
addresses that do not have a localhost source IP address by avoiding some
unnecessary dereferences.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 28 Sep 2015 18:10:44 +0000 (11:10 -0700)]
net: Remove martian_source_keep_err goto label
err is initialized to -EINVAL when it is declared. It is not reset until
fib_lookup which is well after the 3 users of the martian_source jump. So
resetting err to -EINVAL at martian_source label is not needed.
Removing that line obviates the need for the martian_source_keep_err label
so delete it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 28 Sep 2015 18:10:38 +0000 (11:10 -0700)]
net: Swap ordering of tests in ip_route_input_mc
This patch just swaps the ordering of one of the conditional tests in
ip_route_input_mc. Specifically it swaps the testing for the source
address to see if it is loopback, and the test to see if we allow a
loopback source address.
The reason for swapping these two tests is because it is much faster to
test if an address is loopback than it is to dereference several pointers
to get at the net structure to see if the use of loopback is allowed.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 28 Sep 2015 18:10:31 +0000 (11:10 -0700)]
net/ipv4: Pass proto as u8 instead of u16 in ip_check_mc_rcu
This patch updates ip_check_mc_rcu so that protocol is passed as a u8
instead of a u16.
The motivation is just to avoid any unneeded type transitions since some
systems will require an instruction to zero extend a u8 field to a u16.
Also it makes it a bit more readable as to the fact that protocol is a u8
so there are no byte ordering changes needed to pass it.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Liviu Dudau [Mon, 28 Sep 2015 16:51:51 +0000 (17:51 +0100)]
RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad
On some embedded systems the EEPROM does not contain a valid MAC address.
In that case it is better to fallback to a generated mac address and
let init scripts fix the value later.
Reported-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Changed handcoded setup to use eth_hw_addr_random() and to save new address into HW]
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 28 Sep 2015 16:16:17 +0000 (09:16 -0700)]
netpoll: Drop budget parameter from NAPI polling call hierarchy
For some reason we were carrying the budget value around between the
various calls to napi->poll. If for example one of the drivers called had
a bug in which it returned a non-zero value for work this could result in
the budget value becoming negative.
Rather than carry around a value of budget that is 0 or less we can instead
just loop through and pass 0 to each napi->poll call. If any driver
returns a value for work done that is non-zero then we can report that
driver and continue rather than allowing a bad actor to make the budget
value negative and pass that negative value to napi->poll.
Note, the only actual change here is that instead of letting budget become
negative we are keeping it at 0 regardless of the value returned for work
since it should not be possible for the polling routine to do any actual
work with a budget of 0. So if the polling routine returns a non-0 value
we are just reporting it and continuing with a budget of 0 rather than
letting that work value be subtracted from the budget of 0.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 25 Sep 2015 17:00:11 +0000 (19:00 +0200)]
bridge: vlan: add per-vlan struct and move to rhashtables
This patch changes the bridge vlan implementation to use rhashtables
instead of bitmaps. The main motivation behind this change is that we
need extensible per-vlan structures (both per-port and global) so more
advanced features can be introduced and the vlan support can be
extended. I've tried to break this up but the moment net_port_vlans is
changed and the whole API goes away, thus this is a larger patch.
A few short goals of this patch are:
- Extensible per-vlan structs stored in rhashtables and a sorted list
- Keep user-visible behaviour (compressed vlans etc)
- Keep fastpath ingress/egress logic the same (optimizations to come
later)
Here's a brief list of some of the new features we'd like to introduce:
- per-vlan counters
- vlan ingress/egress mapping
- per-vlan igmp configuration
- vlan priorities
- avoid fdb entries replication (e.g. local fdb scaling issues)
The structure is kept single for both global and per-port entries so to
avoid code duplication where possible and also because we'll soon introduce
"port0 / aka bridge as port" which should simplify things further
(thanks to Vlad for the suggestion!).
Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
rhashtable, if an entry is added to a port it'll get a pointer to its
global context so it can be quickly accessed later. There's also a
sorted vlan list which is used for stable walks and some user-visible
behaviour such as the vlan ranges, also for error paths.
VLANs are stored in a "vlan group" which currently contains the
rhashtable, sorted vlan list and the number of "real" vlan entries.
A good side-effect of this change is that it resembles how hw keeps
per-vlan data.
One important note after this change is that if a VLAN is being looked up
in the bridge's rhashtable for filtering purposes (or to check if it's an
existing usable entry, not just a global context) then the new helper
br_vlan_should_use() needs to be used if the vlan is found. In case the
lookup is done only with a port's vlan group, then this check can be
skipped.
Things tested so far:
- basic vlan ingress/egress
- pvids
- untagged vlans
- undef CONFIG_BRIDGE_VLAN_FILTERING
- adding/deleting vlans in different scenarios (with/without global ctx,
while transmitting traffic, in ranges etc)
- loading/removing the module while having/adding/deleting vlans
- extracting bridge vlan information (user ABI), compressed requests
- adding/deleting fdbs on vlans
- bridge mac change, promisc mode
- default pvid change
- kmemleak ON during the whole time
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Sep 2015 18:51:41 +0000 (11:51 -0700)]
Merge branch 'mvneta_percpu_irq'
Gregory CLEMENT says:
====================
net: mvneta: Switch to per-CPU irq and make rxq_def useful
As stated in the first version: "this patchset reworks the Marvell
neta driver in order to really support its per-CPU interrupts, instead
of faking them as SPI, and allow the use of any RX queue instead of
the hardcoded RX queue 0 that we have currently."
Following the review which has been done, Maxime started adding the
CPU hotplug support. I continued his work a few weeks ago and here is
the result.
Since the 1st version the main change is this CPU hotplug support, in
order to validate it I powered up and down the CPUs while performing
iperf. I ran the tests during hours: the kernel didn't crash and the
network interfaces were still usable. Of course it impacted the
performance, but continuously power down and up the CPUs is not
something we usually do.
I also reorganized the series, the 3 first patches should go through
the irq subsystem, whereas the 4 others should go to the network
subsystem.
However, there is a runtime dependency between the two parts. Patch 5
depend on the patch 3 to be able to use the percpu irq.
Thanks,
Gregory
PS: Thanks to Willy who gave me some pointers on how to deal with the
NAPI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:38 +0000 (18:09 +0200)]
net: mvneta: Statically assign queues to CPUs
Since the switch to per-CPU interrupts, we lost the ability to set which
CPU was going to receive our RX interrupt, which was now only the CPU on
which the mvneta_open function was run.
We can now assign our queues to their respective CPUs, and make sure only
this CPU is going to handle our traffic.
This also paves the road to be able to change that at runtime, and later on
to support RSS.
[gregory.clement@free-electrons.com]: hardened the CPU hotplug support.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:37 +0000 (18:09 +0200)]
net: mvneta: Allow different queues
The mvneta driver allows to change the default RX queue trough the rxq_def
kernel parameter.
However, the current code doesn't allow to have any value but 0. It is
actively checked for in the driver's probe because the drivers makes a
number of assumption and takes a number of shortcuts in order to just use
that RX queue.
Remove these limitations in order to be able to specify any available
queue.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:36 +0000 (18:09 +0200)]
net: mvneta: Handle per-cpu interrupts
Now that our interrupt controller is allowing us to use per-CPU interrupts,
actually use it in the mvneta driver.
This involves obviously reworking the driver to have a CPU-local NAPI
structure, and report for incoming packet using that structure.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:35 +0000 (18:09 +0200)]
net: mvneta: Fix CPU_MAP registers initialisation
The CPU_MAP register is duplicated for each CPUs at different addresses,
each instance being at a different address.
However, the code so far was using CONFIG_NR_CPUS to initialise the CPU_MAP
registers for each registers, while the SoCs embed at most 4 CPUs.
This is especially an issue with multi_v7_defconfig, where CONFIG_NR_CPUS
is currently set to 16, resulting in writes to registers that are not
CPU_MAP.
Fixes:
c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.8+
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:34 +0000 (18:09 +0200)]
irqchip: armada-370-xp: Rework per-cpu interrupts handling
The MPIC driver currently has a list of interrupts to handle as per-cpu.
Since the timer, fabric and neta interrupts were the only per-cpu
interrupts in the system, we can now remove the switch and just check for
the hardware irq number to determine whether a given interrupt is per-cpu
or not.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:33 +0000 (18:09 +0200)]
irq: Export per-cpu irq allocation and de-allocation functions
Some drivers might use the per-cpu interrupts and still might be built as a
module. Export request_percpu_irq an free_percpu_irq to these user, which
also make it consistent with enable/disable_percpu_irq that were exported.
Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Ripard [Fri, 25 Sep 2015 16:09:32 +0000 (18:09 +0200)]
genirq: Fix the documentation of request_percpu_irq
The documentation of request_percpu_irq is confusing and suggest that the
interrupt is not enabled at all, while it is actually enabled on the local
CPU.
Clarify that.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 25 Sep 2015 21:52:51 +0000 (16:52 -0500)]
bridge: Pass net into br_validate_ipv4 and br_validate_ipv6
The network namespace is easiliy available in state->net so use it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric W. Biederman [Fri, 25 Sep 2015 20:07:31 +0000 (15:07 -0500)]
ipv6: Pass struct net into ip6_route_me_harder
Don't make ip6_route_me_harder guess which network namespace
it is routing in, pass the network namespace in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric W. Biederman [Fri, 25 Sep 2015 20:07:30 +0000 (15:07 -0500)]
ipv4: Pass struct net into ip_route_me_harder
Don't make ip_route_me_harder guess which network namespace
it is routing in, pass the network namespace in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric W. Biederman [Fri, 25 Sep 2015 20:07:29 +0000 (15:07 -0500)]
netfilter: ipt_SYNPROXY: Pass snet into synproxy_send_tcp
ip6t_SYNPROXY already does this and this is needed so that we have a
struct net that can be passed down into ip_route_me_harder, so
that ip_route_me_harder can stop guessing it's context.
Along the way pass snet into synproxy_send_client_synack as this
is the only caller of synprox_send_tcp that is not passed snet
already.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric W. Biederman [Fri, 25 Sep 2015 20:07:28 +0000 (15:07 -0500)]
netfilter: Push struct net down into nf_afinfo.reroute
The network namespace is needed when routing a packet.
Stop making nf_afinfo.reroute guess which network namespace
is the proper namespace to route the packet in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric W. Biederman [Fri, 25 Sep 2015 20:07:27 +0000 (15:07 -0500)]
ipv4: Push struct net down into nf_send_reset
This is needed so struct net can be pushed down into
ip_route_me_harder.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jesper Dangaard Brouer [Mon, 28 Sep 2015 10:47:14 +0000 (12:47 +0200)]
net: help compiler generate better code in eth_get_headlen
Noticed that the compiler (gcc version 4.8.5
20150623 (Red Hat 4.8.5-4) (GCC))
generated suboptimal assembler code in eth_get_headlen().
This early return coding style is usually not an issue, on super scalar CPUs,
but the compiler choose to put the return statement after this very unlikely
branch, thus creating larger jump down to the likely code path.
Performance wise, I could measure slightly less L1-icache-load-misses
and less branch-misses, and an improvement of 1 nanosec with an IP-forwarding
use-case with 257 bytes packets with ixgbe (CPU i7-4790K @ 4.00GHz).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bendik Rønning Opstad [Wed, 23 Sep 2015 16:49:53 +0000 (18:49 +0200)]
tcp: Fix CWV being too strict on thin streams
Application limited streams such as thin streams, that transmit small
amounts of payload in relatively few packets per RTT, can be prevented
from growing the CWND when in congestion avoidance. This leads to
increased sojourn times for data segments in streams that often transmit
time-dependent data.
Currently, a connection is considered CWND limited only after having
successfully transmitted at least one packet with new data, while at the
same time failing to transmit some unsent data from the output queue
because the CWND is full. Applications that produce small amounts of
data may be left in a state where it is never considered to be CWND
limited, because all unsent data is successfully transmitted each time
an incoming ACK opens up for more data to be transmitted in the send
window.
Fix by always testing whether the CWND is fully used after successful
packet transmissions, such that a connection is considered CWND limited
whenever the CWND has been filled. This is the correct behavior as
specified in RFC2861 (section 3.1).
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Carsten Griwodz <griff@simula.no>
Cc: Jonas Markussen <jonassm@ifi.uio.no>
Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Cc: Mads Johannessen <madsjoh@ifi.uio.no>
Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 28 Sep 2015 04:56:53 +0000 (10:26 +0530)]
cxgb4: Add HW timesptamp support for RX
Adds support for ethtool get time stamp ioctl, which is used by
tcpdump to get the supported time stamp types
eg: tcpdump -i eth5 -J
Time stamp types for eth5 (use option -j to set):
host (Host)
adapter_unsynced (Adapter, not synced with system time)
Adds support for adapter unsynced mode, by adding SIOCSHWTSTAMP support
in driver.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Sun, 27 Sep 2015 07:22:44 +0000 (15:22 +0800)]
net: Fix Hisilicon Network Subsystem Support Compilation
This patch fixes the compilation error with arm allmodconfig, this error
generated due to unavailability of readq() on 32-bit platform which was
found during net-next daily compilation. In the same time, fix all the
hns drivers compilation warnings.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: zhaungyuzeng <Yisen.zhuang@huawei.com>
Signed-off-by: kenneth Lee <liguozhu@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Sat, 26 Sep 2015 18:49:20 +0000 (20:49 +0200)]
net: irda: pxaficp_ir: dmaengine conversion
Convert pxaficp_ir to dmaengine. As pxa architecture is shifting from
raw DMA registers access to pxa_dma dmaengine driver, convert this
driver to dmaengine.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Sat, 26 Sep 2015 18:49:19 +0000 (20:49 +0200)]
net: irda: pxaficp_ir: convert to readl and writel
Convert the pxa IRDA driver to readl and writel primitives, and remove
another set of direct registers access. This leaves only the DMA
registers access, which will be dealt with dmaengine conversion.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Sat, 26 Sep 2015 18:49:18 +0000 (20:49 +0200)]
net: irda: pxaficp_ir: use sched_clock() for time management
Instead of using directly the OS timer through direct register access,
use the standard sched_clock(), which will end up in OSCR reading
anyway.
This is a first step for direct access register removal and machine
specific code removal from this driver.
This commit changes the behavior, as previously the minimum turnaround
time was counted in 76ns steps, while with this patch it is counted in
microsecond steps. The strictly equal formula would have been :
while ((sched_clock() - si->last_clk) * 76 < mtt)
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Fri, 25 Sep 2015 21:31:32 +0000 (18:31 -0300)]
net: fec: Remove unneeded FEATURES_NEED_QUIESCE definition
There is no need to have FEATURES_NEED_QUIESCE defined as we
can simply use NETIF_F_RXCSUM instead as done in other parts
of the driver.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 25 Sep 2015 21:22:54 +0000 (15:22 -0600)]
net: Remove redundant oif checks in rt6_device_match
The oif has already been checked that it is non-zero; the 2 additional
checks on oif within that if (oif) {...} block are redundant.
CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Fri, 25 Sep 2015 21:13:48 +0000 (21:13 +0000)]
lan78xx: Return 0 when lan78xx_suspend() has no error.
lan78xx_suspend() may return non-zero from lan78xx_write_reg() in some scenario.
Fix to return 0 when lan78xx_suspend() has no error.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Sep 2015 05:19:56 +0000 (22:19 -0700)]
Merge branch 'mlx5-next'
Or Gerlitz says:
====================
Mellanox mlx5 driver update
Bunch of changes from the team, while warming engines for the
upcoming SRIOV support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Fri, 25 Sep 2015 07:49:16 +0000 (10:49 +0300)]
net/mlx5_core: Update health syndromes
Update new health monitored syndromes and their descriptions.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Fri, 25 Sep 2015 07:49:15 +0000 (10:49 +0300)]
net/mlx5_core: Fix wrong name in struct
The name refers to syndrome so uset ext_synd instread of ext_sync.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Fri, 25 Sep 2015 07:49:14 +0000 (10:49 +0300)]
net/mlx5_core: New init and exit flow for mlx5_core
In the new flow, we separate the pci initialization and teardown from the
initialization and teardown of the other resources.
init_one calls mlx5_pci_init that handles the pci resources initialization.
It then calls mlx5_load_one to initialize the remainder of the resources.
When removing a device, remove_one is invoked. However, now remove_one
calls mlx5_unload_one to free all the resources except the pci resources.
When mlx5_unload_one returns, mlx5_pci_close is called to free the pci
resources.
The above separation will allow us to implement the pci error handlers and
suspend and resume callbacks.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Fri, 25 Sep 2015 07:49:13 +0000 (10:49 +0300)]
net/mlx5_core: Fix notification of page supplement error
Some errors did not result with notifying firmware that the page request
could not be fulfilled. Fix this and put the notification logic into a
separate function.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Fri, 25 Sep 2015 07:49:12 +0000 (10:49 +0300)]
net/mlx5_core: Fix async commands return code
In case of async command completion, the error code returned should take
into account the command completion status.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Fri, 25 Sep 2015 07:49:11 +0000 (10:49 +0300)]
net/mlx5_core: Remove redundant "err" variable usage
Cosmetic change.
Do not use the an err variable just to assign and return it.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Fri, 25 Sep 2015 07:49:10 +0000 (10:49 +0300)]
net/mlx5_core: Fix struct type in the DESTROY_TIR/TIS device commands
Used the output mailbox format for input mailbox.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Fri, 25 Sep 2015 07:49:09 +0000 (10:49 +0300)]
net/mlx5e: Priv state flag not rolled-back upon netdev open error
The private mlx5 state flag that indicates that the netdev is
opened is set at the beginning of the netdev open flow.
In case an error occured later in the mlx5 netdev open flow, this
flag was not cleared, remaining set although the actual set is
closed.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrzej Hajda [Fri, 25 Sep 2015 06:45:43 +0000 (08:45 +0200)]
tools: bpf_jit_disasm: make get_last_jit_image return unsigned
The function returns always non-negative values.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 00:16:05 +0000 (17:16 -0700)]
tcp: avoid reorders for TFO passive connections
We found that a TCP Fast Open passive connection was vulnerable
to reorders, as the exchange might look like
[1] C -> S S <FO ...> <request>
[2] S -> C S. ack request <options>
[3] S -> C . <answer>
packets [2] and [3] can be generated at almost the same time.
If C receives the 3rd packet before the 2nd, it will drop it as
the socket is in SYN_SENT state and expects a SYNACK.
S will have to retransmit the answer.
Current OOO avoidance in linux is defeated because SYNACK
packets are attached to the LISTEN socket, while DATA packets
are attached to the children. They might be sent by different cpus,
and different TX queues might be selected.
It turns out that for TFO, we created a child, which is a
full blown socket in TCP_SYN_RECV state, and we simply can attach
the SYNACK packet to this socket.
This means that at the time tcp_sendmsg() pushes DATA packet,
skb->ooo_okay will be set iff the SYNACK packet had been sent
and TX completed.
This removes the reorder source at the host level.
We also removed the export of tcp_try_fastopen(), as it is no
longer called from IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>