Catherine Sullivan [Wed, 18 Dec 2013 13:46:03 +0000 (13:46 +0000)]
i40e: Turn flow director off in MFP mode
The driver needs to set the MFP flag earlier in i40e_sw_init
and then can use that flag to decide if other hardware
work-arouds are required.
Change-ID: Ib17ad1e3485f57b28845ab4722294a99f203bd48
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 18 Dec 2013 13:46:02 +0000 (13:46 +0000)]
i40e: Add a dummy packet template
The hardware requires a full packet template to be pointed to when adding
hardware flow filters. This patch adds the template and uses it for
programming filters.
Change-ID: I09db9f4ab0207ca9c520ae36596d74e1a0663ae5
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 18 Dec 2013 13:46:01 +0000 (13:46 +0000)]
i40e: fix spelling errors
This patch does s/rebuid/rebuild/.
Change-ID: I77ff0c26db3d23ca5dc971b82d71b4ed8c6bcc14
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 18 Dec 2013 13:46:00 +0000 (13:46 +0000)]
i40e: trivial: formatting and checkpatch fixes
Fix some badly formatted lines, long lines and a mis-formatted else.
Change-ID: Iac2eef064ae27c55a0c3d9c15c525bf8fed8ab6f
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 18 Dec 2013 13:45:59 +0000 (13:45 +0000)]
i40e: shorten wordy fields
The alloc_rx_buff_failed and alloc_rx_page_failed variables
are both part of an rx specific structure so just remove
the _rx part of the name. No functional changes.
Change-ID: Icffa2f5d13c6f2b1e09cf45b9472b83c9dae8fc6
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 18 Dec 2013 13:45:58 +0000 (13:45 +0000)]
i40e: accept pf to pf adminq messages
The admin queue can be used to send messages among the physical
function interfaces. This adds the code to handle that case.
Change-ID: I0700fcc47e41433131a381f0eb72fc7b01b6bd87
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 18 Dec 2013 13:45:57 +0000 (13:45 +0000)]
i40e: remove interrupt on AQ error
Since nearly everything we do is synchronous, using the interrupt-on-error bit
is unnecessary and causing unneeded interrupts. If anyone wants to use the bit
they can turn it on in individual AQ requests using the cmd_details parameter.
Change-ID: I4690a9c561d3e0836aeadb4f88f8a8702b1d1366
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 18 Dec 2013 13:45:56 +0000 (13:45 +0000)]
i40e: release NVM resource reservation on startup
Call AQ to release any reservation held by this PF on the NVM resource
lock on startup, in order to clear anything that might have been left over from
a previous run. The lock is only cleared by the requestor calling for it to be
cleared, on power-on, or firmware reset. This should help limit the need for
rebooting a customer machine if something goes wrong on a firmware update or
some other action.
Change-ID: I8c8473e601d4ef512dda7baa77a6e75f2e5fea49
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Wed, 18 Dec 2013 13:45:55 +0000 (13:45 +0000)]
i40e: Cleanup reconfig rss path
RSS initialization was doing some extra work, remove the extra
work and any bugs it created when managing number of queues.
Change-ID: Iea75b04a70d73ce76947b6a177ce89ab4899d4c6
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 18 Dec 2013 13:45:54 +0000 (13:45 +0000)]
i40e: disable packet split
The driver and hardware support splitting packets on headers
but with the use of GRO we don't need the extra bus
overhead, so make this driver more like igb and ixgbe and
disable packet split.
Change-ID: Id42f2c3736baa9d5bdfe1f72d64226e7d8ebd737
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Wed, 18 Dec 2013 13:45:53 +0000 (13:45 +0000)]
i40e: add a comment on barrier and fix panic on reset
The memory barrier used in maybe_stop_tx can use a comment.
Also add checks to VSI->rx_rings to ensure a kernel panic is not induced.
Change-ID: I48cc1bf1d6cf301818155b737edeef77c0d790c7
Change-ID: I1363a8445fbf521a26267849966296ed55f43ad8
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kamil Krawczyk [Wed, 18 Dec 2013 13:45:52 +0000 (13:45 +0000)]
i40e: Fix MAC format in Write MAC address AQ cmd
The MAC address format expected by the hardware is in a very specific
format, and the driver was filling in the data incorrectly.
Change-ID: I7bc66505ef459ee347dd3bda68051004c141c689
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Wed, 18 Dec 2013 13:45:51 +0000 (13:45 +0000)]
i40e: Fix GPL header
The GPL header included in each file in the i40e driver doesn't
need to include the "this program" text since this driver
is already part of the larger kernel.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 18 Dec 2013 13:45:50 +0000 (13:45 +0000)]
i40e: use kernel specific defines
Replace uses of I40E_LENGTH_OF_ADDRESS with ETH_ALEN.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Wed, 18 Dec 2013 13:45:49 +0000 (13:45 +0000)]
i40e: Re-enable interrupt on ICR0
The hardware can occasionally give an interrupt on the misc
queue for which there is no driver work to do. In that case
the driver was not re-enabling interrupts even though they
were auto masked by hardware. This left interrupts disabled
on this queue.
Re-enable the interrupt whenever leaving this function.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 8 Jan 2014 20:04:56 +0000 (15:04 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains three Netfilter updates, they are:
* Fix wrong usage of skb_header_pointer in the DCCP protocol helper that
has been there for quite some time. It was resulting in copying the dccp
header to a pointer allocated in the stack. Fortunately, this pointer
provides room for the dccp header is 4 bytes long, so no crashes have been
reported so far. From Daniel Borkmann.
* Use format string to print in the invocation of nf_log_packet(), again
in the DCCP helper. Also from Daniel Borkmann.
* Revert "netfilter: avoid get_random_bytes call" as prandom32 does not
guarantee enough entropy when being calling this at boot time, that may
happen when reloading the rule.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Jan 2014 06:11:19 +0000 (01:11 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e only.
Anjali adds more functionality to debugfs to assist development and
testing of admin queue commands.
Greg makes sure broadcast promiscuous is disabled by default, otherwise
VLAN tagged packets out of the assigned VLAN domain are received. Also
provides a fix when the 8021q driver is loaded, so that VLAN 0 tagged
packets are accepted so that upper layers can interpret the priority
bits. Then provides a fix to let the VF to request the PF to set its
already assigned MAC address without generating an error. Greg also
adds helper functions to enable or disable internal switch loopback
when VFs are created or destroyed via the sysfs interface.
Shannon provides most of the changes, where he adds code to ensure
that the hardware waits to make sure that the firmware is ready as well
after reset. Also updates the code to use the new features in the
firmware. Provides a fix while in MFP mode where resources are
reduced, so use a smaller range of test registers than when in SFP mode.
Moves the PF ID initialization code to earlier in the driver
initialization function since a few operations need the information
before the first PF reset is called. Shannon adds a check for MAC
type before reading anything from the registers to ensure we dealing
with the correct MAC type. Then reworks Shadow RAM read word/buffer
functions as to not block whole NVM resources for SR read operations.
Mitch lastly provides a fix to correctly setup ARQ descriptors in
the cleanup code.
Catherine bumps the driver version due to all the recent changes.
v2:
- Added blank lines after local variable declarations to patch 01
as suggested by David Miller
- Used the suggested memset() line in patch 15 as suggested by David
Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch Williams [Wed, 18 Dec 2013 13:45:48 +0000 (13:45 +0000)]
i40e: correctly setup ARQ descriptors
When cleaning descriptors, we must set up ALL fields, not just the DMA
address. The initial setup does this correctly, but not the cleanup
code, so the firmware would process the ring exactly once and then fail.
Change-ID: I2930b83c76194b3016a8ac0fa693f9a573995640
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kamil Krawczyk [Wed, 18 Dec 2013 13:45:46 +0000 (13:45 +0000)]
i40e: remove redundant AQ enable
The admin queue length register is updated in
config_a<sq|rq>_regs functions. We should not update it again,
as we will trigger firmware to init the AQ again. In this case
firmware will lose the information about the AQ Rx tail position
and will see Rx queue as full (no free desc for FW to use).
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Fri, 13 Dec 2013 08:38:38 +0000 (08:38 +0000)]
i40e: Enable/Disable PF switch LB on SR-IOV configure changes
The PF VSI was never updated to enable or disable internal switch loopback
when VFs were created or destroyed via the sysfs interface. Add some
helper functions to take care of that.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 18 Dec 2013 05:29:17 +0000 (05:29 +0000)]
i40e: whitespace paren and comment tweaks
Addresses a few code format issues that have crept in over time.
Change-Id: I1a62cbd16b29a218a933b0f7176abe748f9615e8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:15 +0000 (08:17 +0000)]
i40e: rework shadow ram read functions
Rework Shadow RAM read word/buffer functions to not use AQ Request
Resource commands. Requesting resource is not needed for SR read
operations which are done through SRCTL register. Access to SR through
register is controlled through DONE bit within SRCTL. With this change
we do not block whole NVM resource for SR read operations.
Change-Id: I73e96cdea39a45ee7b5bdf038e527308de2d9efe
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:14 +0000 (08:17 +0000)]
i40e: check MAC type before any REG access
We need to check if we are dealing with the correct MAC type before we
try to read anything from the registers.
Change-Id: I3989516999d06c3009e87d6a2eafc20af305c5c2
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:13 +0000 (08:17 +0000)]
i40e: move PF ID init from PF reset to SC init
Move PF ID initialization code from PF reset routine to earlier driver
initialization function. There are a few operations which need the
PF ID before the first PF reset is called.
Change-Id: I7e971f7556b46a837149850ec05ce115c35db575
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:12 +0000 (08:17 +0000)]
i40e: Reduce range of interrupt reg in reg test
Use a smaller range of test registers in MFP mode as there are fewer
resources than when in SFP mode.
Change-Id: I08424890c3f57b5dde5ee99e99724ce252e0875a
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:11 +0000 (08:17 +0000)]
i40e: update firmware api to 1.1
The firmware's AdminQ interface has matured a little, so update the
code to use the new fields and values.
Change-Id: I8fcd7b443f268dcf9346bd6a9e940fe9c2958891
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 11 Dec 2013 08:17:10 +0000 (08:17 +0000)]
i40e: Add code to wait for FW to complete in reset path
The RSTAT comes back with DEVICE READY to indicate that the HW is ready,
but we still have to wait for the FW to indicate that the Core and Global
modules are back up again. This needs a read of the NVM_ULD register.
Change-Id: I88276165f9cd446d2f166fb4b8cff00521af4bec
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Thu, 28 Nov 2013 06:42:42 +0000 (06:42 +0000)]
i40e: Bump version
Version update to 0.3.25-k
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Sat, 7 Dec 2013 10:36:54 +0000 (10:36 +0000)]
i40e: Allow VF to set already assigned MAC address
The VF is allowed to request the PF to set its already assigned
MAC address without generating an error.
Change-Id: I8dfdf353396995dbbb26cafab4e42b451911da3d
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:42:40 +0000 (06:42 +0000)]
i40e: Stop accepting any VLAN tag on VLAN 0 filter set
When the 8021q driver is loaded it sets VLAN 0 filters on all devices.
This does not mean that any VLAN tagged packet should be accepted.
Instead accept only VLAN 0 tagged packets so that upper layers can
interpret the priority bits.
Change-Id: I17274a540b613749612ffe23a3aef2b8ee6ff6a4
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:42:39 +0000 (06:42 +0000)]
i40e: Do not enable broadcast promiscuous by default
Broadcast promiscuous should only be turned on when general
promiscuous mode is turned on, otherwise VLAN tagged packets out of
the assigned VLAN domain are received.
Add a broadcast MAC filter in order to continue to receive
broadcast traffic on VLANs, MAIN or VMDQ VSI.
Change-Id: I99d8e382a082ee51201228f1226af3b46452ac55
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 28 Nov 2013 06:42:38 +0000 (06:42 +0000)]
i40e: Expose AQ debugfs hooks
Add more functionality to debugfs to assist development
and testing of AQ commands.
adds:
send aq_cmd
send indirect aq_cmd
Change-Id: I01710cddd33110a6c1e1aabf84cb6e93cda4c42a
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ying Xue [Wed, 8 Jan 2014 02:26:39 +0000 (10:26 +0800)]
net: xfrm: xfrm_policy: silence compiler warning
Fix below compiler warning:
net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function]
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jan 2014 23:44:35 +0000 (18:44 -0500)]
Merge branch 'tipc'
Jon Maloy says:
====================
tipc: link setup and failover improvements
This series consists of four unrelated commits with different purposes.
- Commit #1 is purely cosmetic and pedagogic, hopefully making the
failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
code.
- Commit #4 removes an unnecessary indirection step during link
startup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:44 +0000 (17:02 -0500)]
tipc: make link start event synchronous
When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.
We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Tue, 7 Jan 2014 22:02:43 +0000 (17:02 -0500)]
tipc: introduce new spinlock to protect struct link_req
Currently, only 'bearer_lock' is used to protect struct link_req in
the function disc_timeout(). This is unsafe, since the member fields
'num_nodes' and 'timer_intv' might be accessed by below three different
threads simultaneously, none of them grabbing bearer_lock in the
critical region:
link_activate()
tipc_bearer_add_dest()
tipc_disc_add_dest()
req->num_nodes++;
tipc_link_reset()
tipc_bearer_remove_dest()
tipc_disc_remove_dest()
req->num_nodes--
disc_update()
read req->num_nodes
write req->timer_intv
disc_timeout()
read req->num_nodes
read/write req->timer_intv
Without lock protection, the only symptom of a race is that discovery
messages occasionally may not be sent out. This is not fatal, since such
messages are best-effort anyway. On the other hand, since discovery
messages are not time critical, adding a protecting lock brings no
serious overhead either. So we add a new, dedicated spinlock in
order to guarantee absolute data consistency in link_req objects.
This also helps reduce the overall role of the bearer_lock, which
we want to remove completely in a later commit series.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:42 +0000 (17:02 -0500)]
tipc: remove 'has_redundant_link' flag from STATE link protocol messages
The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
protocol messages. Due to an ambiguity in the protocol specification it
is currently also transferred in STATE messages. Its value is used to
initialize a link state variable, 'permit_changeover', which is used
to inhibit futile link failover attempts when it is known that the
peer node has no working links at the moment, although the local node
may still think it has one.
The fact that 'has_redundant_link' incorrectly is read from STATE
messages has the effect that 'permit_changeover' sometimes gets a wrong
value, and permanently blocks any links from being re-established. Such
failures can only occur in in dual-link systems, and are extremely rare.
This bug seems to have always been present in the code.
Furthermore, since commit
b4b5610223f17790419b03eaa962b0e3ecf930d7
("tipc: Ensure both nodes recognize loss of contact between them"),
the 'permit_changeover' field serves no purpose any more. The task of
enforcing 'lost contact' cycles at both peer endpoints is now taken
by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
in struct tipc_node to abort unnecessary failover attempts.
We therefore remove the 'has_redundant_link' flag from STATE messages,
as well as the now redundant 'permit_changeover' variable.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:41 +0000 (17:02 -0500)]
tipc: rename functions related to link failover and improve comments
The functionality related to link addition and failover is unnecessarily
hard to understand and maintain. We try to improve this by renaming
some of the functions, at the same time adding or improving the
explanatory comments around them. Names such as "tipc_rcv()" etc. also
align better with what is used in other networking components.
The changes in this commit are purely cosmetic, no functional changes
are made.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 7 Jan 2014 22:23:44 +0000 (23:23 +0100)]
net: skbuff: const-ify casts in skb_queue_* functions
We should const-ify comparisons on skb_queue_* inline helper
functions as their parameters are const as well, so lets not
drop that.
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 7 Jan 2014 22:20:27 +0000 (23:20 +0100)]
net: xfrm: xfrm_policy: fix inline not at beginning of declaration
Fix three warnings related to:
net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Just removing the inline keyword is sufficient as the compiler will
decide on its own about inlining or not.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shawn Bohrer [Tue, 7 Jan 2014 18:49:17 +0000 (12:49 -0600)]
mlx4_en: Select PTP_1588_CLOCK
Now that mlx4_en includes a PHC driver it must select PTP_1588_CLOCK.
drivers/built-in.o: In function `mlx4_en_get_ts_info':
>> en_ethtool.c:(.text+0x391a11): undefined reference to `ptp_clock_index'
drivers/built-in.o: In function `mlx4_en_remove_timestamp':
>> (.text+0x397913): undefined reference to `ptp_clock_unregister'
drivers/built-in.o: In function `mlx4_en_init_timestamp':
>> (.text+0x397b20): undefined reference to `ptp_clock_register'
Fixes:
ad7d4eaed995d ("mlx4_en: Add PTP hardware clock")
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jerry Chu [Tue, 7 Jan 2014 18:23:19 +0000 (10:23 -0800)]
net-gre-gro: Add GRE support to the GRO stack
This patch built on top of Commit
299603e8370a93dd5d8e8d800f0dff1ce2c53d36
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.
The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.
Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.
The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.
Note that commit
60769a5dcd8755715c7143b4571d5c44f01796f1 "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.
I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.
All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)
An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.
The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).
1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%
1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%
1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%
1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%
The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).
2.1 ethtool -K gre1 gro on (hence will use gro_cells)
2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%
2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%
2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%
2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%
2.2 ethtool -K gre1 gro off
2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%
2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%
2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%
2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Tue, 7 Jan 2014 15:11:10 +0000 (10:11 -0500)]
net: Do not enable tx-nocache-copy by default
There are many cases where this feature does not improve performance or even
reduces it.
For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.
1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)
200x netperf -r 1400,1
tx-nocache-copy off
692000±1000 tps
50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
693000±1000 tps
50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
200x netperf -r 14000,14000
tx-nocache-copy off
86450±80 tps
50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
86110±60 tps
50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
2) single stream throughput tests
tx-nocache-copy leads to higher service demand
throughput cpu0 cpu1 demand
(Gb/s) (Gcycle) (Gcycle) (cycle/B)
nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
tx-nocache-copy off 9402±5 9.4±0.2 0.80±0.01
tx-nocache-copy on 9403±3 9.85±0.04 0.838±0.004
nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
tx-nocache-copy off 9401±5 5.83±0.03 5.0±0.1 0.923±0.007
tx-nocache-copy on 9404±2 5.74±0.03 5.523±0.009 0.958±0.002
As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand
(cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)
lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000
Performance counter stats for './netperf -H lpq84 -c':
4282.648396 task-clock # 0.423 CPUs utilized
9,348 context-switches # 0.002 M/sec
88 CPU-migrations # 0.021 K/sec
355 page-faults # 0.083 K/sec
11,812,797,651 cycles # 2.758 GHz [82.79%]
9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%]
4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%]
6,053,172,792 instructions # 0.51 insns per cycle
# 1.49 stalled cycles per insn [83.64%]
597,275,583 branches # 139.464 M/sec [83.70%]
8,960,541 branch-misses # 1.50% of all branches [83.65%]
10.
128990264 seconds time elapsed
lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000
Performance counter stats for './netperf -H lpq84 -c':
2847.375441 task-clock # 0.281 CPUs utilized
11,632 context-switches # 0.004 M/sec
49 CPU-migrations # 0.017 K/sec
354 page-faults # 0.124 K/sec
7,646,889,749 cycles # 2.686 GHz [83.34%]
6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%]
1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%]
2,079,702,453 instructions # 0.27 insns per cycle
# 2.94 stalled cycles per insn [83.22%]
363,773,213 branches # 127.757 M/sec [83.29%]
4,242,732 branch-misses # 1.17% of all branches [83.51%]
10.
128449949 seconds time elapsed
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 7 Jan 2014 14:55:45 +0000 (15:55 +0100)]
ipv4: loopback device: ignore value changes after device is upped
When lo is brought up, new ifa is created. Then, devconf and neigh values
bitfield should be set so later changes of default values would not
affect lo values.
Note that the same behaviour is in ipv6. Also note that this is likely
not an issue in many distros (for example Fedora 19) because userspace
sets address to lo manually before bringing it up.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
FX Le Bail [Tue, 7 Jan 2014 13:57:27 +0000 (14:57 +0100)]
IPv6: add the option to use anycast addresses as source addresses in echo reply
This change allows to follow a recommandation of RFC4942.
- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
as source addresses for ICMPv6 echo reply. This sysctl is false by default
to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().
Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
(http://tools.ietf.org/html/rfc4942#section-2.1.6)
2.1.6. Anycast Traffic Identification and Security
[...]
To avoid exposing knowledge about the internal structure of the
network, it is recommended that anycast servers now take advantage of
the ability to return responses with the anycast address as the
source address if possible.
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 7 Jan 2014 08:56:07 +0000 (16:56 +0800)]
net/mlx4_en: fix error return code in mlx4_en_get_qp()
Fix to return a negative error code from the error handling
case instead of 0.
Fixes:
837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Tue, 7 Jan 2014 03:18:22 +0000 (11:18 +0800)]
r8152: correct some messages
- Replace pr_warn_ratelimited() with net_ratelimit() and netdev_warn().
- Adjust the algnment of some messages.
- Remove the peroid.
- Fix some messages don't have terminating newline.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jan 2014 01:37:41 +0000 (20:37 -0500)]
bna: Fix build due to missing use of dma_unmap_len_set()
> as reported for linux-next of Dec.20, 2013
> when CONFIG_NEED_DMA_MAP_STATE is not enabled:
>
> drivers/net/ethernet/brocade/bna/bnad.c: In function 'bnad_start_xmit':
> drivers/net/ethernet/brocade/bna/bnad.c:3074:26: error: 'struct bnad_tx_vector' has no member named 'dma_len'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jan 2014 22:03:07 +0000 (14:03 -0800)]
gre_offload: statically build GRE offloading support
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.
This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Mon, 6 Jan 2014 22:24:29 +0000 (23:24 +0100)]
bgmac: fix typos
This fixes some typos found by Sergei.
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jan 2014 00:48:38 +0000 (19:48 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jesse/openvswitch
Jesse Gross says:
====================
[GIT net-next] Open vSwitch
Open vSwitch changes for net-next/3.14. Highlights are:
* Performance improvements in the mechanism to get packets to userspace
using memory mapped netlink and skb zero copy where appropriate.
* Per-cpu flow stats in situations where flows are likely to be shared
across CPUs. Standard flow stats are used in other situations to save
memory and allocation time.
* A handful of code cleanups and rationalization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 17 Dec 2013 19:22:48 +0000 (19:22 +0000)]
ovs: make functions local
Several functions and datastructures could be local
Found with 'make namespacecheck'
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:22 +0000 (15:22 +0100)]
openvswitch: Compute checksum in skb_gso_segment() if needed
The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:21 +0000 (15:22 +0100)]
openvswitch: Use skb_zerocopy() for upcall
Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.
Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.
Cost of upcall is significantly reduced from:
+ 7.48% vhost-8471 [k] memcpy
+ 5.57% ovs-vswitchd [k] memcpy
+ 2.81% vhost-8471 [k] csum_partial_copy_generic
to:
+ 5.72% ovs-vswitchd [k] memcpy
+ 3.32% vhost-5153 [k] memcpy
+ 0.68% vhost-5153 [k] skb_zerocopy
(megaflows disabled)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:20 +0000 (15:22 +0100)]
openvswitch: Pass datapath into userspace queue functions
Allows removing the net and dp_ifindex argument and simplify the
code.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:19 +0000 (15:22 +0100)]
openvswitch: Drop user features if old user space attempted to create datapath
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:18 +0000 (15:22 +0100)]
openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Fri, 13 Dec 2013 14:22:17 +0000 (15:22 +0100)]
net: Export skb_zerocopy() to zerocopy from one skb to another
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Wei Yongjun [Mon, 16 Dec 2013 06:06:15 +0000 (14:06 +0800)]
openvswitch: remove duplicated include from flow_table.c
Remove duplicated include.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Daniel Borkmann [Tue, 10 Dec 2013 11:02:03 +0000 (12:02 +0100)]
net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}
As we're only doing a kfree() anyway in the RCU callback, we can
simply use kfree_rcu, which does the same job, and remove the
function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Pravin B Shelar [Wed, 30 Oct 2013 00:22:21 +0000 (17:22 -0700)]
openvswitch: Per cpu flow stats.
With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Sat, 30 Nov 2013 12:21:32 +0000 (13:21 +0100)]
openvswitch: Enable memory mapped Netlink i/o
Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.
Benchmark
* pktgen -> ovs internal port
* 5M pkts, 5M flows
* 4 threads, 8 cores
Before:
Result: OK:
67418743(
c67108212+d310530) usec, 5000000 (9000byte,0frags)
74163pps 5339Mb/sec (5339736000bps) errors: 0
+ 2.98% ovs-vswitchd [k] copy_user_generic_string
+ 2.49% ovs-vswitchd [k] memcpy
+ 1.84% kpktgend_2 [k] memcpy
+ 1.81% kpktgend_1 [k] memcpy
+ 1.81% kpktgend_3 [k] memcpy
+ 1.78% kpktgend_0 [k] memcpy
After:
Result: OK:
24229690(
c24127165+d102524) usec, 5000000 (9000byte,0frags)
206358pps 14857Mb/sec (14857776000bps) errors: 0
+ 2.80% ovs-vswitchd [k] memcpy
+ 1.31% kpktgend_2 [k] memcpy
+ 1.23% kpktgend_0 [k] memcpy
+ 1.09% kpktgend_1 [k] memcpy
+ 1.04% kpktgend_3 [k] memcpy
+ 0.96% ovs-vswitchd [k] copy_user_generic_string
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Sat, 30 Nov 2013 12:21:31 +0000 (13:21 +0100)]
netlink: Avoid netlink mmap alloc if msg size exceeds frame size
An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Thomas Graf [Sat, 30 Nov 2013 12:21:30 +0000 (13:21 +0100)]
genl: Add genlmsg_new_unicast() for unicast message allocation
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.
Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Tue, 3 Dec 2013 18:58:53 +0000 (10:58 -0800)]
openvswitch: Silence RCU lockdep checks from flow lookup.
Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Andy Zhou [Mon, 25 Nov 2013 18:42:46 +0000 (10:42 -0800)]
openvswitch: Change ovs_flow_tbl_lookup_xx() APIs
API changes only for code readability. No functional chnages.
This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.
Reported by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 25 Nov 2013 18:41:28 +0000 (10:41 -0800)]
openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.
This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.
Compile tested only.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 25 Nov 2013 18:40:51 +0000 (10:40 -0800)]
openvswitch: Correct comment.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
David S. Miller [Mon, 6 Jan 2014 22:37:45 +0000 (17:37 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 23 Dec 2013 13:02:13 +0000 (08:02 -0500)]
net_sched: act: action flushing missaccounting
action flushing missaccounting
Account only for deleted actions
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 23 Dec 2013 13:02:12 +0000 (08:02 -0500)]
net_sched: Remove unnecessary checks for act->ops
Remove unnecessary checks for act->ops
(suggested by Eric Dumazet).
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sfeldma@cumulusnetworks.com [Mon, 6 Jan 2014 19:00:44 +0000 (11:00 -0800)]
bridge: use DEVICE_ATTR_xx macros
Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Curt Brune [Mon, 6 Jan 2014 19:00:32 +0000 (11:00 -0800)]
bridge: use spin_lock_bh() in br_multicast_set_hash_max
br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held . This can happen quite easily when any of the bridge multicast
timers expire, which try to take the same lock.
The fix here is to use spin_lock_bh(), preventing other softirqs from
executing on this CPU.
Steps to reproduce:
1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.
# brctl addbr br0
# brctl addif br0 eth1 eth2 eth3 eth4
# brctl setmcqi br0 2
# brctl setmcquerier br0 1
# while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done
Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jan 2014 17:54:31 +0000 (09:54 -0800)]
vxlan: keep original skb ownership
Sathya Perla posted a patch trying to address following problem :
<quote>
The vxlan driver sets itself as the socket owner for all the TX flows
it encapsulates (using vxlan_set_owner()) and assigns it's own skb
destructor. This causes all tunneled traffic to land up on only one TXQ
as all encapsulated skbs refer to the vxlan socket and not the original
socket. Also, the vxlan skb destructor breaks some functionality for
tunneled traffic like wmem accounting and as TCP small queues and
FQ/pacing packet scheduler.
</quote>
I reworked Sathya patch and added some explanations.
vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair
and gain better drop monitor accuracy, by calling kfree_skb() when
appropriate.
The UDP socket used by vxlan to perform encapsulation of xmit packets
do not need to be alive while packets leave vxlan code. Its better
to keep original socket ownership to get proper feedback from qdisc and
NIC layers.
We use skb->sk to
A) control amount of bytes/packets queued on behalf of a socket, but
prior vxlan code did the skb->sk transfert without any limit/control
on vxlan socket sk_sndbuf.
B) security purposes (as selinux) or netfilter uses, and I do not think
anything is prepared to handle vxlan stacked case in this area.
By not changing ownership, vxlan tunnels behave like other tunnels.
As Stephen mentioned, we might do the same change in L2TP.
Reported-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jan 2014 17:36:12 +0000 (09:36 -0800)]
tcp: out_of_order_queue do not use its lock
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())
We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 6 Jan 2014 16:53:14 +0000 (17:53 +0100)]
ipv6: don't install anycast address for /128 addresses on routers
It does not make sense to create an anycast address for an /128-prefix.
Suppress it.
As
32019e651c6fce ("ipv6: Do not leave router anycast address for /127
prefixes.") shows we also may not leave them, because we could accidentally
remove an anycast address the user has allocated or got added via another
prefix.
Cc: François-Xavier Le Bail <fx.lebail@yahoo.com>
Cc: Thomas Haller <thaller@redhat.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Mon, 6 Jan 2014 16:07:29 +0000 (10:07 -0600)]
hso: fix handling of modem port SERIAL_STATE notifications
The existing serial state notification handling expected older Option
devices, having a hardcoded assumption that the Modem port was always
USB interface #2. That isn't true for devices from the past few years.
hso_serial_state_notification is a local cache of a USB Communications
Interface Class SERIAL_STATE notification from the device, and the
USB CDC specification (section 6.3, table 67 "Class-Specific Notifications")
defines wIndex as the USB interface the event applies to. For hso
devices this will always be the Modem port, as the Modem port is the
only port which is set up to receive them by the driver.
So instead of always expecting USB interface #2, instead validate the
notification with the actual USB interface number of the Modem port.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 6 Jan 2014 10:54:40 +0000 (11:54 +0100)]
bonding: fix kstrtou8() return value verification in num_peer_notif
It returns 0 in case of success, !0 error otherwise. Fix the improper error
verification.
Fixes:
2c9839c143bbc ("bonding: add num_grat_arp attribute netlink support")
CC: sfeldma@cumulusnetworks.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 6 Jan 2014 09:08:43 +0000 (17:08 +0800)]
r8152: replace the return value of rtl_ops_init
Replace the boolean value with the error code for the return value
of the rtl_ops_init().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 6 Jan 2014 09:08:42 +0000 (17:08 +0800)]
r8152: move the actions of saving the information of the device
Some information of the device may be used in other functions. Move
the relative code to make sure it would be initialzed correctly
before using it.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 6 Jan 2014 09:08:41 +0000 (17:08 +0800)]
r8152: replace some tabs with spaces
Replace the tabs of the variables declaration with the spaces.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Mon, 6 Jan 2014 04:31:39 +0000 (20:31 -0800)]
isdn: Drop big endian cpp checks from telespci and hfc_pci drivers
With arm:allmodconfig, building the Teles PCI driver fails with
telespci.c:294:2: error: #error "not running on big endian machines now"
Similar, building the driver for HFC PCI-Bus cards fails with
hfc_pci.c:1647:2: error: #error "not running on big endian machines now"
Remove the big endian cpp check from both drivers to fix the build errors.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vijay Subramanian [Sun, 5 Jan 2014 01:33:55 +0000 (17:33 -0800)]
net: pkt_sched: PIE AQM scheme
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.
We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "
Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
support.
For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:36:06 +0000 (13:36 -0500)]
netfilter: Fix build failure in nfnetlink_queue_core.c.
net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1
Just a missing include of net/tcp_states.h
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:29:30 +0000 (13:29 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>
====================
nftables updates for net-next
The following patchset contains nftables updates for your net-next tree,
they are:
* Add set operation to the meta expression by means of the select_ops()
infrastructure, this allows us to set the packet mark among other things.
From Arturo Borrero Gonzalez.
* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
Borkmann.
* Add new queue expression to nf_tables. These comes with two previous patches
to prepare this new feature, one to add mask in nf_tables_core to
evaluate the queue verdict appropriately and another to refactor common
code with xt_NFQUEUE, from Eric Leblond.
* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
Eric Leblond.
* Add the reject expression to nf_tables, this adds the missing TCP RST
support. It comes with an initial patch to refactor common code with
xt_NFQUEUE, again from Eric Leblond.
* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
Nazarewicz.
* Remove the nft_meta_target code, now that Arturo added the set operation
to the meta expression, from me.
* Add help information for nf_tables to Kconfig, also from me.
* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
available to other nf_tables objects, requested by Arturo, from me.
* Expose the table usage counter, so we can know how many chains are using
this table without dumping the list of chains, from Tomasz Bursztyka.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:25:58 +0000 (13:25 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e only.
Majority of this series contains patches from Greg and Mitch to fix
up or add functionality to the PF/VF driver interactions. Notably,
a fix for SR-IOV VF port VLAN which resolved the problem of port VLAN
configurations not being persistent across VF driver loads and unloads
and enable/disable of the feature. Also do not enable the default port
on the VEB, which is designed only to bridge the PF to an Open vSwitch
or bridge. Another fix to resolve a possible memory corruption
condition where ARQ messages are written to random memory locations.
Fix a problem where the 'ip link show' command would display stale
link address information after the link address was set via the 'ip
link set' command.
Anjali provides several patches, one which saves information that can
be used while cleaning the Tx ring and useful in detecting Tx hangs.
Then provides a fixes to the admin queue shutdown function to ensure
we are shutting down the queue in the shutdown path and ensure ASQ is
alive before issuing the admin queue command.
Shannon provides a fix for get/update vsi params where the incorrect
struct was being used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:09:26 +0000 (13:09 -0500)]
Merge branch 'be2net'
Sathya Perla says:
====================
be2net: patch set
Pls apply the following bug fixes to the 'net' tree. Thanks.
Suresh Reddy (2):
be2net: increase the timeout value for loopback-test FW cmd
be2net: fix max_evt_qs calculation for BE3 in SR-IOV config
Vasundhara Volam (1):
be2net: disable RSS when number of RXQs is reduced to 1 via
set-channels
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Mon, 6 Jan 2014 07:32:25 +0000 (13:02 +0530)]
be2net: fix max_evt_qs calculation for BE3 in SR-IOV config
The driver wrongly assumes 16 EQs/vectors are available for each BE3 PF.
When SR-IOV is enabled, a BE3 PF can support only a max of 8 EQs.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Mon, 6 Jan 2014 07:32:24 +0000 (13:02 +0530)]
be2net: increase the timeout value for loopback-test FW cmd
The loopback test FW cmd may need upto 15 seconds to complete on
certain PHYs. This patch also fixes the name of the completion variable
used to synchronize FW cmd completions as it not used by the flashing
cmd alone anymore.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 6 Jan 2014 07:32:23 +0000 (13:02 +0530)]
be2net: disable RSS when number of RXQs is reduced to 1 via set-channels
When *only* the default RXQ is used, the RSS policy must be disabled so
that all IP and no-IP traffic is placed into the default RXQ. If not,
IP traffic is dropped.
Also, issue the RSS_CONFIG cmd only if FW advertises RSS capability for
the interface.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 5 Jan 2014 23:57:54 +0000 (00:57 +0100)]
netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages
Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...
struct dccp_hdr _dh, *dh;
...
skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.
Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.
Fixes:
2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Borkmann [Mon, 6 Jan 2014 00:04:16 +0000 (01:04 +0100)]
netfilter: nf_conntrack_dccp: use %s format string for buffer
Some invocations of nf_log_packet() use arg buffer directly instead of
"%s" format string with follow-up buffer pointer. Currently, these two
usages are not really critical, but we should fix this up nevertheless
so that we don't run into trouble if that changes one day.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 6 Jan 2014 12:54:30 +0000 (13:54 +0100)]
Revert "netfilter: avoid get_random_bytes calls"
This reverts commit
a42b99a6e329654d376b330de057eff87686d890.
Hannes Frederic Sowa reported some problems with this patch, more specifically
that prandom_u32() may not be ready at boot time, see:
http://marc.info/?l=linux-netdev&m=
138896532403533&w=2
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anjali Singhai Jain [Thu, 28 Nov 2013 06:39:47 +0000 (06:39 +0000)]
i40e: Do not allow AQ calls from ndo-ops
If the device is not in a working state avoid making admin
queue (AQ) calls that rely on a working AQ.
Change-Id: Ifbba6d257b3a5b51bfe92938c04088c0baa21433
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 28 Nov 2013 06:39:46 +0000 (06:39 +0000)]
i40e: check asq alive before notify
Driver needs to make sure the send queue is alive before
trying to use it.
Chagne-Id: I9bd1f6159c45c98e63f562e3a8dfb57edfe50e13
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 28 Nov 2013 06:39:45 +0000 (06:39 +0000)]
i40e: Admin queue shutdown fixes
Always call the AQ call to shutdown the queue in the shutdown path.
Check ASQ is alive before issuing the AQ command since we might be
resetting to recover from a bad state in which case we should not
issue the AQ command.
Use the register variable for length so it can be used by PF, VF
and GL AQ commands.
Change-Id: Ic3d305687ea3f1a6afa84e864b7a27bd38a9af32
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:39:44 +0000 (06:39 +0000)]
i40e: Hide the Port VLAN VLAN ID
The VF VSI Port VLAN settings still allow the user to view VLAN tag in
the descriptor. Fix the settings to hide the VLAN ID from the VF. The
VF is not supposed to be aware it is on a VLAN in the Port VLAN
scenario.
Change-Id: I976f2bacb455dbb750f8c53a781c689f02cb8907
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 28 Nov 2013 06:39:43 +0000 (06:39 +0000)]
i40e: use correct struct for get and update vsi params
The get_vsi_params and update_vsi_params functions were using a
different command struct that just happened to have an seid element in
the right place and so worked correctly anyway. This patch fixes the
functions to use the right data struct.
There is no actual logic change.
Change-Id: I513b5e1dc293dfd5b2ba4fa443cbdbfa608d9d19
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:39:42 +0000 (06:39 +0000)]
i40e: Fix VF driver MAC address configuration
Fix a problem where the 'ip link show' command would display stale
link address information after the link address was set via the 'ip
link set' command. In addition, fix problem with the user being
allowed to overwrite the administratively set VF MAC address.
Change-Id: I669ed14e55f2b633ef7b456b713632b08468671c
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:41 +0000 (06:39 +0000)]
i40e: support VFs on PFs other than 0
When communicating with VF devices over the AQ, the FW refers to the
VF by its global VF ID, not local the VF ID with reference to its
parent PF. Since the global and local VF IDs are identical for PF 0,
the code worked correctly on PF 0.
However, we cannot just use global IDs throughout the code as most of
the other references to the VF (VSI setup, register offsets, etc.)
require the local VF ID. Instead, we just add or subtract our base VF
ID when sending and receiving AQ messages.
Change-Id: I92f4332b4876bc68b2f9af9ebf48761f63b6bd97
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>