Linus Lüssing [Wed, 11 Jun 2014 23:41:24 +0000 (01:41 +0200)]
bridge: fix compile error when compiling without IPv6 support
Some fields in "struct net_bridge" aren't available when compiling the
kernel without IPv6 support. Therefore add a check/macro to skip the
offending code sections in that case.
Introduced by 2cd4143192e8c60f66cb32c3a30c76d0470a372d
("bridge: memorize and export selected IGMP/MLD querier port")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Wed, 11 Jun 2014 23:41:23 +0000 (01:41 +0200)]
bridge: fix smatch warning / potential null pointer dereference
"New smatch warnings:
net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error:
we previously assumed 'group' could be null (see line 1349)"
In the rare (sort of broken) case of a query having a Maximum
Response Delay of zero, we could create a potential null pointer
dereference.
Fix this by once again skipping the multicast-specific MLD Query parsing
if no multicast group address is available.
Introduced by dc4eb53a996a78bfb8ea07b47423ff5a3aadc362
("bridge: adhere to querier election mechanism specified by RFCs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
François Cachereul [Thu, 12 Jun 2014 10:11:25 +0000 (12:11 +0200)]
via-rhine: fix full-duplex with autoneg disable
With some specific configurations (VT6105M on Soekris 5510 and depending
on the device at the other end), fragmented packets were not transmitted
when forcing 100 full-duplex with autoneg disabled.
This fix now writes the chip's full-duplex register when forcing full or
half-duplex, not only when autoneg is enabled.
Signed-off-by: François Cachereul <f.cachereul@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Jun 2014 17:28:49 +0000 (10:28 -0700)]
Merge branch 'bnx2x'
Yuval Mintz says:
====================
bnx2x: Bug fixes patch series
This patch series contains various bug fixes - 2 link related fixes,
one sriov-related issue and an additional fix for a theoretical bug
on new boards.
Please consider applying these patches to `net'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Thu, 12 Jun 2014 04:55:32 +0000 (07:55 +0300)]
bnx2x: Enlarge the dorq threshold for VFs
A malicious VF might try to starve the other VFs & PF by creating
continuous doorbell floods. In order to negate this, HW has a threshold of
doorbells per client, which will stop the client doorbells from arriving
if crossed.
The threshold currently configured for VFs is too low - under extreme traffic
scenarios, it's possible for a VF to reach the threshold and thus for its
fastpath to stop working.
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 12 Jun 2014 04:55:31 +0000 (07:55 +0300)]
bnx2x: Check for UNDI in uncommon branch
If L2FW utilized by the UNDI driver has the same version number as that
of the regular FW, a driver loading after UNDI and receiving an uncommon
answer from management will mistakenly assume the loaded FW matches its
own requirement and try to exit the flow via FLR.
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Thu, 12 Jun 2014 04:55:30 +0000 (07:55 +0300)]
bnx2x: Fix 1G-baseT link
Set the phy access mode even in case of link-flap avoidance.
Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Thu, 12 Jun 2014 04:55:29 +0000 (07:55 +0300)]
bnx2x: Fix link for KR with swapped polarity lane
This avoids clearing the RX polarity setting in KR mode when the polarity lane
is swapped, as otherwise this will result in a failed link.
Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xufeng Zhang [Thu, 12 Jun 2014 02:53:36 +0000 (10:53 +0800)]
sctp: Fix sk_ack_backlog wrap-around problem
Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks,
a new association is created in sctp_unpack_cookie(). If some later
processing fails, sctp_association_free() is called to free the
previously allocated association, and in sctp_association_free() the
sk_ack_backlog value is decremented for this socket. Since the initial
value of sk_ack_backlog is 0, the decrement wraps it around to 65535.
If we then try to establish new associations on the same socket, ABORT
is triggered because sctp deems the accept queue full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.
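The wrap-around can be reproduced in a few lines of user-space C. This is only a sketch: `struct listen_sock`, `struct assoc`, the `on_ep_list` field and both helper functions are hypothetical stand-ins, not the kernel's sctp structures; the only thing taken from the description above is that the counter is 16 bits wide and starts at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the listening socket's accept-queue counter. */
struct listen_sock {
    uint16_t ack_backlog;   /* 16 bits wide, so 0 - 1 wraps to 65535 */
};

/* Hypothetical stand-in for an association; on_ep_list is an assumption. */
struct assoc {
    bool on_ep_list;        /* true once the association was counted */
};

/* Unconditional variant: decrements even for never-counted associations. */
static void assoc_free_unconditional(struct listen_sock *sk, struct assoc *a)
{
    (void)a;
    sk->ack_backlog--;
}

/* Guarded variant: only decrements for associations that were counted. */
static void assoc_free_guarded(struct listen_sock *sk, struct assoc *a)
{
    if (a->on_ep_list)
        sk->ack_backlog--;
}
```

Freeing an uncounted association through the unconditional helper leaves the counter at 65535, while the guarded helper leaves it at 0.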
Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 23:02:55 +0000 (16:02 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
net/core/rtnetlink.c
net/core/skbuff.c
Both conflicts were very simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Ledford [Wed, 11 Jun 2014 14:38:03 +0000 (10:38 -0400)]
net/core: Add VF link state control policy
Commit 1d8faf48c7 (net/core: Add VF link state control) added VF link state
control to the netlink VF nested structure, but failed to add a proper entry
for the new structure into the VF policy table. Add the missing entry so
the table and the actual data copied into the netlink nested struct are in
sync.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Wed, 11 Jun 2014 18:48:17 +0000 (13:48 -0500)]
net/fsl: xgmac_mdio is dependent on OF_MDIO
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shruti Kanetkar [Wed, 11 Jun 2014 18:41:40 +0000 (13:41 -0500)]
net/fsl: Make xgmac_mdio read error message useful
Print the device address, the register number and the PHY ID for
which the MDIO read operation failed
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 11 Jun 2014 18:35:18 +0000 (20:35 +0200)]
net_sched: drr: warn when qdisc is not work conserving
The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.
So, warn users that this configuration won't work as expected; we
already do this in a couple of other qdiscs, see e.g. commit
b00355db3f88d96810a60011a30cfb2c3469409d
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')
The 'const' change is needed to avoid a compiler warning ("discards 'const'
qualifier from pointer target type").
Tested with:
drr_hier() {
    parent=$1
    classes=$2
    for i in $(seq 1 $classes); do
        classid=$parent$(printf %x $i)
        tc class add dev eth0 parent $parent classid $classid drr
        tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
    done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 22:46:17 +0000 (15:46 -0700)]
Merge branch 'inet_csums'
Tom Herbert says:
====================
net: Checksum offload changes - Part IV
I am working on overhauling RX checksum offload. Goals of this effort
are:
- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code
What is in this fourth patch set:
- Preserve CHECKSUM_COMPLETE instead of changing it to
CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple
csums in a packet.
- When SW needs to compute the packet checksum, save it as
CHECKSUM_COMPLETE. Also mark that checksum was computed by SW.
- Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with
CHECKSUM_COMPLETE.
v2: Removed patch setting skb_encapsulation when validating checksum
in tcp_gro_receive
Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 11 Jun 2014 01:54:26 +0000 (18:54 -0700)]
net: Add skb_gro_postpull_rcsum to udp and vxlan
Need to call skb_gro_postpull_rcsum for GRO to work with checksum complete.
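The idea behind a "postpull" adjustment can be illustrated with plain ones' complement arithmetic: once a header has been pulled, its contribution is subtracted from the full-packet sum so the remaining value still describes the bytes that follow. The sketch below is a simplified user-space analogue, not the kernel's skb_gro_postpull_rcsum; the helper names are hypothetical, and it assumes even lengths and big-endian 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum of len bytes (len must be even in this sketch). */
static uint16_t csum_bytes(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    return csum_fold16(sum);
}

/* Remove the contribution of a pulled header from a full-packet sum. */
static uint16_t csum_postpull(uint16_t total, uint16_t pulled)
{
    /* Subtracting in ones' complement is adding the bitwise complement. */
    return csum_fold16((uint32_t)total + (uint16_t)~pulled);
}
```

For a packet whose first four bytes are a header, `csum_postpull(csum_bytes(pkt, 8), csum_bytes(pkt, 4))` yields the same value as summing the last four bytes directly, which is why a single full-packet sum can be reused after each header is consumed.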
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 11 Jun 2014 01:54:19 +0000 (18:54 -0700)]
net: Save software checksum complete
In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.
Also add a csum_complete_sw flag to distinguish between software- and
hardware-generated checksum complete; we should always be able to trust
the software computation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 11 Jun 2014 01:54:13 +0000 (18:54 -0700)]
net: Preserve CHECKSUM_COMPLETE at validation
Currently when the first checksum in a packet is validated using
CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
so that any subsequent checksums in the packet are not correctly
validated.
This patch adds csum_valid flag in sk_buff and uses that to indicate
validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
is set accordingly in the skb_checksum_validate_* functions. The flag
is checked in skb_checksum_complete, so that validation is communicated
between checksum_init and checksum_complete sequence in TCP and UDP.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 22:44:42 +0000 (15:44 -0700)]
Merge branch 'qlcnic-next'
Shahed Shaikh says:
====================
This series contains an enhancement in the area of firmware minidump collection
and optimization of ring count validation function.
Please apply this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 11 Jun 2014 18:09:13 +0000 (14:09 -0400)]
qlcnic: Update version to 5.3.60
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 11 Jun 2014 18:09:12 +0000 (14:09 -0400)]
qlcnic: Optimize ring count validations
- Check interrupt mode at the start of qlcnic_set_channels().
- Do not validate ring count if they are not going to change.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 11 Jun 2014 18:09:11 +0000 (14:09 -0400)]
qlcnic: Pre-allocate DMA buffer used for minidump collection
Pre-allocate the physically contiguous DMA buffer used for
minidump collection at driver load time, rather than at
run time, to minimize allocation failures. Driver will allocate
the buffer at load time if PEX DMA support capability is indicated
by the adapter.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Popov [Wed, 11 Jun 2014 11:09:14 +0000 (15:09 +0400)]
ip_vti: fix sparse warnings for VTI_ISVTI
This patch fixes the following sparse warnings:
net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer
net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types)
net/ipv4/ip_vti.c:321:19: expected restricted __be16 [addressable] [assigned] [usertype] i_flags
net/ipv4/ip_vti.c:321:19: got int
net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types)
net/ipv4/ip_vti.c:447:24: expected restricted __be16 [usertype] i_flags
net/ipv4/ip_vti.c:447:24: got int
Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16),
we can __force cast VTI_ISVTI to __be16 in header file.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 11 Jun 2014 08:16:51 +0000 (11:16 +0300)]
drivers: net: davinci_cpdma: double free on error
We recently changed the kzalloc() to devm_kzalloc() so freeing "ctlr"
here could lead to a double free.
Fixes: e194312854ed ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 11 Jun 2014 06:56:26 +0000 (09:56 +0300)]
amd-xgbe: unwind on error in xgbe_mdio_register()
There is a typo here so we return directly instead of unwinding.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Varka Bhadram [Wed, 11 Jun 2014 04:34:44 +0000 (10:04 +0530)]
mrf24j40: add device managed APIs
Add the device managed APIs so that there is no need to worry about
freeing the resources.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 11 Jun 2014 03:30:13 +0000 (20:30 -0700)]
ceph: remove bogus extern
Sparse complained about this bogus extern on definition of
a function.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 10 Jun 2014 15:44:07 +0000 (17:44 +0200)]
net: filter: document internal instruction encoding
This patch adds a description of eBPF's instruction encoding in order
to bring the documentation in line with the implementation.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 10 Jun 2014 15:44:06 +0000 (17:44 +0200)]
net: filter: mention eBPF terminology as well
Since the term eBPF is used anyway on mailing list discussions, let's
also document that in the main BPF documentation file and replace a
couple of occurrences with eBPF terminology to be more clear.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Jun 2014 13:43:01 +0000 (06:43 -0700)]
ipv4: fix a race in ip4_datagram_release_cb()
Alexey gave an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].
The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).
It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk->sk_dst_lock) to prevent corruption.
The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.
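The locking rule described above (every writer swaps the cached route under one spinlock, then drops the old reference) can be sketched in user-space C. Everything here is a hypothetical analogue: `struct dst`, `struct sock_like`, `dst_put()` and `sock_dst_set()` are illustration names, and the spinlock is a C11 `atomic_flag`, not the kernel's spinlock_t or the real sk_dst_set().

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted route entry. */
struct dst {
    atomic_int refcnt;
};

/* Hypothetical socket: one spinlock guards every write to dst_cache. */
struct sock_like {
    atomic_flag dst_lock;
    struct dst *dst_cache;
};

static void dst_lock(struct sock_like *sk)
{
    while (atomic_flag_test_and_set_explicit(&sk->dst_lock,
                                             memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void dst_unlock(struct sock_like *sk)
{
    atomic_flag_clear_explicit(&sk->dst_lock, memory_order_release);
}

/* Drop one reference; free the entry when the last one goes away. */
static void dst_put(struct dst *d)
{
    if (d && atomic_fetch_sub(&d->refcnt, 1) == 1)
        free(d);
}

/* Swap the cached pointer under the lock, then release the old entry once. */
static void sock_dst_set(struct sock_like *sk, struct dst *new_dst)
{
    struct dst *old;

    dst_lock(sk);
    old = sk->dst_cache;
    sk->dst_cache = new_dst;
    dst_unlock(sk);

    dst_put(old);
}
```

Because the old pointer is read and replaced inside the same critical section, two concurrent writers can never both observe the same old entry and release it twice.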
[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
[<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
[<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
[<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
[<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
[<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
[<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
[<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629
Freed by thread T15455:
[<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
[<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
[<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
[<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
[<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629
Allocated by thread T15453:
[<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
[<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
[< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939
[<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
[<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
[<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
[<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
[<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629
[2]
<4>[196727.311203] general protection fault: 0000 [#1] SMP
<4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
<4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
<4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
<4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
<4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282
<4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
<4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
<4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
<4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
<4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
<4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
<4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[196727.311713] Stack:
<4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
<4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
<4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
<4>[196727.311885] Call Trace:
<4>[196727.311907] <IRQ>
<4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
<4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80
<4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
<4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
<4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
<4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
<4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
<4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
<4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
<4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
<4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
<4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
<4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
<4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
<4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
<4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
<4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
<4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210
<4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
<4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270
<4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30
<4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90
<4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60
<4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
<4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
<4>[196727.312722] <EOI>
<1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.313100] RSP <ffff885effd23a70>
<4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
<0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
Reported-by: Alexey Preobrazhensky <preobr@google.com>
Reported-by: dormando <dormando@rydia.ne>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 10 Jun 2014 10:31:10 +0000 (12:31 +0200)]
net: filter: add test_bpf module under MAINTAINERS' networking section
Add lib/test_bpf.c entry to maintainers file under networking.
All changes were posted via netdev for review, so make sure
other people Cc it as well when they call get_maintainer.pl.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Octavian Purdila [Wed, 11 Jun 2014 22:36:26 +0000 (01:36 +0300)]
net: add __pskb_copy_fclone and pskb_copy_for_clone
There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.
Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 11 Jun 2014 13:33:08 +0000 (14:33 +0100)]
sfc: PIO:Restrict to 64bit arch and use 64-bit writes.
Fixes: ee45fd92c739 ("sfc: Use TX PIO for sufficiently small packets")
The linux net driver uses memcpy_toio() in order to copy into
the PIO buffers.
Even on a 64bit machine this causes 32bit accesses to a write-
combined memory region.
There are hardware limitations that mean that only 64bit
naturally aligned accesses are safe in all cases.
Because this is a write-combined memory region, two 32bit accesses
may be coalesced to form a 64bit access that is not 64bit aligned.
The solution was to open-code the memory copy routines using pointers
and to only enable PIO for x86_64 machines.
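An open-coded copy that issues only naturally aligned 64-bit stores might look like the sketch below. It is a user-space analogue under stated assumptions: `copy_to_pio64()` is a hypothetical name, the destination is an ordinary 8-byte-aligned buffer standing in for the PIO region, and the length is a multiple of 8. A real driver would go through the kernel's I/O accessors instead of a plain volatile pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes (a multiple of 8 in this sketch) to an 8-byte-aligned
 * destination, writing one 64-bit word per store.
 */
static void copy_to_pio64(volatile uint64_t *dst, const void *src, size_t len)
{
    const uint8_t *s = src;

    for (size_t i = 0; i < len / 8; i++) {
        uint64_t word;

        /* Load via memcpy so the source may have any alignment. */
        memcpy(&word, s + 8 * i, sizeof(word));
        /* Each destination access is a whole, aligned 64-bit word. */
        dst[i] = word;
    }
}
```

Writing whole words through a `uint64_t` pointer keeps each destination access 64 bits wide in the source code, which a byte-oriented copy routine does not promise.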
Not tested on platforms other than x86_64 because this patch
disables the PIO feature on other platforms.
Compile-tested on x86 to ensure that it works.
The WARN_ON_ONCE() code in the previous version of this patch
has been moved into the internal sfc debug driver as the
assertion was unnecessary in the upstream kernel code.
This bug fix applies to v3.13 and v3.14 stable branches.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 22:23:03 +0000 (15:23 -0700)]
Merge branch 'bridge-next'
Toshiaki Makita says:
====================
bridge: 802.1ad vlan protocol support
Currently bridge vlan filtering doesn't work well with the 802.1ad protocol.
It works only if the bridge is configured without pvid, receives only
802.1ad tagged frames, and no STP is used.
Otherwise:
- If pvid is configured, it can put only 802.1Q tags but cannot put 802.1ad
tags.
- If 802.1Q and 802.1ad tagged frames arrive in mixture, it applies filtering
regardless of their protocols.
- While an 802.1ad bridge should use another mac address for STP BPDU and
should forward customer's BPDU frames, it can't.
Thus, we can't properly handle frames once 802.1ad is used.
Handling 802.1ad is useful if we want to allow stacked vlans to be used,
e.g., guest VMs want to use vlan tags and the host also wants to segregate
guest's traffic from other guests' by vlan tags.
Here is the image describing how to configure a bridge to filter VMs' traffic.
        +-------+p/u   +-----+  +---------+
+----+  |       |------|vnet0|--|User A VM|
|eth0|--|802.1ad|      +-----+  +---------+
+----+  |bridge |p/u   +-----+  +---------+
        |       |------|vnet1|--|User B VM|
        +-------+      +-----+  +---------+
p/u: pvid/untagged
This patch set enables us to set vlan protocols per bridge.
This tries to implement a bridge like S-VLAN component in IEEE 802.1Q-2011
spec.
Note that there is another possible implementation that sets vlan protocols
per port. Some HW switches seem to take that approach.
However, I think the per-bridge approach is better, because:
- I think the typical usage of an 802.1ad bridge is segregating 802.1Q tagged
traffic (like what is described above), and this doesn't need the ability to
set protocols per port. Also, if a bridge has many ports and it supports
per-port setting, we might have to make many more configuration changes to
change protocols of all ports.
- I assume that the main purpose of setting the protocol per port is to assign S-VID
according to C-VID, or to realize two logical bridges (one is an 802.1Q
filtering bridge and the other is an 802.1ad filtering bridge) in one bridge.
The former usually needs additional features such as vlan id mapping, and
is likely to make bridge's code complicated. If a user wants, such enhanced
features can be accomplished by a combination of multiple bridges, so it is
not absolutely necessary to implement these features in a bridge itself.
The latter is simply unnecessary because we can easily make two bridges of
which one is an 802.1Q bridge and the other is an 802.1ad bridge.
Here is an example of the enhanced feature that we can realize by using
multiple bridges and veth interfaces. This way is documented in
IEEE 802.1Q-2011 clause 15.4 (C-tagged service interface).
+----+  +-------+p/u         +------+  +----+  +--+
|eth0|--|802.1ad|----veth----|802.1Q|--|vnet|--|VM|
+----+  |bridge |----veth----|bridge|  +----+  +--+
        +-------+p/u         +------+
p/u: pvid/untagged
In this configuration, we can map C-VIDs to any S-VID.
For example:
C-VID 10 and 20 to S-VID 100
C-VID 30 to S-VID 110
This is achieved through the 802.1Q bridge that forwards C-tagged frames to
proper ports of the 802.1ad bridge.
Changes:
v1 -> v2:
- Make the way to forward bridge group addresses more generic by introducing
new mask, group_fwd_mask_required.
RFC -> v1:
- Add S-TAG tx offload.
- Remove a fix around stacked vlan which has already been fixed.
- Take into account Bridge Group Addresses.
- Separate handling of protocol-mismatch from br_vlan_get_tag().
- Change the way to set vlan_proto from netlink to sysfs because no other
existing configuration per bridge can be set by netlink.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Tue, 10 Jun 2014 11:59:25 +0000 (20:59 +0900)]
bridge: Support 802.1ad vlan filtering
This enables us to change the vlan protocol for vlan filtering.
With this, frames can be filtered on the basis of 802.1ad vlan tags
through a bridge.
This also changes br->group_addr if it has not been set by user.
This is needed for an 802.1ad bridge.
(See IEEE 802.1Q-2011 8.13.5.)
Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
bridge can forward the Nearest Customer Bridge group addresses except
for br->group_addr, which should be passed to higher layer.
To change the vlan protocol, write a protocol in sysfs:
# echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol
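The value written to sysfs is the tag protocol identifier (TPID) the bridge filters on: 0x88a8 for 802.1ad S-tags, 0x8100 for 802.1Q C-tags. A minimal sketch of the resulting protocol check on a raw Ethernet frame follows; `frame_tpid()` and `tag_matches_bridge()` are hypothetical helpers for illustration, not functions from the bridge code, and the two constants are defined locally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100u  /* C-tag TPID */
#define ETH_P_8021AD 0x88a8u  /* S-tag TPID */

/* Read the EtherType/TPID that follows the two 6-byte MAC addresses. */
static uint16_t frame_tpid(const uint8_t *frame, size_t len)
{
    if (len < 14)
        return 0;   /* too short to carry an Ethernet header */
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* True when the outermost tag uses the bridge's configured vlan protocol. */
static bool tag_matches_bridge(const uint8_t *frame, size_t len,
                               uint16_t vlan_proto)
{
    return frame_tpid(frame, len) == vlan_proto;
}
```

With vlan_protocol set to 0x88a8, a frame whose outer tag is 0x8100 does not match and would be handled as if it carried no tag of the bridge's protocol.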
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Tue, 10 Jun 2014 11:59:24 +0000 (20:59 +0900)]
bridge: Prepare for forwarding another bridge group addresses
If a bridge is an 802.1ad bridge, it must forward another set of bridge group
addresses (the Nearest Customer Bridge group addresses).
(For details, see IEEE 802.1Q-2011 8.6.3.)
As the user might not want group_fwd_mask to be modified by enabling 802.1ad,
introduce a new mask, group_fwd_mask_required, which indicates addresses
the bridge wants to forward. This will be set by enabling 802.1ad.
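How the two masks could combine in a forwarding decision is sketched below. The assumptions are labeled: bit n of each mask is taken to select the link-local group address 01-80-C2-00-00-0n, and `should_forward_group()` is a hypothetical helper, not the bridge input path itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: bit n of a mask selects group address 01-80-C2-00-00-0n.
 * The user-controlled mask and the required mask are simply OR-ed, so
 * enabling 802.1ad never has to modify the user's own setting.
 */
static bool should_forward_group(const uint8_t dest[6],
                                 uint16_t group_fwd_mask,
                                 uint16_t group_fwd_mask_required)
{
    uint16_t mask = group_fwd_mask | group_fwd_mask_required;

    return (mask & (1u << (dest[5] & 0x0f))) != 0;
}
```

Keeping the required bits in a separate field means the user's mask reads back unchanged once 802.1ad is turned off again.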
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Tue, 10 Jun 2014 11:59:23 +0000 (20:59 +0900)]
bridge: Prepare for 802.1ad vlan filtering support
This enables a bridge to have vlan protocol information and allows vlan
tag manipulation (retrieve, insert and remove tags) according to the vlan
protocol.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Tue, 10 Jun 2014 11:59:22 +0000 (20:59 +0900)]
bridge: Add 802.1ad tx vlan acceleration
Bridge device doesn't need to embed S-tag into skb->data.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 10 Jun 2014 08:34:36 +0000 (10:34 +0200)]
net: xen-netback: include linux/vmalloc.h again
Commit e9ce7cb6b107 ("xen-netback: Factor queue-specific data into
queue struct") added a use of vzalloc/vfree to interface.c, but
removed the #include <linux/vmalloc.h> statement at the same time,
which causes this build error:
drivers/net/xen-netback/interface.c: In function 'xenvif_free':
drivers/net/xen-netback/interface.c:754:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
vfree(vif->queues);
^
cc1: some warnings being treated as errors
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jongsung Kim [Tue, 10 Jun 2014 03:50:12 +0000 (12:50 +0900)]
net: phy: realtek: register/unregister multiple drivers properly
Using the phy_drivers_register/_unregister functions is the proper way to
register multiple PHY drivers. For the Realtek PHY driver module, this
fixes the incomplete error handling and adds the missing
unregistration of the RTL8201CP driver.
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Tue, 10 Jun 2014 00:40:24 +0000 (09:40 +0900)]
net: sh_eth: Fix timing of RACT setting in sh_eth_rx()
This patch fixes an issue where an NFS rootfs cannot be used correctly
on r8a7790 when the command below runs on a host PC.
$ sudo ping -f -l 8 $BOARD_IP_ADDR
Since the driver sets RACT to 1 in the first while loop of
sh_eth_rx(), the controller accepts the next frame into the next RX
descriptor during that loop. But the first while loop doesn't
allocate the next skb. So, this patch removes the RACT setting
from the first while loop of sh_eth_rx().
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Tue, 10 Jun 2014 00:40:14 +0000 (09:40 +0900)]
net: sh_eth: Fix receive packet "exceeded" condition in sh_eth_rx()
This patch fixes the packet "exceeded" condition in sh_eth_rx() when
RACT in an RX descriptor is not set and the "quota" is 0.
Otherwise, kernel panic happens because the "&n->poll_list" is deleted
twice in sh_eth_poll() which calls napi_complete() and net_rx_action().
Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Wed, 11 Jun 2014 20:16:44 +0000 (13:16 -0700)]
net: filter: fix warning on 32-bit arch
fix compiler warning on 32-bit architectures:
net/core/filter.c: In function '__sk_run_filter':
net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 9 Jun 2014 16:08:18 +0000 (11:08 -0500)]
tipc: fix potential bug in function tipc_backlog_rcv
In commit 4f4482dcd9a0606a30541ff165ddaca64748299b ("tipc: compensate
for double accounting in socket rcv buffer") we access 'truesize' of
a received buffer after it might have been released by the function
filter_rcv().
In this commit we correct this by reading the value of 'truesize' to
the stack before delivering the buffer to filter_rcv().
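The pattern described above can be sketched in userspace C. This is an illustration only, not the TIPC code: rx_buf, consume_buf() and backlog_rcv_sketch() are hypothetical names, and consume_buf() stands in for a consumer such as filter_rcv() that may release the buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a received buffer. */
struct rx_buf {
    unsigned int truesize;
};

/* Hypothetical consumer: takes ownership and may free the buffer. */
static int consume_buf(struct rx_buf *buf)
{
    free(buf);
    return 0;
}

/*
 * Copy 'truesize' to the stack before handing the buffer over, so the
 * accounting below never touches memory that may already be freed.
 */
static void backlog_rcv_sketch(struct rx_buf *buf, unsigned int *rcv_count)
{
    unsigned int truesize;

    assert(buf != NULL && rcv_count != NULL);
    truesize = buf->truesize;   /* read while the buffer is still ours */
    (void)consume_buf(buf);     /* buf must not be dereferenced after this */
    *rcv_count += truesize;
}
```

Reading the field after the hand-off instead would be a use-after-free whenever the consumer releases the buffer.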
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 9 Jun 2014 15:09:01 +0000 (18:09 +0300)]
net: sxgbe: remove duplicate SXGBE_CORE_L34_ADDCTL_REG define
The SXGBE_CORE_L34_ADDCTL_REG define is cut and pasted twice so we can
delete the second instance.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 9 Jun 2014 14:55:39 +0000 (17:55 +0300)]
qlcnic: remove duplicate QLC_83XX_GET_LSO_CAPABILITY define
The QLC_83XX_GET_LSO_CAPABILITY define is cut and pasted twice so we can
delete the second instance.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 21:59:21 +0000 (14:59 -0700)]
Merge branch 'mlx4'
Amir Vadai says:
====================
cpumask,net: affinity hint helper function
This patchset will set affinity hint to influence IRQs to be allocated on the
same NUMA node as the one where the card resides. As discussed in
http://www.spinics.net/lists/netdev/msg271497.html
If number of IRQs allocated is greater than the number of local NUMA cores, all
local cores will be used first, and the rest of the IRQs will be on a remote
NUMA node.
Without NUMA support, IRQs and cores will be mapped 1:1.
Since the utility function to calculate the mapping could be useful in other mq
drivers in the kernel, it was added to cpumask.[ch]
This patchset was tested and applied on top of net-next since the first
consumer is a network device (mlx4_en). Over commit fff1f59 "mac802154:
llsec: add forgotten list_del_rcu in key removal"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Atias [Mon, 9 Jun 2014 07:24:39 +0000 (10:24 +0300)]
net/mlx4_en: Use affinity hint
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.
We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it. To accomplish this, we use cpumask_set_cpu_local_first(), which
sets the affinity hint according to the following policy:
First it maps IRQs to “close” numa cores. If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 9 Jun 2014 07:24:38 +0000 (10:24 +0300)]
cpumask: Utility function to set n'th cpu - local cpu first
This function sets the n'th cpu, choosing local cpus first.
For example, on a 16-core server where the even-numbered cpus are local, it
yields the following values:
cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
...
cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
...
cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set
Currently this function will be used by multi queue networking devices to
calculate the irq affinity mask, such that as many local cpu's as
possible will be utilized to handle the mq device irq's.
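The local-first ordering in the example above can be sketched in userspace C. This is not the kernel's cpumask_set_cpu_local_first(); nth_cpu_local_first() and the is_local[] array are hypothetical, and the sketch returns the chosen cpu number instead of setting a bit in a cpumask.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the cpu picked for index n when local cpus are handed out first,
 * or -1 if n is out of range. is_local[cpu] is true for cpus on the
 * device's NUMA node (an assumption of this sketch).
 */
static int nth_cpu_local_first(int n, const bool *is_local, int ncpus)
{
    int seen = 0;

    assert(ncpus >= 0);
    if (n < 0)
        return -1;

    /* First pass: local cpus in ascending order. */
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (is_local[cpu] && seen++ == n)
            return cpu;

    /* Second pass: remaining (remote) cpus in ascending order. */
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (!is_local[cpu] && seen++ == n)
            return cpu;

    return -1;
}
```

With 16 cpus and the even-numbered ones marked local, indexes 0..7 map to cpus 0, 2, ..., 14 and indexes 8..15 map to cpus 1, 3, ..., 15, matching the table above.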
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 19:25:12 +0000 (12:25 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-06-11
This series contains updates to igb, i40e and i40evf.
Todd makes a change to igb to un-hide invariant returns by getting rid of
the E1000_SUCCESS define and converting those returns to return 0.
Jacob separates the hardware logic from the set function, so that we can
re-use it during a ptp_reset in igb. This enables the reset to return
functionality to the last known timestamp mode, rather than resetting the
value.
Ashish implements context flags for headwb and headwb_addr so that we
do not have to keep them always enabled.
Shannon updates the admin queue API for the new firmware, which adds
set_pf_context, nvm_config_read/write, replaces set_phy_reset with
set_phy_debug and removes nvm_read/write_reg_se. Cleans up the driver
to use the stored base_queue value since there is no need to read the
PCI register for the PF's base queue on every single transmit queue
enable and disable as we already have the value stored from reading
the capability features at startup.
Anjali changes the notion of source and destination for FD_SB in ethtool
to align i40e with other drivers. Adds flow director statistics to
the PF stats. Fixes a bug in ethtool for flow director drop packet
filter where the drop action comes down as a ring_cookie value, so allow
it as a special value that can be used to configure destination control.
Mitch fixes the i40evf to keep the driver from going down when it is
already in a down state. This prevents a CPU soft lock in napi_disable().
Also change the i40evf to check the admin queue error bits since the
firmware can indicate any admin queue error states to the driver via
some bits in the length registers.
Neerav separates out the DCB capability and enabled flags because currently
if the firmware reports DCB capability the driver enables
I40E_FLAG_DCB_ENABLED flag. When this flag is enabled the driver inserts
a tag when transmitting a packet from the port even if there are no DCB
traffic classes configured at the port. The additional flag,
I40E_FLAG_DCB_CAPABLE, will be set when the DCB capability is present,
and the existing enabled flag will only be set if more than one
traffic class is configured at the port.
Greg fixes the i40e driver to not automatically accept tagged packets by
default so that the system must request a VLAN tag packet filter to get
packets with that tag. Greg also converts i40e to use the in-kernel
ether_addr_copy() instead of memcpy().
Jesse removes the FTYPE field from the receive descriptor to match the
hardware implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 19:23:30 +0000 (12:23 -0700)]
Merge branch 'sctp-next'
Daniel Borkmann says:
====================
SCTP update
This set contains transport path selection improvements in
SCTP. Please see individual patches for details.
====================
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 11 Jun 2014 16:19:32 +0000 (18:19 +0200)]
net: sctp: fix incorrect type in gfp initializer
This fixes the following sparse warning:
net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload
net/sctp/associola.c:1556:29: got restricted gfp_t
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 11 Jun 2014 16:19:31 +0000 (18:19 +0200)]
net: sctp: improve sctp_select_active_and_retran_path selection
In function sctp_select_active_and_retran_path(), we walk the
transport list in order to look for the two most recently used
ACTIVE transports (trans_pri, trans_sec). In case we didn't find
anything ACTIVE, we currently just camp on a possibly PF or
INACTIVE transport that is primary path; this behavior actually
dates back to linux-history tree of the very early days of
lksctp, and can yield a behavior that chooses suboptimal
transport paths.
Instead, be a bit more clever by reusing and extending the
recently introduced sctp_trans_elect_best() handler. In case
both transports are evaluated to have the same score resulting
from their states, break the tie by looking at: 1) transport
path error count 2) last_time_heard value from each transport.
This is analogous to Nishida's Quick Failover draft [1],
section 5.1, 3:
The sender SHOULD avoid data transmission to PF destinations.
When all destinations are in either PF or Inactive state,
the sender MAY either move the destination from PF to active
state (and transmit data to the active destination) or the
sender MAY transmit data to a PF destination. In the former
scenario, (i) the sender MUST NOT notify the ULP about the
state transition, and (ii) MUST NOT clear the destination's
error counter. It is recommended that the sender picks the
PF destination with least error count (fewest consecutive
timeouts) for data transmission. In case of a tie (multiple PF
destinations with same error count), the sender MAY choose the
last active destination.
Thus for sctp_select_active_and_retran_path(), we keep track of
the best, if any, transport that is in PF state and in case no
ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
we select the best out of the three: current primary_path and
retran_path as well as a possible PF transport.
The secondary may still camp on the original primary_path as
before. The change in sctp_trans_elect_best() with a more fine
grained tie selection also improves at the same time path selection
for sctp_assoc_update_retran_path() in case of non-ACTIVE states.
[1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
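The tie-breaking order described above (state score first, then error count, then last_time_heard) can be sketched as a small comparator. This is a simplified userspace illustration, not sctp_trans_elect_best(): struct path, the three-state enum and its ordering, and elect_best() are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ordering for the sketch: higher value means a better state. */
enum path_state { PATH_INACTIVE = 0, PATH_PF = 1, PATH_ACTIVE = 2 };

/* Hypothetical, reduced view of a transport. */
struct path {
    enum path_state state;
    unsigned int error_count;   /* consecutive timeouts on this path */
    int64_t last_time_heard;    /* nanoseconds, larger is more recent */
};

/* Return the better of two candidates; either may be NULL. */
static const struct path *elect_best(const struct path *cur,
                                     const struct path *best)
{
    assert(cur != NULL || best != NULL);
    if (best == NULL)
        return cur;
    if (cur == NULL)
        return best;

    /* 1) Different states: the better state wins outright. */
    if (cur->state != best->state)
        return cur->state > best->state ? cur : best;

    /* 2) Same state: prefer the path with fewer errors. */
    if (cur->error_count != best->error_count)
        return cur->error_count < best->error_count ? cur : best;

    /* 3) Still tied: prefer the path heard from most recently. */
    return cur->last_time_heard > best->last_time_heard ? cur : best;
}
```

On a complete tie the sketch keeps the existing best candidate, which is one possible choice and not something the commit message specifies.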
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 11 Jun 2014 16:19:30 +0000 (18:19 +0200)]
net: sctp: migrate most recently used transport to ktime
Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 11 Jun 2014 16:19:29 +0000 (18:19 +0200)]
net: sctp: refactor active path selection
This patch just refactors and moves the code for the active
path selection into its own helper function outside of
sctp_assoc_control_transport() which is already big enough.
No functional changes here.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 11 Jun 2014 16:19:28 +0000 (18:19 +0200)]
ktime: add ktime_after and ktime_before helper
Add two minimal helper functions analogous to time_before() and
time_after() that will later on both be needed by SCTP code.
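A minimal userspace sketch of what such helpers look like, assuming timestamps are plain signed 64-bit nanosecond counts. ns_time_t, ns_time_after(), ns_time_before() and most_recent_index() are hypothetical names for illustration and are not the kernel's ktime API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a ktime value: signed nanoseconds. */
typedef int64_t ns_time_t;

/* True if a is strictly later than b. */
static inline bool ns_time_after(ns_time_t a, ns_time_t b)
{
    return a > b;
}

/* True if a is strictly earlier than b. */
static inline bool ns_time_before(ns_time_t a, ns_time_t b)
{
    return a < b;
}

/* Index of the most recent timestamp; the first one wins on a tie. */
static int most_recent_index(const ns_time_t *heard, int n)
{
    int best = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (ns_time_after(heard[i], heard[best]))
            best = i;
    return best;
}
```

Because the values are 64-bit nanoseconds, a plain comparison is enough here; no wrap-safe subtraction of the kind used for jiffies is needed in this sketch.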
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 19:11:25 +0000 (12:11 -0700)]
Merge branch 'mac802154'
Phoebe Buckheister says:
====================
Recent llsec code introduced a memory leak on decryption failures during rx.
This fixes said leak, and optimizes the receive loops for monitor and wpan
devices to only deliver skbs to devices that are actually up. Also changes a
dev_kfree_skb to kfree_skb when an invalid packet is dropped before being
pushed into the stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Wed, 11 Jun 2014 10:03:07 +0000 (12:03 +0200)]
mac802154: don't deliver packets to devices that are down
Only one WPAN device can be active at any given time, so only deliver
packets to that one interface that is actually up. Multiple monitors may
be up at any given time, but we don't have to deliver to monitors that
are down either.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phoebe Buckheister [Wed, 11 Jun 2014 10:03:06 +0000 (12:03 +0200)]
mac802154: properly free incoming skbs on decryption failure
mac802154 RX did not free skbs on decryption failure, assuming that the
caller would when the local rx handler returned _DROP. This was false.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Thu, 22 May 2014 06:32:33 +0000 (06:32 +0000)]
i40e/i40evf: Bump i40e to version 0.4.10 and i40evf to 0.9.34
Bump versions.
Change-ID: Ic4a84354955061ca18321b1e97c9c30fe1563b5c
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 22 May 2014 06:32:28 +0000 (06:32 +0000)]
i40e: use stored base_queue value
No need to read the PCI register for the PF's base queue on every single Tx
queue enable and disable as we already have the value stored from reading
the capability features at startup.
Change-ID: Ic02fb622757742f43cb8269369c3d972d4f66555
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 22 May 2014 06:32:23 +0000 (06:32 +0000)]
i40e: Fix a bug in ethtool for FD drop packet filter action
A drop action comes down as a ring_cookie value, so allow it as
a special value that can be used to configure destination control.
Also fix the output to filter read command accordingly.
Change-ID: I9956723cee42f3194885403317dd21ed4a151144
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 22 May 2014 06:32:17 +0000 (06:32 +0000)]
i40e/i40evf: Add Flow director stats to PF stats
Add members to stat struct to keep track of Flow director ATR and
SideBand filter packet matches.
Change-ID: Ibbb31a53c7adcc2bb96991dd80565442a2f2513c
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 22 May 2014 06:32:12 +0000 (06:32 +0000)]
i40e/i40evf: remove FTYPE
This change drops the FTYPE field from the Rx descriptor, to
match the hardware implementation.
Change-ID: I66d31d2b43861da45e8ace4fb03df033abe88bab
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 22 May 2014 06:32:07 +0000 (06:32 +0000)]
i40evf: check admin queue error bits
FW can indicate any admin queue error states to the driver via some bits
in the length registers. Each time we process an admin queue message,
check these bits and log any errors we find. Since the VF really can't
do much, we just print the message and depend on the PF driver to clear
things up on our behalf.
Change-ID: I92bc6c53ce3b4400544e0ca19c5de2d27490bd0d
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 22 May 2014 06:32:02 +0000 (06:32 +0000)]
i40e/i40evf: Use ether_addr_copy instead of memcpy
Linux gives us a function to copy Ethernet MAC addresses, let's use it.
Change-ID: I0c861900029ca5ea65a53ca39565852fb633f6fd
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 22 May 2014 06:31:56 +0000 (06:31 +0000)]
i40e: Do not accept tagged packets by default
Remove the filter created by the firmware with the default MAC address it
reads out of the NVM storage and a promiscuous VLAN tag and replace it
with a filter that will not accept tagged packets by default. The system
must request a VLAN tag packet filter to get packets with that tag.
Change-ID: I119e6c3603a039bd68282ba31bf26f33a575490a
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Thu, 22 May 2014 06:31:51 +0000 (06:31 +0000)]
i40e: Separate out DCB capability and enabled flags
Currently if the firmware reports DCB capability the driver enables
I40E_FLAG_DCB_ENABLED flag. When this flag is enabled the driver
inserts a tag when transmitting a packet from the port even if there
are no DCB traffic classes configured at the port.
This patch adds a new flag I40E_FLAG_DCB_CAPABLE that will be set
when the DCB capability is present and the existing flag
I40E_FLAG_DCB_ENABLED will be set only if more than one
traffic class is configured at the port.
Change-ID: I24ccbf53ef293db2eba80c8a9772acf729795bd5
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 22 May 2014 06:31:46 +0000 (06:31 +0000)]
i40evf: don't go further down
If the device is down, there's no place to go but up, so don't try to go
down even more. This prevents a CPU soft lock in napi_disable().
Change-ID: I8b058b9ee974dfa01c212fae2597f4f54b333314
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 22 May 2014 06:31:41 +0000 (06:31 +0000)]
i40e: Change the notion of src and dst for FD_SB in ethtool
In XL710 devices we program FD filter's fields from Tx perspective of the flow.
However the user interface exposed in ethtool should be compliant with the
previous generation of drivers where a filter src and dst field are from
the RX perspective. This patch changes the ethtool interface in this regard
to match the other drivers.
Change-ID: Iec6ccddd87357c4fb53ccf33aa0fae699faf70cf
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 22 May 2014 06:31:30 +0000 (06:31 +0000)]
i40e/i40evf: AdminQ API update for new FW
Add set_pf_context, replace set_phy_reset with set_phy_debug, add
nvm_config_read/write, remove nvm_read/write_reg_se and add some
PHY types.
With these changes we bump the API version to 1.2.
Change-ID: I4dc3aec175c2316f66fc9b726b3f7d594699d84e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ashish Shah [Thu, 22 May 2014 06:31:25 +0000 (06:31 +0000)]
i40e/i40evf: set headwb Tx context flags and use them
Set appropriate fields in Tx queue configuration virtchnl message
to pf to enable headwb and setup headwb addr.
Then use that info from the VF to set headwb and headwb_addr instead of
always enabling them.
Change-ID: I7d393d1b2b07f0f3355b3a4f7c2d3c6ee3b0d622
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 5 Jun 2014 07:25:10 +0000 (07:25 +0000)]
igb: separate hardware setting from the set_ts_config ioctl
This patch separates the hardware logic from the set function, so that
we can re-use it during a ptp_reset. This enables the reset to return
functionality to the last known timestamp mode, rather than resetting
the value. We initialize the mode to off during the ptp_init cycle.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Todd Fujinaka [Wed, 4 Jun 2014 07:12:15 +0000 (07:12 +0000)]
igb: unhide invariant returns
Return a 0 directly rather than a constant.
Reported-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lendacky, Thomas [Mon, 9 Jun 2014 14:19:32 +0000 (09:19 -0500)]
amd-xgbe: Rename MAX_DMA_CHANNELS to avoid powerpc conflict
MAX_DMA_CHANNELS is defined in asm/scatterlist.h of the powerpc
architecture. Rename this #define in xgbe.h to avoid the
redefined warning issued during compilation.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei-Chun Chao [Mon, 9 Jun 2014 06:48:54 +0000 (23:48 -0700)]
net: fix UDP tunnel GSO of frag_list GRO packets
This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.
During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.
Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.
kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
huizhang [Mon, 9 Jun 2014 04:37:25 +0000 (12:37 +0800)]
net: ipv6: Fixed up ipsec packet be re-routing issue
Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781
When a locally generated ipsec packet matches a mangle table rule
and has a mark value set, the packet is routed again in
route_me_harder -> _session_decoder6
In this case, the nhoff in the CB of the skb still has the default
value 0, so the protocol match fails and the packet can't match
the correct SA rule; the packet is then sent out in plaintext.
To fix the issue, CB->nhoff must be set.
Signed-off-by: Hui Zhang <huizhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 8 Jun 2014 22:49:34 +0000 (23:49 +0100)]
farsync: Fix confusion about DMA address and buffer offset types
Use dma_addr_t for DMA address parameters and u32 for shared memory
offset parameters.
Do not assume that dma_addr_t is the same as unsigned long; it will
not be in PAE configurations. Truncate DMA addresses to 32 bits when
printing them. This is OK because the DMA mask for this device is
32-bit (per default).
Also rename the DMA address parameters from 'skb' to 'dma'.
Compile-tested only.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Popov [Sat, 7 Jun 2014 23:03:08 +0000 (03:03 +0400)]
ip_tunnel: fix i_key matching in ip_tunnel_find
Some tunnels (though only vti for now) can use i_key just for internal use:
for example vti uses it for fwmark'ing incoming packets. So raw i_key value
shouldn't be treated as a distinguisher for them. ip_tunnel_key_match exists for
cases when we want to compare two ip_tunnel_parms' i_keys.
Example bug:
ip link add type vti ikey 1 local 1.0.0.1 remote 2.0.0.2
ip link add type vti ikey 2 local 1.0.0.1 remote 2.0.0.2
spawned two tunnels, although it doesn't make sense.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 07:32:53 +0000 (00:32 -0700)]
Merge branch 'mlx4'
Or Gerlitz says:
====================
mlx4 SRIOV fixes
The patch from Wei Yang is a fix to a regression introduced by an earlier commit
of his. Jack added a fix to the resource management which we got from IBM.
Let's get that into 3.16-rc1 first and later see which stable version(s) this should go to.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yang [Sun, 8 Jun 2014 10:49:46 +0000 (13:49 +0300)]
net/mlx4_core: Keep only one driver entry release mlx4_priv
Following commit befdf89 "net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()", there are two mlx4 pci callbacks which will
attempt to release the mlx4_priv object -- .shutdown and .remove.
This leads to a use-after-free access to the already freed mlx4_priv
instance and trigger a "Kernel access of bad area" crash when both
.shutdown and .remove are called.
During reboot or kexec, .shutdown is called, with the VFs probed to
the host going through shutdown first and then the PF. Later, the PF
will trigger VFs' .remove since VFs still have driver attached.
Fix that by keeping only one driver entry which releases mlx4_priv.
Fixes: befdf89 ('net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()')
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Sun, 8 Jun 2014 10:49:45 +0000 (13:49 +0300)]
net/mlx4_core: Fix SRIOV free-pool management when enforcing resource quotas
The Hypervisor driver tracks free slots and reserved slots at the global level
and tracks allocated slots and guaranteed slots per VF.
Guaranteed slots are treated as reserved by the driver, so the total
reserved slots is the sum of all guaranteed slots over all the VFs.
As VFs allocate resources, free (global) is decremented and allocated (per VF)
is incremented for those resources. However, reserved (global) is never changed.
This means that effectively, when a VF allocates a resource from its
guaranteed pool, it is actually reducing that resource's free pool (since
the global reserved count was not also reduced).
The fix for this problem is the following: For each resource, as long as a
VF's allocated count is <= its guaranteed number, when allocating for that
VF, the reserved count (global) should be reduced by the allocation as well.
When the global reserved count reaches zero, the remaining global free count
is still accessible as the free pool for that resource.
When the VF frees resources, the reverse happens: the global reserved count
for a resource is incremented only once the VF's allocated number falls below
its guaranteed number.
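The accounting rule described above can be sketched in userspace C. This is a simplified model, not the mlx4 resource tracker: struct res_pool, NUM_VFS, pool_grant() and pool_release() are hypothetical, per-VF quotas are left out, and the pool is assumed to start with reserved equal to the sum of all guarantees.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VFS 2   /* hypothetical number of VFs for the sketch */

struct res_pool {
    int free;                  /* global: unallocated slots, reserved ones included */
    int reserved;              /* global: guaranteed slots not yet allocated */
    int allocated[NUM_VFS];    /* per VF */
    int guaranteed[NUM_VFS];   /* per VF */
};

/* How many of 'count' slots fall inside the guarantee, given the level below them. */
static int within_guarantee(int level, int guaranteed, int count)
{
    int room = guaranteed - level;

    if (room <= 0)
        return 0;
    return count < room ? count : room;
}

/* Try to allocate 'count' slots for 'vf'; returns false if the free pool is too small. */
static bool pool_grant(struct res_pool *p, int vf, int count)
{
    int from_rsvd, from_free;

    assert(vf >= 0 && vf < NUM_VFS && count > 0);
    from_rsvd = within_guarantee(p->allocated[vf], p->guaranteed[vf], count);
    from_free = count - from_rsvd;

    /* Slots beyond the guarantee must come from the unreserved part. */
    if (from_free > p->free - p->reserved)
        return false;

    p->allocated[vf] += count;
    p->free -= count;
    p->reserved -= from_rsvd;   /* the step the fix adds */
    return true;
}

/* Return 'count' slots from 'vf'; the reverse of pool_grant(). */
static void pool_release(struct res_pool *p, int vf, int count)
{
    int after, to_rsvd;

    assert(vf >= 0 && vf < NUM_VFS);
    assert(count > 0 && count <= p->allocated[vf]);
    after = p->allocated[vf] - count;
    to_rsvd = within_guarantee(after, p->guaranteed[vf], count);

    p->allocated[vf] = after;
    p->free += count;
    p->reserved += to_rsvd;     /* only the part that drops below the guarantee */
}
```

In this model, free minus reserved is the pool that any VF may draw from once it has used up its own guarantee.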
This fix was developed by Rick Kready <kready@us.ibm.com>
Reported-by: Rick Kready <kready@us.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Popov [Sat, 7 Jun 2014 22:06:25 +0000 (02:06 +0400)]
ip_vti: Fix 'ip tunnel add' with 'key' parameters
ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2
translates to p->iflags = VTI_ISVTI|GRE_KEY and p->i_key = 1, but GRE_KEY !=
TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key)
making us unable to create vti tunnels with [io]key via ip tunnel.
We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because
vti_tunnels with same local/remote addresses but different ikeys will be treated
as different then. So, imo the best option here is to move p->i_flags & *_KEY
check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark
field for ip_tunnel_parm in the future.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rickard Strandqvist [Sat, 7 Jun 2014 11:26:37 +0000 (13:26 +0200)]
net: wimax: i2400m: control.c: Cleaning up conjunction always evaluates to false
Logical conjunction always evaluates to false: minor < 2 && minor > 1
I guess what you wanted is rather: minor > 2 || minor < 1
This was partly found using a static code analysis program called cppcheck.
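A small userspace sketch of the two conditions quoted above. The function names are hypothetical, and the second form is only the one suggested in the commit message, not necessarily the exact code of the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as quoted: no integer is both < 2 and > 1. */
static bool minor_rejected_old(int minor)
{
    return minor < 2 && minor > 1;
}

/* The suggested form: reject anything outside the range 1..2. */
static bool minor_rejected_suggested(int minor)
{
    return minor > 2 || minor < 1;
}

/* Exhaustive check over a small range that the old test never fires. */
static void check_old_never_fires(void)
{
    for (int minor = -8; minor <= 8; minor++)
        assert(!minor_rejected_old(minor));
}
```

For integer operands the old test is dead code, which is why a static analyzer can flag it without running anything.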
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rickard Strandqvist [Sat, 7 Jun 2014 10:22:08 +0000 (12:22 +0200)]
net: ethernet: toshiba: ps3_gelic_net.c: Cleaning up a check on a memory allocation
The result of a memory allocation is checked incorrectly.
This was partly found using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Sat, 7 Jun 2014 09:07:48 +0000 (11:07 +0200)]
amd-xgbe: fix unused variable compilation warning in phylib driver
Fix following compilation warning:
[...]
CC drivers/net/phy/amd-xgbe-phy.o
drivers/net/phy/amd-xgbe-phy.c:1353:30: warning:
‘amd_xgbe_phy_ids’ defined but not used [-Wunused-variable]
static struct mdio_device_id amd_xgbe_phy_ids[] = {
^
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Sat, 7 Jun 2014 00:48:20 +0000 (17:48 -0700)]
net: filter: fix nlattr and nlattr_nest BPF tests
- 'struct nlattr' must be 2 byte aligned
- provide big-endian input data for nlattr/nlattr_nest tests
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 6 Jun 2014 21:46:06 +0000 (14:46 -0700)]
net: filter: cleanup A/X name usage
The macro 'A' used in the internal BPF interpreter:
#define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.
This patch is trying to clean up the naming and clarify its usage in the
following way:
- A and X are names of two classic BPF registers
- BPF_REG_A denotes internal BPF register R0 used to map classic register A
in internal BPF programs generated from classic BPF
- BPF_REG_X denotes internal BPF register R7 used to map classic register X
in internal BPF programs generated from classic BPF
- internal BPF instruction format:
struct sock_filter_int {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
BPF_X - means use register X as source operand
BPF_K - means use 32-bit immediate as source operand
In internal:
BPF_X - means use 'src_reg' register as source operand
BPF_K - means use 32-bit immediate as source operand
Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
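The BPF_X/BPF_K distinction described in the commit above can be sketched in user space. This is a rough sketch, not the interpreter's code: the sketch_ names are hypothetical, the struct mirrors the layout quoted in the commit message, and the 0x08 source bit matches the classic BPF encoding. Sign-extending the immediate to 64 bits is an assumption of this sketch.

```c
#include <stdint.h>

/* Local copies of the 1-bit source-operand selector. */
#define SKETCH_BPF_K 0x00 /* source operand is the 32-bit immediate */
#define SKETCH_BPF_X 0x08 /* source operand is a register */
#define SKETCH_BPF_SRC(code) ((code) & 0x08)

/* Instruction layout as listed in the commit message. */
struct sketch_insn {
	uint8_t code;      /* opcode */
	uint8_t dst_reg:4; /* dest register */
	uint8_t src_reg:4; /* source register */
	int16_t off;       /* signed offset */
	int32_t imm;       /* signed immediate constant */
};

/* Pick the source operand the way the internal format describes it. */
static inline uint64_t sketch_src_operand(const struct sketch_insn *insn,
					  const uint64_t *regs)
{
	if (SKETCH_BPF_SRC(insn->code) == SKETCH_BPF_X)
		return regs[insn->src_reg];      /* internal: any register */
	return (uint64_t)(int64_t)insn->imm;     /* assumed sign extension */
}
```

In classic BPF the BPF_X case can only mean register X. In the internal format the same bit selects whichever register 'src_reg' names, which is why the sketch indexes a register array.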
Manuel Schölling [Sat, 7 Jun 2014 21:57:25 +0000 (23:57 +0200)]
dns_resolver: assure that dns_query() result is null-terminated
dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
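The off-by-one in the dns_resolver commit above concerns copying data that may lack a terminator. The sketch below is not the actual patch: it is a user-space illustration with a hypothetical helper name, and it uses malloc() instead of a kernel allocator. It shows the usual pattern of allocating one extra byte and terminating explicitly.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy len bytes that may not be NUL-terminated into a freshly
 * allocated, NUL-terminated string. Returns NULL on allocation failure.
 */
static inline char *sketch_copy_terminated(const char *data, size_t len)
{
	char *out = malloc(len + 1); /* one extra byte for the terminator */

	if (!out)
		return NULL;
	memcpy(out, data, len);
	out[len] = '\0';
	return out;
}
```

The caller owns the returned buffer and releases it with free().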
David S. Miller [Wed, 11 Jun 2014 06:51:00 +0000 (23:51 -0700)]
Merge branch 'bridge_multicast_exports'
Linus Lüssing says:
====================
bridge: multicast snooping patches / exports
The first patch is simply a cosmetic patch. So far I (and maybe others
too?) have been regularly confusing these two structs, therefore I'd
suggest renaming them and therefore making the follow-up patches easier
to understand and nicer to fit in.
The second patch fixes a minor issue, but it is probably not worth queuing for stable.
On the other hand the first two patches are also preparations for the
third and fourth patch:
These two patches are exporting functionality needed to marry the bridge
multicast snooping with the batman-adv multicast optimizations recently
added for the 3.15 kernel, allowing these optimizations to be used in common
setups having a bridge on top of e.g. bat0, too. So far these bridged
setups would fall back to simple flooding through the batman-adv mesh
network for any multicast packet entering bat0.
More information about the batman-adv multicast optimizations currently
implemented can be found here:
http://www.open-mesh.org/projects/batman-adv/wiki/Basic-multicast-optimizations
The integration on the batman-adv side could afterwards look like this,
for instance:
http://git.open-mesh.org/batman-adv.git/commitdiff/576b59dd3e34737c702e548b21fa72059262f796?hp=f95ce7131746c65fbcdffcf2089cab59e2c2f7ac
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sat, 7 Jun 2014 16:26:29 +0000 (18:26 +0200)]
bridge: memorize and export selected IGMP/MLD querier port
Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in IGMP/MLD queriers
to be able to reliably serve any multicast listener behind this same
bridge.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sat, 7 Jun 2014 16:26:28 +0000 (18:26 +0200)]
bridge: add export of multicast database adjacent to net_dev
With this new, exported function br_multicast_list_adjacent(net_dev) a
list of IPv4/6 addresses is returned. This list contains all multicast
addresses sensed by the bridge multicast snooping feature on all bridge
ports of the bridge interface of net_dev, excluding addresses from the
specified net_device itself.
Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in multicast
listeners to be able to reliably serve them with multicast packets.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sat, 7 Jun 2014 16:26:27 +0000 (18:26 +0200)]
bridge: adhere to querier election mechanism specified by RFCs
MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2
(RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the
querier with lowest source address shall become the selected
querier.
So far the bridge stopped its querier as soon as it heard another
querier regardless of its source address. This results in the "wrong"
querier potentially becoming the active querier or a potential,
unnecessary querying delay.
With this patch the bridge memorizes the source address of the currently
selected querier and ignores queries from queriers with a higher source
address than the currently selected one. This slight optimization is
supposed to make it more RFC compliant (but is rather uncritical and
therefore probably not necessary to be queued for stable kernels).
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
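The election rule in the commit above (lowest source address wins, queries from higher addresses are ignored) can be sketched for the IPv4 case. This is an illustration under stated assumptions, not the bridge code: the struct and function names are hypothetical, and addresses are compared in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the currently selected querier. */
struct sketch_querier {
	bool     valid; /* false until a querier has been selected */
	uint32_t addr;  /* IPv4 source address, host byte order */
};

/*
 * Process a query from saddr. Returns true if the sender is now the
 * selected querier, false if the query is ignored because a querier
 * with a lower address has already been selected.
 */
static inline bool sketch_querier_update(struct sketch_querier *q,
					 uint32_t saddr)
{
	if (q->valid && saddr > q->addr)
		return false; /* higher address than the current querier */
	q->valid = true;
	q->addr = saddr;
	return true;
}
```

A query from the currently selected address is accepted again, which would let a caller refresh its timers for that querier.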
Linus Lüssing [Sat, 7 Jun 2014 16:26:26 +0000 (18:26 +0200)]
bridge: rename struct bridge_mcast_query/querier
The current naming of these two structs is very random, in that
reversing their naming would not make any semantical difference.
This patch tries to make the naming less confusing by giving them a more
specific, distinguishable naming.
This is also useful for the upcoming patches reintroducing the
"struct bridge_mcast_querier" but for storing information about the
selected querier (no matter if our own or a foreign querier).
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Popov [Fri, 6 Jun 2014 19:19:21 +0000 (23:19 +0400)]
ipip, sit: fix ipv4_{update_pmtu,redirect} calls
ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a
tunnel netdevice). It caused wrong route lookup and failure of pmtu update or
redirect. We should use the same ifindex that we use in ip_route_output_* in
*tunnel_xmit code. It is t->parms.link .
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jun 2014 05:49:59 +0000 (22:49 -0700)]
Merge branch 'cxgb4'
Hariprasad Shenai says:
====================
Adds support for CIQ and other misc. fixes for rdma/cxgb4
This patch series adds support to allocate and use IQs specifically for
indirect interrupts, adds fixes to align ISS for iWARP connections & fixes
related to tcp snd/rcv window for Chelsio T4/T5 adapters on iw_cxgb4.
Also changes Interrupt Holdoff Packet Count threshold of response queues for
cxgb4 driver.
The patch series is created against the 'net-next' tree
and includes patches for the cxgb4 and iw_cxgb4 drivers.
Since this patch-series contains cxgb4 and iw_cxgb4 patches, we would like to
request this patch series to get merged via David Miller's 'net-next' tree.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 6 Jun 2014 16:10:45 +0000 (21:40 +0530)]
cxgb4: Change default Interrupt Holdoff Packet Count Threshold
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 6 Jun 2014 16:10:44 +0000 (21:40 +0530)]
iw_cxgb4: don't truncate the recv window size
Fixed a bug that shows up with recv window sizes that exceed the size of
the RCV_BUFSIZ field in opt0 (>= 1024K). If the recv window exceeds
this, then we specify the max possible in opt0, and add the rest in via
a RX_DATA_ACK credits.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
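The split described in the recv window commit above can be sketched as simple arithmetic. This is a sketch under assumptions, not the driver code: the names are hypothetical, and the field maximum of 1023 in units of 1K is inferred from the ">= 1024K" remark in the commit message.

```c
#include <stdint.h>

#define SKETCH_RCV_BUFSIZ_MAX 1023u /* assumed field maximum, in KB */

struct sketch_rcv_split {
	uint32_t opt0_kb;    /* value that fits in the opt0 field */
	uint32_t credits_kb; /* remainder to grant later as credits */
};

/* Clamp the window to the field maximum and report the leftover. */
static inline struct sketch_rcv_split sketch_split_rcv_win(uint32_t win_kb)
{
	struct sketch_rcv_split s;

	s.opt0_kb = win_kb > SKETCH_RCV_BUFSIZ_MAX ? SKETCH_RCV_BUFSIZ_MAX
						   : win_kb;
	s.credits_kb = win_kb - s.opt0_kb;
	return s;
}
```

For a 2048K window this yields 1023K in opt0 and 1025K of credits. A window that fits in the field needs no extra credits.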
Hariprasad Shenai [Fri, 6 Jun 2014 16:10:43 +0000 (21:40 +0530)]
iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections
Select the appropriate hw mtu index and initial sequence number to optimize
hw memory performance.
Add new cxgb4_best_aligned_mtu() which allows callers to provide enough
information to be used to [possibly] select an MTU which will result in the
TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value.
If an RTR message exchange is required, then align the ISS to 8B - 1 + 4, so
that after the SYN the send seqno will align on a 4B boundary. The RTR
message exchange will leave the send seqno aligned on an 8B boundary.
If an RTR is not required, then align the ISS to 8B - 1. The goal is
to have the send seqno be 8B aligned when we send the first FPDU.
Based on original work by Casey Leedom <leedom@chelsio.com> and
Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
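The ISS alignment rule in the commit above can be sketched as follows. This is a minimal sketch, not the driver code: the function name is hypothetical, and it assumes the SYN consumes one sequence number and that the raw value comes from some random source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Derive an initial send sequence number from a raw value.
 * Without RTR: ISS = 8B - 1, so the seqno after the SYN is 8B aligned.
 * With RTR:    ISS = 8B - 1 + 4, so the seqno after the SYN is 4B aligned.
 */
static inline uint32_t sketch_align_iss(uint32_t raw, bool rtr_required)
{
	uint32_t iss = (raw & ~7u) - 1; /* round down to 8B, step back one */

	if (rtr_required)
		iss += 4;
	return iss;
}
```

Unsigned wraparound is intentional here: a raw value of 0 gives 0xFFFFFFFF, and the sequence number after the SYN wraps to 0, which is still 8B aligned.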
Hariprasad Shenai [Fri, 6 Jun 2014 16:10:42 +0000 (21:40 +0530)]
iw_cxgb4: Allocate and use IQs specifically for indirect interrupts
Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
RXQs, which also handle direct interrupts for offload CPLs during RDMA
connection setup/teardown. The intended T4 usage model, however, is to
have indirect interrupts flow through dedicated IQs, i.e. not to mix
indirect interrupts with CPL messages in an IQ. This patch adds the
concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
CIQs.
Design:
cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs
are sized according to the maximum number of CQs available at adapter init.
In addition, these IQs don't need FL buffers since they only service
indirect interrupts. One CIQ is setup per RX channel similar to the
RDMA RXQs.
iw_cxgb4 will utilize these CIQs based on the vector value passed into
create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the
number of CIQs configured, and thus the vector value will be the index
into the array of CIQs.
Based on original work by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>