Timur Tabi [Fri, 24 Aug 2012 09:10:53 +0000 (09:10 +0000)]
netdev/phy: add MDIO bus multiplexer driven by a memory-mapped device
Add support for an MDIO bus multiplexer controlled by a simple memory-mapped
device, like an FPGA. The device must be memory-mapped and contain only
8-bit registers (which keeps things simple).
Tested on a Freescale P5020DS board which uses the "PIXIS" FPGA attached
to the localbus.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Ruggeri [Fri, 24 Aug 2012 07:38:35 +0000 (07:38 +0000)]
net: ipv4: ipmr_expire_timer causes crash when removing net namespace
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait for in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.
Tested in Linux 3.4.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Fri, 24 Aug 2012 20:38:11 +0000 (20:38 +0000)]
e1000e: DoS while TSO enabled caused by link partner with small MSS
With a low enough MSS on the link partner and TSO enabled locally, the
networking stack can periodically send a very large (e.g. 64KB) TCP
message for which the driver will attempt to use more Tx descriptors than
are available by default in the Tx ring. This is due to a workaround in
the code that imposes a limit of only 4 MSS-sized segments per descriptor
which appears to be a carry-over from the older e1000 driver and may be
applicable only to some older PCI or PCIx parts which are not supported in
e1000e. When the driver gets a message that is too large to fit across the
configured number of Tx descriptors, it stops the upper stack from queueing
any more and gets stuck in this state. After a timeout, the upper stack
assumes the adapter is hung and calls the driver to reset it.
Remove the unnecessary limitation of using up to only 4 MSS-sized segments
per Tx descriptor, and put in a hard failure test to catch when attempting
to check for message sizes larger than would fit in the whole Tx ring.
Refactor the remaining logic that limits the size of data per Tx descriptor
from a seemingly arbitrary 8KB to a limit based on the dynamic size of the
Tx packet buffer as described in the hardware specification.
Also, fix the logic in the check for space in the Tx ring for the next
largest possible packet after the current one has been successfully queued
for transmit, and use the appropriate defines for default ring sizes in
e1000_probe instead of magic values.
This issue goes back to the introduction of e1000e in 2.6.24 when it was
split off from e1000.
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Stable <stable@vger.kernel.org> [2.6.24+]
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Kandagatla [Fri, 24 Aug 2012 01:59:17 +0000 (01:59 +0000)]
of/mdio-gpio: Simplify the way device tree support is implemented.
This patch cleans up the way device tree support is added in the mdio-gpio
driver. I found a lot of code duplication which is not necessary.
Strangely, a new platform driver was also introduced for device tree
support. All this forced me to do this cleanup patch.
After this patch, the driver probe checks the of_node pointer to get the
data from device tree.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Kandagatla [Fri, 24 Aug 2012 01:58:59 +0000 (01:58 +0000)]
of/mdio: Add dummy functions in of_mdio.h.
This patch adds dummy functions in of_mdio.h, so that drivers need not
#ifdef their code on CONFIG_OF.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
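The dummy-function pattern this commit describes can be sketched as below. This is a minimal userspace illustration, not the contents of of_mdio.h: CONFIG_EXAMPLE_OF, struct example_bus, example_of_register_bus() and example_probe() are hypothetical names chosen for the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus structure. */
struct example_bus {
    int id;
};

#ifdef CONFIG_EXAMPLE_OF
/* Real implementation, built only when the option is enabled. */
int example_of_register_bus(struct example_bus *bus);
#else
/* Dummy version: callers compile without their own #ifdef. */
static inline int example_of_register_bus(struct example_bus *bus)
{
    (void)bus;
    return -ENOSYS;
}
#endif

/* A driver probe routine can call the helper unconditionally. */
static int example_probe(struct example_bus *bus)
{
    return example_of_register_bus(bus);
}
```

With the option disabled the call still compiles and reports an error the caller can handle, so the conditional lives in one header instead of in every driver.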
Eric Dumazet [Fri, 24 Aug 2012 01:47:26 +0000 (01:47 +0000)]
netpoll: provide an IP ident in UDP frames
Let's fill IP header ident field with a meaningful value,
it might help some setups.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xeb@mail.ru [Fri, 24 Aug 2012 01:07:38 +0000 (01:07 +0000)]
l2tp: avoid to use synchronize_rcu in tunnel free function
Avoid using synchronize_rcu in l2tp_tunnel_free because the context may be
atomic.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 23 Aug 2012 21:46:25 +0000 (21:46 +0000)]
gianfar: fix default tx vlan offload feature flag
Commit -
"b852b72 gianfar: fix bug caused by
87c288c6e9aa31720b72e2bc2d665e24e1653c3e"
disables by default (on mac init) the hw vlan tag insertion.
The "features" flags were not updated to reflect this, and
"ethtool -K" shows tx-vlan-offload to be "on" by default.
Cc: Sebastian Poehn <sebastian.poehn@belden.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 29 Aug 2012 15:24:09 +0000 (15:24 +0000)]
netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
We're hitting a bug while trying to reinsert an already existing
expectation:
kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
[<ffffffff812d423a>] ? in4_pton+0x72/0x131
[<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
[<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
[<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
[<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
[<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
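The retry rule in the nf_nat_sip commit above (drop the RTP expectation when the RTCP one hits EBUSY, then try the next port pair) can be sketched in userspace as follows. This is a toy model under stated assumptions: the port_busy table and the toy_* helpers are hypothetical stand-ins for the expectation table, not the netfilter API.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_PORT 65535

/* Toy stand-in for the expectation table: true means "already expected". */
static bool port_busy[MAX_PORT + 1];

static int toy_expect(unsigned int port)
{
    if (port_busy[port])
        return -EBUSY;
    port_busy[port] = true;
    return 0;
}

static void toy_unexpect(unsigned int port)
{
    port_busy[port] = false;
}

/* Returns the chosen RTP port, or 0 if no RTP/RTCP pair could be reserved. */
static unsigned int toy_reserve_rtp_rtcp(unsigned int first_port)
{
    unsigned int port;

    for (port = first_port; port != 0 && port + 1 <= MAX_PORT; port += 2) {
        if (toy_expect(port) != 0)
            continue;           /* RTP port taken, try the next pair */
        if (toy_expect(port + 1) == 0)
            return port;        /* both expectations registered */
        toy_unexpect(port);     /* RTCP failed: drop RTP before retrying */
    }
    return 0;
}
```

Without the toy_unexpect() call, the RTP entry from a failed attempt would stay registered while the loop moves on to other ports.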
Ian Campbell [Wed, 22 Aug 2012 00:26:47 +0000 (00:26 +0000)]
xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
I'm slightly concerned by the "only in exceptional circumstances"
comment on __pskb_pull_tail but the structure of an skb just created
by netfront shouldn't hit any of the especially slow cases.
This approach still does slightly more work than the old way, since if
we pull up the entire first frag we now have to shuffle everything
down where before we just received into the right place in the first
place.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 23 Aug 2012 15:36:55 +0000 (15:36 +0000)]
net: dev: fix the incorrect hold of net namespace's lo device
When moving a net device from one net namespace to another,
dev_change_net_namespace raises the NETDEV_DOWN event, so the
original net namespace's dst entries which belonged to this net
device are put on the dst_garbage list.
dev_change_net_namespace then sets this net device's net to the
new net namespace.
If we unregister this net device's driver, this triggers the
NETDEV_UNREGISTER_FINAL event; dst_ifdown is called, gets this
net device's dst entries from the dst_garbage list, and points
these entries' dev at the new net namespace's lo device.
This is not what we want: these dst entries need to hold the
original net namespace's lo device. This incorrect device hold
triggers an emerg message like the one below.
unregister_netdevice: waiting for lo to become free. Usage count = 1
So we should raise the NETDEV_UNREGISTER_FINAL event in
dev_change_net_namespace too. To make sure the dst entries are
already on the dst_garbage list, we need an rcu_barrier before
raising the NETDEV_UNREGISTER_FINAL event.
With help from Eric Dumazet.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Wed, 29 Aug 2012 06:49:17 +0000 (06:49 +0000)]
netfilter: nfnetlink_log: fix error return code in init path
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
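The class of bug that the semantic match above looks for can be illustrated with a small userspace sketch. The function names are hypothetical and the code is not taken from nfnetlink_log; it only shows why the return variable must be set on the error path once an earlier step has left it at 0.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocation step that can be forced to fail. */
static void *toy_alloc(int should_fail)
{
    return should_fail ? NULL : malloc(16);
}

static int toy_init(int fail_second_step)
{
    int ret;
    void *a, *b;

    a = toy_alloc(0);
    if (!a)
        return -ENOMEM;
    ret = 0;                /* an earlier step succeeded, so ret is 0 here */

    b = toy_alloc(fail_second_step);
    if (!b) {
        ret = -ENOMEM;      /* the fix: without this, 0 would be returned */
        goto err_free_a;
    }

    free(b);
    free(a);
    return 0;

err_free_a:
    free(a);
    return ret;
}
```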
Julia Lawall [Wed, 29 Aug 2012 06:49:16 +0000 (06:49 +0000)]
netfilter: ctnetlink: fix error return code in init path
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Julia Lawall [Wed, 29 Aug 2012 06:49:11 +0000 (06:49 +0000)]
ipvs: fix error return code
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sun, 26 Aug 2012 17:14:31 +0000 (19:14 +0200)]
netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Sun, 26 Aug 2012 17:14:29 +0000 (19:14 +0200)]
netfilter: nf_nat: support IPv6 in TFTP NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Sun, 26 Aug 2012 17:14:27 +0000 (19:14 +0200)]
netfilter: nf_nat: support IPv6 in IRC NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:25 +0000 (19:14 +0200)]
netfilter: nf_nat: support IPv6 in SIP NAT helper
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:22 +0000 (19:14 +0200)]
netfilter: nf_nat: support IPv6 in amanda NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:20 +0000 (19:14 +0200)]
netfilter: nf_nat: support IPv6 in FTP NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:18 +0000 (19:14 +0200)]
netfilter: ip6tables: add NETMAP target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:16 +0000 (19:14 +0200)]
netfilter: ip6tables: add REDIRECT target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:14 +0000 (19:14 +0200)]
netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:12 +0000 (19:14 +0200)]
netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:10 +0000 (19:14 +0200)]
net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
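As an illustration of what an incremental update over a 128-bit address involves, here is a userspace sketch based on the RFC 1624 formula HC' = ~(~HC + ~m + m'). It is not the kernel's inet_proto_csum_replace16; the toy_* names and the use of host-order 16-bit words are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t toy_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over n 16-bit words, used here only as a reference. */
static uint16_t toy_csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~toy_fold16(sum);
}

/* Update 'check' after eight 16-bit words change from 'from' to 'to'. */
static uint16_t toy_csum_replace16(uint16_t check, const uint16_t from[8],
                                   const uint16_t to[8])
{
    uint32_t sum = (uint16_t)~check;
    int i;

    for (i = 0; i < 8; i++) {
        sum += (uint16_t)~from[i];  /* remove the old word */
        sum += to[i];               /* add the new word */
    }
    return (uint16_t)~toy_fold16(sum);
}
```

The incremental result matches a full recomputation over the modified data, which is what lets NAT rewrite an address without touching the rest of the packet.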
Patrick McHardy [Sun, 26 Aug 2012 17:14:08 +0000 (19:14 +0200)]
netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
Expand the skb headroom if the oif changed due to rerouting similar to
how IPv4 packets are handled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:06 +0000 (19:14 +0200)]
netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:04 +0000 (19:14 +0200)]
netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:14:01 +0000 (19:14 +0200)]
netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:13:59 +0000 (19:13 +0200)]
netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.
This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Sun, 26 Aug 2012 17:13:58 +0000 (19:13 +0200)]
netfilter: nf_conntrack_ipv6: improve fragmentation handling
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragments are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.
Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.
This patch improves the situation in the following way:
- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.
- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.
- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.
The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.
This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (e.g. when
changing the textual representation of a variable number of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.
IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jesper Dangaard Brouer [Tue, 28 Aug 2012 20:05:51 +0000 (22:05 +0200)]
ipvs: IPv6 MTU checking cleanup and bugfix
Clean up the IPv6 MTU checking in the IPVS xmit code by using
a common helper function, __mtu_check_toobig_v6().
The MTU check for tunnel mode can also use this helper, as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is equal to
skb->len, and the 'mtu' variable has been adjusted before
calling the helper.
Note that this also fixes a bug, as the MTU check in ip_vs_dr_xmit_v6()
was missing a check for skb_is_gso().
This bug caused issues e.g. for KVM IPVS setups, where different
segmentation offloading techniques are used between guests
via the virtio driver. This resulted in very bad performance,
because the ICMPv6 "too big" messages did not affect the sender.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
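The missing skb_is_gso() check mentioned in the IPVS commit above can be sketched as a toy predicate. This models only what the message describes: struct toy_skb and toy_mtu_check_toobig() are hypothetical, and the real __mtu_check_toobig_v6() may consider more than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in carrying only the two fields the check needs. */
struct toy_skb {
    unsigned int len;   /* total packet length */
    bool is_gso;        /* packet will be segmented later by offload */
};

/* A GSO packet may exceed the MTU, since it is split before it hits the wire. */
static bool toy_mtu_check_toobig(const struct toy_skb *skb, unsigned int mtu)
{
    return skb->len > mtu && !skb->is_gso;
}
```

Without the GSO exemption, a large offloaded packet would be treated as too big even though each resulting segment fits the MTU.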
Amerigo Wang [Fri, 24 Aug 2012 21:41:11 +0000 (21:41 +0000)]
netpoll: revert 6bdb7fe3104 and fix be_poll() instead
Against -net.
In the patch "netpoll: re-enable irq in poll_napi()", I tried to
fix the following warning:
[100718.051041] ------------[ cut here ]------------
[100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0() (Not tainted)
[100718.051049] Hardware name: ProLiant BL460c G7
...
[100718.051068] Call Trace:
[100718.051073] [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[100718.051075] [<ffffffff8106b79a>] ? warn_slowpath_null+0x1a/0x20
[100718.051077] [<ffffffff810747ed>] ? local_bh_enable_ip+0x7d/0xb0
[100718.051080] [<ffffffff8150041b>] ? _spin_unlock_bh+0x1b/0x20
[100718.051085] [<ffffffffa00ee974>] ? be_process_mcc+0x74/0x230 [be2net]
[100718.051088] [<ffffffffa00ea68c>] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
[100718.051090] [<ffffffff8144fe76>] ? netpoll_poll_dev+0xd6/0x490
[100718.051095] [<ffffffffa01d24a5>] ? bond_poll_controller+0x75/0x80 [bonding]
[100718.051097] [<ffffffff8144fde5>] ? netpoll_poll_dev+0x45/0x490
[100718.051100] [<ffffffff81161b19>] ? ksize+0x19/0x80
[100718.051102] [<ffffffff81450437>] ? netpoll_send_skb_on_dev+0x157/0x240
by reenabling IRQ before calling ->poll, but it seems more
problems are introduced after that patch:
http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
http://marc.info/?l=linux-netdev&m=134563282530588&w=2
So it is safe to fix the be2net driver code directly.
This patch reverts the offending commit and fixes be_poll() by
not disabling BH there. This is okay because be_poll()
can be called either by poll_napi(), which already disables
IRQs, or by net_rx_action(), which already disables BH.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 26 Aug 2012 17:13:55 +0000 (19:13 +0200)]
ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.
This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
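The bookkeeping described in the path MTU discovery fix above can be sketched as follows. This is a simplified userspace model, not the conntrack code: struct toy_frag and both helpers are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_frag {
    unsigned int len;   /* size of the fragment as received */
    bool df;            /* IP_DF was set on this fragment */
};

/* Largest fragment size seen with DF set; 0 if no fragment had DF. */
static unsigned int toy_frag_max_size(const struct toy_frag *frags, size_t n)
{
    unsigned int max = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (frags[i].df && frags[i].len > max)
            max = frags[i].len;
    return max;
}

/*
 * True when refragmentation should be refused and an ICMP
 * "fragmentation required" error generated instead.
 */
static bool toy_needs_frag_required(unsigned int frag_max_size,
                                    unsigned int mtu)
{
    return frag_max_size > mtu;
}
```

Fragments without DF do not contribute to the recorded maximum, so they can still be refragmented to the outgoing MTU.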
David S. Miller [Fri, 24 Aug 2012 22:54:37 +0000 (18:54 -0400)]
Merge branch 'for-next' of git://git./linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 20:35:43 +0000 (16:35 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next
Ben Hutchings says:
====================
1. Change the TX path to stop queues earlier and avoid returning
NETDEV_TX_BUSY.
2. Remove some inefficiencies in soft-TSO.
3. Fix various bugs involving device state transitions and/or reset
scheduling by error handlers.
4. Take advantage of my previous change to operstate initialisation.
5. Miscellaneous cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 20:23:31 +0000 (16:23 -0400)]
Merge branch 'sfc-3.6' of git://git./linux/kernel/git/bwh/sfc
Ben Hutchings says:
====================
Simple fix for a braino. Please also queue this for the 3.4 and 3.5
stable series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 19:21:07 +0000 (15:21 -0400)]
Merge branch 'fixes-for-3.6' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:
====================
here are two fixes for the v3.6 release cycle. Alexey Khoroshilov submitted a
fix for a memory leak in the softing driver (in softing_load_fw()) in case a
krealloc() fails. Sven Schmitt fixed the misuse of the IRQF_SHARED flag in the
irq resource of the sja1000 platform driver, now the correct flag is used. There
are no mainline users of this feature which need to be converted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 19:18:03 +0000 (15:18 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
This is a batch of updates intended for 3.7. The bulk of it is
mac80211 changes, including some mesh work from Thomas Pedersen and
some multi-channel work from Johannes. A variety of driver updates
and other bits are scattered in there as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 19:15:04 +0000 (15:15 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless
John W. Linville says:
====================
This batch of fixes is intended for 3.6...
Johannes Berg gives us a pair of iwlwifi fixes. One corrects some
improperly defined ifdefs that lead to crashes and BUG_ONs. The other
prevents attempts to read SRAM for devices that aren't actually started.
Julia Lawall provides an ipw2100 fix to properly set the return code
from a function call before testing it! :-)
Thomas Huehn corrects the improper use of a constant related to a power
setting in ath5k.
Thomas Pedersen offers a mac80211 fix to properly handle destination
addresses of unicast frames passing through a mesh gate.
Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
interface state when the device goes down.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 24 Aug 2012 17:04:38 +0000 (18:04 +0100)]
sfc: Fix the initial device operstate
Following commit 8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 2 Aug 2012 00:39:38 +0000 (01:39 +0100)]
sfc: Assign efx and efx->type as early as possible in efx_pci_probe()
We also stop clearing *efx in efx_init_struct(). This is safe because
alloc_etherdev_mq() already clears it for us.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:57 +0000 (20:50 +0100)]
sfc: Remove bogus comment about MTU change and RX buffer overrun
RX DMA is limited by the length specified in each descriptor and not
by the MAC. Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:54 +0000 (20:50 +0100)]
sfc: Remove overly paranoid locking assertions from netdev operations
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:52 +0000 (20:50 +0100)]
sfc: Fix reset vs probe/remove/PM races involving efx_nic::state
We try to defer resets while the device is not READY, but we're not
doing this quite correctly. In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.
1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.
2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again. We need to check the state before scheduling it.
3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.
4. We must cancel the reset work item during device removal, if the
state could ever have been READY. This wasn't done in some of the
failure paths from efx_pci_probe(). Move the cancellation to
efx_pci_remove_main().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:48:36 +0000 (20:48 +0100)]
sfc: Improve log messages in case we abort probe due to a pending reset
The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:46:41 +0000 (20:46 +0100)]
sfc: Never try to stop and start a NIC that is disabled
efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled. Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.
Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().
Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.
Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:52 +0000 (19:35 +0100)]
sfc: Hold RTNL lock (only) when calling efx_stop_interrupts()
Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:47 +0000 (19:35 +0100)]
sfc: Keep disabled NICs quiescent during suspend/resume
Currently we ignore and clear the disabled state.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:39 +0000 (19:35 +0100)]
sfc: Hold the RTNL lock for more of the suspend/resume cycle
I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:31:16 +0000 (19:31 +0100)]
sfc: Change state names to be clearer, and comment them
STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.
Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.
The comments do not quite match current usage, but this will be
corrected in subsequent fixes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 22 Jun 2012 01:44:01 +0000 (02:44 +0100)]
sfc: Stash header offsets for TSO in struct tso_state
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Tue, 19 Jun 2012 19:03:41 +0000 (20:03 +0100)]
sfc: Replace tso_state::full_packet_space with ip_base_len
We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space. Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 17 May 2012 17:40:54 +0000 (18:40 +0100)]
sfc: Simplify TSO header buffer allocation
TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use. This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).
Replace the free list with a simple mapping by descriptor index. We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.
While we're at it, use a standard error code for allocation failure,
not -1.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
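The "simple mapping by descriptor index" in the TSO header buffer commit above might look roughly like this sketch. The ring size, buffer size and all toy_* names are assumptions for illustration, not the sfc driver's actual layout.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u    /* hypothetical ring size, a power of two */
#define TOY_TSOH_SIZE 128u  /* hypothetical size of one header buffer */

struct toy_tx_queue {
    /* One header buffer per two descriptors, allocated once up front. */
    uint8_t tsoh[TOY_RING_SIZE / 2][TOY_TSOH_SIZE];
};

/* Map a descriptor index straight to its header buffer; no free list. */
static uint8_t *toy_tsoh_for_desc(struct toy_tx_queue *q,
                                  unsigned int desc_index)
{
    unsigned int slot = (desc_index & (TOY_RING_SIZE - 1)) / 2;

    return q->tsoh[slot];
}
```

Because a payload descriptor always sits between two header descriptors, two headers in flight never map to the same slot, which is what makes half as many buffers sufficient.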
Ben Hutchings [Tue, 22 May 2012 00:27:58 +0000 (01:27 +0100)]
sfc: Stop TX queues before they fill up
We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
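The early-stop rule from the TX queue commit above can be sketched with a toy queue. The structure and field names are hypothetical; the point is only that the queue is stopped as soon as a worst-case packet might not fit, rather than when an actual packet fails to fit.

```c
#include <stdbool.h>

struct toy_txq {
    unsigned int ring_size;     /* total descriptors in the ring */
    unsigned int in_use;        /* descriptors currently queued */
    unsigned int max_per_skb;   /* upper bound on descriptors for one skb */
    bool stopped;
};

/*
 * Queue a packet using ndesc descriptors (ndesc <= max_per_skb), then
 * stop the queue if the next worst-case packet might not fit.
 */
static void toy_enqueue(struct toy_txq *q, unsigned int ndesc)
{
    q->in_use += ndesc;
    if (q->ring_size - q->in_use < q->max_per_skb)
        q->stopped = true;
}
```

Since the queue is stopped before it can overflow, the transmit path never has to reject a packet it has already been handed.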
Ben Hutchings [Thu, 17 May 2012 19:52:20 +0000 (20:52 +0100)]
sfc: Refactor struct efx_tx_buffer to use a flags field
Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.
Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.
Clear all flags in free buffers (whereas previously the continuation
flag would be set).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
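A rough sketch of the layout this efx_tx_buffer commit describes follows. The flag names, values and struct members are hypothetical stand-ins rather than the driver's definitions; it assumes a C11 compiler for the anonymous union.

```c
#include <stddef.h>

#define TOY_TX_BUF_CONT       0x1   /* more descriptors follow for this skb */
#define TOY_TX_BUF_SKB        0x2   /* the 'skb' member is valid */
#define TOY_TX_BUF_TSOH       0x4   /* the 'tsoh' member is valid */
#define TOY_TX_BUF_MAP_SINGLE 0x8   /* buffer was mapped as a single region */

struct toy_skb;     /* opaque stand-ins for the real types */
struct toy_tsoh;

struct toy_tx_buffer {
    union {
        const struct toy_skb *skb;  /* valid if TOY_TX_BUF_SKB is set */
        struct toy_tsoh *tsoh;      /* valid if TOY_TX_BUF_TSOH is set */
    };
    unsigned short flags;
};

/* A free buffer has no valid pointer and no flags set. */
static void toy_buffer_clear(struct toy_tx_buffer *buf)
{
    buf->skb = NULL;
    buf->flags = 0;
}

static int toy_buffer_has_skb(const struct toy_tx_buffer *buf)
{
    return (buf->flags & TOY_TX_BUF_SKB) != 0;
}
```

The validity flags are what make the union safe to use: the code checks the flag first and only then reads the matching member.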
Yuchung Cheng [Thu, 23 Aug 2012 07:05:17 +0000 (07:05 +0000)]
tcp: fix cwnd reduction for non-sack recovery
The cwnd reduction in fast recovery is based on the number of packets
newly delivered per ACK. For non-sack connections every DUPACK
signifies a packet has been delivered, but the sender mistakenly
skips counting them for cwnd reduction.
The fix is to compute newly_acked_sacked after DUPACKs are accounted
in sacked_out for non-sack connections.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 23 Aug 2012 03:26:53 +0000 (03:26 +0000)]
team: do not allow to add VLAN challenged port when vlan is used
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 23 Aug 2012 03:26:52 +0000 (03:26 +0000)]
vlan: add helper which can be called to see if device is used by vlan
Also, remove the unused vlan_info definition from the header.
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 23 Aug 2012 03:26:51 +0000 (03:26 +0000)]
team: don't print warn message on -ESRCH during event send
When no one is listening on the NL socket, -ESRCH is returned and a warning
message is printed. This message confuses people and in fact carries no
meaning, so do not print it in this case.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Thu, 23 Aug 2012 02:09:11 +0000 (02:09 +0000)]
netlink: fix possible spoofing from non-root processes
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows illegitimate non-root
processes to send forged messages to Netlink subscribers.
The userspace process usually verifies the legitimate origin in
two ways:
a) Socket credentials. If UID != 0, then the message comes from
some illegitimate process and the message needs to be dropped.
b) Netlink portID. In general, portID == 0 means that the message
originates from the kernel. Thus, any message not coming from the
kernel is discarded.
However, ctnetlink sets the portID in event messages that have
been triggered by some user-space process, e.g. the conntrack utility.
So other processes subscribed to ctnetlink events, e.g. conntrackd,
know that the event was triggered by some user-space action.
Neither of the two ways to discard illegitimate messages coming
from non-root processes can help for ctnetlink.
This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.
Still, anyone who wants their Netlink bus to allow netlink-to-netlink
userspace communication can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow NETLINK_ROUTE to be used for
communication between two processes exchanging information that is
not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
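Note on the netlink spoofing fix above: a userspace sketch of the send-time check the message describes, under the assumption that a sender needs either a privilege or a per-bus NL_NONROOT_SEND grant whenever a destination pid or group is set. The struct, the helpers and the has_net_admin parameter are stand-ins for illustration, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NL_NONROOT_SEND 0x2u /* stand-in for the kernel's per-bus flag */

struct nl_bus_sketch {
	unsigned int nonroot; /* hypothetical: rights the bus grants to non-root */
};

static bool sender_allowed(const struct nl_bus_sketch *bus, bool has_net_admin)
{
	return (bus->nonroot & NL_NONROOT_SEND) || has_net_admin;
}

/* Returns 0 if the message may be sent, -EPERM otherwise. */
static int check_send(const struct nl_bus_sketch *bus, bool has_net_admin,
		      uint32_t dst_pid, uint32_t dst_group)
{
	/* dst_pid == 0 and dst_group == 0 addresses the kernel: always allowed. */
	if ((dst_group || dst_pid) && !sender_allowed(bus, has_net_admin))
		return -EPERM;
	return 0;
}
```

With this shape, a bus such as NETLINK_USERSOCK keeps working for unprivileged peers because it carries the grant, while other buses reject unprivileged process-to-process messages.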
Wei Yongjun [Wed, 22 Aug 2012 21:28:40 +0000 (21:28 +0000)]
w5300: using eth_hw_addr_random() for random MAC and set device flag
Use eth_hw_addr_random() to generate a random Ethernet address
(MAC) for the net device and to set addr_assign_type.
There is no need to duplicate its implementation.
spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 22 Aug 2012 21:28:19 +0000 (21:28 +0000)]
w5100: using eth_hw_addr_random() for random MAC and set device flag
Use eth_hw_addr_random() to generate a random Ethernet address
(MAC) for the net device and to set addr_assign_type.
There is no need to duplicate its implementation.
spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 22 Aug 2012 20:49:33 +0000 (20:49 +0000)]
wimax/i2400m: use is_zero_ether_addr() instead of memcmp()
Use is_zero_ether_addr() instead of calling memcmp() directly
to determine whether the Ethernet address is all
zeros.
spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
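Note on the is_zero_ether_addr() conversion above: a small userspace sketch contrasting the two styles. mac_is_zero() is a hypothetical stand-in that mimics what a dedicated helper does; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet address in bytes */

/* Open-coded style: compare against a zero-filled scratch array. */
static bool mac_is_zero_memcmp(const uint8_t *addr)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(addr, zero, ETH_ALEN) == 0;
}

/* Helper style: the intent is in the name and no scratch array is needed. */
static bool mac_is_zero(const uint8_t *addr)
{
	return (addr[0] | addr[1] | addr[2] |
		addr[3] | addr[4] | addr[5]) == 0;
}
```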
Rayagond Kokatanur [Wed, 22 Aug 2012 21:28:18 +0000 (21:28 +0000)]
stmmac: add header inclusion protection
This patch adds "#ifndef __<header>_H" for protecting header from double
inclusion.
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 20 Aug 2012 21:16:51 +0000 (22:16 +0100)]
net: Set device operstate at registration time
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver. The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.
For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened. However, we must not activate linkwatch for an
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't. But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.
The same issue exists with the dormant state.
The proper initialisation sequence, avoiding a race with opening of
the device, is:
	rtnl_lock();
	rc = register_netdevice(dev);
	if (rc)
		goto out_unlock;
	netif_carrier_off(dev);	/* or netif_dormant_on(dev) */
	rtnl_unlock();
but it seems silly that this should have to be repeated in so many
drivers. Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.
Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.
This initialises the operstate synchronously at registration time
only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 20 Aug 2012 09:26:39 +0000 (09:26 +0000)]
net/fsl: introduce Freescale 10G MDIO driver
Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on
Freescale Frame Manager Ethernet controllers.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 20 Aug 2012 07:59:10 +0000 (07:59 +0000)]
cls_cgroup: Allow classifier cgroups to have their classid reset to 0
The network classifier cgroup initializes each cgroup instance's classid value to
0. However, the sock_update_classid function only updates classids in sockets
if the task's cgroup classid is not zero, and if it differs from the current
classid. The latter check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0. While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily, for instance).
Easy fix: just remove the zero check. Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
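Note on the cls_cgroup fix above: a hedged sketch of the condition before and after the change, using a stub socket type. The names sock_stub, update_classid_old and update_classid_new are hypothetical; the writes counter exists only to show that the remaining check still avoids redundant stores.

```c
#include <assert.h>
#include <stdint.h>

struct sock_stub {
	uint32_t classid;
	unsigned int writes; /* counts stores, standing in for cache line dirtying */
};

/* Before: a task classid of 0 is ignored, so a socket can never be reset. */
static void update_classid_old(struct sock_stub *sk, uint32_t task_classid)
{
	if (task_classid && task_classid != sk->classid) {
		sk->classid = task_classid;
		sk->writes++;
	}
}

/* After: only the "differs from current" check remains. */
static void update_classid_new(struct sock_stub *sk, uint32_t task_classid)
{
	if (task_classid != sk->classid) {
		sk->classid = task_classid;
		sk->writes++;
	}
}
```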
John W. Linville [Fri, 24 Aug 2012 16:25:30 +0000 (12:25 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem
Eric Dumazet [Fri, 24 Aug 2012 05:40:47 +0000 (05:40 +0000)]
ipv4: take rt_uncached_lock only if needed
Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.
This slows down multicast workloads on SMP because rt_uncached_lock is
contended.
Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
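Note on the rt_uncached_lock change above: a userspace sketch of "take the lock only if the entry was actually inserted". The list helpers are simplified stand-ins for the kernel's list API, and lock_taken is a counter standing in for acquiring a spinlock; all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* A node that points at itself is not on any list. */
static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_after(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static unsigned int lock_taken; /* counts simulated lock acquisitions */

/* Skip the contended lock entirely for entries that were never listed. */
static void uncached_del(struct list_node *n)
{
	if (!list_is_empty(n)) {
		lock_taken++; /* a real implementation would lock here */
		list_del_init(n);
		/* ... and unlock here */
	}
}
```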
David S. Miller [Fri, 24 Aug 2012 15:30:38 +0000 (11:30 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Fri, 24 Aug 2012 15:16:58 +0000 (11:16 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Sven Schmitt [Thu, 9 Aug 2012 12:46:34 +0000 (14:46 +0200)]
can: sja1000_platform: fix wrong flag IRQF_SHARED for interrupt sharing
The sja1000 platform driver wrongly assumes that a shared IRQ is indicated
with the IRQF_SHARED flag in irq resource flags. This patch changes the
driver to handle the correct flag IORESOURCE_IRQ_SHAREABLE instead.
There are no mainline users of the platform driver which wrongly make use
of IRQF_SHARED.
Signed-off-by: Sven Schmitt <sven.schmitt@volkswagen.de>
Acked-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexey Khoroshilov [Wed, 8 Aug 2012 15:15:01 +0000 (19:15 +0400)]
can: softing: Fix potential memory leak in softing_load_fw()
Do not leak memory by updating pointer with potentially NULL realloc return value.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
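Note on the softing_load_fw() fix above: the same leak pattern exists with the C library's realloc(), so a userspace sketch can illustrate it. buf_grow() is a hypothetical helper, not code from the driver. Assigning the result of realloc() straight back to the only pointer loses the old block when the call returns NULL; a temporary avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Grow *buf to new_cap bytes. On failure *buf and *cap are left untouched,
 * so the caller still owns, and can free, the old block.
 */
static int buf_grow(unsigned char **buf, size_t *cap, size_t new_cap)
{
	unsigned char *tmp;

	if (new_cap == 0)
		return -EINVAL; /* realloc(ptr, 0) behaviour varies; reject it */

	tmp = realloc(*buf, new_cap);
	if (!tmp)
		return -ENOMEM; /* old pointer is still valid, nothing leaked */

	*buf = tmp;
	*cap = new_cap;
	return 0;
}
```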
Ben Hutchings [Wed, 15 Aug 2012 17:09:15 +0000 (18:09 +0100)]
sfc: Fix reporting of IPv4 full filters through ethtool
ETHTOOL_GRXCLSRULE returns filters for a TCP/IPv4 or UDP/IPv4 4-tuple
with source and destination swapped.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Rami Rosen [Thu, 23 Aug 2012 02:55:41 +0000 (02:55 +0000)]
packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
CC net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...
A recent patch changed the netns_packet lock from a spinlock to a mutex,
but the included header must also change from linux/spinlock.h to linux/mutex.h.
See commit
0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2)
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fengguang Wu [Thu, 23 Aug 2012 11:51:21 +0000 (19:51 +0800)]
af_packet: match_fanout_group() can be static
cc: Eric Leblond <eric@regit.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Aug 2012 21:50:59 +0000 (21:50 +0000)]
net: reinstate rtnl in call_netdevice_notifiers()
Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.
This patch is a direct transcription of his feedback
against commit
0115e8e30d6fc (net: remove delay at device dismantle)
Thanks Eric !
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Thu, 23 Aug 2012 13:51:15 +0000 (09:51 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211
John W. Linville [Thu, 23 Aug 2012 13:49:42 +0000 (09:49 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211-next
Sven Eckelmann [Sun, 19 Aug 2012 19:48:25 +0000 (21:48 +0200)]
batman-adv: Start new development cycle
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Thu, 5 Jul 2012 21:38:30 +0000 (23:38 +0200)]
batman-adv: change interface_rx to get orig node
In order to understand where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument is non-NULL for broadcast packets only (other packets do not
have a source field).
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Thu, 5 Jul 2012 21:38:29 +0000 (23:38 +0200)]
batman-adv: detect not yet announced clients
With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event has been spread over the mesh network.
This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to a higher one (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.
This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 16:33:51 +0000 (18:33 +0200)]
batman-adv: Reduce accumulated length of simple statements
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 15:13:15 +0000 (17:13 +0200)]
batman-adv: Don't break statements after assignment operator
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 14:32:09 +0000 (16:32 +0200)]
batman-adv: Use BIT(x) macro to calculate bit positions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
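Note on the BIT(x) conversion above: a short sketch of the substitution, with the macro defined locally so it builds in userspace. The EXAMPLE_FLAG_* names are hypothetical and do not correspond to batman-adv flags.

```c
#include <assert.h>

/* Local definition mirroring the usual kernel macro. */
#define BIT(nr) (1UL << (nr))

/* Instead of open-coded shifts such as (1 << 2), name the bit position. */
#define EXAMPLE_FLAG_A BIT(0)
#define EXAMPLE_FLAG_B BIT(1)
#define EXAMPLE_FLAG_C BIT(2)
```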
Martin Hundebøll [Thu, 5 Jul 2012 09:34:28 +0000 (11:34 +0200)]
batman-adv: Drop tt queries with foreign dest
When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.
This patch adds a check to drop foreign tt queries.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Martin Hundebøll [Thu, 5 Jul 2012 09:34:27 +0000 (11:34 +0200)]
batman-adv: Move batadv_check_unicast_packet()
batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 15 Jul 2012 20:26:51 +0000 (22:26 +0200)]
batman-adv: Split batadv_priv in sub-structures for features
The structure batadv_priv grows every time a new feature is introduced. It gets
hard to find the parts of the struct that belong to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.
The variables for bridge loop avoidance, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sun, 1 Jul 2012 20:51:55 +0000 (22:51 +0200)]
batman-adv: check batadv_orig_hash_add_if() return code
If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interfaces, and some may not.
If we just continued with the larger number of interfaces,
this would lead to accesses to unallocated memory later.
We had better check the return code, and not add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Sun, 1 Jul 2012 17:07:31 +0000 (19:07 +0200)]
batman-adv: fix typos in comments
The word millisecond is misspelled in several comments. This patch fixes it.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Sun, 1 Jul 2012 12:09:12 +0000 (14:09 +0200)]
batman-adv: add reference counting for type batadv_tt_orig_list_entry
The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure usable in much more
complex contexts.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Jonathan Corbet [Sat, 30 Jun 2012 16:49:13 +0000 (10:49 -0600)]
batman-adv: remove a misleading comment
As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect. So just delete it.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Sat, 23 Jun 2012 09:47:53 +0000 (11:47 +0200)]
batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sat, 23 Jun 2012 10:34:18 +0000 (12:34 +0200)]
batman-adv: rename bridge loop avoidance claim types
For consistency within the code and with the documentation,
we should always call it "claim" and "unclaim".
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sat, 23 Jun 2012 10:34:17 +0000 (12:34 +0200)]
batman-adv: correct comments in bridge loop avoidance
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Mon, 18 Jun 2012 16:39:26 +0000 (18:39 +0200)]
batman-adv: Add the backbone gateway list to debugfs
This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Tue, 21 Aug 2012 22:42:40 +0000 (00:42 +0200)]
batman-adv: move function arguments on one line
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Pavel Emelyanov [Tue, 21 Aug 2012 01:06:47 +0000 (01:06 +0000)]
packet: Protect packet sk list with mutex (v2)
Change since v1:
* Fixed inuse counters access spotted by Eric
In patch
eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in the packet diag module: the
socket list is traversed under rcu_read_lock(), while the sk mclist access
performed under it requires the rtnl lock (i.e. a mutex) to be taken.
[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
A similar thing was then re-introduced by further packet diag patches (fanout
mutex and pgvec mutex for rings) :(
Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:
* sklist
* prot inuse counters
The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allan, Bruce W [Mon, 20 Aug 2012 04:55:29 +0000 (04:55 +0000)]
mdio: translation of MMD EEE registers to/from ethtool settings
The helper functions which translate IEEE MDIO Manageable Device (MMD)
Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
the comparable ethtool supported/advertised settings will be needed by
drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).
In the same fashion as similar translation functions in linux/mii.h, move
these functions from the PHYLIB core to the linux/mdio.h header file so the
code will not have to be duplicated in each driver needing MMD-to-ethtool
(and vice-versa) translations. The function and some variable names have
been renamed to be more descriptive.
Not tested on the only hardware that currently calls the related functions,
stmmac, because I don't have access to any. Has been compile tested and
the translations have been tested on a locally modified version of e1000e.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>