Alexey Dobriyan [Wed, 8 Oct 2008 09:35:11 +0000 (11:35 +0200)]
netfilter: netns nat: per-netns bysource hash
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:10 +0000 (11:35 +0200)]
netfilter: netns nat: per-netns NAT table
Same story as with iptable_filter, iptables_raw tables.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:10 +0000 (11:35 +0200)]
netfilter: netns nat: fix ipt_MASQUERADE in netns
First, allow entry in notifier hook.
Second, start conntrack cleanup in netns to which netdevice belongs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:10 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: PPTP conntracking in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:10 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: GRE conntracking in netns
* make keymap list per-netns
* per-netns keymal lock (not strictly necessary)
* flush keymap at netns stop and module unload.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:09 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: H323 conntracking in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:09 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: SIP conntracking in netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:09 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: final netns tweaks
Add init_net checks to not remove kmem_caches twice and so on.
Refactor functions to split code which should be executed only for
init_net into one place.
ip_ct_attach and ip_ct_destroy assignments remain separate, because
they're separate stages in setup and teardown.
NOTE: NOTRACK code is in for-every-net part. It will be made per-netns
after we decidce how to do it correctly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:09 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns conntrack accounting
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:08 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_log_invalid sysctl
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:08 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_checksum sysctl
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:08 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_count sysctl
Note, sysctl table is always duplicated, this is simpler and less
special-cased.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:08 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns /proc/net/stat/nf_conntrack, /proc/net/stat/ip_conntrack
Show correct conntrack count, while I'm at it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:07 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns statistics
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:07 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns event cache
Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.
Not so sure anymore.
[Patrick: pointers will be different, flushing can only be done while
inactive though and thus it needs to be per netns]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:07 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: pass conntrack to nf_conntrack_event_cache() not skb
This is cleaner, we already know conntrack to which event is relevant.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:07 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: cleanup after L3 and L4 proto unregister in every netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:06 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: unregister helper in every netns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:06 +0000 (11:35 +0200)]
netns: export netns list
Conntrack code will use it for
a) removing expectations and helpers when corresponding module is removed, and
b) removing conntracks when L3 protocol conntrack module is removed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:06 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns /proc/net/ip_conntrack, /proc/net/stat/ip_conntrack, /proc/net/ip_conntrack_expect
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:06 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns /proc/net/nf_conntrack_expect
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:05 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:05 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: pass netns pointer to L4 protocol's ->error hook
Again, it's deducible from skb, but we're going to use it for
nf_conntrack_checksum and statistics, so just pass it from upper layer.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:04 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: pass netns pointer to nf_conntrack_in()
It's deducible from skb->dev or skb->dst->dev, but we know netns at
the moment of call, so pass it down and use for finding and creating
conntracks.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:04 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns unconfirmed list
What is confirmed connection in one netns can very well be unconfirmed
in another one.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:03 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns expectations
Make per-netns a) expectation hash and b) expectations count.
Expectations always belongs to netns to which it's master conntrack belong.
This is natural and doesn't bloat expectation.
Proc files and leaf users are stubbed to init_net, this is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:03 +0000 (11:35 +0200)]
netfilter: netns: fix {ip,6}_route_me_harder() in netns
Take netns from skb->dst->dev. It should be safe because, they are called
from LOCAL_OUT hook where dst is valid (though, I'm not exactly sure about
IPVS and queueing packets to userspace).
[Patrick: its safe everywhere since they already expect skb->dst to be set]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:03 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns conntrack hash
* make per-netns conntrack hash
Other solution is to add ->ct_net pointer to tuplehashes and still has one
hash, I tried that it's ugly and requires more code deep down in protocol
modules et al.
* propagate netns pointer to where needed, e. g. to conntrack iterators.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:03 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: per-netns conntrack count
Sysctls and proc files are stubbed to init_net's one. This is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:02 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: add ->ct_net -- pointer from conntrack to netns
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.
->ct_net is write-once field.
Every conntrack in system has ->ct_net initialized, no exceptions.
->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.
Right now every conntrack is created in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:02 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: add netns boilerplate
One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:02 +0000 (11:35 +0200)]
netfilter: netns: ip6t_REJECT in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:02 +0000 (11:35 +0200)]
netfilter: netns: ip6table_mangle in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: netns: ip6table_raw in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: netns: remove nf_*_net() wrappers
Now that dev_net() exists, the usefullness of them is even less. Also they're
a big problem in resolving circular header dependencies necessary for
NOTRACK-in-netns patch. See below.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: implement NFPROTO_UNSPEC as a wildcard for extensions
When a match or target is looked up using xt_find_{match,target},
Xtables will also search the NFPROTO_UNSPEC module list. This allows
for protocol-independent extensions (like xt_time) to be reused from
other components (e.g. arptables, ebtables).
Extensions that take different codepaths depending on match->family
or target->family of course cannot use NFPROTO_UNSPEC within the
registration structure (e.g. xt_pkttype).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: x_tables: use NFPROTO_* in extensions
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: Introduce NFPROTO_* constants
The netfilter subsystem only supports a handful of protocols (much
less than PF_*) and even non-PF protocols like ARP and
pseudo-protocols like PF_BRIDGE. By creating NFPROTO_*, we can earn a
few memory savings on arrays that previously were always PF_MAX-sized
and keep the pseudo-protocols to ourselves.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: xt_recent: IPv6 support
This updates xt_recent to support the IPv6 address family.
The new /proc/net/xt_recent directory must be used for this.
The old proc interface can also be configured out.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: rename ipt_recent to xt_recent
Like with other modules (such as ipt_state), ipt_recent.h is changed
to forward definitions to (IOW include) xt_recent.h, and xt_recent.c
is changed to use the new constant names.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: Use unsigned types for hooknum and pf vars
and (try to) consistently use u_int8_t for the L3 family.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:50:06 +0000 (14:50 -0700)]
netns: make uplitev6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:49:36 +0000 (14:49 -0700)]
netns: make udpv6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:48:53 +0000 (14:48 -0700)]
netns: add stub functions for per/namespace mibs allocation
The content of init_ipv6_mibs/cleanup_ipv6_mibs will be moved to new
calls one by one next.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:47:55 +0000 (14:47 -0700)]
netns: allow per device ipv6 snmp statistics in non-initial namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:47:37 +0000 (14:47 -0700)]
netns: register global ipv6 mibs statistics in each namespace
Unused net variable will become used very soon.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:47:12 +0000 (14:47 -0700)]
ipv6: separate seq_ops for global & per/device ipv6 statistics
idev has been stored on seq->private. NULL has been stored for global
statistics.
The situation is changed with net namespace. We need to store pointer to
struct net and the only place is seq->private. So, we'll have for
/proc/net/dev_snmp6/* and for /proc/net/snmp6 pointers of two different
types stored in the same field.
This effectively requires to separate seq_ops of these files.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:46:47 +0000 (14:46 -0700)]
ipv6: consolidate ipv6 sock_stat code at the beginning of net/ipv6/proc.c
Simple, comsolidate sockstat6 staff in one place, at the beginning of
the file. Right now sockstat6_seq_open/sockstat6_seq_fops looks like an
intrusion in the middle of snmp6 code.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:46:18 +0000 (14:46 -0700)]
netns: register /proc/net/dev_snmp6/* in each ns
Do the same for /proc/net/snmp6.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 7 Oct 2008 21:45:55 +0000 (14:45 -0700)]
netns: move /proc/net/dev_snmp6 to struct net
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 7 Oct 2008 21:43:31 +0000 (14:43 -0700)]
tcp: cleanup messy initializer
I'm quite sure that if I give this function in its old format
for you to inspect, you start to wonder what is the type of
demanded or if it's a global variable.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 7 Oct 2008 21:43:06 +0000 (14:43 -0700)]
tcp: kill pointless urg_mode
It all started from me noticing that this urgent check in
tcp_clean_rtx_queue is unnecessarily inside the loop. Then
I took a longer look to it and found out that the users of
urg_mode can trivially do without, well almost, there was
one gotcha.
Bonus: those funny people who use urg with >= 2^31 write_seq -
snd_una could now rejoice too (that's the only purpose for the
between being there, otherwise a simple compare would have done
the thing). Not that I assume that the rest of the tcp code
happily lives with such mind-boggling numbers :-). Alas, it
turned out to be impossible to set wmem to such numbers anyway,
yes I really tried a big sendfile after setting some wmem but
nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
So I hacked a bit variable to long and found out that it seems
to work... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Tue, 7 Oct 2008 21:22:33 +0000 (14:22 -0700)]
net: packet split receive api
Add some packet-split receive hooks.
For one this allows to do NUMA node affine page allocs. Later on these
hooks will be extended to do emergency reserve allocations for
fragments.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Tue, 7 Oct 2008 21:18:42 +0000 (14:18 -0700)]
net: wrap sk->sk_backlog_rcv()
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Tue, 7 Oct 2008 21:15:00 +0000 (14:15 -0700)]
ipv6: initialize ip6_route sysctl vars in ip6_route_net_init()
This makes that ip6_route_net_init() does all of the route init code.
There used to be a race between ip6_route_net_init() and ip6_net_init()
and someone relying on the combined result was left out cold.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Tue, 7 Oct 2008 21:12:10 +0000 (14:12 -0700)]
ipv6: clean up ip6_route_net_init() error handling
ip6_route_net_init() error handling looked less than solid, fix 'er up.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Tue, 7 Oct 2008 19:41:01 +0000 (12:41 -0700)]
inet: Don't lookup the socket if there's a socket attached to the skb
Use the socket cached in the skb if it's present.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Tue, 7 Oct 2008 19:38:32 +0000 (12:38 -0700)]
inet: Add udplib_lookup_skb() helpers
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Tue, 7 Oct 2008 18:41:57 +0000 (11:41 -0700)]
inet_hashtables: Add inet_lookup_skb helpers
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2008 17:43:54 +0000 (10:43 -0700)]
tcp: Respect SO_RCVLOWAT in tcp_poll().
Based upon a report by Vito Caputo.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Mon, 6 Oct 2008 17:41:50 +0000 (10:41 -0700)]
pkt_sched: Simplify dev_requeue_skb and dequeue_skb
qdisc->requeue was planned to universally replace all requeuing code,
but at the top level we never requeue more than one skb, so qdisc->
gso_skb is enough for this. qdisc->requeue would be used on the lower
levels only for one level deep requeuing (like in sch_hfsc) after
finishing all the changes.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Mon, 6 Oct 2008 16:54:39 +0000 (09:54 -0700)]
pkt_sched: Fix handling of gso skbs on requeuing
Jay Cliburn noticed and diagnosed a bug triggered in
dev_gso_skb_destructor() after last change from qdisc->gso_skb
to qdisc->requeue list. Since gso_segmented skbs can't be queued
to another list this patch brings back qdisc->gso_skb for them.
Reported-by: Jay Cliburn <jcliburn@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaud Ebalard [Sun, 5 Oct 2008 20:33:42 +0000 (13:33 -0700)]
xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)
Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:16:36 +0000 (11:16 -0700)]
Phonet: pipe end-point protocol documentation
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:16:16 +0000 (11:16 -0700)]
Phonet: implement GPRS virtual interface over PEP socket
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:15:43 +0000 (11:15 -0700)]
Phonet: receive pipe control requests as out-of-band data
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:15:13 +0000 (11:15 -0700)]
Phonet: Pipe End Point for Phonet Pipes protocol
This protocol provides some connection handling and negotiated
congestion control. Nokia cellular modems use it for bulk transfers.
It provides packet boundaries (hence SOCK_SEQPACKET). Congestion
control is per packet rather per byte, so we do not re-use the
generic socket memory accounting.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:14:48 +0000 (11:14 -0700)]
Phonet: connected sockets glue
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Sun, 5 Oct 2008 18:14:27 +0000 (11:14 -0700)]
Phonet: modules auto-loading support
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 5 Oct 2008 16:20:28 +0000 (09:20 -0700)]
netdrv: Fix unregister_netdev typos
Found during the (partial) unregister_netdevice audit that we didn't
have to have :)
It looks like a couple of Sun NIC drivers had unregister_netdevice
when they really meant unregister_netdev.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Mon, 15 Sep 2008 20:29:49 +0000 (16:29 -0400)]
sctp: correctly save sctp_adaptation from parameter.
The INIT perameter carries the adapatation value in network-byte
order. We need to store it in host byte order as expected
by data types and the user API.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Vlad Yasevich [Mon, 8 Sep 2008 18:00:26 +0000 (14:00 -0400)]
sctp: enable cookie-echo retransmission transport switch
This patch enables cookie-echo retransmission transport switch
feature. If COOKIE-ECHO retransmission happens, it will be sent
to the address other than the one last sent to.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Wei Yongjun [Mon, 8 Sep 2008 04:13:55 +0000 (12:13 +0800)]
sctp: Fix the SNMP counter of SCTP_MIB_OUTOFBLUES
RFC3873 defined SCTP_MIB_OUTOFBLUES:
sctpOutOfBlues OBJECT-TYPE
SYNTAX Counter32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The number of out of the blue packets received by the host.
An out of the blue packet is an SCTP packet correctly formed,
including the proper checksum, but for which the receiver was
unable to identify an appropriate association."
REFERENCE
"Section 8.4 in RFC2960 deals with the Out-Of-The-Blue
(OOTB) packet definition and procedures."
But OOTB packet INIT, INIT-ACK and SHUTDOWN-ACK(COOKIE-WAIT or
COOKIE-ECHOED state) are not counted by SCTP_MIB_OUTOFBLUES.
Case 1(INIT):
Endpoint A Endpoint B
(CLOSED) (CLOSED)
INIT ---------->
<---------- ABORT
Case 2(INIT-ACK):
Endpoint A Endpoint B
(CLOSED) (CLOSED)
INIT-ACK ---------->
<---------- ABORT
Case 3(SHUTDOWN-ACK):
Endpoint A Endpoint B
(CLOSED) (CLOSED)
<---------- INIT
SHUTDOWN-ACK ---------->
<---------- SHUTDOWN-COMPLETE
Case 4(SHUTDOWN-ACK):
Endpoint A Endpoint B
(CLOSED) (COOKIE-ECHOED)
SHUTDOWN-ACK ---------->
<---------- SHUTDOWN-COMPLETE
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Wei Yongjun [Fri, 5 Sep 2008 00:55:26 +0000 (08:55 +0800)]
sctp: Fix to start T5-shutdown-guard timer while enter SHUTDOWN-SENT state
RFC 4960: Section 9.2
The sender of the SHUTDOWN MAY also start an overall guard timer
'T5-shutdown-guard' to bound the overall time for the shutdown
sequence. At the expiration of this timer, the sender SHOULD abort
the association by sending an ABORT chunk. If the 'T5-shutdown-
guard' timer is used, it SHOULD be set to the recommended value of 5
times 'RTO.Max'.
The timer 'T5-shutdown-guard' is used to counter the overall time
for shutdown sequence, and it's start by the sender of the SHUTDOWN.
So timer 'T5-shutdown-guard' should be start when we send the first
SHUTDOWN chunk and enter the SHUTDOWN-SENT state, not start when we
receipt of the SHUTDOWN primitive and enter SHUTDOWN-PENDING state.
If 'T5-shutdown-guard' timer is start at SHUTDOWN-PENDING state, the
association may be ABORT while data is still transmitting.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Vlad Yasevich [Mon, 18 Aug 2008 14:34:34 +0000 (10:34 -0400)]
sctp: try harder to figure out address family when checking wildcards
sctp_is_any() function that is used to check for wildcard addresses
only looks at the address itself to determine the address family.
This function is used in the API to check the address passed in from
the user. If the user simply zerroes out the sockaddr_storage and
pass that in, we'll end up failing. So, let's try harder to determine
the address family by also checking the socket if it's possible.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Neil Horman [Fri, 25 Jul 2008 16:44:09 +0000 (12:44 -0400)]
sctp: reduce memory footprint of sctp_chunk structure
sctp_chunks should be put on a diet. This is some of the low hanging
fruit that we can strip out. Changes all the __s8/__u8 flags to
bitfields. Saves 12 bytes per chunk.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Vlad Yasevich [Mon, 23 Jun 2008 19:26:20 +0000 (15:26 -0400)]
sctp: Retransmit list is ineligable for missing indications
Chunks placed on the retransmit list are marked as inelegible
for fast retrasnmission. Since missing indications determine
when fast reransmission is done, there is not point in calling
sctp_mark_missing() on the retransmit list since those chunks
will not be marked.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Vlad Yasevich [Thu, 19 Jun 2008 22:17:24 +0000 (18:17 -0400)]
sctp: Optimize SFR-CACC transport list walking during SACK processing
There is a possibility of walking the transport list twice during
SACK processing when doing SFR-CACC algorithm. We can restructure
the code to only do this once.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Vlad Yasevich [Thu, 19 Jun 2008 21:59:13 +0000 (17:59 -0400)]
sctp: Only mark chunks as missing when there are gaps
Frist small step in optimizing SACK processing. Do not call
sctp_mark_missing() when there are no gaps reported and thus
not missing chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
KOVACS Krisztian [Wed, 1 Oct 2008 14:48:10 +0000 (07:48 -0700)]
udp: Export UDP socket lookup function
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:46:49 +0000 (07:46 -0700)]
tcp: Port redirection support for TCP
Current TCP code relies on the local port of the listening socket
being the same as the destination address of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port address.
This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:44:42 +0000 (07:44 -0700)]
ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible
Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:41:00 +0000 (07:41 -0700)]
tcp: Handle TCP SYN+ACK/ACK/RST transparency
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.
This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
route lookup for those replies. Transparent replies are enabled if the
listening socket has the transparent socket flag set.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:35:39 +0000 (07:35 -0700)]
ipv4: Conditionally enable transparent flow flag when connecting
Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:33:10 +0000 (07:33 -0700)]
ipv4: Make inet_sock.h independent of route.h
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif()
usually require other route.h functionality anyway this patch moves
inet_iif() to route.h.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tóth László Attila [Wed, 1 Oct 2008 14:31:24 +0000 (07:31 -0700)]
ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is set
Setting IP_TRANSPARENT is not really useful without allowing non-local
binds for the socket. To make user-space code simpler we allow these
binds even if IP_TRANSPARENT is set but IP_FREEBIND is not.
Signed-off-by: Tóth László Attila <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
KOVACS Krisztian [Wed, 1 Oct 2008 14:30:02 +0000 (07:30 -0700)]
ipv4: Implement IP_TRANSPARENT socket option
This patch introduces the IP_TRANSPARENT socket option: enabling that
will make the IPv4 routing omit the non-local source address check on
output. Setting IP_TRANSPARENT requires NET_ADMIN capability.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Wed, 1 Oct 2008 14:28:28 +0000 (07:28 -0700)]
ipv4: Loosen source address check on IPv4 output
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.
This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 1 Oct 2008 14:09:38 +0000 (07:09 -0700)]
net: BUG instead of corrupting memory in pskb_expand_head
If the caller of pskb_expand_head specifies a negative nhead
we'll silently overwrite other people's memory. This patch
makes it BUG instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 1 Oct 2008 14:03:24 +0000 (07:03 -0700)]
ipsec: Put dumpers on the dump list
Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:
As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.
I've also changed the order of addition on new states to prevent
a never-ending dump.
Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Oct 2008 13:12:56 +0000 (06:12 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
Conflicts:
drivers/net/wireless/ath9k/core.c
drivers/net/wireless/ath9k/main.c
net/core/dev.c
Timo Teras [Wed, 1 Oct 2008 12:17:54 +0000 (05:17 -0700)]
af_key: Free dumping state on socket close
Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
dumping is on-going.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 1 Oct 2008 09:48:31 +0000 (02:48 -0700)]
ipv6: almost identical frag hashing funcs combined
$ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c
--- reassembly.c:ip6qhashfn()
+++ netfilter/nf_conntrack_reasm.c:ip6qhashfn()
@@ -1,5 +1,5 @@
-static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr,
- struct in6_addr *daddr)
+static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr,
+ const struct in6_addr *daddr)
{
u32 a, b, c;
@@ -9,7 +9,7 @@
a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
- c += ip6_frags.rnd;
+ c += nf_frags.rnd;
__jhash_mix(a, b, c);
a += (__force u32)saddr->s6_addr32[3];
And codiff xx.o.old xx.o.new:
net/ipv6/netfilter/nf_conntrack_reasm.c:
ip6qhashfn | -512
nf_hashfn | +6
nf_ct_frag6_gather | +36
3 functions changed, 42 bytes added, 512 bytes removed, diff: -470
net/ipv6/reassembly.c:
ip6qhashfn | -512
ip6_hashfn | +7
ipv6_frag_rcv | +89
3 functions changed, 96 bytes added, 512 bytes removed, diff: -416
net/ipv6/reassembly.c:
inet6_hash_frag | +510
1 function changed, 510 bytes added, diff: +510
Total: -376
Compile tested.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaud Ebalard [Wed, 1 Oct 2008 09:37:56 +0000 (02:37 -0700)]
XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep
ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
be initialized) when dst_alloc() is called from ip6_dst_blackhole().
Otherwise, it results in the following (xfrm_larval_drop is now set to
1 by default):
[ 78.697642] Unable to handle kernel paging request for data at address 0x0000004c
[ 78.703449] Faulting instruction address: 0xc0097f54
[ 78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
[ 78.792791] PowerMac
[ 78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
[ 78.804263] NIP:
c0097f54 LR:
c0334a28 CTR:
c002d430
[ 78.809997] REGS:
eef19ad0 TRAP: 0300 Not tainted (2.6.27-rc5)
[ 78.815743] MSR:
00001032 <ME,IR,DR> CR:
22242482 XER:
20000000
[ 78.821550] DAR:
0000004c, DSISR:
40000000
[ 78.827278] TASK =
eef0df40[3035] 'mip6d' THREAD:
eef18000
[ 78.827408] GPR00:
00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
[ 78.833249] GPR08:
eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
[ 78.839151] GPR16:
00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
[ 78.845046] GPR24:
eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
[ 78.856671] NIP [
c0097f54] kmem_cache_alloc+0x20/0x94
[ 78.862581] LR [
c0334a28] dst_alloc+0x40/0xc4
[ 78.868451] Call Trace:
[ 78.874252] [
eef19b80] [
c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
[ 78.880222] [
eef19ba0] [
c0334a28] dst_alloc+0x40/0xc4
[ 78.886164] [
eef19bb0] [
c03cd698] ip6_dst_blackhole+0x28/0x1cc
[ 78.892090] [
eef19be0] [
c03d9be8] rawv6_sendmsg+0x75c/0xc88
[ 78.897999] [
eef19cb0] [
c038bca4] inet_sendmsg+0x4c/0x78
[ 78.903907] [
eef19cd0] [
c03207c8] sock_sendmsg+0xac/0xe4
[ 78.909734] [
eef19db0] [
c03209e4] sys_sendmsg+0x1e4/0x2a0
[ 78.915540] [
eef19f00] [
c03220a8] sys_socketcall+0xfc/0x210
[ 78.921406] [
eef19f40] [
c0014b3c] ret_from_syscall+0x0/0x38
[ 78.927295] --- Exception: c01 at 0xfe2d730
[ 78.927297] LR = 0xfe2d71c
[ 78.939019] Instruction dump:
[ 78.944835]
91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
[ 78.950694]
90010024 7fc000a6 57c0045e 7c000124 <
83e3004c>
8383005c 2f9f0000 419e0050
[ 78.956464] ---[ end trace
05fa1ed7972487a1 ]---
As commented by Benjamin Thery, the bug was introduced by
f2fc6a54585a1be6669613a31fbaba2ecbadcd36, while adding network
namespaces support to ipv6 routes.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek [Wed, 1 Oct 2008 09:33:57 +0000 (02:33 -0700)]
mv643xx_eth: hook up skb recycling
This gives a nice increase in the maximum loss-free packet forwarding
rate in routing workloads.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek [Wed, 1 Oct 2008 09:33:12 +0000 (02:33 -0700)]
net: add skb_recycle_check() to enable netdriver skb recycling
This patch adds skb_recycle_check(), which can be used by a network
driver after transmitting an skb to check whether this skb can be
recycled as a receive buffer.
skb_recycle_check() checks that the skb is not shared or cloned, and
that it is linear and its head portion large enough (as determined by
the driver) to be recycled as a receive buffer. If these conditions
are met, it does any necessary reference count dropping and cleans
up the skbuff as if it just came from __alloc_skb().
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 1 Oct 2008 09:13:16 +0000 (02:13 -0700)]
ipv6: NULL pointer dereferrence in tcp_v6_send_ack
The following actions are possible:
tcp_v6_rcv
skb->dev = NULL;
tcp_v6_do_rcv
tcp_v6_hnd_req
tcp_check_req
req->rsk_ops->send_ack == tcp_v6_send_ack
So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace
from dst entry.
Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding
in IPv4 code.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Oct 2008 08:55:41 +0000 (01:55 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6
Vitaliy Gusev [Wed, 1 Oct 2008 08:51:39 +0000 (01:51 -0700)]
tcp: Fix NULL dereference in tcp_4_send_ack()
Fix NULL dereference in tcp_4_send_ack().
As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:
BUG: unable to handle kernel NULL pointer dereference at
00000000000004d0
IP: [<
ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
Stack:
ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
Call Trace:
<IRQ> [<
ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22
[<
ffffffff8049bce5>] tcp_check_req+0x108/0x14c
[<
ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c
[<
ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec
[<
ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272
[<
ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c
[<
ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701
[<
ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a
[<
ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c
[<
ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2
[<
ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e
[<
ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5
[<
ffffffff80462faa>] netif_receive_skb+0x293/0x303
[<
ffffffff80465a9b>] process_backlog+0x80/0xd0
[<
ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4
[<
ffffffff8046560e>] net_rx_action+0xb9/0x17f
[<
ffffffff80234cc5>] __do_softirq+0xa3/0x164
[<
ffffffff8020c52c>] call_softirq+0x1c/0x28
<EOI> [<
ffffffff8020de1c>] do_softirq+0x34/0x72
[<
ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50
[<
ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14
[<
ffffffff804599cd>] release_sock+0xb8/0xc1
[<
ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c
[<
ffffffff80243078>] ? autoremove_wake_function+0x0/0x38
[<
ffffffff8045751f>] sys_connect+0x68/0x8e
[<
ffffffff80291818>] ? fd_install+0x5f/0x68
[<
ffffffff80457784>] ? sock_map_fd+0x55/0x62
[<
ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80
Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
RIP [<
ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
RSP <
ffffffff80762b78>
CR2:
00000000000004d0
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remi Denis-Courmont [Wed, 1 Oct 2008 08:30:19 +0000 (01:30 -0700)]
phonet: Protect if_phonet.h against multiple inclusions.
From: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>