platform/upstream/kernel-adaptation-pc.git
11 years agoipv4: provide addr and netconf dump consistency info
Nicolas Dichtel [Fri, 22 Mar 2013 06:28:42 +0000 (06:28 +0000)]
ipv4: provide addr and netconf dump consistency info

This patch takes benefit of dev_addr_genid and dev_base_seq to check if a change
occurs during a netlink dump. If a change is detected, the flag NLM_F_DUMP_INTR
is set in the first message after the dump was interrupted.

Note that seq and prev_seq must be reset between each family in rtnl_dump_all()
because they are specific to each family.

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomv643xx_eth: defer probing if Marvell Orion MDIO driver not loaded
Simon Baatz [Sun, 24 Mar 2013 10:34:00 +0000 (10:34 +0000)]
mv643xx_eth: defer probing if Marvell Orion MDIO driver not loaded

When both the Marvell MV643XX ethernet driver and the Orion MDIO driver
are compiled as modules, the ethernet driver may be probed before the
MDIO driver.  Let mv643xx_eth_probe() return EPROBE_DEFER in this case,
i.e. when it cannot find the PHY.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: define module alias for platform device
Simon Baatz [Sun, 24 Mar 2013 10:33:59 +0000 (10:33 +0000)]
net: mvmdio: define module alias for platform device

The mvmdio driver can be instantiated using device tree or as a classic
platform device.  In order to load the driver automatically by udev in
the latter case, the driver needs to define a module alias for the
platform device.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: calling the ref() instead of deref()
Dan Carpenter [Fri, 22 Mar 2013 18:33:15 +0000 (21:33 +0300)]
l2tp: calling the ref() instead of deref()

This is a cut and paste typo.  We call ->ref() a second time instead
of ->deref().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoioat/dca: Update DCA BIOS workarounds to use TAINT_FIRMWARE_WORKAROUND
Alexander Duyck [Fri, 22 Mar 2013 06:01:59 +0000 (06:01 +0000)]
ioat/dca: Update DCA BIOS workarounds to use TAINT_FIRMWARE_WORKAROUND

This patch is meant to be a follow-up for a patch originally submitted under
the title "ioat: Do not enable DCA if tag map is invalid".  It was brought to
my attention that the preferred approach for BIOS workarounds is to set the
taint flag for TAINT_FIRMWARE_WORKAROUND for systems that require BIOS
workarounds.

This change makes it so that the DCA workarounds for broken BIOSes will now
use WARN_TAINT_ONCE(1, TAINT_FIRMWARE_WORKAROUND, ...) instead of just
printing a message via dev_err.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Fri, 22 Mar 2013 16:53:09 +0000 (12:53 -0400)]
Merge git://git./linux/kernel/git/davem/net

Pull to get the thermal netlink multicast group name fix, otherwise
the assertion added in net-next to netlink to detect that kind of bug
makes systems unbootable for some folks.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodecnet: Move rtm_dn_policy to dn_route to make it available if !CONFIG_DECNET_ROUTER
Thomas Graf [Fri, 22 Mar 2013 16:50:29 +0000 (16:50 +0000)]
decnet: Move rtm_dn_policy to dn_route to make it available if !CONFIG_DECNET_ROUTER

Otherwise build fails with CONFIG_DECNET && !CONFIG_DECNET_ROUTER

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Bump up the version to 5.1.38
Shahed Shaikh [Fri, 22 Mar 2013 05:57:57 +0000 (05:57 +0000)]
qlcnic: Bump up the version to 5.1.38

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Clear link status when interface is down
Shahed Shaikh [Fri, 22 Mar 2013 05:57:56 +0000 (05:57 +0000)]
qlcnic: Clear link status when interface is down

o When interface is down, mailbox command to get context statistics
  fails. So restrict driver from issuing get statistics command when
  interface is down.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: change mdelay to msleep
Shahed Shaikh [Fri, 22 Mar 2013 05:57:55 +0000 (05:57 +0000)]
qlcnic: change mdelay to msleep

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Log warning message for 83xx adapter in MSI mode.
Himanshu Madhani [Fri, 22 Mar 2013 05:57:54 +0000 (05:57 +0000)]
qlcnic: Log warning message for 83xx adapter in MSI mode.

o 83xx adapter does not support MSI interrupts, display
  warning whenever module parameter is used to load driver
  in MSI mode.

Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Fix configure mailbox interrupt command for 83xx adapter
Manish chopra [Fri, 22 Mar 2013 05:57:53 +0000 (05:57 +0000)]
qlcnic: Fix configure mailbox interrupt command for 83xx adapter

o Due to improper data type of variable "type", interrupt resources were
  not getting deleted in hardware which was causing resource exhaustion
  in hardware. Hence mailbox command fails after some iterations of context change.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: preserve ACK clocking in TSO
Eric Dumazet [Thu, 21 Mar 2013 17:36:09 +0000 (17:36 +0000)]
tcp: preserve ACK clocking in TSO

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agortnetlink: Remove passing of attributes into rtnl_doit functions
Thomas Graf [Thu, 21 Mar 2013 07:45:29 +0000 (07:45 +0000)]
rtnetlink: Remove passing of attributes into rtnl_doit functions

With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodecnet: Parse netlink attributes on our own
Thomas Graf [Thu, 21 Mar 2013 07:45:28 +0000 (07:45 +0000)]
decnet: Parse netlink attributes on our own

decnet is the only subsystem left that is relying on the global
netlink attribute buffer rta_buf. It's horrible design and we
want to get rid of it.

This converts all of decnet to do implicit attribute parsing. It
also gets rid of the error prone struct dn_kern_rta.

Yes, the fib_magic() stuff is not pretty.

It's compiled tested but I need someone with appropriate hardware
to test the patch since I don't have access to it.

Cc: linux-decnet-user@lists.sourceforge.net
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'mv643xx_eth'
David S. Miller [Fri, 22 Mar 2013 14:25:25 +0000 (10:25 -0400)]
Merge branch 'mv643xx_eth'

Florian Fainelli says:

====================
This patch converts the mv643xx_eth driver to use the mvmdio MDIO bus driver
instead of rolling its own implementation. As a result, all users of this
mv643xx_eth driver are converted to register an "orion-mdio" platform_device.
The mvmdio driver is also updated to support an interrupt line which reports
SMI error/completion, and to allow traditionnal platform device registration
instead of just device tree.

David, I think it makes sense for you to merge all of this, since we do
not want the architecture files to be desynchronized from the mv643xx_eth to
avoid runtime breakage. The potential for merge conflicts should be very small.
====================

Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Tested-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomv643xx_eth: convert to use the Marvell Orion MDIO driver
Florian Fainelli [Fri, 22 Mar 2013 03:39:28 +0000 (03:39 +0000)]
mv643xx_eth: convert to use the Marvell Orion MDIO driver

This patch converts the Marvell MV643XX ethernet driver to use the
Marvell Orion MDIO driver. As a result, PowerPC and ARM platforms
registering the Marvell MV643XX ethernet driver are also updated to
register a Marvell Orion MDIO driver. This driver voluntarily overlaps
with the Marvell Ethernet shared registers because it will use a subset
of this shared register (shared_base + 0x4 to shared_base + 0x84). The
Ethernet driver is also updated to look up for a PHY device using the
Orion MDIO bus driver.

For ARM and PowerPC we register a single instance of the "mvmdio" driver
in the system like it used to be done with the use of the "shared_smi"
platform_data cookie on ARM.

Note that it is safe to register the mvmdio driver only for the "ge00"
instance of the driver because this "ge00" interface is guaranteed to
always be explicitely registered by consumers of
arch/arm/plat-orion/common.c and other instances (ge01, ge10 and ge11)
were all pointing their shared_smi to ge00. For PowerPC the in-tree
Device Tree Source files mention only one MV643XX ethernet MAC instance
so the MDIO bus driver is registered only when id == 0.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: enhance driver to support SMI error/done interrupts
Florian Fainelli [Fri, 22 Mar 2013 03:39:27 +0000 (03:39 +0000)]
net: mvmdio: enhance driver to support SMI error/done interrupts

This patch enhances the "mvmdio" to support a SMI error/done interrupt
line which can be used along with a wait queue instead of doing
busy-waiting on the registers. This is a feature which is available in
the mv643xx_eth SMI code and thus reduces again the gap between the two.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: rename base register cookie from smireg to regs
Florian Fainelli [Fri, 22 Mar 2013 03:39:26 +0000 (03:39 +0000)]
net: mvmdio: rename base register cookie from smireg to regs

This patch renames the base register cookie in the mvmdio drive from
"smireg" to "regs" since a subsequent patch is going to use an ioremap()
cookie whose size is larger than a single register of 4 bytes. No
functionnal code change introduced.

Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvmdio: allow platform device style registration
Florian Fainelli [Fri, 22 Mar 2013 03:39:25 +0000 (03:39 +0000)]
net: mvmdio: allow platform device style registration

This patch changes the mvmdio driver not to use device tree
helper functions such as of_mdiobus_register() and of_iomap() so we can
instantiate this driver using a classic platform_device approach. Use
the device manager helper to ioremap() the base register cookie so we
get automatic freeing upon error and removal. This change is harmless
for Device Tree platforms because they will get the driver be registered
the same way as it was before.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agos6gmac: fix error return code in s6gmac_probe()
Wei Yongjun [Fri, 22 Mar 2013 03:17:47 +0000 (03:17 +0000)]
s6gmac: fix error return code in s6gmac_probe()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoudp: increase inner ip header ID during segmentation
Cong Wang [Fri, 22 Mar 2013 00:31:32 +0000 (00:31 +0000)]
udp: increase inner ip header ID during segmentation

Similar to GRE tunnel, UDP tunnel should take care of IP header ID
too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoip_gre: increase inner ip header ID during segmentation
Cong Wang [Fri, 22 Mar 2013 00:31:31 +0000 (00:31 +0000)]
ip_gre: increase inner ip header ID during segmentation

According to the previous discussion [1] on netdev list, DaveM insists
we should increase the IP header ID for each segmented packets.
This patch fixes it.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
1. http://marc.info/?t=136384172700001&r=1&w=2
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: increase inner ip id during encapsulated tso
Dmitry Kravkov [Thu, 21 Mar 2013 15:38:24 +0000 (15:38 +0000)]
bnx2x: increase inner ip id during encapsulated tso

57712/578xx devices during handling of encapsulated TSO can
properly increase ip id for only one ip header.
The patch selects inner header to be increased.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
CC: Eilon Greenstein <eilong@broadcom.com>
CC: Ariel Elior <ariele@broadcom.com>
CC: Maciej Zenczykowski <maze@google.com>
CC: Jesse Gross <jesse@nicira.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovirtio: remove obsolete virtqueue_get_queue_index()
Rusty Russell [Thu, 21 Mar 2013 14:17:34 +0000 (14:17 +0000)]
virtio: remove obsolete virtqueue_get_queue_index()

You can access it directly now, since 3.8: v3.7-rc1-13-g06ca287
'virtio: move queue_index and num_free fields into core struct
virtqueue.'

Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Fix p3_gelic_net sparse warnings
Geoff Levand [Fri, 22 Mar 2013 00:06:59 +0000 (17:06 -0700)]
net: Fix p3_gelic_net sparse warnings

Rearrange routines to avoid local declarations and remove
unnecessary inline tags.  No functional changes.

Fixes sparse warnings like these:

  ps3_gelic_net.c: error: marked inline, but without a definition

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoappletalk: remove "config IPDDP_DECAP"
Paul Bolle [Thu, 21 Mar 2013 23:35:24 +0000 (00:35 +0100)]
appletalk: remove "config IPDDP_DECAP"

The Kconfig symbol IPDDP_DECAP got added in v2.1.75. It has never been
used. Its entry can safely be removed.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: fix unused variable warning
Sergei Shtylyov [Thu, 21 Mar 2013 22:59:00 +0000 (01:59 +0300)]
sh_eth: fix unused variable warning

Commit d5e07e69218fd9aa21d6c8c5ccc629d92bdb9b0f (sh_eth: use managed device API)
has caused this warning (due to my overlook):

drivers/net/ethernet/renesas/sh_eth.c: In function `sh_eth_drv_remove':
drivers/net/ethernet/renesas/sh_eth.c:2482:25: warning: unused variable `mdp'
[-Wunused-variable]

Kill the darn variable now...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofilter: bpf_jit_comp: refactor and unify BPF JIT image dump output
Daniel Borkmann [Thu, 21 Mar 2013 21:22:03 +0000 (22:22 +0100)]
filter: bpf_jit_comp: refactor and unify BPF JIT image dump output

If bpf_jit_enable > 1, then we dump the emitted JIT compiled image
after creation. Currently, only SPARC and PowerPC has similar output
as in the reference implementation on x86_64. Make a small helper
function in order to reduce duplicated code and make the dump output
uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM
flen, pass and proglen are currently not shown, but would be
interesting to know as well), also for future BPF JIT implementations
on other archs.

Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Eric Dumazet <eric.dumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: use managed device API
Sergei Shtylyov [Thu, 21 Mar 2013 10:41:11 +0000 (10:41 +0000)]
sh_eth: use managed device API

Switch the driver to the managed device API by replacing ioremap() calls with
devm_ioremap_resource() (that will also result in calling request_mem_region()
which the driver forgot to do until now) and k[mz]alloc() with devm_kzalloc() --
this permits to simplify driver's probe()/remove() method cleanup. We can now
remove the ioremap() error messages since the error messages are printed by
 devm_ioremap_resource() itself. We can also remove the 'bitbang' field from
'struct sh_eth_private' as we don't need it anymore in order to free the memory
behind it...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: kill unneeded typecast in sh_eth_drv_probe()
Sergei Shtylyov [Thu, 21 Mar 2013 10:39:22 +0000 (10:39 +0000)]
sh_eth: kill unneeded typecast in sh_eth_drv_probe()

sh_eth_drv_probe() does cast from 'void *' when assigning to the 'pd'  variable
which is automatic anyway. Turn the assignment into initializer, while removing
the cast...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: use PIR_* bits
Sergei Shtylyov [Thu, 21 Mar 2013 10:37:54 +0000 (10:37 +0000)]
sh_eth: use PIR_* bits

sh_mdio_init() uses the bare numbers instead of the PHY interface bits, despite
these are declared in sh_eth.h as 'enum PIR_BIT'...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fix psock_fanout on sparc64
Willem de Bruijn [Thu, 21 Mar 2013 18:10:03 +0000 (14:10 -0400)]
net: fix psock_fanout on sparc64

The packetsocket fanout test uses a packet ring. Use TPACKET_V2
instead of TPACKET_V1 to work around a known 32/64 bit issue in
the older ring that manifests on sparc64.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetlink: Diag core and basic socket info dumping (v2)
Andrey Vagin [Thu, 21 Mar 2013 16:33:48 +0000 (20:33 +0400)]
netlink: Diag core and basic socket info dumping (v2)

The netlink_diag can be built as a module, just like it's done in
unix sockets.

The core dumping message carries the basic info about netlink sockets:
family, type and protocol, portis, dst_group, dst_portid, state.

Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.

Netlink sockets cab be filtered by protocols.

The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.

The file /proc/net/netlink doesn't provide enough information for
dumping netlink sockets. It doesn't provide dst_group, dst_portid,
groups above 32.

v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.

Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: prepare netlink code for netlink diag
Andrey Vagin [Thu, 21 Mar 2013 16:33:47 +0000 (20:33 +0400)]
net: prepare netlink code for netlink diag

Move a few declarations in a header.

Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fix *_DIAG_MAX constants
Andrey Vagin [Thu, 21 Mar 2013 16:33:46 +0000 (20:33 +0400)]
net: fix *_DIAG_MAX constants

Follow the common pattern and define *_DIAG_MAX like:

        [...]
        __XXX_DIAG_MAX,
};

Because everyone is used to do:

        struct nlattr *attrs[XXX_DIAG_MAX+1];

        nla_parse([...], XXX_DIAG_MAX, [...]

Reported-by: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'mlx4'
David S. Miller [Thu, 21 Mar 2013 16:05:32 +0000 (12:05 -0400)]
Merge branch 'mlx4'

Or Gerlitz says:

====================
Here's a batch of mlx4 driver fixes for 3.9, mostly SRIOV/Flow-steering
related. Series done against the net tree as of commit 5a3da1f
"inet: limit length of fragment queue hash table bucket lists
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_core: Disallow releasing VF QPs which have steering rules
Hadar Hen Zion [Thu, 21 Mar 2013 05:55:55 +0000 (05:55 +0000)]
net/mlx4_core: Disallow releasing VF QPs which have steering rules

VF QPs must not be released when they have steering rules attached to them.

For that end, introduce a reference count field to the QP object in the
SRIOV resource tracker which is incremented/decremented when steering rules
are attached/detached to it. QPs can be released by VF only when their
ref count is zero.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_core: Always use 64 bit resource ID when doing lookup
Hadar Hen Zion [Thu, 21 Mar 2013 05:55:54 +0000 (05:55 +0000)]
net/mlx4_core: Always use 64 bit resource ID when doing lookup

One of the resource tracker code paths was wrongly using int and not u64
for resource tracking IDs, fix it.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_en: Remove ethtool flow steering rules before releasing QPs
Hadar Hen Zion [Thu, 21 Mar 2013 05:55:53 +0000 (05:55 +0000)]
net/mlx4_en: Remove ethtool flow steering rules before releasing QPs

Fix the ethtool flow steering rules cleanup to be carried out before
releasing the RX QPs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_core: Fix wrong order of flow steering resources removal
Hadar Hen Zion [Thu, 21 Mar 2013 05:55:52 +0000 (05:55 +0000)]
net/mlx4_core: Fix wrong order of flow steering resources removal

On the resource tracker cleanup flow, the DMFS rules must be deleted before we
destroy the QPs, else the HW may attempt doing packet steering to non existent QPs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_core: Fix wrong mask applied on EQ numbers in the wrapper
Moshe Lazer [Thu, 21 Mar 2013 05:55:51 +0000 (05:55 +0000)]
net/mlx4_core: Fix wrong mask applied on EQ numbers in the wrapper

Currently the  mask is wrongly set in the MAP_EQ wrapper, fix that.
Without the fix any EQ number above 511 is mapped to one below 511.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Remove superfluous kernel_dropped local counter
Claudiu Manoil [Thu, 21 Mar 2013 03:12:15 +0000 (03:12 +0000)]
gianfar: Remove superfluous kernel_dropped local counter

The GRO_DROP return code is handled by the core network layer.
The current kernel approach is to factorize this kind of statistics into
the upper layers, instead of having all the drivers maintaining them.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Cleanup dead code and minor formatting
Claudiu Manoil [Thu, 21 Mar 2013 03:12:14 +0000 (03:12 +0000)]
gianfar: Cleanup dead code and minor formatting

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Remove 'maybe-uninitialized' compile warning
Claudiu Manoil [Thu, 21 Mar 2013 03:12:13 +0000 (03:12 +0000)]
gianfar: Remove 'maybe-uninitialized' compile warning

Warning message:
warning: 'budget_per_q' may be used uninitialized in this function

budget_per_q won't be used uninitialized since the only time
it doesn't get initialized is when entering gfar_poll with
num_act_queues == 0, meaning rstat_rxf == 0, in which case
budget_per_q is not utilized (as it has no meaning).
Inititalize budget_per_q to 0 though to suppress this compile
warning.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: ethernet: cpsw: fix erroneous condition in error check
Lothar Waßmann [Thu, 21 Mar 2013 02:20:11 +0000 (02:20 +0000)]
net: ethernet: cpsw: fix erroneous condition in error check

The error check in cpsw_probe_dt() has an '&&' where an '||' is
meant to be. This causes a NULL pointer dereference when incomplet DT
data is passed to the driver ('phy_id' property for cpsw_emac1
missing).

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sh-eth: Use pr_err instead of printk
Nobuhiro Iwamatsu [Wed, 20 Mar 2013 22:46:55 +0000 (22:46 +0000)]
net: sh-eth: Use pr_err instead of printk

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agolantiq_etop: use free_netdev(netdev) instead of kfree()
Wei Yongjun [Wed, 20 Mar 2013 21:31:42 +0000 (21:31 +0000)]
lantiq_etop: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
And it may lead to dereferencing freed pointer.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: remove redundant ifdef CONFIG_CGROUPS
Zefan Li [Wed, 20 Mar 2013 16:54:51 +0000 (16:54 +0000)]
net: remove redundant ifdef CONFIG_CGROUPS

The cgroup code has been surrounded by ifdef CONFIG_NET_CLS_CGROUP
and CONFIG_NETPRIO_CGROUP.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: implement RFC5682 F-RTO
Yuchung Cheng [Wed, 20 Mar 2013 13:33:00 +0000 (13:33 +0000)]
tcp: implement RFC5682 F-RTO

This patch implements F-RTO (foward RTO recovery):

When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted.  Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.

The implementaion follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss().  The basic version is not supported
because SACK enhanced version also works for non-SACK connections.

The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.

Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: refactor CA_Loss state processing
Yuchung Cheng [Wed, 20 Mar 2013 13:32:59 +0000 (13:32 +0000)]
tcp: refactor CA_Loss state processing

Consolidate all of TCP CA_Loss state processing in
tcp_fastretrans_alert() into a new function called tcp_process_loss().
This is to prepare the new F-RTO implementation in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: refactor F-RTO
Yuchung Cheng [Wed, 20 Mar 2013 13:32:58 +0000 (13:32 +0000)]
tcp: refactor F-RTO

The patch series refactor the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofilter: add minimal BPF JIT image disassembler
Daniel Borkmann [Wed, 20 Mar 2013 12:11:47 +0000 (12:11 +0000)]
filter: add minimal BPF JIT image disassembler

This is a minimal stand-alone user space helper, that allows for debugging or
verification of emitted BPF JIT images. This is in particular useful for
emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
The disassembler is architecture generic and uses libopcodes and libbfd.

How to get to the disassembly, example:

  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
  3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output

`bpf_jit_disasm -o` will display the related opcodes to a particular instruction
as well. Example for x86_64:

$ ./bpf_jit_disasm
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0: push   %rbp
   1: mov    %rsp,%rbp
   4: sub    $0x60,%rsp
   8: mov    %rbx,-0x8(%rbp)
   c: mov    0x68(%rdi),%r9d
  10: sub    0x6c(%rdi),%r9d
  14: mov    0xe0(%rdi),%r8
  1b: mov    $0xc,%esi
  20: callq  0xffffffffe0d01b71
  25: cmp    $0x86dd,%eax
  2a: jne    0x000000000000003d
  2c: mov    $0x14,%esi
  31: callq  0xffffffffe0d01b8d
  36: cmp    $0x6,%eax
[...]
  5c: leaveq
  5d: retq

$ ./bpf_jit_disasm -o
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0: push   %rbp
55
   1: mov    %rsp,%rbp
48 89 e5
   4: sub    $0x60,%rsp
48 83 ec 60
   8: mov    %rbx,-0x8(%rbp)
48 89 5d f8
   c: mov    0x68(%rdi),%r9d
44 8b 4f 68
  10: sub    0x6c(%rdi),%r9d
44 2b 4f 6c
[...]
  5c: leaveq
c9
  5d: retq
c3

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Thu, 21 Mar 2013 15:16:35 +0000 (11:16 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next into wireless

John W. Linville says:

====================
This is a big pull request for new features intended for the 3.10
stream...

Regarding mac80211, Johannes says:

"First, I merged mac80211/master to avoid some conflicts. This brings in
a bunch of fixes you're already familiar with. For real -next material,
I have a whole bunch of minstrel work, minstrel_ht from Felix and legacy
minstrel from Thomas (Huehn). The other Thomas (Pedersen) did a number
of changes in mesh to allow userspace peering management even when the
mesh isn't secured. Stanislaw changes suspend/resume to always
disconnect the networks. This is typically already done by
network-manager so won't make a huge difference for most users, but
fixes a number problems, particularly with USB drivers that can easily
disconnect while suspended. Ilan has a small change to allow mac80211
drivers to differentiate remain-on-channel reasons, and Jouni extends
nl80211 to allow fast roaming with full-MAC devices. I have a fairly
large number of patches as well, many of them fairly simple cleanups,
but also allowing split wiphy dumps and adding back the full wiphy
information in nl80211, station entry change checking and more VHT work
including VHT capability overrides (mostly for testing purposes)."

And for iwlwifi, Johannes says:

"Here, I also merged iwlwifi-fixes to avoid conflicts, and otherwise have
various cleanups and improvements on the MVM driver, along with a few
throughout the driver. Other than Bluetooth Coexistence from Emmanuel
there's no over-arching theme, so listing them would pretty much
reproduce the shortlog."

Regarding NFC, Samuel says:

"The 2 features we have with this one are:

- An LLCP Service Name Lookup (SNL) netlink interface for querying LLCP
  service availability from user space.
  Along the way, Thierry also improved the existing SNL interface for
  aggregating SNL responses.

- An initial LLCP socket options implementation, for setting the Receive
  Window (RW) and the Maximum Information Unit Extension (MIUX) per socket.
  This is need for the LLCP validation tests.

We also have a microread MEI build failure here: I am not sending this one to
3.9 because the MEI bus code is not there yet, so it won't break for anyone
else than me."

And for ath6kl, Kalle says:

"I added tracing support to ath6kl, along with a new Kconfig option. Now
there's also a workaround to reset USB devices when the firmware upload
fails, this happened when host was warm rebooted. There are also quite a
few small fixes or cleanup."

On top of all that, there is the usual bundle of driver updates
with new features, new hardware support and the like mixed-in.
The ath9k, b43, brcmfmac, mwifiex, rt2800, and wil6210 drivers
are all well-represented, and a few other drivers are hit as well.
I also pulled-in the wireless fixes tree in order to resolve some
pending merge conflicts.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agothermal: shorten too long mcast group name
Masatake YAMATO [Tue, 19 Mar 2013 01:47:28 +0000 (01:47 +0000)]
thermal: shorten too long mcast group name

The original name is too long.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Wed, 20 Mar 2013 19:24:57 +0000 (15:24 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem

11 years agochelsio: use netdev_alloc_skb_ip_align
stephen hemminger [Wed, 20 Mar 2013 09:02:41 +0000 (09:02 +0000)]
chelsio: use netdev_alloc_skb_ip_align

Use netdev_alloc_sk_ip_align in the case where packet is copied.
This handles case where NET_IP_ALIGN == 0 as well as adding required header
padding.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Wed, 20 Mar 2013 19:19:32 +0000 (15:19 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
I present to you another batch of fixes intended for the 3.9 stream...

On the bluetooth bits, Gustavo says:

"I put together 3 fixes intended for 3.9, there are support for two
new devices and a NULL dereference fix in the SCO code."

Amitkumar Karwar fixes a command queueing race in mwifiex.

Bing Zhao provides a pair of mwifiex related to cleaning-up before
a shutdown.

Felix Fietkau provides an ath9k fix for a regression caused by an
earlier calibration fix, and another ath9k fix to avoid race conditions
that unnecessarily lead to chip resets.

Jussi Kivilinna prevents and skbuff leak in rtlwifi.

Stanislaw Gruszka corrects a length paramater for a DMA buffer mapping
operation in iwlegacy.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Move selftests to common net/ subdirectory.
David S. Miller [Wed, 20 Mar 2013 19:07:56 +0000 (15:07 -0400)]
net: Move selftests to common net/ subdirectory.

Suggested-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofec: Fix the build as module
Fabio Estevam [Wed, 20 Mar 2013 08:19:32 +0000 (08:19 +0000)]
fec: Fix the build as module

Since commit ff43da86c69 (NET: FEC: dynamtic check DMA desc buff type) the
following build error happens when CONFIG_FEC=m

ERROR: "fec_ptp_init" [drivers/net/ethernet/freescale/fec.ko] undefined!
ERROR: "fec_ptp_ioctl" [drivers/net/ethernet/freescale/fec.ko] undefined!
ERROR: "fec_ptp_start_cyclecounter" [drivers/net/ethernet/freescale/fec.ko] undefined!

Fix it by exporting the required fec_ptp symbols.

Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Wed, 20 Mar 2013 18:26:37 +0000 (14:26 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

11 years agonet: fix psock_fanout selftest bind error message
Daniel Baluta [Wed, 20 Mar 2013 17:28:56 +0000 (19:28 +0200)]
net: fix psock_fanout selftest bind error message

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agochelsio: add headroom in RX path
Eric Dumazet [Wed, 20 Mar 2013 16:33:19 +0000 (09:33 -0700)]
chelsio: add headroom in RX path

Drivers should reserve some headroom in skb used in receive path,
to avoid future head reallocation.

One possible way to do that is to use dev_alloc_skb() instead
of alloc_skb(), so that NET_SKB_PAD bytes are reserved.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodynticks: avoid flow_cache_flush() interrupting every core
Chris Metcalf [Tue, 19 Mar 2013 11:35:58 +0000 (11:35 +0000)]
dynticks: avoid flow_cache_flush() interrupting every core

Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.

With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores.  When we are trying to
isolate a set of cpus from interrupts, this is important to do.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: AER revised
Yuval Mintz [Wed, 20 Mar 2013 05:21:28 +0000 (05:21 +0000)]
bnx2x: AER revised

Revised bnx2x implementation of PCI Express Advanced Error Recovery -
stop and free driver resources according to the AER flow (instead of the
currently implemented `hope-for-the-best' release approach), and do not make
any assumptions on the HW state after slot reset.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fec: make local function fec_poll_controller() static
Wei Yongjun [Wed, 20 Mar 2013 05:06:11 +0000 (05:06 +0000)]
net: fec: make local function fec_poll_controller() static

fec_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: ethernet: davinci_emac: make local function emac_poll_controller() static
Wei Yongjun [Wed, 20 Mar 2013 05:01:45 +0000 (05:01 +0000)]
net: ethernet: davinci_emac: make local function emac_poll_controller() static

emac_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mdio-octeon: Use module_platform_driver()
Sachin Kamat [Wed, 20 Mar 2013 01:41:32 +0000 (01:41 +0000)]
net: mdio-octeon: Use module_platform_driver()

module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mdio-gpio: Use module_platform_driver()
Sachin Kamat [Wed, 20 Mar 2013 01:41:31 +0000 (01:41 +0000)]
net: mdio-gpio: Use module_platform_driver()

module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: au1k_ir: Use module_platform_driver()
Sachin Kamat [Wed, 20 Mar 2013 01:41:30 +0000 (01:41 +0000)]
net: au1k_ir: Use module_platform_driver()

module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: s6gmac: Use module_platform_driver()
Sachin Kamat [Wed, 20 Mar 2013 01:41:29 +0000 (01:41 +0000)]
net: s6gmac: Use module_platform_driver()

module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: ks8695net: Use module_platform_driver()
Sachin Kamat [Wed, 20 Mar 2013 01:41:28 +0000 (01:41 +0000)]
net: ks8695net: Use module_platform_driver()

module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoconnector: Added coredumping event to the process connector
Jesper Derehag [Tue, 19 Mar 2013 20:50:05 +0000 (20:50 +0000)]
connector: Added coredumping event to the process connector

Process connector can now also detect coredumping events.

Main aim of patch is get notified at start of coredumping, instead of
having to wait for it to finish and then being notified through EXIT
event.

Could be used for instance by process-managers that want to get
notified as soon as possible about process failures, and not
necessarily beeing notified after coredump, which could be in the
order of minutes depending on size of coredump, piping and so on.

Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Refactor config coalescing calls for all queues
Claudiu Manoil [Tue, 19 Mar 2013 07:40:05 +0000 (07:40 +0000)]
gianfar: Refactor config coalescing calls for all queues

The only place where gfar_configure_coalescing is called
with an actual bitmask (other than 0xff) is in gfar_poll
(on the hot path). So make gfar_configure_coalescing()
static for the buffer processing path, and export
gfar_configure_coalescing_all() for the remaining cases
that require to set coalescing for all the queues at once
(on the slow path).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Remove redundant programming of [rt]xic registers
Claudiu Manoil [Tue, 19 Mar 2013 07:40:04 +0000 (07:40 +0000)]
gianfar: Remove redundant programming of [rt]xic registers

For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx colescing registers [rt]xic
are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This
avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Poll only active Rx queues
Claudiu Manoil [Tue, 19 Mar 2013 07:40:03 +0000 (07:40 +0000)]
gianfar: Poll only active Rx queues

Split the napi budget fairly among the active queues only, instead
of dividing it by the total number of Rx queues assigned to the
given interrupt group.
Use the h/w indication field RXFi in rstat (receive status register)
to identify the active rx queues from the current interrupt group
(i.e. receive event occured on ring i, if ring i is part of the current
interrupt group). This indication field in rstat, RXFi i=0..7,
allows us to find out on which queues of the same interrupt group
do we have incomming traffic once we entered the polling routine for
the given interrupt group. After servicing the ring i, the corresponding
bit RXFi will be written with 1 to clear the active queue indication for
that ring.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Fix tx napi polling
Claudiu Manoil [Tue, 19 Mar 2013 07:40:02 +0000 (07:40 +0000)]
gianfar: Fix tx napi polling

There are 2 issues with the current napi poll routine, with regards
to tx ring cleanup:
1) for multi-queue devices (MQ_MG_MODE), should tx_bit_map != rx_bit_map,
which is possible (and supported in h/w) if the DT property "fsl,tx-bit-map"
holds a different value than rx_bit_map, the current polling routine will
service the wrong Tx queues in this case (i.e. the interrupt group will
receive interrupts from tx queues that it will not service)
2) Tx cleanup completion consumes napi budget, whereas the napi budget
should be reserved for Rx work only.

The patch fixes these issues and provides a clean napi polling routine.
Napi poll completion is reached when all the Rx queues have been
serviced and there is no Tx work to do.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofilter: add ANC_PAY_OFFSET instruction for loading payload start offset
Daniel Borkmann [Tue, 19 Mar 2013 06:39:31 +0000 (06:39 +0000)]
filter: add ANC_PAY_OFFSET instruction for loading payload start offset

It is very useful to do dynamic truncation of packets. In particular,
we're interested to push the necessary header bytes to the user space and
cut off user payload that should probably not be transferred for some reasons
(e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET,
we can load it into the accumulator, and return it. E.g. in bpfc syntax ...

        ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
        ret a           ; { 0x16, 0, 0, 0x00000000 },

... as a filter will accomplish this without having to do a big hackery in
a BPF filter itself. Follow-up JIT implementations are welcome.

Thanks to Eric Dumazet for suggesting and discussing this during the
Netfilter Workshop in Copenhagen.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: flow_dissector: add __skb_get_poff to get a start offset to payload
Daniel Borkmann [Tue, 19 Mar 2013 06:39:30 +0000 (06:39 +0000)]
net: flow_dissector: add __skb_get_poff to get a start offset to payload

__skb_get_poff() returns the offset to the payload as far as it could
be dissected. The main user is currently BPF, so that we can dynamically
truncate packets without needing to push actual payload to the user
space and instead can analyze headers only.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 20 Mar 2013 16:46:26 +0000 (12:46 -0400)]
Merge git://git./linux/kernel/git/davem/net

Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fix psock_fanout selftest hash collision
Willem de Bruijn [Tue, 19 Mar 2013 20:42:44 +0000 (20:42 +0000)]
net: fix psock_fanout selftest hash collision

Fix flaky results with PACKET_FANOUT_HASH depending on whether the
two flows hash into the same packet socket or not.

Also adds tests for PACKET_FANOUT_LB and PACKET_FANOUT_CPU and
replaces the counting method with a packet ring.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fec: Define indexes as 'unsigned int'
Fabio Estevam [Wed, 20 Mar 2013 15:31:07 +0000 (12:31 -0300)]
net: fec: Define indexes as 'unsigned int'

Fix the following warnings that happen when building with W=1 option:

drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_free_buffers':
drivers/net/ethernet/freescale/fec.c:1337:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_alloc_buffers':
drivers/net/ethernet/freescale/fec.c:1361:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_init':
drivers/net/ethernet/freescale/fec.c:1631:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/irda: add missing error path release_sock call
Kees Cook [Wed, 20 Mar 2013 05:19:24 +0000 (05:19 +0000)]
net/irda: add missing error path release_sock call

This makes sure that release_sock is called for all error conditions in
irda_getsockopt.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agolpc_eth: fix error return code in lpc_eth_drv_probe()
Wei Yongjun [Wed, 20 Mar 2013 02:21:48 +0000 (02:21 +0000)]
lpc_eth: fix error return code in lpc_eth_drv_probe()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: check TSU registers ioremap() error
Sergei Shtylyov [Tue, 19 Mar 2013 13:41:32 +0000 (13:41 +0000)]
sh_eth: check TSU registers ioremap() error

One must check the result of ioremap() -- in this case it prevents potential
kernel oops when initializing TSU registers further on...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: fix bitbang memory leak
Sergei Shtylyov [Tue, 19 Mar 2013 13:40:23 +0000 (13:40 +0000)]
sh_eth: fix bitbang memory leak

sh_mdio_init() allocates pointer to 'struct bb_info' but only stores it locally,
so that sh_mdio_release() can't free it on driver unload.  Add the pointer to
'struct bb_info' to 'struct sh_eth_private', so that sh_mdio_init() can save
'bitbang' variable for sh_mdio_release() to be able to free it later...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipconfig: Fix newline handling in log message.
Martin Fuzzey [Tue, 19 Mar 2013 08:19:29 +0000 (08:19 +0000)]
ipconfig: Fix newline handling in log message.

When using ipconfig the logs currently look like:

Single name server:
[    3.467270] IP-Config: Complete:
[    3.470613]      device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.480670]      host=infigo-1, domain=, nis-domain=(none)
[    3.486166]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.492910]      nameserver0=172.16.42.1[    3.496853] ALSA device list:

Three name servers:
[    3.496949] IP-Config: Complete:
[    3.500293]      device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.510367]      host=infigo-1, domain=, nis-domain=(none)
[    3.515864]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.522635]      nameserver0=172.16.42.1, nameserver1=172.16.42.100
[    3.529149] , nameserver2=172.16.42.200

Fix newline handling for these cases

Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoflow_keys: include thoff into flow_keys for later usage
Daniel Borkmann [Tue, 19 Mar 2013 06:39:29 +0000 (06:39 +0000)]
flow_keys: include thoff into flow_keys for later usage

In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'l2tp'
David S. Miller [Wed, 20 Mar 2013 16:10:46 +0000 (12:10 -0400)]
Merge branch 'l2tp'

Tom Parkin says:

====================
This l2tp bugfix patchset addresses a number of issues.

The first five patches in the series prevent l2tp sessions pinning an l2tp
tunnel open.  This occurs because the l2tp tunnel is torn down in the tunnel
socket destructor, but each session holds a tunnel socket reference which
prevents tunnels with sessions being deleted.  The solution I've implemented
here involves adding a .destroy hook to udp code, as discussed previously on
netdev[1].

The subsequent seven patches address futher bugs exposed by fixing the problem
above, or exposed through stress testing the implementation above.  Patch 11
(avoid deadlock in l2tp stats update) isn't directly related to tunnel/session
lifetimes, but it does prevent deadlocks on i386 kernels running on 64 bit
hardware.

This patchset has been tested on 32 and 64 bit preempt/non-preempt kernels,
using iproute2, openl2tp, and custom-made stress test code.

[1] http://comments.gmane.org/gmane.linux.network/259169
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: unhash l2tp sessions on delete, not on free
Tom Parkin [Tue, 19 Mar 2013 06:11:23 +0000 (06:11 +0000)]
l2tp: unhash l2tp sessions on delete, not on free

If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:

 1. further packets arriving and getting queued while the pseudowire is being
    closed down
 2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close.  For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: avoid deadlock in l2tp stats update
Tom Parkin [Tue, 19 Mar 2013 06:11:22 +0000 (06:11 +0000)]
l2tp: avoid deadlock in l2tp stats update

l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.

Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.

As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.

Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2
http://marc.info/?l=linux-netdev&m=134079868111131&w=2

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: push all ppp pseudowire shutdown through .release handler
Tom Parkin [Tue, 19 Mar 2013 06:11:21 +0000 (06:11 +0000)]
l2tp: push all ppp pseudowire shutdown through .release handler

If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.

Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: purge session reorder queue on delete
Tom Parkin [Tue, 19 Mar 2013 06:11:20 +0000 (06:11 +0000)]
l2tp: purge session reorder queue on delete

Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete.  Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: add session reorder queue purge function to core
Tom Parkin [Tue, 19 Mar 2013 06:11:19 +0000 (06:11 +0000)]
l2tp: add session reorder queue purge function to core

If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.

Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: don't BUG_ON sk_socket being NULL
Tom Parkin [Tue, 19 Mar 2013 06:11:18 +0000 (06:11 +0000)]
l2tp: don't BUG_ON sk_socket being NULL

It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup
Tom Parkin [Tue, 19 Mar 2013 06:11:17 +0000 (06:11 +0000)]
l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup

When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: close sessions before initiating tunnel delete
Tom Parkin [Tue, 19 Mar 2013 06:11:16 +0000 (06:11 +0000)]
l2tp: close sessions before initiating tunnel delete

When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted.  Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: close sessions in ip socket destroy callback
Tom Parkin [Tue, 19 Mar 2013 06:11:15 +0000 (06:11 +0000)]
l2tp: close sessions in ip socket destroy callback

l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace.  We need to do the same thing for
IP-encapsulation sockets.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: export l2tp_tunnel_closeall
Tom Parkin [Tue, 19 Mar 2013 06:11:14 +0000 (06:11 +0000)]
l2tp: export l2tp_tunnel_closeall

l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed.  We need to do something
similar for IP-encapsulation sockets.

Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agol2tp: add udp encap socket destroy handler
Tom Parkin [Tue, 19 Mar 2013 06:11:13 +0000 (06:11 +0000)]
l2tp: add udp encap socket destroy handler

L2TP sessions hold a reference to the tunnel socket to prevent it going away
while sessions are still active.  However, since tunnel destruction is handled
by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
cannot be deleted since each session holds a reference to the tunnel socket.
If userspace closes a managed tunnel socket, or dies, the tunnel will persist
and it will be neccessary to individually delete the sessions using netlink
commands.  This is ugly.

To prevent this occuring, this patch leverages the udp encapsulation socket
destroy callback to gain early notification when the tunnel socket is closed.
This allows us to safely close the sessions running in the tunnel, dropping
the tunnel socket references in the process.  The tunnel socket is then
destroyed as normal, and the tunnel resources deallocated in sk_destruct.

While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
references to allow the sessions to be deleted rather than leaking.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>