review.tizen.org Git - platform/upstream/kernel-adaptation-pc.git/log

ipv4: fix forwarding for strict source routes

After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) rt_gateway can be 0 but ip_forward() compares
it directly with nexthop. What we want here is to check if traffic
is to directly connected nexthop and to fail if using gateway.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fix sending of redirects

After "Cache input routes in fib_info nexthops" (commit
d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.

As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.

Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.

After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: gro: fix PV6_GRO_CB(skb)->proto problem

It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
if a new skb is allocated (to serve as an anchor for frag_list)

We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :

*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);

So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
of IPPROTO_TCP (6)

ipv6_gro_complete() isnt able to call ops->gro_complete()
[ tcp6_gro_complete() ]

Fix this by moving proto in NAPI_GRO_CB() and getting rid of
IPV6_GRO_CB

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: don't deliver frames for unknown vlans to protocols

6a32e4f9dd9219261f8856f817e6655114cfec2f made the vlan code skip marking
vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
there was an rx_handler, as the rx_handler could cause the frame to be received
on a different (virtual) vlan-capable interface where that vlan might be
configured.

As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
frames for unknown vlans to be delivered to the protocol stack as if they had
been received untagged.

For example, if an ipv6 router advertisement that's tagged for a locally not
configured vlan is received on an interface with macvlan interfaces attached,
macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
macvlan interfaces, which caused it to be passed to the protocol stack, leading
to ipv6 addresses for the announced prefix being configured even though those
are completely unusable on the underlying interface.

The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
rx_handler, if there is one, sees the frame unchanged, but afterwards,
before the frame is delivered to the protocol stack, it gets marked whether
there is an rx_handler or not.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: selective flush of packets

Current GRO can hold packets in gro_list for almost unlimited
time, in case napi->poll() handler consumes its budget over and over.

In this case, napi_complete()/napi_gro_flush() are not called.

Another problem is that gro_list is flushed in non friendly way :
We scan the list and complete packets in the reverse order.
(youngest packets first, oldest packets last)
This defeats priorities that sender could have cooked.

Since GRO currently only store TCP packets, we dont really notice the
bug because of retransmits, but this behavior can add unexpected
latencies, particularly on mice flows clamped by elephant flows.

This patch makes sure no packet can stay more than 1 ms in queue, and
only in stress situations.

It also complete packets in the right order to minimize latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skge: Add DMA mask quirk for Marvell 88E8001 on ASUS P5NSLI motherboard

Marvell 88E8001 on an ASUS P5NSLI motherboard is unable to send/receive
packets on a system with >4gb ram unless a 32bit DMA mask is used.

This issue has been around for years and a fix was sent 3.5 years ago, but
there was some debate as to whether it should instead be fixed as a PCI quirk.
http://www.spinics.net/lists/netdev/msg88670.html

However, 18 months later a similar workaround was introduced for another
chipset exhibiting the same problem.
http://www.spinics.net/lists/netdev/msg142287.html

Signed-off-by: Graham Gower <graham.gower@gmail.com>
Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Address various sparse warnings

This patch fixes type assignment issues, function definition and symbol
shadowing which triggered sparse warnings.

Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Don't report stale pmtu values to userspace

We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Don't create nh exeption when the device mtu is smaller than the reported pmtu

When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exeption and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't crerate a nh exeption in
that case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Always invalidate or update the route on pmtu events

Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.

With this patch we always invalidate or update the route on pmtu events.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: allocate enough data in t4_memory_rw()

MEMWIN0_APERTURE is the size in bytes.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: use list_move instead of list_del/list_add

Using list_move() instead of list_del() + list_add().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: remove unused including <linux/version.h>

Remove including <linux/version.h> that don't need it.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: fix a potential crash in skb_gro_reset_offset

Before accessing skb first fragment, better make sure there
is one.

This is probably not needed for old kernels, since an ethernet frame
cannot contain only an ethernet header, but the recent GRO addition
to tunnels makes this patch needed.

Also skb_gro_reset_offset() can be static, it actually allows
compiler to inline it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: GRO should be ECN friendly

IPv4 side of the problem was addressed in commit a9e050f4e7f9d
(net: tcp: GRO should be ECN friendly)

This patch does the same, but for IPv6 : A Traffic Class mismatch
doesnt mean flows are different, but instead should force a flush
of previous packets.

This patch removes artificial packet reordering problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix skb_under_panic oops in neigh_resolve_output

The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.

Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/marvell/sky2.c: fix error return code

The function sky2_probe() return 0 for success and negative value
for most of its internal tests failures. There are two exceptions
that are error cases going to err_out*:. For this two cases, the
function abort its success execution path, but returns non negative
value, making it dificult for a caller function to notice the error.

This patch fixes the error cases that do not return negative values.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/marvell/skge.c: fix error return code

The function skge_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_out_led_off:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/sun/sungem.c: fix error return code

The function gem_init_one() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_out_free_consistent:. For this error
case, the function abort its success execution path, but returns non
negative value, making it difficult for a caller function to notice
the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/sun/niu.c: fix error return code

The function niu_pci_init_one() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_out_free_res:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/renesas/sh_eth.c: fix error return code

The function sh_eth_drv_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to out_release:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/natsemi/xtsonic.c: fix error return code

The function sonic_probe1() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to out:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/amd/au1000_eth.c: fix error return code

The function au1000_probe() return 0 for success and negative value
for most of its internal tests failures. There are exceptions
that are error cases going to err_out:. For this cases, the
function abort its success execution path, but returns non negative
value, making it dificult for a caller function to notice the error.

This patch fixes the error cases that do not return negative values.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/amd/amd8111e.c: fix error return code

The function amd8111e_probe_one() return 0 for success and negative
value for most of its internal tests failures. There are two exceptions
that are error cases going to err_free_reg:. For this two cases, the
function abort its success execution path, but returns non negative
value, making it dificult for a caller function to notice the error.

This patch fixes the error cases that do not return negative values.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c: fix error return code

The function qlcnic_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_out_free_netdev:. For this error case,
the function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/sh_sir.c: fix error return code

The function sh_sir_probe() return 0 for success and negative value
for most of its internal tests failures. There are two exceptions
that are error cases going to err_mem_*:. For this two cases, the
function abort its success execution path, but returns non negative
value, making it dificult for a caller function to notice the error.

This patch fixes the error cases that do not return negative values.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/sh_irda.c: fix error return code

The function sh_irda_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_mem_4:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/sa1100_ir.c: fix error return code

The function sa1100_irda_probe() return 0 for success and negative
value for most of its internal tests failures. There is one exception
that is error case going to err_mem_4:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/pxaficp_ir.c: fix error return code

The function pxa_irda_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_mem_3:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/mcs7780.c: fix error return code

The function mcs_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to error2:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/irda/irtty-sir.c: fix error return code

The function irtty_open() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to out_put:. For this error case, the
function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/sis/sis900.c: fix error return code

The function sis900_probe() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_out_cleardev:. Fore this error case,
the function abort its success execution path, but returns non negative
value, making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/natsemi/natsemi.c: fix error return code

The function natsemi_probe1() return 0 for success and negative value
for most of its internal tests failures. There is one exception
that is error case going to err_create_file:. Fore this error case the
function abort its success execution path, but returns non negative value,
making it difficult for a caller function to notice the error.

This patch fixes the error case that do not return negative value.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/dec/tulip/dmfe.c: fix error return code

The function dmfe_init_one() return 0 for success and negative value
for most of its internal tests failures. There are three exceptions
that are error cases going to err_out_*:. Fore this three cases the
function abort its success execution path, but returns non negative
value, making it dificult for a caller function to notice the error.

This patch fixes the error cases that do not return negative values.

This was found by Coccinelle, but the code change was made by hand.
This patch is not robot generated.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove skb recycling

Over time, skb recycling infrastructure got litle interest and
many bugs. Generic rx path skb allocation is now using page
fragments for efficient GRO / TCP coalescing, and recyling
a tx skb for rx path is not worth the pain.

Last identified bug is that fat skbs can be recycled
and it can endup using high order pages after few iterations.

With help from Maxime Bizon, who pointed out that commit
87151b8689d (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.

Instead of fixing this bug, lets remove skb recycling.

Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate sk_buff right before netif_receive_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

infiniband: pass rdma_cm module to netlink_dump_start

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: add reference of module in netlink_dump_start

I get a panic when I use ss -a and rmmod inet_diag at the
same time.

It's because netlink_dump uses inet_diag_dump which belongs to module
inet_diag.

I search the codes and find many modules have the same problem. We
need to add a reference to the module which the cb->dump belongs to.

Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.

Change From v3:
change netlink_dump_start to inline,suggestion from Pablo and
Eric.

Change From v2:
delete netlink_dump_done,and call module_put in netlink_dump
and netlink_sock_destruct.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers

Pull UAPI disintegration fixes from David Howells:
"There are three main parts:

(1) I found I needed some more fixups in the wake of testing Arm64
     (some asm/unistd.h files had weird guards that caused problems -
     mostly in arches for which I don't have a compiler) and some
     __KERNEL__ splitting needed to take place in Arm64.

(2) I found that c6x was missing some __KERNEL__ guards in its
     asm/signal.h.  Mark Salter pointed me at a tree with a patch to
     remove that file entirely and use the asm-generic variant instead.

(3) Lastly, m68k turned out to have a header installation problem due
     to it lacking a kvm_para.h file.

     The conditional installation bits for linux/kvm_para.h, linux/kvm.h
     and linux/a.out.h weren't very well specified - and didn't work if
     an arch didn't have the asm/ version of that file, but there *was*
     an asm-generic/ version.

     It seems the "ifneq $((wildcard ...),)" for each of those three
     headers in include/kernel/Kbuild is invoked twice during header
     installation, and the second time it matches on the just installed
     asm-generic/kvm_para.h file and thus incorrectly installs
     linux/kvm_para.h as well.

     Most arches actually have an asm/kvm_para.h, so this wasn't
     detectable in those."

* 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers:
  UAPI: Fix conditional header installation handling (notably kvm_para.h on m68k)
  c6x: remove c6x signal.h
  UAPI: Split compound conditionals containing __KERNEL__ in Arm64
  UAPI: Fix the guards on various asm/unistd.h files
  c6x: make dsk6455 the default config

Merge branch 'slab/for-linus' of git://git./linux/kernel/git/penberg/linux

Pull SLAB changes from Pekka Enberg:
"New and noteworthy:

  * More SLAB allocator unification patches from Christoph Lameter and
    others.  This paves the way for slab memcg patches that hopefully
    will land in v3.8.

  * SLAB tracing improvements from Ezequiel Garcia.

  * Kernel tainting upon SLAB corruption from Dave Jones.

  * Miscellanous SLAB allocator bug fixes and improvements from various
    people."

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (43 commits)
  slab: Fix build failure in __kmem_cache_create()
  slub: init_kmem_cache_cpus() and put_cpu_partial() can be static
  mm/slab: Fix kmem_cache_alloc_node_trace() declaration
  Revert "mm/slab: Fix kmem_cache_alloc_node_trace() declaration"
  mm, slob: fix build breakage in __kmalloc_node_track_caller
  mm/slab: Fix kmem_cache_alloc_node_trace() declaration
  mm/slab: Fix typo _RET_IP -> _RET_IP_
  mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB
  mm, slab: Rename __cache_alloc() -> slab_alloc()
  mm, slab: Match SLAB and SLUB kmem_cache_alloc_xxx_trace() prototype
  mm, slab: Replace 'caller' type, void* -> unsigned long
  mm, slob: Add support for kmalloc_track_caller()
  mm, slab: Remove silly function slab_buffer_size()
  mm, slob: Use NUMA_NO_NODE instead of -1
  mm, sl[au]b: Taint kernel when we detect a corrupted slab
  slab: Only define slab_error for DEBUG
  slab: fix the DEADLOCK issue on l3 alien lock
  slub: Zero initial memory segment for kmem_cache and kmem_cache_node
  Revert "mm/sl[aou]b: Move sysfs_slab_add to common"
  mm/sl[aou]b: Move kmem_cache refcounting to common code
  ...

Merge tag 'stable/for-linus-3.7-arm-tag' of git://git./linux/kernel/git/konrad/xen

Pull ADM Xen support from Konrad Rzeszutek Wilk:

  Features:
   * Allow a Linux guest to boot as initial domain and as normal guests
     on Xen on ARM (specifically ARMv7 with virtualized extensions).  PV
     console, block and network frontend/backends are working.
  Bug-fixes:
   * Fix compile linux-next fallout.
   * Fix PVHVM bootup crashing.

  The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports
  ARMv7 platforms.

  The goal in implementing this architecture is to exploit the hardware
  as much as possible.  That means use as little as possible of PV
  operations (so no PV MMU) - and use existing PV drivers for I/Os
  (network, block, console, etc).  This is similar to how PVHVM guests
  operate in X86 platform nowadays - except that on ARM there is no need
  for QEMU.  The end result is that we share a lot of the generic Xen
  drivers and infrastructure.

  Details on how to compile/boot/etc are available at this Wiki:

    http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions

  and this blog has links to a technical discussion/presentations on the
  overall architecture:

    http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/

* tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
  xen/xen_initial_domain: check that xen_start_info is initialized
  xen: mark xen_init_IRQ __init
  xen/Makefile: fix dom-y build
  arm: introduce a DTS for Xen unprivileged virtual machines
  MAINTAINERS: add myself as Xen ARM maintainer
  xen/arm: compile netback
  xen/arm: compile blkfront and blkback
  xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
  xen/arm: receive Xen events on ARM
  xen/arm: initialize grant_table on ARM
  xen/arm: get privilege status
  xen/arm: introduce CONFIG_XEN on ARM
  xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
  xen/arm: Introduce xen_ulong_t for unsigned long
  xen/arm: Xen detection and shared_info page mapping
  docs: Xen ARM DT bindings
  xen/arm: empty implementation of grant_table arch specific functions
  xen/arm: sync_bitops
  xen/arm: page.h definitions
  xen/arm: hypercalls
  ...

Merge branch 'next' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc updates from Benjamin Herrenschmidt:
"Some highlights in addition to the usual batch of fixes:

   - 64TB address space support for 64-bit processes by Aneesh Kumar

   - Gavin Shan did a major cleanup & re-organization of our EEH support
     code (IBM fancy PCI error handling & recovery infrastructure) which
     paves the way for supporting different platform backends, along
     with some rework of the PCIe code for the PowerNV platform in order
     to remove home made resource allocations and instead use the
     generic code (which is possible after some small improvements to it
     done by Gavin).

   - Uprobes support by Ananth N Mavinakayanahalli

   - A pile of embedded updates from Freescale folks, including new SoC
     and board supports, more KVM stuff including preparing for 64-bit
     BookE KVM support, ePAPR 1.1 updates, etc..."

Fixup trivial conflicts in drivers/scsi/ipr.c

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
  powerpc/iommu: Fix multiple issues with IOMMU pools code
  powerpc: Fix VMX fix for memcpy case
  driver/mtd:IFC NAND:Initialise internal SRAM before any write
  powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
  powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
  powerpc: Remove tlb batching hack for nighthawk
  powerpc: Set paca->data_offset = 0 for boot cpu
  powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
  powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
  powerpc/mpc85xx: Update interrupt handling for IFC controller
  powerpc/85xx: Enable USB support in p1023rds_defconfig
  powerpc/smp: Do not disable IPI interrupts during suspend
  powerpc/eeh: Fix crash on converting OF node to edev
  powerpc/eeh: Lock module while handling EEH event
  powerpc/kprobe: Don't emulate store when kprobe stwu r1
  powerpc/kprobe: Complete kprobe and migrate exception frame
  powerpc/kprobe: Introduce a new thread flag
  powerpc: Remove unused __get_user64() and __put_user64()
  powerpc/eeh: Global mutex to protect PE tree
  powerpc/eeh: Remove EEH PE for normal PCI hotplug
  ...

Merge git://git./linux/kernel/git/davem/net

Pull networking changes from David Miller:
"The most important bit in here is the fix for input route caching from
  Eric Dumazet, it's a shame we couldn't fully analyze this in time for
  3.6 as it's a 3.6 regression introduced by the routing cache removal.

  Anyways, will send quickly to -stable after you pull this in.

  Other changes of note:

   1) Fix lockdep splats in team and bonding, from Eric Dumazet.

   2) IPV6 adds link local route even when there is no link local
      address, from Nicolas Dichtel.

   3) Fix ixgbe PTP implementation, from Jacob Keller.

   4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.

   5) MAC length computed improperly in VLAN demux, from Antonio
      Quartulli."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
  Remove noisy printks from llcp_sock_connect
  tipc: prevent dropped connections due to rcvbuf overflow
  silence some noisy printks in irda
  team: set qdisc_tx_busylock to avoid LOCKDEP splat
  bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
  sctp: check src addr when processing SACK to update transport state
  sctp: fix a typo in prototype of __sctp_rcv_lookup()
  ipv4: add a fib_type to fib_info
  can: mpc5xxx_can: fix section type conflict
  can: peak_pcmcia: fix error return code
  can: peak_pci: fix error return code
  cxgb4: Fix build error due to missing linux/vmalloc.h include.
  bnx2x: fix ring size for 10G functions
  cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
  ixgbe: add support for X540-AT1
  ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
  ixgbe: fix PTP ethtool timestamping function
  ixgbe: (PTP) Fix PPS interrupt code
  ixgbe: Fix PTP X540 SDP alignment code for PPS signal
  ...

Merge branch 'akpm' (Andrew's patch-bomb)

Merge misc patches from Andrew Morton:
"The MM tree is rather stuck while I wait to find out what the heck is
  happening with sched/numa.  Probably I'll need to route around all the
  code which was added to -next, sigh.

  So this is "everything else", or at least most of it - other small
  bits are still awaiting resolutions of various kinds."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits)
  lib/decompress.c add __init to decompress_method and data
  kernel/resource.c: fix stack overflow in __reserve_region_with_split()
  omfs: convert to use beXX_add_cpu()
  taskstats: cgroupstats_user_cmd() may leak on error
  aoe: update aoe-internal version number to 50
  aoe: update documentation to better reflect aoe-plus-udev usage
  aoe: remove unused code
  aoe: make dynamic block minor numbers the default
  aoe: update and specify AoE address guards and error messages
  aoe: retain static block device numbers for backwards compatibility
  aoe: support more AoE addresses with dynamic block device minor numbers
  aoe: update documentation with new URL and VM settings reference
  aoe: update copyright year in touched files
  aoe: update internal version number to 49
  aoe: remove unused code and add cosmetic improvements
  aoe: increase net_device reference count while using it
  aoe: associate frames with the AoE storage target
  aoe: disallow unsupported AoE minor addresses
  aoe: do revalidation steps in order
  aoe: failover remote interface based on aoe_deadsecs parameter
  ...

lib/decompress.c add __init to decompress_method and data

Fix the warning:

  WARNING: vmlinux.o(.text+0x14cfd8): Section mismatch in reference from the variable compressed_formats to the function .init.text:gunzip()
  The function compressed_formats() references
  the function __init gunzip().
  etc..

Within decompress.c, compressed_formats[] needs 'a __initdata annotation',
because some of it's data members refer to functions which will be
unloaded after init.

Consequently, its user decompress_method() will get the __init prefix.

Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/resource.c: fix stack overflow in __reserve_region_with_split()

Using a recursive call add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep. Convert the recursive calls to an
iterative loop to avoid the problem.

Tested on a machine containing 135 regions. The kernel no longer panicked
with stack overflow.

Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.

Signed-off-by: T Makphaibulchoke <tmac@hp.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

omfs: convert to use beXX_add_cpu()

Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

taskstats: cgroupstats_user_cmd() may leak on error

If prepare_reply() succeeds we have allocated memory for 'rep_skb'. If
nla_reserve() then subsequently fails and returns NULL we fail to release
the memory we allocated, thus causing a leak.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update aoe-internal version number to 50

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update documentation to better reflect aoe-plus-udev usage

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: remove unused code

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: make dynamic block minor numbers the default

Because udev use is so widespread, making the old static mapping the
default is too conservative, given the severe limitations it places on
usable AoE addresses. Storage virtualization and larger shelves have made
the old limitations too confining.

These changes make the dynamic block device minor numbers the default,
removing the limitations on usable AoE addresses.

The static arrangement is still available with aoe_dyndevs=0, and the
aoe-stat tool from the userland aoetools package, the user space
counterpart to the aoe driver, recognizes the case where there is a
mismatch between the minor number in sysfs and the minor number in a
special device file.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update and specify AoE address guards and error messages

In general, specific is better when it comes to messages about AoE usage
problems. Also, explicit checks for the AoE broadcast addresses are
added.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: retain static block device numbers for backwards compatibility

The old mapping between AoE target shelf and slot addresses and the block
device minor number is retained as a backwards-compatible feature, with a
new "aoe_dyndevs" module parameter available for enabling dynamic block
device minor numbers.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: support more AoE addresses with dynamic block device minor numbers

The ATA over Ethernet protocol uses a major (shelf) and minor (slot)
address to identify a particular storage target. These changes remove an
artificial limitation the aoe driver imposes on the use of AoE addresses.
For example, without these changes, the slot address has a maximum of 15,
but users commonly use slot numbers much greater than that.

The AoE shelf and slot address space is often used sparsely. Instead of
using a static mapping between AoE addresses and the block device minor
number, the block device minor numbers are now allocated on demand.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update documentation with new URL and VM settings reference

The old area has a new URL. Also, now that the driver can perform better,
it is worth mentioning the VM settings that help aoe to sink dirty pages
out early, avoiding unecessary memory pressure when much I/O is going on.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update copyright year in touched files

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: update internal version number to 49

The internal version number of the aoe driver appears in a console message
when the driver loads and is usually obtained by the user with the
userland aoe-version tool, part of the aoetools.[1]

Although this patchset includes bugfixes backported from higher-numbered
versions published on the coraid.com website, it is a form of version 49.

1. http://aoetools.sourceforge.net/

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: remove unused code and add cosmetic improvements

This change removes some unused code and attempts to increase code
consistency.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: increase net_device reference count while using it

This change eliminates the danger that the user could rmmod the driver for
a network interface that is being used for AoE by the aoe driver.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: associate frames with the AoE storage target

In the driver code, "target" and aoetgt refer to a particular remote
interface on the AoE storage target. The latter is identified by its AoE
major and minor addresses. Commands that are being sent to an AoE storage
target {major, minor} can be sent or retransmitted to any of the remote
MAC addresses associated with the AoE storage target.

That is, frames are naturally associated with not an aoetgt (AoE major,
AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor).
Making the code reflect that reality simplifies the driver, especially
when the path to a remote MAC address becomes unusable.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: disallow unsupported AoE minor addresses

A guard is inserted to prevent AoE minor addresses (slot addresses) higher
than 15 to be used, as they are not yet supported by the driver.

There is a change coming that will allow the aoe driver to overcome this
limit by using system device minor numbers dynamically, but until then,
this guard prevents unexpected targets from being used by the driver when
AoE targets with high minor numbers are on the AoE network.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: do revalidation steps in order

The discovery process begins with an optional AoE config query command and
an AoE config query response. Normally when an aoe device is already
open, the config query response does not trigger an ATA identify device
command to be sent out, since the response contains storage capacity
information that, if changed, could surprise the user of the device.

The userland "aoe-revalidate" tool uses a character device to trigger an
AoE config query for a particular AoE storage target and an ATA device
identify command, even when the device is open.

This change causes the config query to go out first, reflecting the normal
discovery sequence. The responses could come back in any order, so this
change is fairly cosmetic.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: failover remote interface based on aoe_deadsecs parameter

The aoe_deadsecs module parameter allows the user to specify a hard limit
on the number of seconds an AoE command can be retransmitted before the
AoE block device is considered to have failed.

Using aoe_deadsecs to determine the time we try using a different remote
interface helps to ensure that the hard limit is not reached before we've
tried to recover by sending to a different remote port.

As a data storage target, the AoE target is unambiguously identified by
its {major, minor} AoE address tuple, and an AoE target can have multiple
MAC addresses. However, note that "target" in the driver code and
comments means a {major, minor, MAC address} tuple, as in "somewhere to
send packets".

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: use packets that work with the smallest-MTU local interface

Users with several network interfaces dedicated to AoE generally do not
configure them to support different-sized AoE data payloads on purpose.

For a given AoE target, there will be a set of local network interfaces
that can reach it. Using only the payload that will fit in the
smallest-sized MTU of all those local interfaces greatly simplifies the
driver, especially in failure scenarios.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: use a kernel thread for transmissions

The dev_queue_xmit function needs to have interrupts enabled, so the most
simple way to get the locking right but still fulfill that requirement is
to use a process that can call dev_queue_xmit serially over queued
transmissions.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: become I/O request queue handler for increased user control

To allow users to choose an elevator algorithm for their particular
workloads, change from a make_request-style driver to an
I/O-request-queue-handler-style driver.

We have to do a couple of things that might be surprising. We manipulate
the page _count directly on the assumption that we still have no guarantee
that users of the block layer are prohibited from submitting bios
containing pages with zero reference counts.[1] If such a prohibition now
exists, I can get rid of the _count manipulation.

Just as before this patch, we still keep track of the sk_buffs that the
network layer still hasn't finished yet and cap the resources we use with
a "pool" of skbs.[2]

Now that the block layer maintains the disk stats, the aoe driver's
diskstats function can go away.

1. https://lkml.org/lkml/2007/3/1/374
2. https://lkml.org/lkml/2007/7/6/241

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: kernel thread handles I/O completions for simple locking

Make the frames the aoe driver uses to track the relationship between bios
and packets more flexible and detached, so that they can be passed to an
"aoe_ktio" thread for completion of I/O.

The frames are handled much like skbs, with a capped amount of
preallocation so that real-world use cases are likely to run smoothly and
degenerate gracefully even under memory pressure.

Decoupling I/O completion from the receive path and serializing it in a
process makes it easier to think about the correctness of the locking in
the driver, especially in the case of a remote MAC address becoming
unusable.

[dan.carpenter@oracle.com: cleanup an allocation a bit]
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aoe: for performance support larger packet payloads

tAdd adds the ability to work with large packets composed of a number of
segments, using the scatter gather feature of the block layer (biovecs)
and the network layer (skb frag array).  The motivation is the performance
gained by using a packet data payload greater than a page size and by
using the network card's scatter gather feature.

Users of the out-of-tree aoe driver already had these changes, but since
early 2011, they have complained of increased memory utilization and
higher CPU utilization during heavy writes.[1] The commit below appears
related, as it disables scatter gather on non-IP protocols inside the
harmonize_features function, even when the NIC supports sg.

  commit f01a5236bd4b140198fbcc550f085e8361fd73fa
  Author: Jesse Gross <jesse@nicira.com>
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

With that regression in place, transmits always linearize sg AoE packets,
but in-kernel users did not have this patch.  Before 2.6.38, though, these
changes were working to allow sg to increase performance.

1. http://www.spinics.net/lists/linux-mm/msg15184.html

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nbd: handle discard requests

Add discard support to nbd.  If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client.  The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).

If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server.  When the
discard request is received by the nbd-server, it will perform:

fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

To punch a hole in the backend storage, which is no longer needed.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nbd: add set flags ioctl

Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).

Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.

Also included is a one-line fix to debug output for the set-timeout ioctl.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add destination ID allocation mechanism

Replace the single global destination ID counter with per-net allocation
mechanism to allow independent destID management for each available
RapidIO network. Using bitmap based mechanism instead of counters allows
destination ID release and reuse in systems that support hot-swap.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/rionet: rework to support multiple RIO master ports

Make RIONET driver multi-net safe/capable by introducing per-net lists of
RapidIO network peers. Rework registration of network adapters to support
all available RIO master port devices.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: run discovery as an asynchronous process

Modify mport initialization routine to run the RapidIO discovery process
asynchronously. This allows to have an arbitrary order of enumerating and
discovering ports in systems with multiple RapidIO controllers without
creating a deadlock situation if enumerator port is registered after a
discovering one.

Making netID matching to mportID ensures consistent net ID assignment in
multiport RapidIO systems with asynchronous discovery process (global
counter implementation is affected by race between threads).

[akpm@linux-foundation.org: tweak code layput]
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: use device lists handling on per-net basis

Modify handling of device lists to resolve issues caused by using single
global list of RIO devices during enumeration/discovery. The most common
sign of existing issue is incorrect contents of switch routing tables in
systems with multiple mport controllers while single-port configuration
performs as expected.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: fix blocking wait for discovery ready

The following set of patches provides modifications targeting support of
multiple RapidIO master port (mport) devices on a CPU-side of
RapidIO-capable board.  While the RapidIO subsystem code has definitions
suitable for multi-controller/multi-net support, the existing
implementation cannot be considered ready for multiple mport
configurations.

=========== NOTES: =============

a) The patches below do not address RapidIO side view of multiport
   processing elements defined in Part 6 of RapidIO spec Rev.2.1 (section
   6.4.1).  These devices have Base Device ID CSR (0x60) and Component Tag
   CSR (0x6C) shared by all SRIO ports.  For example, Freescale's P4080,
   P3041 and P5020 have a dual-port SRIO controller implemented according
   the specification.  Enumeration/discovery of such devices from RapidIO
   side may require device-specific fixups.

b) Devices referenced above may also require implementation specific
   code to setup a host device ID for mport device.  These operations are
   not addressed by patches in this package.

=================================

Details about provided patches:

1. Fix blocking wait for discovery ready

   While it does not happen on PowerPC based platforms, there is
   possibility of stalled CPU warning dump on x86 based platforms that run
   RapidIO discovery process if they wait too long for being enumerated.

   Currently users can avoid it by disabling the soft-lockup detector
   using "nosoftlockup" kernel parameter OR by ensuring that enumeration
   is completed before soft-lockup is detected.

   This patch eliminates blocking wait and keeps a scheduler running.
   It also is required for patch 3 below which introduces asynchronous
   discovery process.

2. Use device lists handling on per-net basis

   This patch allows to correctly support multiple RapidIO nets and
   resolves possible issues caused by using single global list of devices
   during RapidIO system enumeration/discovery.  The most common sign of
   existing issue is incorrect contents of switch routing tables in
   systems with multiple mport controllers while single-port configuration
   performs as expected.

   The patch does not eliminate the global RapidIO device list but
   changes some routines in enumeration/discovery to use per-net device
   lists instead.  This way compatibility with upper layer RIO routines is
   preserved.

3.  Run discovery as an asynchronous process

   This patch modifies RapidIO initialization routine to asynchronously
   run the discovery process for each corresponding mport.  This allows
   having an arbitrary order of enumerating and discovering mports without
   creating a deadlock situation if an enumerator port was registered
   after a discovering one.

   On boards with multiple discovering mports it also eliminates order
   dependency between mports and may reduce total time of RapidIO
   subsystem initialization.

   Making netID matching to mportID ensures consistent netID assignment
   in multiport RapidIO systems with asynchronous discovery process
   (global counter implementation is affected by race between threads).

4. Rework RIONET to support multiple RIO master ports

   In the current version of the driver rionet_probe() has comment "XXX
   Make multi-net safe".  Now it is a good time to address this comment.

   This patch makes RIONET driver multi-net safe/capable by introducing
   per-net lists of RapidIO network peers.  It also enables to register
   network adapters for all available mport devices.

5. Add destination ID allocation mechanism

   The patch replaces a single global destination ID counter with
   per-net allocation mechanism to allow independent destID management for
   each available RapidIO network.  Using bitmap based mechanism instead
   of counters allows destination ID release and reuse in systems that
   support hot-swap.

This patch:

Fix blocking wait loop in the RapidIO discovery routine to avoid warning
dumps about stalled CPU on x86 platforms.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: apply RX/TX enable to active switch ports only

Apply port RX/TX enable operations only to active switch ports.

RapidIO specification (Part 6: LP-Serial Physical Layer) recommends to
keep Output Port Enable (TX) and Input Port Enable (RX) control bits in
disabled state (0b0) after device reset. It also allows to have
implementation specific reset state for these bits.

This patch ensures that TX/RX enable action is applied only to active
switch's ports while preserving an initial state of inactive ones.

This patch is intended to keep inactive switch ports with inbound and
outbound packet transfers disabled to block unexpected packets during hot
insertion event. While it does not fix any visible malfunction it is
intended to prevent such events in future.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721: add inbound memory mapping callbacks

Add Tsi721 routines to support RapidIO subsystem's inbound memory mapping
interface (RapidIO to system's local memory).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add inbound memory mapping interface

Add common inbound memory mapping/unmapping interface. This allows to make
local memory space accessible from the RapidIO side using hardware mapping
capabilities of RapidIO bridging devices. The new interface is intended to
enable data transfers between RapidIO devices in combination with DMA engine
support.

This patch is based on patch submitted by Li Yang <leoli@freescale.com>
(https://lists.ozlabs.org/pipermail/linuxppc-dev/2009-April/071210.html)

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rapidio/devices/tsi721.c: fix error return code

Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: fix kerneldoc warnings after DMA support was added

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721: modify mport name assignment

Modify RapidIO mport device name assignment to include device name of PCIe
side of Tsi721 bridge. The new name format is intended to provide
definitive reference between RapidIO and PCIe sides of the bridge in
systems with multiple Tsi721 bridges.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/rionet: fix multicast packet transmit logic

Fix multicast packet transmit logic to account for repetitive transmission
of single skb:
- correct check for available buffers (this bug may produce NULL pointer
  crash dump in case of heavy traffic);
- update skb user count (incorrect user counter causes a warning dump from
  net_tx_action routine during multicast transfers in systems with three or
  more rionet participants).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kdump: remove unneeded include

The inclusion of <generated/utsrelease.h> is unnecessary.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/proc/root.c: use NULL instead of 0 for pointer

This cleanup also fixes the following sparse warning:

fs/proc/root.c:64:45: warning: Using plain integer as NULL pointer

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc_sysctl.c: use BUG_ON instead of BUG

The use of if (!head) BUG(); can be replaced with the single line
BUG_ON(!head).

Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: use kzalloc instead of kmalloc and memset

Part of the memory will be written twice after this change, but that
should be negligible.

[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: no need to initialize proc_inode->fd in proc_get_inode()

proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().

If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: return -ENOMEM when inode allocation failed

If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion. proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: extend core dump note section to contain file names of mapped files

This note has the following format:

long count     -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
   long start
   long end
   long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: add a new elf note with siginfo of the signal

Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.

There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.

This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

compat: move compat_siginfo_t definition to asm/compat.h

This is a preparatory patch for the introduction of NT_SIGINFO elf note.

Make the location of compat_siginfo_t uniform across eight architectures
which have it.  Now it can be pulled in by including asm/compat.h or
linux/compat.h.

Most of the copies are verbatim.  compat_uid[32]_t had to be replaced by
__compat_uid[32]_t.  compat_uptr_t had to be moved up before
compat_siginfo_t in asm/compat.h on a several architectures (tile already
had it moved up).  compat_sigval_t had to be relocated from linux/compat.h
to asm/compat.h.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: pass siginfo_t* to do_coredump() and below, not merely signr

This is a preparatory patch for the introduction of NT_SIGINFO elf note.

With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params. It will be used by the
next patch. Most changes are simple s/signr/siginfo->si_signo/.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: use SUID_DUMPABLE_ENABLED rather than hardcoded 1

Cosmetic. Change setup_new_exec() and task_dumpable() to use
SUID_DUMPABLE_ENABLED for /bin/grep.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: add support for %d=__get_dumpable() in core name

Some coredump handlers want to create a core file in a way compatible with
standard behavior. Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0. However, there was no way for coredump
handler to know that the process being dumped was suid'ed.

This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=787135

Developed during a discussion with Denys Vlasenko.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Alex Kelly <alex.page.kelly@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Jiri Moskovcak <jmoskovc@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: update coredump-related headers

Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c. It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.

Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: make core dump functionality optional

Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump. This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.

CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.

[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: rename whitelist to exception list

This patch replaces the "whitelist" usage in the code and comments and replace
them by exception list related information.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: convert device_cgroup internally to policy + exceptions

The original model of device_cgroup is having a whitelist where all the
allowed devices are listed. The problem with this approach is that is
impossible to have the case of allowing everything but few devices.

The reason for that lies in the way the whitelist is handled internally:
since there's only a whitelist, the "all devices" entry would have to be
removed and replaced by the entire list of possible devices but the ones
that are being denied. Since dev_t is 32 bits long, representing the allowed
devices as a bitfield is not memory efficient.

This patch replaces the "whitelist" by a "exceptions" list and the default
policy is kept as "deny_all" variable in dev_cgroup structure.

The current interface determines that whenever "a" is written to devices.allow
or devices.deny, the entry masking all devices will be added or removed,
respectively. This behavior is kept and it's what will determine the default
policy:

# cat devices.list
a *:* rwm
# echo a >devices.deny
# cat devices.list
# echo a >devices.allow
# cat devices.list
a *:* rwm

The interface is also preserved. For example, if one wants to block only access
to /dev/null:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null
# echo a >devices.allow
# echo "c 1:3 rwm" >devices.deny
# cat /dev/null
cat: /dev/null: Operation not permitted
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 r" >devices.allow
# cat /dev/null
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rw" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rwm" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
#

Note that I didn't rename the functions/variables in this patch, but in the
next one to make reviewing easier.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: introduce dev_whitelist_clean()

This function cleans all the items in a whitelist and will be used by the next
patches.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device_cgroup: add "deny_all" in dev_cgroup structure

deny_all will determine if the default policy is to deny all device access
unless for the ones in the exception list.

This variable will be used in the next patches to convert device_cgroup
internally into a default policy + rules.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>