review.tizen.org Git - platform/upstream/kernel-adaptation-pc.git/log

mlx4_en: Byte Queue Limit support

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Moving to Interrupts for TX completions

Moving to interrupts instead of polling fpr TX completions
Avoiding situations where skb can be held in by the driver for
a long time (till timer expires).
The change is also necessary for supporting BQL.

Removing comp_lock that was required because we could handle TX
completions from several contexts: Interrupts, timer, polling.
Now there is only interrupts

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Added Ethtool support for TX Interrupt coalescing

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: sk_add_backlog() is too agressive for TCP

While investigating TCP performance problems on 10Gb+ links, we found a
tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit
in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends
one ACK every two MSS segments.

A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default
value (87380), allowing a too small backlog.

A TCP ACK, even being small, can consume nearly same truesize space than
outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and
is fast to compute.

Performance results on netperf, single flow, receiver with disabled
GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop
increments at sender.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a limit parameter to sk_add_backlog()

sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.

No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ax25: Fix the build when sysctl support is disabled.

Randy Dunlap <rdunlap@xenotime.net> reported:

> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
> include/net/ax25.h:447:75: error: expected ';' before '}' token
>
> static inline int ax25_register_dev_sysctl(ax25_dev *ax25_dev) { return 0 };
> static inline void ax25_unregister_dev_sysctl(ax25_dev *ax25_dev) {};
>
> First function: move ';' inside braces.
> Second function: drop the ';'.

Put the semicolons where it makes sense.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sysctl: Add place holder functions for when sysctl support is compiled out of the kernel.

Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
>
> ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
> ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
>
> when CONFIG_SYSCTL is not enabled.

Add static inline stub functions to gracefully handle the case when sysctl
support is not present.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix ethtool get settings

ethtool get settings was not displaying all the settings correctly.
use the get_phy_info to get more information about the PHY to fix this.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix build warning after tcp_{v4,v6}_init_sock consolidation.

net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock':
net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable]
net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock':
net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: add GRO support

Replace netif_receive_skb with napi_gro_receive.

Signed-off-by: Jiajun Wu <b06378@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: packet_getsockopt() cleanup

Factorize code, since most fetched values are int type.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()

This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().

Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).

When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow better page reuse in splice(sock -> pipe)

splice() from socket to pipe needs linear_to_page() helper to transfert
skb header to part of page.

We can reset the offset in the current sk->sk_sndmsg_page if we are the
last user of the page.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: add per-port option for enabling/disabling ports

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: allow to enable/disable ports

This patch changes content of hashlist (used to get port struct by
computed index (0...en_port_count-1)). Now the hash list contains only
enabled ports so userspace will be able to say what ports can be used
for tx/rx. This becomes handy when userspace will need to disable ports
which does not belong to active aggregator. By default, newly added port
is enabled.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: lb: let userspace care about port macs

Better to leave this for userspace

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: change big iov allocations

iov of more than 8 entries are allocated in sendmsg()/recvmsg() through
sock_kmalloc()

As these allocations are temporary only and small enough, it makes sense
to use plain kmalloc() and avoid sk_omem_alloc atomic overhead.

Slightly changed fast path to be even faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Repair connection-time negotiated parameters

There are options, which are set up on a socket while performing
TCP handshake. Need to resurrect them on a socket while repairing.
A new sockoption accepts a buffer and parses it. The buffer should
be CODE:VALUE sequence of bytes, where CODE is standard option
code and VALUE is the respective value.

Only 4 options should be handled on repaired socket.

To read 3 out of 4 of these options the TCP_INFO sockoption can be
used. An ability to get the last one (the mss_clamp) was added by
the previous patch.

Now the restore. Three of these options -- timestamp_ok, mss_clamp
and snd_wscale -- are just restored on a coket.

The sack_ok flags has 2 issues. First, whether or not to do sacks
at all. This flag is just read and set back. No other sack info is
saved or restored, since according to the standart and the code
dropping all sack-ed segments is OK, the sender will resubmit them
again, so after the repair we will probably experience a pause in
connection. Next, the fack bit. It's just set back on a socket if
the respective sysctl is set. No collected stats about packets flow
is preserved. As far as I see (plz, correct me if I'm wrong) the
fack-based congestion algorithm survives dropping all of the stats
and repairs itself eventually, probably losing the performance for
that period.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Report mss_clamp with TCP_MAXSEG option in repair mode

The mss_clamp is the only connection-time negotiated option which
cannot be obtained from the user space. Make the TCP_MAXSEG sockopt
report one in the repair mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Repair socket queues

Reading queues under repair mode is done with recvmsg call.
The queue-under-repair set by TCP_REPAIR_QUEUE option is used
to determine which queue should be read. Thus both send and
receive queue can be read with this.

Caller must pass the MSG_PEEK flag.

Writing to queues is done with sendmsg call and yet again --
the repair-queue option can be used to push data into the
receive queue.

When putting an skb into receive queue a zero tcp header is
appented to its head to address the tcp_hdr(skb)->syn and
the ->fin checks by the (after repair) tcp_recvmsg. These
flags flags are both set to zero and that's why.

The fin cannot be met in the queue while reading the source
socket, since the repair only works for closed/established
sockets and queueing fin packet always changes its state.

The syn in the queue denotes that the respective skb's seq
is "off-by-one" as compared to the actual payload lenght. Thus,
at the rcv queue refill we can just drop this flag and set the
skb's sequences to precice values.

When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent,
waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the
protocol POV the send queue looks like it was sent, but the data
between the write_seq and snd_nxt is lost in the network.

This helps to avoid another sockoption for setting the snd_nxt
sequence. Leaving the whole queue in a 'not yet sent' state (as
it will be after sendmsg-s) will not allow to receive any acks
from the peer since the ack_seq will be after the snd_nxt. Thus
even the ack for the window probe will be dropped and the
connection will be 'locked' with the zero peer window.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Initial repair mode

This includes (according the the previous description):

* TCP_REPAIR sockoption

This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/establised sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.

* TCP_REPAIR_QUEUE sockoption

This one sets the queue which we're about to repair. The
'no-queue' is set by default.

* TCP_QUEUE_SEQ socoption

Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).

* Ability to forcibly bind a socket to a port

The sk->sk_reuse is set to SK_FORCE_REUSE.

* Immediate connect modification

The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.

* Silent close modification

The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Move code around

This is just the preparation patch, which makes the needed for
TCP repair code ready for use.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Introduce named constants for sk_reuse

Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: decouple ISA and ISA_DMA_API

The two options are separate, and some platforms (e.g. arm pxa)
have ISA slots but no ISA dma controller, so they cannot build
drivers using the DMA API functions.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sungem: use mdelay instead of udelay where necessary

Some architectures like ARM cannot handle large numbers as
arguments to udelay, so the drivers should use mdelay when
delaying for multiple miliseconds.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

donauboe: replace excessive udelay with msleep

No driver should spin the CPU for 10ms, so better use
an msleep, which is allowed in the ->suspend function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8390: select CRC32 support

The ax88796 driver uses the CRC32 functions, so make sure that
they are actually enabled.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: iwmc3200 depends on EXPERIMENTAL

The iwmc3200 driver selects other code in Kconfig that depends on
EXPERIMENTAL. Kconfig warns about this when CONFIG_EXPERIMENTAL
is not already set, so logically, these options should also
be marked experimental or promoted to stable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: include linux/io.h

The caif_shmcore requires io.h in order to use ioremap, so include that
explicitly to compile in all configurations.

Also add a note about the use of ioremap(), which is not a proper way
to map a DMA buffer into kernel space. It's not completely clear what
the intention is for using ioremap, but it is clear that the result
of ioremap must not simply be accessed using kernel pointers but
should use readl/writel or memcopy_{to,from}io. Assigning the result
of ioremap to a regular pointer that can also be set to something
else is not ok.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: add missing __devexit_p() annotations

Drivers that refer to a __devexit function in an operations
structure need to annotate that pointer with __devexit_p so
replace it with a NULL pointer when the section gets discarded.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

davinci_cpdma: export symbols used by other drivers

The davinci_emac driver can be a module, so the symbols
it needs from the cpdma driver must be exported.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: remove suspicious comment

The time stamping code in this driver appears to have been copied from
the ixp4xx_eth.c driver, including this timing comment. I had actually
measured the time stamp delay on an IXP425, but I really doubt that this
value also applies here.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: run the ptp bpf just once per packet

This patch fixes code which needlessly ran the BPF twice per
packet. Instead, we just run the classifier once and test
whether the packet is any kind of PTP event message.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: correct receive time stamp filtering

This patch fixes the driver so that multicast PTP event messages can
be recognized by the hardware time stamping unit. The station address
register must be set according to the desired transport type.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: do not set the channel control register

We will let the pch_gbe code do that according to the receive time stamp
filter.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: improve coding style

This patch clears up a few coding style issues:

- Makes two function definitions a bit nicer looking.
- Remove unneeded parentheses.
- Simplify macros for register bits.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: export a method to set the receive match address

The code in phc_gbe_main will need to call this method in order to set the
station address register according to the receive time stamping filter.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: reprogram multicast address register on reset

The reset logic after a Rx FIFO overrun will clear the programmed
multicast addresses. This patch fixes the issue by reprogramming the
registers after the reset.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: simplify transmit time stamping flag test

This patch makes logic surrounding the test of the
transmit time stamping flag more readable.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: scale time stamps to nanoseconds

This patch fixes the helper functions that give the transmit and
receive time stamps to return nanoseconds, instead of arbitrary clock
ticks.

[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]

Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove register_net_sysctl_table

All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Delete all remaining instances of ctl_path

We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert all sysctl registrations to register_net_sysctl

This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nf_conntrack_proto to use register_net_sysctl

There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv4: Convert devinet to use register_net_sysctl

Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv6: Convert addrconf to use register_net_sysctl

Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net decnet: Convert to use register_net_sysctl

Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net neighbour: Convert to use register_net_sysctl

Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv6: Don't use sysctl tables with .child entries.

The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net llc: Don't use sysctl tables with .child entries.

The sysctl core no longer natively understands sysctl tables with .child
entries.

Kill the intermediate tables and use register_net_sysctl directly to
remove the need for compatibility code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ax25: Simplify and cleanup the ax25 sysctl handling.

Don't register/unregister every ax25 table in a batch. Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.

This moves ax25 to be a completely modern sysctl user. Registering the
sysctls in just the initial network namespace, removing the use of
.child entries that are no longer natively supported by the sysctl core
and taking advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh

sysctl no longer requires explicit creation of directories. The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.

Delete the ipv4_path and the ipv4_skeleton these are no longer needed.

Directly register the ipv4_route_table.

And since I am an idiot remove the header definitions that I should
have removed in the previous patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv6: Remove unneded registration of an empty net/ipv6/neigh

sysctl no longer requires explicit creation of directories. The neigh
directory is always populated with at least a default entry so this
should cause no user visible changes.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net core: Remove unneded creation of an empty net/core sysctl directory

On the next line we register the net_core_table in net/core which
creates the directory and ensures it exists.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move all of the network sysctls without a namespace into init_net.

This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Kill register_sysctl_rotable

register_sysctl_rotable never caught on as an interesting way to
register sysctls.  My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace.  What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.

That is a very silly way to go.  Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.

The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf

I really don't expect anyone will miss them if they can't read them in a
child user namespace.

CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sysctl: Initialize the network sysctls sooner to avoid problems.

If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough. So to avoid races call net_sysctl_init from sock_init().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sysctl: Register an empty /proc/sys/net

Implementation limitations of the sysctl core won't let /proc/sys/net
reside in a network namespace. /proc/sys/net at least must be registered
as a normal sysctl. So register /proc/sys/net early as an empty directory
to guarantee we don't violate this constraint and hit bugs in the sysctl
implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Implement register_net_sysctl.

Right now all of the networking sysctl registrations are running in a
compatibiity mode.  The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table.  Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.

Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.

I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built.  Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc_net-next' of git://git./linux/kernel/git/paulg/linux

atl1c: remove MDIO_REG_ADDR_MASK in atl1c_mdio_read/write

MDIO_REG_ADDR_MASK is already applied in function
atl1c_write_phy_reg and atl1c_read_phy_reg

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: fix WoL(magic) issue for l2cb 1.1

l2cb 1.1 hardware has a bug for magic wakeup,
the workaround is to add pattern enable.
WoL related registers are refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine atl1c_pcie_patch

bit PCIE_PHYMISC_FORCE_RCV_DET is only for l1c&l2c to fix WoL issue,
other chips set bit5 of REG_MASTER_CTRL --- this way could save more
power than the former, and the bit should be kept all time.
l2cb 1.x has special setting for L0S/L1
l2cb 1.x & l1d 1.x should clear Vendor Message on some platforms,
otherwise it will cause the root complex hang.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine/update ASPM configuration

some platforms(BIOS or OS) may change ASPM configuration in
PCI Express Link Control Register directly and dynamically
regardless the device driver installation.
Checking if ASPM support during the driver init phase by reading
PCI Express Link Contrl Register doesn't make sense.
This refine/update assume L0S/L1 is defalut enabled as hw->ctrl_flags
inited. atl1c_set_aspm will set real configuration based on chip
capability to hardware register.
atl1c_disable_l0s_l1 and register definition of REG_PM_CTRL are
refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: clear bit MASTER_CTRL_CLK_SEL_DIS in atl1c_pcie_patch

bit MASTER_CTRL_CLK_SEL_DIS could be set before enter suspend
clear it after resume to enable pclk(PCIE clock) switch to
low frequency(25M) in some circumstances to save power.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine reg definition of REG_MASTER_CTRL

refine/update register REG_MASTER_CTRL definition according with
hardware spec.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: clear PCIE error status in atl1c_reset_pcie

clear PCIE error status (error log is write-1-clear).
REG_PCIE_UC_SEVERITY is removed as it's a standard pcie register,
and using kernle API to access it.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove dmar_dly_cnt and dmaw_dly_cnt

dmar_dly_cnt and dmaw_dly_cnt aren't used by hardware/driver any more.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: update right threshold for TSO

atl1c_configure_tx used a wrong value of MAX_TX_OFFLOAD_THRESH(9KB)
for TSO threshold.
the right value should be 7KB
Fast Ethernet controller doesn't support Jumbo frame.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: add module parameter for l1c_wait_until_idle

l1c_wait_until_idle is called for serval modules (TXQ/RXQ/TXMAC/RXMAC).
specific moudle have specific idle/busy status in reg REG_IDLE_STATUS.
the previous code return wrongly if all modules are in idle status,
regardless the 'stop' action is applied on individual module.
Refine the reg REG_IDLE_STATUS definition as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: threshold for ASPM is changed based on chip capability

threshold setting to control ASPM for diff chips are different.
currently, all gigabit-capability chips have limited-ASPM under
100M throughput.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: do not fail when probe and there is no csr clk defined

On some platforms, for example where we are doing the bring-up,
the csr clock is not passed from the framework and the Ethernet
device driver is failing when it can work w/o any issues and
using the default values. So this patch just warnings the case
of the csr clock cannot be acquired but w/o failing the probe
step. I have just tested it on ST STiH415 SoC (ARM).

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: verify the dma_cfg platform fields

Recently the dma parameters that can be passed from the platform
have been moved from the plat_stmmacenet_data to the stmmac_dma_cfg.

In case of this new structure is not well allocated the driver can
fails. This is an example how this field is managed in ST platforms

static struct stmmac_dma_cfg gmac_dma_setting = {
.pbl = 32,
};

static struct plat_stmmacenet_data stih415_ethernet_platform_data[] = {
{
.dma_cfg = &gmac_dma_setting,
.has_gmac = 1,
[snip]

This patch so verifies that the dma_cfg passed from the platform.
In case of it is NULL there is no reason that the driver has to fail
and some default values can be passed. These are ok for all the
Synopsys chips and could impact on performances, only.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
cc: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: Move the mdio_register/_unregister in probe/remove

This patch moves the mdio_register/_unregister in probe/remove
functions and this also is required when hibernation on disk
is done.

Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st,com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st,com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: use custom init/exit functions in pm ops

Freeze and restore can call the custom init/exit functions.
Also the patch adds a custom data field that can be used
for storing platform data useful on restore the embedded
setup (e.g. GPIO, SYSCFG).

Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

iwlwifi: Remove inconsistent and redundant declaration

Remove declaration of iwl_alloc_traffic_mem from iwl-agn.h,
from methods that was exposed to support MVM.

MVM doesn't have to use this declaration.

CC: netdev@vger.kernel.org
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Ensure network address change doesn't impact configuration service

Enhances command validation done by TIPC's configuration service so
that it works properly even if the node's network address is changed in
mid-operation. The default node address of <0.0.0> is now recognized as an
alias for "this node" even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Ensure network address change doesn't impact rejected message

Revises handling of a rejected message to ensure that a locally
originated message is returned properly even if the node's network
address is changed in mid-operation. The routine now treats the
default node address of <0.0.0> as an alias for "this node" when
determining where to send a returned message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: handle <0.0.0> as an alias for this node on outgoing msgs

Revises handling of send routines for payload messages to ensure that
they are processed properly even if the node's network address is
changed in mid-operation. The routines now treat the default node
address of <0.0.0> as an alias for "this node" when determining where
to send an outgoing message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: properly handle off-node send requests with invalid addr

There are two send routines that might conceivably be asked by an
application to send a message off-node when the node is still using
the default network address. These now have an added check that
detects this and rejects the message gracefully.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: take lock while updating node network address

The routine that changes the node's network address now takes TIPC's
network lock in write mode while the main address variable and associated
data structures are being changed; this is needed to ensure that the
link subsystem won't attempt to send a message off-node until the sending
port's message header template has been updated with the node's new
network address.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Ensure network address change doesn't impact local connections

Revises routines that deal with connections between two ports on
the same node to ensure the connection is not impacted if the node's
network address is changed in mid-operation. The routines now treat
the default node address of <0.0.0> as an alias for "this node" in
the following situations:

1) Incoming messages destined to a connected port now handle the alias
properly when validating that the message was sent by the expected
peer port, ensuring that the message will be accepted regardless of
whether it specifies the node's old network address or it's current one.

2) The code which completes connection establishment now handles the
alias properly when determining if the peer port is on the same node
as the connected port.

An added benefit of addressing issue 1) is that some peer port
validation code has been relocated to TIPC's socket subsystem, which
means that validation is no longer done twice when a message is
sent to a non-socket port (such as TIPC's configuration service or
network topology service).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: delete duplicate peerport/peernode helper functions

Prior to commit 23dd4cce387124ec3ea06ca30d17854ae4d9b772

    "tipc: Combine port structure with tipc_port structure"

there was a need for the two sets of helper functions.  But
now they are just duplicates.  Remove the globally visible
ones, and mark the remaining ones as inline.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Ensure network address change doesn't impact new port

Re-orders port creation logic so that the initialization of a new
port's message header template occurs while the port list lock is
held. This ensures that a change to the node's network address that
occurs at the same time as the port is being created does not result
in the template identifying the sender using the former network
address. The new approach guarantees that the new port's template is
using the current network address or that it will be updated when
the address changes.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Optimize re-initialization of port message header templates

Removes an unnecessary check in the logic that updates the message
header template for existing ports when a node's network address is
first assigned. There is no longer any need to check to see if the
node's network address has actually changed since the calling routine
has already verified that this is so.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Ensure network address change doesn't impact name table updates

Revises routines that add and remove an entry from a node's name table
so that the publication scope lists are updated properly even if the
node's network address is changed in mid-operation. The routines now
recognize the default node address of <0.0.0> as an alias for "this node"
even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Add routines for safe checking of node's network address

Introduces routines that test whether a given network address is
equal to a node's own network address or if it lies within the node's
own network cluster, and which work properly regardless of whether
the node is using the default network address <0.0.0> or a non-zero
network address that is assigned later on. In essence, these routines
ensure that address <0.0.0> is treated as an alias for "this node",
regardless of which network address the node is actually using.

Old users of the pre-existing more strict match in_own_cluster()
have been accordingly redirected to what is now called
in_own_cluster_exact() --- which does not extend matching to <0,0,0>.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Don't record failed publication attempt as a success

No longer increments counter of number of publications by a node
if an attempt to add a new publication fails. This prevents TIPC from
incorrectly blocking future publications because the configured maximum
number of publications has been reached.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Update node-scope publications when network address is assigned

Ensures that node-scope name publications that exist prior to the
configuration of a node's network address are properly re-initialized
with that address when it is assigned. TIPC's node-scope publications
are now tracked using a publications list like the lists used for
cluster-scope and zone-scope publications so they can be easily updated
when required.

The inclusion of node scope name publications in a conventional publication
list means that they must now also be withdrawn, just like cluster and zone
scope publications are currently withdrawn. So some conditional tests on
scope ==/!= TIPC_NODE_SCOPE are inserted/removed accordingly.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Separate cluster-scope and zone-scope names into distinct lists

Utilizes distinct lists to track zone-scope and cluster-scope names
published by a node. For now, TIPC continues to process the entries
in both lists in the same way; however, an upcoming patch will utilize
the existence of the lists to prevent the sending of cluster-scope names
to nodes that are not part of the local cluster.

To achieve this, an array of publication lists is introduced, so
that they can be iterated over and accessed via publ->scope as
an index where convenient.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

bonding: start slaves with link down for ARP monitor

Initialize slave device link state as down if ARP monitor is
active and net_carrier_ok() returns zero. Also shift initial
value of its last_arp_tx so that it doesn't immediately cause
fake detection of "up" state.

When ARP monitoring is used, initializing the slave device with
up link state can cause ARP monitor to detect link failure
before the device is really up (with igb driver, this can take
more than two seconds).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nf_bridge: remove holes in struct nf_bridge_info

Put use & mask on same location to avoid two holes on 64bit arches

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: dont drop packet in defrag but consume it

When defragmentation is finalized, we clone a packet and kfree_skb() it.

Call consume_skb() to not confuse dropwatch, since its not a drop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: GRO_MERGED_FREE consumes packets

As part of GRO processing, merged skbs should be consumed, not freed, to
not confuse dropwatch/drop_monitor.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dont drop packet but consume it

When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: dccp: dont drop packet but consume it

When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: dont drop packet but consume it

When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: tcp: dont drop packet but consume it

When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: dont drop packet but consume it

When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: dont drop packet but consume it

When we need to reallocate skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>