review.tizen.org Git - platform/kernel/linux-rpi.git/log

r8169:Fix typo in setting RTL8168H PHY PFM mode.

The PHY PFM register is in PHY page 0x0a44 register 0x11, not 0x14.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169:Fix typo in setting RTL8168EP and RTL8168H D3cold PFM mode

The register for setting D3code PFM mode is MISC_1, not DLLPR.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: rely on ppp layer for skb scrubbing

Since 79c441ae505c ("ppp: implement x-netns support"), the PPP layer
calls skb_scrub_packet() whenever the skb is received on the PPP
device. Manually resetting packet meta-data in the L2TP layer is thus
redundant.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sh_eth-remove-BE-desc-support'

Sergei Shtylyov says:

====================
sh_eth: remove unused BE descriptor support

Here's a set of 2 patches against DaveM's 'net-next.git' repo plus the
recently merged to 'net.git' repo fix for the 16-bit descriptor endianness.
We get rid of ~30 LoCs and ~300 bytes of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: get rid of {cpu|edmac}_to_{edmac|cpu}()

Now that {cpu|edmac}_to_{edmac|cpu}() functions boiled down to the mere
{cpu|le32}_to_{le32|cpu}() calls, there's no need for these functions
anymore, so just get rid of them.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: remove EDMAC_BIG_ENDIAN

Commit  71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") added support
for the big-endian EDMAC descriptors. However, it was never used and never
worked right until the recent driver  fixes. I think we now  can just remove
this support,  it was only burdening the driver from the start. It should be
easy to do without disturbing the SH platform code, at least for now...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tilepro: use to_delayed_work

Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-combined-rx-tx-channels'

Michael Chan says:

====================
bnxt_en: Support combined and rx/tx channels.

The bnxt hardware uses a completion ring for rx and tx events.  The driver
has to process the completion ring entries sequentially for the events.
The current code only supports an rx/tx ring pair for each completion ring.
This patch series add support for using a dedicated completion ring for
rx only or tx only as an option configuarble using ethtool -L.

The benefits for using dedicated completion rings are:

1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared.  If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.

2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2.  When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.

3. Using dedicated completion ring, we can adjust the tx and rx coalescing
parameters independently for rx and tx.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Modify ethtool -l|-L to support combined or rx/tx rings.

The driver can support either all combined or all rx/tx rings. The
default is combined, but the user can now select rx/tx rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Modify init sequence to support shared or non shared rings.

Modify ring memory allocation and MSIX setup to support shared or
non shared rings and do the proper mapping. Default is still to
use shared rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.

Add logic to calculate how many shared or non shared rings can be
supported. Default is to use shared rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Re-structure ring indexing and mapping.

In order to support dedicated or shared completion rings, the ring
indexing and mapping are re-structured as below:

1. bp->grp_info[] array index is 1:1 with bp->bnapi[] array index and
completion ring index.

2. rx rings 0 to n will be mapped to completion rings 0 to n.

3. If tx and rx rings share completion rings, then tx rings 0 to m will
be mapped to completion rings 0 to m.

4. If tx and rx rings use dedicated completion rings, then tx rings 0 to
m will be mapped to completion rings n + 1 to n + m.

5. Each tx or rx ring will use the corresponding completion ring index
for doorbell mapping and MSIX mapping.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Check for NULL rx or tx ring.

Each bnxt_napi structure may no longer be having both an rx ring and
a tx ring. Check for a valid ring before using it.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Separate bnxt_{rx|tx}_ring_info structs from bnxt_napi struct.

Currently, an rx and a tx ring are always paired with a completion ring.
We want to restructure it so that it is possible to have a dedicated
completion ring for tx or rx only.

The bnxt hardware uses a completion ring for rx and tx events.  The driver
has to process the completion ring entries sequentially for the rx and tx
events.  Using a dedicated completion ring for rx only or tx only has these
benefits:

1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared.  If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.

2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2.  When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.

3. Using dedicated completion ring, we can adjust the tx and rx coalescing
parameters independently for rx and tx.

The first step is to separate the rx and tx ring structures from the
bnxt_napi struct.

In this patch, an rx ring and a tx ring will point to the same bnxt_napi
struct to share the same completion ring.  No change in ring assignment
and mapping yet.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Refactor bnxt_dbg_dump_states().

By adding 3 separate functions to dump the different ring states.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Merge tag 'pci-v4.4-fixes-3' of git://git./linux/kernel/git/helgaas/pci

Pull PCI bugfix from Bjorn Helgaas:
"Here's another fix for v4.4.

  This fixes 32-bit config reads for the HiSilicon driver.  Obviously
  the driver is completely broken without this fix (apparently it
  actually was tested internally, but got broken somehow in the process
  of upstreaming it).

  Summary:

  HiSilicon host bridge driver
    Fix 32-bit config reads (Dongdong Liu)"

* tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: hisi: Fix hisi_pcie_cfg_read() 32-bit reads

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"Just some missing syscall wire ups"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Wire up mlock2 system call.
sparc: Add all necessary direct socket system calls.

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Prevent XFRM per-cpu counter updates for one namespace from being
    applied to another namespace.  Fix from DanS treetman.

2) Fix RCU de-reference in iwl_mvm_get_key_sta_id(), from Johannes
    Berg.

3) Remove ethernet header assumption in nft_do_chain_netdev(), from
    Pablo Neira Ayuso.

4) Fix cpsw PHY ident with multiple slaves and fixed-phy, from Pascal
    Speck.

5) Fix use after free in sixpack_close and mkiss_close.

6) Fix VXLAN fw assertion on bnx2x, from Yuval Mintz.

7) natsemi doesn't check for DMA mapping errors, from Alexey
    Khoroshilov.

8) Fix inverted test in ip6addrlbl_get(), from ANdrey Ryabinin.

9) Missing initialization of needed_headroom in geneve tunnel driver,
    from Paolo Abeni.

10) Fix conntrack template leak in openvswitch, from Joe Stringer.

11) Mission initialization of wq->flags in sock_alloc_inode(), from
    Nicolai Stange.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close
  net, socket, socket_wq: fix missing initialization of flags
  drivers: net: cpsw: fix error return code
  openvswitch: Fix template leak in error cases.
  sctp: label accepted/peeled off sockets
  sctp: use GFP_USER for user-controlled kmalloc
  qlcnic: fix a loop exit condition better
  net: cdc_ncm: avoid changing RX/TX buffers on MTU changes
  geneve: initialize needed_headroom
  ipv6: honor ifindex in case we receive ll addresses in router advertisements
  addrconf: always initialize sysctl table data
  ipv6/addrlabel: fix ip6addrlbl_get()
  switchdev: bridge: Pass ageing time as clock_t instead of jiffies
  sh_eth: fix 16-bit descriptor field access endianness too
  veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  net: usb: cdc_ncm: Adding Dell DW5813 LTE AT&T Mobile Broadband Card
  net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband Card
  natsemi: add checks for dma mapping errors
  rhashtable: Kill harmless RCU warning in rhashtable_walk_init
  openvswitch: correct encoding of set tunnel action attributes
  ...

sparc: Wire up mlock2 system call.

Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: Add all necessary direct socket system calls.

The GLIBC folks would like to eliminate socketcall support
eventually, and this makes sense regardless so wire them
all up.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-12-31

Here's (probably) the last bluetooth-next pull request for the 4.5
kernel:

- Add support for BCM2E65 ACPI ID
- Minor fixes/cleanups in the bcm203x & bfusb drivers
- Minor debugfs related fix in 6lowpan code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-phy-stats'

Andrew Lunn says:

====================
Ethtool support for phy stats

This patchset add ethtool support for reading statistics from the PHY.
The Marvell and Micrel Phys are then extended to report receiver
packet errors and idle errors.

v2:
  Fix linking when phylib is not enabled.
v3:
  Inline helpers into ethtool.c, so fixing when phylib is a module.
v4:
  Add missing static
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phy: micrel: Add ethtool statistics counters

The PHY counters receiver errors and errors while idle.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: marvell: Add ethtool statistics counters

The PHY counters receiver errors and errors while idle.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Add phy statistics

Ethernet PHYs can maintain statistics, for example errors while idle
and receive errors. Add an ethtool mechanism to retrieve these
statistics, using the same model as MAC statistics.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().

So if sctp_make_abort_user fails to allocate memory, we should abort
the asoc via sctp_primitive_ABORT as well. Just like the annotation in
sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
"Even if we can't send the ABORT due to low memory delete the TCB.
This is a departure from our typical NOMEM handling".

But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
dereference the chunk pointer, and system crash. So we should add
SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
places where it adds SCTP_CMD_REPLY cmd.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2015-12-28' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
iwlwifi

* don't load firmware that won't exist for 7260
* fix RCU splat
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net, socket, socket_wq: fix missing initialization of flags

Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
the current 4.4 release cycle introduced a new flags member in
struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
from struct socket's flags member into that new place.

Unfortunately, the new flags field is never initialized properly, at least
not for the struct socket_wq instance created in sock_alloc_inode().

One particular issue I encountered because of this is that my GNU Emacs
failed to draw anything on my desktop -- i.e. what I got is a transparent
window, including the title bar. Bisection lead to the commit mentioned
above and further investigation by means of strace told me that Emacs
is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
reproducible 100% of times and the fact that properly initializing the
struct socket_wq ->flags fixes the issue leads me to the conclusion that
somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
preventing my Emacs from receiving any SIGIO's due to data becoming
available and it got stuck.

Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
member to zero.

Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2015-12-29

This series contains updates to ixgbe and ixgbevf.

William Dauchy provides a fix for ixgbevf that was implemented for ixgbe,
commit 5d6002b7b822c7 ("ixgbe: Fix handling of NAPI budget when multiple
queues are enabled per vector"). The issue was that the polling routine
would increase the budget for receive to at least 1 per queue if multiple
queues were present, which resulted in receive packets being processed
when the budget was 0.

Emil provides minor cleanups for ixgbevf, one being that we need to
check rx_itr_setting with == and not &, since it is not a mask.  Added
QSFP PHY support in ixgbe to allow for more accurate reporting of port
settings.  Fixed the max RSS limit for X550 which is 63, not 64.

Veola fixes ixgbe ethtool reporting of backplane type interfaces as
1000/10000baseT link modes, instead, report the media as KR, KX or KX4
based on the backplane interface present.

Mark cleans up redundancy in the setting of hw_enc_features that makes
it appear that X550 has more encapsulation features than other devices.
Also do not set NETIF_F_SG any longer since that is set by the
register_netdev() call.  Also fixed the X550EM_x revision check, which
needs to check a value, not just a bit.

Alex Duyck fixes additional bugs in ixgbe_clear_vf_vlans(), one being
that the mask was using a divide instead of a modulus, which resulted
in the mask bit being incorrectly set to 0 or 1 based on the value of
the VF being tested.  Alex also found that he was not consistent in
using the "word" argument as an offset or as a register offset, so
made the code consistently use word as the offset in the array.

v2: dropped patch 8 of the original series, as it was undoing a part of
    the fix Alex Duyck was doing in patch 9 of the original series.
    Dropped based on feedback from Emil (the author).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

The following patch set contains some feature additions, code re-organization
and cleanup and a few non-critical fixes. Pls consider applying this to
the net-next tree. Thanks.

v3 changes: add a default case to the switch statement in patch 5 to
satisfy the compiler (-Wswitch).
v2 changes: replaced an if/else block that checks for error values with a
switch/case statement in patch 5.

Patch 1 fixes VF link state transition from disabled to auto that did
not work due to an issue in the FW. This issue could not be fixed in FW due
to some backward compatibility issues it causes with released drivers.
The issue has been fixed by introducing a new version (v2) of the cmd
from 10.6 FW onwards. This patch adds support for v2 version of this cmd.

Patch 2 reports a EOPNOTSUPP status to ethtool when the user tries to
configure BE3/SRIOV in VEPA mode as it is not supported by the chip.

Patch 3 cleansup FW flash image related constant definitions. Many of these
definitions (such as section offset values) were defined in decimal format
rather than hexa-decimal. This makes this part of the code un-readable.
Also some defines related to BE2 are labeld "g2" and defines related to BE3
are labeled "g3". This patch cleans up all of this to make this code more
readable.

Patch 4 moves the FW cmd code to be_cmds.c. All code relating to FW cmds
has been in be_cmds.[ch], excepting FW flash cmd related code.
This patch moves these routines from be_main.c to be_cmds.c.

Patch 5 adds a log message to report digital signature errors while
flashing a FW image. From FW version 11.0 onwards, the FW supports a new
"secure mode" feature (based on a jumper setting on the adapter.) In this
mode, the FW image when flashed is authenticated with a digital signature.

Patch 6 removes a line of code that has no effect.

Patch 7 removes some unused variables.

Patch 8 fixes port resource descriptor query via the GET_PROFILE FW cmd.
An earlier commit passed a specific pf_num while issuing this cmd as FW
returns descriptors for all functions when pf_num is zero. But, when pf_num
is set to a non-zero value, FW does not return the port resource descriptor.
This patch fixes this by setting pf_num to 0 while issuing the query cmd
and adds code to pick the correct NIC resource descriptor from the list of
descriptors returned by FW.

Patch 9 adds support for ethtool get-dump feature. In the past when this
option was not yet available, this feature was supported via the
--register-dump option as a workaround. This patch removes support for
FW-dump via --register-dump option as it is now available via --get-dump
option. Even though the "ethtool --register-dump" cmd which used to work
earlier, will now fail with ENOTSUPP error, we feel it is not an issue as
this is used only for diagnostics purpose.

Patch 10 bumps up the driver version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: bump up the driver version to 11.0.0.0

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: support ethtool get-dump option

This patch adds support for ethtool's --get-dump option in be2net,
to retrieve FW dump. In the past when this option was not yet available,
this feature was supported via the --register-dump option as a workaround.
This patch removes support for FW-dump via --register-dump option as it is
now available via --get-dump option. Even though the
"ethtool --register-dump" cmd which used to work earlier, will now fail
with ENOTSUPP error, we feel it is not an issue as this is used only
for diagnostics purpose.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix port-res desc query of GET_PROFILE_CONFIG FW cmd

Commit 72ef3a88fa8e ("be2net: set pci_func_num while issuing
GET_PROFILE_CONFIG cmd") passed a specific pf_num while issuing a
GET_PROFILE_CONFIG cmd as FW returns descriptors for all functions when
pf_num is zero. But, when pf_num is set to a non-zero value, FW does not
return the Port resource descriptor.
This patch fixes this by setting pf_num to 0 while issuing the query cmd
and adds code to pick the correct NIC resource descriptor from the list of
descriptors returned by FW.

Fixes: 72ef3a88fa8e ("be2net: set pci_func_num while issuing
GET_PROFILE_CONFIG cmd")
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: remove unused error variables

eeh_error, fw_timeout, hw_error variables in the be_adapter structure are
not used anymore. An earlier patch that introduced adapter->err_flags to
store this information missed removing these variables.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: remove a line of code that has no effect

This patch removes a line of code that changes adapter->recommended_prio
value followed by yet another assignment.
Also, the variable is used to store the vlan priority value that is already
shifted to the PCP bits position in the vlan tag format. Hence, the name of
this variable is changed to recommended_prio_bits.

Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: log digital signature errors while flashing FW image

(based on a jumper setting on the adapter.) In this mode, the FW image when
flashed is authenticated with a digital signature. This patch logs
appropriate error messages and return a status to ethtool when errors
relating to FW image authentication occur.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: move FW flash cmd code to be_cmds.c

All code relating to FW cmds is in be_cmds.[ch] excepting FW flash cmd
related code. This patch moves these routines from be_main.c to be_cmds.c

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: cleanup FW flash image related macro defines

Many constant definitions relating to the FW-image layout
(such as section offset values) were defined in decimal format rather than
hexa-decimal. This makes this part of the code un-readable. Also some
defines related to BE2 are labeld "g2" and defines related to BE3 are
labeled "g3". This patch cleans up all of this to make this code more
readable.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: avoid configuring VEPA mode on BE3

BE3 chip doesn't support VEPA mode.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix VF link state transition from disabled to auto

The VF link state setting transition from "disable" to "auto" does not work
due to a bug in SET_LOGICAL_LINK_CONFIG_V1 cmd in FW. This issue could not
be fixed in FW due to some backward compatibility issues it causes with
some released drivers. The issue has been fixed by introducing a new
version (v2) of the cmd from 10.6 FW onwards. In v2, to set the VF link
state to auto, both PLINK_ENABLE and PLINK_TRACK bits have to be set to 1.

The VF link state setting feature now works on Lancer chips too from
FW ver 10.6.315.0 onwards.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Make the block layer great again.

  Basically three amazing fixes in this pull request, split into 4
  patches.  Believe me, they should go into 4.4.  Two of them fix a
  regression, the third and last fixes an easy-to-trigger bug.

   - Fix a bad irq enable through null_blk, for queue_mode=1 and using
     timer completions.  Add a block helper to restart a queue
     asynchronously, and use that from null_blk.  From me.

   - Fix a performance issue in NVMe.  Some devices (Intel Pxxxx) expose
     a stripe boundary, and performance suffers if we cross it.  We took
     that into account for merging, but not for the newer splitting
     code.  Fix from Keith.

   - Fix a kernel oops in lightnvm with multiple channels.  From Matias"

* 'for-linus' of git://git.kernel.dk/linux-block:
  lightnvm: wrong offset in bad blk lun calculation
  null_blk: use async queue restart helper
  block: add blk_start_queue_async()
  block: Split bios on chunk boundaries

ixgbe: Fix bugs in ixgbe_clear_vf_vlans()

When I had rewritten the code for ixgbe_clear_vf_vlans() it looks like I
had transitioned back and forth between using word as an offset and using
word as a register offset.  As a result I honestly don't see how the code
was working before other than the fact that resetting the VLANs on the VF
like didn't do much to clear them.

Another issue found is that the mask was using a divide instead of a
modulus.  As a result the mask bit was incorrectly being set to either bit
0 or 1 based on the value of the VF being tested.  As a result the wrong
VFs were having their VLANs cleared if they were enabled.

I have updated the code so that word represents the offset in the array.
This way we can use the modulus and xor operations and they will make sense
instead of being performed on a 4 byte aligned value.

I replaced the statement "(word % 2) ^ 1" with "~word % 2" in order to
reduce the line length as the line exceeded 80 characters with the register
name inserted.  The two should be equivalent so the change should be safe.

Reported-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Correct X550EM_x revision check

The X550EM_x revision check needs to check a value, not just a bit.
Use a mask and check the value. Also remove the redundant check
inside the ixgbe_enter_lplu_t_x550em, because it can only be called
when both the mac type and revision check pass.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix RSS limit for X550

X550 allows for up to 64 RSS queues, but the driver can have max
of 63 (-1 MSIX vector for link).

On systems with >= 64 CPUs the driver will set the redirection table
for all 64 queues which will result in packets being dropped.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Clean up redundancy in hw_enc_features

Clean up minor redundancy in the setting of hw_enc_features that
makes it appears that X550 uniquely has more encapsulation features
than other devices. The driver only supports one more feature, so
make it look that way. No longer set NETIF_F_SG since that is set
by the register_netdev call. Thanks to Alex Duyck for noticing this
slight confusion.

Reported-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: report correct media type for KR, KX and KX4 interfaces

Ethtool reports backplane type interfaces as 1000/10000baseT link modes.
This has been corrected to report the media as KR, KX or KX4 based on the
backplane interface present.

Signed-off-by: Veola Nazareth <veola.nazareth@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add support for QSFP PHY types in ixgbe_get_settings()

Add missing QSFP PHY types to allow for more accurate reporting of
port settings.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: minor cleanups for ixgbevf_set_itr()

adapter->rx_itr_setting is not a mask so check it with == instead of &
do not default to 12K interrupts in ixgbevf_set_itr()

There should be no functional effect from these changes.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Fix handling of NAPI budget when multiple queues are enabled per vector

This is the same patch as for ixgbe but applied differently according to
busy polling. See commit 5d6002b7b822c74 ("ixgbe: Fix handling of NAPI
budget when multiple queues are enabled per vector")

Signed-off-by: William Dauchy <william@gandi.net>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"9 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/vmstat: fix overflow in mod_zone_page_state()
  ocfs2/dlm: clear migration_pending when migration target goes down
  mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
  ocfs2: fix flock panic issue
  m32r: add io*_rep helpers
  m32r: fix build failure
  arch/x86/xen/suspend.c: include xen/xen.h
  mm: memcontrol: fix possible memcg leak due to interrupted reclaim
  ocfs2: fix BUG when calculate new backup super

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"Fix for 3.15 breakage of fcntl64() in arm OABI compat.  -stable
  fodder"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [PATCH] arm: fix handling of F_OFD_... in oabi_fcntl64()

mm/vmstat: fix overflow in mod_zone_page_state()

mod_zone_page_state() takes a "delta" integer argument.  delta contains
the number of pages that should be added or subtracted from a struct
zone's vm_stat field.

If a zone is larger than 8TB this will cause overflows.  E.g.  for a
zone with a size slightly larger than 8TB the line

    mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);

in mm/page_alloc.c:free_area_init_core() will result in a negative
result for the NR_ALLOC_BATCH entry within the zone's vm_stat, since 8TB
contain 0x8xxxxxxx pages which will be sign extended to a negative
value.

Fix this by changing the delta argument to long type.

This could fix an early boot problem seen on s390, where we have a 9TB
system with only one node.  ZONE_DMA contains 2GB and ZONE_NORMAL the
rest.  The system is trying to allocate a GFP_DMA page but ZONE_DMA is
completely empty, so it tries to reclaim pages in an endless loop.

This was seen on a heavily patched 3.10 kernel.  One possible
explaination seem to be the overflows caused by mod_zone_page_state().
Unfortunately I did not have the chance to verify that this patch
actually fixes the problem, since I don't have access to the system
right now.  However the overflow problem does exist anyway.

Given the description that a system with slightly less than 8TB does
work, this seems to be a candidate for the observed problem.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2/dlm: clear migration_pending when migration target goes down

We have found a BUG on res->migration_pending when migrating lock
resources.  The situation is as follows.

dlm_mark_lockres_migration
  res->migration_pending = 1;
  __dlm_lockres_reserve_ast
  dlm_lockres_release_ast returns with res->migration_pending remains
      because other threads reserve asts
  wait dlm_migration_can_proceed returns 1
  >>>>>>> o2hb found that target goes down and remove target
          from domain_map
  dlm_migration_can_proceed returns 1
  dlm_mark_lockres_migrating returns -ESHOTDOWN with
      res->migration_pending still remains.

When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending.  So clear migration_pending when target is
down.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()

test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range. pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.

Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.

This also prevents a crash from offlining memory devices with missing
sections. Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix flock panic issue

Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
move flock/posix lock indentify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which caused the following kernel
panic on 4.4.0_rc5.

  kernel BUG at fs/locks.c:1895!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
  CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G           O    4.4.0-rc5-next-20151217 #1
  Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
  task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
  RIP: locks_lock_inode_wait+0x2e/0x160
  Call Trace:
    ocfs2_do_flock+0x91/0x160 [ocfs2]
    ocfs2_flock+0x76/0xd0 [ocfs2]
    SyS_flock+0x10f/0x1a0
    entry_SYSCALL_64_fastpath+0x12/0x71
  Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
  RIP  locks_lock_inode_wait+0x2e/0x160
   RSP <ffff880028b5bce8>
  ---[ end trace dfca74ec9b5b274c ]---

Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

m32r: add io*_rep helpers

m32r allmodconfig was failing with the error:

error: implicit declaration of function 'read'

On checking io.h it turned out that 'read' is not defined but 'readb' is
defined and 'ioread8' will then obviously mean 'readb'.

At the same time some of the helper functions ioreadN_rep() and
iowriteN_rep() were missing which also led to the build failure.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

m32r: fix build failure

m32r allmodconfig is failing with:

  In file included from ../include/linux/kvm_para.h:4:0,
                   from ../kernel/watchdog.c:26:
  ../include/uapi/linux/kvm_para.h:30:26: fatal error: asm/kvm_para.h: No such file or directory

kvm_para.h was not included in the build.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arch/x86/xen/suspend.c: include xen/xen.h

Fix the build warning:

  arch/x86/xen/suspend.c: In function 'xen_arch_pre_suspend':
  arch/x86/xen/suspend.c:70:9: error: implicit declaration of function 'xen_pv_domain' [-Werror=implicit-function-declaration]
          if (xen_pv_domain())
              ^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcontrol: fix possible memcg leak due to interrupted reclaim

Memory cgroup reclaim can be interrupted with mem_cgroup_iter_break()
once enough pages have been reclaimed, in which case, in contrast to a
full round-trip over a cgroup sub-tree, the current position stored in
mem_cgroup_reclaim_iter of the target cgroup does not get invalidated
and so is left holding the reference to the last scanned cgroup.  If the
target cgroup does not get scanned again (we might have just reclaimed
the last page or all processes might exit and free their memory
voluntary), we will leak it, because there is nobody to put the
reference held by the iterator.

The problem is easy to reproduce by running the following command
sequence in a loop:

    mkdir /sys/fs/cgroup/memory/test
    echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
    memhog 150M
    echo $$ > /sys/fs/cgroup/memory/cgroup.procs
    rmdir test

The cgroups generated by it will never get freed.

This patch fixes this issue by making mem_cgroup_iter avoid taking
reference to the current position.  In order not to hit use-after-free
bug while running reclaim in parallel with cgroup deletion, we make use
of ->css_released cgroup callback to clear references to the dying
cgroup in all reclaim iterators that might refer to it.  This callback
is called right before scheduling rcu work which will free css, so if we
access iter->position from rcu read section, we might be sure it won't
go away under us.

[hannes@cmpxchg.org: clean up css ref handling]
Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix BUG when calculate new backup super

When resizing, it firstly extends the last gd.  Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done.  And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: hns: use to_platform_device()

Use to_platform_device() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: solos-pci: use to_pci_dev()

Use to_pci_dev() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: fix error return code

Propagate the return value of platform_get_irq on failure.

A simplified version of the semantic match that finds the two cases where
no error code is returned at all is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix template leak in error cases.

Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.

Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successful. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; However, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.

Fix the issue by acquiring a reference to the template immediately after
allocation.

Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf_hash-locking'

Ming Lei says:

====================
bpf: hash: use per-bucket spinlock

This patchset tries to optimize ebpf hash map, and follows
the idea:

        Both htab_map_update_elem() and htab_map_delete_elem()
        can be called from eBPF program, and they may be in kernel
        hot path, it isn't efficient to use a per-hashtable lock
        in this two helpers, so this patch converts the lock into
        per-bucket spinlock.

With this patchset, looks the performance penalty from eBPF
decreased a lot, see the following test:

        1) run 'tools/biolatency' of bcc before running block test;

        2) run fio to test block throught over /dev/nullb0,
        (randread, 16jobs, libaio, 4k bs) and the test box
        is one 24cores(dual sockets) VM server:
        - without patchset:  607K IOPS
        - with this patchset: 1184K IOPS
        - without running eBPF prog: 1492K IOPS

TODO:
        - remove the per-hashtable atomic counter

V2:
        - fix checking on buckets size
V1:
        - fix the wrong 3/3 patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: hash: use per-bucket spinlock

Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: hash: move select_bucket() out of htab's spinlock

The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: hash: use atomic count

Preparing for removing global per-hashtable lock, so
the counter need to be defined as aotmic_t first.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PATCH] arm: fix handling of F_OFD_... in oabi_fcntl64()

Cc: stable@vger.kernel.org # 3.15+
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lightnvm: wrong offset in bad blk lun calculation

dev->nr_luns reports the total number of luns available in a device
while dev->luns_per_chnl is the number of luns per channel.

When multiple channels are available, the offset is calculated from a
channel and lun id into a linear array. As it multiplies with
the total number of luns, we go out of bound when channel id > 0 and
causes the kernel to panic when we read a protected kernel memory area.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
"Three late 4.4-rc fixes.

  The first two were very small in terms of number of lines, the third
  is more lines of change than I like this late in the cycle, but there
  are positive test results from Avagotech and from my own test setup
  with the target hardware, and given the problem was a 100% failure
  case, I sent it through.

   - A previous patch updated the mlx4 driver to use vmalloc when there
     was not enough memory to get a contiguous region large enough for
     our needs, so we need kvfree() whenever we free that item.  We
     missed one place, so fix that now.

   - A previous patch added code to match incoming packets against a
     specific device, but failed to compensate for devices that have
     both InfiniBand and Ethernet ports.  Fix that.

   - Under certain vlan conditions, the ocrdma driver would fail to
     bring up any vlan interfaces and would print out a circular locking
     failure.  Fix that"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  RDMA/be2net: Remove open and close entry points
  RDMA/ocrdma: Depend on async link events from CNA
  RDMA/ocrdma: Dispatch only port event when port state changes
  RDMA/ocrdma: Fix vlan-id assignment in qp parameters
  IB/mlx4: Replace kfree with kvfree in mlx4_ib_destroy_srq
  IB/cma: cma_match_net_dev needs to take into account port_num

null_blk: use async queue restart helper

If null_blk is run in NULL_IRQ_TIMER mode and with queue_mode NULL_Q_RQ,
we need to restart the queue from the hrtimer interrupt. We can't
directly invoke the request_fn from that context, so punt the queue run
to async kblockd context.

Tested-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: add blk_start_queue_async()

We currently only have an inline/sync helper to restart a stopped
queue. If drivers need an async version, they have to roll their
own. Add a generic helper instead.

Signed-off-by: Jens Axboe <axboe@fb.com>

Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a bug in the algif_skcipher interface that can trigger a
  kernel WARN_ON from user-space.  It does so by using the new skcipher
  interface which unlike the previous ablkcipher does not need to create
  extra geniv objects which is what was used to trigger the WARN_ON"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_skcipher - Use new skcipher interface

Merge branch 'for-linus2' of git://git./linux/kernel/git/jmorris/linux-security

Pull key handling bugfix from James Morris:
"Fix a race between keyctl_read() and keyctl_revoke()"

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Fix race between read and revoke

RDMA/be2net: Remove open and close entry points

Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issueing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

In order to resolve the above deadlock condition, ocrdma intorduced a
patch to stop listening to administrative open/close events generated from
be2net driver. It now depends on link-state-change async-event generated from
CNA. This change leaves behind dead code which used to generate administrative
open/close events. This patch cleans-up all that dead code from be2net.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: Depend on async link events from CNA

Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements a logic
to receive async-link-events generated from CNA whenever link-state-change
is detected. Now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to IB-stack.

Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: Dispatch only port event when port state changes

Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: Fix vlan-id assignment in qp parameters

vlan-id is wrongly getting as 0 when PFC is enabled.
Set vlan-id configured by user in QP parameters.
In case vlan interface is not used, flash a warning to
user to configure vlan and assign vlan-id as 0 in qp params.

Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Patches for net-next.

Mainly clean-ups, optimizations, and updating to the latest firmware
interface spec.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add BCM57301 & BCM57402 devices.

Added the PCI IDs for the BCM57301 and BCM57402 controllers.

Signed-off-by: David Christensen <davidch@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Update to Firmware interface spec 1.0.0.

This interface will be forward compatible with future changes.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Keep track of the ring group resource.

Newer firmware will return the ring group resource when we call
hwrm_func_qcaps(). To be compatible with older firmware, use the
number of tx rings as the number of ring groups if the older firmware
returns 0. When determining how many rx rings we can support, take
the ring group resource in account as well in _bnxt_get_max_rings().
Divide and assign the ring groups to VFs.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Improve VF resource accounting.

We need to keep track of all resources, such as rx rings, tx rings,
cmpl rings, rss contexts, stats contexts, vnics, after we have
divided them for the VFs. Otherwise, subsequent ring changes on
the PF may not work correctly.

We adjust all max resources in struct bnxt_pf_info after they have been
assigned to the VFs. There is no need to keep the separate
max_pf_tx_rings and max_pf_rx_rings.

When SR-IOV is disabled, we call bnxt_hwrm_func_qcaps() to restore the
max resources for the PF.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Cleanup bnxt_hwrm_func_cfg().

1. Use local variable pf for repeated access to this pointer.

2. The 2nd argument num_vfs was unnecessarily declared as pointer to int.
This function doesn't change num_vfs so change the argument to int.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Check hardware resources before enabling NTUPLE.

The hardware resources required to enable NTUPLE varies depending on
how many rx channels are configured.  We need to make sure we have the
resources before we enable NTUPLE.  Add bnxt_rfs_capable() to do the
checking.

In addition, we need to do the same checking in ndo_fix_features().  As
the rx channels are changed using ethtool -L, we call
netdev_update_features() to make the necessary adjustment for NTUPLE.

Calling netdev_update_features() in netif_running() state but before
calling bnxt_open_nic() would be a problem.  To make this work,
bnxt_set_features() has to be modified to test for BNXT_STATE_OPEN for
the true hardware state instead of checking netif_running().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Don't treat single segment rx frames as GRO frames.

If hardware completes single segment rx frames, don't bother setting
up all the GRO related fields. Pass the SKB up as a normal frame.

Reviewed-by: vasundhara volam <vvolam@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Allocate rx_cpu_rmap only if Accelerated RFS is enabled.

Also, no need to check for bp->rx_nr_rings as it is always >= 1. If the
allocation fails, it is not a fatal error and we can still proceed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Increment checksum error counter only if NETIF_F_RXCSUM is set.

rx_l4_csum_error is now incremented only when offload is enabled

Signed-off-by: Satish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add support for upgrading APE/NC-SI firmware via Ethtool FLASHDEV

NC-SI firmware of type apeFW (10) is now supported.

Signed-off-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Optimize ring alloc and ring free functions.

Remove the unnecessary "if" statement before the "for" statement:

if (x) {
for (i = 0; i < x; i++)
...
}

Also, change the ring free function to return void as it only returns 0.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: support hwrm_func_drv_unrgtr command

During remove_one, the driver should issue hwrm_func_drv_unrgtr
command to inform firmware that this function has been unloaded.
This is to let firmware keep track of driver present/absent state
when driver is gracefully unloaded. A keep alive timer is needed
later to keep track of driver state during abnormal shutdown.

Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: constify qlcnic_dcb_ops structures

The qlcnic_dcb_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-RTL8168H-PHY-fixes'

Chunhao Lin says:

====================
r8169: Update RTL8168H PHY parameters

Fix typo in setting PHY parameter and update the way of reading PHY register
"rg_saw_cnt".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169:Update the way of reading RTL8168H PHY register "rg_saw_cnt"

The vlaue of RTL8168H PHY register "rg_saw_cnt" only valid from bit0 to bit13.
When read this register, add bitwise-anding its value with 0x3fff.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169:Fix typo in setting RTL8168H PHY parameter

In function "rtl8168h_2_hw_phy_config", there is a typo in setting
RTL8168H PHY parameter.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: label accepted/peeled off sockets

Accepted or peeled off sockets were missing a security label (e.g.
SELinux) which means that socket was in "unlabeled" state.

This patch clones the sock's label from the parent sock and resolves the
issue (similar to AF_BLUETOOTH protocol family).

Cc: Paul Moore <pmoore@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: use GFP_USER for user-controlled kmalloc

Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
missed two other spots.

For connectx, as it's more likely to be used by kernel users of the API,
it detects if GFP_USER should be used or not.

Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Driver for IBM System i/p VNIC protocol

This is a new device driver for a high performance SR-IOV assisted virtual
network for IBM System p and IBM System i systems. The SR-IOV VF will be
attached to the VIOS partition and mapped to the Linux client via the
hypervisor's VNIC protocol that this driver implements.

This driver is able to perform basic tx and rx, new features
and improvements will be added as they are being developed and tested.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>