platform/kernel/linux-starfive.git
Andrew Lunn [Sun, 30 Jul 2017 20:41:46 +0000 (22:41 +0200)]
net: phy: marvell: consolidate RGMII delay code

The same code is repeated for different PHY versions. Put it into a
helper and call it when needed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 30 Jul 2017 20:41:45 +0000 (22:41 +0200)]
net: phy: marvell: Use core genphy_soft_reset()

Rather than using an open coded equivalent, use the core
genphy_soft_reset() function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 30 Jul 2017 20:41:44 +0000 (22:41 +0200)]
net: phy: marvell: tabification

Convert spaces to tabs where appropriate, and fix up some otherwise
odd indentation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 29 Jul 2017 19:18:41 +0000 (22:18 +0300)]
mv643xx_eth: fix of_irq_to_resource() error check

of_irq_to_resource() has recently been fixed to return negative error
numbers along with 0 in case of failure; however, the Marvell MV643xx
Ethernet driver still regards only 0 as an invalid IRQ. Fix it up.
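
A minimal sketch of the check described above, not the driver's actual
code. fake_irq_lookup() is a hypothetical stand-in for a lookup that
returns a positive IRQ on success and either 0 or a negative errno value
on failure, so the caller has to reject both.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IRQ lookup: returns a positive IRQ number
 * on success, and 0 or a negative errno value on failure. */
int fake_irq_lookup(int index)
{
	if (index == 0)
		return 42;	/* a valid IRQ */
	if (index == 1)
		return 0;	/* legacy "no IRQ" result */
	return -EINVAL;		/* newer negative error result */
}

/* Old-style check: only 0 is treated as invalid, so -EINVAL slips through. */
bool irq_valid_old(int irq)
{
	return irq != 0;
}

/* Fixed check: anything that is not strictly positive is invalid. */
bool irq_valid_fixed(int irq)
{
	return irq > 0;
}
```

With the old check a negative error code would be accepted as an IRQ
number; the fixed check rejects it.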

Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 00:53:07 +0000 (17:53 -0700)]
net: bcmgenet: Add dependency on HAS_IOMEM && OF

The driver needs CONFIG_HAS_IOMEM and OF to be functional, but we still
let it build with COMPILE_TEST. This fixes the unmet dependency after
selecting MDIO_BCM_UNIMAC in the commit mentioned below:

warning: (NET_DSA_BCM_SF2 && BCMGENET) selects MDIO_BCM_UNIMAC which has
unmet direct dependencies (NETDEVICES && MDIO_DEVICE && HAS_IOMEM &&
OF_MDIO)

Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 16:47:50 +0000 (09:47 -0700)]
MAINTAINERS: Add more files to the PHY LIBRARY section

Include missing files that are provided by, used by, or directly
maintained within the PHY LIBRARY. This includes the uapi header, header
files used by Device Tree code, etc.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 28 Jul 2017 20:27:44 +0000 (23:27 +0300)]
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()

Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.

Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().

To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0

Fix this by checking for the presence of 'in_dev' before referencing it.
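
The shape of such a check can be sketched as follows. This is only an
illustration: the struct names, the ignore_linkdown field and the helper
are hypothetical simplifications, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct in_device_sketch {
	int ignore_linkdown;
};

struct net_device_sketch {
	struct in_device_sketch *ip_ptr;	/* NULL once the inet device is gone */
};

/* Returns 1 if routes should be ignored on link down, 0 otherwise.
 * The NULL test comes first, so a device without an 'in_dev' is handled
 * instead of being dereferenced. */
int ignore_routes_with_linkdown(const struct net_device_sketch *dev)
{
	const struct in_device_sketch *in_dev = dev->ip_ptr;

	if (!in_dev)
		return 0;
	return in_dev->ignore_linkdown != 0;
}
```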

Fixes: 982acb97560c ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 28 Jul 2017 18:58:36 +0000 (11:58 -0700)]
net: phy: Correctly process PHY_HALTED in phy_stop_machine()

Marc reported that the PHY library adjust_link() callback function does
not run when calling phy_stop() + phy_disconnect(). This is because we
set the state machine to PHY_HALTED but never get to run it to process
this state past that point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 28 Jul 2017 17:28:21 +0000 (10:28 -0700)]
tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS

Add the following stats into SCM_TIMESTAMPING_OPT_STATS control msg:
    TCP_NLA_PACING_RATE
    TCP_NLA_DELIVERY_RATE
    TCP_NLA_SND_CWND
    TCP_NLA_REORDERING
    TCP_NLA_MIN_RTT
    TCP_NLA_RECUR_RETRANS
    TCP_NLA_DELIVERY_RATE_APP_LMT

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 28 Jul 2017 17:28:20 +0000 (10:28 -0700)]
tcp: extract the function to compute delivery rate

Refactor the code to extract the function to compute delivery rate.
This function will be used in a later commit.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Cave-Ayland [Thu, 27 Jul 2017 16:26:00 +0000 (17:26 +0100)]
sunhme: fix up GREG_STAT and GREG_IMASK register offsets

Update the values to match those from the STP2002QFP documentation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Gonzalez [Fri, 28 Jul 2017 11:18:30 +0000 (13:18 +0200)]
net: phy: Log only PHY state transitions

In the current code, old and new PHY states are always logged.
From now on, log only PHY state transitions.

Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Jul 2017 21:44:34 +0000 (14:44 -0700)]
Merge branch 'mlxsw-Various-small-fixes'

Jiri Pirko says:

====================
mlxsw: Various small fixes

This patch series is to contribute several fixes for nits that I noticed while
working on mlxsw. The changes range from typo fixes to local improvements of
the code and have little in common besides being small in scope.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:30 +0000 (09:27 +0200)]
mlxsw: spectrum_router: Simplify a piece of code

Express the same logic more succinctly.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:29 +0000 (09:27 +0200)]
mlxsw: spectrum_router: Clarify a piece of code

Prefer a logical operator that expresses the intent to a bitwise one
that happens to give the same result.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:28 +0000 (09:27 +0200)]
mlxsw: spectrum_router: Simplify a piece of code

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:27 +0000 (09:27 +0200)]
mlxsw: reg.h: Namespace IP2ME registers

This renames IP2ME-specific registers reg_ralue_v and
reg_ralue_tunnel_ptr to reg_ralue_ip2me_*.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:26 +0000 (09:27 +0200)]
mlxsw: Update specification of reg_ritr_type

The comments really belong to the individual enumerators. The comment
at the register should instead reference the enum.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:25 +0000 (09:27 +0200)]
mlxsw: spectrum_router: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:24 +0000 (09:27 +0200)]
mlxsw: reg.h: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 31 Jul 2017 07:27:23 +0000 (09:27 +0200)]
mlxsw: spectrum_acl: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Jul 2017 21:40:59 +0000 (14:40 -0700)]
Merge branch 'bcmgenet-utilize-MDIO-unimac-driver'

Florian Fainelli says:

====================
net: bcmgenet: utilize MDIO unimac driver

This patch series migrates the Broadcom GENET driver to use the mdio-bcm-unimac
driver. This MDIO HW is the same as the one GENET internally embeds, yet for
historical reasons the two drivers lived their own lives. Because of the GENET
interrupt situation, we let it specify how it wants to signal MDIO operations
completion using its driver-private waitqueue.

The diffstat is not super impressive, but it's still negative! This would
make it easier in the future to absorb possible workarounds/bugs/features
within the same location.

This was tested on BCM7260 (GENETv5, single instance), BCM7439 (GENETv4, triple
instance) and BCM7445 (bcm_sf2 + mdio-bcm-unimac).

We also now have a nice /proc/iomem output:

f0b00000-f0b0fc4b : /rdb/ethernet@f0b00000
  f0b00e14-f0b00e1c : unimac-mdio.0
f0b20000-f0b2fc4b : /rdb/ethernet@f0b20000
  f0b20e14-f0b20e1c : unimac-mdio.1
f0b40000-f0b4fc4b : /rdb/ethernet@f0b40000
  f0b40e14-f0b40e1c : unimac-mdio.2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:28 +0000 (12:04 -0700)]
net: bcmgenet: Utilize bcmgenet_mii_exit() for error path

bcmgenet_mii_init() has an error path which is strictly identical to the
unwinding that bcmgenet_mii_exit() does, so have bcmgenet_mii_init()
utilize bcmgenet_mii_exit() for that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:27 +0000 (12:04 -0700)]
net: bcmgenet: Drop legacy MDIO code

Now that we have fully migrated to the mdio-bcm-unimac driver, drop the
legacy MDIO bus code which did duplicate a fair amount of code.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:26 +0000 (12:04 -0700)]
net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver

Update the GENET driver to register a UniMAC MDIO bus controller for
the GENET internal MDIO bus, and update the platform data code to attach
the PHY to the correct MDIO bus controller.

The Device Tree portion of the code is mostly left unmodified since the
lookup/binding is done via phandles and Device Tree nodes which are much
more flexible in locating and binding PHYs to their respective MDIO bus
controllers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:25 +0000 (12:04 -0700)]
net: phy: mdio-bcm-unimac: Allow specifying platform data

In preparation for having the bcmgenet driver migrate over to the
mdio-bcm-unimac driver, add a platform data structure which allows
passing integration-specific details such as the bus name, the wait
function used to complete MDIO operations, and the PHY mask.

We also define what the platform device name contract is by defining
UNIMAC_MDIO_DRV_NAME and moving it to the platform_data header.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:24 +0000 (12:04 -0700)]
net: phy: mdio-bcm-unimac: Add debug print for PHY workaround

In order to be strictly identical to what bcmgenet does, add a debug
print when a PHY workaround during bus->reset() is executed. This is a
preliminary change towards moving bcmgenet to mdio-bcm-unimac.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:23 +0000 (12:04 -0700)]
net: phy: mdio-bcm-unimac: create unique bus names

In preparation for having multiple GENET instances in a system (up to
3), make sure that we do include the bus instance number in the name of
the MDIO bus such that we change it from "unimac-mdio" to
"unimac-mdio-0" for instance.

So far, the only user of this driver is using Device Tree, which uses a
lookup/parenting based technique to map PHY devices to their respective
MDIO bus controllers, hence causing no additional changes.
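
As a rough illustration of the naming scheme, the instance number can be
appended to the base name with snprintf(). The helper name and its
truncation handling are assumptions made for this sketch; only the
"unimac-mdio-<n>" format comes from the text above.

```c
#include <stdio.h>

/* Builds "unimac-mdio-<id>" into buf.
 * Returns 0 on success, -1 if the name did not fit (hypothetical helper). */
int make_bus_name(char *buf, size_t len, int id)
{
	int n = snprintf(buf, len, "unimac-mdio-%d", id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling make_bus_name(buf, sizeof(buf), 0) with a large enough buffer
leaves "unimac-mdio-0" in buf.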

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 19:04:22 +0000 (12:04 -0700)]
net: phy: mdio-bcm-unimac: factor busy polling loop

Factor out the code that does the busy polling on the MDIO_BUSY bit since
we will have different code paths for completion depending on whether
we are using interrupts or polling.
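
A factored-out polling helper could look roughly like the sketch below.
The bit position, the error constant, the callback type and the fake
device are all hypothetical, and the delay between reads that a real
driver would have is omitted.

```c
#include <stdint.h>

#define SK_MDIO_BUSY	(1u << 29)	/* hypothetical bit position */
#define SK_ETIMEDOUT	110		/* stand-in for a timeout errno */

/* Hypothetical register accessor: returns the current command register. */
typedef uint32_t (*sk_read_cmd_fn)(void *ctx);

/* Polls until the busy bit clears or 'max_polls' reads have been made.
 * Returns 0 on completion, -SK_ETIMEDOUT otherwise. */
int sk_mdio_poll(sk_read_cmd_fn read_cmd, void *ctx, unsigned int max_polls)
{
	while (max_polls--) {
		if (!(read_cmd(ctx) & SK_MDIO_BUSY))
			return 0;
	}
	return -SK_ETIMEDOUT;
}

/* Fake device for demonstration: reports busy for 'remaining' reads. */
struct sk_fake_dev {
	unsigned int remaining;
};

uint32_t sk_fake_read(void *ctx)
{
	struct sk_fake_dev *dev = ctx;

	if (dev->remaining) {
		dev->remaining--;
		return SK_MDIO_BUSY;
	}
	return 0;
}
```

Keeping the wait behind one function means an interrupt-driven variant
can replace the loop without touching the read and write paths.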

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Jul 2017 21:37:50 +0000 (14:37 -0700)]
Merge branch 'tcp-remove-prequeue-and-header-prediction'

Florian Westphal says:

====================
tcp: remove prequeue and header prediction

During a hallway discussion with Eric Dumazet at Netdev 1.2 in
Tokyo some maybe-not-so-useful-anymore TCP stack features came up,
among these header prediction and prequeueing.

In brief, TCP prequeue assumes a single-process-blocking-read design,
which is not that common anymore. The most frequently used high-performance
networking program that is an excellent fit for these features is netperf.

The idea behind prequeueing is to move part of tcp processing, including
retransmit queue cleaning, to process context.

With (e)poll designs, prequeue is always skipped, so for such programs
this is dead-code removal.

Header prediction is also less useful nowadays.
For packet trains, GRO will do packet aggregation so we no longer get
the per-packet benefit that this had before GRO.

Because of SACK, header prediction also will be ineffective once
a connection suffers even light packet losses.

Code removal aside, after this change processing always occurs in BH
context. This allows experimenting with, for example, bulk freeing of
skb heads when incoming ACKs clean packets from the retransmit queue.

There are no changes since the RFC, except in the last patch (I missed
another no-longer-used mib counter). I also edited a few commit messages.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:23 +0000 (03:57 +0200)]
tcp: remove unused mib counters

These were used by tcp prequeue and header prediction.
TCPFORWARDRETRANS use was removed in January.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:22 +0000 (03:57 +0200)]
tcp: remove CA_ACK_SLOWPATH

re-indent tcp_ack, and remove CA_ACK_SLOWPATH; it is always set now.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:21 +0000 (03:57 +0200)]
tcp: remove header prediction

Like prequeue, I am not sure this is overly useful nowadays.

If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:20 +0000 (03:57 +0200)]
tcp: remove low_latency sysctl

Was only checked by the removed prequeue code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:19 +0000 (03:57 +0200)]
tcp: reindent two spots after prequeue removal

These two branches are now always true, remove the conditional.
objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:18 +0000 (03:57 +0200)]
tcp: remove prequeue support

prequeue is a tcp receive optimization that moves part of rx processing
from bh to process context.

This only works if the socket being processed belongs to a process that
is blocked in recv on that socket.

In practice, this doesn't happen that often anymore because nowadays
servers tend to use an event-driven (epoll) model.

Even normal client applications (web browsers) commonly use many tcp
connections in parallel.

This has measurable impact only in netperf (which uses plain recv and
thus allows prequeue use) from host to a locally running VM (~4%);
however, there were no changes when using netperf between two physical
hosts with ixgbe interfaces.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 31 Jul 2017 21:03:05 +0000 (14:03 -0700)]
Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "Several cgroup bug fixes.

   - cgroup core was calling a migration callback on empty migrations,
     which could make cpuset crash.

   - There was a very subtle bug where the controller interface files
     aren't created directly when cgroup2 is mounted. Because later
     operations create them, this bug didn't get noticed earlier.

   - Failed writes to cgroup.subtree_control were incorrectly returning
     zero"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix error return value from cgroup_subtree_control()
  cgroup: create dfl_root files on subsys registration
  cgroup: don't call migration methods if there are no tasks to migrate

Linus Torvalds [Mon, 31 Jul 2017 20:37:28 +0000 (13:37 -0700)]
Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fixes from Tejun Heo:
 "Two notable fixes.

   - While adding NUMA affinity support to unbound workqueues, the
     assumption that an unbound workqueue with max_active == 1 is
     ordered was broken.

     The plan was to use explicit alloc_ordered_workqueue() for those
     cases. Unfortunately, I forgot to update the documentation properly
     and we grew a handful of use cases which depend on that assumption.

     While we want to convert them to alloc_ordered_workqueue(), we
     don't really lose anything by enforcing ordered execution on
     unbound max_active == 1 workqueues and it doesn't make sense to
     risk subtle bugs. Restore the assumption.

   - Workqueue assumes that CPU <-> NUMA node mapping remains static.

     This is a general assumption - we don't have any synchronization
     mechanism around CPU <-> node mapping. Unfortunately, powerpc may
     change the mapping dynamically leading to crashes. Michael added a
     workaround so that we at least don't crash while powerpc hotplug
     code gets updated"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Work around edge cases for calc of pool's cpumask
  workqueue: implicit ordered attribute should be overridable
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

Linus Torvalds [Mon, 31 Jul 2017 20:33:21 +0000 (13:33 -0700)]
Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
 "Dan found a really old bug where libata hotplug code wasn't sanitizing
  index value from userland and may end up indexing with a negative
  number. It is scary but fortunately can only be triggered by root.

  Other than that, minor fixes"
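
The class of bug described in the pull message (an index from userland
used without checking for negative values) can be illustrated with the
sketch below. The types, the array size and the function are
hypothetical and are not the libata code.

```c
#include <stddef.h>

#define SK_MAX_DEVICES 2	/* hypothetical table size */

struct sk_dev {
	int id;
};

struct sk_link {
	struct sk_dev device[SK_MAX_DEVICES];
};

/* Returns the device at 'devno', or NULL when the index is out of range.
 * Checking only the upper bound would let a negative index through. */
struct sk_dev *sk_find_dev(struct sk_link *link, int devno)
{
	if (devno < 0 || devno >= SK_MAX_DEVICES)
		return NULL;
	return &link->device[devno];
}
```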

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix a couple of doc build warnings
  libata: array underflow in ata_find_dev()
  ata: sata_rcar: add gen[23] fallback compatibility strings
  libata: remove unused rc in ata_eh_handle_port_resume
  libata: Cleanup ata_read_log_page()
  ata: fix gemini Kconfig dependencies

Jonathan Corbet [Sun, 30 Jul 2017 22:16:04 +0000 (16:16 -0600)]
libata: fix a couple of doc build warnings

The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

  ./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
  ./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
  ./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
  ./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Loic Poulain [Sat, 29 Jul 2017 17:32:31 +0000 (19:32 +0200)]
Bluetooth: hci_uart: Fix uninitialized alignment value

Force alignment value to the default one (1 byte) if uninitialized.
This fixes the hci_ll serdev driver (alignment = 0) and avoids any
further issues with upcoming drivers.
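
A small sketch of why a default is needed, assuming the alignment is
later used as a modulus when computing padding. Both helpers are
hypothetical and only illustrate the idea.

```c
/* Returns the alignment to use: 0 is treated as "uninitialized" and
 * replaced by the default of 1 byte. */
unsigned int sk_effective_alignment(unsigned int alignment)
{
	return alignment ? alignment : 1;
}

/* Number of padding bytes needed to round 'len' up to the alignment.
 * Without the default above, alignment == 0 would be a division by zero
 * in the modulo operations. */
unsigned int sk_padding(unsigned int len, unsigned int alignment)
{
	unsigned int a = sk_effective_alignment(alignment);

	return (a - (len % a)) % a;
}
```

With the default in place, an alignment of 0 behaves like 1 and no
padding is ever added.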

Signed-off-by: Loic Poulain <loic.poulain@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
David S. Miller [Mon, 31 Jul 2017 02:28:08 +0000 (19:28 -0700)]
Merge branch 'net-sched-actions-improve-dump-performance'

Jamal Hadi Salim says:

====================
net sched actions: improve dump performance

Changes since v11:
------------------
1) Jiri - renames: nla_value to value and nla_selector to selector
2) Jiri - rename: validate_nla_bitfield_32 to validate_nla_bitfield32
3) Jiri - rename: NLA_BITFIELD_32 to NLA_BITFIELD32
4) Jiri - remove unnecessary break when we return in case statement
5) Jiri - rename and move nla_get_bitfield_32 to an earlier patch
6) Jiri - xmas tree alignment of var declaration
7) Jiri - rename all declarations of bitfield 32 vars to be consistent ("bf")
8) Jiri - improve validate_nla_bitfield32() validation to disallow valid
          bit values that are not selected by the selector
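
The validation rule in item 8 can be sketched as below. The struct
layout is assumed from the field names mentioned in item 1 ("value" and
"selector"); the function name, the valid_mask parameter and the -1
return value are assumptions made for this illustration.

```c
#include <stdint.h>

/* Layout assumed from the names in the changelog. */
struct sk_bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Returns 0 if the bitfield is acceptable, -1 otherwise.
 * 'valid_mask' lists the bits the receiver understands. */
int sk_validate_bitfield32(const struct sk_bitfield32 *bf, uint32_t valid_mask)
{
	/* Reject selector bits the receiver does not know about. */
	if (bf->selector & ~valid_mask)
		return -1;
	/* Reject unknown value bits as well. */
	if (bf->value & ~valid_mask)
		return -1;
	/* Reject value bits that the selector does not select. */
	if (bf->value & ~bf->selector)
		return -1;
	return 0;
}
```

The last test is the one item 8 refers to: a bit may be valid in
general and still be rejected because the selector did not select it.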

Changes since v10:
-----------------
1) Jiri: move type->validate_content() to its own patch
Jamal: decided to remove it altogether so we can get this patch set in.

2) Change name of NLA_FLAG_BITS to NLA_BITFIELD_32 based on discussions
with D. Ahern and Jiri. D. Ahern suggests making this a variable bitmap size.
My analysis at this point is that it is too complex and I only need a few bit
flags. If we run out of bits someone else can create a new NLA_BITFIELD_XXX
and start using that. So please let this go.

3) Jamal - Add Suggested-by: Jiri for type NLA_BITFIELD_32

4) Jiri: Change name allowed_flags to tcaa_root_flags_allowed

5) Jiri: Introduce nla_get_flag_bits_values() helper instead of using
memcpy for retrieving nla_bitfield_32 fields.

Changes since v9:
-----------------

1) General consensus:
- remove again the use of BIT() to maintain uapi consistency ;->

2) Jiri:
- Add a new netlink type NLA_FLAG_BITS to check for valid bits
  and use it instead of inline vetting (patch 4/4 now)

Changes since v8:
-----------------

1) Jiri:
- Add back the use of BIT(). Eventually fix iproute2 instead
- Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS

Changes since v7:
-----------------

Jamal:
No changes.
Patch 1 went out twice. Resend without two copies of patch 1

Changes since v6:
-----------------

1) DaveM:
New rules for netlink messages. From now on we are going to start
checking for bits that are not used and rejecting anything we don't
understand. In the future this is going to require major changes
to user space code (tc etc). This is just a start.

To quote, David:
"
 Again, bits you aren't using now, make sure userspace doesn't
   set them.  And if it does, reject.
"
Added checks for ensuring things work as above.

2) Jiri:
a) Fix the commit message to properly use "Fixes" description
b) Align assignments for nla_policy

Changes since v5:
----------------

0)
Remove use of BIT() because it is kernel specific. Requires a separate
patch (Jiri can submit that in his cleanups)

1) To paraphrase Eric D.

"memcpy(nla_data(count_attr), &cb->args[1], sizeof(u32));
won't work on 64-bit BE machines because cb->args[1]
(which is 64 bits) is larger in size than sizeof(u32)"

Fixed
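
The problem Eric points out can be reproduced in isolation. In this
sketch a uint64_t models the 64-bit cb->args[1] slot; the function names
are hypothetical, and the assignment variant is one possible way to
avoid the issue, not necessarily what the series did.

```c
#include <stdint.h>
#include <string.h>

/* Copies the first sizeof(uint32_t) bytes of a 64-bit slot, as in the
 * quoted memcpy(). This yields the low half on little-endian machines
 * and the high half on big-endian ones. */
uint32_t sk_count_via_memcpy(const uint64_t *slot)
{
	uint32_t out;

	memcpy(&out, slot, sizeof(out));
	return out;
}

/* Converts by value first, so the result does not depend on byte order. */
uint32_t sk_count_via_assignment(const uint64_t *slot)
{
	return (uint32_t)*slot;
}
```

For a small count stored in the slot, the memcpy() variant returns the
count on little-endian hosts and 0 on big-endian hosts, while the
assignment variant returns the count on both.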

2) Jiri Pirko

i) Spotted a bug fix mixed in the patch for wrong TLV
fix. Add patch 1/3 to address this. Make part of this
series because of dependencies.

ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON

iii) Satisfy Jiri's obsession against the noun "tcaa"
a) Rename struct nlattr *tcaa --> struct nlattr *tb
b) Rename TCAA_ACT_XXX -> TCA_ROOT_XXX

Changes since v4:
-----------------

1) Eric D.

pointed out that when all skb space is used up by the dump
there will be no space to insert the TCAA_ACT_COUNT attribute.

2) Jiri:

i) Change:

enum {
        TCAA_UNSPEC,
        TCAA_ACT_TAB,
        TCAA_ACT_FLAGS,
        TCAA_ACT_COUNT,
        TCAA_ACT_TIME_FILTER,
        __TCAA_MAX
};

to:
enum {
       TCAA_UNSPEC,
       TCAA_ACT_TAB,
       TCAA_ACT_FLAGS,
       TCAA_ACT_COUNT,
       __TCAA_MAX,
};

Jiri plans to followup with the rest of the code to make the
style consistent.

ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA

iii) Rename variable jiffy_filter --> jiffy_since
iv) Rename msecs_filter --> msecs_since
v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]

Earlier Changes
----------------
- Jiri mostly on names of things.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:52 +0000 (13:24 -0400)]
net sched actions: add time filter for action dumping

This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.

With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now".  The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they don't specify this attribute.
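
The conversion and comparison described above can be sketched as
follows. The tick rate SK_HZ and all function names are hypothetical,
and the millisecond conversion is a simplified stand-in for the kernel's
msecs_to_jiffies().

```c
#include <stdbool.h>

#define SK_HZ 250UL	/* hypothetical tick rate */

/* Converts milliseconds to ticks, rounding up (simplified, may overflow
 * for very large inputs). */
unsigned long sk_msecs_to_ticks(unsigned long msecs)
{
	return (msecs * SK_HZ + 999UL) / 1000UL;
}

/* True if 'last_used' is at or after 'since'. The signed difference keeps
 * the comparison correct when the tick counter wraps around. */
bool sk_used_since(unsigned long last_used, unsigned long since)
{
	return (long)(last_used - since) >= 0;
}

/* Decides whether an action is dumped for a "since delta_msecs ago" query. */
bool sk_dump_action(unsigned long now, unsigned long last_used,
		    unsigned long delta_msecs)
{
	unsigned long since = now - sk_msecs_to_ticks(delta_msecs);

	return sk_used_since(last_used, since);
}
```

With SK_HZ at 250, a delta of 120000 ms is 30000 ticks, so at tick
100000 only actions last used at tick 70000 or later are returned.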

An example at installation time (we have 400 actions bound to 400
filters), using an updated tc and setting the time of interest to
120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Let's see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
  match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
....

that coffee took long, no? It was good.

Now let's ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

And the filter?

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
  match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:51 +0000 (13:24 -0400)]
net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many actions as fit
within the constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesn't set this flag
doesn't get the larger (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such, the user space app knows how
long to iterate (independent of the type of action being dumped)
instead of using a hardcoded maximum of 32, thus maintaining backward compat.
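
As a rough illustration of the batching described above, the sketch below picks the per-batch iteration count from a reported count when one is present and falls back to the legacy limit of 32 otherwise. The helper names and the `have_count` parameter are hypothetical and are not taken from the kernel or from tc.

```c
#define LEGACY_BATCH_MAX 32	/* the old per-batch limit named in the text */

/*
 * Hypothetical helper: how many actions to iterate over in one batch.
 * If the kernel reported a count, trust it; otherwise assume the
 * legacy maximum so older kernels keep working.
 */
static unsigned int batch_iterations(int have_count, unsigned int reported)
{
	return have_count ? reported : LEGACY_BATCH_MAX;
}

/* Number of batches needed to dump `total` actions, rounding up. */
static unsigned long batches_needed(unsigned long total, unsigned int per_batch)
{
	return (total + per_batch - 1) / per_batch;
}
```

With the legacy limit, `batches_needed(1500000, 32)` comes to 46875 batches, which gives a feel for why larger batches help.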

Some results dumping 1.5M actions below:
first an unpatched tc which doesn't understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79

Now let's see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96

That is about an 8x performance improvement for a tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: Use proper root attribute table for actions
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:50 +0000 (13:24 -0400)]
net sched actions: Use proper root attribute table for actions

Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.

Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet netlink: Add new type NLA_BITFIELD32
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:49 +0000 (13:24 -0400)]
net netlink: Add new type NLA_BITFIELD32

Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.

The value is a bitmap that defines the bit values being set.
The selector is a bitmask that defines which value bit is to be
considered.

A check is made to enforce the rule that a kernel subsystem always
conforms to bitflags the kernel already knows about, i.e.
if the user tries to set a bit flag that is not understood then
_it will be rejected_.

In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },

where myvalidflags is the bit mask of the flags the kernel understands.

If the user _does not_ provide myvalidflags then the attribute will
also be rejected.

Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
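
The value/selector semantics above can be sketched in plain C. This is a user-space illustration, not the kernel's validation code: the struct and function names are local to the sketch, and the rule that a value bit must also be selected is an assumption. The "bit 1" and "bit 2" of the examples are read here as the masks 0x1 and 0x2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-field layout as described: a value bitmap plus a selector mask. */
struct bitfield32 {
	uint32_t value;		/* desired state of each selected bit */
	uint32_t selector;	/* which bits of value are to be considered */
};

/* Reject anything that touches flags outside valid_flags. */
static bool bitfield32_valid(const struct bitfield32 *bf, uint32_t valid_flags)
{
	if (valid_flags == 0)			/* no valid-flags mask supplied */
		return false;
	if (bf->selector & ~valid_flags)	/* selects an unknown flag */
		return false;
	if (bf->value & ~valid_flags)		/* sets an unknown flag */
		return false;
	if (bf->value & ~bf->selector)		/* assumed rule: value bit not selected */
		return false;
	return true;
}

/* Selected bits take the attribute's value; other bits keep their state. */
static uint32_t bitfield32_apply(uint32_t flags, const struct bitfield32 *bf)
{
	return (flags & ~bf->selector) | (bf->value & bf->selector);
}
```

With a valid mask of 0x3, `{ .value = 0x0, .selector = 0x1 }` clears flag 0x1 and `{ .value = 0x2, .selector = 0x2 }` sets flag 0x2, while any attribute touching 0x8 is rejected.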

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: fec: Allow reception of frames bigger than 1522 bytes
Andrew Lunn [Sun, 30 Jul 2017 17:36:05 +0000 (19:36 +0200)]
net: fec: Allow reception of frames bigger than 1522 bytes

The FEC Receive Control Register has a 14 bit field indicating the
longest frame that may be received. It is being set to 1522. Frames
longer than this are discarded, but counted as being in error.

When using DSA, frames from the switch have an additional header,
either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
of 1522 bytes received by the switch on a port becomes 1530 bytes when
passed to the host via the FEC interface.

Change the maximum receive size to 2048 - 64, where 64 is the maximum
rx_alignment applied on the receive buffer for AVB capable FEC
cores. Use this value also for the maximum receive buffer size. The
driver is already allocating a receive SKB of 2048 bytes, so this
change should not have any significant effects.
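
The size arithmetic in this message can be checked with a small sketch. All macro and function names below are made up for illustration and are not the driver's; only the numbers come from the text above.

```c
#include <stdbool.h>

#define ETH_FRAME_MAX_VLAN	1522	/* full MTU frame size, per the text */
#define DSA_TAG_MAX		8	/* largest Marvell DSA header mentioned */
#define RX_SKB_SIZE		2048	/* size of the receive SKB allocation */
#define RX_ALIGN_MAX		64	/* maximum rx_alignment on AVB capable cores */
#define MAX_FL_FIELD_MAX	((1u << 14) - 1)	/* largest 14 bit value */

/* New maximum receive length: SKB size minus worst-case alignment. */
static unsigned int fec_max_rx_len(void)
{
	return RX_SKB_SIZE - RX_ALIGN_MAX;
}

/* True if a frame of `len` bytes is accepted and the limit fits the field. */
static bool frame_fits(unsigned int len)
{
	unsigned int limit = fec_max_rx_len();

	return len <= limit && limit <= MAX_FL_FIELD_MAX;
}
```

The limit works out to 1984 bytes, so a 1530 byte DSA-tagged frame is accepted while still fitting in the 14 bit register field.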

Tested on imx51, imx6, vf610.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: fec: Issue error for missing but expected PHY
Andrew Lunn [Sun, 30 Jul 2017 20:11:06 +0000 (22:11 +0200)]
net: fec: Issue error for missing but expected PHY

If the PHY is missing but expected, e.g. because of a typ0 in the dt
file, it is not possible to open the interface. ip link returns:

RTNETLINK answers: No such device

It is not very obvious what the problem is. Add a netdev_err() in this
case to make it easier to debug the issue.

[   21.409385] fec 2188000.ethernet eth0: Unable to connect to phy
RTNETLINK answers: No such device

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dsa-lan9303-Fix-MDIO-issues'
David S. Miller [Mon, 31 Jul 2017 02:23:29 +0000 (19:23 -0700)]
Merge branch 'dsa-lan9303-Fix-MDIO-issues'

Egil Hjelmeland says:

====================
net: dsa: lan9303: Fix MDIO issues.

This series fixes the MDIO interface for the lan9303 DSA driver.
Bugs found after testing on actual HW.

This series is extracted from the first patch of my first large
series. Significant changes from that version are:
 - use mdiobus_write_nested, mdiobus_read_nested.
 - EXPORT lan9303_indirect_phy_ops

Unfortunately I do not have access to an i2c based system for
testing.

Changes from first version:
 - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: MDIO access phy registers directly
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:56 +0000 (19:58 +0200)]
net: dsa: lan9303: MDIO access phy registers directly

Indirect access (PMI) to phy registers only works in I2C mode. In
MDIO mode phy registers must be accessed directly. Introduced
struct lan9303_phy_ops to handle the two modes.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: Renamed indirect phy access functions
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:55 +0000 (19:58 +0200)]
net: dsa: lan9303: Renamed indirect phy access functions

Preparing for the following fix of MDIO phy access:

Renamed functions that access PHY 1 and 2 indirectly through PMI
registers.

 lan9303_port_phy_reg_wait_for_completion() to
 lan9303_indirect_phy_wait_for_completion()

 lan9303_port_phy_reg_read() to
 lan9303_indirect_phy_read()

 lan9303_port_phy_reg_write() to
 lan9303_indirect_phy_write()

Also changed "val" parameter of lan9303_indirect_phy_write() to u16,
for clarity.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: Multiply by 4 to get MDIO register
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:54 +0000 (19:58 +0200)]
net: dsa: lan9303: Multiply by 4 to get MDIO register

lan9303_mdio_write()/_read() must multiply register number by 4 to get
offset.
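
A minimal sketch of the conversion, assuming only the factor of 4 stated above; the helper name is hypothetical and is not a function from the driver.

```c
#include <stdint.h>

/* Hypothetical helper: turn a register number into its offset (n * 4). */
static inline uint32_t reg_num_to_offset(uint32_t reg_num)
{
	return reg_num << 2;
}
```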

Added some comments to the register definitions.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:53 +0000 (19:58 +0200)]
net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO

Handle the case where an MDIO read with no response returns 0xffff.
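
One way to express such a check is sketched below. The helper name is hypothetical, and treating a value of 0 as "no PHY" as well is an assumption of this sketch, not something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a PHY answered a register read.
 * 0xffff is what an unanswered MDIO read returns; 0 is additionally
 * treated as "nothing there" (assumption).
 */
static bool phy_responded(uint16_t reg_val)
{
	return reg_val != 0 && reg_val != 0xffff;
}
```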

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoLinux 4.13-rc3
Linus Torvalds [Sun, 30 Jul 2017 19:40:36 +0000 (12:40 -0700)]
Linux 4.13-rc3

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2017 19:19:35 +0000 (12:19 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - prevent the kernel from using the EFI reboot method when EFI is
     disabled.

   - two patches addressing clang issues"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable the address-of-packed-member compiler warning
  x86/efi: Fix reboot_mode when EFI runtime services are disabled
  x86/boot: #undef memcpy() et al in string.c

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2017 18:54:08 +0000 (11:54 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "Two patches addressing build warnings caused by inconsistent kernel
  doc comments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Clean up some documentation warnings
  sched/core: Fix some documentation build warnings

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2017 18:52:15 +0000 (11:52 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for performance counters and kprobes:

   - a series of small patches which make the uncore performance
     counters on Skylake server systems work correctly

   - add a missing instruction slot release to the failure path of
     kprobes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Release insn_slot in failure path
  perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
  perf/x86/intel/uncore: Fix SKX CHA event extra regs
  perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
  perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
  perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
  perf/x86/intel/uncore: Fix Skylake UPI PMU event masks

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2017 18:27:33 +0000 (11:27 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "Fix for a regression caused by the conversion of x86 to the generic
  hotplug code.

  Instead of doing a plain single line revert, this adds a pile of
  comments so the semantics of the force argument are clear"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"

6 years agobpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
Daniel Borkmann [Fri, 28 Jul 2017 15:05:25 +0000 (17:05 +0200)]
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len

bpf_prog_size(prog->len) is not the correct length we want to dump
back to user space. The code in bpf_prog_get_info_by_fd() uses this
to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
includes the size of struct bpf_prog itself plus program instructions
and is usually used either in context of accounting or for bpf_prog_alloc()
et al, thus we copy out of bounds in bpf_prog_get_info_by_fd()
potentially. Use the correct bpf_prog_insn_size() instead.
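
The difference between the two sizes can be shown with stand-in types. The structs and function names below are simplified mock-ups for illustration and are not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct insn { uint64_t raw; };	/* stand-in for one 8 byte instruction */

struct prog {			/* stand-in for the program object */
	uint32_t len;		/* number of instructions */
	uint32_t other_state[7];	/* placeholder for the rest of the header */
	struct insn insns[];	/* instructions follow the header */
};

/* Whole-object size: header plus instructions (allocation, accounting). */
static size_t prog_total_size(uint32_t len)
{
	return sizeof(struct prog) + (size_t)len * sizeof(struct insn);
}

/* Instruction-only size: the bytes that may safely be copied from insns[]. */
static size_t prog_insn_size(const struct prog *p)
{
	return (size_t)p->len * sizeof(struct insn);
}
```

Copying `prog_total_size()` bytes starting at `insns` would run past the end of the instruction array by the size of the header, which is the out-of-bounds copy described above.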

Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: avoid bogus gcc-7 array-bounds warning
Arnd Bergmann [Fri, 28 Jul 2017 14:41:37 +0000 (16:41 +0200)]
tcp: avoid bogus gcc-7 array-bounds warning

When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
false-positive warning:

net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
                                        ^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

I have opened a gcc bug for this, but distros have already shipped
compilers with this problem, and it's not clear yet whether there is
a way for gcc to avoid the warning. As the problem is related to the
bitfield access, this introduces a temporary variable to store the old
enum value.
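
The workaround pattern can be sketched with a simplified stand-in for the socket structure; the type and function names are shortened for illustration and are not the kernel's.

```c
#include <stdint.h>

enum chrono {
	CHRONO_UNSPEC,
	CHRONO_BUSY,
	CHRONO_RWND_LIMITED,
	CHRONO_SNDBUF_LIMITED,
};

struct sock_stats {
	unsigned int chrono_type : 2;	/* current enum chrono, kept in a bitfield */
	uint32_t chrono_start;		/* when the current type started */
	uint32_t chrono_stat[3];	/* one slot per non-UNSPEC type */
};

static void chrono_set(struct sock_stats *tp, enum chrono new_type, uint32_t now)
{
	/* Read the bitfield once into a plain local before indexing. */
	const enum chrono old = (enum chrono)tp->chrono_type;

	if (old > CHRONO_UNSPEC)
		tp->chrono_stat[old - 1] += now - tp->chrono_start;
	tp->chrono_start = now;
	tp->chrono_type = new_type;
}
```

The index expression now uses the local `old`, which has already been checked to be greater than `CHRONO_UNSPEC`, rather than the bitfield itself.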

I did not notice this warning earlier, since UBSAN is disabled when
building with COMPILE_TEST, and that was always turned on in both
allmodconfig and randconfig tests.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ethtool-fec'
David S. Miller [Sun, 30 Jul 2017 06:23:45 +0000 (23:23 -0700)]
Merge branch 'ethtool-fec'

Roopa Prabhu says:

====================
ethtool: support for forward error correction mode setting on a link

Forward Error Correction (FEC) modes, i.e. Base-R
and Reed-Solomon modes, are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provide the ability to manage supported FEC
modes, and the lack of FEC encoding control and reporting today is a
source of interoperability issues for many vendors.
FEC capability as well as specific FEC mode, i.e. Base-R
or RS modes, can be requested or advertised through bits D44:47 of base link
codeword.

This patch set intends to provide an option under ethtool to manage and
report FEC encoding settings for networking devices as per IEEE 802.3
bj, bm and by specs.

v2 :
        - minor patch format fixes and typos pointed out by Andrew
        - there was a pending discussion on the use of 'auto' vs
          'automatic' for fec settings. I have left it as 'auto'
          because in most cases today auto is used in place of
          automatic to represent automatically generated values.
          We use it in other networking config too. I would prefer
          leaving it as auto.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: ethtool forward error correction management support
Casey Leedom [Thu, 27 Jul 2017 23:47:28 +0000 (16:47 -0700)]
cxgb4: ethtool forward error correction management support

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: core hardware/firmware support for Forward Error Correction on a link
Casey Leedom [Thu, 27 Jul 2017 23:47:27 +0000 (16:47 -0700)]
cxgb4: core hardware/firmware support for Forward Error Correction on a link

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethtool: add support for forward error correction modes
Vidya Sagar Ravipati [Thu, 27 Jul 2017 23:47:26 +0000 (16:47 -0700)]
net: ethtool: add support for forward error correction modes

Forward Error Correction (FEC) modes, i.e. Base-R
and Reed-Solomon modes, are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provide the ability to manage supported FEC
modes, and the lack of FEC encoding control and reporting today is a
source of interoperability issues for many vendors.
FEC capability as well as specific FEC mode, i.e. Base-R
or RS modes, can be requested or advertised through bits D44:47 of
base link codeword.

This patch set intends to provide an option under ethtool to manage
and report FEC encoding settings for networking devices as per
IEEE 802.3 bj, bm and by specs.

set-fec/show-fec option(s) are designed to provide control and
report the FEC encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off    :  Turning off any encoding
RS     :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we are expecting FEC to be negotiated as on or off
  as long as protocol supports it
- if the hardware is capable of detecting the FEC encoding on its
  receiver it will reconfigure its encoder to match
- in absence of the above, the configuration would be set to IEEE
  defaults.

From our understanding, this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.
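
As an illustration only, the four encoding keywords of the set-fec option could be mapped to flag bits along the following lines. The enum, its values and the function name are hypothetical; they are not the ethtool UAPI definitions, and case-insensitive matching is an assumption.

```c
#include <strings.h>

/* Hypothetical flag values, one bit per encoding keyword. */
enum fec_flag {
	FEC_AUTO  = 1 << 0,
	FEC_OFF   = 1 << 1,
	FEC_RS    = 1 << 2,
	FEC_BASER = 1 << 3,
};

/* Map an "encoding" argument to its flag, or -1 if it is not recognised. */
static int fec_parse_encoding(const char *arg)
{
	if (!strcasecmp(arg, "auto"))
		return FEC_AUTO;
	if (!strcasecmp(arg, "off"))
		return FEC_OFF;
	if (!strcasecmp(arg, "RS"))
		return FEC_RS;
	if (!strcasecmp(arg, "BaseR"))
		return FEC_BASER;
	return -1;
}
```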

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
    Supported ports: [ FIBRE ]
    Supported link modes:   40000baseCR4/Full
                            40000baseSR4/Full
                            40000baseLR4/Full
                            100000baseSR4/Full
                            100000baseCR4/Full
                            100000baseLR4_ER4/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: [RS | BaseR | None | Not reported]
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: [RS | BaseR | None | Not reported]
<<<< One or more FEC modes
    Speed: 100000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 106
    Transceiver: internal
    Auto-negotiation: off
    Link detected: yes

This patch includes the following changes:
a) New ETHTOOL_GFECPARAM/ETHTOOL_SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
  are defined so that users can configure these fec modes for supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com>
Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'wireless-drivers-for-davem-2017-07-28' of git://git.kernel.org/pub/scm...
David S. Miller [Sat, 29 Jul 2017 22:30:08 +0000 (15:30 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-07-28' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.13

Two fixes for brcmfmac, the crash was reported by two people
already so it's a high priority fix.

brcmfmac

* fix a crash in skb headroom handling in v4.13-rc1
* fix a memory leak due to a merge error in v4.6
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'netvsc-minor-fixes-and-optimization'
David S. Miller [Sat, 29 Jul 2017 22:25:44 +0000 (15:25 -0700)]
Merge branch 'netvsc-minor-fixes-and-optimization'

Stephen Hemminger says:

====================
netvsc: minor fixes and optimization

This is a subset of earlier submission with a few more fixes
found during testing. There are two small optimizations, one is to
better manage the receive completion ring, and the other is removing
one unneeded level of indirection.

Will submit the improved VF support and buffer sizing in a later
patch so they get more review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: signal host if receive ring is emptied
stephen hemminger [Fri, 28 Jul 2017 15:59:47 +0000 (08:59 -0700)]
netvsc: signal host if receive ring is emptied

Latency improvement related to NAPI conversion.
If all packets are processed from the receive ring then we need
to signal the host.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: fix error unwind on device setup failure
stephen hemminger [Fri, 28 Jul 2017 15:59:46 +0000 (08:59 -0700)]
netvsc: fix error unwind on device setup failure

If setting receive buffer fails, the error unwind would cause
kernel panic because it was not correctly doing RCU and NAPI
unwind.  RCU'd pointer needs to be reset to NULL, and NAPI needs
to be disabled not deleted.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: optimize receive completions
stephen hemminger [Fri, 28 Jul 2017 15:59:45 +0000 (08:59 -0700)]
netvsc: optimize receive completions

Optimize how receive completion rings are managed.
   * Allocate only as many slots as needed for all buffers from host
   * Allocate before setting up sub channel for better error detection
   * Don't need to keep copy of initial receive section message
   * Precompute the watermark for when receive flushing is needed
   * Replace division with conditional test
   * Replace atomic per-device variable with per-channel check.
   * Handle corner case where receive completion send
     fails if ring buffer to host is full.
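
The message does not show which expression was changed. One common form of "replace division with conditional test" is advancing a ring index without the modulo operator, sketched here with a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Advance a ring index by one slot. Equivalent to (idx + 1) % size for
 * idx < size, but uses a compare instead of a division.
 */
static inline uint32_t ring_next(uint32_t idx, uint32_t size)
{
	if (++idx == size)
		idx = 0;
	return idx;
}
```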

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: remove unnecessary indirection of page_buffer
stephen hemminger [Fri, 28 Jul 2017 15:59:44 +0000 (08:59 -0700)]
netvsc: remove unnecessary indirection of page_buffer

The internal API was passing struct hv_page_buffer **
when only simple struct hv_page_buffer * was necessary
for passing an array.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: don't print pointer value in error message
stephen hemminger [Fri, 28 Jul 2017 15:59:43 +0000 (08:59 -0700)]
netvsc: don't print pointer value in error message

Using %p to print pointer to packet meta-data doesn't give any
good info, and exposes kernel memory offsets.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: fix warnings reported by lockdep
stephen hemminger [Fri, 28 Jul 2017 15:59:42 +0000 (08:59 -0700)]
netvsc: fix warnings reported by lockdep

This includes a bunch of fixups for issues reported by
lockdep.
   * ethtool routines can assume RTNL
   * send is done with RCU lock (and BH disable)
   * avoid refetching internal device struct (netvsc)
     instead pass it as a parameter.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetvsc: fix return value for set_channels
stephen hemminger [Fri, 28 Jul 2017 15:59:41 +0000 (08:59 -0700)]
netvsc: fix return value for set_channels

The error and normal case got swapped.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
Colin Ian King [Thu, 27 Jul 2017 22:15:09 +0000 (23:15 +0100)]
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"

Trivial fix to spelling mistake in printk message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: bump up driver version to match newer NIC firmware
Felix Manlunas [Thu, 27 Jul 2017 19:32:28 +0000 (12:32 -0700)]
liquidio: bump up driver version to match newer NIC firmware

Bump up driver version to match newer NIC firmware.  Also update
nic_rx_stats (a struct common to host driver and firmware) by adding a new
field:  fw_total_fwd_bytes.

Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: don't indicate success when copy_from_user fails
Daniel Borkmann [Thu, 27 Jul 2017 19:02:46 +0000 (21:02 +0200)]
bpf: don't indicate success when copy_from_user fails

err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior
check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so
user space can be notified of buggy behavior.
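
The bug pattern can be reproduced in user space with a mock copy routine. Both function names below are stand-ins invented for this sketch; only the shape of the fix (return -EFAULT explicitly instead of a stale err) comes from the message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mock of a user copy: returns the number of bytes NOT copied. */
static size_t fake_copy_from_user(void *dst, const void *src, size_t n)
{
	if (!src)
		return n;	/* simulate a faulting user pointer */
	memcpy(dst, src, n);
	return 0;
}

static int get_info(void *info, const void *uinfo, size_t len)
{
	int err = 0;	/* still 0 from an earlier successful check */

	if (fake_copy_from_user(info, uinfo, len))
		return -EFAULT;	/* "return err" here would report success */

	return err;
}
```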

Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoudp6: fix socket leak on early demux
Paolo Abeni [Thu, 27 Jul 2017 12:45:09 +0000 (14:45 +0200)]
udp6: fix socket leak on early demux

When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.

In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.

Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.

Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".

The newly added code is derived from the current ipv4 code for the
similar path.

v1 -> v2:
  fixed the __udp6_lib_rcv() return code for resubmission,
  as suggested by Eric

Reported-by: Sam Edwards <CFSworks@gmail.com>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_re: add MAY_USE_DEVLINK dependency
Sathya Perla [Thu, 27 Jul 2017 10:29:51 +0000 (06:29 -0400)]
bnxt_re: add MAY_USE_DEVLINK dependency

bnxt_en depends on MAY_USE_DEVLINK; this is used to force bnxt_en
to be =m when DEVLINK is =m.

Now, bnxt_re selects bnxt_en. Unless bnxt_re also explicitly calls
out dependency on MAY_USE_DEVLINK, Kconfig does not force bnxt_re
to be =m when DEVLINK is =m, causing the following error:

drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.o: In function
`bnxt_dl_register':
bnxt_vfr.c:(.text+0x1440): undefined reference to `devlink_alloc'
bnxt_vfr.c:(.text+0x14c0): undefined reference to `devlink_register'
bnxt_vfr.c:(.text+0x14e0): undefined reference to `devlink_free'
drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.o: In function
`bnxt_dl_unregister':
bnxt_vfr.c:(.text+0x1534): undefined reference to `devlink_unregister'
bnxt_vfr.c:(.text+0x153c): undefined reference to `devlink_free'

Fix this by adding MAY_USE_DEVLINK dependency in bnxt_re.

Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: thunderx: Fix BGX transmit stall due to underflow
Sunil Goutham [Thu, 27 Jul 2017 07:23:04 +0000 (12:53 +0530)]
net: thunderx: Fix BGX transmit stall due to underflow

For SGMII/RGMII/QSGMII interfaces, the physical link going down
while traffic is high results in an underflow condition being set
on that specific BGX's LMAC, which asserts backpressure, and the VNIC stops
transmitting packets.

This is due to BGX being disabled in the link status change callback while
a packet is in transit. This patch fixes this issue by not disabling BGX
but instead just disabling packet Rx and Tx.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRevert "vhost: cache used event for better performance"
Jason Wang [Thu, 27 Jul 2017 03:22:05 +0000 (11:22 +0800)]
Revert "vhost: cache used event for better performance"

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92, since it
was reported to break vhost_net. We wanted to cache the used event and use
it to check for notification. The assumption was that the guest won't move
the event idx back, but this can in fact happen when the 16 bit index
wraps around after 64K entries.
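
The wrap-around effect can be seen with a wrap-safe comparison on 16 bit indices. The function below follows the shape of the virtio ring's event check as recalled from memory, under a local name; treat it as a sketch and not as the exact kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if event_idx lies in the window (old_idx, new_idx], computed
 * modulo 2^16 so that it keeps working across index wrap-around.
 */
static inline bool need_event(uint16_t event_idx, uint16_t new_idx,
			      uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}
```

After a wrap from 65534 to 2, an event index of 0 is numerically smaller than the old index yet still falls inside the window, so a cached value that assumes the index only ever grows gives the wrong answer.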

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: testing: fix devmap tests
John Fastabend [Thu, 27 Jul 2017 00:32:07 +0000 (17:32 -0700)]
bpf: testing: fix devmap tests

Apparently through one of my revisions of the initial patch
series I lost the devmap test. We can add more testing later but
for now let's fix the simple one we have.

Fixes: 546ac1ffb70d "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'moxa-Fix-style-issues'
David S. Miller [Sat, 29 Jul 2017 21:02:07 +0000 (14:02 -0700)]
Merge branch 'moxa-Fix-style-issues'

SZ Lin says:

====================
net: moxa: Fix style issues

This patch set fixes the WARNINGs found by the checkpatch.pl tool
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: moxa: Add spaces preferred around that '{+,-}'
SZ Lin [Sat, 29 Jul 2017 10:42:39 +0000 (18:42 +0800)]
net: moxa: Add spaces preferred around that '{+,-}'

This patch fixes all checkpatch occurrences of
"CHECK: spaces preferred around that '{+,-}' (ctx:VxV)"
in moxart_ether code.

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: moxa: Fix for typo in comment to function moxart_mac_setup_desc_ring()
SZ Lin [Sat, 29 Jul 2017 10:42:38 +0000 (18:42 +0800)]
net: moxa: Fix for typo in comment to function moxart_mac_setup_desc_ring()

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: moxa: Remove extra space after a cast
SZ Lin [Sat, 29 Jul 2017 10:42:37 +0000 (18:42 +0800)]
net: moxa: Remove extra space after a cast

No space is necessary after a cast
This warning is found using checkpatch.pl

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: moxa: Fix comparison to NULL could be written with !
SZ Lin [Sat, 29 Jul 2017 10:42:36 +0000 (18:42 +0800)]
net: moxa: Fix comparison to NULL could be written with !

Fixed coding style for null comparisons in moxart_ether driver
to be more consistent with the rest of the kernel coding style

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: moxa: Prefer 'unsigned int' to bare use of 'unsigned'
SZ Lin [Sat, 29 Jul 2017 10:42:35 +0000 (18:42 +0800)]
net: moxa: Prefer 'unsigned int' to bare use of 'unsigned'

Use 'unsigned int' instead of 'unsigned'.
This warning was found using checkpatch.pl.

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: moxa: Remove braces from single-line body
SZ Lin [Sat, 29 Jul 2017 10:42:34 +0000 (18:42 +0800)]
net: moxa: Remove braces from single-line body

Remove unnecessary braces from a single-line if statement.
This warning was found using checkpatch.pl.

Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge tag 'mlx5-fixes-2017-07-27-V2' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 29 Jul 2017 18:26:45 +0000 (11:26 -0700)]
Merge tag 'mlx5-fixes-2017-07-27-V2' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-07-27

This series contains some misc fixes to the mlx5 driver.

Please pull and let me know if there's any problem.

V1->V2:
 - removed redundant braces

for -stable:
4.7
net/mlx5: Fix command bad flow on command entry allocation failure

4.9
net/mlx5: Consider tx_enabled in all modes on remap
net/mlx5e: Fix outer_header_zero() check size

4.10
net/mlx5: Fix mlx5_add_flow_rules call with correct num of dests

4.11
net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size
net/mlx5e: Add field select to MTPPS register
net/mlx5e: Fix broken disable 1PPS flow
net/mlx5e: Change 1PPS out scheme
net/mlx5e: Add missing support for PTP_CLK_REQ_PPS request
net/mlx5e: Fix wrong delay calculation for overflow check scheduling
net/mlx5e: Schedule overflow check work to mlx5e workqueue

4.12
net/mlx5: Fix command completion after timeout access invalid structure
net/mlx5e: IPoIB, Modify add/remove underlay QPN flows

I hope this is not too much, but most of the patches do apply cleanly on -stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago team: use a larger struct for mac address
WANG Cong [Wed, 26 Jul 2017 22:22:07 +0000 (15:22 -0700)]
team: use a larger struct for mac address

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places, especially bonding, we use struct sockaddr
to copy and set the MAC address. This could lead to stack
out-of-bounds access.

Fix it by using a larger address storage like bonding.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
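
The size mismatch described in the commit above can be checked in userspace. The following is a minimal sketch, not the team driver's code: copy_hw_addr() and sockaddr_payload_size() are hypothetical helpers, and struct sockaddr_storage here is the userspace type standing in for the larger address storage the commit refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Bytes available for an address inside a plain struct sockaddr (sa_data). */
static size_t sockaddr_payload_size(void)
{
	return sizeof(((struct sockaddr *)0)->sa_data);
}

/*
 * Hypothetical helper: copy a hardware address of addr_len bytes into
 * storage that is large enough for long addresses. The address is placed
 * at the same offset where sa_data starts in struct sockaddr.
 * Returns 0 on success, -1 if the address would not fit.
 */
static int copy_hw_addr(struct sockaddr_storage *dst,
			const unsigned char *addr, size_t addr_len)
{
	size_t off = offsetof(struct sockaddr, sa_data);
	size_t room = sizeof(*dst) - off;

	assert(dst != NULL && addr != NULL);
	if (addr_len > room)
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy((unsigned char *)dst + off, addr, addr_len);
	return 0;
}
```

On common systems sa_data holds 14 bytes while struct in6_addr is 16 bytes, so a 16-byte address does not fit in the sa_data field of a plain struct sockaddr. Copying it into an on-stack struct sockaddr is the kind of out-of-bounds access the commit describes.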
6 years ago net: check dev->addr_len for dev_set_mac_address()
WANG Cong [Wed, 26 Jul 2017 22:22:06 +0000 (15:22 -0700)]
net: check dev->addr_len for dev_set_mac_address()

Historically, dev_ifsioc() uses struct sockaddr as the MAC
address definition, which is why dev_set_mac_address()
accepts a struct sockaddr pointer as input. But now we
have various types of MAC addresses whose lengths
are up to MAX_ADDR_LEN, longer than struct sockaddr,
and saved in dev->addr_len.

It is too late to fix dev_ifsioc() due to API
compatibility, so just reject those larger than
sizeof(struct sockaddr), otherwise we would read
and use some random bytes from kernel stack.

Fortunately, only a few IPv6 tunnel devices have addr_len
larger than sizeof(struct sockaddr) and they don't support
ndo_set_mac_addr(). But with the team driver in lb mode, they
can still be enslaved to a team master, which then takes on
the same MAC address length.

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
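
The guard described in the commit above amounts to a length comparison before the address is used. The following is a minimal userspace sketch under stated assumptions: struct fake_net_device and check_set_mac_len() are hypothetical names for illustration and are not the kernel's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the one struct net_device field used here. */
struct fake_net_device {
	unsigned char addr_len;	/* hardware address length in bytes */
};

/*
 * Sketch of the guard: a caller that only holds a struct sockaddr must
 * not pass it on for a device whose address is longer than the struct
 * itself, because the extra bytes would be read from unrelated memory.
 */
static int check_set_mac_len(const struct fake_net_device *dev)
{
	assert(dev != NULL);
	if (dev->addr_len > sizeof(struct sockaddr))
		return -EINVAL;
	return 0;
}
```

A 6-byte Ethernet address passes the check, while a device reporting a 32-byte address is rejected with -EINVAL.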
6 years ago Merge branch 'smc-get-rid-of-unsafe_global_rkey'
David S. Miller [Sat, 29 Jul 2017 18:22:58 +0000 (11:22 -0700)]
Merge branch 'smc-get-rid-of-unsafe_global_rkey'

Ursula Braun says:

====================
net/smc: get rid of unsafe_global_rkey

The smc code uses the unsafe_global_rkey, exposing all memory for
remote reads and writes once a connection is established.
This patch series gets rid of the unsafe_global_rkey usage.
The main idea is to switch to SG-logic and separate memory regions for RMBs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: synchronize buffer usage with device
Ursula Braun [Fri, 28 Jul 2017 11:56:22 +0000 (13:56 +0200)]
net/smc: synchronize buffer usage with device

Usage of send buffer "sndbuf" is synced
(a) before filling sndbuf for cpu access
(b) after filling sndbuf for device access

Usage of receive buffer "RMB" is synced
(a) before reading RMB content for cpu access
(b) after reading RMB content for device access

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
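
The ordering described in the commit above (sync for CPU access before touching the buffer, sync for device access afterwards) can be modeled as follows. This is a userspace sketch only: the demo_* names are hypothetical stubs that track buffer ownership, and they stand in for whatever DMA sync calls the SMC code actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_owner { DEMO_OWNER_DEVICE, DEMO_OWNER_CPU };

/* Hypothetical buffer that records who may currently access it. */
struct demo_buf {
	enum demo_owner owner;
	char data[64];
};

/* Stubs standing in for the real sync-for-cpu / sync-for-device calls. */
static void demo_sync_for_cpu(struct demo_buf *b)
{
	b->owner = DEMO_OWNER_CPU;
}

static void demo_sync_for_device(struct demo_buf *b)
{
	b->owner = DEMO_OWNER_DEVICE;
}

/* Fill the send buffer: (a) sync for CPU, write, (b) sync for device. */
static size_t demo_fill_sndbuf(struct demo_buf *sndbuf,
			       const char *src, size_t len)
{
	if (len > sizeof(sndbuf->data))
		len = sizeof(sndbuf->data);

	demo_sync_for_cpu(sndbuf);
	assert(sndbuf->owner == DEMO_OWNER_CPU);
	memcpy(sndbuf->data, src, len);
	demo_sync_for_device(sndbuf);
	return len;
}
```

Reading from an RMB would follow the same pattern, with the copy going out of the buffer instead of into it.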
6 years ago net/smc: cleanup function __smc_buf_create()
Ursula Braun [Fri, 28 Jul 2017 11:56:21 +0000 (13:56 +0200)]
net/smc: cleanup function __smc_buf_create()

Split function __smc_buf_create() for better readability.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: common functions for RMBs and send buffers
Ursula Braun [Fri, 28 Jul 2017 11:56:20 +0000 (13:56 +0200)]
net/smc: common functions for RMBs and send buffers

Creation and deletion of SMC receive and send buffers share a large
amount of common code. This patch introduces common functions to get
rid of duplicate code.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: introduce sg-logic for send buffers
Ursula Braun [Fri, 28 Jul 2017 11:56:19 +0000 (13:56 +0200)]
net/smc: introduce sg-logic for send buffers

SMC send buffers are processed the same way as RMBs. Since RMBs have
been converted to sg-logic, do the same for send buffers.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: remove Kconfig warning
Ursula Braun [Fri, 28 Jul 2017 11:56:18 +0000 (13:56 +0200)]
net/smc: remove Kconfig warning

Now separate memory regions are created and registered for separate
RMBs. The unsafe_global_rkey of the protection domain is no longer
used. Thus the exposing memory warning can be removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: register RMB-related memory region
Ursula Braun [Fri, 28 Jul 2017 11:56:17 +0000 (13:56 +0200)]
net/smc: register RMB-related memory region

A memory region created for a new RMB must be registered explicitly,
before the peer can make use of it for remote DMA transfer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: use separate memory regions for RMBs
Ursula Braun [Fri, 28 Jul 2017 11:56:16 +0000 (13:56 +0200)]
net/smc: use separate memory regions for RMBs

SMC currently uses the unsafe_global_rkey of the protection domain,
which exposes all memory for remote reads and writes once a connection
is established. This patch introduces separate memory regions with
separate rkeys for every RMB. Now the unsafe_global_rkey of the
protection domain is no longer needed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
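
The effect of one rkey per RMB, as described in the commit above, can be pictured with a simplified model: a remote access is accepted only if it presents the key of a specific region and stays within that region's bounds. The sketch below is illustrative only. struct demo_mr and demo_remote_access_ok() are hypothetical, and real RDMA hardware performs this check itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one registered memory region covering one RMB. */
struct demo_mr {
	uint32_t rkey;		/* key the peer must present */
	uintptr_t start;	/* first address covered by the region */
	size_t len;		/* size of the region in bytes */
};

/*
 * Accept a remote access only if the key matches this region and the
 * range [addr, addr + len) lies entirely inside it.
 */
static bool demo_remote_access_ok(const struct demo_mr *mr, uint32_t rkey,
				  uintptr_t addr, size_t len)
{
	assert(mr != NULL);
	if (rkey != mr->rkey)
		return false;
	if (addr < mr->start || len > mr->len)
		return false;
	return addr - mr->start <= mr->len - len;
}
```

With a global key, by contrast, there is no per-region bound to check against, which is why the commit describes it as exposing all memory.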
6 years ago net/smc: introduce sg-logic for RMBs
Ursula Braun [Fri, 28 Jul 2017 11:56:15 +0000 (13:56 +0200)]
net/smc: introduce sg-logic for RMBs

The follow-on patch makes use of ib_map_mr_sg() when introducing
separate memory regions for RMBs. This function is based on
scatterlists; thus this patch introduces scatterlists for RMBs.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>