platform/kernel/linux-rpi.git
5 years agodevlink: Add devlink notifications support for port params
Vasundhara Volam [Mon, 28 Jan 2019 12:30:25 +0000 (18:00 +0530)]
devlink: Add devlink notifications support for port params

Add notification call for devlink port param set, register and unregister
functions.
Add devlink_port_param_value_changed() function to enable the driver notify
devlink on value change. Driver should use this function after value was
changed on any configuration mode part to driverinit.

v7->v8:
Order devlink_port_param_value_changed() definitions followed by
devlink_param_value_changed()

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add support for driverinit set value for devlink_port
Vasundhara Volam [Mon, 28 Jan 2019 12:30:24 +0000 (18:00 +0530)]
devlink: Add support for driverinit set value for devlink_port

Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_set()
function to help the driver set the value to devlink_port.

Also, move the common code to __devlink_param_driverinit_value_set()
to be used by both device and port params.

v7->v8:
Re-order the definitions as follows:
__devlink_param_driverinit_value_get
__devlink_param_driverinit_value_set
devlink_param_driverinit_value_get
devlink_param_driverinit_value_set
devlink_port_param_driverinit_value_get
devlink_port_param_driverinit_value_set

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add support for driverinit get value for devlink_port
Vasundhara Volam [Mon, 28 Jan 2019 12:30:23 +0000 (18:00 +0530)]
devlink: Add support for driverinit get value for devlink_port

Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_get()
function to help the driver get the value from devlink_port.

Also, move the common code to __devlink_param_driverinit_value_get()
to be used by both device and port params.

v7->v8:
-Add the missing devlink_port_param_driverinit_value_get() declaration.
-Also, order devlink_port_param_driverinit_value_get() after
devlink_param_driverinit_value_get/set() calls

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add port param set command
Vasundhara Volam [Mon, 28 Jan 2019 12:30:22 +0000 (18:00 +0530)]
devlink: Add port param set command

Add port param set command to set the value for a parameter.
Value can be set to any of the supported configuration modes.

v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add port param get command
Vasundhara Volam [Mon, 28 Jan 2019 12:30:21 +0000 (18:00 +0530)]
devlink: Add port param get command

Add port param get command which gets data per parameter.
It also has option to dump the parameters data per port.

v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add devlink_param for port register and unregister
Vasundhara Volam [Mon, 28 Jan 2019 12:30:20 +0000 (18:00 +0530)]
devlink: Add devlink_param for port register and unregister

Add functions to register and unregister for the driver supported
configuration parameters table per port.

v7->v8:
- Order the definitions following way as suggested by Jiri.
__devlink_params_register
__devlink_params_unregister
devlink_params_register
devlink_params_unregister
devlink_port_params_register
devlink_port_params_unregister
- Append with Acked-by: Jiri Pirko <jiri@mellanox.com>.

v2->v3:
- Add a helper __devlink_params_register() with common code used by
  both devlink_params_register() and devlink_port_params_register().

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 30 Jan 2019 05:18:54 +0000 (21:18 -0800)]
Merge git://git./linux/kernel/git/davem/net

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 30 Jan 2019 01:11:47 +0000 (17:11 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Need to save away the IV across tls async operations, from Dave
    Watson.

 2) Upon successful packet processing, we should liberate the SKB with
    dev_consume_skb{_irq}(). From Yang Wei.

 3) Only apply RX hang workaround on effected macb chips, from Harini
    Katakam.

 4) Dummy netdev need a proper namespace assigned to them, from Josh
    Elsasser.

 5) Some paths of nft_compat run lockless now, and thus we need to use a
    proper refcnt_t. From Florian Westphal.

 6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.

 7) netrom does not refcount sockets properly wrt. timers, fix that by
    using the sock timer API. From Cong Wang.

 8) Fix locking of inexact inserts of xfrm policies, from Florian
    Westphal.

 9) Missing xfrm hash generation bump, also from Florian.

10) Missing of_node_put() in hns driver, from Yonglong Liu.

11) Fix DN_IFREQ_SIZE, from Johannes Berg.

12) ip6mr notifier is invoked during traversal of wrong table, from Nir
    Dotan.

13) TX promisc settings not performed correctly in qed, from Manish
    Chopra.

14) Fix OOB access in vhost, from Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  MAINTAINERS: Add entry for XDP (eXpress Data Path)
  net: set default network namespace in init_dummy_netdev()
  net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
  net: caif: call dev_consume_skb_any when skb xmit done
  net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: macb: Apply RXUBR workaround only to versions with errata
  net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: tls: Fix deadlock in free_resources tx
  net: tls: Save iv in tls_rec for async crypto requests
  vhost: fix OOB in get_rx_bufs()
  qed: Fix stack out of bounds bug
  qed: Fix system crash in ll2 xmit
  qed: Fix VF probe failure while FLR
  qed: Fix LACP pdu drops for VFs
  qed: Fix bug in tx promiscuous mode settings
  net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  netfilter: ipt_CLUSTERIP: fix warning unused variable cn
  ...

5 years agoMAINTAINERS: Add entry for XDP (eXpress Data Path)
Jesper Dangaard Brouer [Tue, 29 Jan 2019 13:22:33 +0000 (14:22 +0100)]
MAINTAINERS: Add entry for XDP (eXpress Data Path)

Add multiple people as maintainers for XDP, sorted alphabetically.

XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: set default network namespace in init_dummy_netdev()
Josh Elsasser [Sat, 26 Jan 2019 22:38:33 +0000 (14:38 -0800)]
net: set default network namespace in init_dummy_netdev()

Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
  IP: napi_busy_loop+0xd6/0x200
  Call Trace:
    sock_poll+0x5e/0x80
    do_sys_poll+0x324/0x5a0
    SyS_poll+0x6c/0xf0
    do_syscall_64+0x6b/0x1f0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()
Gustavo A. R. Silva [Tue, 29 Jan 2019 17:05:23 +0000 (11:05 -0600)]
cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: clip_tbl: Use struct_size() in kvzalloc()
Gustavo A. R. Silva [Tue, 29 Jan 2019 16:44:44 +0000 (10:44 -0600)]
cxgb4: clip_tbl: Use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
Yang Wei [Tue, 29 Jan 2019 15:04:40 +0000 (23:04 +0800)]
net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles

The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).

dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: caif: call dev_consume_skb_any when skb xmit done
Yang Wei [Tue, 29 Jan 2019 15:32:22 +0000 (23:32 +0800)]
net: caif: call dev_consume_skb_any when skb xmit done

The skb shouled be consumed when xmit done, it makes drop profiles
(dropwatch, perf) more friendly.
dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
dev_consume_skb_any(), it makes code cleaner.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Tue, 29 Jan 2019 14:40:51 +0000 (22:40 +0800)]
net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in cp_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: update cxgb4 and cxgb3 maintainer
Arjun Vynipadath [Tue, 29 Jan 2019 10:08:42 +0000 (15:38 +0530)]
MAINTAINERS: update cxgb4 and cxgb3 maintainer

Vishal Kulkarni will be the new maintainer for Chelsio cxgb3/cxgb4
drivers.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4vf: Update port information in cxgb4vf_open()
Arjun Vynipadath [Tue, 29 Jan 2019 09:50:19 +0000 (15:20 +0530)]
cxgb4vf: Update port information in cxgb4vf_open()

It's possible that the basic port information could have
changed since we first read it.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: macb: Apply RXUBR workaround only to versions with errata
Harini Katakam [Tue, 29 Jan 2019 09:50:03 +0000 (15:20 +0530)]
net: macb: Apply RXUBR workaround only to versions with errata

The interrupt handler contains a workaround for RX hang applicable
to Zynq and AT91RM9200 only. Subsequent versions do not need this
workaround. This workaround unnecessarily resets RX whenever RX used
bit read is observed, which can be often under heavy traffic. There
is no other action performed on RX UBR interrupt. Hence introduce a
CAPS mask; enable this interrupt and workaround only on affected
versions.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: fix the validation of rx checksum status from NIC hardware
Veerasenareddy Burru [Mon, 28 Jan 2019 19:38:31 +0000 (19:38 +0000)]
liquidio: fix the validation of rx checksum status from NIC hardware

Fixed the code that was incorrectly interpreting the rx checksum validation
status from hardware, and updating kernel that the packet arrived with
correct checksum though the packet arrived with incorrect checksum and
hardware also indicated checksum is not correct.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Acked-by: Derek Chickles <dchickles@marvell.com>
Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 28 Jan 2019 23:40:10 +0000 (07:40 +0800)]
net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in cpmac_end_xmit() when
xmit done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 28 Jan 2019 23:39:13 +0000 (07:39 +0800)]
net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in bmac_txdma_intr() when
xmit done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
Yang Wei [Sun, 27 Jan 2019 15:58:25 +0000 (23:58 +0800)]
net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq

dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
Yang Wei [Sun, 27 Jan 2019 15:56:34 +0000 (23:56 +0800)]
net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq

dev_consume_skb_irq() should be called in ace_tx_int() when xmit
done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tls: Fix deadlock in free_resources tx
Dave Watson [Sun, 27 Jan 2019 00:59:03 +0000 (00:59 +0000)]
net: tls: Fix deadlock in free_resources tx

If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock.  Drop the lock while waiting
for the work to complete.

Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tls: Save iv in tls_rec for async crypto requests
Dave Watson [Sun, 27 Jan 2019 00:57:38 +0000 (00:57 +0000)]
net: tls: Save iv in tls_rec for async crypto requests

aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it.  Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.

Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.

Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovhost: fix OOB in get_rx_bufs()
Jason Wang [Mon, 28 Jan 2019 07:05:05 +0000 (15:05 +0800)]
vhost: fix OOB in get_rx_bufs()

After batched used ring updating was introduced in commit e2b3b35eb989
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.

        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                    vhost_len, &in, vq_log, &log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: include linux/vmalloc.h for vzalloc etc
Stephen Rothwell [Tue, 29 Jan 2019 05:13:08 +0000 (16:13 +1100)]
enetc: include linux/vmalloc.h for vzalloc etc

Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Tue, 29 Jan 2019 03:38:33 +0000 (19:38 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-01-29

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Teach verifier dead code removal, this also allows for optimizing /
   removing conditional branches around dead code and to shrink the
   resulting image. Code store constrained architectures like nfp would
   have hard time doing this at JIT level, from Jakub.

2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
   code generation for 32-bit sub-registers. Evaluation shows that this
   can result in code reduction of ~5-20% compared to 64 bit-only code
   generation. Also add implementation for most JITs, from Jiong.

3) Add support for __int128 types in BTF which is also needed for
   vmlinux's BTF conversion to work, from Yonghong.

4) Add a new command to bpftool in order to dump a list of BPF-related
   parameters from the system or for a specific network device e.g. in
   terms of available prog/map types or helper functions, from Quentin.

5) Add AF_XDP sock_diag interface for querying sockets from user
   space which provides information about the RX/TX/fill/completion
   rings, umem, memory usage etc, from Björn.

6) Add skb context access for skb_shared_info->gso_segs field, from Eric.

7) Add support for testing flow dissector BPF programs by extending
   existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.

8) Split BPF kselftest's test_verifier into various subgroups of tests
   in order better deal with merge conflicts in this area, from Jakub.

9) Add support for queue/stack manipulations in bpftool, from Stanislav.

10) Document BTF, from Yonghong.

11) Dump supported ELF section names in libbpf on program load
    failure, from Taeung.

12) Silence a false positive compiler warning in verifier's BTF
    handling, from Peter.

13) Fix help string in bpftool's feature probing, from Prashant.

14) Remove duplicate includes in BPF kselftests, from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Tue, 29 Jan 2019 01:34:38 +0000 (17:34 -0800)]
Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next tree:

1) Introduce a hashtable to speed up object lookups, from Florian Westphal.

2) Make direct calls to built-in extension, also from Florian.

3) Call helper before confirming the conntrack as it used to be originally,
   from Florian.

4) Call request_module() to autoload br_netfilter when physdev is used
   to relax the dependency, also from Florian.

5) Allow to insert rules at a given position ID that is internal to the
   batch, from Phil Sutter.

6) Several patches to replace conntrack indirections by direct calls,
   and to reduce modularization, from Florian. This also includes
   several follow up patches to deal with minor fallout from this
   rework.

7) Use RCU from conntrack gre helper, from Florian.

8) GRE conntrack module becomes built-in into nf_conntrack, from Florian.

9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(),
   from Florian.

10) Unify sysctl handling at the core of nf_conntrack, from Florian.

11) Provide modparam to register conntrack hooks.

12) Allow to match on the interface kind string, from wenxu.

13) Remove several exported symbols, not required anymore now after
    a bit of de-modulatization work has been done, from Florian.

14) Remove built-in map support in the hash extension, this can be
    done with the existing userspace infrastructure, from laura.

15) Remove indirection to calculate checksums in IPVS, from Matteo Croce.

16) Use call wrappers for indirection in IPVS, also from Matteo.

17) Remove superfluous __percpu parameter in nft_counter, patch from
    Luc Van Oostenryck.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bpf-flow-dissector-tests'
Daniel Borkmann [Tue, 29 Jan 2019 00:08:30 +0000 (01:08 +0100)]
Merge branch 'bpf-flow-dissector-tests'

Stanislav Fomichev says:

====================
This patch series adds support for testing flow dissector BPF programs
by extending already existing BPF_PROG_TEST_RUN. The goal is to have
a packet as an input and `struct bpf_flow_key' as an output. That way
we can easily test flow dissector programs' behavior. I've also modified
existing test_progs.c test to do a simple flow dissector run as well.

* first patch introduces new __skb_flow_bpf_dissect to simplify
  sharing between __skb_flow_bpf_dissect and BPF_PROG_TEST_RUN
* second patch adds actual BPF_PROG_TEST_RUN support
* third patch adds example usage to the selftests

v3:
* rebased on top of latest bpf-next

v2:
* loop over 'kattr->test.repeat' inside of
  bpf_prog_test_run_flow_dissector, don't reuse
  bpf_test_run/bpf_test_run_one
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:55 +0000 (08:53 -0800)]
selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector

Use existing pkt_v4 and pkt_v6 to make sure flow_keys are what we want.

Also, add new bpf_flow_load routine (and flow_dissector_load.h header)
that loads bpf_flow.o program and does all required setup.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add BPF_PROG_TEST_RUN support for flow dissector
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:54 +0000 (08:53 -0800)]
bpf: add BPF_PROG_TEST_RUN support for flow dissector

The input is packet data, the output is struct bpf_flow_key. This should
make it easy to test flow dissector programs without elaborate
setup.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agonet/flow_dissector: move bpf case into __skb_flow_bpf_dissect
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:53 +0000 (08:53 -0800)]
net/flow_dissector: move bpf case into __skb_flow_bpf_dissect

This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.

No functional changes.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools: bpftool: warn about risky prog array updates
Jakub Kicinski [Mon, 28 Jan 2019 18:29:15 +0000 (10:29 -0800)]
tools: bpftool: warn about risky prog array updates

When prog array is updated with bpftool users often refer
to the map via the ID.  Unfortunately, that's likely
to lead to confusion because prog arrays get flushed when
the last user reference is gone.  If there is no other
reference bpftool will create one, update successfully
just to close the map again and have it flushed.

Warn about this case in non-JSON mode.

If the problem continues causing confusion we can remove
the support for referring to a map by ID for prog array
update completely.  For now it seems like the potential
inconvenience to users who know what they're doing outweighs
the benefit.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests: bpf: remove duplicated include
YueHaibing [Fri, 25 Jan 2019 02:46:34 +0000 (10:46 +0800)]
selftests: bpf: remove duplicated include

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'qed-Bug-fixes'
David S. Miller [Mon, 28 Jan 2019 19:13:35 +0000 (11:13 -0800)]
Merge branch 'qed-Bug-fixes'

Manish Chopra says:

====================
qed: Bug fixes

This series have SR-IOV and some general fixes.
Please consider applying it to "net"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix stack out of bounds bug
Manish Chopra [Mon, 28 Jan 2019 18:05:08 +0000 (10:05 -0800)]
qed: Fix stack out of bounds bug

KASAN reported following bug in qed_init_qm_get_idx_from_flags
due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".

[  196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[  196.624712] Read of size 8 at addr ffff809b00bc7360 by task kworker/0:9/1712
[  196.624714]
[  196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
[  196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
[  196.624733] Workqueue: events work_for_cpu_fn
[  196.624738] Call trace:
[  196.624742]  dump_backtrace+0x0/0x2f8
[  196.624745]  show_stack+0x24/0x30
[  196.624749]  dump_stack+0xe0/0x11c
[  196.624755]  print_address_description+0x68/0x260
[  196.624759]  kasan_report+0x178/0x340
[  196.624762]  __asan_report_load_n_noabort+0x38/0x48
[  196.624786]  qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[  196.624808]  qed_init_qm_info+0xec0/0x2200 [qed]
[  196.624830]  qed_resc_alloc+0x284/0x7e8 [qed]
[  196.624853]  qed_slowpath_start+0x6cc/0x1ae8 [qed]
[  196.624864]  __qede_probe.isra.10+0x1cc/0x12c0 [qede]
[  196.624874]  qede_probe+0x78/0xf0 [qede]
[  196.624879]  local_pci_probe+0xc4/0x180
[  196.624882]  work_for_cpu_fn+0x54/0x98
[  196.624885]  process_one_work+0x758/0x1900
[  196.624888]  worker_thread+0x4e0/0xd18
[  196.624892]  kthread+0x2c8/0x350
[  196.624897]  ret_from_fork+0x10/0x18
[  196.624899]
[  196.624902] Allocated by task 2:
[  196.624906]  kasan_kmalloc.part.1+0x40/0x108
[  196.624909]  kasan_kmalloc+0xb4/0xc8
[  196.624913]  kasan_slab_alloc+0x14/0x20
[  196.624916]  kmem_cache_alloc_node+0x1dc/0x480
[  196.624921]  copy_process.isra.1.part.2+0x1d8/0x4a98
[  196.624924]  _do_fork+0x150/0xfa0
[  196.624926]  kernel_thread+0x48/0x58
[  196.624930]  kthreadd+0x3a4/0x5a0
[  196.624932]  ret_from_fork+0x10/0x18
[  196.624934]
[  196.624937] Freed by task 0:
[  196.624938] (stack is not available)
[  196.624940]
[  196.624943] The buggy address belongs to the object at ffff809b00bc0000
[  196.624943]  which belongs to the cache thread_stack of size 32768
[  196.624946] The buggy address is located 29536 bytes inside of
[  196.624946]  32768-byte region [ffff809b00bc0000ffff809b00bc8000)
[  196.624948] The buggy address belongs to the page:
[  196.624952] page:ffff7fe026c02e00 count:1 mapcount:0 mapping:ffff809b4001c000 index:0x0 compound_mapcount: 0
[  196.624960] flags: 0xfffff8000008100(slab|head)
[  196.624967] raw: 0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
[  196.624970] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[  196.624973] page dumped because: kasan: bad access detected
[  196.624974]
[  196.624976] Memory state around the buggy address:
[  196.624980]  ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624983]  ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624985] >ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
[  196.624988]                                                        ^
[  196.624990]  ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624993]  ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624995] ==================================================================

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix system crash in ll2 xmit
Manish Chopra [Mon, 28 Jan 2019 18:05:07 +0000 (10:05 -0800)]
qed: Fix system crash in ll2 xmit

Cache number of fragments in the skb locally as in case
of linear skb (with zero fragments), tx completion
(or freeing of skb) may happen before driver tries
to get number of frgaments from the skb which could
lead to stale access to an already freed skb.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix VF probe failure while FLR
Manish Chopra [Mon, 28 Jan 2019 18:05:06 +0000 (10:05 -0800)]
qed: Fix VF probe failure while FLR

VFs may hit VF-PF channel timeout while probing, as in some
cases it was observed that VF FLR and VF "acquire" message
transaction (i.e first message from VF to PF in VF's probe flow)
could occur simultaneously which could lead VF to fail sending
"acquire" message to PF as VF is marked disabled from HW perspective
due to FLR, which will result into channel timeout and VF probe failure.

In such cases, try retrying VF "acquire" message so that in later
attempts it could be successful to pass message to PF after the VF
FLR is completed and can be probed successfully.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix LACP pdu drops for VFs
Manish Chopra [Mon, 28 Jan 2019 18:05:05 +0000 (10:05 -0800)]
qed: Fix LACP pdu drops for VFs

VF is always configured to drop control frames
(with reserved mac addresses) but to work LACP
on the VFs, it would require LACP control frames
to be forwarded or transmitted successfully.

This patch fixes this in such a way that trusted VFs
(marked through ndo_set_vf_trust) would be allowed to
pass the control frames such as LACP pdus.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix bug in tx promiscuous mode settings
Manish Chopra [Mon, 28 Jan 2019 18:05:04 +0000 (10:05 -0800)]
qed: Fix bug in tx promiscuous mode settings

When running tx switched traffic between VNICs
created via a bridge(to which VFs are added),
adapter drops the unicast packets in tx flow due to
VNIC's ucast mac being unknown to it. But VF interfaces
being in promiscuous mode should have caused adapter
to accept all the unknown ucast packets. Later, it
was found that driver doesn't really configure tx
promiscuous mode settings to accept all unknown unicast macs.

This patch fixes tx promiscuous mode settings to accept all
unknown/unmatched unicast macs and works out the scenario.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-Error-recovery-process'
David S. Miller [Mon, 28 Jan 2019 18:58:41 +0000 (10:58 -0800)]
Merge branch 'qed-Error-recovery-process'

Michal Kalderon says:

====================
qed*: Error recovery process

Parity errors might happen in the device's memories due to momentary bit
flips which are caused by radiation.
Errors that are not correctable initiate a process kill event, which blocks
the device access towards the host and the network, and a recovery process
is started in the management FW and in the driver.

This series adds the support of this process in the qed core module and in
the qede driver (patches 2 & 3).
Patch 1 in the series revises the load sequence, to avoid PCI errors that
might be observed during a recovery process.

Changes in v2:
- Addressed issue found in https://patchwork.ozlabs.org/patch/1030545/
  The change was done be removing the enum and passing a boolean to
  the related functions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: Error recovery process
Tomer Tayar [Mon, 28 Jan 2019 17:27:56 +0000 (19:27 +0200)]
qede: Error recovery process

This patch adds the error recovery process in the qede driver.
The process includes a partial/customized driver unload and load, which
allows it to look like a short suspend period to the kernel while
preserving the net devices' state.

Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Add infrastructure for error detection and recovery
Tomer Tayar [Mon, 28 Jan 2019 17:27:55 +0000 (19:27 +0200)]
qed: Add infrastructure for error detection and recovery

This patch adds the detection and handling of a parity error ("process kill
event"), including the update of the protocol drivers, and the prevention
of any HW access that will lead to device access towards the host while
recovery is in progress.
It also provides the means for the protocol drivers to trigger a recovery
process on their decision.

Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Revise load sequence to avoid PCI errors
Tomer Tayar [Mon, 28 Jan 2019 17:27:54 +0000 (19:27 +0200)]
qed: Revise load sequence to avoid PCI errors

Initiating final cleanup after an ungraceful driver unload can lead to bad
PCI accesses towards the host.
This patch revises the load sequence so final cleanup is sent while the
internal master enable is cleared, to prevent the host accesses, and clears
the internal error indications just before enabling the internal master
enable.

Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobenet: remove broken and unused macro
Lubomir Rintel [Mon, 28 Jan 2019 16:17:40 +0000 (17:17 +0100)]
benet: remove broken and unused macro

is_broadcast_packet() expands to compare_ether_addr() which doesn't
exist since commit 7367d0b573d1 ("drivers/net: Convert uses of
compare_ether_addr to ether_addr_equal"). It turns out it's actually not
used.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 28 Jan 2019 14:42:25 +0000 (22:42 +0800)]
net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in i596_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Mon, 28 Jan 2019 18:51:51 +0000 (10:51 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree:

1) The nftnl mutex is now per-netns, therefore use reference counter
   for matches and targets to deal with concurrent updates from netns.
   Moreover, place extensions in a pernet list. Patches from Florian Westphal.

2) Bail out with EINVAL in case of negative timeouts via setsockopt()
   through ip_vs_set_timeout(), from ZhangXiaoxu.

3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
   from Florian.

4) Reset TCP option header parser in case of fingerprint mismatch,
   otherwise follow up overlapping fingerprint definitions including
   TCP options do not work, from Fernando Fernandez Mancera.

5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
   From Anders Roxell.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Misc-updates'
David S. Miller [Mon, 28 Jan 2019 18:43:15 +0000 (10:43 -0800)]
Merge branch 'mlxsw-Misc-updates'

Ido Schimmel says:

====================
mlxsw: Misc updates

This patchset contains miscellaneous patches we gathered in our queue.
Some of them are dependencies of larger patchsets that I will submit
later this cycle.

Patches #1-#3 perform small non-functional changes in mlxsw.

Patch #4 adds more extended ack messages in mlxsw.

Patch #5 adds devlink parameters documentation for mlxsw. To be extended
with more parameters this cycle.

Patches #6-#7 perform small changes in forwarding selftests
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: forwarding: Use OK instead of PASS in test output
Ido Schimmel [Mon, 28 Jan 2019 12:02:13 +0000 (12:02 +0000)]
selftests: forwarding: Use OK instead of PASS in test output

It is easier to distinguish "[ OK ]" from "[FAIL]" than "[PASS]".

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: forwarding: change devlink resource support checking
Jiri Pirko [Mon, 28 Jan 2019 12:02:12 +0000 (12:02 +0000)]
selftests: net: forwarding: change devlink resource support checking

As for the others, check help message output to find out if devlink
supports "resource" object.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoDocumentation: add devlink param file for mlxsw driver
Jiri Pirko [Mon, 28 Jan 2019 12:02:11 +0000 (12:02 +0000)]
Documentation: add devlink param file for mlxsw driver

Add initial documentation file for devlink params of mlxsw driver. Only
"fw_load_policy" is now supported.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_switchdev: Add more extack messages
Ido Schimmel [Mon, 28 Jan 2019 12:02:10 +0000 (12:02 +0000)]
mlxsw: spectrum_switchdev: Add more extack messages

Add more extack messages that let the user know why VXLAN offload
failed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Fix rul/rule typo
Jiri Pirko [Mon, 28 Jan 2019 12:02:09 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Fix rul/rule typo

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Move mr_ruleset and mr_rule structs
Jiri Pirko [Mon, 28 Jan 2019 12:02:08 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Move mr_ruleset and mr_rule structs

Move the struct to the place where they belong, alongside with the rest
of the MR code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Remove unnecessary arg on action_replace call path
Jiri Pirko [Mon, 28 Jan 2019 12:02:07 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Remove unnecessary arg on action_replace call path

No need to pass ruleset/group and chunk pointers on action_replace call
path, nobody uses them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "mm, memory_hotplug: initialize struct pages for the full memory section"
Michal Hocko [Fri, 25 Jan 2019 18:08:58 +0000 (19:08 +0100)]
Revert "mm, memory_hotplug: initialize struct pages for the full memory section"

This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.

The underlying assumption that one sparse section belongs into a single
numa node doesn't hold really. Robert Shteynfeld has reported a boot
failure. The boot log was not captured but his memory layout is as
follows:

  Early memory node ranges
    node   1: [mem 0x0000000000001000-0x0000000000090fff]
    node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
    node   1: [mem 0x0000000100000000-0x0000001423ffffff]
    node   0: [mem 0x0000001424000000-0x0000002023ffffff]

This means that node0 starts in the middle of a memory section which is
also in node1.  memmap_init_zone tries to initialize padding of a
section even when it is outside of the given pfn range because there are
code paths (e.g.  memory hotplug) which assume that the full worth of
memory section is always initialized.

In this particular case, though, such a range is already intialized and
most likely already managed by the page allocator.  Scribbling over
those pages corrupts the internal state and likely blows up when any of
those pages gets used.

Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agonetfilter: ipv4: remove useless export_symbol
Florian Westphal [Sun, 27 Jan 2019 18:18:57 +0000 (19:18 +0100)]
netfilter: ipv4: remove useless export_symbol

Only one caller; place it where needed and get rid of the EXPORT_SYMBOL.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: conntrack: fix error path in nf_conntrack_pernet_init()
Cong Wang [Wed, 23 Jan 2019 20:58:57 +0000 (12:58 -0800)]
netfilter: conntrack: fix error path in nf_conntrack_pernet_init()

When nf_ct_netns_get() fails, it should clean up itself,
its caller doesn't need to call nf_conntrack_fini_net().

nf_conntrack_init_net() is called after registering sysctl
and proc, so its cleanup function should be called before
unregistering sysctl and proc.

Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling")
Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s arg
Luc Van Oostenryck [Sat, 19 Jan 2019 21:50:24 +0000 (22:50 +0100)]
netfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s arg

nft_counter_rest() has its first argument declared as
struct nft_counter_percpu_priv __percpu *priv
but this structure is not percpu (it only countains
a member 'counter' which is, correctly, a pointer to a
percpu struct nft_counter).

So, remove the '__percpu' from the argument's declaration.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoipvs: use indirect call wrappers
Matteo Croce [Sat, 19 Jan 2019 14:25:35 +0000 (15:25 +0100)]
ipvs: use indirect call wrappers

Use the new indirect call wrappers in IPVS when calling the TCP or UDP
protocol specific functions.
This avoids an indirect calls in IPVS, and reduces the performance
impact of the Spectre mitigation.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoipvs: avoid indirect calls when calculating checksums
Matteo Croce [Sat, 19 Jan 2019 14:22:38 +0000 (15:22 +0100)]
ipvs: avoid indirect calls when calculating checksums

The function pointer ip_vs_protocol->csum_check is only used in protocol
specific code, and never in the generic one.
Remove the function pointer from struct ip_vs_protocol and call the
checksum functions directly.
This reduces the performance impact of the Spectre mitigation, and
should give a small improvement even with RETPOLINES disabled.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: ipt_CLUSTERIP: fix warning unused variable cn
Anders Roxell [Wed, 23 Jan 2019 11:48:11 +0000 (12:48 +0100)]
netfilter: ipt_CLUSTERIP: fix warning unused variable cn

When CONFIG_PROC_FS isn't set the variable cn isn't used.

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function â€˜clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable â€˜cn’ [-Wunused-variable]
  struct clusterip_net *cn = clusterip_pernet(net);
                        ^~

Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".

Fixes: b12f7bad5ad3 ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nfnetlink_osf: add missing fmatch check
Fernando Fernandez Mancera [Mon, 21 Jan 2019 11:53:21 +0000 (12:53 +0100)]
netfilter: nfnetlink_osf: add missing fmatch check

When we check the tcp options of a packet and it doesn't match the current
fingerprint, the tcp packet option pointer must be restored to its initial
value in order to do the proper tcp options check for the next fingerprint.

Here we can see an example.
Assumming the following fingerprint base with two lines:

S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1

Where TCP options are the last field in the OS signature, all of them overlap
except by the last one, ie. 'W6' versus 'W7'.

In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
TCP options pointer is updated after checking for the TCP options in the first
line.

Therefore, reset pointer back to where it should be.

Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
Florian Westphal [Mon, 21 Jan 2019 20:54:36 +0000 (21:54 +0100)]
netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present

Unlike ip(6)tables ebtables only counts user-defined chains.

The effect is that a 32bit ebtables binary on a 64bit kernel can do
'ebtables -N FOO' only after adding at least one rule, else the request
fails with -EINVAL.

This is a similar fix as done in
3f1e53abff84 ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").

Fixes: 7d7d7e02111e9 ("netfilter: compat: reject huge allocation requests")
Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonet: dsa: mv88e6xxx: Fix serdes irq setup going recursive
Andrew Lunn [Sun, 27 Jan 2019 21:48:00 +0000 (22:48 +0100)]
net: dsa: mv88e6xxx: Fix serdes irq setup going recursive

Duec to a typo, mv88e6390_serdes_irq_setup() calls itself, rather than
mv88e6390x_serdes_irq_setup(). It then blows the stack, and shortly
after the machine blows up.

Fixes: 2defda1f4b91 ("net: dsa: mv88e6xxx: Add support for SERDES on ports 2-8 for 6390X")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip6mr: Fix notifiers call on mroute_clean_tables()
Nir Dotan [Sun, 27 Jan 2019 07:26:22 +0000 (09:26 +0200)]
ip6mr: Fix notifiers call on mroute_clean_tables()

When the MC route socket is closed, mroute_clean_tables() is called to
cleanup existing routes. Mistakenly notifiers call was put on the cleanup
of the unresolved MC route entries cache.
In a case where the MC socket closes before an unresolved route expires,
the notifier call leads to a crash, caused by the driver trying to
increment a non initialized refcount_t object [1] and then when handling
is done, to decrement it [2]. This was detected by a test recently added in
commit 6d4efada3b82 ("selftests: forwarding: Add multicast routing test").

Fix that by putting notifiers call on the resolved entries traversal,
instead of on the unresolved entries traversal.

[1]

[  245.748967] refcount_t: increment on 0; use-after-free.
[  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
...
[  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
[  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
...
[  245.907487] Call Trace:
[  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
[  245.917913]  notifier_call_chain+0x45/0x7
[  245.922484]  atomic_notifier_call_chain+0x15/0x20
[  245.927729]  call_fib_notifiers+0x15/0x30
[  245.932205]  mroute_clean_tables+0x372/0x3f
[  245.936971]  ip6mr_sk_done+0xb1/0xc0
[  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
...

[2]

[  246.128487] refcount_t: underflow; use-after-free.
[  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
[  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
...
[  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
[  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
...
[  246.298889] Call Trace:
[  246.301617]  refcount_dec_and_test_checked+0x11/0x20
[  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
[  246.315531]  process_one_work+0x1fa/0x3f0
[  246.320005]  worker_thread+0x2f/0x3e0
[  246.324083]  kthread+0x118/0x130
[  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
[  246.332926]  ? kthread_park+0x80/0x80
[  246.337013]  ret_from_fork+0x1f/0x30

Fixes: 088aa3eec2ce ("ip6mr: Support fib notifications")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodecnet: fix DN_IFREQ_SIZE
Johannes Berg [Sat, 26 Jan 2019 20:12:19 +0000 (21:12 +0100)]
decnet: fix DN_IFREQ_SIZE

Digging through the ioctls with Al because of the previous
patches, we found that on 64-bit decnet's dn_dev_ioctl()
is wrong, because struct ifreq::ifr_ifru is actually 24
bytes (not 16 as expected from struct sockaddr) due to the
ifru_map and ifru_settings members.

Clearly, decnet expects the ioctl to be called with a struct
like
  struct ifreq_dn {
    char ifr_name[IFNAMSIZ];
    struct sockaddr_dn ifr_addr;
  };

since it does
  struct ifreq *ifr = ...;
  struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;

This means that DN_IFREQ_SIZE is too big for what it wants on
64-bit, as it is
  sizeof(struct ifreq) - sizeof(struct sockaddr) +
  sizeof(struct sockaddr_dn)

This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
but that isn't true.

Fix this to use offsetof(struct ifreq, ifr_ifru).

This indeed doesn't really matter much - the result is that we
copy in/out 8 bytes more than we should on 64-bit platforms. In
case the "struct ifreq_dn" lands just on the end of a page though
it might lead to faults.

As far as I can tell, it has been like this forever, so it seems
very likely that nobody cares.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
Alexey Khoroshilov [Sat, 26 Jan 2019 19:48:57 +0000 (22:48 +0300)]
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()

If phy_power_on() fails in rk_gmac_powerup(), clocks are left enabled.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns-fixes'
David S. Miller [Mon, 28 Jan 2019 07:01:56 +0000 (23:01 -0800)]
Merge branch 'hns-fixes'

Peng Li says:

====================
net: hns: code optimizations & bugfixes for HNS driver

This patchset includes bugfixes and code optimizations for the HNS
ethernet controller driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns: Fix wrong read accesses via Clause 45 MDIO protocol
Yonglong Liu [Sat, 26 Jan 2019 09:18:27 +0000 (17:18 +0800)]
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol

When reading phy registers via Clause 45 MDIO protocol, after write
address operation, the driver use another write address operation, so
can not read the right value of any phy registers. This patch fixes it.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns: Restart autoneg need return failed when autoneg off
Yonglong Liu [Sat, 26 Jan 2019 09:18:26 +0000 (17:18 +0800)]
net: hns: Restart autoneg need return failed when autoneg off

The hns driver of earlier devices, when autoneg off, restart autoneg
will return -EINVAL, so make the hns driver for the latest devices
do the same.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns: Fix for missing of_node_put() after of_parse_phandle()
Yonglong Liu [Sat, 26 Jan 2019 09:18:25 +0000 (17:18 +0800)]
net: hns: Fix for missing of_node_put() after of_parse_phandle()

In hns enet driver, we use of_parse_handle() to get hold of the
device node related to "ae-handle" but we have missed to put
the node reference using of_node_put() after we are done using
the node. This patch fixes it.

Note:
Link: https://lkml.org/lkml/2018/12/22/217
Fixes: 48189d6aaf1e ("net: hns: enet specifies a reference to dsaf")
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'split-test_verifier'
Alexei Starovoitov [Mon, 28 Jan 2019 05:37:46 +0000 (21:37 -0800)]
Merge branch 'split-test_verifier'

Jakub Kicinski says:

====================
The tools/testing/selftests/bpf/test_verifier.c file is
way too large, and since most people add their at the
end of the list it's very prone to conflicts.

Break it up in the simplest possible way - slice the
array up into smaller C files and include them in the
right spot.

Tested:
$ make -C tools/testing/selftests/bpf/
$ cd tools/testing/selftests/bpf/ ; make

v2:

The indentation is reduced further as discussed and lines folded.
The conversion was scripted, and double checked by hand.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoselftests: bpf: break up the rest of test_verifier
Jakub Kicinski [Fri, 25 Jan 2019 23:24:44 +0000 (15:24 -0800)]
selftests: bpf: break up the rest of test_verifier

Break up the rest of test_verifier tests into separate
files.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoselftests: bpf: break up test_verifier
Jakub Kicinski [Fri, 25 Jan 2019 23:24:43 +0000 (15:24 -0800)]
selftests: bpf: break up test_verifier

Break up the first 10 kLoC of test verifier test cases
out into smaller files.  Looks like git line counting
gets a little flismy above 16 bit integers, so we need
two commits to break up test_verifier.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoselftests: bpf: prepare for break up of verifier tests
Jakub Kicinski [Fri, 25 Jan 2019 23:24:42 +0000 (15:24 -0800)]
selftests: bpf: prepare for break up of verifier tests

test_verifier.c has grown to be very long (almost 16 kLoC),
and it is very conflict prone since we always add tests at
the end.

Try to break it apart a little bit.  Allow test snippets
to be defined in separate files and include them automatically
into the huge test array.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoLinux 5.0-rc4
Linus Torvalds [Sun, 27 Jan 2019 23:18:05 +0000 (15:18 -0800)]
Linux 5.0-rc4

5 years agoMerge branch 'tcp-change-pingpong-to-3-in-delayed-ack-logic'
David S. Miller [Sun, 27 Jan 2019 21:29:43 +0000 (13:29 -0800)]
Merge branch 'tcp-change-pingpong-to-3-in-delayed-ack-logic'

Wei Wang says:

====================
tcp: change pingpong to 3 in delayed ack logic

TCP receiver today tries not to delay the ACKs to speed up the initial
slow start (a.k.a QUICK ACK mechanism). However the previous design
does not work well with modern TCP applications that starts with an
application-level handshake. For example, a HTTPs server often
receives the SSL hello and responds right away which triggers the TCP
stack to stop the quick ack and start delaying the ACKs based only one
instance of ping-pong. This patchset changes the threshold from 1 to 3
ping-pong transactions, so that we only start to delay the acks after
the receiver responds data quickly three times.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: change pingpong threshold to 3
Wei Wang [Fri, 25 Jan 2019 18:53:20 +0000 (10:53 -0800)]
tcp: change pingpong threshold to 3

In order to be more confident about an on-going interactive session, we
increment pingpong count by 1 for every interactive transaction and we
adjust TCP_PINGPONG_THRESH to 3.
This means, we only consider a session in pingpong mode after we see 3
interactive transactions, and start to activate delayed acks in quick
ack mode.
And in order to not over-count the credits, we only increase pingpong
count for the first packet sent in response for the previous received
packet.
This is mainly to prevent delaying the ack immediately after some
handshake protocol but no real interactive traffic pattern afterwards.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: Refactor pingpong code
Wei Wang [Fri, 25 Jan 2019 18:53:19 +0000 (10:53 -0800)]
tcp: Refactor pingpong code

Instead of using pingpong as a single bit information, we refactor the
code to treat it as a counter. When interactive session is detected,
we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count
is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode.

This patch is a pure refactor and sets foundation for the next patch.
This patch itself does not change any pingpong logic.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv4: ip_input: fix blank line coding style issues
Yang Wei [Fri, 25 Jan 2019 14:41:50 +0000 (22:41 +0800)]
net: ipv4: ip_input: fix blank line coding style issues

Fix blank line coding style issues, make the code cleaner.
Remove a redundant blank line in ip_rcv_core().
Insert a blank line in ip_rcv() between different statement blocks.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: at803x: Use helpers to access MMD PHY registers
Carlo Caione [Fri, 25 Jan 2019 12:35:10 +0000 (12:35 +0000)]
net: phy: at803x: Use helpers to access MMD PHY registers

Libphy provides a standard set of helpers to access the MMD PHY
registers. Use those instead of relying on custom driver-specific
functions.

Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 20:02:00 +0000 (12:02 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Fix the swapped outb() parameters in the KASLR code

   - Fix the PKEY handling at fork which missed to preserve the pkey
     state for the child. Comes with a test case to validate that.

   - Fix the entry stack handling for XEN PV to respect that XEN PV
     systems enter the function already on the current thread stack and
     not on the trampoline.

   - Fix kexec load failure caused by using a stale value when the
     kexec_buf structure is reused for subsequent allocations.

   - Fix a bogus sizeof() in the memory encryption code

   - Enforce PCI dependency for the Intel Low Power Subsystem

   - Enforce PCI_LOCKLESS_CONFIG when PCI is enabled"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled
  x86/entry/64/compat: Fix stack switching for XEN PV
  x86/kexec: Fix a kexec_file_load() failure
  x86/mm/mem_encrypt: Fix erroneous sizeof()
  x86/selftests/pkeys: Fork() to check for state being preserved
  x86/pkeys: Properly copy pkey state at fork()
  x86/kaslr: Fix incorrect i8254 outb() parameters
  x86/intel/lpss: Make PCI dependency explicit

5 years agoMerge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 19:57:46 +0000 (11:57 -0800)]
Merge branch 'x86-timers-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 timer fixes from Thomas Gleixner:
 "Two commits which were missed to be sent during the merge window.

   - The TSC calibration fix turns out to be more urgent as recent
     Skylake-X systems seem to have massive trouble with calibration
     disturbance. This should go back into stable for that reason and it
     the risk of breakage is rather low.

   - Drop an unused define"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Remove unused FSEC_PER_NSEC define
  x86/tsc: Make calibration refinement more robust

5 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 19:55:06 +0000 (11:55 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Glexiner:
 "A single regression fix to address the unintended breakage of posix
  cpu timers.

  This is caused by a new sanity check in the common code, which fails
  for posix cpu timers under certain conditions because the posix cpu
  timer code never updates the variable which is checked"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Unbreak timer rearming

5 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 19:52:50 +0000 (11:52 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A small series of fixes which all address possible missed wakeups:

   - Document and fix the wakeup ordering of wake_q

   - Add the missing barrier in rcuwait_wake_up(), which was documented
     in the comment but missing in the code

   - Fix the possible missed wakeups in the rwsem and futex code"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Fix (possible) missed wakeup
  futex: Fix (possible) missed wakeup
  sched/wake_q: Fix wakeup ordering for wake_q
  sched/wake_q: Document wake_q_add()
  sched/wait: Fix rcuwait_wake_up() ordering

5 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 19:25:38 +0000 (11:25 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A small set of fixes for the interrupt subsystem:

   - Fix a double increment in the irq descriptor allocator which
     resulted in a sanity check only being done for every second
     affinity mask

   - Add a missing device tree translation in the stm32-exti driver.
     Without that the interrupt association is completely wrong.

   - Initialize the mutex in the GIC-V3 MBI driver

   - Fix the alignment for aliasing devices in the GIC-V3-ITS driver so
     multi MSI allocations work correctly

   - Ensure that the initial affinity of a interrupt is not empty at
     startup time.

   - Drop bogus include in the madera irq chip driver

   - Fix KernelDoc regression"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
  genirq/irqdesc: Fix double increment in alloc_descs()
  genirq: Fix the kerneldoc comment for struct irq_affinity_desc
  irqchip/madera: Drop GPIO includes
  irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
  irqchip/stm32-exti: Add domain translate function
  genirq: Make sure the initial affinity is not empty

5 years agoMerge tag 'mlx5-fixes-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sun, 27 Jan 2019 19:06:45 +0000 (11:06 -0800)]
Merge tag 'mlx5-fixes-2019-01-25' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-01-25

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.

Please pull and let me know if there is any problem.

For -stable v4.13
('net/mlx5e: Allow MAC invalidation while spoofchk is ON')

For -stable v4.18
('Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Linus Torvalds [Sun, 27 Jan 2019 19:00:37 +0000 (11:00 -0800)]
Merge tag 'edac_fix_for_5.0' of git://git./linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
 "Fix persistent register offsets of altera_edac, from Thor Thayer"

* tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, altera: Fix S10 persistent register offset

5 years agoMerge tag 'for-linus-20190127' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 27 Jan 2019 18:58:20 +0000 (10:58 -0800)]
Merge tag 'for-linus-20190127' of git://git.kernel.dk/linux-block

Pull block revert from Jens Axboe:
 "Silly error snuck into a patch from the last series, let's do a revert
  to avoid a potential use-after-free"

* tag 'for-linus-20190127' of git://git.kernel.dk/linux-block:
  Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 27 Jan 2019 18:43:17 +0000 (10:43 -0800)]
Merge git://git./linux/kernel/git/davem/net

5 years agonet/rose: fix NULL ax25_cb kernel panic
Bernard Pidoux [Fri, 25 Jan 2019 10:46:40 +0000 (11:46 +0100)]
net/rose: fix NULL ax25_cb kernel panic

When an internally generated frame is handled by rose_xmit(),
rose_route_frame() is called:

        if (!rose_route_frame(skb, NULL)) {
                dev_kfree_skb(skb);
                stats->tx_errors++;
                return NETDEV_TX_OK;
        }

We have the same code sequence in Net/Rom where an internally generated
frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
However, in this function NULL argument is tested while it is not in
rose_route_frame().
Then kernel panic occurs later on when calling ax25cmp() with a NULL
ax25_cb argument as reported many times and recently with syzbot.

We need to test if ax25 is NULL before using it.

Testing:
Built kernel with CONFIG_ROSE=y.

Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bernard Pidoux <f6bvp@free.fr>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
Tomonori Sakita [Fri, 25 Jan 2019 02:02:22 +0000 (11:02 +0900)]
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case

If fill_level was not zero and status was not BUSY,
result of "tx_prod - tx_cons - inuse" might be zero.
Subtracting 1 unconditionally results invalid negative return value
on this case.
Make sure not to return an negative value.

Signed-off-by: Tomonori Sakita <tomonori.sakita@sord.co.jp>
Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>
Reviewed-by: Dalon L Westergreen <dalon.westergreen@linux.intel.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetrom: switch to sock timer API
Cong Wang [Thu, 24 Jan 2019 22:18:18 +0000 (14:18 -0800)]
netrom: switch to sock timer API

sk_reset_timer() and sk_stop_timer() properly handle
sock refcnt for timer function. Switching to them
could fix a refcounting bug reported by syzbot.

Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Sun, 27 Jan 2019 18:30:01 +0000 (10:30 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2019-01-25

1) Several patches to fix the fallout from the recent
   tree based policy lookup work. From Florian Westphal.

2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
   We need an extra IPIP handler to process these packets
   correctly. From Su Yanjun.

3) Fix validation of template and selector families for
   MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
   This can lead to a stack-out-of-bounds because
   flowi4 struct is treated as flowi6 struct.
   Fix from Florian Westphal.

4) Restore the default behaviour of the xfrm set-mark
   in the output path. This was changed accidentally
   when mark setting was extended to the input path.
   From Benedict Wong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 27 Jan 2019 17:21:00 +0000 (09:21 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Quite a few fixes for x86: nested virtualization save/restore, AMD
  nested virtualization and virtual APIC, 32-bit fixes, an important fix
  to restore operation on older processors, and a bunch of hyper-v
  bugfixes. Several are marked stable.

  There are also fixes for GCC warnings and for a GCC/objtool interaction"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Mark expected switch fall-throughs
  KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths
  KVM: selftests: check returned evmcs version range
  x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly
  KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
  kvm: selftests: Fix region overlap check in kvm_util
  kvm: vmx: fix some -Wmissing-prototypes warnings
  KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
  svm: Fix AVIC incomplete IPI emulation
  svm: Add warning message for AVIC IPI invalid target
  KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
  KVM: x86: Fix PV IPIs for 32-bit KVM host
  x86/kvm/hyper-v: recommend using eVMCS only when it is enabled
  x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR
  kvm: x86/vmx: Use kzalloc for cached_vmcs12
  KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
  KVM: x86: Fix single-step debugging
  x86/kvm/hyper-v: don't announce GUEST IDLE MSR support

5 years agoMerge tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sun, 27 Jan 2019 17:18:05 +0000 (09:18 -0800)]
Merge tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Fix a xen-swiotlb regression on arm64"

* tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping:
  arm64/xen: fix xen-swiotlb cache flushing

5 years agoMerge tag 'libnvdimm-fixes-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jan 2019 17:11:51 +0000 (09:11 -0800)]
Merge tag 'libnvdimm-fixes-5.0-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A fix for namespace label support for non-Intel NVDIMMs that implement
  the ACPI standard label method.

  This has apparently never worked and could wait for v5.1. However it
  has enough visibility with hardware vendors [1] and distro bug
  trackers [2], and low enough risk that I decided it should go in for
  -rc4. The other fixups target the new, for v5.0, nvdimm security
  functionality. The larger init path fixup closes a memory leak and a
  potential userspace lockup due to missed notifications.

    [1] https://github.com/pmem/ndctl/issues/78
    [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1811785

  These have all soaked in -next for a week with no reported issues.

  Summary:

   - Fix support for NVDIMMs that implement the ACPI standard label
     methods.

   - Fix error handling for security overwrite (memory leak / userspace
     hang condition), and another one-line security cleanup"

* tag 'libnvdimm-fixes-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  acpi/nfit: Fix command-supported detection
  acpi/nfit: Block function zero DSMs
  libnvdimm/security: Require nvdimm_security_setup_events() to succeed
  nfit_test: fix security state pull for nvdimm security nfit_test

5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sun, 27 Jan 2019 17:07:03 +0000 (09:07 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "A fixup for the input_event fix for y2038 Sparc64, and couple other
  minor fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: input_event - fix the CONFIG_SPARC64 mixup
  Input: olpc_apsp - assign priv->dev earlier
  Input: uinput - fix undefined behavior in uinput_validate_absinfo()
  Input: raspberrypi-ts - fix link error
  Input: xpad - add support for SteelSeries Stratus Duo
  Input: input_event - provide override for sparc64