Vasundhara Volam [Mon, 28 Jan 2019 12:30:22 +0000 (18:00 +0530)]
devlink: Add port param set command
Add port param set command to set the value for a parameter.
Value can be set to any of the supported configuration modes.
v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 28 Jan 2019 12:30:21 +0000 (18:00 +0530)]
devlink: Add port param get command
Add port param get command which gets data per parameter.
It also has option to dump the parameters data per port.
v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 28 Jan 2019 12:30:20 +0000 (18:00 +0530)]
devlink: Add devlink_param for port register and unregister
Add functions to register and unregister for the driver supported
configuration parameters table per port.
v7->v8:
- Order the definitions following way as suggested by Jiri.
__devlink_params_register
__devlink_params_unregister
devlink_params_register
devlink_params_unregister
devlink_port_params_register
devlink_port_params_unregister
- Append with Acked-by: Jiri Pirko <jiri@mellanox.com>.
v2->v3:
- Add a helper __devlink_params_register() with common code used by
both devlink_params_register() and devlink_port_params_register().
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Jan 2019 05:18:54 +0000 (21:18 -0800)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Wed, 30 Jan 2019 01:11:47 +0000 (17:11 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Need to save away the IV across tls async operations, from Dave
Watson.
2) Upon successful packet processing, we should liberate the SKB with
dev_consume_skb{_irq}(). From Yang Wei.
3) Only apply RX hang workaround on effected macb chips, from Harini
Katakam.
4) Dummy netdev need a proper namespace assigned to them, from Josh
Elsasser.
5) Some paths of nft_compat run lockless now, and thus we need to use a
proper refcnt_t. From Florian Westphal.
6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.
7) netrom does not refcount sockets properly wrt. timers, fix that by
using the sock timer API. From Cong Wang.
8) Fix locking of inexact inserts of xfrm policies, from Florian
Westphal.
9) Missing xfrm hash generation bump, also from Florian.
10) Missing of_node_put() in hns driver, from Yonglong Liu.
11) Fix DN_IFREQ_SIZE, from Johannes Berg.
12) ip6mr notifier is invoked during traversal of wrong table, from Nir
Dotan.
13) TX promisc settings not performed correctly in qed, from Manish
Chopra.
14) Fix OOB access in vhost, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
MAINTAINERS: Add entry for XDP (eXpress Data Path)
net: set default network namespace in init_dummy_netdev()
net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
net: caif: call dev_consume_skb_any when skb xmit done
net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: macb: Apply RXUBR workaround only to versions with errata
net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
net: tls: Fix deadlock in free_resources tx
net: tls: Save iv in tls_rec for async crypto requests
vhost: fix OOB in get_rx_bufs()
qed: Fix stack out of bounds bug
qed: Fix system crash in ll2 xmit
qed: Fix VF probe failure while FLR
qed: Fix LACP pdu drops for VFs
qed: Fix bug in tx promiscuous mode settings
net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
netfilter: ipt_CLUSTERIP: fix warning unused variable cn
...
Jesper Dangaard Brouer [Tue, 29 Jan 2019 13:22:33 +0000 (14:22 +0100)]
MAINTAINERS: Add entry for XDP (eXpress Data Path)
Add multiple people as maintainers for XDP, sorted alphabetically.
XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Elsasser [Sat, 26 Jan 2019 22:38:33 +0000 (14:38 -0800)]
net: set default network namespace in init_dummy_netdev()
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000190
IP: napi_busy_loop+0xd6/0x200
Call Trace:
sock_poll+0x5e/0x80
do_sys_poll+0x324/0x5a0
SyS_poll+0x6c/0xf0
do_syscall_64+0x6b/0x1f0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Tue, 29 Jan 2019 17:05:23 +0000 (11:05 -0600)]
cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:
instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Tue, 29 Jan 2019 16:44:44 +0000 (10:44 -0600)]
cxgb4: clip_tbl: Use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:
instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Tue, 29 Jan 2019 15:04:40 +0000 (23:04 +0800)]
net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).
dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Tue, 29 Jan 2019 15:32:22 +0000 (23:32 +0800)]
net: caif: call dev_consume_skb_any when skb xmit done
The skb shouled be consumed when xmit done, it makes drop profiles
(dropwatch, perf) more friendly.
dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
dev_consume_skb_any(), it makes code cleaner.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Tue, 29 Jan 2019 14:40:51 +0000 (22:40 +0800)]
net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
dev_consume_skb_irq() should be called in cp_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Tue, 29 Jan 2019 10:08:42 +0000 (15:38 +0530)]
MAINTAINERS: update cxgb4 and cxgb3 maintainer
Vishal Kulkarni will be the new maintainer for Chelsio cxgb3/cxgb4
drivers.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Tue, 29 Jan 2019 09:50:19 +0000 (15:20 +0530)]
cxgb4vf: Update port information in cxgb4vf_open()
It's possible that the basic port information could have
changed since we first read it.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harini Katakam [Tue, 29 Jan 2019 09:50:03 +0000 (15:20 +0530)]
net: macb: Apply RXUBR workaround only to versions with errata
The interrupt handler contains a workaround for RX hang applicable
to Zynq and AT91RM9200 only. Subsequent versions do not need this
workaround. This workaround unnecessarily resets RX whenever RX used
bit read is observed, which can be often under heavy traffic. There
is no other action performed on RX UBR interrupt. Hence introduce a
CAPS mask; enable this interrupt and workaround only on affected
versions.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veerasenareddy Burru [Mon, 28 Jan 2019 19:38:31 +0000 (19:38 +0000)]
liquidio: fix the validation of rx checksum status from NIC hardware
Fixed the code that was incorrectly interpreting the rx checksum validation
status from hardware, and updating kernel that the packet arrived with
correct checksum though the packet arrived with incorrect checksum and
hardware also indicated checksum is not correct.
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Acked-by: Derek Chickles <dchickles@marvell.com>
Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Mon, 28 Jan 2019 23:40:10 +0000 (07:40 +0800)]
net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
dev_consume_skb_irq() should be called in cpmac_end_xmit() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Mon, 28 Jan 2019 23:39:13 +0000 (07:39 +0800)]
net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
dev_consume_skb_irq() should be called in bmac_txdma_intr() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Sun, 27 Jan 2019 15:58:25 +0000 (23:58 +0800)]
net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Sun, 27 Jan 2019 15:56:34 +0000 (23:56 +0800)]
net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
dev_consume_skb_irq() should be called in ace_tx_int() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Sun, 27 Jan 2019 00:59:03 +0000 (00:59 +0000)]
net: tls: Fix deadlock in free_resources tx
If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting
for the work to complete.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Sun, 27 Jan 2019 00:57:38 +0000 (00:57 +0000)]
net: tls: Save iv in tls_rec for async crypto requests
aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it. Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.
Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 28 Jan 2019 07:05:05 +0000 (15:05 +0800)]
vhost: fix OOB in get_rx_bufs()
After batched used ring updating was introduced in commit
e2b3b35eb989
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.
headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
vhost_len, &in, vq_log, &log,
likely(mergeable) ? UIO_MAXIOV : 1);
UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
=============================================================================
BUG kmalloc-8k (Tainted: G B ): Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=
3933677 cpu=2 pid=2674
kmem_cache_alloc_trace+0xbb/0x140
alloc_pd+0x22/0x60
gen8_ppgtt_create+0x11d/0x5f0
i915_ppgtt_create+0x16/0x80
i915_gem_create_context+0x248/0x390
i915_gem_context_create_ioctl+0x4b/0xe0
drm_ioctl_kernel+0xa5/0xf0
drm_ioctl+0x2ed/0x3a0
do_vfs_ioctl+0x9f/0x620
ksys_ioctl+0x6b/0x80
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.
This fixes CVE-2018-16880.
Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Tue, 29 Jan 2019 05:13:08 +0000 (16:13 +1100)]
enetc: include linux/vmalloc.h for vzalloc etc
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jan 2019 03:38:33 +0000 (19:38 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-01-29
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Teach verifier dead code removal, this also allows for optimizing /
removing conditional branches around dead code and to shrink the
resulting image. Code store constrained architectures like nfp would
have hard time doing this at JIT level, from Jakub.
2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
code generation for 32-bit sub-registers. Evaluation shows that this
can result in code reduction of ~5-20% compared to 64 bit-only code
generation. Also add implementation for most JITs, from Jiong.
3) Add support for __int128 types in BTF which is also needed for
vmlinux's BTF conversion to work, from Yonghong.
4) Add a new command to bpftool in order to dump a list of BPF-related
parameters from the system or for a specific network device e.g. in
terms of available prog/map types or helper functions, from Quentin.
5) Add AF_XDP sock_diag interface for querying sockets from user
space which provides information about the RX/TX/fill/completion
rings, umem, memory usage etc, from Björn.
6) Add skb context access for skb_shared_info->gso_segs field, from Eric.
7) Add support for testing flow dissector BPF programs by extending
existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.
8) Split BPF kselftest's test_verifier into various subgroups of tests
in order better deal with merge conflicts in this area, from Jakub.
9) Add support for queue/stack manipulations in bpftool, from Stanislav.
10) Document BTF, from Yonghong.
11) Dump supported ELF section names in libbpf on program load
failure, from Taeung.
12) Silence a false positive compiler warning in verifier's BTF
handling, from Peter.
13) Fix help string in bpftool's feature probing, from Prashant.
14) Remove duplicate includes in BPF kselftests, from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jan 2019 01:34:38 +0000 (17:34 -0800)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next tree:
1) Introduce a hashtable to speed up object lookups, from Florian Westphal.
2) Make direct calls to built-in extension, also from Florian.
3) Call helper before confirming the conntrack as it used to be originally,
from Florian.
4) Call request_module() to autoload br_netfilter when physdev is used
to relax the dependency, also from Florian.
5) Allow to insert rules at a given position ID that is internal to the
batch, from Phil Sutter.
6) Several patches to replace conntrack indirections by direct calls,
and to reduce modularization, from Florian. This also includes
several follow up patches to deal with minor fallout from this
rework.
7) Use RCU from conntrack gre helper, from Florian.
8) GRE conntrack module becomes built-in into nf_conntrack, from Florian.
9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(),
from Florian.
10) Unify sysctl handling at the core of nf_conntrack, from Florian.
11) Provide modparam to register conntrack hooks.
12) Allow to match on the interface kind string, from wenxu.
13) Remove several exported symbols, not required anymore now after
a bit of de-modulatization work has been done, from Florian.
14) Remove built-in map support in the hash extension, this can be
done with the existing userspace infrastructure, from laura.
15) Remove indirection to calculate checksums in IPVS, from Matteo Croce.
16) Use call wrappers for indirection in IPVS, also from Matteo.
17) Remove superfluous __percpu parameter in nft_counter, patch from
Luc Van Oostenryck.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 29 Jan 2019 00:08:30 +0000 (01:08 +0100)]
Merge branch 'bpf-flow-dissector-tests'
Stanislav Fomichev says:
====================
This patch series adds support for testing flow dissector BPF programs
by extending already existing BPF_PROG_TEST_RUN. The goal is to have
a packet as an input and `struct bpf_flow_key' as an output. That way
we can easily test flow dissector programs' behavior. I've also modified
existing test_progs.c test to do a simple flow dissector run as well.
* first patch introduces new __skb_flow_bpf_dissect to simplify
sharing between __skb_flow_bpf_dissect and BPF_PROG_TEST_RUN
* second patch adds actual BPF_PROG_TEST_RUN support
* third patch adds example usage to the selftests
v3:
* rebased on top of latest bpf-next
v2:
* loop over 'kattr->test.repeat' inside of
bpf_prog_test_run_flow_dissector, don't reuse
bpf_test_run/bpf_test_run_one
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:55 +0000 (08:53 -0800)]
selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector
Use existing pkt_v4 and pkt_v6 to make sure flow_keys are what we want.
Also, add new bpf_flow_load routine (and flow_dissector_load.h header)
that loads bpf_flow.o program and does all required setup.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:54 +0000 (08:53 -0800)]
bpf: add BPF_PROG_TEST_RUN support for flow dissector
The input is packet data, the output is struct bpf_flow_key. This should
make it easy to test flow dissector programs without elaborate
setup.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Mon, 28 Jan 2019 16:53:53 +0000 (08:53 -0800)]
net/flow_dissector: move bpf case into __skb_flow_bpf_dissect
This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.
No functional changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Mon, 28 Jan 2019 18:29:15 +0000 (10:29 -0800)]
tools: bpftool: warn about risky prog array updates
When prog array is updated with bpftool users often refer
to the map via the ID. Unfortunately, that's likely
to lead to confusion because prog arrays get flushed when
the last user reference is gone. If there is no other
reference bpftool will create one, update successfully
just to close the map again and have it flushed.
Warn about this case in non-JSON mode.
If the problem continues causing confusion we can remove
the support for referring to a map by ID for prog array
update completely. For now it seems like the potential
inconvenience to users who know what they're doing outweighs
the benefit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
YueHaibing [Fri, 25 Jan 2019 02:46:34 +0000 (10:46 +0800)]
selftests: bpf: remove duplicated include
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Mon, 28 Jan 2019 19:13:35 +0000 (11:13 -0800)]
Merge branch 'qed-Bug-fixes'
Manish Chopra says:
====================
qed: Bug fixes
This series have SR-IOV and some general fixes.
Please consider applying it to "net"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 28 Jan 2019 18:05:08 +0000 (10:05 -0800)]
qed: Fix stack out of bounds bug
KASAN reported following bug in qed_init_qm_get_idx_from_flags
due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".
[ 196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[ 196.624712] Read of size 8 at addr
ffff809b00bc7360 by task kworker/0:9/1712
[ 196.624714]
[ 196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
[ 196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
[ 196.624733] Workqueue: events work_for_cpu_fn
[ 196.624738] Call trace:
[ 196.624742] dump_backtrace+0x0/0x2f8
[ 196.624745] show_stack+0x24/0x30
[ 196.624749] dump_stack+0xe0/0x11c
[ 196.624755] print_address_description+0x68/0x260
[ 196.624759] kasan_report+0x178/0x340
[ 196.624762] __asan_report_load_n_noabort+0x38/0x48
[ 196.624786] qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[ 196.624808] qed_init_qm_info+0xec0/0x2200 [qed]
[ 196.624830] qed_resc_alloc+0x284/0x7e8 [qed]
[ 196.624853] qed_slowpath_start+0x6cc/0x1ae8 [qed]
[ 196.624864] __qede_probe.isra.10+0x1cc/0x12c0 [qede]
[ 196.624874] qede_probe+0x78/0xf0 [qede]
[ 196.624879] local_pci_probe+0xc4/0x180
[ 196.624882] work_for_cpu_fn+0x54/0x98
[ 196.624885] process_one_work+0x758/0x1900
[ 196.624888] worker_thread+0x4e0/0xd18
[ 196.624892] kthread+0x2c8/0x350
[ 196.624897] ret_from_fork+0x10/0x18
[ 196.624899]
[ 196.624902] Allocated by task 2:
[ 196.624906] kasan_kmalloc.part.1+0x40/0x108
[ 196.624909] kasan_kmalloc+0xb4/0xc8
[ 196.624913] kasan_slab_alloc+0x14/0x20
[ 196.624916] kmem_cache_alloc_node+0x1dc/0x480
[ 196.624921] copy_process.isra.1.part.2+0x1d8/0x4a98
[ 196.624924] _do_fork+0x150/0xfa0
[ 196.624926] kernel_thread+0x48/0x58
[ 196.624930] kthreadd+0x3a4/0x5a0
[ 196.624932] ret_from_fork+0x10/0x18
[ 196.624934]
[ 196.624937] Freed by task 0:
[ 196.624938] (stack is not available)
[ 196.624940]
[ 196.624943] The buggy address belongs to the object at
ffff809b00bc0000
[ 196.624943] which belongs to the cache thread_stack of size 32768
[ 196.624946] The buggy address is located 29536 bytes inside of
[ 196.624946] 32768-byte region [
ffff809b00bc0000,
ffff809b00bc8000)
[ 196.624948] The buggy address belongs to the page:
[ 196.624952] page:
ffff7fe026c02e00 count:1 mapcount:0 mapping:
ffff809b4001c000 index:0x0 compound_mapcount: 0
[ 196.624960] flags: 0xfffff8000008100(slab|head)
[ 196.624967] raw:
0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
[ 196.624970] raw:
0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[ 196.624973] page dumped because: kasan: bad access detected
[ 196.624974]
[ 196.624976] Memory state around the buggy address:
[ 196.624980]
ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624983]
ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624985] >
ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
[ 196.624988] ^
[ 196.624990]
ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624993]
ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624995] ==================================================================
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 28 Jan 2019 18:05:07 +0000 (10:05 -0800)]
qed: Fix system crash in ll2 xmit
Cache number of fragments in the skb locally as in case
of linear skb (with zero fragments), tx completion
(or freeing of skb) may happen before driver tries
to get number of frgaments from the skb which could
lead to stale access to an already freed skb.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 28 Jan 2019 18:05:06 +0000 (10:05 -0800)]
qed: Fix VF probe failure while FLR
VFs may hit VF-PF channel timeout while probing, as in some
cases it was observed that VF FLR and VF "acquire" message
transaction (i.e first message from VF to PF in VF's probe flow)
could occur simultaneously which could lead VF to fail sending
"acquire" message to PF as VF is marked disabled from HW perspective
due to FLR, which will result into channel timeout and VF probe failure.
In such cases, try retrying VF "acquire" message so that in later
attempts it could be successful to pass message to PF after the VF
FLR is completed and can be probed successfully.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 28 Jan 2019 18:05:05 +0000 (10:05 -0800)]
qed: Fix LACP pdu drops for VFs
VF is always configured to drop control frames
(with reserved mac addresses) but to work LACP
on the VFs, it would require LACP control frames
to be forwarded or transmitted successfully.
This patch fixes this in such a way that trusted VFs
(marked through ndo_set_vf_trust) would be allowed to
pass the control frames such as LACP pdus.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 28 Jan 2019 18:05:04 +0000 (10:05 -0800)]
qed: Fix bug in tx promiscuous mode settings
When running tx switched traffic between VNICs
created via a bridge(to which VFs are added),
adapter drops the unicast packets in tx flow due to
VNIC's ucast mac being unknown to it. But VF interfaces
being in promiscuous mode should have caused adapter
to accept all the unknown ucast packets. Later, it
was found that driver doesn't really configure tx
promiscuous mode settings to accept all unknown unicast macs.
This patch fixes tx promiscuous mode settings to accept all
unknown/unmatched unicast macs and works out the scenario.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2019 18:58:41 +0000 (10:58 -0800)]
Merge branch 'qed-Error-recovery-process'
Michal Kalderon says:
====================
qed*: Error recovery process
Parity errors might happen in the device's memories due to momentary bit
flips which are caused by radiation.
Errors that are not correctable initiate a process kill event, which blocks
the device access towards the host and the network, and a recovery process
is started in the management FW and in the driver.
This series adds the support of this process in the qed core module and in
the qede driver (patches 2 & 3).
Patch 1 in the series revises the load sequence, to avoid PCI errors that
might be observed during a recovery process.
Changes in v2:
- Addressed issue found in https://patchwork.ozlabs.org/patch/
1030545/
The change was done be removing the enum and passing a boolean to
the related functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Mon, 28 Jan 2019 17:27:56 +0000 (19:27 +0200)]
qede: Error recovery process
This patch adds the error recovery process in the qede driver.
The process includes a partial/customized driver unload and load, which
allows it to look like a short suspend period to the kernel while
preserving the net devices' state.
Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Mon, 28 Jan 2019 17:27:55 +0000 (19:27 +0200)]
qed: Add infrastructure for error detection and recovery
This patch adds the detection and handling of a parity error ("process kill
event"), including the update of the protocol drivers, and the prevention
of any HW access that will lead to device access towards the host while
recovery is in progress.
It also provides the means for the protocol drivers to trigger a recovery
process on their decision.
Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Mon, 28 Jan 2019 17:27:54 +0000 (19:27 +0200)]
qed: Revise load sequence to avoid PCI errors
Initiating final cleanup after an ungraceful driver unload can lead to bad
PCI accesses towards the host.
This patch revises the load sequence so final cleanup is sent while the
internal master enable is cleared, to prevent the host accesses, and clears
the internal error indications just before enabling the internal master
enable.
Signed-off-by: Tomer Tayar <tomer.tayar@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lubomir Rintel [Mon, 28 Jan 2019 16:17:40 +0000 (17:17 +0100)]
benet: remove broken and unused macro
is_broadcast_packet() expands to compare_ether_addr() which doesn't
exist since commit
7367d0b573d1 ("drivers/net: Convert uses of
compare_ether_addr to ether_addr_equal"). It turns out it's actually not
used.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Mon, 28 Jan 2019 14:42:25 +0000 (22:42 +0800)]
net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
dev_consume_skb_irq() should be called in i596_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2019 18:51:51 +0000 (10:51 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) The nftnl mutex is now per-netns, therefore use reference counter
for matches and targets to deal with concurrent updates from netns.
Moreover, place extensions in a pernet list. Patches from Florian Westphal.
2) Bail out with EINVAL in case of negative timeouts via setsockopt()
through ip_vs_set_timeout(), from ZhangXiaoxu.
3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
from Florian.
4) Reset TCP option header parser in case of fingerprint mismatch,
otherwise follow up overlapping fingerprint definitions including
TCP options do not work, from Fernando Fernandez Mancera.
5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
From Anders Roxell.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2019 18:43:15 +0000 (10:43 -0800)]
Merge branch 'mlxsw-Misc-updates'
Ido Schimmel says:
====================
mlxsw: Misc updates
This patchset contains miscellaneous patches we gathered in our queue.
Some of them are dependencies of larger patchsets that I will submit
later this cycle.
Patches #1-#3 perform small non-functional changes in mlxsw.
Patch #4 adds more extended ack messages in mlxsw.
Patch #5 adds devlink parameters documentation for mlxsw. To be extended
with more parameters this cycle.
Patches #6-#7 perform small changes in forwarding selftests
infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Jan 2019 12:02:13 +0000 (12:02 +0000)]
selftests: forwarding: Use OK instead of PASS in test output
It is easier to distinguish "[ OK ]" from "[FAIL]" than "[PASS]".
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Jan 2019 12:02:12 +0000 (12:02 +0000)]
selftests: net: forwarding: change devlink resource support checking
As for the others, check help message output to find out if devlink
supports "resource" object.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Jan 2019 12:02:11 +0000 (12:02 +0000)]
Documentation: add devlink param file for mlxsw driver
Add initial documentation file for devlink params of mlxsw driver. Only
"fw_load_policy" is now supported.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Jan 2019 12:02:10 +0000 (12:02 +0000)]
mlxsw: spectrum_switchdev: Add more extack messages
Add more extack messages that let the user know why VXLAN offload
failed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Jan 2019 12:02:09 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Fix rul/rule typo
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Jan 2019 12:02:08 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Move mr_ruleset and mr_rule structs
Move the struct to the place where they belong, alongside with the rest
of the MR code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Jan 2019 12:02:07 +0000 (12:02 +0000)]
mlxsw: spectrum_acl: Remove unnecessary arg on action_replace call path
No need to pass ruleset/group and chunk pointers on action_replace call
path, nobody uses them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Hocko [Fri, 25 Jan 2019 18:08:58 +0000 (19:08 +0100)]
Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
This reverts commit
2830bf6f05fb3e05bc4743274b806c821807a684.
The underlying assumption that one sparse section belongs into a single
numa node doesn't hold really. Robert Shteynfeld has reported a boot
failure. The boot log was not captured but his memory layout is as
follows:
Early memory node ranges
node 1: [mem 0x0000000000001000-0x0000000000090fff]
node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
node 1: [mem 0x0000000100000000-0x0000001423ffffff]
node 0: [mem 0x0000001424000000-0x0000002023ffffff]
This means that node0 starts in the middle of a memory section which is
also in node1. memmap_init_zone tries to initialize padding of a
section even when it is outside of the given pfn range because there are
code paths (e.g. memory hotplug) which assume that the full worth of
memory section is always initialized.
In this particular case, though, such a range is already intialized and
most likely already managed by the page allocator. Scribbling over
those pages corrupts the internal state and likely blows up when any of
those pages gets used.
Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian Westphal [Sun, 27 Jan 2019 18:18:57 +0000 (19:18 +0100)]
netfilter: ipv4: remove useless export_symbol
Only one caller; place it where needed and get rid of the EXPORT_SYMBOL.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cong Wang [Wed, 23 Jan 2019 20:58:57 +0000 (12:58 -0800)]
netfilter: conntrack: fix error path in nf_conntrack_pernet_init()
When nf_ct_netns_get() fails, it should clean up itself,
its caller doesn't need to call nf_conntrack_fini_net().
nf_conntrack_init_net() is called after registering sysctl
and proc, so its cleanup function should be called before
unregistering sysctl and proc.
Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling")
Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Luc Van Oostenryck [Sat, 19 Jan 2019 21:50:24 +0000 (22:50 +0100)]
netfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s arg
nft_counter_rest() has its first argument declared as
struct nft_counter_percpu_priv __percpu *priv
but this structure is not percpu (it only countains
a member 'counter' which is, correctly, a pointer to a
percpu struct nft_counter).
So, remove the '__percpu' from the argument's declaration.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Matteo Croce [Sat, 19 Jan 2019 14:25:35 +0000 (15:25 +0100)]
ipvs: use indirect call wrappers
Use the new indirect call wrappers in IPVS when calling the TCP or UDP
protocol specific functions.
This avoids an indirect calls in IPVS, and reduces the performance
impact of the Spectre mitigation.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Matteo Croce [Sat, 19 Jan 2019 14:22:38 +0000 (15:22 +0100)]
ipvs: avoid indirect calls when calculating checksums
The function pointer ip_vs_protocol->csum_check is only used in protocol
specific code, and never in the generic one.
Remove the function pointer from struct ip_vs_protocol and call the
checksum functions directly.
This reduces the performance impact of the Spectre mitigation, and
should give a small improvement even with RETPOLINES disabled.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anders Roxell [Wed, 23 Jan 2019 11:48:11 +0000 (12:48 +0100)]
netfilter: ipt_CLUSTERIP: fix warning unused variable cn
When CONFIG_PROC_FS isn't set the variable cn isn't used.
net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
struct clusterip_net *cn = clusterip_pernet(net);
^~
Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".
Fixes: b12f7bad5ad3 ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fernando Fernandez Mancera [Mon, 21 Jan 2019 11:53:21 +0000 (12:53 +0100)]
netfilter: nfnetlink_osf: add missing fmatch check
When we check the tcp options of a packet and it doesn't match the current
fingerprint, the tcp packet option pointer must be restored to its initial
value in order to do the proper tcp options check for the next fingerprint.
Here we can see an example.
Assumming the following fingerprint base with two lines:
S10:64:1:60:M*,S,T,N,W6: Linux:3.0::Linux 3.0
S20:64:1:60:M*,S,T,N,W7: Linux:4.19:arch:Linux 4.1
Where TCP options are the last field in the OS signature, all of them overlap
except by the last one, ie. 'W6' versus 'W7'.
In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
TCP options pointer is updated after checking for the TCP options in the first
line.
Therefore, reset pointer back to where it should be.
Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 21 Jan 2019 20:54:36 +0000 (21:54 +0100)]
netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
Unlike ip(6)tables ebtables only counts user-defined chains.
The effect is that a 32bit ebtables binary on a 64bit kernel can do
'ebtables -N FOO' only after adding at least one rule, else the request
fails with -EINVAL.
This is a similar fix as done in
3f1e53abff84 ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").
Fixes: 7d7d7e02111e9 ("netfilter: compat: reject huge allocation requests")
Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Andrew Lunn [Sun, 27 Jan 2019 21:48:00 +0000 (22:48 +0100)]
net: dsa: mv88e6xxx: Fix serdes irq setup going recursive
Duec to a typo, mv88e6390_serdes_irq_setup() calls itself, rather than
mv88e6390x_serdes_irq_setup(). It then blows the stack, and shortly
after the machine blows up.
Fixes: 2defda1f4b91 ("net: dsa: mv88e6xxx: Add support for SERDES on ports 2-8 for 6390X")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Sun, 27 Jan 2019 07:26:22 +0000 (09:26 +0200)]
ip6mr: Fix notifiers call on mroute_clean_tables()
When the MC route socket is closed, mroute_clean_tables() is called to
cleanup existing routes. Mistakenly notifiers call was put on the cleanup
of the unresolved MC route entries cache.
In a case where the MC socket closes before an unresolved route expires,
the notifier call leads to a crash, caused by the driver trying to
increment a non initialized refcount_t object [1] and then when handling
is done, to decrement it [2]. This was detected by a test recently added in
commit
6d4efada3b82 ("selftests: forwarding: Add multicast routing test").
Fix that by putting notifiers call on the resolved entries traversal,
instead of on the unresolved entries traversal.
[1]
[ 245.748967] refcount_t: increment on 0; use-after-free.
[ 245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
...
[ 245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
[ 245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
...
[ 245.907487] Call Trace:
[ 245.910231] mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
[ 245.917913] notifier_call_chain+0x45/0x7
[ 245.922484] atomic_notifier_call_chain+0x15/0x20
[ 245.927729] call_fib_notifiers+0x15/0x30
[ 245.932205] mroute_clean_tables+0x372/0x3f
[ 245.936971] ip6mr_sk_done+0xb1/0xc0
[ 245.940960] ip6_mroute_setsockopt+0x1da/0x5f0
...
[2]
[ 246.128487] refcount_t: underflow; use-after-free.
[ 246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
[ 246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
...
[ 246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
[ 246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
...
[ 246.298889] Call Trace:
[ 246.301617] refcount_dec_and_test_checked+0x11/0x20
[ 246.307170] mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
[ 246.315531] process_one_work+0x1fa/0x3f0
[ 246.320005] worker_thread+0x2f/0x3e0
[ 246.324083] kthread+0x118/0x130
[ 246.327683] ? wq_update_unbound_numa+0x1b0/0x1b0
[ 246.332926] ? kthread_park+0x80/0x80
[ 246.337013] ret_from_fork+0x1f/0x30
Fixes: 088aa3eec2ce ("ip6mr: Support fib notifications")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Sat, 26 Jan 2019 20:12:19 +0000 (21:12 +0100)]
decnet: fix DN_IFREQ_SIZE
Digging through the ioctls with Al because of the previous
patches, we found that on 64-bit decnet's dn_dev_ioctl()
is wrong, because struct ifreq::ifr_ifru is actually 24
bytes (not 16 as expected from struct sockaddr) due to the
ifru_map and ifru_settings members.
Clearly, decnet expects the ioctl to be called with a struct
like
struct ifreq_dn {
char ifr_name[IFNAMSIZ];
struct sockaddr_dn ifr_addr;
};
since it does
struct ifreq *ifr = ...;
struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;
This means that DN_IFREQ_SIZE is too big for what it wants on
64-bit, as it is
sizeof(struct ifreq) - sizeof(struct sockaddr) +
sizeof(struct sockaddr_dn)
This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
but that isn't true.
Fix this to use offsetof(struct ifreq, ifr_ifru).
This indeed doesn't really matter much - the result is that we
copy in/out 8 bytes more than we should on 64-bit platforms. In
case the "struct ifreq_dn" lands just on the end of a page though
it might lead to faults.
As far as I can tell, it has been like this forever, so it seems
very likely that nobody cares.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Khoroshilov [Sat, 26 Jan 2019 19:48:57 +0000 (22:48 +0300)]
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
If phy_power_on() fails in rk_gmac_powerup(), clocks are left enabled.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2019 07:01:56 +0000 (23:01 -0800)]
Merge branch 'hns-fixes'
Peng Li says:
====================
net: hns: code optimizations & bugfixes for HNS driver
This patchset includes bugfixes and code optimizations for the HNS
ethernet controller driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Sat, 26 Jan 2019 09:18:27 +0000 (17:18 +0800)]
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
When reading phy registers via Clause 45 MDIO protocol, after write
address operation, the driver use another write address operation, so
can not read the right value of any phy registers. This patch fixes it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Sat, 26 Jan 2019 09:18:26 +0000 (17:18 +0800)]
net: hns: Restart autoneg need return failed when autoneg off
The hns driver of earlier devices, when autoneg off, restart autoneg
will return -EINVAL, so make the hns driver for the latest devices
do the same.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Sat, 26 Jan 2019 09:18:25 +0000 (17:18 +0800)]
net: hns: Fix for missing of_node_put() after of_parse_phandle()
In hns enet driver, we use of_parse_handle() to get hold of the
device node related to "ae-handle" but we have missed to put
the node reference using of_node_put() after we are done using
the node. This patch fixes it.
Note:
Link: https://lkml.org/lkml/2018/12/22/217
Fixes: 48189d6aaf1e ("net: hns: enet specifies a reference to dsaf")
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Mon, 28 Jan 2019 05:37:46 +0000 (21:37 -0800)]
Merge branch 'split-test_verifier'
Jakub Kicinski says:
====================
The tools/testing/selftests/bpf/test_verifier.c file is
way too large, and since most people add their at the
end of the list it's very prone to conflicts.
Break it up in the simplest possible way - slice the
array up into smaller C files and include them in the
right spot.
Tested:
$ make -C tools/testing/selftests/bpf/
$ cd tools/testing/selftests/bpf/ ; make
v2:
The indentation is reduced further as discussed and lines folded.
The conversion was scripted, and double checked by hand.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Fri, 25 Jan 2019 23:24:44 +0000 (15:24 -0800)]
selftests: bpf: break up the rest of test_verifier
Break up the rest of test_verifier tests into separate
files.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Fri, 25 Jan 2019 23:24:43 +0000 (15:24 -0800)]
selftests: bpf: break up test_verifier
Break up the first 10 kLoC of test verifier test cases
out into smaller files. Looks like git line counting
gets a little flismy above 16 bit integers, so we need
two commits to break up test_verifier.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Fri, 25 Jan 2019 23:24:42 +0000 (15:24 -0800)]
selftests: bpf: prepare for break up of verifier tests
test_verifier.c has grown to be very long (almost 16 kLoC),
and it is very conflict prone since we always add tests at
the end.
Try to break it apart a little bit. Allow test snippets
to be defined in separate files and include them automatically
into the huge test array.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Sun, 27 Jan 2019 23:18:05 +0000 (15:18 -0800)]
Linux 5.0-rc4
David S. Miller [Sun, 27 Jan 2019 21:29:43 +0000 (13:29 -0800)]
Merge branch 'tcp-change-pingpong-to-3-in-delayed-ack-logic'
Wei Wang says:
====================
tcp: change pingpong to 3 in delayed ack logic
TCP receiver today tries not to delay the ACKs to speed up the initial
slow start (a.k.a QUICK ACK mechanism). However the previous design
does not work well with modern TCP applications that starts with an
application-level handshake. For example, a HTTPs server often
receives the SSL hello and responds right away which triggers the TCP
stack to stop the quick ack and start delaying the ACKs based only one
instance of ping-pong. This patchset changes the threshold from 1 to 3
ping-pong transactions, so that we only start to delay the acks after
the receiver responds data quickly three times.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 25 Jan 2019 18:53:20 +0000 (10:53 -0800)]
tcp: change pingpong threshold to 3
In order to be more confident about an on-going interactive session, we
increment pingpong count by 1 for every interactive transaction and we
adjust TCP_PINGPONG_THRESH to 3.
This means, we only consider a session in pingpong mode after we see 3
interactive transactions, and start to activate delayed acks in quick
ack mode.
And in order to not over-count the credits, we only increase pingpong
count for the first packet sent in response for the previous received
packet.
This is mainly to prevent delaying the ack immediately after some
handshake protocol but no real interactive traffic pattern afterwards.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 25 Jan 2019 18:53:19 +0000 (10:53 -0800)]
tcp: Refactor pingpong code
Instead of using pingpong as a single bit information, we refactor the
code to treat it as a counter. When interactive session is detected,
we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count
is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode.
This patch is a pure refactor and sets foundation for the next patch.
This patch itself does not change any pingpong logic.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Fri, 25 Jan 2019 14:41:50 +0000 (22:41 +0800)]
net: ipv4: ip_input: fix blank line coding style issues
Fix blank line coding style issues, make the code cleaner.
Remove a redundant blank line in ip_rcv_core().
Insert a blank line in ip_rcv() between different statement blocks.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carlo Caione [Fri, 25 Jan 2019 12:35:10 +0000 (12:35 +0000)]
net: phy: at803x: Use helpers to access MMD PHY registers
Libphy provides a standard set of helpers to access the MMD PHY
registers. Use those instead of relying on custom driver-specific
functions.
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 27 Jan 2019 20:02:00 +0000 (12:02 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Fix the swapped outb() parameters in the KASLR code
- Fix the PKEY handling at fork which missed to preserve the pkey
state for the child. Comes with a test case to validate that.
- Fix the entry stack handling for XEN PV to respect that XEN PV
systems enter the function already on the current thread stack and
not on the trampoline.
- Fix kexec load failure caused by using a stale value when the
kexec_buf structure is reused for subsequent allocations.
- Fix a bogus sizeof() in the memory encryption code
- Enforce PCI dependency for the Intel Low Power Subsystem
- Enforce PCI_LOCKLESS_CONFIG when PCI is enabled"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled
x86/entry/64/compat: Fix stack switching for XEN PV
x86/kexec: Fix a kexec_file_load() failure
x86/mm/mem_encrypt: Fix erroneous sizeof()
x86/selftests/pkeys: Fork() to check for state being preserved
x86/pkeys: Properly copy pkey state at fork()
x86/kaslr: Fix incorrect i8254 outb() parameters
x86/intel/lpss: Make PCI dependency explicit
Linus Torvalds [Sun, 27 Jan 2019 19:57:46 +0000 (11:57 -0800)]
Merge branch 'x86-timers-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 timer fixes from Thomas Gleixner:
"Two commits which were missed to be sent during the merge window.
- The TSC calibration fix turns out to be more urgent as recent
Skylake-X systems seem to have massive trouble with calibration
disturbance. This should go back into stable for that reason and it
the risk of breakage is rather low.
- Drop an unused define"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hpet: Remove unused FSEC_PER_NSEC define
x86/tsc: Make calibration refinement more robust
Linus Torvalds [Sun, 27 Jan 2019 19:55:06 +0000 (11:55 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Glexiner:
"A single regression fix to address the unintended breakage of posix
cpu timers.
This is caused by a new sanity check in the common code, which fails
for posix cpu timers under certain conditions because the posix cpu
timer code never updates the variable which is checked"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Unbreak timer rearming
Linus Torvalds [Sun, 27 Jan 2019 19:52:50 +0000 (11:52 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A small series of fixes which all address possible missed wakeups:
- Document and fix the wakeup ordering of wake_q
- Add the missing barrier in rcuwait_wake_up(), which was documented
in the comment but missing in the code
- Fix the possible missed wakeups in the rwsem and futex code"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix (possible) missed wakeup
futex: Fix (possible) missed wakeup
sched/wake_q: Fix wakeup ordering for wake_q
sched/wake_q: Document wake_q_add()
sched/wait: Fix rcuwait_wake_up() ordering
Linus Torvalds [Sun, 27 Jan 2019 19:25:38 +0000 (11:25 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A small set of fixes for the interrupt subsystem:
- Fix a double increment in the irq descriptor allocator which
resulted in a sanity check only being done for every second
affinity mask
- Add a missing device tree translation in the stm32-exti driver.
Without that the interrupt association is completely wrong.
- Initialize the mutex in the GIC-V3 MBI driver
- Fix the alignment for aliasing devices in the GIC-V3-ITS driver so
multi MSI allocations work correctly
- Ensure that the initial affinity of a interrupt is not empty at
startup time.
- Drop bogus include in the madera irq chip driver
- Fix KernelDoc regression"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
genirq/irqdesc: Fix double increment in alloc_descs()
genirq: Fix the kerneldoc comment for struct irq_affinity_desc
irqchip/madera: Drop GPIO includes
irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
irqchip/stm32-exti: Add domain translate function
genirq: Make sure the initial affinity is not empty
David S. Miller [Sun, 27 Jan 2019 19:06:45 +0000 (11:06 -0800)]
Merge tag 'mlx5-fixes-2019-01-25' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-01-25
This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
Please pull and let me know if there is any problem.
For -stable v4.13
('net/mlx5e: Allow MAC invalidation while spoofchk is ON')
For -stable v4.18
('Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 27 Jan 2019 19:00:37 +0000 (11:00 -0800)]
Merge tag 'edac_fix_for_5.0' of git://git./linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"Fix persistent register offsets of altera_edac, from Thor Thayer"
* tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, altera: Fix S10 persistent register offset
Linus Torvalds [Sun, 27 Jan 2019 18:58:20 +0000 (10:58 -0800)]
Merge tag 'for-linus-
20190127' of git://git.kernel.dk/linux-block
Pull block revert from Jens Axboe:
"Silly error snuck into a patch from the last series, let's do a revert
to avoid a potential use-after-free"
* tag 'for-linus-
20190127' of git://git.kernel.dk/linux-block:
Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"
David S. Miller [Sun, 27 Jan 2019 18:43:17 +0000 (10:43 -0800)]
Merge git://git./linux/kernel/git/davem/net
Bernard Pidoux [Fri, 25 Jan 2019 10:46:40 +0000 (11:46 +0100)]
net/rose: fix NULL ax25_cb kernel panic
When an internally generated frame is handled by rose_xmit(),
rose_route_frame() is called:
if (!rose_route_frame(skb, NULL)) {
dev_kfree_skb(skb);
stats->tx_errors++;
return NETDEV_TX_OK;
}
We have the same code sequence in Net/Rom where an internally generated
frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
However, in this function NULL argument is tested while it is not in
rose_route_frame().
Then kernel panic occurs later on when calling ax25cmp() with a NULL
ax25_cb argument as reported many times and recently with syzbot.
We need to test if ax25 is NULL before using it.
Testing:
Built kernel with CONFIG_ROSE=y.
Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bernard Pidoux <f6bvp@free.fr>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomonori Sakita [Fri, 25 Jan 2019 02:02:22 +0000 (11:02 +0900)]
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
If fill_level was not zero and status was not BUSY,
result of "tx_prod - tx_cons - inuse" might be zero.
Subtracting 1 unconditionally results invalid negative return value
on this case.
Make sure not to return an negative value.
Signed-off-by: Tomonori Sakita <tomonori.sakita@sord.co.jp>
Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>
Reviewed-by: Dalon L Westergreen <dalon.westergreen@linux.intel.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 24 Jan 2019 22:18:18 +0000 (14:18 -0800)]
netrom: switch to sock timer API
sk_reset_timer() and sk_stop_timer() properly handle
sock refcnt for timer function. Switching to them
could fix a refcounting bug reported by syzbot.
Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 27 Jan 2019 18:30:01 +0000 (10:30 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2019-01-25
1) Several patches to fix the fallout from the recent
tree based policy lookup work. From Florian Westphal.
2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
We need an extra IPIP handler to process these packets
correctly. From Su Yanjun.
3) Fix validation of template and selector families for
MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
This can lead to a stack-out-of-bounds because
flowi4 struct is treated as flowi6 struct.
Fix from Florian Westphal.
4) Restore the default behaviour of the xfrm set-mark
in the output path. This was changed accidentally
when mark setting was extended to the input path.
From Benedict Wong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 27 Jan 2019 17:21:00 +0000 (09:21 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Quite a few fixes for x86: nested virtualization save/restore, AMD
nested virtualization and virtual APIC, 32-bit fixes, an important fix
to restore operation on older processors, and a bunch of hyper-v
bugfixes. Several are marked stable.
There are also fixes for GCC warnings and for a GCC/objtool interaction"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Mark expected switch fall-throughs
KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths
KVM: selftests: check returned evmcs version range
x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly
KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
kvm: selftests: Fix region overlap check in kvm_util
kvm: vmx: fix some -Wmissing-prototypes warnings
KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
svm: Fix AVIC incomplete IPI emulation
svm: Add warning message for AVIC IPI invalid target
KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
KVM: x86: Fix PV IPIs for 32-bit KVM host
x86/kvm/hyper-v: recommend using eVMCS only when it is enabled
x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR
kvm: x86/vmx: Use kzalloc for cached_vmcs12
KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
KVM: x86: Fix single-step debugging
x86/kvm/hyper-v: don't announce GUEST IDLE MSR support
Linus Torvalds [Sun, 27 Jan 2019 17:18:05 +0000 (09:18 -0800)]
Merge tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Fix a xen-swiotlb regression on arm64"
* tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping:
arm64/xen: fix xen-swiotlb cache flushing
Linus Torvalds [Sun, 27 Jan 2019 17:11:51 +0000 (09:11 -0800)]
Merge tag 'libnvdimm-fixes-5.0-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix for namespace label support for non-Intel NVDIMMs that implement
the ACPI standard label method.
This has apparently never worked and could wait for v5.1. However it
has enough visibility with hardware vendors [1] and distro bug
trackers [2], and low enough risk that I decided it should go in for
-rc4. The other fixups target the new, for v5.0, nvdimm security
functionality. The larger init path fixup closes a memory leak and a
potential userspace lockup due to missed notifications.
[1] https://github.com/pmem/ndctl/issues/78
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
1811785
These have all soaked in -next for a week with no reported issues.
Summary:
- Fix support for NVDIMMs that implement the ACPI standard label
methods.
- Fix error handling for security overwrite (memory leak / userspace
hang condition), and another one-line security cleanup"
* tag 'libnvdimm-fixes-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
acpi/nfit: Fix command-supported detection
acpi/nfit: Block function zero DSMs
libnvdimm/security: Require nvdimm_security_setup_events() to succeed
nfit_test: fix security state pull for nvdimm security nfit_test
Linus Torvalds [Sun, 27 Jan 2019 17:07:03 +0000 (09:07 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A fixup for the input_event fix for y2038 Sparc64, and couple other
minor fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: input_event - fix the CONFIG_SPARC64 mixup
Input: olpc_apsp - assign priv->dev earlier
Input: uinput - fix undefined behavior in uinput_validate_absinfo()
Input: raspberrypi-ts - fix link error
Input: xpad - add support for SteelSeries Stratus Duo
Input: input_event - provide override for sparc64
Linus Torvalds [Sun, 27 Jan 2019 16:59:12 +0000 (08:59 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Count ttl-dropped frames properly in mac80211, from Bob Copeland.
2) Integer overflow in ktime handling of bcm can code, from Oliver
Hartkopp.
3) Fix RX desc handling wrt. hw checksumming in ravb, from Simon
Horman.
4) Various hash key fixes in hv_netvsc, from Haiyang Zhang.
5) Use after free in ax25, from Eric Dumazet.
6) Several fixes to the SSN support in SCTP, from Xin Long.
7) Do not process frames after a NAPI reschedule in ibmveth, from
Thomas Falcon.
8) Fix NLA_POLICY_NESTED arguments, from Johannes Berg.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
qed: Revert error handling changes.
cfg80211: extend range deviation for DMG
cfg80211: reg: remove warn_on for a normal case
mac80211: Add attribute aligned(2) to struct 'action'
mac80211: don't initiate TDLS connection if station is not associated to AP
nl80211: fix NLA_POLICY_NESTED() arguments
ibmveth: Do not process frames after calling napi_reschedule
net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
net: usb: asix: ax88772_bind return error when hw_reset fail
MAINTAINERS: Update cavium networking drivers
net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
net/mlx4_core: Add masking for a few queries on HCA caps
sctp: set flow sport from saddr only when it's 0
sctp: set chunk transport correctly when it's a new asoc
sctp: improve the events for sctp stream adding
sctp: improve the events for sctp stream reset
ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
ax25: fix possible use-after-free
sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
hv_netvsc: fix typos in code comments
...
Jens Axboe [Sun, 27 Jan 2019 13:35:28 +0000 (06:35 -0700)]
Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"
We can't touch a bio after ->make_request_fn(), for all we know it could
already have been completed by the time this function returns.
This reverts commit
698cef173983b086977e633e46476e0f925ca01e.
Reported-by: syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masahiro Yamada [Fri, 25 Jan 2019 14:22:29 +0000 (23:22 +0900)]
net: lmc: remove -I. header search path
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.
I was able to build without this header search path.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>