platform/kernel/linux-rpi.git
6 years agonet: sched: red: don't reset the backlog on every stat dump
Jakub Kicinski [Mon, 15 Jan 2018 04:01:26 +0000 (20:01 -0800)]
net: sched: red: don't reset the backlog on every stat dump

Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied
child's backlog into RED's backlog.  Back then RED did not maintain
its own backlog counts.  This has changed after commit 2ccccf5fb43f
("net_sched: update hierarchical backlog too") and commit d7f4f332f082
("sch_red: update backlog as well").  Copying is no longer necessary.

Tested:

$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 1260b 14p requeues 14
  marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
 backlog 1260b 14p requeues 14

Recently RED offload was added.  We need to make sure drivers don't
depend on resetting the stats.  This means backlog should be treated
like any other statistic:

  total_stat = new_hw_stat - prev_hw_stat;

Adjust mlxsw.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5: Fix build break
Saeed Mahameed [Wed, 17 Jan 2018 07:54:38 +0000 (09:54 +0200)]
net/mlx5: Fix build break

The latest merge between net and net-next introduced a complier assert in
mlx5 driver.  In hca_cap_bits older fields are kept along with newer
fields that should have replaced them.

Fixes: c02b3741eb99 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 17 Jan 2018 05:00:25 +0000 (00:00 -0500)]
Merge git://git./linux/kernel/git/davem/net

Overlapping changes all over.

The mini-qdisc bits were a little bit tricky, however.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Wed, 17 Jan 2018 03:42:14 +0000 (22:42 -0500)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-01-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add initial BPF map offloading for nfp driver. Currently only
   programs were supported so far w/o being able to access maps.
   Offloaded programs are right now only allowed to perform map
   lookups, and control path is responsible for populating the
   maps. BPF core infrastructure along with nfp implementation is
   provided, from Jakub.

2) Various follow-ups to Josef's BPF error injections. More
   specifically that includes: properly check whether the error
   injectable event is on function entry or not, remove the percpu
   bpf_kprobe_override and rather compare instruction pointer
   with original one, separate error-injection from kprobes since
   it's not limited to it, add injectable error types in order to
   specify what is the expected type of failure, and last but not
   least also support the kernel's fault injection framework, all
   from Masami.

3) Various misc improvements and cleanups to the libbpf Makefile.
   That is, fix permissions when installing BPF header files, remove
   unused variables and functions, and also install the libbpf.h
   header, from Jesper.

4) When offloading to nfp JIT and the BPF insn is unsupported in the
   JIT, then reject right at verification time. Also fix libbpf with
   regards to ELF section name matching by properly treating the
   program type as prefix. Both from Quentin.

5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
   This is needed, for example, when building libfd from source as
   bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.

6) xdp_convert_ctx_access() is simplified since it doesn't need to
   set target size during verification, from Jesper.

7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
   program types, from Roman.

8) Various functions in BPF cpumap were not declared static, from Wei.

9) Fix a double semicolon in BPF samples, from Luis.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Wed, 17 Jan 2018 00:47:40 +0000 (16:47 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "We had a few more items creep up over the last week. Given we are in
  -rc8, these are obviously limited to bugs that have a big downside and
  for which we are certain of the fix.

  The first is a straight up oops bug that all you have to do is read
  the code to see it's a guaranteed 100% oops bug.

  The second is a use-after-free issue. We get away lucky if the queue
  we are shutting down is empty, but if it isn't, we can end up oopsing.
  We really need to drain the queue before destroying it.

  The final one is an issue with bad user input causing us to access our
  port array out of bounds. While fixing the array out of bounds issue,
  it was noticed that the original code did the same thing twice (the
  call to rdma_ah_set_port_num()), so its removal is not balanced by a
  readd elsewhere, it was already where it needed to be in addition to
  where it didn't need to be.

  Summary:

   - Oops fix in hfi1 driver

   - use-after-free issue in iser-target

   - use of user supplied array index without proper checking"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Fix out-of-bound access while querying AH
  IB/hfi1: Prevent a NULL dereference
  iser-target: Fix possible use-after-free in connection establishment error

6 years agoMerge branch 'bpf-libbpf-cleanups'
Daniel Borkmann [Wed, 17 Jan 2018 00:18:11 +0000 (01:18 +0100)]
Merge branch 'bpf-libbpf-cleanups'

Jesper Dangaard Brouer says:

====================
This patchset contains some small improvements and cleanup for
the Makefile in tools/lib/bpf/.

It worries me that the libbpf.so shared library is not versioned,
but it not addressed in this patchset.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agolibbpf: Makefile set specified permission mode
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:40 +0000 (00:20 +0100)]
libbpf: Makefile set specified permission mode

The third parameter to do_install was not used by $(INSTALL) command.
Fix this by only setting the -m option when the third parameter is supplied.

The use of a third parameter was introduced in commit  eb54e522a000 ("bpf:
install libbpf headers on 'make install'").

Without this change, the header files are install as executables files (755).

Fixes: eb54e522a000 ("bpf: install libbpf headers on 'make install'")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agolibbpf: cleanup Makefile, remove unused elements
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:35 +0000 (00:20 +0100)]
libbpf: cleanup Makefile, remove unused elements

The plugin_dir_SQ variable is not used, remove it.
The function update_dir is also unused, remove it.
The variable $VERSION_FILES is empty, remove it.

These all originates from the introduction of the Makefile, and is likely a copy paste
from tools/lib/traceevent/Makefile.

Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agolibbpf: install the header file libbpf.h
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:30 +0000 (00:20 +0100)]
libbpf: install the header file libbpf.h

It seems like an oversight not to install the header file for libbpf,
given the libbpf.so + libbpf.a files are installed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'bpf-various-improvements'
Daniel Borkmann [Wed, 17 Jan 2018 00:15:07 +0000 (01:15 +0100)]
Merge branch 'bpf-various-improvements'

Jakub Kicinski says:

====================
This series combines a number of random improvements ranging from
libbpf to nfp driver.  NFP patches make better use of the verifier
log.  There is a requested adjustment to the map offload code, and
a warning fix for a W=1 build to the disassembler.  Quentin also
fixes the libbpf program type detection, while Jiong allows the use
of libbfd compiled from source.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agonfp: bpf: reject program on instructions unknown to the JIT compiler
Quentin Monnet [Tue, 16 Jan 2018 23:51:50 +0000 (15:51 -0800)]
nfp: bpf: reject program on instructions unknown to the JIT compiler

If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agonfp: bpf: print map lookup problems into verifier log
Jakub Kicinski [Tue, 16 Jan 2018 23:51:49 +0000 (15:51 -0800)]
nfp: bpf: print map lookup problems into verifier log

Use the verifier log to output error messages if map lookup
can't be offloaded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agolibbpf: fix string comparison for guessing eBPF program type
Quentin Monnet [Tue, 16 Jan 2018 23:51:48 +0000 (15:51 -0800)]
libbpf: fix string comparison for guessing eBPF program type

libbpf is able to deduce the type of a program from the name of the ELF
section in which it is located. However, the comparison is made on the
first n characters, n being determined with sizeof() applied to the
reference string (e.g. "xdp"). When such section names are supposed to
receive a suffix separated with a slash (e.g. "kprobe/"), using sizeof()
takes the final NUL character of the reference string into account,
which implies that both strings must be equal. Instead, the desired
behaviour would consist in taking the length of the string, *without*
accounting for the ending NUL character, and to make sure the reference
string is a prefix to the ELF section name.

Subtract 1 to the total size of the string for obtaining the length for
the comparison.

Fixes: 583c90097f72 ("libbpf: add ability to guess program type based on section name")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: add -DPACKAGE when including bfd.h
Jiong Wang [Tue, 16 Jan 2018 23:51:47 +0000 (15:51 -0800)]
tools: bpftool: add -DPACKAGE when including bfd.h

bfd.h is requiring including of config.h except when PACKAGE or
PACKAGE_VERSION are defined.

  /* PR 14072: Ensure that config.h is included first.  */
  #if !defined PACKAGE && !defined PACKAGE_VERSION
  #error config.h must be included before this header
  #endif

This check has been introduced since May-2012. It doesn't show up in bfd.h
on some Linux distribution, probably because distributions have remove it
when building the package.

However, sometimes the user might just build libfd from source code then
link bpftool against it. For this case, bfd.h will be original that we need
to define PACKAGE or PACKAGE_VERSION.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: annotate bpf_insn_print_t with __printf
Jakub Kicinski [Tue, 16 Jan 2018 23:51:46 +0000 (15:51 -0800)]
bpf: annotate bpf_insn_print_t with __printf

Functions of type bpf_insn_print_t take printf-like format
string, mark the type accordingly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: offload: make bpf_offload_dev_match() reject host+host case
Jakub Kicinski [Tue, 16 Jan 2018 23:51:45 +0000 (15:51 -0800)]
bpf: offload: make bpf_offload_dev_match() reject host+host case

Daniel suggests it would be more logical for bpf_offload_dev_match()
to return false is either the program or the map are not offloaded,
rather than treating the both not offloaded case as a "matching
CPU/host device".

This makes no functional difference today, since verifier only calls
bpf_offload_dev_match() when one of the objects is offloaded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agosamples/bpf: Fix trailing semicolon
Luis de Bethencourt [Tue, 16 Jan 2018 14:15:30 +0000 (14:15 +0000)]
samples/bpf: Fix trailing semicolon

The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: cpumap: make some functions static
Wei Yongjun [Tue, 16 Jan 2018 11:27:05 +0000 (11:27 +0000)]
bpf: cpumap: make some functions static

Fixes the following sparse warnings:

kernel/bpf/cpumap.c:146:6: warning:
 symbol '__cpu_map_queue_destructor' was not declared. Should it be static?
kernel/bpf/cpumap.c:225:16: warning:
 symbol 'cpu_map_build_skb' was not declared. Should it be static?
kernel/bpf/cpumap.c:340:26: warning:
 symbol '__cpu_map_entry_alloc' was not declared. Should it be static?
kernel/bpf/cpumap.c:398:6: warning:
 symbol '__cpu_map_entry_free' was not declared. Should it be static?
kernel/bpf/cpumap.c:441:6: warning:
 symbol '__cpu_map_entry_replace' was not declared. Should it be static?
kernel/bpf/cpumap.c:454:5: warning:
 symbol 'cpu_map_delete_elem' was not declared. Should it be static?
kernel/bpf/cpumap.c:467:5: warning:
 symbol 'cpu_map_update_elem' was not declared. Should it be static?
kernel/bpf/cpumap.c:505:6: warning:
 symbol 'cpu_map_free' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 16 Jan 2018 20:45:30 +0000 (12:45 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers.

 2) Memory leak in key_notify_policy(), from Steffen Klassert.

 3) Fix overflow with bpf arrays, from Daniel Borkmann.

 4) Fix RDMA regression with mlx5 due to mlx5 no longer using
    pci_irq_get_affinity(), from Saeed Mahameed.

 5) Missing RCU read locking in nl80211_send_iface() when it calls
    ieee80211_bss_get_ie(), from Dominik Brodowski.

 6) cfg80211 should check dev_set_name()'s return value, from Johannes
    Berg.

 7) Missing module license tag in 9p protocol, from Stephen Hemminger.

 8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike
    Maloney.

 9) Fix endless loop in netlink extack code, from David Ahern.

10) TLS socket layer sets inverted error codes, resulting in an endless
    loop. From Robert Hering.

11) Revert openvswitch erspan tunnel support, it's mis-designed and we
    need to kill it before it goes into a real release. From William Tu.

12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
  net, sched: fix panic when updating miniq {b,q}stats
  qed: Fix potential use-after-free in qed_spq_post()
  nfp: use the correct index for link speed table
  lan78xx: Fix failure in USB Full Speed
  sctp: do not allow the v4 socket to bind a v4mapped v6 address
  sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
  sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
  ibmvnic: Fix pending MAC address changes
  netlink: extack: avoid parenthesized string constant warning
  ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
  net: Allow neigh contructor functions ability to modify the primary_key
  sh_eth: fix dumping ARSTR
  Revert "openvswitch: Add erspan tunnel support."
  net/tls: Fix inverted error codes to avoid endless loop
  ipv6: ip6_make_skb() needs to clear cork.base.dst
  sctp: avoid compiler warning on implicit fallthru
  net: ipv4: Make "ip route get" match iif lo rules again.
  netlink: extack needs to be reset each time through loop
  tipc: fix a memory leak in tipc_nl_node_get_link()
  ipv6: fix udpv6 sendmsg crash caused by too small MTU
  ...

6 years agobnxt_en: don't update cpr->rx_bytes with uninitialized length len
Colin Ian King [Tue, 16 Jan 2018 10:22:50 +0000 (10:22 +0000)]
bnxt_en: don't update cpr->rx_bytes with uninitialized length len

Currently in the cases where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP or
CMP_TYPE_RX_L2_TPA_END_CMP the exit path updates cpr->rx_bytes with an
uninitialized length len.  Fix this by adding a new exit path that does
not update the cpr stats with the bogus length len and remove the unused
label next_rx_no_prod.

Detected by CoverityScan, CID#1463807 ("Uninitialized scalar variable")
Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Tue, 16 Jan 2018 20:13:52 +0000 (12:13 -0800)]
Merge tag 'sound-4.15' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A few small last-minute fixes that should sneak into 4.15:

   - remove a spurious WARN_ON() triggered by syzkaller

   - fix for ioctl races in ALSA sequencer

   - two trivial HD-audio fixup entries"

* tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: seq: Make ioctls race-free
  ALSA: pcm: Remove yet superfluous WARN_ON()
  ALSA: hda - Apply the existing quirk to iMac 14,1
  ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant

6 years agoMerge tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rosted...
Linus Torvalds [Tue, 16 Jan 2018 20:09:36 +0000 (12:09 -0800)]
Merge tag 'trace-v4.15-rc4-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Bring back context level recursive protection in ring buffer.

   The simpler counter protection failed, due to a path when tracing
   with trace_clock_global() as it could not be reentrant and depended
   on the ring buffer recursive protection to keep that from happening.

 - Prevent branch profiling when FORTIFY_SOURCE is enabled.

   It causes 50 - 60 MB in warning messages. Branch profiling should
   never be run on production systems, so there's no reason that it
   needs to be enabled with FORTIFY_SOURCE.

* tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
  ring-buffer: Bring back context level recursive checks

6 years agonet, sched: fix panic when updating miniq {b,q}stats
Daniel Borkmann [Mon, 15 Jan 2018 22:12:09 +0000 (23:12 +0100)]
net, sched: fix panic when updating miniq {b,q}stats

While working on fixing another bug, I ran into the following panic
on arm64 by simply attaching clsact qdisc, adding a filter and running
traffic on ingress to it:

  [...]
  [  178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000
  [  178.197314] Mem abort info:
  [  178.200121]   ESR = 0x96000004
  [  178.203168]   Exception class = DABT (current EL), IL = 32 bits
  [  178.209095]   SET = 0, FnV = 0
  [  178.212157]   EA = 0, S1PTW = 0
  [  178.215288] Data abort info:
  [  178.218175]   ISV = 0, ISS = 0x00000004
  [  178.222019]   CM = 0, WnR = 0
  [  178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33
  [  178.231531] [0000810fb501f000] *pgd=0000000000000000
  [  178.236508] Internal error: Oops: 96000004 [#1] SMP
  [...]
  [  178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G        W        4.15.0-rc7+ #5
  [  178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
  [  178.326887] pstate: 60400005 (nZCv daif +PAN -UAO)
  [  178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
  [  178.336728] lr : __netif_receive_skb+0x28/0x78
  [  178.341161] sp : ffff00002344b750
  [  178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580
  [  178.349769] x27: 0000000000000000 x26: ffff000009378000
  [...]
  [  178.418715] x1 : 0000000000000054 x0 : 0000000000000000
  [  178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
  [  178.430537] Call trace:
  [  178.432976]  __netif_receive_skb_core+0x49c/0xac8
  [  178.437670]  __netif_receive_skb+0x28/0x78
  [  178.441757]  process_backlog+0x9c/0x160
  [  178.445584]  net_rx_action+0x2f8/0x3f0
  [...]

Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
Problem is that this cannot work since they are actually set up right after
the qdisc ->init() callback in qdisc_create(), so first packet going into
sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
therefore panic.

In order to fix this, allocation of {b,q}stats needs to happen before we
call into ->init(). In net-next, there's already such option through commit
d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats").
However, the bug needs to be fixed in net still for 4.15. Thus, include
these bits to reduce any merge churn and reuse the static_flags field to
set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
there is no other user left. Prashant Bhole ran into the same issue but
for net-next, thus adding him below as well as co-author. Same issue was
also reported by Sandipan Das when using bcc.

Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html
Reported-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Co-authored-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Co-authored-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: delete /proc THIS_MODULE references
Alexey Dobriyan [Mon, 15 Jan 2018 21:42:40 +0000 (00:42 +0300)]
net: delete /proc THIS_MODULE references

/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

-               if (de->proc_fops)
-                       inode->i_fop = de->proc_fops;
+               if (de->proc_fops) {
+                       if (S_ISREG(inode->i_mode))
+                               inode->i_fop = &proc_reg_file_ops;
+                       else
+                               inode->i_fop = de->proc_fops;
+               }

VFS stopped pinning module at this point.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Fix potential use-after-free in qed_spq_post()
Roland Dreier [Mon, 15 Jan 2018 20:24:49 +0000 (12:24 -0800)]
qed: Fix potential use-after-free in qed_spq_post()

We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
calling qed_spq_add_entry().  The test is fine is the mode is EBLOCK,
but if it isn't then qed_spq_add_entry() might kfree(p_ent).

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: remove prototype of qdisc_lookup_class()
Jakub Kicinski [Mon, 15 Jan 2018 20:13:32 +0000 (12:13 -0800)]
net: remove prototype of qdisc_lookup_class()

Looks like qdisc_lookup_class() never existed in the tree
in the git era.  Remove the prototype from the header.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: use the correct index for link speed table
Jakub Kicinski [Mon, 15 Jan 2018 19:47:53 +0000 (11:47 -0800)]
nfp: use the correct index for link speed table

sts variable is holding link speed as well as state.  We should
be using ls to index into ls_to_ethtool.

Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agolan78xx: Fix failure in USB Full Speed
Yuiko Oshino [Mon, 15 Jan 2018 18:24:28 +0000 (13:24 -0500)]
lan78xx: Fix failure in USB Full Speed

Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: fix race condition at topology server receive
Jon Maloy [Mon, 15 Jan 2018 16:56:28 +0000 (17:56 +0100)]
tipc: fix race condition at topology server receive

We have identified a race condition during reception of socket
events and messages in the topology server.

- The function tipc_close_conn() is releasing the corresponding
  struct tipc_subscriber instance without considering that there
  may still be items in the receive work queue. When those are
  scheduled, in the function tipc_receive_from_work(), they are
  using the subscriber pointer stored in struct tipc_conn, without
  first checking if this is valid or not. This will sometimes
  lead to crashes, as the next call of tipc_conn_recvmsg() will
  access the now deleted item.
  We fix this by making the usage of this pointer conditional on
  whether the connection is active or not. I.e., we check the condition
  test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg().

- Since the two functions may be running on different cores, the
  condition test described above is not enough. tipc_close_conn()
  may come in between and delete the subscriber item after the condition
  test is done, but before tipc_conn_recv_msg() is finished. This
  happens less frequently than the problem described above, but leads
  to the same symptoms.

  We fix this by using the existing sk_callback_lock for mutual
  exclusion in the two functions. In addition, we have to move
  a call to tipc_conn_terminate() outside the mentioned lock to
  avoid deadlock.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'aquantia-next'
David S. Miller [Tue, 16 Jan 2018 19:40:02 +0000 (14:40 -0500)]
Merge branch 'aquantia-next'

Igor Russkikh says:

====================
Aquantia atlantic driver update 2018/01

This patch is a set of cleanups and bugfixes in preparation to new
Aquantia hardware support.

Standard ARRAY_SIZE is now used through all the code,
some unused abstraction structures removed and cleaned up,
duplicate declarations removed.

Also two large declaration styling fixes:
- Hardware register set defines are lined up with kernel style
- Hardware access functions were not prefixed, now already
  defined hw_atl prefix is used.

patch v2 changes:
- patch reorganized because of its big size. New HW support
  will be submitted as a separate patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Fix internal stats calculation on rx
Igor Russkikh [Mon, 15 Jan 2018 13:41:22 +0000 (16:41 +0300)]
net: aquantia: Fix internal stats calculation on rx

skb len should be fetched before gro_receive - otherwise we may get
wrong or even outdated skb data.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Prepend hw access functions declarations with prefix
Igor Russkikh [Mon, 15 Jan 2018 13:41:21 +0000 (16:41 +0300)]
net: aquantia: Prepend hw access functions declarations with prefix

Internal functions for registers and HW access were not prefixed.
This introduce noise in global kernel symbols. Here we add explicit prefix
'hw_atl' to all the HW access layer functions.
Alignment and styling were fixed as well.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Fix register definitions to linux style
Igor Russkikh [Mon, 15 Jan 2018 13:41:20 +0000 (16:41 +0300)]
net: aquantia: Fix register definitions to linux style

Original driver code had internal registers and masks declarations
in low case and without any prefix.
Here we make all these uppercase and add already used HW_ATL prefix
to recognize these.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Eliminate aq_nic structure abstraction
Igor Russkikh [Mon, 15 Jan 2018 13:41:19 +0000 (16:41 +0300)]
net: aquantia: Eliminate aq_nic structure abstraction

aq_nic_s was hidden in aq_nic_internal.h, that made it difficult to access
nic fields and structures from other modules.
This change moves aq_nic_s struct into aq_nic.h and thus makes it available
to other driver modules, mainly pci module and hw related module.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Simplify dependencies between pci modules
Igor Russkikh [Mon, 15 Jan 2018 13:41:18 +0000 (16:41 +0300)]
net: aquantia: Simplify dependencies between pci modules

Eliminate useless passing of net_device_ops and ethtools_ops through
deep chain of calls.
Move all pci related code into aq_pci_func module.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Add const qualifiers for hardware ops tables
Igor Russkikh [Mon, 15 Jan 2018 13:41:17 +0000 (16:41 +0300)]
net: aquantia: Add const qualifiers for hardware ops tables

Hardware operations and capabilities tables are constants and
never changed. Declare these as constants.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Remove duplicate hardware descriptors declarations
Igor Russkikh [Mon, 15 Jan 2018 13:41:16 +0000 (16:41 +0300)]
net: aquantia: Remove duplicate hardware descriptors declarations

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Cleanup hardware access modules
Igor Russkikh [Mon, 15 Jan 2018 13:41:15 +0000 (16:41 +0300)]
net: aquantia: Cleanup hardware access modules

Use direct aq_hw_s *self reference where possible
Eliminate useless abstraction PHAL, duplicated structures definitions,
Simplify nic config structure creation and management.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Cleanup status flags accesses
Igor Russkikh [Mon, 15 Jan 2018 13:41:14 +0000 (16:41 +0300)]
net: aquantia: Cleanup status flags accesses

Usage of aq_obj_s structure is noop, here we remove it
replacing access to flags filed directly.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Eliminate AQ_DIMOF, replace with ARRAY_SIZE
Igor Russkikh [Mon, 15 Jan 2018 13:41:13 +0000 (16:41 +0300)]
net: aquantia: Eliminate AQ_DIMOF, replace with ARRAY_SIZE

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-thunderx-add-support-for-PTP-clock'
David S. Miller [Tue, 16 Jan 2018 19:31:15 +0000 (14:31 -0500)]
Merge branch 'net-thunderx-add-support-for-PTP-clock'

Aleksey Makarov says:

====================
net: thunderx: add support for PTP clock

This series adds support for IEEE 1588 Precision Time Protocol
to Cavium ethernet driver.

The first patch adds support for the Precision Time Protocol Clocks and
Timestamping coprocessor (PTP) found on Cavium processors.
It registers a new PTP clock in the PTP core and provides functions
to use the counter in BGX, TNS, GTI, and NIC blocks.

The second patch introduces support for the PTP protocol to the
Cavium ThunderX ethernet driver.

v6:
- check if ptp_clock_register() returns NULL (Richard Cochran)
- fix doc comment for cavium_ptp_enable() (Richard Cochran)
- fix a function call formatting; use defined constant (Richard Cochran)
- add comments for `tx_ptp_skbs` and `ptp_skb` (Richard Cochran)
- use adjfine() instead of adjfreq() (Richard Cochran)
- add Acked-by: Philippe Ombredanne <pombredanne@nexb.com>

v5: https://lkml.kernel.org/r/20171211141435.2915-1-aleksey.makarov@cavium.com
- fix the file headers (add SPDX tags, remove advertisment) (Philippe Ombredanne)
- use "imply" instead of "select" (Richard Cochran)
- add some code in cavium_ptp_get() for the case when the PTP driver has not been
  registered with the PTP core

v4: https://lkml.kernel.org/r/20171208103442.19354-1-aleksey.makarov@cavium.com
- use IS_ENABLED. This fixes compilation of the ptp as a module (David Miller)
- select PTP_1588_CLOCK, not depend on it.  This fixes a build warning.
- change u64 to __be64.  This fixes the sparse warning
  "warning: cast to restricted __be64"
- make nicvf_config_hwtstamp() static. This fixes the sparse warning
  "warning: symbol 'nicvf_config_hwtstamp' was not declared. Should it be static?"

v3: https://lkml.kernel.org/r/20171206133100.26436-1-aleksey.makarov@cavium.com
- rebase to net-next

v2: https://lkml.kernel.org/r/20171117134909.8954-1-aleksey.makarov@cavium.com
- use readq()/writeq() in place of cavium_ptp_reg_read()/cavium_ptp_reg_write(),
  don't use readq_relaxed()/writeq_relaxed() (David Daney)

v1: https://lkml.kernel.org/r/20171107190704.15458-1-aleksey.makarov@cavium.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: thunderx: add timestamping support
Sunil Goutham [Mon, 15 Jan 2018 12:44:57 +0000 (18:44 +0600)]
net: thunderx: add timestamping support

This adds timestamping support for both receive and transmit
paths. On the receive side no filters are supported i.e either
all pkts will get a timestamp appended infront of the packet or none.
On the transmit side HW doesn't support timestamp insertion but
only generates a separate CQE with transmitted packet's timestamp.
Also HW supports only one packet at a time for timestamping on the
transmit side.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: add support for Cavium PTP coprocessor
Radoslaw Biernacki [Mon, 15 Jan 2018 12:44:56 +0000 (18:44 +0600)]
net: add support for Cavium PTP coprocessor

This patch adds support for the Precision Time Protocol
Clocks and Timestamping hardware found on Cavium ThunderX
processors.

Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mac80211-for-davem-2018-01-15' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Tue, 16 Jan 2018 19:28:14 +0000 (14:28 -0500)]
Merge tag 'mac80211-for-davem-2018-01-15' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
More fixes:
 * hwsim:
    - properly flush deletion works at module unload
    - validate # of channels passed from userspace
 * cfg80211:
    - fix RCU locking regression
    - initialize on-stack channel data for nl80211 event
    - check dev_set_name() return value
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: qdiscs: Make function mlxsw_sp_qdisc_prio_unoffload static
Wei Yongjun [Mon, 15 Jan 2018 10:43:03 +0000 (10:43 +0000)]
mlxsw: spectrum: qdiscs: Make function mlxsw_sp_qdisc_prio_unoffload static

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:464:1: warning:
 symbol 'mlxsw_sp_qdisc_prio_unoffload' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: do not allow the v4 socket to bind a v4mapped v6 address
Xin Long [Mon, 15 Jan 2018 09:02:00 +0000 (17:02 +0800)]
sctp: do not allow the v4 socket to bind a v4mapped v6 address

The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.

The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.

This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.

Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
Xin Long [Mon, 15 Jan 2018 09:01:36 +0000 (17:01 +0800)]
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf

After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.

However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.

This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.

Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: reinit stream if stream outcnt has been change by sinit in sendmsg
Xin Long [Mon, 15 Jan 2018 09:01:19 +0000 (17:01 +0800)]
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg

After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.

However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.

This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.

Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'devlink-resource'
David S. Miller [Tue, 16 Jan 2018 19:15:36 +0000 (14:15 -0500)]
Merge branch 'devlink-resource'

Jiri Pirko says:

====================
devlink: Add support for resource abstraction

Arkadi says:

Many of the ASIC's internal resources are limited and are shared between
several hardware procedures. For example, unified hash-based memory can
be used for many lookup purposes, like FDB and LPM. In many cases the user
can provide a partitioning scheme for such a resource in order to perform
fine tuning for his application. In such cases performing driver reload is
needed for the changes to take place, thus this patchset also adds support
for hot reload.

Such an abstraction can be coupled with devlink's dpipe interface, which
models the ASIC's pipeline as a graph of match/action tables. By modeling
the hardware resource object, and by coupling it to several dpipe tables,
further visibility can be achieved in order to debug ASIC-wide issues.

The proposed interface will provide the user the ability to understand the
limitations of the hardware, and receive notification regarding its occupancy.
Furthermore, monitoring the resource occupancy can be done in real-time and
can be useful in many cases.

---
v2->v3
- Mix/Max/Gran attributes.
- Add resource consumption per table.
- Change basic resource unit to 'entry'.
- ABI documentation.

v1->v2
- Add resource size attribute.
- Fix split bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: documentation: Add resources ABI documentation
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:12 +0000 (08:59 +0100)]
mlxsw: documentation: Add resources ABI documentation

Add resources ABI documentation.

Signed-off-by: Arkadi Sharhsevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core: Add support for reload
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:11 +0000 (08:59 +0100)]
mlxsw: core: Add support for reload

Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.

In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: pci: Add support for getting resource through devlink
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:10 +0000 (08:59 +0100)]
mlxsw: pci: Add support for getting resource through devlink

Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Add support for getting kvdl occupancy
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:09 +0000 (08:59 +0100)]
mlxsw: spectrum: Add support for getting kvdl occupancy

Add support for getting the kvdl occupancy through the resource interface.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_dpipe: Connect dpipe tables to resources
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:08 +0000 (08:59 +0100)]
mlxsw: spectrum_dpipe: Connect dpipe tables to resources

Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host -> KVD hash single
2. IPv6 host -> KVD hash double
3. Adjacency -> KVD linear

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Register KVD resources with devlink
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:07 +0000 (08:59 +0100)]
mlxsw: spectrum: Register KVD resources with devlink

Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: pci: Add support for performing bus reset
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:06 +0000 (08:59 +0100)]
mlxsw: pci: Add support for performing bus reset

This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: Add relation between dpipe and resource
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:05 +0000 (08:59 +0100)]
devlink: Add relation between dpipe and resource

The hardware processes which are modeled via dpipe commonly use some
internal hardware resources. Such relation can improve the understanding
of hardware limitations. The number of resource's unit consumed per
table's entry are also provided for each table.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: Add support for reload
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:04 +0000 (08:59 +0100)]
devlink: Add support for reload

Add support for performing driver hot reload.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: Add support for resource abstraction
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:03 +0000 (08:59 +0100)]
devlink: Add support for resource abstraction

Add support for hardware resource abstraction over devlink. Each resource
is identified via id, furthermore it contains information regarding its
size and its related sub resources. Each resource can also provide its
current occupancy.

In some cases the sizes of some resources can be changed, yet for those
changes to take place a hot driver reload may be needed. The reload
capability will be introduced in the next patch.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: Add per devlink instance lock
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:02 +0000 (08:59 +0100)]
devlink: Add per devlink instance lock

This is a preparation before introducing resources and hot reload support.
Currently there are two global lock where one protects all devlink access,
and the second one protects devlink port access. This patch adds per devlink
instance lock which protects the internal members which are the sb/dpipe/
resource/ports. By introducing this lock the global devlink port lock can
be discarded.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agophy: realtek: use new helpers for paged register access
Heiner Kallweit [Fri, 12 Jan 2018 22:17:34 +0000 (23:17 +0100)]
phy: realtek: use new helpers for paged register access

Make use of the new helpers for paged register access.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'phy-add-helpers-for-setting-clearing-bits-in-PHY-registers'
David S. Miller [Tue, 16 Jan 2018 17:25:11 +0000 (12:25 -0500)]
Merge branch 'phy-add-helpers-for-setting-clearing-bits-in-PHY-registers'

Heiner Kallweit says:

====================
phy: add helpers for setting/clearing bits in PHY registers

Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers. First user is phylib.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agophy: use new helpers phy_set_bits/phy_clear_bits in phylib
Heiner Kallweit [Fri, 12 Jan 2018 20:20:36 +0000 (21:20 +0100)]
phy: use new helpers phy_set_bits/phy_clear_bits in phylib

Use new helpers phy_set_bits / phy_clear_bits in phylib.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agophy: add helpers for setting/clearing bits in PHY registers
Heiner Kallweit [Fri, 12 Jan 2018 20:20:33 +0000 (21:20 +0100)]
phy: add helpers for setting/clearing bits in PHY registers

Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Fix pending MAC address changes
Thomas Falcon [Thu, 11 Jan 2018 01:39:52 +0000 (19:39 -0600)]
ibmvnic: Fix pending MAC address changes

Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.

The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.

Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs
Roman Gushchin [Mon, 15 Jan 2018 19:16:15 +0000 (19:16 +0000)]
bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
so the prog show command prints the numeric type value:

$ bpftool prog show
1: type 15  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

This patch defines the corresponding textual representation:

$ bpftool prog show
1: cgroup_device  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoRDMA/mlx5: Fix out-of-bound access while querying AH
Leon Romanovsky [Fri, 12 Jan 2018 05:58:39 +0000 (07:58 +0200)]
RDMA/mlx5: Fix out-of-bound access while querying AH

The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xe9/0x18f
 print_address_description+0xa2/0x350
 kasan_report+0x3a5/0x400
 to_rdma_ah_attr+0xa8/0x3b0
 mlx5_ib_query_qp+0xd35/0x1330
 ib_query_qp+0x8a/0xb0
 ib_uverbs_query_qp+0x237/0x7f0
 ib_uverbs_write+0x617/0xd80
 __vfs_write+0xf7/0x500
 vfs_write+0x149/0x310
 SyS_write+0xca/0x190
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
 __kmalloc+0x3f9/0x430
 alloc_mad_private+0x25/0x50
 ib_mad_post_receive_mads+0x204/0xa60
 ib_mad_init_device+0xa59/0x1020
 ib_register_device+0x83a/0xbc0
 mlx5_ib_add+0x50e/0x5c0
 mlx5_add_device+0x142/0x410
 mlx5_register_interface+0x18f/0x210
 mlx5_ib_init+0x56/0x63
 do_one_initcall+0x15b/0x270
 kernel_init_freeable+0x2d8/0x3d0
 kernel_init+0x14/0x190
 ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
 512-byte region [ffff880019ae2000ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                          ^
 ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: <stable@vger.kernel.org>
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge tag 'linux-can-next-for-4.16-20180105' of ssh://gitolite.kernel.org/pub/scm...
David S. Miller [Mon, 15 Jan 2018 21:13:34 +0000 (16:13 -0500)]
Merge tag 'linux-can-next-for-4.16-20180105' of ssh://gitolite./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01,Re: pull-request: can-next

this is a pull request of 7 patches for net-next/master.

All patches are by me. Patch 6 is for the "can_raw" protocol and add
error checking to the bind() function. All other patches clean up the
coding style and remove unused parameters in various CAN drivers and
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetlink: extack: avoid parenthesized string constant warning
Johannes Berg [Mon, 15 Jan 2018 11:42:25 +0000 (12:42 +0100)]
netlink: extack: avoid parenthesized string constant warning

NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
  warning: array initialized from parenthesized string constant

Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sh_eth-simplify-TSU-initialization'
David S. Miller [Mon, 15 Jan 2018 20:09:46 +0000 (15:09 -0500)]
Merge branch 'sh_eth-simplify-TSU-initialization'

Sergei Shtylyov says:

====================
sh_eth: simplify TSU initialization

Here's a set of 2 patches against DaveM's 'net-next.git' repo. With those,
I'm somewhat simplifying the TSU init code in the driver probe() method...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: get Ether port # only when needed
Sergei Shtylyov [Sun, 14 Jan 2018 17:47:44 +0000 (20:47 +0300)]
sh_eth: get Ether port # only when needed

The dual-port Ether configurations always have a shared TSU to e.g. pass
the packets between those  ports.  With the  TSU init. code gathered under
the single *if*, we now can only get the port # from 'platform_device::id'
only when we actually  need it  (and not recalculate it each time)...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: gather all TSU init code in one place
Sergei Shtylyov [Sun, 14 Jan 2018 17:47:43 +0000 (20:47 +0300)]
sh_eth: gather all TSU init code in one place

The  sh_eth_cpu_data::chip_reset() method  always resets using ARSTR and
this register is always located at the start of the  TSU register region.
Therefore, we can  only call  this method if we know TSU is there and thus
simplify  the probing code a  bit...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be...
David S. Miller [Mon, 15 Jan 2018 19:53:44 +0000 (14:53 -0500)]
Merge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY'

Jim Westfall says:

====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY

This used to be the previous behavior in older kernels but became broken in
a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)

Not having this results in there being an arp entry for every remote ip
address that the device talks to.  Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.

$ ip -4 neigh show nud noarp | grep tun | wc -l
55850

$ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
arp_cach|arp_cach|arp_cach|
 entries|  allocs|destroys|
   81493|620166816|620126069|
  101867|   10186|       0|
  113854|    5993|       0|
  118773|    2459|       0|
   27937|   18579|   63998|
   39256|    5659|       0|
   56231|    8487|       0|
   65602|    4685|       0|
   79697|    7047|       0|
   90733|    5517|       0|

v2:
 - fixes coding style issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Jim Westfall [Sun, 14 Jan 2018 12:18:51 +0000 (04:18 -0800)]
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Allow neigh contructor functions ability to modify the primary_key
Jim Westfall [Sun, 14 Jan 2018 12:18:50 +0000 (04:18 -0800)]
net: Allow neigh contructor functions ability to modify the primary_key

Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix dumping ARSTR
Sergei Shtylyov [Sat, 13 Jan 2018 17:22:01 +0000 (20:22 +0300)]
sh_eth: fix dumping ARSTR

ARSTR  is always located at the start of the TSU register region, thus
using add_reg()  instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR.  Use the correct condition/macro there...

Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'wireless-drivers-next-for-davem-2018-01-13' of git://git.kernel.org/pub...
David S. Miller [Mon, 15 Jan 2018 19:46:16 +0000 (14:46 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2018-01-13' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.16

Here are patches which have been accumulating over the holidays and
after the New Year. Business as usual and nothing special really
standing out.

But what's noteworthy here is that Larry Finger is stepping down as
the rtlwifi maintainer. He has been maintaining rtlwifi since it was
applied back in 2010 in commit 0c8173385e54 ("rtl8192ce: Add new
driver") and it has been no easy role trying to juggle between the
vendor, demanding upstream community and users. So big thank you to
Larry for all his efforts!

ath10k

* more preparation work for wcn3990 support

* add memory dump to firmware coredump files

wil6210

* support scheduled scan

* support 40-bit DMA addresses

qtnfmac

* support MAC address based access control

* support for radar detection and Channel Availibility Check (CAC)

mwifiex

* firmware coredump for usb devices

rtlwifi

* Larry Finger steps down as the maintainer and Ping-Ke Shih becomes
  the new maintainer

* add debugfs interfaces to dump register and btcoex status, and also
  write registers and h2c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: Add a driver for Gemini gigabit ethernet
Linus Walleij [Fri, 12 Jan 2018 21:34:24 +0000 (22:34 +0100)]
net: ethernet: Add a driver for Gemini gigabit ethernet

The Gemini ethernet has been around for years as an out-of-tree
patch used with the NAS boxen and routers built on StorLink
SL3512 and SL3516, later Storm Semiconductor, later Cortina
Systems. These ASICs are still being deployed and brand new
off-the-shelf systems using it can easily be acquired.

The full name of the IP block is "Net Engine and Gigabit
Ethernet MAC" commonly just called "GMAC".

The hardware block contains a common TCP Offload Enginer (TOE)
that can be used by both MACs. The current driver does not use
it.

Cc: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: Add DT bindings for the Gemini ethernet
Linus Walleij [Fri, 12 Jan 2018 21:34:23 +0000 (22:34 +0100)]
net: ethernet: Add DT bindings for the Gemini ethernet

This adds the device tree bindings for the Gemini ethernet
controller. It is pretty straight-forward, using standard
bindings and modelling the two child ports as child devices
under the parent ethernet controller device.

Cc: devicetree@vger.kernel.org
Cc: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRevert "openvswitch: Add erspan tunnel support."
William Tu [Fri, 12 Jan 2018 20:29:22 +0000 (12:29 -0800)]
Revert "openvswitch: Add erspan tunnel support."

This reverts commit ceaa001a170e43608854d5290a48064f57b565ed.

The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field.  Thus, this
patch reverts it and later patch will redo it using nested attr.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: Fix build with gcc-4.4.5
Ido Schimmel [Fri, 12 Jan 2018 20:07:36 +0000 (22:07 +0200)]
ipv6: Fix build with gcc-4.4.5

Emil reported the following compiler errors:

net/ipv6/route.c: In function `rt6_sync_up`:
net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer
net/ipv6/route.c:3586: warning: missing braces around initializer
net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`)
net/ipv6/route.c: In function `rt6_sync_down_dev`:
net/ipv6/route.c:3695: error: unknown field `event` specified in initializer
net/ipv6/route.c:3695: warning: missing braces around initializer
net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`)

Problem is with the named initializers for the anonymous union members.
Fix this by adding curly braces around the initialization.

Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: fix bug during lookup of multicast destination nodes
Jon Maloy [Fri, 12 Jan 2018 19:56:50 +0000 (20:56 +0100)]
tipc: fix bug during lookup of multicast destination nodes

In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we
inadvertently broke non-group multicast transmission when changing the
parameter 'domain' to 'scope' in the function
tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding
change in the calling function, with the result that the lookup always
fails.

A closer anaysis reveals that this parameter is not needed at all.
Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
current implementation this will be delivered to all matching
destinations except those which are published with NODE_SCOPE on other
nodes. Since such publications never will be visible on the sending node
anyway, it makes no sense to discriminate by scope at all.

We now remove this parameter altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert atomic_t net::count to refcount_t
Kirill Tkhai [Fri, 12 Jan 2018 15:28:31 +0000 (18:28 +0300)]
net: Convert atomic_t net::count to refcount_t

Since net could be obtained from RCU lists,
and there is a race with net destruction,
the patch converts net::count to refcount_t.

This provides sanity checks for the cases of
incrementing counter of already dead net,
when maybe_get_net() has to used instead
of get_net().

Drivers: allyesconfig and allmodconfig are OK.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tls: Fix inverted error codes to avoid endless loop
r.hering@avm.de [Fri, 12 Jan 2018 14:42:06 +0000 (15:42 +0100)]
net/tls: Fix inverted error codes to avoid endless loop

sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().

Signed-off-by: Robert Hering <r.hering@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: ip6_make_skb() needs to clear cork.base.dst
Eric Dumazet [Fri, 12 Jan 2018 06:31:18 +0000 (22:31 -0800)]
ipv6: ip6_make_skb() needs to clear cork.base.dst

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
Randy Dunlap [Mon, 15 Jan 2018 19:07:27 +0000 (11:07 -0800)]
tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y

I regularly get 50 MB - 60 MB files during kernel randconfig builds.
These large files mostly contain (many repeats of; e.g., 124,594):

In file included from ../include/linux/string.h:6:0,
                 from ../include/linux/uuid.h:20,
                 from ../include/linux/mod_devicetable.h:13,
                 from ../scripts/mod/devicetable-offsets.c:3:
../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
    ______f = {     \
    ^
../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                       ^
../include/linux/string.h:425:2: note: in expansion of macro 'if'
  if (p_size == (size_t)-1 && q_size == (size_t)-1)
  ^

This only happens when CONFIG_FORTIFY_SOURCE=y and
CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
FORTIFY_SOURCE=y.

Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
6 years agosctp: removed unused var from sctp_make_auth
Marcelo Ricardo Leitner [Thu, 11 Jan 2018 16:22:07 +0000 (14:22 -0200)]
sctp: removed unused var from sctp_make_auth

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: avoid compiler warning on implicit fallthru
Marcelo Ricardo Leitner [Thu, 11 Jan 2018 16:22:06 +0000 (14:22 -0200)]
sctp: avoid compiler warning on implicit fallthru

These fall-through are expected.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ipv4: Make "ip route get" match iif lo rules again.
Lorenzo Colitti [Thu, 11 Jan 2018 09:36:26 +0000 (18:36 +0900)]
net: ipv4: Make "ip route get" match iif lo rules again.

Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.

Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.

inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.

Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetlink: extack needs to be reset each time through loop
David Ahern [Wed, 10 Jan 2018 21:00:39 +0000 (13:00 -0800)]
netlink: extack needs to be reset each time through loop

syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.

Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: fix a memory leak in tipc_nl_node_get_link()
Cong Wang [Wed, 10 Jan 2018 20:50:25 +0000 (12:50 -0800)]
tipc: fix a memory leak in tipc_nl_node_get_link()

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: remove parameter new_link from phy_mac_interrupt()
Heiner Kallweit [Wed, 10 Jan 2018 20:21:31 +0000 (21:21 +0100)]
net: phy: remove parameter new_link from phy_mac_interrupt()

I see two issues with parameter new_link:

1. It's not needed. See also phy_interrupt(), works w/o this parameter.
   phy_mac_interrupt sets the state to PHY_CHANGELINK and triggers the
   state machine which then calls phy_read_status. And phy_read_status
   updates the link state.

2. phy_mac_interrupt is used in interrupt context and getting the link
   state may sleep (at least when having to access the PHY registers
   via MDIO bus).

So let's remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: fix a potental access after delete in tipc_sk_join()
Jon Maloy [Wed, 10 Jan 2018 20:08:50 +0000 (21:08 +0100)]
tipc: fix a potental access after delete in tipc_sk_join()

In commit d12d2e12cec2 "tipc: send out join messages as soon as new
member is discovered") we added a call to the function tipc_group_join()
without considering the case that the preceding tipc_sk_publish() might
have failed, and the group item already deleted.

We fix this by returning from tipc_sk_join() directly after the
failed tipc_sk_publish.

Reported-by: syzbot+e3eeae78ea88b8d6d858@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: fix udpv6 sendmsg crash caused by too small MTU
Mike Maloney [Wed, 10 Jan 2018 17:45:10 +0000 (12:45 -0500)]
ipv6: fix udpv6 sendmsg crash caused by too small MTU

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: cs89x0: add MODULE_LICENSE
Arnd Bergmann [Wed, 10 Jan 2018 16:30:22 +0000 (17:30 +0100)]
net: cs89x0: add MODULE_LICENSE

This driver lacks a MODULE_LICENSE tag, leading to a Kbuild warning:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/cirrus/cs89x0.o

This adds license, author, and description according to the
comment block at the start of the file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoppp: unlock all_ppp_mutex before registering device
Guillaume Nault [Wed, 10 Jan 2018 15:24:45 +0000 (16:24 +0100)]
ppp: unlock all_ppp_mutex before registering device

ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().

Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.

However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.

Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptr_ring: document usage around __ptr_ring_peek
Michael S. Tsirkin [Wed, 10 Jan 2018 14:03:05 +0000 (16:03 +0200)]
ptr_ring: document usage around __ptr_ring_peek

This explains why is the net usage of __ptr_ring_peek
actually ok without locks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dsa-lan9303-check-error-value-from-devm_gpiod_get_optional'
David S. Miller [Mon, 15 Jan 2018 18:18:03 +0000 (13:18 -0500)]
Merge branch 'dsa-lan9303-check-error-value-from-devm_gpiod_get_optional'

Phil Reid says:

====================
net: dsa: lan9303: check error value from devm_gpiod_get_optional()

Errors need to be prograted back from probe.

Note: I have only compile tested the code as I don't have the hardware.
Egil Hjelmeland <privat@egil-hjelmeland.no> has tested it but I haven't
added at Test-by: wasn't in the standard form. Not sure if that's ok or
not.

Changes from v1:
- rebased on net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: check error value from devm_gpiod_get_optional()
Phil Reid [Wed, 10 Jan 2018 07:39:33 +0000 (15:39 +0800)]
net: dsa: lan9303: check error value from devm_gpiod_get_optional()

devm_gpiod_get_optional() can return an error in addition to a NULL ptr.
Check for error and propagate that to the probe function. Check return
value in probe. This will now handle EPROBE_DEFER for the reset gpio.

Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: lan9303: make lan9303_handle_reset() a void function
Phil Reid [Wed, 10 Jan 2018 07:39:32 +0000 (15:39 +0800)]
net: dsa: lan9303: make lan9303_handle_reset() a void function

lan9303_handle_reset never returns anything other than success.
So there's not need for it to return an error code.

Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>