platform/kernel/linux-rpi.git
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Fri, 26 Apr 2019 03:52:29 +0000 (23:52 -0400)]
Merge git://git./linux/kernel/git/davem/net

Two easy cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 24 Apr 2019 23:18:59 +0000 (16:18 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Just the usual assortment of small'ish fixes:

   1) Conntrack timeout is sometimes not initialized properly, from
      Alexander Potapenko.

   2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
      undefined behavior. From ZhangXiaoxu.

   3) des1 field of descriptor in stmmac driver is initialized with the
      wrong variable. From Yue Haibing.

   4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
      Schimmel.

   5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.

   6) Fallback refcount fix in TLS code, from Jakub Kicinski.

   7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.

   8) Fix recursive locking in team driver, from Hangbin Liu.

   9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski.

  10) Don't use napi_alloc_frag() outside of softiq context of socionext
      driver, from Ilias Apalodimas.

  11) MAC address increment overflow in ncsi, from Tao Ren.

  12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.

  13) ipv4_link_failure has to validate the headers that are actually
      there because RAW sockets can pass in arbitrary garbage, from Eric
      Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add sanity checks in ipv4_link_failure()
  net/rose: fix unbound loop in rose_loopback_timer()
  rxrpc: fix race condition in rxrpc_input_packet()
  net: rds: exchange of 8K and 1M pool
  net: vrf: Fix operation not supported when set vrf mac
  net/ncsi: handle overflow when incrementing mac address
  net: socionext: replace napi_alloc_frag with the netdev variant on init
  net: atheros: fix spelling mistake "underun" -> "underrun"
  spi: ST ST95HF NFC: declare missing of table
  spi: Micrel eth switch: declare missing of table
  net: stmmac: move stmmac_check_ether_addr() to driver probe
  netfilter: fix nf_l4proto_log_invalid to log invalid packets
  netfilter: never get/set skb->tstamp
  netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
  Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
  dt-bindings: add an explanation for internal phy-mode
  net/tls: don't leak IV and record seq when offload fails
  net/tls: avoid potential deadlock in tls_set_device_offload_rx()
  selftests/net: correct the return value for run_afpackettests
  team: fix possible recursive locking when add slaves
  ...

5 years agoMerge tag 'leds-for-5.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anasz...
Linus Torvalds [Wed, 24 Apr 2019 23:15:38 +0000 (16:15 -0700)]
Merge tag 'leds-for-5.1-rc7' of git://git./linux/kernel/git/j.anaszewski/linux-leds

Pull LED update from Jacek Anaszewski:
 "A single change to MAINTAINERS:

  We announce a new LED reviewer - Dan Murphy"

* tag 'leds-for-5.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  MAINTAINERS: LEDs: Add designated reviewer for LED subsystem

5 years agoipv4: add sanity checks in ipv4_link_failure()
Eric Dumazet [Wed, 24 Apr 2019 15:04:05 +0000 (08:04 -0700)]
ipv4: add sanity checks in ipv4_link_failure()

Before calling __ip_options_compile(), we need to ensure the network
header is a an IPv4 one, and that it is already pulled in skb->head.

RAW sockets going through a tunnel can end up calling ipv4_link_failure()
with total garbage in the skb, or arbitrary lengthes.

syzbot report :

BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204

CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x38/0x50 mm/kasan/common.c:133
 memcpy include/linux/string.h:355 [inline]
 __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695
 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204
 dst_link_failure include/net/dst.h:427 [inline]
 vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
 vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553
 __netdev_start_xmit include/linux/netdevice.h:4414 [inline]
 netdev_start_xmit include/linux/netdevice.h:4423 [inline]
 xmit_one net/core/dev.c:3292 [inline]
 dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308
 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527
 neigh_output include/net/neighbour.h:508 [inline]
 ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 raw_send_hdrinc net/ipv4/raw.c:432 [inline]
 raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663
 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:661
 sock_write_iter+0x27c/0x3e0 net/socket.c:988
 call_write_iter include/linux/fs.h:1866 [inline]
 new_sync_write+0x4c7/0x760 fs/read_write.c:474
 __vfs_write+0xe4/0x110 fs/read_write.c:487
 vfs_write+0x20c/0x580 fs/read_write.c:549
 ksys_write+0x14f/0x2d0 fs/read_write.c:599
 __do_sys_write fs/read_write.c:611 [inline]
 __se_sys_write fs/read_write.c:608 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:608
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458c29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29
RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4
R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff

The buggy address belongs to the page:
page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1fffc0000000000()
raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
                         ^
 ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/rose: fix unbound loop in rose_loopback_timer()
Eric Dumazet [Wed, 24 Apr 2019 12:35:00 +0000 (05:35 -0700)]
net/rose: fix unbound loop in rose_loopback_timer()

This patch adds a limit on the number of skbs that fuzzers can queue
into loopback_queue. 1000 packets for rose loopback seems more than enough.

Then, since we now have multiple cpus in most linux hosts,
we also need to limit the number of skbs rose_loopback_timer()
can dequeue at each round.

rose_loopback_queue() can be drop-monitor friendly, calling
consume_skb() or kfree_skb() appropriately.

Finally, use mod_timer() instead of del_timer() + add_timer()

syzbot report was :

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34
rcu:     (t=10500 jiffies g=140321 q=323)
rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt     I29168    10      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2877 [inline]
 __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
 schedule+0x92/0x180 kernel/sched/core.c:3562
 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
 rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline]
 rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128
 kthread+0x357/0x430 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events iterate_cleanup_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223
 print_cpu_stall kernel/rcu/tree.c:1360 [inline]
 check_cpu_stall kernel/rcu/tree.c:1434 [inline]
 rcu_pending kernel/rcu/tree.c:3103 [inline]
 rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60
RSP: 0018:ffff8880ae807ce0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: ffff88806fd40640 RBX: dffffc0000000000 RCX: ffffffff863fbc56
RDX: 0000000000000100 RSI: ffffffff863fbc1d RDI: ffff88808cf94228
RBP: ffff8880ae807d10 R08: ffff88806fd40640 R09: ffffed1015d00f8b
R10: ffffed1015d00f8a R11: 0000000000000003 R12: ffff88808cf941c0
R13: 00000000fffff034 R14: ffff8882166cd840 R15: 0000000000000000
 rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: fix race condition in rxrpc_input_packet()
Eric Dumazet [Wed, 24 Apr 2019 16:44:11 +0000 (09:44 -0700)]
rxrpc: fix race condition in rxrpc_input_packet()

After commit 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook"),
rxrpc_input_packet() is directly called from lockless UDP receive
path, under rcu_read_lock() protection.

It must therefore use RCU rules :

- udp_sk->sk_user_data can be cleared at any point in this function.
  rcu_dereference_sk_user_data() is what we need here.

- Also, since sk_user_data might have been set in rxrpc_open_socket()
  we must observe a proper RCU grace period before kfree(local) in
  rxrpc_lookup_local()

v4: @local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
        and Julia Lawall <julia.lawall@lip6.fr>, thanks !

v3,v2 : addressed David Howells feedback, thanks !

syzbot reported :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
Call Trace:
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
 skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
 rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
 rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
 udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
 udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
 udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
 __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
 udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
 ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
 netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
 napi_frags_finish net/core/dev.c:5769 [inline]
 napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
 tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
 call_write_iter include/linux/fs.h:1866 [inline]
 do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
 do_iter_write fs/read_write.c:957 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:938
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
 do_writev+0x15e/0x370 fs/read_write.c:1037
 __do_sys_writev fs/read_write.c:1110 [inline]
 __se_sys_writev fs/read_write.c:1107 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: LEDs: Add designated reviewer for LED subsystem
Dan Murphy [Tue, 23 Apr 2019 20:00:24 +0000 (15:00 -0500)]
MAINTAINERS: LEDs: Add designated reviewer for LED subsystem

Add a designated reviewer for the LED subsystem as there
are already two maintainers assigned.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
5 years agonet: rds: exchange of 8K and 1M pool
Zhu Yanjun [Wed, 24 Apr 2019 06:56:42 +0000 (02:56 -0400)]
net: rds: exchange of 8K and 1M pool

Before the commit 490ea5967b0d ("RDS: IB: move FMR code to its own file"),
when the dirty_count is greater than 9/10 of max_items of 8K pool,
1M pool is used, Vice versa. After the commit 490ea5967b0d ("RDS: IB: move
FMR code to its own file"), the above is removed. When we make the
following tests.

Server:
  rds-stress -r 1.1.1.16 -D 1M

Client:
  rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M

The following will appear.
"
connecting to 1.1.1.16:4000
negotiated options, tasks will start in 2 seconds
Starting up..header from 1.1.1.166:4001 to id 4001 bogus
..
tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us
cpu %
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
...
"
So this exchange between 8K and 1M pool is added back.

Fixes: commit 490ea5967b0d ("RDS: IB: move FMR code to its own file")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: flower: refactor reoffload for concurrent access
Vlad Buslov [Wed, 24 Apr 2019 06:53:31 +0000 (09:53 +0300)]
net: sched: flower: refactor reoffload for concurrent access

Recent changes that introduced unlocked flower did not properly account for
case when reoffload is initiated concurrently with filter updates. To fix
the issue, extend flower with 'hw_filters' list that is used to store
filters that don't have 'skip_hw' flag set. Filter is added to the list
when it is inserted to hardware and only removed from it after being
unoffloaded from all drivers that parent block is attached to. This ensures
that concurrent reoffload can still access filter that is being deleted and
prevents race condition when driver callback can be removed when filter is
no longer accessible trough idr, but is still present in hardware.

Refactor fl_change() to respect new filter reference counter and to release
filter reference with __fl_put() in case of error, instead of directly
deallocating filter memory. This allows for concurrent access to filter
from fl_reoffload() and protects it with reference counting. Refactor
fl_reoffload() to iterate over hw_filters list instead of idr. Implement
fl_get_next_hw_filter() helper function that is used to iterate over
hw_filters list with reference counting and skips filters that are being
concurrently deleted.

Fixes: 92149190067d ("net: sched: flower: set unlocked flag for flower proto ops")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvneta: Switch to using devm_alloc_etherdev_mqs
Rosen Penev [Tue, 23 Apr 2019 20:46:03 +0000 (13:46 -0700)]
net: mvneta: Switch to using devm_alloc_etherdev_mqs

It allows some of the code to be simplified.

Tested on Turris Omnia.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: tipc_udp_recv() cleanup vs rcu verbs
Eric Dumazet [Tue, 23 Apr 2019 16:24:46 +0000 (09:24 -0700)]
tipc: tipc_udp_recv() cleanup vs rcu verbs

First thing tipc_udp_recv() does is to use rcu_dereference_sk_user_data(),
and this is really hinting we already own rcu_read_lock() from the caller
(UDP stack).

No need to add another rcu_read_lock()/rcu_read_unlock() pair.

Also use rcu_dereference() instead of rcu_dereference_rtnl()
in the data path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: vrf: Fix operation not supported when set vrf mac
Miaohe Lin [Sat, 20 Apr 2019 04:09:39 +0000 (12:09 +0800)]
net: vrf: Fix operation not supported when set vrf mac

Vrf device is not able to change mac address now because lack of
ndo_set_mac_address. Complete this in case some apps need to do
this.

Reported-by: Hui Wang <wanghui104@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Use result arg in fib_lookup_arg consistently
David Ahern [Wed, 24 Apr 2019 01:05:33 +0000 (18:05 -0700)]
ipv6: Use result arg in fib_lookup_arg consistently

arg.result is sometimes used as fib6_result and sometimes used to
hold the rt6_info. Add rt6_info to fib6_result and make the use
of arg.result consistent through ipv6 rules.

The rt6 entry is filled in for lookups returning a dst_entry, but not
for direct fib_lookups that just want a fib6_info.

Fixes: effda4dd97e8 ("ipv6: Pass fib6_result to fib lookups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: fib6_rule_action_alt needs to return -EAGAIN
David Ahern [Wed, 24 Apr 2019 01:06:30 +0000 (18:06 -0700)]
ipv6: fib6_rule_action_alt needs to return -EAGAIN

fib rule actions should return -EAGAIN for the rules to continue to the
next one. A recent change overwrote err with the lookup always returning
0 (future change will make it more like IPv4) which means the rules
stopped at the first (e.g., local table lookup only). Catch and reset err
to -EAGAIN.

Fixes: effda4dd97e87 ("ipv6: Pass fib6_result to fib lookups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ncsi: handle overflow when incrementing mac address
Tao Ren [Wed, 24 Apr 2019 01:43:32 +0000 (01:43 +0000)]
net/ncsi: handle overflow when incrementing mac address

Previously BMC's MAC address is calculated by simply adding 1 to the
last byte of network controller's MAC address, and it produces incorrect
result when network controller's MAC address ends with 0xFF.

The problem can be fixed by calling eth_addr_inc() function to increment
MAC address; besides, the MAC address is also validated before assigning
to BMC.

Fixes: cb10c7c0dfd9 ("net/ncsi: Add NCSI Broadcom OEM command")
Signed-off-by: Tao Ren <taoren@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'drm-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Wed, 24 Apr 2019 04:08:52 +0000 (21:08 -0700)]
Merge tag 'drm-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm

Pull drm regression fixes from Dave Airlie:
 "We interrupt your regularly scheduled drm fixes for a regression
  special.

  The first is for a fix in i915 that had unexpected side effects
  fallout in the userspace X.org modesetting driver where X would no
  longer start. I got tired of the nitpicking and issued a large hammer
  on it. The X.org driver is buggy, but blackscreen regressions are
  worse.

  The second was an oversight that myself and Gerd should have noticed
  better, Gerd is trying to fix this properly, but the regression is too
  large to leave, even if the original behaviour is bad in some cases,
  it's clearly bad to break a bunch of working use cases.

  I'll likely have a regular fixes pull later, but I really wanted to
  highlight these"

* tag 'drm-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm:
  Revert "drm/virtio: drop prime import/export callbacks"
  Revert "drm/i915/fbdev: Actually configure untiled displays"

5 years agomlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append
David Ahern [Tue, 23 Apr 2019 23:03:29 +0000 (16:03 -0700)]
mlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append

mlxsw currently does not support v6 gateways with v4 routes. Commit
19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
prevents a route from being added, but nothing stops the replace or
append. Add a catch for them too.
    $ ip  ro add 172.16.2.0/24 via 10.99.1.2
    $ ip  ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
    $ ip  ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Taprio-qdisc-fixes'
David S. Miller [Wed, 24 Apr 2019 02:52:33 +0000 (19:52 -0700)]
Merge branch 'Taprio-qdisc-fixes'

Andre Guedes says:

====================
Taprio qdisc fixes

I'm re-sending this series, now with the "net-next" prefix in the subject.

The only change from the previous version is in patch 3. As suggested by Cong
Wang, it removes the !entry check within should_restart_cycle() since it is
already checked by the caller. As a side effect, that function becomes a dummy
wrapper on list_is_last() so we simply remove it and call list_is_last()
instead.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: taprio: Fix taprio_dequeue()
Andre Guedes [Tue, 23 Apr 2019 19:44:24 +0000 (12:44 -0700)]
net: sched: taprio: Fix taprio_dequeue()

In case we don't have 'guard' or 'budget' to transmit the skb, we should
continue traversing the qdisc list since the remaining guard/budget
might be enough to transmit a skb from other children qdiscs.

Fixes: 5a781ccbd19e (“tc: Add support for configuring the taprio scheduler”)
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: taprio: Fix taprio_peek()
Andre Guedes [Tue, 23 Apr 2019 19:44:23 +0000 (12:44 -0700)]
net: sched: taprio: Fix taprio_peek()

While traversing taprio's children qdisc list, if the gate is closed for
a given traffic class, we should continue traversing the list since the
remaining qdiscs may have skb ready for transmission.

This patch also takes this opportunity and changes the function to use
the TAPRIO_ALL_GATES_OPEN macro instead of the magic number '-1'.

Fixes: 5a781ccbd19e (“tc: Add support for configuring the taprio scheduler”)
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: taprio: Remove should_restart_cycle()
Andre Guedes [Tue, 23 Apr 2019 19:44:22 +0000 (12:44 -0700)]
net: sched: taprio: Remove should_restart_cycle()

The 'entry' argument from should_restart_cycle() cannot be NULL since it
is already checked by the caller so the WARN_ON() within should_
restart_cycle() could be removed.  By doing that, that function becomes
a dummy wrapper on list_is_last() so this patch simply gets rid of it
and call list_is_last() within advance_sched() instead.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: taprio: Refactor taprio_get_start_time()
Andre Guedes [Tue, 23 Apr 2019 19:44:21 +0000 (12:44 -0700)]
net: sched: taprio: Refactor taprio_get_start_time()

This patch does a code refactoring to taprio_get_start_time() function
to improve readability and report error properly.

If 'base' time is later than 'now', the start time is equal to 'base'
and taprio_get_start_time() is done. That's the natural case so we move
that code to the beginning of the function. Also, if 'cycle' calculation
is zero, something went really wrong with taprio and we should log that
internal error properly.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: taprio: Remove pointless variable assigment
Andre Guedes [Tue, 23 Apr 2019 19:44:20 +0000 (12:44 -0700)]
net: sched: taprio: Remove pointless variable assigment

This patch removes a pointless variable assigment in taprio_change().
The 'err' variable is not used from this assignment to the next one so
this patch removes it.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Change nhc_flags to unsigned char
David Ahern [Tue, 23 Apr 2019 15:48:09 +0000 (08:48 -0700)]
net: Change nhc_flags to unsigned char

nhc_flags holds the RTNH_F flags for a given nexthop (fib{6}_nh).
All of the RTNH_F_ flags fit in an unsigned char, and since the API to
userspace (rtnh_flags and lower byte of rtm_flags) is 1 byte it can not
grow. Make nhc_flags in fib_nh_common an unsigned char and shrink the
size of the struct by 8, from 56 to 48 bytes.

Update the flags arguments for up netdevice events and fib_nexthop_info
which determines the RTNH_F flags to return on a dump/event. The RTNH_F
flags are passed in the lower byte of rtm_flags which is an unsigned int
so use a temp variable for the flags to fib_nexthop_info and combine
with rtm_flags in the caller.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agolwtunnel: Pass encap and encap type attributes to lwtunnel_fill_encap
David Ahern [Tue, 23 Apr 2019 15:23:41 +0000 (08:23 -0700)]
lwtunnel: Pass encap and encap type attributes to lwtunnel_fill_encap

Currently, lwtunnel_fill_encap hardcodes the encap and encap type
attributes as RTA_ENCAP and RTA_ENCAP_TYPE, respectively. The nexthop
objects want to re-use this code but the encap attributes passed to
userspace as NHA_ENCAP and NHA_ENCAP_TYPE. Since that is the only
difference, change lwtunnel_fill_encap to take the attribute type as
an input.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: replace napi_alloc_frag with the netdev variant on init
Ilias Apalodimas [Tue, 23 Apr 2019 06:01:41 +0000 (09:01 +0300)]
net: socionext: replace napi_alloc_frag with the netdev variant on init

The netdev variant is usable on any context since it disables interrupts.
The napi variant of the call should only be used within softirq context.
Replace napi_alloc_frag on driver init with the correct netdev_alloc_frag
call

Changes since v1:
- Adjusted commit message

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
Fixes: 4acb20b46214 ("net: socionext: different approach on DMA")
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: atheros: fix spelling mistake "underun" -> "underrun"
Colin Ian King [Tue, 23 Apr 2019 14:30:07 +0000 (15:30 +0100)]
net: atheros: fix spelling mistake "underun" -> "underrun"

There are spelling mistakes in structure elements, fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoravb: Avoid unsupported internal delay mode for R-Car E3/D3
Simon Horman [Tue, 23 Apr 2019 13:01:53 +0000 (15:01 +0200)]
ravb: Avoid unsupported internal delay mode for R-Car E3/D3

According to the R-Car Gen3 Hardware Manual Rev 1.50 of Nov 30, 2018, the
TX clock internal delay mode isn't supported on R-Car E3 (r8a77990) or D3
(r8a77995). And by extension it is also not supported by RZ/G2E (r9a774c0).

This matches all ES versions of the affected SoCs as it is
not clear if this problem will be resolved in newer chips.
This can be revisited, as necessary.

This patch does not error-out if PHY_INTERFACE_MODE_RGMII_ID or
PHY_INTERFACE_MODE_RGMII_TXID are used on SoCs where TX clock delay
mode is not supported as there is a risk of introducing a regression
when used in conjunction with older DT blobs present in the field.
Rather, a warning is logged in such cases.

Based on work by Kazuya Mizuguchi.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix sparc64 compilation of sock_gettstamp
Stephen Rothwell [Tue, 23 Apr 2019 07:25:24 +0000 (17:25 +1000)]
net: fix sparc64 compilation of sock_gettstamp

net/core/sock.c: In function 'sock_gettstamp':
net/core/sock.c:3007:23: error: expected '}' before ';' token
    .tv_sec = ts.tv_sec;
                       ^
net/core/sock.c:3011:4: error: expected ')' before 'return'
    return -EFAULT;
    ^~~~~~
net/core/sock.c:3013:2: error: expected expression before '}' token
  }
  ^

Fixes: c7cbdbf29f48 ("net: rework SIOCGSTAMP ioctl handling")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoisdn:mISDN: fix misuse of %x in hfcpci.c
Fuqian Huang [Tue, 23 Apr 2019 02:56:23 +0000 (10:56 +0800)]
isdn:mISDN: fix misuse of %x in hfcpci.c

Pointers should be printed with %p or %px rather than
cast to (u_long) and printed with %lx.
Change %lx to %p to print the pointer.
Change %lx to %pad to print the dma_addr_t.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoisdn: hisax: Fix misuse of %x in config.c
Fuqian Huang [Tue, 23 Apr 2019 02:25:28 +0000 (10:25 +0800)]
isdn: hisax: Fix misuse of %x in config.c

Pointers should be printed with %p or %px rather than
cast to (u_long) type and printed with %lX.
As the function seems to be for debug purpose.
Change %lX to %px to print the pointer value.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "drm/virtio: drop prime import/export callbacks"
Dave Airlie [Wed, 24 Apr 2019 00:52:20 +0000 (10:52 +1000)]
Revert "drm/virtio: drop prime import/export callbacks"

This patch does more harm than good, as it breaks both Xwayland and
gnome-shell with X11.

Xwayland requires DRI3 & DRI3 requires PRIME.

X11 crash for obscure double-free reason which are hard to debug
(starting X11 by hand doesn't trigger the crash).

I don't see an apparent problem implementing those stub prime
functions, they may return an error at run-time, and it seems to be
handled fine by GNOME at least.

This reverts commit b318e3ff7ca065d6b107e424c85a63d7a6798a69.
[airlied:
This broke userspace for virtio-gpus, and regressed things from DRI3 to DRI2.

This brings back the original problem, but it's better than regressions.]

Fixes: b318e3ff7ca065d6b107e424c85a63d7a6798a ("drm/virtio: drop prime import/export callbacks")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
5 years agoRevert "drm/i915/fbdev: Actually configure untiled displays"
Dave Airlie [Wed, 24 Apr 2019 00:47:56 +0000 (10:47 +1000)]
Revert "drm/i915/fbdev: Actually configure untiled displays"

This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.

This commit is documented to break userspace X.org modesetting driver in certain configurations.

The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.

This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v3.19+
5 years agoMerge branch 'ipv6-fib6_ref-conversion-to-refcount_t'
David S. Miller [Wed, 24 Apr 2019 00:19:48 +0000 (17:19 -0700)]
Merge branch 'ipv6-fib6_ref-conversion-to-refcount_t'

Eric Dumazet says:

====================
ipv6: fib6_ref conversion to refcount_t

We are chasing use-after-free in IPv6 that could have their origin
in fib6_ref 0 -> 1 transitions.

This patch series should help finding the root causes if these
illegal transitions ever happen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: convert fib6_ref to refcount_t
Eric Dumazet [Tue, 23 Apr 2019 01:35:03 +0000 (18:35 -0700)]
ipv6: convert fib6_ref to refcount_t

We suspect some issues involving fib6_ref 0 -> 1 transitions might
cause strange syzbot reports.

Lets convert fib6_ref to refcount_t to catch them earlier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: broadly use fib6_info_hold() helper
Eric Dumazet [Tue, 23 Apr 2019 01:35:02 +0000 (18:35 -0700)]
ipv6: broadly use fib6_info_hold() helper

Instead of using atomic_inc(), prefer fib6_info_hold()
so that upcoming refcount_t conversion is simpler.

Only fib6_info_alloc() is using atomic_set() since we
just allocated a new object.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: fib6_info_destroy_rcu() cleanup
Eric Dumazet [Tue, 23 Apr 2019 01:35:01 +0000 (18:35 -0700)]
ipv6: fib6_info_destroy_rcu() cleanup

We do not need to clear f6i->rt6i_exception_bucket right before
freeing f6i.

Note that f6i->rt6i_exception_bucket is properly protected by
f6i->exception_bucket_flushed being set to one in rt6_flush_exceptions()
under the protection of rt6_exception_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-updates-2019-04-22' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 24 Apr 2019 00:03:40 +0000 (17:03 -0700)]
Merge tag 'mlx5-updates-2019-04-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-04-22

This series includes updates to mlx5e driver RX data path and some
significant XDP RX/TX improvements to overcome/mitigate HW and PCIE
bottlenecks.

From Tariq:
1) Some Enhancements in rq->flags
2) Stabilize RX packet rate (on Striding RQ) with
multiple outstanding UMR posts
In this patch, we add support for multiple outstanding UMR posts,
 to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

From Shay:
3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

Performance:
    Tested packet rate for UDP 64Byte multi-stream
    over two dual port ConnectX-5 100Gbps NICs.
    CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

    * Tested with hyper-threading disabled

    XDP_TX:

    |          | before | after   |       |
    | 24 rings | 51Mpps | 116Mpps | +126% |
    | 1 ring   | 12Mpps | 12Mpps  | same  |

    XDP_REDIRECT:

    ** Below is the transmit rate, not the redirection rate
    which might be larger, and is not affected by this patch.

    |          | before  | after   |      |
    | 32 rings | 64Mpps  | 92Mpps  | +43% |
    | 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.

From Maxim:
4) Some trivial refactoring and code improvements prior to a larger series
to support AF_XDP.
====================

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'nfsd-5.1-1' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Tue, 23 Apr 2019 20:40:55 +0000 (13:40 -0700)]
Merge tag 'nfsd-5.1-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fix miscellaneous nfsd bugs, in NFSv4.1 callbacks, NFSv4.1
  lock-notification callbacks, NFSv3 readdir encoding, and the
  cache/upcall code"

* tag 'nfsd-5.1-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: wake blocked file lock waiters before sending callback
  nfsd: wake waiters blocked on file_lock before deleting it
  nfsd: Don't release the callback slot unless it was actually held
  nfsd/nfsd3_proc_readdir: fix buffer count and page pointers
  sunrpc: don't mark uninitialised items as VALID.

5 years agoMerge tag 'syscalls-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm...
Linus Torvalds [Tue, 23 Apr 2019 20:34:17 +0000 (13:34 -0700)]
Merge tag 'syscalls-5.1' of git://git./linux/kernel/git/arnd/asm-generic

Pull syscall numbering updates from Arnd Bergmann:
 "arch: add pidfd and io_uring syscalls everywhere

  This comes a bit late, but should be in 5.1 anyway: we want the newly
  added system calls to be synchronized across all architectures in the
  release.

  I hope that in the future, any newly added system calls can be added
  to all architectures at the same time, and tested there while they are
  in linux-next, avoiding dependencies between the architecture
  maintainer trees and the tree that contains the new system call"

* tag 'syscalls-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: add pidfd and io_uring syscalls everywhere

5 years agonet/mlx5e: Use #define for the WQE wait timeout constant
Maxim Mikityanskiy [Tue, 5 Mar 2019 12:46:16 +0000 (14:46 +0200)]
net/mlx5e: Use #define for the WQE wait timeout constant

Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
clarify the meaning of a magic number.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Remove unused rx_page_reuse stat
Maxim Mikityanskiy [Thu, 21 Mar 2019 13:22:57 +0000 (15:22 +0200)]
net/mlx5e: Remove unused rx_page_reuse stat

Remove the no longer used page_reuse stat of RQs.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Take HW interrupt trigger into a function
Maxim Mikityanskiy [Fri, 1 Mar 2019 10:05:21 +0000 (12:05 +0200)]
net/mlx5e: Take HW interrupt trigger into a function

mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
enter the NAPI poll on the right CPU according to the affinity. Use it
in mlx5e_activate_rq.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Remove unused parameter
Maxim Mikityanskiy [Wed, 27 Mar 2019 11:46:50 +0000 (13:46 +0200)]
net/mlx5e: Remove unused parameter

mdev is unused in mlx5e_rx_is_linear_skb.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Add an underflow warning comment
Maxim Mikityanskiy [Wed, 27 Mar 2019 11:39:21 +0000 (13:39 +0200)]
net/mlx5e: Add an underflow warning comment

mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Move parameter calculation functions to en/params.c
Maxim Mikityanskiy [Wed, 27 Mar 2019 11:09:27 +0000 (13:09 +0200)]
net/mlx5e: Move parameter calculation functions to en/params.c

This commit moves the parameter calculation functions to a separate file
for better modularity and code sharing with future features.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Report mlx5e_xdp_set errors
Maxim Mikityanskiy [Tue, 19 Mar 2019 16:32:37 +0000 (18:32 +0200)]
net/mlx5e: Report mlx5e_xdp_set errors

If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Remove unused parameter
Maxim Mikityanskiy [Thu, 7 Mar 2019 17:30:30 +0000 (19:30 +0200)]
net/mlx5e: Remove unused parameter

params is unused in mlx5e_init_di_list.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow
Shay Agroskin [Thu, 14 Mar 2019 12:54:07 +0000 (14:54 +0200)]
net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

This is added to better utilize the HW resources (which now makes
one less packet data prefetch) and allow better scalability, on the
account of CPU usage (which now 'memcpy's the packet into the WQE).

To load balance between HW and CPU and get max packet rate, we use
watermarks to detect how much the HW is congested and move the work
loads back and forth between HW and CPU.

Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

* Tested with hyper-threading disabled

XDP_TX:

|          | before | after   |       |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring   | 12Mpps | 12Mpps  | same  |

XDP_REDIRECT:

** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.

|          | before  | after   |      |
| 32 rings | 64Mpps  | 92Mpps  | +43% |
| 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: XDP, Add TX MPWQE session counter
Shay Agroskin [Mon, 25 Feb 2019 16:02:09 +0000 (18:02 +0200)]
net/mlx5e: XDP, Add TX MPWQE session counter

This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: XDP, Enhance RQ indication for XDP redirect flush
Tariq Toukan [Sun, 10 Mar 2019 13:35:58 +0000 (15:35 +0200)]
net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush

The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.

For this, use a new bit on rq->flags.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: XDP, Fix shifted flag index in RQ bitmap
Tariq Toukan [Sun, 10 Mar 2019 13:29:13 +0000 (15:29 +0200)]
net/mlx5e: XDP, Fix shifted flag index in RQ bitmap

Values in enum mlx5e_rq_flag are used as bit indixes.
Intention was to use them with no BIT(i) wrapping.

No functional bug fix here, as the same (shifted)flag bit
is used for all set, test, and clear operations.

Fixes: 121e89275471 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: RX, Support multiple outstanding UMR posts
Tariq Toukan [Wed, 27 Feb 2019 10:06:08 +0000 (12:06 +0200)]
net/mlx5e: RX, Support multiple outstanding UMR posts

The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.

A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.

When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.

In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.

This is expected to improve packet rate in high CPU scale.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

No degradation in other tested scenarios.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Saeed Mahameed [Tue, 23 Apr 2019 18:57:33 +0000 (11:57 -0700)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux

5 years agoMerge branch 'net-phy-mscc-Improvements-to-VSC8514-PHY-driver'
David S. Miller [Tue, 23 Apr 2019 17:47:58 +0000 (10:47 -0700)]
Merge branch 'net-phy-mscc-Improvements-to-VSC8514-PHY-driver'

Kavya Sree Kotagiri says:

====================
net: phy: mscc: Improvements to VSC8514 PHY driver.

    The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
    1000BASE-X, can communicate with the MAC via QSGMII.
    The MAC interface protocol for each port within QSGMII can
    be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
    connecting to supports this functionality.
    VSC8514 also supports SGMII MAC-side autonegotiation on each individual
    port, downshifting, can set the blinking pattern of each of its 4 LEDs,
    SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.

    This patch series adds support for 10BASE-T, 100BASE-TX, and
    1000BASE-T, QSGMII link with the MAC, downshifting, HP Auto-MDIX
    detection and blinking pattern for its 4 LEDs.

    The GPIO register bank is a set of registers that are common to all
    PHYs in the package. So any modification in any register of this bank
    affects all PHYs of the package.

    If the PHYs haven't been reset before booting the Linux kernel and were
    configured to use interrupts for e.g. link status updates, it is
    required to clear the interrupts mask register of all PHYs before being
    able to use interrupts with any PHY. The first PHY of the package that
    will be init will take care of clearing all PHYs interrupts mask
    registers. Thus, we need to keep track of the init sequence in the
    package, if it's already been done or if it's to be done.

    Most of the init sequence of a PHY of the package is common to all PHYs
    in the package, thus we use the SMI broadcast feature which enables us
    to propagate a write in one register of one PHY to all PHYs in the same
    package.

    This patch series adds support for VSC8514 in Microsemi driver(mscc.c)
    and removes support from Vitesse driver(vitesse.c).

v8
- mscc: Added appropriate code using phy_modify() in vsc8514_config_init().

v7
- mscc: Handled return values in vsc8514_config_init().

v6
- mscc: Added proper return value in vsc85xx_csr_ctrl_phy_read().
- mscc: Replaced __mdiobus_write and__mdiobus_read with __phy_write and __phy_read resp.
- mscc: Replaced register addresses in 8514_config_init() with proper constants.

v5
- mscc: Added return error statements for few function calls.
- mscc: Added comments in vsc85xx_csr_ctrl_phy_read() and vsc85xx_csr_ctrl_phy_write()
v4
- mscc: Removed features settings
- mscc: Removed aneg_done settings.

v3
- mscc: Used BIT(x) for PHY_MCB_S6G_WRITE and PHY_MCB_S6G_READ
        instead of hex.
- mscc: Replaced magic numbers with proper constants.
- mscc: Handled delays and timeouts at appropriate points.
- mscc: Added comments/explanation where requested.

v2
- mscc: Sorted variable declarations in reverse christmas tree order.

v1
- Added 0/2 file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: vitesse: Remove support for VSC8514.
Kavya Sree Kotagiri [Mon, 22 Apr 2019 11:52:04 +0000 (11:52 +0000)]
net: phy: vitesse: Remove support for VSC8514.

Add support for VSC8514 in Microsemi driver (mscc.c)
with more features.

Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: mscc: add support for VSC8514 PHY.
Kavya Sree Kotagiri [Mon, 22 Apr 2019 11:51:35 +0000 (11:51 +0000)]
net: phy: mscc: add support for VSC8514 PHY.

The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
1000BASE-X, can communicate with the MAC via QSGMII.
The MAC interface protocol for each port within QSGMII can
be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is
connecting to supports this functionality.
VSC8514 also supports SGMII MAC-side autonegotiation on each individual
port, downshifting, can set the blinking pattern of each of its 4 LEDs,
SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection.

This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T,
QSGMII link with the MAC, downshifting, HP Auto-MDIX detection
and blinking pattern for its 4 LEDs.

The GPIO register bank is a set of registers that are common to all PHYs
in the package. So any modification in any register of this bank affects
all PHYs of the package.

If the PHYs haven't been reset before booting the Linux kernel and were
configured to use interrupts for e.g. link status updates, it is
required to clear the interrupts mask register of all PHYs before being
able to use interrupts with any PHY. The first PHY of the package that
will be init will take care of clearing all PHYs interrupts mask
registers. Thus, we need to keep track of the init sequence in the
package, if it's already been done or if it's to be done.

Most of the init sequence of a PHY of the package is common to all PHYs
in the package, thus we use the SMI broadcast feature which enables us
to propagate a write in one register of one PHY to all PHYs in the same
package.

Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agospi: ST ST95HF NFC: declare missing of table
Daniel Gomez [Mon, 22 Apr 2019 19:08:04 +0000 (21:08 +0200)]
spi: ST ST95HF NFC: declare missing of table

Add missing <of_device_id> table for SPI driver relying on SPI
device match since compatible is in a DT binding or in a DTS.

Before this patch:
modinfo drivers/nfc/st95hf/st95hf.ko | grep alias
alias:          spi:st95hf

After this patch:
modinfo drivers/nfc/st95hf/st95hf.ko | grep alias
alias:          spi:st95hf
alias:          of:N*T*Cst,st95hfC*
alias:          of:N*T*Cst,st95hf

Reported-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Daniel Gomez <dagmcr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agospi: Micrel eth switch: declare missing of table
Daniel Gomez [Mon, 22 Apr 2019 19:08:03 +0000 (21:08 +0200)]
spi: Micrel eth switch: declare missing of table

Add missing <of_device_id> table for SPI driver relying on SPI
device match since compatible is in a DT binding or in a DTS.

Before this patch:
modinfo drivers/net/phy/spi_ks8995.ko | grep alias
alias:          spi:ksz8795
alias:          spi:ksz8864
alias:          spi:ks8995

After this patch:
modinfo drivers/net/phy/spi_ks8995.ko | grep alias
alias:          spi:ksz8795
alias:          spi:ksz8864
alias:          spi:ks8995
alias:          of:N*T*Cmicrel,ksz8795C*
alias:          of:N*T*Cmicrel,ksz8795
alias:          of:N*T*Cmicrel,ksz8864C*
alias:          of:N*T*Cmicrel,ksz8864
alias:          of:N*T*Cmicrel,ks8995C*
alias:          of:N*T*Cmicrel,ks8995

Reported-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Daniel Gomez <dagmcr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell: add new default led configure for m88e151x
Jian Shen [Mon, 22 Apr 2019 13:52:23 +0000 (21:52 +0800)]
net: phy: marvell: add new default led configure for m88e151x

The default m88e151x LED configuration is 0x1177, used LED[0]
for 1000M link, LED[1] for 100M link, and LED[2] for active.
But for some boards, which use LED[0] for link, and LED[1] for
active, prefer to be 0x1040. To be compatible with this case,
this patch defines a new dev_flag, and set it before connect
phy in HNS3 driver. When phy initializing, using the new
LED configuration if this dev_flag is set.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: systemport: Remove need for DMA descriptor
Florian Fainelli [Mon, 22 Apr 2019 16:46:44 +0000 (09:46 -0700)]
net: systemport: Remove need for DMA descriptor

All we do is write the length/status and address bits to a DMA
descriptor only to write its contents into on-chip registers right
after, eliminate this unnecessary step.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobridge: Fix possible use-after-free when deleting bridge port
Ido Schimmel [Mon, 22 Apr 2019 09:33:19 +0000 (09:33 +0000)]
bridge: Fix possible use-after-free when deleting bridge port

When a bridge port is being deleted, do not dereference it later in
br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
callback was executed before invoking the function.

[1]
[  129.638551] ==================================================================
[  129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
[  129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
[  129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
[  129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  129.682484] Call Trace:
[  129.685242]  dump_stack+0xa9/0x10e
[  129.689068]  print_address_description.cold.2+0x9/0x25e
[  129.694930]  kasan_report.cold.3+0x78/0x9d
[  129.704420]  br_vlan_port_event+0x53c/0x5fd
[  129.728300]  br_device_event+0x2c7/0x7a0
[  129.741505]  notifier_call_chain+0xb5/0x1c0
[  129.746202]  rollback_registered_many+0x895/0xe90
[  129.793119]  unregister_netdevice_many+0x48/0x210
[  129.803384]  rtnl_delete_link+0xe1/0x140
[  129.815906]  rtnl_dellink+0x2a3/0x820
[  129.844166]  rtnetlink_rcv_msg+0x397/0x910
[  129.868517]  netlink_rcv_skb+0x137/0x3a0
[  129.882013]  netlink_unicast+0x49b/0x660
[  129.900019]  netlink_sendmsg+0x755/0xc90
[  129.915758]  ___sys_sendmsg+0x761/0x8e0
[  129.966315]  __sys_sendmsg+0xf0/0x1c0
[  129.988918]  do_syscall_64+0xa4/0x470
[  129.993032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  129.998696] RIP: 0033:0x7ff578104b58
...
[  130.073811] Allocated by task 479:
[  130.077633]  __kasan_kmalloc.constprop.5+0xc1/0xd0
[  130.083008]  kmem_cache_alloc_trace+0x152/0x320
[  130.088090]  br_add_if+0x39c/0x1580
[  130.092005]  do_set_master+0x1aa/0x210
[  130.096211]  do_setlink+0x985/0x3100
[  130.100224]  __rtnl_newlink+0xc52/0x1380
[  130.104625]  rtnl_newlink+0x6b/0xa0
[  130.108541]  rtnetlink_rcv_msg+0x397/0x910
[  130.113136]  netlink_rcv_skb+0x137/0x3a0
[  130.117538]  netlink_unicast+0x49b/0x660
[  130.121939]  netlink_sendmsg+0x755/0xc90
[  130.126340]  ___sys_sendmsg+0x761/0x8e0
[  130.130645]  __sys_sendmsg+0xf0/0x1c0
[  130.134753]  do_syscall_64+0xa4/0x470
[  130.138864]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  130.146195] Freed by task 0:
[  130.149421]  __kasan_slab_free+0x125/0x170
[  130.154016]  kfree+0xf3/0x310
[  130.157349]  kobject_put+0x1a8/0x4c0
[  130.161363]  rcu_core+0x859/0x19b0
[  130.165175]  __do_softirq+0x250/0xa26
[  130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
                which belongs to the cache kmalloc-1k of size 1024
[  130.184972] The buggy address is located 0 bytes inside of
                1024-byte region [ffff8881e4aa1ae8ffff8881e4aa1ee8)

Fixes: 9c0ec2e7182a ("bridge: support binding vlan dev link state to vlan member bridge ports")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8152: sync sa_family with the media type of network device
Crag.Wang [Mon, 22 Apr 2019 05:03:43 +0000 (13:03 +0800)]
r8152: sync sa_family with the media type of network device

Without this patch the socket address family sporadically gets wrong
value ends up the dev_set_mac_address() fails to set the desired MAC
address.

Fixes: 25766271e42f ("r8152: Refresh MAC address during USBDEVFS_RESET")
Signed-off-by: Crag.Wang <crag.wang@dell.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-By: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Shared-buffer-improvements'
David S. Miller [Tue, 23 Apr 2019 05:09:33 +0000 (22:09 -0700)]
Merge branch 'mlxsw-Shared-buffer-improvements'

Ido Schimmel says:

====================
mlxsw: Shared buffer improvements

This patchset includes two improvements with regards to shared buffer
configuration in mlxsw.

The first part of this patchset forbids the user from performing illegal
shared buffer configuration that can result in unnecessary packet loss.
In order to better communicate these configuration failures to the user,
extack is propagated from devlink towards drivers. This is done in
patches #1-#8.

The second part of the patchset deals with the shared buffer
configuration of the CPU port. When a packet is trapped by the device,
it is sent across the PCI bus to the attached host CPU. From the
device's perspective, it is as if the packet is transmitted through the
CPU port.

While testing traffic directed at the CPU it became apparent that for
certain packet sizes and certain burst sizes, the current shared buffer
configuration of the CPU port is inadequate and results in packet drops.
The configuration is adjusted by patches #9-#14 that create two new pools
- ingress & egress - which are dedicated for CPU traffic.
====================

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas
Ido Schimmel [Mon, 22 Apr 2019 12:08:56 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas

Switch the CPU port to use the new dedicated egress pool instead the
previously used egress pool which was shared with normal front panel
ports.

Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Allow skipping ingress port quota configuration
Ido Schimmel [Mon, 22 Apr 2019 12:08:55 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration

The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.

Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init()
Ido Schimmel [Mon, 22 Apr 2019 12:08:54 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init()

The function is used to set the per-port shared buffer quotas.
Currently, these quotas are only set for front panel ports, but a
subsequent patch will configure these quotas for the CPU port as well.

The configuration required for the CPU port is a bit different than that
of the front panel ports, so split the business logic into a separate
function which will be called with different parameters for the CPU
port.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Use new CPU ingress pool for control packets
Ido Schimmel [Mon, 22 Apr 2019 12:08:52 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Use new CPU ingress pool for control packets

Use the new ingress pool that was added in the previous patch for
control packets (e.g., STP, LACP) that are trapped to the CPU.

The previous management pool is no longer necessary and therefore its
size is set to 0.

The maximum quota for traffic towards the CPU is increased to 50% of the
free space in the new ingress pool and therefore the reserved space is
reduced by half, to 10KB - in both the shared and headroom buffer. This
allows for more efficient utilization of the shared buffer as reserved
space cannot be used for other purposes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Add pools for CPU traffic
Ido Schimmel [Mon, 22 Apr 2019 12:08:51 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Add pools for CPU traffic

Packets that are trapped to the CPU are transmitted through the CPU port
to the attached host. The CPU port is therefore like any other port and
needs to have shared buffer configuration.

The maximum quotas configured for the CPU are provided using dynamic
threshold and cannot be changed by the user. In order to make sure that
these thresholds are always valid, the configuration of the threshold
type of these pools is forbidden.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Remove assumption about pool order
Ido Schimmel [Mon, 22 Apr 2019 12:08:50 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Remove assumption about pool order

The code currently assumes that ingress pools have lower indices than
egress pools. This makes it impossible to add more ingress pools
without breaking user configuration that relies on a certain pool index
to correspond to an egress pool.

Remove such assumptions from the code, so that more ingress pools could
be added by subsequent patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Forbid changing multicast TCs' attributes
Ido Schimmel [Mon, 22 Apr 2019 12:08:49 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Forbid changing multicast TCs' attributes

Commit e83c045e53d7 ("mlxsw: spectrum_buffers: Configure MC pool")
configured the threshold of the multicast TCs as infinite so that the
admission of multicast packets is only depended on per-switch priority
threshold.

Forbid the user from changing the thresholds of these multicast TCs and
their binding to a different pool.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Forbid changing threshold type of first egress pool
Ido Schimmel [Mon, 22 Apr 2019 12:08:47 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Forbid changing threshold type of first egress pool

Multicast packets have three egress quotas:
* Per egress port
* Per egress port and traffic class
* Per switch priority

The limits on the switch priority are not exposed to the user and
specified as dynamic threshold on the first egress pool.

Forbid changing the threshold type of the first egress pool so that
these limits are always valid.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Forbid configuration of multicast pool
Ido Schimmel [Mon, 22 Apr 2019 12:08:46 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Forbid configuration of multicast pool

Commit e83c045e53d7 ("mlxsw: spectrum_buffers: Configure MC pool") added
a dedicated pool for multicast traffic. The pool is visible to the user
so that it would be possible to monitor its occupancy, but its
configuration should be forbidden in order to maintain its intended
operation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Add ability to veto TC's configuration
Ido Schimmel [Mon, 22 Apr 2019 12:08:45 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Add ability to veto TC's configuration

Subsequent patches are going to need to veto changes in certain TCs'
binding and threshold configurations.

Add fields to the TC's struct that indicate if the TC can be bound to a
different pool and whether its threshold can change and enforce that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Add ability to veto pool's configuration
Ido Schimmel [Mon, 22 Apr 2019 12:08:43 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Add ability to veto pool's configuration

Subsequent patches are going to need to veto changes in certain pools'
size and / or threshold type (mode).

Add two fields to the pool's struct that indicate if either of these
attributes is allowed to change and enforce that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Use defines for pool indices
Ido Schimmel [Mon, 22 Apr 2019 12:08:42 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Use defines for pool indices

The pool indices are currently hard coded throughout the code, which
makes the code hard to follow and extend.

Overcome this by using defines for the pool indices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Add extack messages for invalid configurations
Ido Schimmel [Mon, 22 Apr 2019 12:08:41 +0000 (12:08 +0000)]
mlxsw: spectrum_buffers: Add extack messages for invalid configurations

Add extack messages to better communicate invalid configuration to the
user.

Example:

# devlink sb pool set pci/0000:01:00.0 pool 0 size 104857600 thtype dynamic
Error: mlxsw_spectrum: Exceeded shared buffer size.
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: devlink: Add extack to shared buffer operations
Ido Schimmel [Mon, 22 Apr 2019 12:08:39 +0000 (12:08 +0000)]
net: devlink: Add extack to shared buffer operations

Add extack to shared buffer set operations, so that meaningful error
messages could be propagated to the user.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: move stmmac_check_ether_addr() to driver probe
Vinod Koul [Mon, 22 Apr 2019 09:45:32 +0000 (15:15 +0530)]
net: stmmac: move stmmac_check_ether_addr() to driver probe

stmmac_check_ether_addr() checks the MAC address and assigns one in
driver open(). In many cases when we create slave netdevice, the dev
addr is inherited from master but the master dev addr maybe NULL at
that time, so move this call to driver probe so that address is
always valid.

Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Tested-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: Sneh Shah <snehshah@codeaurora.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-clean-up-needless-use-of-module-infrastructure'
David S. Miller [Tue, 23 Apr 2019 04:51:00 +0000 (21:51 -0700)]
Merge branch 'net-clean-up-needless-use-of-module-infrastructure'

Paul Gortmaker says:

====================
clean up needless use of module infrastructure

People can embed modular includes and modular exit functions into code
that never use any of it, and they won't get any errors or warnings.

Using modular infrastructure in non-modules might seem harmless, but some
of the downfalls this leads to are:

 (1) it is easy to accidentally write unused module_exit removal code
 (2) it can be misleading when reading the source, thinking a driver can
     be modular when the Makefile and/or Kconfig prohibit it
 (3) an unused include of the module.h header file will in turn
     include nearly everything else; adding a lot to CPP overhead.
 (4) it gets copied/replicated into other drivers and spreads quickly.

As a data point for #3 above, an empty C file that just includes the
module.h header generates over 750kB of CPP output.  Repeating the same
experiment with init.h and the result is less than 12kB; with export.h
it is only about 1/2kB; with both it still is less than 12kB.  One driver
in this series gets the module.h ---> init.h+export.h conversion.

Worse, are headers in include/linux that in turn include <linux/module.h>
as they can impact a whole fleet of drivers, or a whole subsystem, so
special care should be used in order to avoid that.  Such headers should
only include what they need to be stand-alone; they should not be trying
to anticipate the various header needs of their possible end users.

In this series, four include/linux headers have module.h removed from
them because they don't strictly need it.  Then three chunks of net
related code have modular infrastructure that isn't used, removed.

There are no runtime changes, so the biggest risk is a genuine consumer
of module.h content relying on implicitly getting it from one of the
include/linux instances removed here - thus resulting in a build fail.

With that in mind, allmodconfig build testing was done on x86-64, arm64,
x86-32, arm. powerpc, and mips on linux-next (and hence net-next).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: strparser: make it explicitly non-modular
Paul Gortmaker [Sun, 21 Apr 2019 03:29:48 +0000 (23:29 -0400)]
net: strparser: make it explicitly non-modular

The Kconfig currently controlling compilation of this code is:

net/strparser/Kconfig:config STREAM_PARSER
net/strparser/Kconfig:  def_bool n

...meaning that it currently is not being built as a module by anyone.

Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.  For
clarity, we change the fcn name mod_init to dev_init at the same time.

We replace module.h with init.h and export.h ; the latter since this
file exports some syms.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bpfilter: dont use module_init in non-modular code
Paul Gortmaker [Sun, 21 Apr 2019 03:29:47 +0000 (23:29 -0400)]
net: bpfilter: dont use module_init in non-modular code

The Kconfig controlling this code is:

bpfilter/Kconfig:menuconfig BPFILTER
bpfilter/Kconfig:   bool "BPF based packet filtering framework (BPFILTER)"

Since it isn't a module, we shouldn't use module_init().  Instead we
use device_initcall() - which is exactly what module_init() defaults
to for non-modular code/builds.

We don't remove <linux/module.h> from the includes since this file does
a request_module() and hence is a valid user of that header file, even
though it is not modular itself.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocgroup: net: remove left over MODULE_LICENSE tag
Paul Gortmaker [Sun, 21 Apr 2019 03:29:46 +0000 (23:29 -0400)]
cgroup: net: remove left over MODULE_LICENSE tag

The Kconfig currently controlling compilation of this code is:

net/Kconfig:config CGROUP_NET_PRIO
net/Kconfig:    bool "Network priority cgroup"

...meaning that it currently is not being built as a module by anyone,
as module support was discontinued in 2014.

We delete the MODULE_LICENSE tag since all that information is already
contained at the top of the file in the comments.

We don't delete module.h from the includes since it was no longer there
to begin with.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rosen, Rami" <rami.rosen@intel.com>
Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tc_act: drop include of module.h from tc_ife.h
Paul Gortmaker [Sun, 21 Apr 2019 03:29:45 +0000 (23:29 -0400)]
net: tc_act: drop include of module.h from tc_ife.h

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

Since tc_ife.h is not going into a module struct looking for
specific fields, we can just let it know that module is a struct,
just like about 60 other include/linux headers already do.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fib: drop include of module.h from fib_notifier.h
Paul Gortmaker [Sun, 21 Apr 2019 03:29:44 +0000 (23:29 -0400)]
net: fib: drop include of module.h from fib_notifier.h

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

Since fib_notifier.h is not going into a module struct looking for
specific fields, we can just let it know that module is a struct,
just like about 60 other include/linux headers already do.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ife: drop include of module.h from net/ife.h
Paul Gortmaker [Sun, 21 Apr 2019 03:29:43 +0000 (23:29 -0400)]
net: ife: drop include of module.h from net/ife.h

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

There doesn't appear to be anything in net/ife.h that is module
related, and build coverage doesn't appear to show any other
files/drivers relying implicitly on getting it from here.

So it appears we are simply free to just remove it in this case.

Cc: Yotam Gigi <yotam.gi@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: psample: drop include of module.h from psample.h
Paul Gortmaker [Sun, 21 Apr 2019 03:29:42 +0000 (23:29 -0400)]
net: psample: drop include of module.h from psample.h

Ideally, header files under include/linux shouldn't be adding
includes of other headers, in anticipation of their consumers,
but just the headers needed for the header itself to pass
parsing with CPP.

The module.h is particularly bad in this sense, as it itself does
include a whole bunch of other headers, due to the complexity of
module support.

There doesn't appear to be anything in psample.h that is module
related, and build coverage doesn't appear to show any other
files/drivers relying implicitly on getting it from here.

So it appears we are simply free to just remove it in this case.

Cc: Yotam Gigi <yotam.gi@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Rename net/nexthop.h net/rtnh.h
David Ahern [Sat, 20 Apr 2019 16:28:20 +0000 (09:28 -0700)]
net: Rename net/nexthop.h net/rtnh.h

The header contains rtnh_ macros so rename the file accordingly.
Allows a later patch to use the nexthop.h name for the new
nexthop code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Remove fib6_info_nh_lwt
David Ahern [Sat, 20 Apr 2019 16:27:27 +0000 (09:27 -0700)]
ipv6: Remove fib6_info_nh_lwt

fib6_info_nh_lwt is no longer used; remove it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinclude/net/tcp.h: whitespace cleanup at tcp_v4_check
Daniel T. Lee [Sat, 20 Apr 2019 15:50:42 +0000 (00:50 +0900)]
include/net/tcp.h: whitespace cleanup at tcp_v4_check

This patch makes trivial whitespace fix to the function
tcp_v4_check at include/net/tcp.h file.

It has stylistic issue, which is "space required after that ','"
and it can be confirmed with ./scripts/checkpatch.pl tool.

    ERROR: space required after that ',' (ctx:VxV)
    #29: FILE: include/net/tcp.h:1317:
    +         return csum_tcpudp_magic(saddr,daddr,len,IPPROTO_TCP,base);
                                        ^

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Tue, 23 Apr 2019 04:35:55 +0000 (21:35 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-04-22

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) allow stack/queue helpers from more bpf program types, from Alban.

2) allow parallel verification of root bpf programs, from Alexei.

3) introduce bpf sysctl hook for trusted root cases, from Andrey.

4) recognize var/datasec in btf deduplication, from Andrii.

5) cpumap performance optimizations, from Jesper.

6) verifier prep for alu32 optimization, from Jiong.

7) libbpf xsk cleanup, from Magnus.

8) other various fixes and cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Tue, 23 Apr 2019 04:23:55 +0000 (21:23 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree:

1) Add a selftest for icmp packet too big errors with conntrack, from
   Florian Westphal.

2) Validate inner header in ICMP error message does not lie to us
   in conntrack, also from Florian.

3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko.

4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov.

5) Use a hash to expose conntrack and expectation ID, from Florian Westphal.

6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter.

7) Fix broken ICMP ID randomization with NAT, also from Florian.

8) Remove WARN_ON in ebtables compat that is reached via syzkaller,
   from Florian Westphal.

9) Fix broken timestamps since fb420d5d91c1 ("tcp/fq: move back to
   CLOCK_MONOTONIC"), from Florian.

10) Fix logging of invalid packets in conntrack, from Andrei Vagin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bpf-verifier-lock'
Daniel Borkmann [Mon, 22 Apr 2019 23:50:44 +0000 (01:50 +0200)]
Merge branch 'bpf-verifier-lock'

Alexei Starovoitov says:

====================
Allow the bpf verifier to run in parallel for root.
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: drop bpf_verifier_lock
Alexei Starovoitov [Fri, 19 Apr 2019 14:44:55 +0000 (07:44 -0700)]
bpf: drop bpf_verifier_lock

Drop bpf_verifier_lock for root to avoid being DoS-ed by unprivileged.
The BPF verifier is now fully parallel.
All unpriv users are still serialized by bpf_verifier_lock to avoid
exhausting kernel memory by running N parallel verifications.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: remove global variables
Alexei Starovoitov [Fri, 19 Apr 2019 14:44:54 +0000 (07:44 -0700)]
bpf: remove global variables

Move three global variables protected by bpf_verifier_lock into
'struct bpf_verifier_env' to allow parallel verification.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: document the verifier limits
Alexei Starovoitov [Thu, 18 Apr 2019 01:27:01 +0000 (18:27 -0700)]
bpf: document the verifier limits

Document the verifier limits.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux...
Saeed Mahameed [Mon, 22 Apr 2019 22:25:39 +0000 (15:25 -0700)]
Merge tag 'v5.1-rc1' of git://git./linux/kernel/git/torvalds/linux into mlx5-next

Linux 5.1-rc1

We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonfsd: wake blocked file lock waiters before sending callback
Jeff Layton [Mon, 22 Apr 2019 16:34:24 +0000 (12:34 -0400)]
nfsd: wake blocked file lock waiters before sending callback

When a blocked NFS lock is "awoken" we send a callback to the server and
then wake any hosts waiting on it. If a client attempts to get a lock
and then drops off the net, we could end up waiting for a long time
until we end up waking locks blocked on that request.

So, wake any other waiting lock requests before sending the callback.
Do this by calling locks_delete_block in a new "prepare" phase for
CB_NOTIFY_LOCK callbacks.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agonfsd: wake waiters blocked on file_lock before deleting it
Jeff Layton [Mon, 22 Apr 2019 16:34:23 +0000 (12:34 -0400)]
nfsd: wake waiters blocked on file_lock before deleting it

After a blocked nfsd file_lock request is deleted, knfsd will send a
callback to the client and then free the request. Commit 16306a61d3b7
("fs/locks: always delete_block after waiting.") changed it such that
locks_delete_block is always called on a request after it is awoken,
but that patch missed fixing up blocked nfsd request handling.

Call locks_delete_block on the block to wake up any locks still blocked
on the nfsd lock request before freeing it. Some of its callers already
do this however, so just remove those calls.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agoMerge tag 'mips_fixes_5.1_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Mon, 22 Apr 2019 18:54:47 +0000 (11:54 -0700)]
Merge tag 'mips_fixes_5.1_3' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A couple more MIPS fixes:

   - Fix indirect syscall tracing & seccomp filtering for big endian
     MIPS64 kernels, which previously loaded the syscall number
     incorrectly & would always use zero.

   - Fix performance counter IRQ setup for Atheros/ath79 SoCs, allowing
     perf to function on those systems.

  And not really a fix, but a useful addition:

   - Add a Broadcom mailing list to the MAINTAINERS entry for BMIPS
     systems to allow relevant engineers to track patch submissions"

* tag 'mips_fixes_5.1_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: perf: ath79: Fix perfcount IRQ assignment
  MIPS: scall64-o32: Fix indirect syscall number load
  MAINTAINERS: BMIPS: Add internal Broadcom mailing list