review.tizen.org Git - platform/kernel/linux-rpi3.git/log

projects / platform / kernel / linux-rpi3.git / log

Ursula Braun [Thu, 21 Sep 2017 07:16:29 +0000 (09:16 +0200)]

net/smc: adjust net_device refcount

smc_pnet_fill_entry() uses dev_get_by_name() adding a refcount to ndev.
The following smc_pnet_enter() has to reduce the refcount if the entry
to be added exists already in the pnet table.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ursula Braun [Thu, 21 Sep 2017 07:16:28 +0000 (09:16 +0200)]

net/smc: take RCU read lock for routing cache lookup

smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
protection by an RCU read lock.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hans Wippel [Thu, 21 Sep 2017 07:16:27 +0000 (09:16 +0200)]

net/smc: add receive timeout check

The SMC receive function currently lacks a timeout check under the
condition that no data were received and no data are available. This
patch adds such a check.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hans Wippel [Thu, 21 Sep 2017 07:16:26 +0000 (09:16 +0200)]

net/smc: add missing dev_put

In the infiniband part, SMC currently uses get_netdev which calls
dev_hold on the returned net device. However, the SMC code never calls
dev_put on that net device resulting in a wrong reference count.

This patch adds a dev_put after the usage of the net device to fix the
issue.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Meyer [Thu, 21 Sep 2017 06:24:27 +0000 (08:24 +0200)]

net: stmmac: Cocci spatch "of_table"

Make sure (of/i2c/platform)_device_id tables are NULL terminated.
Found by coccinelle spatch "misc/of_table.cocci"

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 21 Sep 2017 22:22:54 +0000 (15:22 -0700)]

Merge branch 'lan78xx-fixes'

Nisar Sayed says:

====================
lan78xx: This series of patches are for lan78xx driver.

This series of patches are for lan78xx driver.

These patches fixes potential issues associated with lan78xx driver.

v5
- Updated changes as per comments

v4
- Updated changes to handle return values as per comments
- Updated EEPROM write handling as per comments

v3
- Updated chagnes as per comments

v2
- Added patch version information
- Added fixes tag
- Updated patch description
- Updated chagnes as per comments

v1
- Splitted patches as per comments
- Dropped "fixed_phy device support" and "Fix for system suspend" changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nisar Sayed [Wed, 20 Sep 2017 21:06:38 +0000 (02:36 +0530)]

lan78xx: Use default values loaded from EEPROM/OTP after reset

Use default value of auto duplex and auto speed values loaded
from EEPROM/OTP after reset. The LAN78xx allows platform
configurations to be loaded from EEPROM/OTP.
Ex: When external phy is connected, the MAC can be configured to
have correct auto speed, auto duplex, auto polarity configured
from the EEPROM/OTP.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nisar Sayed [Wed, 20 Sep 2017 21:06:37 +0000 (02:36 +0530)]

lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE

Allow EEPROM write for less than MAX_EEPROM_SIZE

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nisar Sayed [Wed, 20 Sep 2017 21:06:36 +0000 (02:36 +0530)]

lan78xx: Fix for eeprom read/write when device auto suspend

Fix for eeprom read/write when device auto suspend

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 21 Sep 2017 22:20:41 +0000 (15:20 -0700)]

Merge branch 'phylib-xcvr-type'

Florian Fainelli says:

====================
net: Bring back transceiver type for PHYLIB

With the introduction of the xLINKSETTINGS ethtool APIs, the transceiver type
was deprecated, but in that process we lost some useful information that PHYLIB
was consistently reporting about internal vs. external PHYs.

This brings back transceiver as a read-only field that is only consumed in the
legacy path where ETHTOOL_GET is called but the underlying drivers implement the
new style klink_settings API.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Wed, 20 Sep 2017 22:52:14 +0000 (15:52 -0700)]

net: phy: Keep reporting transceiver type

With commit 2d55173e71b0 ("phy: add generic function to support
ksetting support"), we lost the ability to report the transceiver type
like we used to. Now that we have added back the transceiver type to
ethtool_link_settings, we can report it back like we used to and have no
loss of information.

Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Wed, 20 Sep 2017 22:52:13 +0000 (15:52 -0700)]

net: ethtool: Add back transceiver type

Commit 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
deprecated the ethtool_cmd::transceiver field, which was fine in
premise, except that the PHY library was actually using it to report the
type of transceiver: internal or external.

Use the first word of the reserved field to put this __u8 transceiver
field back in. It is made read-only, and we don't expect the
ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this
is mostly for the legacy path where we do:

ethtool_get_settings()
-> dev->ethtool_ops->get_link_ksettings()
-> convert_link_ksettings_to_legacy_settings()

to have no information loss compared to the legacy get_settings API.

Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Timur Tabi [Wed, 20 Sep 2017 20:32:53 +0000 (15:32 -0500)]

net: qcom/emac: add software control for pause frame mode

The EMAC has the option of sending only a single pause frame when
flow control is enabled and the RX queue is full. Although sending
only one pause frame has little value, this would allow admins to
enable automatic flow control without having to worry about the EMAC
flooding nearby switches with pause frames if the kernel hangs.

The option is enabled by using the single-pause-mode private flag.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alex Ng [Wed, 20 Sep 2017 18:17:35 +0000 (11:17 -0700)]

hv_netvsc: fix send buffer failure on MTU change

If MTU is changed the host would reject the send buffer change.
This problem is result of recent change to allow changing send
buffer size.

Every time we change the MTU, we store the previous net_device section
count before destroying the buffer, but we don’t store the previous
section size. When we reinitialize the buffer, its size is calculated
by multiplying the previous count and previous size. Since we
continuously increase the MTU, the host returns us a decreasing count
value while the section size is reinitialized to 1728 bytes every
time.

This eventually leads to a condition where the calculated buf_size is
so small that the host rejects it.

Fixes: 8b5327975ae1 ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Cong Wang [Wed, 20 Sep 2017 16:18:45 +0000 (09:18 -0700)]

net_sched: remove cls_flower idr on failure

Fixes: c15ab236d69d ("net/sched: Change cls_flower to use IDR")
Cc: Chris Mi <chrism@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Konstantin Khlebnikov [Wed, 20 Sep 2017 12:46:11 +0000 (15:46 +0300)]

net_sched/hfsc: fix curve activation in hfsc_change_class()

If real-time or fair-share curves are enabled in hfsc_change_class()
class isn't inserted into rb-trees yet. Thus init_ed() and init_vf()
must be called in place of update_ed() and update_vf().

Remove isn't required because for now curves cannot be disabled.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Konstantin Khlebnikov [Wed, 20 Sep 2017 12:45:36 +0000 (15:45 +0300)]

net_sched: always reset qdisc backlog in qdisc_reset()

SKB stored in qdisc->gso_skb also counted into backlog.

Some qdiscs don't reset backlog to zero in ->reset(),
for example sfq just dequeue and free all queued skb.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 20 Sep 2017 23:15:40 +0000 (16:15 -0700)]

Merge branch 'hns3-tm-fixes'

Yunsheng Lin says:

====================
TM related bugfixes for the HNS3 Ethernet Driver

This patch set contains a few bugfixes related to hclge_tm module.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:58 +0000 (18:52 +0800)]

net: hns3: Fix for pri to tc mapping in TM

Current mapping between pri and tc is one to one,
so user can't map multi priorities to the same tc.
This patch changes the mapping to many to one.

Fixes: 848440544b41f ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:57 +0000 (18:52 +0800)]

net: hns3: Fix for setting rss_size incorrectly

rss_size is 1, 2, 4, 8, 16, 32, 64, 128, but acutal tc queue
size can be any u16 less than 128. If tc queue size is 5, we
set the rss_size to 8, indirection table will be used to limit
the size of actual queue size.
It may cause dropping of receiving packet in hardware if
rss_size is not set correctly.
For now, each TC has the same rss size.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:56 +0000 (18:52 +0800)]

net: hns3: Fix typo error for feild in hclge_tm

This patch fixes a typo error for feild, which should be field.

Fixes: 848440544b41f ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:55 +0000 (18:52 +0800)]

net: hns3: Fix for rx priv buf allocation when DCB is not supported

When hdev doesn't support DCB, rx private buffer is not allocated,
otherwise there is not enough buffer for rx shared buffer, causing
buffer allocation process to fail.
This patch fixes by checking the dcb capability in
hclge_rx_buffer_calc.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:54 +0000 (18:52 +0800)]

net: hns3: Fix for rx_priv_buf_alloc not setting rx shared buffer

rx_priv_buf_alloc is used to tell hardware how much buffer is
used for rx direction, right now only the private buffer is
assigned.
For ae_dev that doesn't support DCB, private rx buffer is assigned
to zero, only shared rx buffer is used. So not setting the shared
rx buffer cause dropping of packet in SSU.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:53 +0000 (18:52 +0800)]

net: hns3: Fix for not setting rx private buffer size to zero

When rx private buffer is disabled, there may be some case that
the rx private buffer is not set to zero, which may cause buffer
allocation process to fail.
This patch fixes this problem by setting priv->enable to 0 and
priv->buf_size to zero when rx private buffer is disabled.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:52 +0000 (18:52 +0800)]

net: hns3: Fix for DEFAULT_DV when dev doesn't support DCB

When ae_dev doesn't support DCB, DEFAULT_DV must be set to
a lower value, otherwise the buffer allocation process will
fail.
This patch fix it by setting it to 30K bytes.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:51 +0000 (18:52 +0800)]

net: hns3: Fix initialization when cmd is not supported

When ae_dev doesn't support DCB, rx_priv_wl_config,
common_thrd_config and tm_qs_bp_cfg can't be called, otherwise
cmd return fail, which causes the hclge module initialization
process to fail.
This patch fix it by adding a DCB capability flag to check if
the ae_dev support DCB.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 20 Sep 2017 10:52:50 +0000 (18:52 +0800)]

net: hns3: Cleanup for ROCE capability flag in ae_dev

This patch add the ROCE supported flag in the driver_data
field of pci_device_id, delete roce_pci_tbl and change
HNAE_DEV_SUPPORT_ROCE_B to HNAE3_DEV_SUPPORT_ROCE_B.
This cleanup is done in order to support adding capability
in pci_device_id and to fix initialization failure when
cmd is not supported.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 20 Sep 2017 23:08:23 +0000 (16:08 -0700)]

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains two Netfilter fixes for your net tree,
they are:

1) Fix NAt compilation with UP, from Geert Uytterhoeven.

2) Fix incorrect number of entries when dumping a set, from
Vishwanath Pai.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Meng Xu [Wed, 20 Sep 2017 01:49:55 +0000 (21:49 -0400)]

isdn/i4l: fetch the ppp_write buffer in one shot

In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
fetched twice from userspace. The first fetch is used to peek at the
protocol of the message and reset the huptimer if necessary; while the
second fetch copies in the whole buffer. However, given that buf resides
in userspace memory, a user process can race to change its memory content
across fetches. By doing so, we can either avoid resetting the huptimer
for any type of packets (by first setting proto to PPP_LCP and later
change to the actual type) or force resetting the huptimer for LCP
packets.

This patch changes this double-fetch behavior into two single fetches
decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0).
A more detailed discussion can be found at
https://marc.info/?l=linux-kernel&m=150586376926123&w=2

Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Troy Kisky [Wed, 20 Sep 2017 00:33:09 +0000 (17:33 -0700)]

net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it

fec_ptp_check_pps_event will return 1 if FEC_T_TF_MASK caused
an interrupt. Don't return IRQ_NONE in this case.

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Troy Kisky [Wed, 20 Sep 2017 00:33:08 +0000 (17:33 -0700)]

net: fec: remove unused interrupt FEC_ENET_TS_TIMER

FEC_ENET_TS_TIMER is not checked in the interrupt routine
so there is no need to enable it.

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Troy Kisky [Wed, 20 Sep 2017 00:33:07 +0000 (17:33 -0700)]

net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set

Before queue 0 was always checked if any queue caused an interrupt.
It is better to just mark queue 0 if queue 0 has caused an interrupt.

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Acked-by: Fugang Duan <Fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Edward Cree [Tue, 19 Sep 2017 17:45:56 +0000 (18:45 +0100)]

net: change skb->mac_header when Generic XDP calls adjust_head

Since XDP's view of the packet includes the MAC header, moving the start-
of-packet with bpf_xdp_adjust_head needs to also update the offset of the
MAC header (which is relative to skb->head, not to the skb->data that was
changed).
Without this, tcpdump sees packets starting from the old MAC header rather
than the new one, at least in my tests on the loopback device.

Fixes: b5cdae3291f7 ("net: Generic XDP")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Meng Xu [Tue, 19 Sep 2017 17:19:13 +0000 (13:19 -0400)]

net: compat: assert the size of cmsg copied in is as expected

The actual length of cmsg fetched in during the second loop
(i.e., kcmsg - kcmsg_base) could be different from what we
get from the first loop (i.e., kcmlen).

The main reason is that the two get_user() calls in the two
loops (i.e., get_user(ucmlen, &ucmsg->cmsg_len) and
__get_user(ucmlen, &ucmsg->cmsg_len)) could cause ucmlen
to have different values even they fetch from the same userspace
address, as user can race to change the memory content in
&ucmsg->cmsg_len across fetches.

Although in the second loop, the sanity check
if ((char *)kcmsg_base + kcmlen - (char *)kcmsg < CMSG_ALIGN(tmp))
is inplace, it only ensures that the cmsg fetched in during the
second loop does not exceed the length of kcmlen, but not
necessarily equal to kcmlen. But indicated by the assignment
kmsg->msg_controllen = kcmlen, we should enforce that.

This patch adds this additional sanity check and ensures that
what is recorded in kmsg->msg_controllen is the actual cmsg length.

Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yonghong Song [Mon, 18 Sep 2017 23:38:36 +0000 (16:38 -0700)]

bpf: one perf event close won't free bpf program attached by another perf event

This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as tracepoint
     no output any more.

The issue happens at step 4. Multiple perf_event_open can be called
successfully, but only one bpf prog pointer in the tp_event. In the
current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Willem de Bruijn [Thu, 14 Sep 2017 21:14:41 +0000 (17:14 -0400)]

packet: hold bind lock when rebinding to fanout hook

Packet socket bind operations must hold the po->bind_lock. This keeps
po->running consistent with whether the socket is actually on a ptype
list to receive packets.

fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
binds the fanout object to receive through packet_rcv_fanout.

Make it hold the po->bind_lock when testing po->running and rebinding.
Else, it can race with other rebind operations, such as that in
packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
can result in a socket being added to a fanout group twice, causing
use-after-free KASAN bug reports, among others.

Reported independently by both trinity and syzkaller.
Verified that the syzkaller reproducer passes after this patch.

Fixes: dc99f600698d ("packet: Add fanout support.")
Reported-by: nixioaming <nixiaoming@huawei.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Matteo Croce [Tue, 12 Sep 2017 15:46:37 +0000 (17:46 +0200)]

ipv6: fix net.ipv6.conf.all interface DAD handlers

Currently, writing into
net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect.
Fix handling of these flags by:

- using the maximum of global and per-interface values for the
  accept_dad flag. That is, if at least one of the two values is
  non-zero, enable DAD on the interface. If at least one value is
  set to 2, enable DAD and disable IPv6 operation on the interface if
  MAC-based link-local address was found

- using the logical OR of global and per-interface values for the
  optimistic_dad flag. If at least one of them is set to one, optimistic
  duplicate address detection (RFC 4429) is enabled on the interface

- using the logical OR of global and per-interface values for the
  use_optimistic flag. If at least one of them is set to one,
  optimistic addresses won't be marked as deprecated during source address
  selection on the interface.

While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(),
drop inline, and let the compiler decide.

Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Mike Manning [Mon, 4 Sep 2017 14:52:55 +0000 (15:52 +0100)]

net: ipv6: fix regression of no RTM_DELADDR sent after DAD failure

Commit f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative
addresses") incorrectly assumes that no RTM_NEWADDR are sent for
addresses in tentative state, as this does happen for the standard
IPv6 use-case of DAD failure, see the call to ipv6_ifa_notify() in
addconf_dad_stop(). So as a result of this change, no RTM_DELADDR is
sent after DAD failure for a link-local when strict DAD (accept_dad=2)
is configured, or on the next admin down in other cases. The absence
of this notification breaks backwards compatibility and causes problems
after DAD failure if this notification was being relied on. The
solution is to allow RTM_DELADDR to still be sent after DAD failure.

Fixes: f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses")
Signed-off-by: Mike Manning <mmanning@brocade.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 19 Sep 2017 22:44:21 +0000 (00:44 +0200)]

bpf: fix ri->map_owner pointer on bpf_prog_realloc

Commit 109980b894e9 ("bpf: don't select potentially stale
ri->map from buggy xdp progs") passed the pointer to the prog
itself to be loaded into r4 prior on bpf_redirect_map() helper
call, so that we can store the owner into ri->map_owner out of
the helper.

Issue with that is that the actual address of the prog is still
subject to change when subsequent rewrites occur that require
slow path in bpf_prog_realloc() to alloc more memory, e.g. from
patching inlining helper functions or constant blinding. Thus,
we really need to take prog->aux as the address we're holding,
which also works with prog clones as they share the same aux
object.

Instead of then fetching aux->prog during runtime, which could
potentially incur cache misses due to false sharing, we are
going to just use aux for comparison on the map owner. This
will also keep the patchlet of the same size, and later check
in xdp_map_invalid() only accesses read-only aux pointer from
the prog, it's also in the same cacheline already from prior
access when calling bpf_func.

Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Christian Lamparter [Tue, 19 Sep 2017 17:35:18 +0000 (19:35 +0200)]

net: emac: Fix napi poll list corruption

This patch is pretty much a carbon copy of
commit 3079c652141f ("caif: Fix napi poll list corruption")
with "caif" replaced by "emac".

The commit d75b1ade567f ("net: less interrupt masking in NAPI")
breaks emac.

It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty. However, like some
other drivers emac tries to do a last-ditch check and if there is
more work it will call napi_reschedule and then immediately process
some of this new work. Should the entire budget be consumed while
processing such new work then we will violate the new caller
contract.

This patch fixes this by not touching any work when we reschedule
in emac.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Tue, 19 Sep 2017 17:05:57 +0000 (10:05 -0700)]

tcp: fastopen: fix on syn-data transmit failure

Our recent change exposed a bug in TCP Fastopen Client that syzkaller
found right away [1]

When we prepare skb with SYN+DATA, we attempt to transmit it,
and we update socket state as if the transmit was a success.

In socket RTX queue we have two skbs, one with the SYN alone,
and a second one containing the DATA.

When (malicious) ACK comes in, we now complain that second one had no
skb_mstamp.

The proper fix is to make sure that if the transmit failed, we do not
pretend we sent the DATA skb, and make it our send_head.

When 3WHS completes, we can now send the DATA right away, without having
to wait for a timeout.

[1]
WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()

WARN_ON_ONCE(last_ackt == 0);

Modules linked in:
CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
Call Trace:
[<ffffffff81cad00d>] __dump_stack
[<ffffffff81cad00d>] dump_stack+0xc1/0x124
[<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
[<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
[<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
[<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
[<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
[<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
[<ffffffff8258aacb>] sk_backlog_rcv
[<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
[<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
[<ffffffff8294a785>] inet_wait_for_connect
[<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
[<ffffffff82886f08>] tcp_sendmsg_fastopen
[<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
[<ffffffff82952515>] inet_sendmsg+0xe5/0x520
[<ffffffff8257152f>] sock_sendmsg_nosec
[<ffffffff8257152f>] sock_sendmsg+0xcf/0x110

Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Fixes: 783237e8daf1 ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 19 Sep 2017 23:06:50 +0000 (16:06 -0700)]

Merge branch 'hns3-bug-fixes'

Salil Mehta says:

====================
Bug fixes for the HNS3 Ethernet Driver for Hip08 SoC

This patch set presents some bug fixes for the HNS3 Ethernet driver identified
during internal testing & stabilization efforts.

Change Log:
Patch V2: Resolved comments from Leon Romanovsky
Patch V1: Initial Submit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:16 +0000 (17:17 +0100)]

net: hns3: Fixes the premature exit of loop when matching clients

When register/unregister ae_dev, ae_dev should match all client
in the client_list. Enet and roce can co-exists together so we
should continue checking for enet and roce presence together.
So break should not be there.

Above caused problems in loading and unloading of modules.

Fixes: 38eddd126772 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:15 +0000 (17:17 +0100)]

net: hns3: Fixes the default VLAN-id of PF

When there is no vlan id in the packets, hardware will treat the vlan id
as 0 and look for the mac_vlan table. This patch set the default vlan id
of PF as 0. Without this config, it will fail when look for mac_vlan
table, and hardware will drop packets.

Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Salil Mehta [Tue, 19 Sep 2017 16:17:14 +0000 (17:17 +0100)]

net: hns3: Fixes the ether address copy with appropriate API

This patch replaces the ethernet address copy instance with more
appropriate ether_addr_copy() function.

Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:13 +0000 (17:17 +0100)]

net: hns3: Fixes the initialization of MAC address in hardware

This patch fixes the initialization of MAC address, fetched from HNS3
firmware i.e. when it is not randomly generated, to the HNS3 hardware.

Fixes: ca60906d2795 ("net: hns3: Add support of HNS3 Ethernet Driver for
hip08 SoC")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:12 +0000 (17:17 +0100)]

net: hns3: Fixes ring-to-vector map-and-unmap command

This patch fixes the vector-to-ring map and unmap command and adds
INT_GL(for, Gap Limiting Interrupts) and VF id to it as required
by the hardware interface.

Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:11 +0000 (17:17 +0100)]

net: hns3: Fixes the command used to unmap ring from vector

This patch fixes the IMP command being used to unmap the vector
from the corresponding ring.

Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lipeng [Tue, 19 Sep 2017 16:17:10 +0000 (17:17 +0100)]

net: hns3: Fixes initialization of phy address from firmware

Default phy address of every port is 0. Therefore, phy address for
each port need to be fetched from firmware and device initialized
with fetched non-default phy address.

Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Tue, 19 Sep 2017 16:15:59 +0000 (09:15 -0700)]

bpf: do not disable/enable BH in bpf_map_free_id()

syzkaller reported following splat [1]

Since hard irq are disabled by the caller, bpf_map_free_id()
should not try to enable/disable BH.

Another solution would be to change htab_map_delete_elem() to
defer the free_htab_elem() call after
raw_spin_unlock_irqrestore(&b->lock, flags), but this might be not
enough to cover other code paths.

[1]
WARNING: CPU: 1 PID: 8052 at kernel/softirq.c:161 __local_bh_enable_ip
+0x1e/0x160 kernel/softirq.c:161
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 8052 Comm: syz-executor1 Not tainted 4.13.0-next-20170915+
#23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:181
__warn+0x1c4/0x1d9 kernel/panic.c:542
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:__local_bh_enable_ip+0x1e/0x160 kernel/softirq.c:161
RSP: 0018:ffff8801cdcd7748 EFLAGS: 00010046
RAX: 0000000000000082 RBX: 0000000000000201 RCX: 0000000000000000
RDX: 1ffffffff0b5933c RSI: 0000000000000201 RDI: ffffffff85ac99e0
RBP: ffff8801cdcd7758 R08: ffffffff85b87158 R09: 1ffff10039b9aec6
R10: ffff8801c99f24c0 R11: 0000000000000002 R12: ffffffff817b0b47
R13: dffffc0000000000 R14: ffff8801cdcd77e8 R15: 0000000000000001
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:176 [inline]
_raw_spin_unlock_bh+0x30/0x40 kernel/locking/spinlock.c:207
spin_unlock_bh include/linux/spinlock.h:361 [inline]
bpf_map_free_id kernel/bpf/syscall.c:197 [inline]
__bpf_map_put+0x267/0x320 kernel/bpf/syscall.c:227
bpf_map_put+0x1a/0x20 kernel/bpf/syscall.c:235
bpf_map_fd_put_ptr+0x15/0x20 kernel/bpf/map_in_map.c:96
free_htab_elem+0xc3/0x1b0 kernel/bpf/hashtab.c:658
htab_map_delete_elem+0x74d/0x970 kernel/bpf/hashtab.c:1063
map_delete_elem kernel/bpf/syscall.c:633 [inline]
SYSC_bpf kernel/bpf/syscall.c:1479 [inline]
SyS_bpf+0x2188/0x46a0 kernel/bpf/syscall.c:1451
entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: f3f1c054c288 ("bpf: Introduce bpf_map ID")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andreas Gruenbacher [Tue, 19 Sep 2017 10:41:37 +0000 (12:41 +0200)]

rhashtable: Documentation tweak

Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jerome Brunet [Mon, 18 Sep 2017 12:59:20 +0000 (14:59 +0200)]

net: phy: Kconfig: Fix PHY infrastructure menu in menuconfig

Since the integration of PHYLINK, the configuration option which
used to be under the PHY infrastructure menu in menuconfig ended
up one level up (the network device driver section)

By placing PHYLINK option right after PHYLIB entry, it broke the
way Kconfig used to build the menu. See kconfig-language.txt, section
"Menu structure", 2nd method.

This is fixed by placing the PHYLINK option just before PHYLIB.

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 19 Sep 2017 17:51:08 +0000 (10:51 -0700)]

Merge tag 'mac80211-for-davem-2017-11-19' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two netlink fixes, both allowing privileged users
to crash the kernel with malformed netlink messages.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ariel Elior [Tue, 19 Sep 2017 09:54:34 +0000 (12:54 +0300)]

MAINTAINERS: Remove Yuval Mintz from maintainers list

Remove Yuval from maintaining the bnx2x & qed* modules as he is no longer
working for the company. Thanks Yuval for your huge contributions and
tireless efforts over the many years and various companies.

Ariel
Signed-off-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Mon, 18 Sep 2017 23:31:30 +0000 (16:31 -0700)]

net: systemport: Fix 64-bit statistics dependency

There are several problems with commit 10377ba7673d ("net: systemport:
Support 64bit statistics", first one got fixed in 7095c973453e ("net:
systemport: Fix 64-bit stats deadlock").

The second problem is that this specific code updates the
stats64.tx_{packets,bytes} from ndo_get_stats64() and that is what we
are returning to ethtool -S. If we are not running a tool that involves
calling ndo_get_stats64(), then we won't get updated ethtool stats.

The solution to this is to update the stats from both call sites,
factoring that into a specific function, While at it, don't just check
the sizeof() but also the type of the statistics in order to use the
64-bit stats seqlock.

Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Mon, 18 Sep 2017 20:03:43 +0000 (13:03 -0700)]

8139too: revisit napi_complete_done() usage

It seems we have to be more careful in napi_complete_done()
use. This patch is not a revert, as it seems we can
avoid bug that Ville reported by moving the napi_complete_done()
test in the spinlock section.

Many thanks to Ville for detective work and all tests.

Fixes: 617f01211baf ("8139too: use napi_complete_done()")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yuchung Cheng [Mon, 18 Sep 2017 18:05:16 +0000 (11:05 -0700)]

tcp: remove two unused functions

remove tcp_may_send_now and tcp_snd_test that are no longer used

Fixes: 840a3cbe8969 ("tcp: remove forward retransmit feature")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tobias Klauser [Mon, 18 Sep 2017 13:03:46 +0000 (15:03 +0200)]

bpf: devmap: pass on return value of bpf_map_precharge_memlock

If bpf_map_precharge_memlock in dev_map_alloc, -ENOMEM is returned
regardless of the actual error produced by bpf_map_precharge_memlock.
Fix it by passing on the error returned by bpf_map_precharge_memlock.

Also return -EINVAL instead of -ENOMEM if the page count overflow check
fails.

This makes dev_map_alloc match the behavior of other bpf maps' alloc
functions wrt. return values.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sathya Perla [Mon, 18 Sep 2017 11:35:37 +0000 (17:05 +0530)]

bnxt_en: check for ingress qdisc in flower offload

Check for ingress-only qdisc for flower offload, as other qdiscs
are not supported for flower offload.

Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Randy Dunlap [Sat, 16 Sep 2017 20:10:06 +0000 (13:10 -0700)]

Documentation: networking: fix ASCII art in switchdev.txt

Fix ASCII art in Documentation/networking/switchdev.txt:

Change non-ASCII "spaces" to ASCII spaces.

Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
characters below):

line 32:
+--+----+----+----+-*--+----+---+ +-----+-----+
line 41:
+--------------+---*------------+

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Davide Caratti [Sat, 16 Sep 2017 12:02:21 +0000 (14:02 +0200)]

net/sched: cls_matchall: fix crash when used with classful qdisc

this script, edited from Linux Advanced Routing and Traffic Control guide

tc q a dev en0 root handle 1: htb default a
tc c a dev en0 parent 1: classid 1:1 htb rate 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
ping $address -c1
tc -s c s dev en0

classifies traffic to 1:b or 1:a, depending on whether the packet matches
or not the pattern $clsargs of filter $clsname. However, when $clsname is
'matchall', a systematic crash can be observed in htb_classify(). HTB and
classful qdiscs don't assign initial value to struct tcf_result, but then
they expect it to contain valid values after filters have been run. Thus,
current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
by user, and makes HTB (and classful qdiscs) dereference random pointers.

By assigning head->res to *res in mall_classify(), before the actions are
invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
that had no effect on 'matchall' classifier since its first introduction.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
Reported-by: Jiri Benc <jbenc@redhat.com>
Fixes: b87f7936a932 ("net/sched: introduce Match-all classifier")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Fri, 15 Sep 2017 07:58:33 +0000 (15:58 +0800)]

ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline

If ipv6 has been disabled from cmdline since kernel started, it makes
no sense to allow users to create any ip6 tunnel. Otherwise, it could
some potential problem.

Jianlin found a kernel crash caused by this in ip6_gre when he set
ipv6.disable=1 in grub:

[  209.588865] Unable to handle kernel paging request for data at address 0x00000080
[  209.588872] Faulting instruction address: 0xc000000000a3aa6c
[  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
[  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
[  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
[  209.589076] Call Trace:
[  209.589097] fib6_rule_lookup+0x50/0xb0
[  209.589106] rt6_lookup+0xc4/0x110
[  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
[  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
[  209.589134] rtnl_newlink+0x798/0xb80
[  209.589142] rtnetlink_rcv_msg+0xec/0x390
[  209.589151] netlink_rcv_skb+0x138/0x150
[  209.589159] rtnetlink_rcv+0x48/0x70
[  209.589169] netlink_unicast+0x538/0x640
[  209.589175] netlink_sendmsg+0x40c/0x480
[  209.589184] ___sys_sendmsg+0x384/0x4e0
[  209.589194] SyS_sendmsg+0xd4/0x140
[  209.589201] SyS_socketcall+0x3e0/0x4f0
[  209.589209] system_call+0x38/0xe0

This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
disabled from cmdline.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Fahad Kunnathadi [Fri, 15 Sep 2017 06:31:58 +0000 (12:01 +0530)]

net: phy: Fix mask value write on gmii2rgmii converter speed register

To clear Speed Selection in MDIO control register(0x10),
ie, clear bits 6 and 13 to zero while keeping other bits same.
Before AND operation,The Mask value has to be perform with bitwise NOT
operation (ie, ~ operator)

This patch clears current speed selection before writing the
new speed settings to gmii2rgmii converter

Fixes: f411a6160bd4 ("net: phy: Add gmiitorgmii converter support")

Signed-off-by: Fahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Fri, 15 Sep 2017 04:00:07 +0000 (12:00 +0800)]

ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header

Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
which only includes encap_hlen + tun_hlen. It means greh and inner header
would be over written by ipv6 stuff and ipv6h might have no chance to set
up.

Jianlin found this issue when using remote any on ip6_gre, the packets he
captured on gre dev are truncated:

22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0) \
payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
8184

It should also skb_push ipv6hdr so that ipv6h points to the right position
to set ipv6 stuff up.

This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
in ip6gre_header.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Johannes Berg [Mon, 18 Sep 2017 20:46:36 +0000 (22:46 +0200)]

nl80211: fix null-ptr dereference on invalid mesh configuration

If TX rates are specified during mesh join, the channel must
also be specified. Check the channel pointer to avoid a null
pointer dereference if it isn't.

Reported-by: Jouni Malinen <j@w1.fi>
Fixes: 8564e38206de ("cfg80211: add checks for beacon rate, extend to mesh")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

commit | commitdiff | tree

Subash Abhinov Kasiviswanathan [Thu, 14 Sep 2017 01:30:51 +0000 (19:30 -0600)]

udpv6: Fix the checksum computation when HW checksum does not apply

While trying an ESP transport mode encryption for UDPv6 packets of
datagram size 1436 with MTU 1500, checksum error was observed in
the secondary fragment.

This error occurs due to the UDP payload checksum being missed out
when computing the full checksum for these packets in
udp6_hwcsum_outgoing().

Fixes: d39d938c8228 ("ipv6: Introduce udpv6_send_skb()")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vishwanath Pai [Mon, 11 Sep 2017 19:52:40 +0000 (21:52 +0200)]

netfilter: ipset: ipset list may return wrong member count for set with timeout

Simple testcase:

$ ipset create test hash:ip timeout 5
$ ipset add test 1.2.3.4
$ ipset add test 1.2.2.2
$ sleep 5

$ ipset l
Name: test
Type: hash:ip
Revision: 5
Header: family inet hashsize 1024 maxelem 65536 timeout 5
Size in memory: 296
References: 0
Number of entries: 2
Members:

We return "Number of entries: 2" but no members are listed. That is
because mtype_list runs "ip_set_timeout_expired" and does not list the
expired entries, but set->elements is never upated (until mtype_gc
cleans it up later).

Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Geert Uytterhoeven [Sun, 10 Sep 2017 11:41:41 +0000 (13:41 +0200)]

netfilter: nat: Do not use ARRAY_SIZE() on spinlocks to fix zero div

If no spinlock debugging options (CONFIG_GENERIC_LOCKBREAK,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC) are enabled on a UP
platform (e.g. m68k defconfig), arch_spinlock_t is an empty struct,
hence using ARRAY_SIZE(nf_nat_locks) causes a division by zero:

    net/netfilter/nf_nat_core.c: In function ‘nf_nat_setup_info’:
    net/netfilter/nf_nat_core.c:432: warning: division by zero
    net/netfilter/nf_nat_core.c: In function ‘__nf_nat_cleanup_conntrack’:
    net/netfilter/nf_nat_core.c:535: warning: division by zero
    net/netfilter/nf_nat_core.c:537: warning: division by zero
    net/netfilter/nf_nat_core.c: In function ‘nf_nat_init’:
    net/netfilter/nf_nat_core.c:810: warning: division by zero
    net/netfilter/nf_nat_core.c:811: warning: division by zero
    net/netfilter/nf_nat_core.c:824: warning: division by zero

Fix this by using the CONNTRACK_LOCKS definition instead.

Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 8073e960a03bf7b5 ("netfilter: nat: use keyed locks")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 22:47:51 +0000 (15:47 -0700)]

Linux 4.14-rc1

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 19:08:10 +0000 (12:08 -0700)]

Merge tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI updates from Richard Weinberger:
"Minor improvements"

* tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs:
  UBI: Fix two typos in comments
  ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
  ubi: pr_err() strings should end with newlines
  ubi: pr_err() strings should end with newlines
  ubi: pr_err() strings should end with newlines

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 19:03:25 +0000 (12:03 -0700)]

Merge branch 'for-linus-4.14-rc1' of git://git./linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

- minor improvements

- fixes for Debian's new gcc defaults (pie enabled by default)

- fixes for XSTATE/XSAVE to make UML work again on modern systems

* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: return negative in tuntap_open_tramp()
  um: remove a stray tab
  um: Use relative modversions with LD_SCRIPT_DYN
  um: link vmlinux with -no-pie
  um: Fix CONFIG_GCOV for modules.
  Fix minor typos and grammar in UML start_up help
  um: defconfig: Cleanup from old Kconfig options
  um: Fix FP register size for XSTATE/XSAVE

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 18:28:59 +0000 (11:28 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.

2) Fix double-free in rmnet driver, from Dan Carpenter.

3) INET connection socket layer can double put request sockets, fix
    from Eric Dumazet.

4) Don't match collect metadata-mode tunnels if the device is down,
    from Haishuang Yan.

5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
    be2net driver, from Suresh Reddy.

6) Fix scaling error in gen_estimator, from Eric Dumazet.

7) Fix 64-bit statistics deadlock in systemport driver, from Florian
    Fainelli.

8) Fix use-after-free in sctp_sock_dump, from Xin Long.

9) Reject invalid BPF_END instructions in verifier, from Edward Cree.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
  Documentation: link in networking docs
  tcp: fix data delivery rate
  bpf/verifier: reject BPF_ALU64|BPF_END
  sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
  sctp: fix an use-after-free issue in sctp_sock_dump
  netvsc: increase default receive buffer size
  tcp: update skb->skb_mstamp more carefully
  net: ipv4: fix l3slave check for index returned in IP_PKTINFO
  net: smsc911x: Quieten netif during suspend
  net: systemport: Fix 64-bit stats deadlock
  net: vrf: avoid gcc-4.6 warning
  qed: remove unnecessary call to memset
  tg3: clean up redundant initialization of tnapi
  tls: make tls_sw_free_resources static
  sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
  MAINTAINERS: review Renesas DT bindings as well
  net_sched: gen_estimator: fix scaling error in bytes/packets samples
  nfp: wait for the NSP resource to appear on boot
  nfp: wait for board state before talking to the NSP
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 18:24:26 +0000 (11:24 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull more input updates from Dmitry Torokhov:
"A second round of updates for the input subsystem:

   - a new driver for PWM-controlled vibrators

   - ucb1400 touchscreen driver had completely busted suspend/resume
     handling

   - we now handle "home" button found on some devices with Goodix
     touchscreens

   - assorted other fixups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Gigabyte P57 to the keyboard reset table
  Input: xpad - validate USB endpoint type during probe
  Input: ucb1400_ts - fix suspend and resume handling
  Input: edt-ft5x06 - fix access to non-existing register
  Input: elantech - make arrays debounce_packet static, reduces object code size
  Input: surface3_spi - make const array header static, reduces object code size
  Input: goodix - add support for capacitive home button
  Input: add a driver for PWM controllable vibrators
  Input: adi - make array seq static, reduces object code size

commit | commitdiff | tree

Markus Trippelsdorf [Sat, 16 Sep 2017 09:01:16 +0000 (11:01 +0200)]

firmware: Restore support for built-in firmware

Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.

This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.

Fixes: 5620a0d1aac ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Ido Schimmel [Fri, 15 Sep 2017 13:31:07 +0000 (15:31 +0200)]

mlxsw: spectrum_router: Only handle IPv4 and IPv6 events

The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.

This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.

Fixes: 65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Pavel Machek [Sat, 16 Sep 2017 14:28:02 +0000 (16:28 +0200)]

Documentation: link in networking docs

Fix link in filter.txt.

Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Fri, 15 Sep 2017 23:47:42 +0000 (16:47 -0700)]

tcp: fix data delivery rate

Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.

Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 03:43:33 +0000 (20:43 -0700)]

Merge branch '4.14-features' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 4.14 for MIPS; below a summary of
  the non-merge commits:

  CM:
   - Rename mips_cm_base to mips_gcr_base
   - Specify register size when generating accessors
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Add cluster & block args to mips_cm_lock_other()

  CPC:
   - Use common CPS accessor generation macros
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Introduce register modify (set/clear/change) accessors
   - Use change_*, set_* & clear_* where appropriate
   - Add CM/CPC 3.5 register definitions
   - Use GlobalNumber macros rather than magic numbers
   - Have asm/mips-cps.h include CM & CPC headers
   - Cluster support for topology functions
   - Detect CPUs in secondary clusters

  CPS:
   - Read GIC_VL_IDENT directly, not via irqchip driver

  DMA:
   - Consolidate coherent and non-coherent dma_alloc code
   - Don't use dma_cache_sync to implement fd_cacheflush

  FPU emulation / FP assist code:
   - Another series of 14 commits fixing corner cases such as NaN
     propgagation and other special input values.
   - Zero bits 32-63 of the result for a CLASS.D instruction.
   - Enhanced statics via debugfs
   - Do not use bools for arithmetic. GCC 7.1 moans about this.
   - Correct user fault_addr type

  Generic MIPS:
   - Enhancement of stack backtraces
   - Cleanup from non-existing options
   - Handle non word sized instructions when examining frame
   - Fix detection and decoding of ADDIUSP instruction
   - Fix decoding of SWSP16 instruction
   - Refactor handling of stack pointer in get_frame_info
   - Remove unreachable code from force_fcr31_sig()
   - Convert to using %pOF instead of full_name
   - Remove the R6000 support.
   - Move FP code from *_switch.S to *_fpu.S
   - Remove unused ST_OFF from r2300_switch.S
   - Allow platform to specify multiple its.S files
   - Add #includes to various files to ensure code builds reliable and
     without warning..
   - Remove __invalidate_kernel_vmap_range
   - Remove plat_timer_setup
   - Declare various variables & functions static
   - Abstract CPU core & VP(E) ID access through accessor functions
   - Store core & VP IDs in GlobalNumber-style variable
   - Unify checks for sibling CPUs
   - Add CPU cluster number accessors
   - Prevent direct use of generic_defconfig
   - Make CONFIG_MIPS_MT_SMP default y
   - Add __ioread64_copy
   - Remove unnecessary inclusions of linux/irqchip/mips-gic.h

  GIC:
   - Introduce asm/mips-gic.h with accessor functions
   - Use new GIC accessor functions in mips-gic-timer
   - Remove counter access functions from irq-mips-gic.c
   - Remove gic_read_local_vp_id() from irq-mips-gic.c
   - Simplify shared interrupt pending/mask reads in irq-mips-gic.c
   - Simplify gic_local_irq_domain_map() in irq-mips-gic.c
   - Drop gic_(re)set_mask() functions in irq-mips-gic.c
   - Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
     gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
   - Convert remaining shared reg access, local int mask access and
     remaining local reg access to new accessors
   - Move GIC_LOCAL_INT_* to asm/mips-gic.h
   - Remove GIC_CPU_INT* macros from irq-mips-gic.c
   - Move various definitions to the driver
   - Remove gic_get_usm_range()
   - Remove __gic_irq_dispatch() forward declaration
   - Remove gic_init()
   - Use mips_gic_present() in place of gic_present and remove
     gic_present
   - Move gic_get_c0_*_int() to asm/mips-gic.h
   - Remove linux/irqchip/mips-gic.h
   - Inline __gic_init()
   - Inline gic_basic_init()
   - Make pcpu_masks a per-cpu variable
   - Use pcpu_masks to avoid reading GIC_SH_MASK*
   - Clean up mti, reserved-cpu-vectors handling
   - Use cpumask_first_and() in gic_set_affinity()
   - Let the core set struct irq_common_data affinity

  microMIPS:
   - Fix microMIPS stack unwinding on big endian systems

  MIPS-GIC:
   - SYNC after enabling GIC region

  NUMA:
   - Remove the unused parent_node() macro

  R6:
   - Constify r2_decoder_tables
   - Add accessor & bit definitions for GlobalNumber

  SMP:
   - Constify smp ops
   - Allow boot_secondary SMP op to return errors

  VDSO:
   - Drop gic_get_usm_range() usage
   - Avoid use of linux/irqchip/mips-gic.h

  Platform changes:

  Alchemy:
   - Add devboard machine type to cpuinfo
   - update cpu feature overrides
   - Threaded carddetect irqs for devboards

  AR7:
   - allow NULL clock for clk_get_rate

  BCM63xx:
   - Fix ENETDMA_6345_MAXBURST_REG offset
   - Allow NULL clock for clk_get_rate

  CI20:
   - Enable GPIO and RTC drivers in defconfig
   - Add ethernet and fixed-regulator nodes to DTS

  Generic platform:
   - Move Boston and NI 169445 FIT image source to their own files
   - Include asm/bootinfo.h for plat_fdt_relocated()
   - Include asm/time.h for get_c0_*_int()
   - Include asm/bootinfo.h for plat_fdt_relocated()
   - Include asm/time.h for get_c0_*_int()
   - Allow filtering enabled boards by requirements
   - Don't explicitly disable CONFIG_USB_SUPPORT
   - Bump default NR_CPUS to 16

  JZ4700:
   - Probe the jz4740-rtc driver from devicetree

  Lantiq:
   - Drop check of boot select from the spi-falcon driver.
   - Drop check of boot select from the lantiq-flash MTD driver.
   - Access boot cause register in the watchdog driver through regmap
   - Add device tree binding documentation for the watchdog driver
   - Add docs for the RCU DT bindings.
   - Convert the fpi bus driver to a platform_driver
   - Remove ltq_reset_cause() and ltq_boot_select(
   - Switch to a proper reset driver
   - Switch to a new drivers/soc GPHY driver
   - Add an USB PHY driver for the Lantiq SoCs using the RCU module
   - Use of_platform_default_populate instead of __dt_register_buses
   - Enable MFD_SYSCON to be able to use it for the RCU MFD
   - Replace ltq_boot_select() with dummy implementation.

  Loongson 2F:
   - Allow NULL clock for clk_get_rate

  Malta:
   - Use new GIC accessor functions

  NI 169445:
   - Add support for NI 169445 board.
   - Only include in 32r2el kernels

  Octeon:
   - Add support for watchdog of 78XX SOCs.
   - Add support for watchdog of CN68XX SOCs.
   - Expose support for mips32r1, mips32r2 and mips64r1
   - Enable more drivers in config file
   - Add support for accessing the boot vector.
   - Remove old boot vector code from watchdog driver
   - Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
   - Make CSR functions node aware.
   - Allow access to CIU3 IRQ domains.
   - Misc cleanups in the watchdog driver

  Omega2+:
   - New board, add support and defconfig

  Pistachio:
   - Enable Root FS on NFS in defconfig

  Ralink:
   - Add Mediatek MT7628A SoC
   - Allow NULL clock for clk_get_rate
   - Explicitly request exclusive reset control in the pci-mt7620 PCI driver.

  SEAD3:
   - Only include in 32 bit kernels by default

  VoCore:
   - Add VoCore as a vendor t0 dt-bindings
   - Add defconfig file"

* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
  MIPS: Refactor handling of stack pointer in get_frame_info
  MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
  MIPS: microMIPS: Fix decoding of swsp16 instruction
  MIPS: microMIPS: Fix decoding of addiusp instruction
  MIPS: microMIPS: Fix detection of addiusp instruction
  MIPS: Handle non word sized instructions when examining frame
  MIPS: ralink: allow NULL clock for clk_get_rate
  MIPS: Loongson 2F: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: allow NULL clock for clk_get_rate
  MIPS: AR7: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
  mips: Save all registers when saving the frame
  MIPS: Add DWARF unwinding to assembly
  MIPS: Make SAVE_SOME more standard
  MIPS: Fix issues in backtraces
  MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
  MIPS: Ci20: Enable RTC driver
  watchdog: octeon-wdt: Add support for 78XX SOCs.
  watchdog: octeon-wdt: Add support for cn68XX SOCs.
  watchdog: octeon-wdt: File cleaning.
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 03:25:06 +0000 (20:25 -0700)]

Merge tag 'pci-v4.14-fixes-1' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
"Revert an attempt to fix a race while enabling upstream bridges
because it broke iwlwifi firmware loading"

* tag 'pci-v4.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Avoid race while enabling upstream bridges"

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 00:52:52 +0000 (17:52 -0700)]

Merge tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm AMD fixes from Dave Airlie:
"Just had a single AMD fixes pull from Alex for rc1"

* tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux:
  drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
  drm/amdgpu: remove duplicate return statement
  drm/amdgpu: check memory allocation failure
  drm/amd/amdgpu: fix BANK_SELECT on Vega10 (v2)
  drm/amdgpu: inline amdgpu_ttm_do_bind again
  drm/amdgpu: fix amdgpu_ttm_bind
  drm/amdgpu: remove the GART copy hack
  drm/ttm:fix wrong decoding of bo_count
  drm/ttm: fix missing inc bo_count
  drm/amdgpu: set sched_hw_submission higher for KIQ (v3)
  drm/amdgpu: move default gart size setting into gmc modules
  drm/amdgpu: refine default gart size
  drm/amd/powerplay: ACG frequency added in PPTable
  drm/amdgpu: discard commands of killed processes
  drm/amdgpu: fix and cleanup shadow handling
  drm/amdgpu: add automatic per asic settings for gart_size
  drm/amdgpu/gfx8: fix spelling typo in mqd allocation
  drm/amd/powerplay: unhalt mec after loading
  drm/amdgpu/virtual_dce: Virtual display doesn't support disable vblank immediately
  drm/amdgpu: Fix huge page updates with CPU

commit | commitdiff | tree

Linus Torvalds [Sat, 16 Sep 2017 00:49:46 +0000 (17:49 -0700)]

Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:
"I2C has two more new drivers: Altera FPGA and STM32F7"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i2c-stm32f7: add driver
  i2c: i2c-stm32f4: use generic definition of speed enum
  dt-bindings: i2c-stm32: Document the STM32F7 I2C bindings
  i2c: altera: Add Altera I2C Controller driver
  dt-bindings: i2c: Add Altera I2C Controller

commit | commitdiff | tree

Linus Torvalds [Fri, 15 Sep 2017 22:43:55 +0000 (15:43 -0700)]

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
- PPC bugfixes
- RCU splat fix
- swait races fix
- pointless userspace-triggerable BUG() fix
- misc fixes for KVM_RUN corner cases
- nested virt correctness fixes + one host DoS
- some cleanups
- clang build fix
- fix AMD AVIC with default QEMU command line options
- x86 bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
  kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
  kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
  kvm,mips: Fix potential swait_active() races
  kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
  kvm: Serialize wq active checks in kvm_vcpu_wake_up()
  kvm,x86: Fix apf_task_wake_one() wq serialization
  kvm,lapic: Justify use of swait_active()
  kvm,async_pf: Use swq_has_sleeper()
  sched/wait: Add swq_has_sleeper()
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: trace events: update list of exit reasons
  KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
  KVM: X86: Don't block vCPU if there is pending exception
  KVM: SVM: Add irqchip_split() checks before enabling AVIC
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: x86: fix clang build
  ...

commit | commitdiff | tree

Edward Cree [Fri, 15 Sep 2017 13:37:38 +0000 (14:37 +0100)]

bpf/verifier: reject BPF_ALU64|BPF_END

Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Fri, 15 Sep 2017 03:02:48 +0000 (11:02 +0800)]

sctp: do not mark sk dumped when inet_sctp_diag_fill returns err

sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill
returns err, in which case it shouldn't mark sk dumped by setting
cb->args[3] as 1 in sctp_sock_dump().

Otherwise, it could cause some asocs to have no parent's sk dumped
in 'ss --sctp'.

So this patch is to not set cb->args[3] when inet_sctp_diag_fill()
returns err in sctp_sock_dump().

Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Fri, 15 Sep 2017 03:02:21 +0000 (11:02 +0800)]

sctp: fix an use-after-free issue in sctp_sock_dump

Commit 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.

But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.

I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.

This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.

It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.

This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.

Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Stephen Hemminger [Thu, 14 Sep 2017 16:31:07 +0000 (09:31 -0700)]

netvsc: increase default receive buffer size

The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.

Fixes: 8b5327975ae1 ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Thu, 14 Sep 2017 03:30:39 +0000 (20:30 -0700)]

tcp: update skb->skb_mstamp more carefully

liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
in tcp_probe_timer() :
      https://www.spinics.net/lists/netdev/msg454496.html

After investigations, the root cause of the problem is that we update
skb->skb_mstamp of skbs in write queue, even if the attempt to send a
clone or copy of it failed. One reason being a routing problem.

This patch prevents this, solving liujian issue.

It also removes a potential RTT miscalculation, since
__tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
been changed.

A future ACK would then lead to a very small RTT sample and min_rtt
would then be lowered to this too small value.

Tested:

# cat user_timeout.pkt
--local_ip=192.168.102.64

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`

   +0 < S 0:0(0) win 0 <mss 1460>
   +0 > S. 0:0(0) ack 1 <mss 1460>

  +.1 < . 1:1(0) ack 1 win 65530
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
   +0 write(4, ..., 24) = 24
   +0 > P. 1:25(24) ack 1 win 29200
   +.1 < . 1:1(0) ack 25 win 65530

//change the ipaddress
   +1 `ifconfig tun0 192.168.0.10/16`

   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24

   +0 `ifconfig tun0 192.168.102.64/16`
   +0 < . 1:2(1) ack 25 win 65530
   +0 `ifconfig tun0 192.168.0.10/16`

   +3 write(4, ..., 24) = -1

# ./packetdrill user_timeout.pkt

Signed-off-by: Eric Dumazet <edumazet@googl.com>
Reported-by: liujian <liujian56@huawei.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Ahern [Thu, 14 Sep 2017 00:11:37 +0000 (17:11 -0700)]

net: ipv4: fix l3slave check for index returned in IP_PKTINFO

rt_iif is only set to the actual egress device for the output path. The
recent change to consider the l3slave flag when returning IP_PKTINFO
works for local traffic (the correct device index is returned), but it
broke the more typical use case of packets received from a remote host
always returning the VRF index rather than the original ingress device.
Update the fixup to consider l3slave and rt_iif actually getting set.

Fixes: 1dfa76390bf05 ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Geert Uytterhoeven [Wed, 13 Sep 2017 17:42:05 +0000 (19:42 +0200)]

net: smsc911x: Quieten netif during suspend

If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.

E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock.  If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.

As this is a race condition with a small time window, it is not so easy
to trigger at will.  Using pm_test may increase your chances:

    # echo 0 > /sys/module/printk/parameters/console_suspend
    # echo platform > /sys/power/pm_test
    # echo mem > /sys/power/state

To fix this, make sure the network interface is quietened during
suspend.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Tue, 12 Sep 2017 20:14:26 +0000 (13:14 -0700)]

net: systemport: Fix 64-bit stats deadlock

We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.

[    5.780350] ================================
[    5.784679] WARNING: inconsistent lock state
[    5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[    5.794561] --------------------------------
[    5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[    5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[    5.810175]  (&syncp->seq#2){+.?...}, at: [<c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[    5.818327] {SOFTIRQ-ON-W} state was registered at:
[    5.823278]   bcm_sysport_get_stats64+0x17c/0x258
[    5.828053]   dev_get_stats+0x38/0xac
[    5.831776]   rtnl_fill_stats+0x30/0x118
[    5.835761]   rtnl_fill_ifinfo+0x538/0xe24
[    5.839921]   rtmsg_ifinfo_build_skb+0x6c/0xd8
[    5.844430]   rtmsg_ifinfo_event.part.5+0x14/0x44
[    5.849201]   rtmsg_ifinfo+0x20/0x28
[    5.852837]   register_netdevice+0x628/0x6b8
[    5.857171]   register_netdev+0x14/0x24
[    5.861051]   bcm_sysport_probe+0x30c/0x438
[    5.865280]   platform_drv_probe+0x50/0xb0
[    5.869418]   driver_probe_device+0x2e8/0x450
[    5.873817]   __driver_attach+0x104/0x120
[    5.877871]   bus_for_each_dev+0x7c/0xc0
[    5.881834]   bus_add_driver+0x1b0/0x270
[    5.885797]   driver_register+0x78/0xf4
[    5.889675]   do_one_initcall+0x54/0x190
[    5.893646]   kernel_init_freeable+0x144/0x1d0
[    5.898135]   kernel_init+0x8/0x110
[    5.901665]   ret_from_fork+0x14/0x2c
[    5.905363] irq event stamp: 24263
[    5.908804] hardirqs last  enabled at (24262): [<c08eecf0>] net_rx_action+0xc4/0x4e4
[    5.916624] hardirqs last disabled at (24263): [<c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[    5.925143] softirqs last  enabled at (24258): [<c022a7fc>] irq_enter+0x84/0x98
[    5.932524] softirqs last disabled at (24259): [<c022a918>] irq_exit+0x108/0x16c
[    5.939985]
[    5.939985] other info that might help us debug this:
[    5.946576]  Possible unsafe locking scenario:
[    5.946576]
[    5.952556]        CPU0
[    5.955031]        ----
[    5.957506]   lock(&syncp->seq#2);
[    5.960955]   <Interrupt>
[    5.963604]     lock(&syncp->seq#2);
[    5.967227]
[    5.967227]  *** DEADLOCK ***
[    5.967227]
[    5.973222] 1 lock held by swapper/0/0:
[    5.977092]  #0:  (&(&ring->lock)->rlock){..-...}, at: [<c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54

So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.

Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Arnd Bergmann [Tue, 12 Sep 2017 20:10:53 +0000 (22:10 +0200)]

net: vrf: avoid gcc-4.6 warning

When building an allmodconfig kernel with gcc-4.6, we get a rather
odd warning:

drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]

I have no idea what this warning is even trying to say, but it does
seem like a false positive. Reordering the initialization in to match
the structure definition gets rid of the warning, and might also avoid
whatever gcc thinks is wrong here.

Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Himanshu Jha [Tue, 12 Sep 2017 11:19:22 +0000 (16:49 +0530)]

qed: remove unnecessary call to memset

call to memset to assign 0 value immediately after allocating
memory with kzalloc is unnecesaary as kzalloc allocates the memory
filled with 0 value.

Semantic patch used to resolve this issue:

@@
expression e,e2; constant c;
statement S;
@@

e = kzalloc(e2, c);
if(e == NULL) S
- memset(e, 0, e2);

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Sudarsana Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 15 Sep 2017 19:58:55 +0000 (12:58 -0700)]

Merge tag 'firmware_removal-4.14-rc1' of git://git./linux/kernel/git/gregkh/driver-core

Pull firmware removal from Greg KH:
"Many many years ago (at the kernel summit in Boston), we all came to
  the agreement that the firmware/ tree should be dropped from the
  kernel, and everyone use the linux-firmware package instead. For some
  minor reason, David Woodhouse didn't send the pull request at that
  point in time, and everyone forgot about this.

  The topic came up in the hallway track at the Plumbers conference this
  week, so here's a single patch that drops the whole firmware tree. The
  last firmware update was back in 2013, and all distros have been using
  linux-firmware instead since at least that year, if not before. The
  only commits to that directory since 2013 was some kbuild fixups for
  various build tool issues.

  So lets finally drop this, we don't need to lug them around in the
  kernel source tree anymore, especially as no one wants or uses them.

  This has passed build testing with 0-day, I don't think it made it
  into linux-next this week, but I figured it was good to get in before
  4.14-rc1 was out"

* tag 'firmware_removal-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware: delete in-kernel firmware

commit | commitdiff | tree

Linus Torvalds [Fri, 15 Sep 2017 19:47:21 +0000 (12:47 -0700)]

Merge tag 'nios2-v4.14-rc1' of git://git./linux/kernel/git/lftan/nios2

Pull arch/nios2 update from Ley Foon Tan.

* tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: time: Read timer in get_cycles only if initialized
nios2: add earlycon support to 3c120 devboard DTS

commit | commitdiff | tree

Linus Torvalds [Fri, 15 Sep 2017 19:44:59 +0000 (12:44 -0700)]

Merge tag 'powerpc-4.14-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
"Just one fix, for the handling of alignment interrupts on dcbz
  instructions.

  Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka"

* tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Fix handling of alignment interrupt on dcbz instruction

commit | commitdiff | tree

Linus Torvalds [Fri, 15 Sep 2017 19:16:18 +0000 (12:16 -0700)]

Merge tag 'for-linus-4.14-ofs2' of git://git./linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
"Some cleanups and a big bug fix for ACLs.

  When I was reviewing Jan Kara's ACL patch, I realized that Orangefs
  ACL code was busted, not just in the kernel module, but in the server
  as well. I've been working on the code in the server mostly, but
  here's one kernel patch, there will be more"

* tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: Adjust three checks for null pointers
  orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
  orangefs: Delete error messages for a failed memory allocation in five functions
  orangefs: constify xattr_handler structure
  orangefs: don't call filemap_write_and_wait from fsync
  orangefs: off by ones in xattr size checks
  orangefs: documentation clean up
  orangefs: react properly to posix_acl_update_mode's aftermath.
  orangefs: Don't clear SGID when inheriting ACLs

commit | commitdiff | tree

Dmitry Torokhov [Fri, 15 Sep 2017 16:52:21 +0000 (09:52 -0700)]

Merge branch 'next' into for-linus

Prepare second round of input updates for 4.14 merge window.

commit | commitdiff | tree

Kai-Heng Feng [Fri, 15 Sep 2017 16:36:16 +0000 (09:36 -0700)]

Input: i8042 - add Gigabyte P57 to the keyboard reset table

Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.

BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit | commitdiff | tree

Jim Mattson [Thu, 14 Sep 2017 23:31:44 +0000 (16:31 -0700)]

kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly

When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.

The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Jim Mattson [Thu, 14 Sep 2017 23:31:42 +0000 (16:31 -0700)]

kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly

On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.

On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Domain: System / Kernel; Licenses: GPL-2.0;

RSS Atom