review.tizen.org Git - platform/kernel/linux-arm64.git/log

projects / platform / kernel / linux-arm64.git / log

Al Viro [Wed, 5 Feb 2014 15:28:58 +0000 (10:28 -0500)]

untangling process_vm_..., part 3

lift iov one more level out - from process_vm_rw_single_vec to
process_vm_rw_core(). Same story as with the previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Wed, 5 Feb 2014 15:22:50 +0000 (10:22 -0500)]

untangling process_vm_..., part 2

move iov to caller's stack frame; the value we assign to it on the
next call of process_vm_rw_pages() is equal to the value it had
when the last time we were leaving process_vm_rw_pages().

drop lvec argument of process_vm_rw_pages() - it's not used anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Wed, 5 Feb 2014 15:14:01 +0000 (10:14 -0500)]

untangling process_vm_..., part 1

we want to massage it to use of iov_iter. This one is an equivalent
transformation - just introduce a local variable mirroring
lvec + *lvec_current.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Wed, 5 Feb 2014 00:08:21 +0000 (19:08 -0500)]

read_code(): go through vfs_read() instead of calling the method directly

... and don't skip on sanity checks. It's *not* a hot path, TYVM
(a couple of calls per a.out execve(), for pity sake) and headers of
random a.out binary are not to be trusted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Tue, 4 Feb 2014 19:19:48 +0000 (14:19 -0500)]

fold cifs_iovec_read() into its (only) caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Tue, 4 Feb 2014 19:07:43 +0000 (14:07 -0500)]

cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()

... we are doing them on adjacent parts of file, so what happens is that
each subsequent call works to rebuild the iov_iter to exact state it
had been abandoned in by previous one. Just keep it through the entire
cifs_iovec_read(). And use copy_page_to_iter() instead of doing
kmap/copy_to_user/kunmap manually...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 23:19:51 +0000 (18:19 -0500)]

switch vmsplice_to_user() to copy_page_to_iter()

I've switched the sanity checks on iovec to rw_copy_check_uvector();
we might need to do a local analog, if any behaviour differences are
not actually bugfixes here...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Tue, 4 Feb 2014 00:11:42 +0000 (19:11 -0500)]

switch pipe_read() to copy_page_to_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Tue, 4 Feb 2014 18:47:26 +0000 (13:47 -0500)]

cifs_iovec_read(): resubmit shouldn't restart the loop

... by that point the request we'd just resent is in the
head of the list anyway. Just return to the beginning of
the loop body...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 22:07:03 +0000 (17:07 -0500)]

introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()

generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that. Just set an iov_iter up and pass *that*
to do_generic_file_aio_read(). With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Kent Overstreet [Thu, 28 Nov 2013 00:29:46 +0000 (16:29 -0800)]

iov_iter: Move iov_iter to uio.h

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 03:18:22 +0000 (22:18 -0500)]

do_shmem_file_read(): call file_read_actor() directly

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 03:10:25 +0000 (22:10 -0500)]

callers of iov_copy_from_user_atomic() don't need pagecache_disable()

... it does that itself (via kmap_atomic())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 02:16:54 +0000 (21:16 -0500)]

switch ->is_partially_uptodate() to saner arguments

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 02:09:54 +0000 (21:09 -0500)]

pipe: kill ->map() and ->unmap()

all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Mon, 3 Feb 2014 02:03:40 +0000 (21:03 -0500)]

fuse/dev: use atomic maps

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

David Howells [Fri, 24 Jan 2014 12:17:54 +0000 (12:17 +0000)]

VFS: Make delayed_free() call free_vfsmnt()

Make delayed_free() call free_vfsmnt() so that we don't have two functions
doing the same job. This requires the calls to mnt_free_id() in free_vfsmnt()
to be moved into the callers of that function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Sun, 2 Feb 2014 11:31:19 +0000 (06:31 -0500)]

mn10300: kmap_atomic() returns void *, not unsigned long...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Sat, 1 Feb 2014 13:26:29 +0000 (08:26 -0500)]

cifs: ->rename() without ->lookup() makes no sense

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Sat, 1 Feb 2014 09:43:32 +0000 (04:43 -0500)]

get rid of pointless checks for NULL ->i_op

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Sat, 1 Feb 2014 09:41:36 +0000 (04:41 -0500)]

ntfs: don't put NULL into ->i_op/->i_fop

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 17:42:45 +0000 (13:42 -0400)]

new helper: readlink_copy()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 16:54:25 +0000 (12:54 -0400)]

lustre: generic_readlink() is just fine there, TYVM...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 16:45:29 +0000 (12:45 -0400)]

get rid of files_defer_init()

the only thing it's doing these days is calculation of
upper limit for fs.nr_open sysctl and that can be done
statically

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 16:20:17 +0000 (12:20 -0400)]

namei.c: move EXPORT_SYMBOL to corresponding definitions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 16:09:36 +0000 (12:09 -0400)]

get_write_access() is inlined, exporting it is pointless

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 13:43:29 +0000 (09:43 -0400)]

tidy do_dentry_open() up a bit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 16:02:47 +0000 (12:02 -0400)]

mark struct file that had write access grabbed by open()

new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() in case
when it has grabbed write access, checked by __fput() to decide whether
it wants to drop the sucker. Allows to stop bothering with mnt_clone_write()
in alloc_file(), along with fewer special_file() checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 14:40:46 +0000 (10:40 -0400)]

fold __get_file_write_access() into its only caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 14:06:32 +0000 (10:06 -0400)]

get rid of DEBUG_WRITECOUNT

it only makes control flow in __fput() and friends more convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 14 Mar 2014 14:56:20 +0000 (10:56 -0400)]

don't bother with {get,put}_write_access() on non-regular files

it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 6 Mar 2014 22:41:01 +0000 (17:41 -0500)]

ncpfs: switch to sockfd_lookup()/sockfd_put()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 6 Mar 2014 01:41:36 +0000 (20:41 -0500)]

switch nbd to sockfd_lookup/sockfd_put

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 6 Mar 2014 01:39:00 +0000 (20:39 -0500)]

vhost: don't open-code sockfd_put()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 6 Mar 2014 01:33:08 +0000 (20:33 -0500)]

usbip: don't open-code sockfd_lookup/sockfd_put

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 27 Feb 2014 19:40:10 +0000 (14:40 -0500)]

reduce m_start() cost...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 27 Feb 2014 14:35:45 +0000 (09:35 -0500)]

smarter propagate_mnt()

The current mainline has copies propagated to *all* nodes, then
tears down the copies we made for nodes that do not contain
counterparts of the desired mountpoint.  That sets the right
propagation graph for the copies (at teardown time we move
the slaves of removed node to a surviving peer or directly
to master), but we end up paying a fairly steep price in
useless allocations.  It's fairly easy to create a situation
where N calls of mount(2) create exactly N bindings, with
O(N^2) vfsmounts allocated and freed in process.

Fortunately, it is possible to avoid those allocations/freeings.
The trick is to create copies in the right order and find which
one would've eventually become a master with the current algorithm.
It turns out to be possible in O(nodes getting propagation) time
and with no extra allocations at all.

One part is that we need to make sure that eventual master will be
created before its slaves, so we need to walk the propagation
tree in a different order - by peer groups.  And iterate through
the peers before dealing with the next group.

Another thing is finding the (earlier) copy that will be a master
of one we are about to create; to do that we are (temporary) marking
the masters of mountpoints we are attaching the copies to.

Either we are in a peer of the last mountpoint we'd dealt with,
or we have the following situation: we are attaching to mountpoint M,
the last copy S_0 had been attached to M_0 and there are sequences
S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i},
S_{i} mounted on M{i} and we need to create a slave of the first S_{k}
such that M is getting propagation from M_{k}.  It means that the master
of M_{k} will be among the sequence of masters of M.  On the
other hand, the nearest marked node in that sequence will either
be the master of M_{k} or the master of M_{k-1} (the latter -
in the case if M_{k-1} is a slave of something M gets propagation
from, but in a wrong peer group).

So we go through the sequence of masters of M until we find
a marked one (P).  Let N be the one before it.  Then we go through
the sequence of masters of S_0 until we find one (say, S) mounted
on a node D that has P as master and check if D is a peer of N.
If it is, S will be the master of new copy, if not - the master of S
will be.

That's it for the hard part; the rest is fairly simple.  Iterator
is in next_group(), handling of one prospective mountpoint is
propagate_one().

It seems to survive all tests and gives a noticably better performance
than the current mainline for setups that are seriously using shared
subtrees.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 21 Mar 2014 01:10:51 +0000 (21:10 -0400)]

switch mnt_hash to hlist

fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating. Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 21 Mar 2014 14:14:08 +0000 (10:14 -0400)]

don't bother with propagate_mnt() unless the target is shared

If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 21 Mar 2014 00:34:43 +0000 (20:34 -0400)]

keep shadowed vfsmounts together

preparation to switching mnt_hash to hlist

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Fri, 28 Feb 2014 18:46:44 +0000 (13:46 -0500)]

resizable namespace.c hashes

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Linus Torvalds [Sat, 29 Mar 2014 22:01:09 +0000 (15:01 -0700)]

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"A late breaking fix from John.  (The bug fixed has a hard lockup
  potential, but that was not observed, warnings were)"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Revert to calling clock_was_set_delayed() while in irq context

commit | commitdiff | tree

Linus Torvalds [Sat, 29 Mar 2014 22:00:27 +0000 (15:00 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull Ceph fix from Sage Weil:
"This drops a bad assert that a few users have been hitting but we've
only recently been able to track down"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: drop an unsafe assertion

commit | commitdiff | tree

Alex Elder [Tue, 25 Mar 2014 13:36:02 +0000 (15:36 +0200)]

rbd: drop an unsafe assertion

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.

Tested-by: Olivier Bonvalet <ob@daevel.fr>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 22:09:37 +0000 (15:09 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) We've discovered a common error in several networking drivers, they
    put VLAN offload features into ->vlan_features, which would suggest
    that they support offloading 2 or more levels of VLAN encapsulation.
    Not only do these devices not do that, but we don't have the
    infrastructure yet to handle that at all.

    Fixes from Vlad Yasevich.

2) Fix tcpdump crash with bridging and vlans, also from Vlad.

3) Some MAINTAINERS updates for random32 and bonding.

4) Fix late reseeds of prandom generator, from Sasha Levin.

5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki
    Makita.

6) Fix deadlock in openvswitch, from Flavio Leitner.

7) get_timewait4_sock() doesn't report delay times correctly, fix from
    Eric Dumazet.

8) Duplicate address detection and addrconf verification need to run in
    contexts where RTNL can be obtained.  Move them to run from a
    workqueue.  From Hannes Frederic Sowa.

9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.

10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
  vlan: Warn the user if lowerdev has bad vlan features.
  veth: Turn off vlan rx acceleration in vlan_features
  ifb: Remove vlan acceleration from vlan_features
  qlge: Do not propaged vlan tag offloads to vlans
  bridge: Fix crash with vlan filtering and tcpdump
  net: Account for all vlan headers in skb_mac_gso_segment
  MAINTAINERS: bonding: change email address
  MAINTAINERS: bonding: change email address
  ipv6: move DAD and addrconf_verify processing to workqueue
  tcp: fix get_timewait4_sock() delay computation on 64bit
  openvswitch: fix a possible deadlock and lockdep warning
  bridge: Fix handling stacked vlan tags
  bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
  vhost: validate vhost_get_vq_desc return value
  vhost: fix total length when packets are too short
  random32: avoid attempt to late reseed if in the middle of seeding
  random32: assign to network folks in MAINTAINERS
  net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  vlan: Set hard_header_len according to available acceleration
  ...

commit | commitdiff | tree

David S. Miller [Fri, 28 Mar 2014 21:17:16 +0000 (17:17 -0400)]

Merge branch 'vlan_offloads'

Vlad Yasevich says:

====================
Audit all drivers for correct vlan_features.

Some drivers set vlan acceleration features in vlan_features. This causes
issues with Q-in-Q/802.1ad configurations.

Audit all the drivers for correct vlan_features. Fix broken ones.
Add a warning to vlan code to help catch future offenders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Fri, 28 Mar 2014 02:14:49 +0000 (22:14 -0400)]

vlan: Warn the user if lowerdev has bad vlan features.

Some drivers incorrectly assign vlan acceleration features to
vlan_features thus causing issues for Q-in-Q vlan configurations.
Warn the user of such cases.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Fri, 28 Mar 2014 02:14:48 +0000 (22:14 -0400)]

veth: Turn off vlan rx acceleration in vlan_features

For completeness, turn off vlan rx acceleration in vlan_features so
that it doesn't show up on q-in-q setups.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Fri, 28 Mar 2014 02:14:47 +0000 (22:14 -0400)]

ifb: Remove vlan acceleration from vlan_features

Do not include vlan acceleration features in vlan_features as that
precludes correct Q-in-Q operation.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Fri, 28 Mar 2014 02:14:46 +0000 (22:14 -0400)]

qlge: Do not propaged vlan tag offloads to vlans

qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to
turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices. With the
current settings, q-in-q will only generate a single vlan header.
Remember to mask off CTAG_TX and CTAG_RX features in vlan_features.

CC: Shahed Shaikh <shahed.shaikh@qlogic.com>
CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
CC: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Fri, 28 Mar 2014 01:51:18 +0000 (21:51 -0400)]

bridge: Fix crash with vlan filtering and tcpdump

When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference.  The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set.  It will then try
to get the table pointer and process the packet based
on the table.  Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().

Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Thu, 27 Mar 2014 21:26:18 +0000 (17:26 -0400)]

net: Account for all vlan headers in skb_mac_gso_segment

skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb.  However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers.  That may not
always be the case.  If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.

A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated.  This way skb_mac_gso_segment
will correctly skip all mac headers.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Veaceslav Falico [Thu, 27 Mar 2014 17:43:50 +0000 (18:43 +0100)]

MAINTAINERS: bonding: change email address

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jay Vosburgh [Thu, 27 Mar 2014 17:33:44 +0000 (10:33 -0700)]

MAINTAINERS: bonding: change email address

Update my email address.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 20:57:13 +0000 (13:57 -0700)]

Merge branch 'akpm' (patches from Andrew Morton)

Merge two fixes from Andrew Morton:
"The x86 fix should come from x86 guys but they appear to be
  conferencing or otherwise distracted.

  The ocfs2 fix is a bit of a mess - the code runs into an immediate
  NULL deref and we're trying to work out how this got through test and
  review, but we haven't heard from Goldwyn in the past few days.
  Sasha's patch fixes the oops, but the feature as a whole is probably
  broken.  So this is a stopgap for 3.14 - I'll aim to get the real
  fixes into 3.14.x"

* emailed patches from Andrew Morton akpm@linux-foundation.org>:
  x86: fix boot on uniprocessor systems
  ocfs2: check if cluster name exists before deref

commit | commitdiff | tree

Artem Fetishev [Fri, 28 Mar 2014 20:33:39 +0000 (13:33 -0700)]

x86: fix boot on uniprocessor systems

On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too. Enabling them also fixes
rapl_pmu code.

Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Sasha Levin [Fri, 28 Mar 2014 20:33:38 +0000 (13:33 -0700)]

ocfs2: check if cluster name exists before deref

Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: strlcpy (lib/string.c:388 lib/string.c:151)
  CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
  RIP:  strlcpy (lib/string.c:388 lib/string.c:151)
  Call Trace:
   ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
   ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
   user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
   dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
   vfs_mkdir (fs/namei.c:3467)
   SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
   tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature.  A temporary thing to
avoid triviel oopses.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Hannes Frederic Sowa [Thu, 27 Mar 2014 17:28:07 +0000 (18:28 +0100)]

ipv6: move DAD and addrconf_verify processing to workqueue

addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Thu, 27 Mar 2014 14:19:19 +0000 (07:19 -0700)]

tcp: fix get_timewait4_sock() delay computation on 64bit

It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.

This could result in wrong output in /proc/net/tcp for tm->when field.

Fixes: 96f817fedec4 ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Flavio Leitner [Thu, 27 Mar 2014 14:05:34 +0000 (11:05 -0300)]

openvswitch: fix a possible deadlock and lockdep warning

There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
+->spin_lock remote CPU#1        ovs_flow_stats_get()
|  <interrupted>                  stats_read()
|  ...                       +-->  spin_lock remote CPU#0
|                            |     <interrupted>
|  ovs_flow_stats_update()   |     ...
|   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
+---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>
=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

*** DEADLOCK ***

5 locks held by swapper/0/0:
#0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
#1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
#2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
#3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
#4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
<IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
[<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
[<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
[<ffffffff810f8963>] mark_lock+0x223/0x2b0
[<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
[<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
[<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
[<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
[<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
[<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
[<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
[<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
[<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
[<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
[<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
[<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
[<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
[<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
[<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
[<ffffffff8176145f>] ip6_output+0x4f/0x1f0
[<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
[<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
[<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
[<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
[<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
[<ffffffff810a71e9>] call_timer_fn+0x99/0x320
[<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
[<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
[<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
[<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Toshiaki Makita [Thu, 27 Mar 2014 12:46:56 +0000 (21:46 +0900)]

bridge: Fix handling stacked vlan tags

If a bridge with vlan_filtering enabled receives frames with stacked
vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not
only the outer tag but also the inner tag.

br_vlan_untag() is called only from br_handle_vlan(), and in this case,
it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already
been set before calling br_handle_vlan().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Toshiaki Makita [Thu, 27 Mar 2014 12:46:55 +0000 (21:46 +0900)]

bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled

Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci
if they are tagged, but if vlan tx offload is manually disabled on bridge
device and frames are sent from vlan device on the bridge device, the tags
are embedded in skb->data and they break this assumption.
Extract embedded vlan tags and move them to vlan_tci at ingress.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael S. Tsirkin [Thu, 27 Mar 2014 10:53:37 +0000 (12:53 +0200)]

vhost: validate vhost_get_vq_desc return value

vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael S. Tsirkin [Thu, 27 Mar 2014 10:00:26 +0000 (12:00 +0200)]

vhost: fix total length when packets are too short

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sasha Levin [Fri, 28 Mar 2014 16:38:42 +0000 (17:38 +0100)]

random32: avoid attempt to late reseed if in the middle of seeding

Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.

This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.

Instead, just avoid even waiting to do a reseed if a reseed is already
occuring.

Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 20:03:00 +0000 (13:03 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov:
"Updates to Synaptics touchpad to better cope with devices in Lenovo
  laptops, and a couple more fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - add manual min/max quirk for ThinkPad X240
  Input: synaptics - add manual min/max quirk
  Input: cypress_ps2 - don't report as a button pads
  Input: da9052_onkey - use correct register bit for key status
  Input: adp5588-keys - get value from data out when dir is out

commit | commitdiff | tree

Sasha Levin [Thu, 27 Mar 2014 06:01:34 +0000 (02:01 -0400)]

random32: assign to network folks in MAINTAINERS

lib/random32.c was split out of the network code and is de-facto
still maintained by the almighty net/ gods.

Make it a bit more official so that people who aren't aware of
that know where to send their patches.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 17:58:10 +0000 (10:58 -0700)]

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"I didn't want these to wait for stable cycle.

  The nouveau and radeon ones are the same problem, where the runtime pm
  stuff broke non-runtime pm managed secondary GPUs.

  The udl fix is for an oops on unplug, and the i915 fix is for a
  regression on Sandybridge even though it may break haswell (regression
  wins)"

Daniel Vetter comments:
"My apologies for the i915 regression fumble, that thing somehow fell
  through the cracks here for almost half a year :( Imo that's more than
  enough flailing to just go ahead with the revert, and the re-broken
  hsw should get peoples attention ..."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Undo gtt scratch pte unmapping again
  drm/radeon: fix runtime suspend breaking secondary GPUs
  drm/nouveau: fail runtime pm properly.
  drm/udl: take reference to device struct for dma-bufs

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 17:55:44 +0000 (10:55 -0700)]

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c build fix from Wolfram Sang:
"The build fix from my last request unveiled another build problem
which is fixed with this patch"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cpm: Fix build by adding of_address.h and of_irq.h

commit | commitdiff | tree

Linus Torvalds [Fri, 28 Mar 2014 17:52:05 +0000 (10:52 -0700)]

Merge tag 'stable/for-linus-3.14-rc8-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen bugfixes from David Vrabel:
"Fix two bugs that cause x86 PV guest crashes.

  1. Ballooning a 32-bit guest would eventually crash it.

  2. Revert a broken fix for a regression with NUMA_BALACING.  The bad
     fix caused PV guests to crash after migration.  This is not ideal
     but unpicking the madness that is _PAGE_NUMA == _PAGE_PROTNONE will
     take a while longer"

* tag 'stable/for-linus-3.14-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
  xen/balloon: flush persistent kmaps in correct position

commit | commitdiff | tree

Hans de Goede [Fri, 28 Mar 2014 08:01:38 +0000 (01:01 -0700)]

Input: synaptics - add manual min/max quirk for ThinkPad X240

This extends Benjamin Tissoires manual min/max quirk table with support for
the ThinkPad X240.

Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit | commitdiff | tree

Benjamin Tissoires [Fri, 28 Mar 2014 07:43:00 +0000 (00:43 -0700)]

Input: synaptics - add manual min/max quirk

The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface and it occurs the reported ranges are wrong.

Of course, it would be too easy to have only one range for the whole
series, each touchpad seems to be calibrated in a different way.

We can not use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).

So as a temporary solution (until RMI4 land into upstream), start a new
list of quirks with the min/max manually set.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit | commitdiff | tree

John Stultz [Thu, 27 Mar 2014 23:30:49 +0000 (16:30 -0700)]

time: Revert to calling clock_was_set_delayed() while in irq context

In commit 47a1b796306356f35 ("tick/timekeeping: Call
update_wall_time outside the jiffies lock"), we moved to calling
clock_was_set() due to the fact that we were no longer holding
the timekeeping or jiffies lock.

However, there is still the problem that clock_was_set()
triggers an IPI, which cannot be done from the timer's hard irq
context, and will generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the
dmesg log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Vetter [Wed, 26 Mar 2014 19:10:09 +0000 (20:10 +0100)]

drm/i915: Undo gtt scratch pte unmapping again

It apparently blows up on some machines. This functionally reverts

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Oct 16 09:21:30 2013 -0700

drm/i915: Disable GGTT PTEs on GEN6+ suspend

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
Reported-and-Tested-by: Brad Jackson <bjackson0971@gmail.com>
Cc: stable@vger.kernel.org
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Todd Previte <tprevite@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Dave Airlie [Thu, 27 Mar 2014 02:31:08 +0000 (02:31 +0000)]

drm/radeon: fix runtime suspend breaking secondary GPUs

Same fix as for nouveau, when we fail with EINVAL, subsequent
gets fail hard, causing the device not to open.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Wei Yang [Thu, 27 Mar 2014 01:28:31 +0000 (09:28 +0800)]

net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset

The second parameter of __mlx4_init_one() is used to identify whether the
pci_dev is a PF or VF. Currently, when it is invoked in mlx4_pci_slot_reset()
this information is missed.

This patch match the pci_dev with mlx4_pci_table and passes the
pci_device_id.driver_data to __mlx4_init_one() in mlx4_pci_slot_reset().

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Zoltan Kiss [Wed, 26 Mar 2014 22:37:45 +0000 (22:37 +0000)]

core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Wed, 26 Mar 2014 15:47:56 +0000 (11:47 -0400)]

vlan: Set hard_header_len according to available acceleration

Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan. This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Oliver Neukum [Wed, 26 Mar 2014 13:32:51 +0000 (14:32 +0100)]

usbnet: include wait queue head in device structure

This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Reported-by: Grant Grundler <grundler@google.com>
Tested-by: Grant Grundler <grundler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Wed, 26 Mar 2014 05:03:00 +0000 (13:03 +0800)]

virtio-net: correct error handling of virtqueue_kick()

Current error handling of virtqueue_kick() was wrong in two places:
- The skb were freed immediately when virtqueue_kick() fail during
  xmit. This may lead double free since the skb was not detached from
  the virtqueue.
- try_fill_recv() returns false when virtqueue_kick() fail. This will
  lead unnecessary rescheduling of refill work.

Actually, it's safe to just ignore the kick failure in those two
places. So this patch fixes this by partially revert commit
67975901183799af8e93ec60e322f9e2a1940b9b.

Fixes 67975901183799af8e93ec60e322f9e2a1940b9b
(virtio_net: verify if virtqueue_kick() succeeded).

Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jan Kara [Wed, 26 Mar 2014 05:20:14 +0000 (06:20 +0100)]

vfs: Allocate anon_inode_inode in anon_inode_init()

Currently we allocated anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().

Signed-off-by: Jan Kara <jack@suse.cz>
[ Further simplified on suggestion from Dave Jones ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Dave Airlie [Wed, 26 Mar 2014 04:09:37 +0000 (14:09 +1000)]

drm/nouveau: fail runtime pm properly.

If we were on a non-optimus device, we'd return -EINVAL, this would
lead to the over engineered runtime pm system to go into an error
state, subsequent get_sync's would fail, so we'd never be able
to open the device again.

(like really get_sync shouldn't fail if the device isn't powered
down).

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Dave Airlie [Tue, 25 Mar 2014 04:38:44 +0000 (14:38 +1000)]

drm/udl: take reference to device struct for dma-bufs

this stops the device from being deleted before all the dma-bufs
on it are freed, this fixes an oops when you unplug a udl device while
it has imported a buffer from another device.

Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Eric Dumazet [Wed, 26 Mar 2014 01:42:27 +0000 (18:42 -0700)]

net: unix: non blocking recvmsg() should not return -EINTR

Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").

To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.

Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 26 Mar 2014 20:53:04 +0000 (16:53 -0400)]

Merge branch 'mvneta'

Thomas Petazzoni says:

====================
net: mvneta: fix usage as a module

The following set of two patches fix the usage of the mvneta driver
when built as a module, and used in RGMII configurations. It is
somewhat similar to a previous fix that was made by Arnaud Patard, but
which was limited to SGMII configurations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Petazzoni [Tue, 25 Mar 2014 23:26:55 +0000 (00:26 +0100)]

net: mvneta: use devm_ioremap_resource() instead of of_iomap()

The mvneta driver currently uses of_iomap(), which has two drawbacks:
it doesn't request the resource, and it isn't devm-style so some error
handling is needed.

This commit switches to use devm_ioremap_resource() instead, which
automatically requests the resource (so the I/O registers region shows
up properly in /proc/iomem), and also is devm-style, which allows to
get rid of some error handling to unmap the I/O registers region.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Petazzoni [Tue, 25 Mar 2014 23:25:42 +0000 (00:25 +0100)]

net: mvneta: fix usage as a module on RGMII configurations

Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as
module') fixed the mvneta driver to make it work properly when loaded
as a module in SGMII configuration, which was tested successful by the
author on the Armada XP OpenBlocks AX3, which uses SGMII.

However, it turns out that the Armada XP GP, which uses RGMII, is
affected by a similar problem: its SERDES configuration is lost when
mvneta is loaded as a module, because this configuration is set by the
bootloader, and then lost because the clock is gated by the clock
framework until the mvneta driver is loaded again and the clock is
re-enabled.

However, it turns out that for the RGMII case, setting the SERDES
configuration is not sufficient: the PCS enable bit in the
MVNETA_GMAC_CTRL_2 register must also be set, like in the SGMII
configuration.

Therefore, this commit reworks the SGMII/RGMII initialization: the
only difference between the two now is a different SERDES
configuration, all the rest is identical.

In detail, to achieve this, the commit:

* Renames MVNETA_SGMII_SERDES_CFG to MVNETA_SERDES_CFG because it is
   not specific to SGMII, but also used on RGMII configurations.

* Adds a MVNETA_RGMII_SERDES_PROTO definition, that must be used as
   the MVNETA_SERDES_CFG value in RGMII configurations.

* Removes the mvneta_gmac_rgmii_set() and mvneta_port_sgmii_config()
   functions, and instead directly do the SGMII/RGMII configuration in
   mvneta_port_up(), from where those functions where called. It is
   worth mentioning that mvneta_gmac_rgmii_set() had an 'enable'
   parameter that was always passed as '1', so it was pretty useless.

* Reworks the mvneta_port_up() function to set the MVNETA_SERDES_CFG
   register to the appropriate value depending on the RGMII vs. SGMII
   configuration. It also unconditionally set the PCS_ENABLE bit (was
   already done for SGMII, but is now also needed for RGMII), and sets
   the PORT_RGMII bit (which was already done for both SGMII and
   RGMII).

This commit was successfully tested with mvneta compiled as a module,
on both the OpenBlocks AX3 (SGMII configuration) and the Armada XP GP
(RGMII configuration).

Reported-by: Steve McIntyre <steve@einval.com>
Cc: stable@vger.kernel.org # 3.11.x: 5445eaf309ff mvneta: Try to fix mvneta when compiled as module
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Petazzoni [Tue, 25 Mar 2014 23:25:41 +0000 (00:25 +0100)]

net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE

Bit 3 of the MVNETA_GMAC_CTRL_2 is actually used to enable the PCS,
not the PSC: there was a typo in the name of the define, which this
commit fixes.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hans de Goede [Wed, 26 Mar 2014 20:30:52 +0000 (13:30 -0700)]

Input: cypress_ps2 - don't report as a button pads

The cypress PS/2 trackpad models supported by the cypress_ps2 driver
emulate BTN_RIGHT events in firmware based on the finger position, as part
of this no motion events are sent when the finger is in the button area.

The INPUT_PROP_BUTTONPAD property is there to indicate to userspace that
BTN_RIGHT events should be emulated in userspace, which is not necessary
in this case.

When INPUT_PROP_BUTTONPAD is advertised userspace will wait for a motion
event before propagating the button event higher up the stack, as it needs
current abs x + y data for its BTN_RIGHT emulation. Since in the
cypress_ps2 pads don't report motion events in the button area, this means
that clicks in the button area end up being ignored, so
INPUT_PROP_BUTTONPAD actually causes problems for these touchpads, and
removing it fixes:

https://bugs.freedesktop.org/show_bug.cgi?id=76341

Reported-by: Adam Williamson <awilliam@redhat.com>
Tested-by: Adam Williamson <awilliam@redhat.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit | commitdiff | tree

Vlad Yasevich [Mon, 24 Mar 2014 21:52:12 +0000 (17:52 -0400)]

tg3: Do not include vlan acceleration features in vlan_features

Including hardware acceleration features in vlan_features breaks
stacked vlans (Q-in-Q) by marking the bottom vlan interface as
capable of acceleration. This causes one of the tags to be lost
and the packets are sent with a sing vlan header.

CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Pravin B Shelar [Mon, 24 Mar 2014 05:06:36 +0000 (22:06 -0700)]

ip_tunnel: Fix dst ref-count.

Commit 10ddceb22ba (ip_tunnel:multicast process cause panic due
to skb->_skb_refdst NULL pointer) removed dst-drop call from
ip-tunnel-recv.

Following commit reintroduce dst-drop and fix the original bug by
checking loopback packet before releasing dst.
Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Wed, 26 Mar 2014 16:09:18 +0000 (09:09 -0700)]

Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"While on my flight to Linux Collaboration Summit, I was working on my
  slides for the event trigger tutorial.  I booted a 3.14-rc7 kernel to
  perform what I wanted to teach and cut and paste it into my slides.
  When I tried the traceon event trigger with a condition attached to it
  (turns tracing on only if a field of the trigger event matches a
  condition set by the user), nothing happened.  Tracing would not turn
  on.  I stopped working on my presentation in order to find what was
  wrong.

  It ended up being the way trace event triggers work when they have
  conditions.  Instead of copying the fields, the condition code just
  looks at the fields that were copied into the ring buffer.  This works
  great, unless tracing is off.  That's because when the event is
  reserved on the ring buffer, the ring buffer returns a NULL pointer,
  this tells the tracing code that the ring buffer is disabled.  This
  ends up being a problem for the traceon trigger if it is using this
  information to check its condition.

  Luckily the code that checks if tracing is on returns the ring buffer
  to use (because the ring buffer is determined by the event file also
  passed to that field).  I was able to easily solve this bug by
  checking in that helper function if the returned ring buffer entry is
  NULL, and if so, also check the file flag if it has a trace event
  trigger condition, and if so, to pass back a temp ring buffer to use.
  This will allow the trace event trigger condition to still test the
  event fields, but nothing will be recorded"

* tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix traceon trigger condition to actually turn tracing on

commit | commitdiff | tree

Steven Rostedt (Red Hat) [Wed, 26 Mar 2014 03:39:41 +0000 (23:39 -0400)]

tracing: Fix traceon trigger condition to actually turn tracing on

While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.

The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

commit | commitdiff | tree

Linus Torvalds [Wed, 26 Mar 2014 00:43:34 +0000 (17:43 -0700)]

fs: remove now stale label in anon_inode_init()

The previous commit removed the register_filesystem() call and the
associated error handling, but left the label for the error path that no
longer exists. Remove that too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Jan Kara [Tue, 25 Mar 2014 20:37:09 +0000 (21:37 +0100)]

fs: Avoid userspace mounting anon_inodefs filesystem

anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).

This fixes an oops triggered by trinity when it tried mounting
anon_inodefs which overwrote anon_inode_inode pointer while other CPU
has been in anon_inode_getfile() between ihold() and d_instantiate().
Thus effectively creating dentry pointing to an inode without holding a
reference to it.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Tue, 25 Mar 2014 22:24:11 +0000 (15:24 -0700)]

Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix frm Bruce Fields:
"J R Okajima sent this early and I was just slow to pass it along,
apologies. Fortunately it's a simple fix"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
nfsd: fix lost nfserrno() call in nfsd_setattr()

commit | commitdiff | tree

Linus Torvalds [Tue, 25 Mar 2014 22:05:57 +0000 (15:05 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"These four commits are obvious fixes (a couple of fdget_pos()-related
  ones from Eric Biggers, prepend_name() fix, missing checks for false
  negatives from __lookup_mnt() in fs/namei.c)"

For now I'm pulling just the four obvious fixes, there's another four
pending in Al's 'for-linus' branch wrt the mnt_hash list that were more
involved.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  rcuwalk: recheck mount_lock after mountpoint crossing attempts
  make prepend_name() work correctly when called with negative *buflen
  vfs: Don't let __fdget_pos() get FMODE_PATH files
  vfs: atomic f_pos access in llseek()

commit | commitdiff | tree

David Vrabel [Tue, 25 Mar 2014 10:38:37 +0000 (10:38 +0000)]

Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.

This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain. Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.

Symptoms include "Bad page" reports when munmapping after migrating a
domain.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> [3.12+]

commit | commitdiff | tree

Wei Liu [Sat, 15 Mar 2014 16:11:47 +0000 (16:11 +0000)]

xen/balloon: flush persistent kmaps in correct position

Xen balloon driver will update ballooned out pages' P2M entries to point
to scratch page for PV guests. In 24f69373e2 ("xen/balloon: don't alloc
page while non-preemptible", kmap_flush_unused was moved after updating
P2M table. In that case for 32 bit PV guest we might end up with

  P2M    X -----> S  (S is mfn of balloon scratch page)
  M2P    Y -----> X  (Y is mfn in persistent kmap entry)

kmap_flush_unused() iterates through all the PTEs in the kmap address
space, using pte_to_page() to obtain the page. If the p2m and the m2p
are inconsistent the incorrect page is returned.  This will clear
page->address on the wrong page which may cause subsequent oopses if
that page is currently kmap'ed.

Move the flush back between get_page and __set_phys_to_machine to fix
this.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: stable@vger.kernel.org # 3.12+

commit | commitdiff | tree

Linus Torvalds [Tue, 25 Mar 2014 02:31:17 +0000 (19:31 -0700)]

Linux 3.14-rc8

Domain: System / Kernel;

RSS Atom