Chuck Lever [Thu, 17 Oct 2013 18:12:28 +0000 (14:12 -0400)]
NFS: Add nfs4_update_server
New function nfs4_update_server() moves an nfs_server to a different
nfs_client. This is done as part of migration recovery.
Though it may be appealing to think of them as the same thing,
migration recovery is not the same as following a referral.
For a referral, the client has not descended into the file system
yet: it has no nfs_server, no super block, no inodes or open state.
It is enough to simply instantiate the nfs_server and super block,
and perform a referral mount.
For a migration, however, we have all of those things already, and
they have to be moved to a different nfs_client. No local namespace
changes are needed here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 17 Oct 2013 18:12:23 +0000 (14:12 -0400)]
SUNRPC: Add a helper to switch the transport of an rpc_clnt
Add an RPC client API to redirect an rpc_clnt's transport from a
source server to a destination server during a migration event.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: forward ported to 3.12 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 17 Oct 2013 18:12:17 +0000 (14:12 -0400)]
SUNRPC: Modify synopsis of rpc_client_register()
The rpc_client_register() helper was added in commit
e73f4cc0,
"SUNRPC: split client creation routine into setup and registration,"
Mon Jun 24 11:52:52 2013. In a subsequent patch, I'd like to invoke
rpc_client_register() from a context where a struct rpc_create_args
is not available.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Mon, 21 Oct 2013 17:10:13 +0000 (13:10 -0400)]
NFSv4: don't reprocess cached open CLAIM_PREVIOUS
Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
and can safely skip being reprocessed, but must still call update_open_stateid
to make sure that all active fmodes are recovered.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 28 Oct 2013 18:57:12 +0000 (14:57 -0400)]
NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state
Currently, if the call to nfs_refresh_inode fails, then we end up leaking
a reference count, due to the call to nfs4_get_open_state.
While we're at it, replace nfs4_get_open_state with a simple call to
atomic_inc(); there is no need to do a full lookup of the struct nfs_state
since it is passed as an argument in the struct nfs4_opendata, and
is already assigned to the variable 'state'.
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Mon, 21 Oct 2013 17:10:11 +0000 (13:10 -0400)]
NFSv4: don't fail on missing fattr in open recover
This is an unneeded check that could cause the client to fail to recover
opens.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Mon, 21 Oct 2013 17:10:10 +0000 (13:10 -0400)]
NFSv4: fix NULL dereference in open recover
_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
open CLAIM_PREVIOUS, but this can happen. An example is when there are
RDWR openers and RDONLY openers on a delegation stateid. The recovery
path will first try an open CLAIM_PREVIOUS for the RDWR openers, this
marks the delegation as not needing RECLAIM anymore, so the open
CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.
The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
returning PTR_ERR(rpc_status) when !rpc_done. When the open is
cached, rpc_done == 0 and rpc_status == 0, thus
_nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
by callers of nfs4_opendata_to_nfs4_state().
This can be reproduced easily by opening the same file two times on an
NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
sleeping for a long time. While the files are held open, kick off state
recovery and this NULL dereference will be hit every time.
An example OOPS:
[ 65.003602] BUG: unable to handle kernel NULL pointer dereference at
00000000
00000030
[ 65.005312] IP: [<
ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.006820] PGD
7b0ea067 PUD
791ff067 PMD 0
[ 65.008075] Oops: 0000 [#1] SMP
[ 65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache
snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd
_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pc
lmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio
_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw
_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi dr
m mptscsih mptbase i2c_core
[ 65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19
.x86_64 #1
[ 65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 07/31/2013
[ 65.022012] task:
ffff88003707e320 ti:
ffff88007b906000 task.ti:
ffff88007b906000
[ 65.023414] RIP: 0010:[<
ffffffffa037d6ee>] [<
ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.025079] RSP: 0018:
ffff88007b907d10 EFLAGS:
00010246
[ 65.026042] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 65.027321] RDX:
0000000000000050 RSI:
0000000000000001 RDI:
0000000000000000
[ 65.028691] RBP:
ffff88007b907d38 R08:
0000000000016f60 R09:
0000000000000000
[ 65.029990] R10:
0000000000000000 R11:
0000000000000000 R12:
0000000000000001
[ 65.031295] R13:
0000000000000050 R14:
0000000000000000 R15:
0000000000000001
[ 65.032527] FS:
0000000000000000(0000) GS:
ffff88007f600000(0000) knlGS:
0000000000000000
[ 65.033981] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 65.035177] CR2:
0000000000000030 CR3:
000000007b27f000 CR4:
00000000000407f0
[ 65.036568] Stack:
[ 65.037011]
0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
[ 65.038472]
ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
[ 65.039935]
ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
[ 65.041468] Call Trace:
[ 65.042050] [<
ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4]
[ 65.043209] [<
ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
[ 65.044529] [<
ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4]
[ 65.045730] [<
ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4]
[ 65.046905] [<
ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
[ 65.048071] [<
ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
[ 65.049436] [<
ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.050686] [<
ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.051943] [<
ffffffff81088640>] kthread+0xc0/0xd0
[ 65.052831] [<
ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[ 65.054697] [<
ffffffff8165686c>] ret_from_fork+0x7c/0xb0
[ 65.056396] [<
ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[ 65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
[ 65.065225] RIP [<
ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.067175] RSP <
ffff88007b907d10>
[ 65.068570] CR2:
0000000000000030
[ 65.070098] ---[ end trace
0d1fe4f5c7dd6f8b ]---
Cc: <stable@vger.kernel.org> #3.7+
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 28 Oct 2013 18:50:38 +0000 (14:50 -0400)]
NFSv4.1: Don't change the security label as part of open reclaim.
The current caching model calls for the security label to be set on
first lookup and/or on any subsequent label changes. There is no
need to do it as part of an open reclaim.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Mon, 21 Oct 2013 13:52:19 +0000 (09:52 -0400)]
nfs: fix handling of invalid mount options in nfs_remount
nfs_parse_mount_options returns 0 on error, not -errno.
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Fri, 18 Oct 2013 14:03:37 +0000 (10:03 -0400)]
nfs: reject version and minorversion changes on remount attempts
Reported-by: Eric Doutreleau <edoutreleau@genoscope.cns.fr>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Thu, 3 Oct 2013 16:23:24 +0000 (12:23 -0400)]
NFSv4 Remove zeroing state kern warnings
As of commit
5d422301f97b821301efcdb6fc9d1a83a5c102d6 we no longer zero the
state.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 26 Sep 2013 19:22:45 +0000 (15:22 -0400)]
SUNRPC: call_connect_status should recheck bind and connect status on error
Currently, we go directly to call_transmit which sends us to call_status
on error. If we know that the connect attempt failed, we should rather
just jump straight back to call_bind and call_connect.
Ditto for EAGAIN, except do not delay.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Sep 2013 15:28:40 +0000 (11:28 -0400)]
SUNRPC: Remove redundant initialisations of request rq_bytes_sent
Now that we clear the rq_bytes_sent field on unlock, we don't need
to set it on lock, so we just set it once when initialising the request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 25 Sep 2013 17:39:45 +0000 (13:39 -0400)]
SUNRPC: Fix RPC call retransmission statistics
A retransmit should be when you successfully transmit an RPC call to
the server a second time.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 24 Sep 2013 16:06:07 +0000 (12:06 -0400)]
NFSv4: Ensure that we disable the resend timeout for NFSv4
The spec states that the client should not resend requests because
the server will disconnect if it needs to drop an RPC request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 24 Sep 2013 16:00:27 +0000 (12:00 -0400)]
SUNRPC: Add RPC task and client level options to disable the resend timeout
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 25 Sep 2013 16:17:18 +0000 (12:17 -0400)]
SUNRPC: Clean up - convert xprt_prepare_transmit to return a bool
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Sep 2013 15:09:53 +0000 (11:09 -0400)]
SUNRPC: Clear the request rq_bytes_sent field in xprt_release_write
Otherwise the tests of req->rq_bytes_sent in xprt_prepare_transmit
will fail if we're dealing with a resend.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 25 Sep 2013 15:31:54 +0000 (11:31 -0400)]
SUNRPC: Don't set the request connect_cookie until a successful transmit
We're using the request connect_cookie to track whether or not a
request was successfully transmitted on the current transport
connection or not. For that reason we should ensure that it is
only set after we've successfully transmitted the request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 26 Sep 2013 14:18:04 +0000 (10:18 -0400)]
SUNRPC: Only update the TCP connect cookie on a successful connect
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 24 Sep 2013 15:25:22 +0000 (11:25 -0400)]
SUNRPC: Enable the keepalive option for TCP sockets
For NFSv4 we want to avoid retransmitting RPC calls unless the TCP
connection breaks. However we still want to detect TCP connection
breakage as soon as possible. Do this by setting the keepalive option
with the idle timeout and count set to the 'timeo' and 'retrans' mount
options.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 1 Oct 2013 18:24:58 +0000 (14:24 -0400)]
NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()
In nfs4_proc_getlk(), when some error causes a retry of the call to
_nfs4_proc_getlk(), we can end up with Oopses of the form
BUG: unable to handle kernel NULL pointer dereference at
0000000000000134
IP: [<
ffffffff8165270e>] _raw_spin_lock+0xe/0x30
<snip>
Call Trace:
[<
ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70
[<
ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4]
[<
ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
[<
ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4]
[<
ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4]
The problem is that we don't clear the request->fl_ops after the first
try and so when we retry, nfs4_set_lock_state() exits early without
setting the lock stateid.
Regression introduced by commit
70cc6487a4e08b8698c0e2ec935fb48d10490162
(locks: make ->lock release private data before returning in GETLK case)
Reported-by: Weston Andros Adamson <dros@netapp.com>
Reported-by: Jorge Mora <mora@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> #2.6.22+
Linus Torvalds [Tue, 1 Oct 2013 00:10:26 +0000 (17:10 -0700)]
Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver
* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Linus Torvalds [Mon, 30 Sep 2013 21:32:32 +0000 (14:32 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...
Oleg Nesterov [Mon, 30 Sep 2013 20:45:27 +0000 (13:45 -0700)]
pidns: fix free_pid() to handle the first fork failure
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.
However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Mon, 30 Sep 2013 20:45:26 +0000 (13:45 -0700)]
ipc,msg: prevent race with rmid in msgsnd,msgrcv
This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.
Manfred illustrates this nicely:
Assume a preemptible kernel that is preempted just after
msq = msq_obtain_object_check(ns, msqid)
in do_msgrcv(). The only lock that is held is rcu_read_lock().
Now the other thread processes IPC_RMID. When the first task is
resumed, then it will happily wait for messages on a deleted queue.
Fix this by checking for if the queue has been deleted after taking the
lock.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: <stable@vger.kernel.org> [3.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:25 +0000 (13:45 -0700)]
ipc/sem.c: update sem_otime for all operations
In commit
0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).
But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.
The fix is simple:
Non-alter operations must update sem_otime.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Jia He <jiakernel@gmail.com>
Tested-by: Jia He <jiakernel@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:24 +0000 (13:45 -0700)]
mm/hwpoison: fix the lack of one reference count against poisoned page
The lack of one reference count against poisoned page for hwpoison_inject
w/o hwpoison_filter enabled result in hwpoison detect -1 users still
referenced the page, however, the number should be 0 except the poison
handler held one after successfully unmap. This patch fix it by hold one
referenced count against poisoned page for hwpoison_inject w/ and w/o
hwpoison_filter enabled.
Before patch:
[ 71.902112] Injecting memory failure at pfn 224706
[ 71.902137] MCE 0x224706: dirty LRU page recovery: Failed
[ 71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users
After patch:
[ 94.710860] Injecting memory failure at pfn 215b68
[ 94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:23 +0000 (13:45 -0700)]
mm/hwpoison: fix false report on 2nd attempt at page recovery
If the page is poisoned by software injection w/ MF_COUNT_INCREASED
flag, there is a false report during the 2nd attempt at page recovery
which is not truthful.
This patch fixes it by reporting the first attempt to try free buddy
page recovery if MF_COUNT_INCREASED is set.
Before patch:
[ 346.332041] Injecting memory failure at pfn 200010
[ 346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed
After patch:
[ 297.742600] Injecting memory failure at pfn 200010
[ 297.742941] MCE 0x200010: free buddy page recovery: Delayed
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:22 +0000 (13:45 -0700)]
mm/hwpoison: fix test for a transparent huge page
PageTransHuge() can't guarantee the page is a transparent huge page
since it returns true for both transparent huge and hugetlbfs pages.
This patch fixes it by checking the page is also !hugetlbfs page.
Before patch:
[ 121.571128] Injecting memory failure at pfn 23a200
[ 121.571141] MCE 0x23a200: huge page recovery: Delayed
[ 140.355100] MCE: Memory failure is now running on 0x23a200
After patch:
[ 94.290793] Injecting memory failure at pfn 23a000
[ 94.290800] MCE 0x23a000: huge page recovery: Delayed
[ 105.722303] MCE: Software-unpoisoned page 0x23a000
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:21 +0000 (13:45 -0700)]
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
madvise_hwpoison won't check if the page is small page or huge page and
traverses in small page granularity against the range unconditionally,
which result in a printk flood "MCE xxx: already hardware poisoned" if
the page is a huge page.
This patch fixes it by using compound_order(compound_head(page)) for
huge page iterator.
Testcase:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#define PAGES_TO_TEST 3
#define PAGE_SIZE 4096 * 512
int main(void)
{
char *mem;
int i;
mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);
if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;
munmap(mem, PAGES_TO_TEST * PAGE_SIZE);
return 0;
}
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Gortmaker [Mon, 30 Sep 2013 20:45:19 +0000 (13:45 -0700)]
block: change config option name for cmdline partition parsing
Recently commit
bab55417b10c ("block: support embedded device command
line partition") introduced CONFIG_CMDLINE_PARSER. However, that name
is too generic and sounds like it enables/disables generic kernel boot
arg processing, when it really is block specific.
Before this option becomes a part of a full/final release, add the BLK_
prefix to it so that it is clear in absence of any other context that it
is block specific.
In addition, fix up the following less critical items:
- help text was not really at all helpful.
- index file for Documentation was not updated
- add the new arg to Documentation/kernel-parameters.txt
- clarify wording in source comments
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Mon, 30 Sep 2013 20:45:18 +0000 (13:45 -0700)]
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
The function __munlock_pagevec_fill() introduced in commit
7a8010cd3627
("mm: munlock: manual pte walk in fast path instead of
follow_page_mask()") uses pmd_addr_end() for restricting its operation
within current page table.
This is insufficient on architectures/configurations where pmd is folded
and pmd_addr_end() just returns the end of the full range to be walked.
In this case, it allows pte++ to walk off the end of a page table
resulting in unpredictable behaviour.
This patch fixes the function by using pgd_addr_end() and pud_addr_end()
before pmd_addr_end(), which will yield correct page table boundary on
all configurations. This is similar to what existing page walkers do
when walking each level of the page table.
Additionaly, the patch clarifies a comment for get_locked_pte() call in the
function.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Jörn Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael Aquini [Mon, 30 Sep 2013 20:45:16 +0000 (13:45 -0700)]
mm: avoid reinserting isolated balloon pages into LRU lists
Isolated balloon pages can wrongly end up in LRU lists when
migrate_pages() finishes its round without draining all the isolated
page list.
The same issue can happen when reclaim_clean_pages_from_list() tries to
reclaim pages from an isolated page list, before migration, in the CMA
path. Such balloon page leak opens a race window against LRU lists
shrinkers that leads us to the following kernel panic:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000028
IP: [<
ffffffff810c2625>] shrink_page_list+0x24e/0x897
PGD
3cda2067 PUD
3d713067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: shrink_page_list+0x24e/0x897
RSP: 0000:
ffff88003da499b8 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
ffff88003e82bd60 RCX:
00000000000657d5
RDX:
0000000000000000 RSI:
000000000000031f RDI:
ffff88003e82bd40
RBP:
ffff88003da49ab0 R08:
0000000000000001 R09:
0000000081121a45
R10:
ffffffff81121a45 R11:
ffff88003c4a9a28 R12:
ffff88003e82bd40
R13:
ffff88003da0e800 R14:
0000000000000001 R15:
ffff88003da49d58
FS:
0000000000000000(0000) GS:
ffff88003fc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000067d9000 CR3:
000000003ace5000 CR4:
00000000000407b0
Call Trace:
shrink_inactive_list+0x240/0x3de
shrink_lruvec+0x3e0/0x566
__shrink_zone+0x94/0x178
shrink_zone+0x3a/0x82
balance_pgdat+0x32a/0x4c2
kswapd+0x2f0/0x372
kthread+0xa2/0xaa
ret_from_fork+0x7c/0xb0
Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
RIP [<
ffffffff810c2625>] shrink_page_list+0x24e/0x897
RSP <
ffff88003da499b8>
CR2:
0000000000000028
---[ end trace
703d2451af6ffbfd ]---
Kernel panic - not syncing: Fatal exception
This patch fixes the issue, by assuring the proper tests are made at
putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
isolated balloon pages being wrongly reinserted in LRU lists.
[akpm@linux-foundation.org: clarify awkward comment text]
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Felipe Pena [Mon, 30 Sep 2013 20:45:14 +0000 (13:45 -0700)]
arch/parisc/mm/fault.c: fix uninitialized variable usage
The FAULT_FLAG_WRITE flag has been set based on uninitialized variable.
Fixes a regression added by commit
759496ba6407 ("arch: mm: pass
userspace fault flag to generic fault handler")
Signed-off-by: Felipe Pena <felipensp@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 30 Sep 2013 20:45:13 +0000 (13:45 -0700)]
include/asm-generic/vtime.h: avoid zero-length file
patch(1) can't handle zero-length files - it appears to simply not create
the file, so my powerpc build fails.
Put something in here to make life easier.
Cc: Hugh Dickins <hughd@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Mon, 30 Sep 2013 20:45:12 +0000 (13:45 -0700)]
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Many NILFS2 users were reported about strange file system corruption
(for example):
NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
But such error messages are consequence of file system's issue that takes
place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
and Anton Eliasson <devel@antoneliasson.se> were reported about another
issue not so recently. These reports describe the issue with segctor
thread's crash:
BUG: unable to handle kernel paging request at
0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0
These two issues have one reason. This reason can raise third issue
too. Third issue results in hanging of segctor thread with eating of
100% CPU.
REPRODUCING PATH:
One of the possible way or the issue reproducing was described by
Jermoe me Poulin <jeromepoulin@gmail.com>:
1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
13. sysrq+W
14. sysrq+E wait for everything to terminate
15. sysrq+SUSB
Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.
REPRODUCIBILITY:
The issue is reproduced not stable [60% - 80%]. It is very important to
have proper environment for the issue reproducing. The critical
conditions for successful reproducing:
(1) It should have big modified file by mmap() way.
(2) This file should have the count of dirty blocks are greater that
several segments in size (for example, two or three) from time to time
during processing.
(3) It should be intensive background activity of files modification
in another thread.
INVESTIGATION:
First of all, it is possible to see that the reason of crash is not valid
page address:
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr
13895680, bh->b_size
13897727, bh->b_page
0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
number. And b_blocknr with b_size values look like block numbers. So,
buffer_head's pointer points on not proper address value.
Detailed investigation of the issue is discovered such picture:
[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111149024, segbuf->sb_segnum 6783
[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page
ffffea000709b000, page->index 0, i_ino 1033103, i_size
25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next
ffff8802174a6798, bh->b_assoc_buffers.prev
ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page
ffffea000709b000, page->index 0, i_ino 1033103, i_size
25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page
ffffea000709b000, page->index 0, i_ino 1033103, i_size
25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next
ffff880219277e80, bh->b_assoc_buffers.prev
ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page
ffffea000709b000, page->index 0, i_ino 1033103, i_size
25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr
13895680, bh->b_size
13897727, bh->b_page
0000000000001a82
BUG: unable to handle kernel paging request at
0000000000001a82
IP: [<
ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
Usually, for every segment we collect dirty files in list. Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer. Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.
It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase. Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:
[SEGMENT 6784]: bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880222cc7ee8
The next pointer is the same but prev pointer has changed. It means
that buffer_head has next pointer from one list but prev pointer from
another. Such modification can be made several times. And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.
FIX:
This patch adds:
(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every proccessed dirty block;
(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();
(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Reported-by: Anton Eliasson <devel@antoneliasson.se>
Cc: Paul Fertser <fercerpav@gmail.com>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Cc: Kenneth Langga <klangga@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weiping Pan [Mon, 30 Sep 2013 20:45:10 +0000 (13:45 -0700)]
Documentation/kernel-parameters.txt: replace kernelcore with Movable
Han Pingtian found a typo in Documentation/kernel-parameters.txt about
"kernelcore=", that "kernelcore" should be replaced with "Movable" here.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Mon, 30 Sep 2013 20:45:09 +0000 (13:45 -0700)]
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
The "force" parameter in __blk_queue_bounce was being ignored, which
means that stable page snapshots are not always happening (on ext3).
This of course leads to DIF disks reporting checksum errors, so fix this
regression.
The regression was introduced in commit
6bc454d15004 ("bounce: Refactor
__blk_queue_bounce to not use bi_io_vec")
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Mon, 30 Sep 2013 20:45:08 +0000 (13:45 -0700)]
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
dereference happens upon core dump because argv_split("") returns
argv[0] == NULL.
This bug was once fixed by commit
264b83c07a84 ("usermodehelper: check
subprocess_info->path != NULL") but was by error reintroduced by commit
7f57cfa4e2aa ("usermodehelper: kill the sub_info->path[0] check").
This bug seems to exist since 2.6.19 (the version which core dump to
pipe was added). Depending on kernel version and config, some side
effect might happen immediately after this oops (e.g. kernel panic with
2.6.32-358.18.1.el6).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:07 +0000 (13:45 -0700)]
ipc/sem.c: synchronize the proc interface
The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly. This means that simple semop() operations
can run in parallel with the proc interface. Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.
But it is dangerous and therefore should be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:06 +0000 (13:45 -0700)]
ipc/sem.c: optimize sem_lock()
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing. Right now this is achieved by
spin_unlock_wait(sem->lock) on all semaphores.
If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:04 +0000 (13:45 -0700)]
ipc/sem.c: fix race in sem_lock()
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count. The current code does it the other way around - and that
creates a race. Details are below.
The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith. It removes all gotos and all loops and thus the
risk of livelocks.
I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.
The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.
Details of the bug:
Assume:
- sma->complex_count = 0.
- Thread 1: semtimedop(complex op that must sleep)
- Thread 2: semtimedop(simple op).
Pseudo-Trace:
Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
<<< preempted.
Thread 2: sem_lock():
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
int locknum;
again:
if (nsops == 1 && !sma->complex_count) {
struct sem *sem = sma->sem_base + sops->sem_num;
/* Lock just the semaphore we are interested in. */
spin_lock(&sem->lock);
/*
* If sma->complex_count was set while we were spinning,
* we may need to look at things we did not lock here.
*/
if (unlikely(sma->complex_count)) {
spin_unlock(&sem->lock);
goto lock_array;
}
<<<<<<<<<
<<< complex_count is still 0.
<<<
<<< Here it is preempted
<<<<<<<<<
Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
/*
* Another process is holding the global lock on the
* sem_array; we cannot enter our critical section,
* but have to wait for the global lock to be released.
*/
if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
spin_unlock(&sem->lock);
spin_unlock_wait(&sma->sem_perm.lock);
goto again;
}
<<< sem_perm.lock already dropped, thus no "goto again;"
locknum = sops->sem_num;
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Mon, 30 Sep 2013 20:45:03 +0000 (13:45 -0700)]
mm/compaction.c: periodically schedule when freeing pages
We've been getting warnings about an excessive amount of time spent
allocating pages for migration during memory compaction without
scheduling. isolate_freepages_block() already periodically checks for
contended locks or the need to schedule, but isolate_freepages() never
does.
When a zone is massively long and no suitable targets can be found, this
iteration can be quite expensive without ever doing cond_resched().
Check periodically for the need to reschedule while the compaction free
scanner iterates.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Aloni [Mon, 30 Sep 2013 20:45:02 +0000 (13:45 -0700)]
fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing
A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.
Reproduction of the oops-causing bug described here:
https://lkml.org/lkml/2013/8/30/50
Rge ussue originated in commit
2aa362c49c31 ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.
This patch make that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.
[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni <alonid@stratoscale.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Reported-by: Martin MOKREJS <mmokrejs@gmail.com>
Tested-by: Martin MOKREJS <mmokrejs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonyoung Shim [Mon, 30 Sep 2013 20:45:01 +0000 (13:45 -0700)]
revert "mm/memory-hotplug: fix lowmem count overflow when offline pages"
This reverts commit
cea27eb2a202 ("mm/memory-hotplug: fix lowmem count
overflow when offline pages").
The fixed bug by commit cea27eb was fixed to another way by commit
3dcc0571cd64 ("mm: correctly update zone->managed_pages"). That commit
enhances memory_hotplug.c to adjust totalhigh_pages when hot-removing
memory, for details please refer to:
http://marc.info/?l=linux-mm&m=
136957578620221&w=2
As a result, commit
cea27eb2a202 currently causes duplicated decreasing
of totalhigh_pages, thus the revert.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 30 Sep 2013 18:13:33 +0000 (11:13 -0700)]
Merge tag 'regulator-v3.12-rc3' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Quite a few fixes here, mostly small driver specific ones.
The stand out thing is a fix for errors generating the documentation
from Randy Dunlap, otherwise unless you're using the driver in
question there should be no impact"
* tag 'regulator-v3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: ti-abb: Fix bias voltage glitch in transition to bypass mode
regulator: wm831x-ldo: Fix max_uV for gp_ldo and aldo linear range settings
regulator: wm8350: correct the max_uV of LDO
regulator: fix fatal kernel-doc error
regulator: palmas: Remove wrong comment for the equation calculating num_voltages
regulator: da9063: Fix PTR_ERR/ERR_PTR mismatch
regulator: palmas: configure enable time for LDOs
regulator: palmas: fix the n_voltages for smps to 122
Linus Torvalds [Mon, 30 Sep 2013 18:12:20 +0000 (11:12 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull apparmor fixes from James Morris:
"Bugfixes for the Apparmor code for regressions introduced in the 3.12
pull request"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix suspicious RCU usage warning in policy.c/policy.h
apparmor: Use shash crypto API interface for profile hashes
Linus Torvalds [Mon, 30 Sep 2013 18:11:28 +0000 (11:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull assorted vfs fixes from Al Viro:
"A couple of bug fixes + removal of dead code in afs ->d_revalidate()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
afs: dget_parent() can't return a negative dentry
ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate()
sysv: Add forgotten superblock lock init for v7 fs
Linus Torvalds [Mon, 30 Sep 2013 17:40:20 +0000 (10:40 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/linux-avr32
Pull AVR32 fixes from Hans-Christian Egtvedt.
Fix build warnings and use the Kbuild infrastructure for generic headers
rather than doing it by hand.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
avr32: cast syscall_return to silence compiler warning
avr32: fix clockevents kernel warning
avr32: use Kbuild infrastructure to handle the asm-generic headers
Linus Torvalds [Mon, 30 Sep 2013 17:38:46 +0000 (10:38 -0700)]
Merge tag 'for-linus-
20130929' of git://github.com/sctscore/official-linux
Pull S+core fixes from Lennox Wu:
"These updates include updating information of maintainers, fix some
trivial errors, and add a necessary function for supporting ipv6"
* tag 'for-linus-
20130929' of git://github.com/sctscore/official-linux:
Score: Update the information of Score maintaners
Score: Modify the Makefile of Score, remove -mlong-calls for compiling
Score: Implement the function csum_ipv6_magic
Score: The commit is for compiling successfully
Linus Torvalds [Mon, 30 Sep 2013 17:37:05 +0000 (10:37 -0700)]
Merge tag 'arc-fixes-for-3.12' of git://git./linux/kernel/git/vgupta/arc
Pull ARC Fixes from Vineet Gupta:
- Handle unaligned access in zero delay loops
- spinlock livelock fix for SMP systemC model
- fix 32bit overflow in access_ok
- better setup of clockevents
* tag 'arc-fixes-for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Use clockevents_config_and_register over clockevents_register_device
ARC: Workaround spinlock livelock in SMP SystemC simulation
ARC: Fix 32-bit wrap around in access_ok()
ARC: Handle zero-overhead-loop in unaligned access handler
Mark Brown [Mon, 30 Sep 2013 11:04:33 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/wm8350' into regulator-linus
Mark Brown [Mon, 30 Sep 2013 11:04:33 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/wm831x' into regulator-linus
Mark Brown [Mon, 30 Sep 2013 11:04:32 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/ti-abb' into regulator-linus
Mark Brown [Mon, 30 Sep 2013 11:04:31 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/palmas' into regulator-linus
Mark Brown [Mon, 30 Sep 2013 11:04:30 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/doc' into regulator-linus
Mark Brown [Mon, 30 Sep 2013 11:04:29 +0000 (12:04 +0100)]
Merge remote-tracking branch 'regulator/fix/da9063' into regulator-linus
Gabor Juhos [Wed, 25 Sep 2013 19:50:01 +0000 (21:50 +0200)]
avr32: cast syscall_return to silence compiler warning
The patch fixes the following compiler warning:
CC arch/avr32/kernel/process.o
arch/avr32/kernel/process.c: In function 'copy_thread':
arch/avr32/kernel/process.c:292: warning: assignment makes integer \
from pointer without a cast
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Gabor Juhos [Wed, 25 Sep 2013 13:32:35 +0000 (15:32 +0200)]
avr32: fix clockevents kernel warning
Since commit
01426478df3a8791ff5c8b6b82d409e699cfaf38
(avr32: Use generic idle loop) the kernel throws the
following warning on avr32:
WARNING: at
900322e4 [verbose debug info unavailable]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc2 #117
task:
901c3ecc ti:
901c0000 task.ti:
901c0000
PC is at cpu_idle_poll_ctrl+0x1c/0x38
LR is at comparator_mode+0x3e/0x40
pc : [<
900322e4>] lr : [<
90014882>] Not tainted
sp :
901c1f74 r12:
00000000 r11:
901c74a0
r10:
901d2510 r9 :
00000001 r8 :
901db4de
r7 :
901c74a0 r6 :
00000001 r5 :
00410020 r4 :
901db574
r3 :
00410024 r2 :
90206fe0 r1 :
00000000 r0 :
007f0000
Flags: qvnzc
Mode bits: hjmde....G
CPU Mode: Supervisor
Call trace:
[<
90039ede>] clockevents_set_mode+0x16/0x2e
[<
90039f00>] clockevents_shutdown+0xa/0x1e
[<
9003a078>] clockevents_exchange_device+0x58/0x70
[<
9003a78c>] tick_check_new_device+0x38/0x54
[<
9003a1a2>] clockevents_register_device+0x32/0x90
[<
900035c4>] time_init+0xa8/0x108
[<
90000520>] start_kernel+0x128/0x23c
When the 'avr32_comparator' clockevent device is registered,
the clockevent core sets the mode of that clockevent device
to CLOCK_EVT_MODE_SHUTDOWN. Due to this, the 'comparator_mode'
function calls the 'cpu_idle_poll_ctrl' to disables idle poll.
This results in the aforementioned warning because the polling
is not enabled yet.
Change the code to only disable idle poll if it is enabled by
the same function to avoid the warning.
Cc: stable@vger.kernel.org
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Steven Rostedt [Sun, 1 Sep 2013 20:56:21 +0000 (22:56 +0200)]
avr32: use Kbuild infrastructure to handle the asm-generic headers
Use kbuild to add asm-generic headers that do nothing, also remove the arch
specific wrapper headers.
This only affects headers that do nothing but include the generic
equivalent. It does not touch any header that does a little more.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Al Viro [Sun, 29 Sep 2013 20:29:04 +0000 (16:29 -0400)]
afs: dget_parent() can't return a negative dentry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 29 Sep 2013 18:59:30 +0000 (14:59 -0400)]
ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Lubomir Rintel [Wed, 18 Sep 2013 10:39:16 +0000 (12:39 +0200)]
sysv: Add forgotten superblock lock init for v7 fs
Superblock lock was replaced with (un)lock_super() removal, but left
uninitialized for Seventh Edition UNIX filesystem in the following commit (3.7):
c07cb01 sysv: drop lock/unlock super
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
John Johansen [Sun, 29 Sep 2013 15:39:22 +0000 (08:39 -0700)]
apparmor: fix suspicious RCU usage warning in policy.c/policy.h
The recent 3.12 pull request for apparmor was missing a couple rcu _protected
access modifiers. Resulting in the follow suspicious RCU usage
[ 29.804534] [ INFO: suspicious RCU usage. ]
[ 29.804539] 3.11.0+ #5 Not tainted
[ 29.804541] -------------------------------
[ 29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage!
[ 29.804548]
[ 29.804548] other info that might help us debug this:
[ 29.804548]
[ 29.804553]
[ 29.804553] rcu_scheduler_active = 1, debug_locks = 1
[ 29.804558] 2 locks held by apparmor_parser/1268:
[ 29.804560] #0: (sb_writers#9){.+.+.+}, at: [<
ffffffff81120a4c>] file_start_write+0x27/0x29
[ 29.804576] #1: (&ns->lock){+.+.+.}, at: [<
ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
[ 29.804589]
[ 29.804589] stack backtrace:
[ 29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
[ 29.804599] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010
[ 29.804602]
0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540
[ 29.804611]
ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18
[ 29.804619]
ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084
[ 29.804628] Call Trace:
[ 29.804636] [<
ffffffff8144eb9b>] dump_stack+0x4e/0x82
[ 29.804642] [<
ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
[ 29.804649] [<
ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f
[ 29.804655] [<
ffffffff811f5408>] __replace_profile+0x11f/0x1ed
[ 29.804661] [<
ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
[ 29.804668] [<
ffffffff811f16d4>] profile_replace+0x35/0x4c
[ 29.804674] [<
ffffffff81120fa3>] vfs_write+0xad/0x113
[ 29.804680] [<
ffffffff81121609>] SyS_write+0x44/0x7a
[ 29.804687] [<
ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
[ 29.804691]
[ 29.804694] ===============================
[ 29.804697] [ INFO: suspicious RCU usage. ]
[ 29.804700] 3.11.0+ #5 Not tainted
[ 29.804703] -------------------------------
[ 29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage!
[ 29.804709]
[ 29.804709] other info that might help us debug this:
[ 29.804709]
[ 29.804714]
[ 29.804714] rcu_scheduler_active = 1, debug_locks = 1
[ 29.804718] 2 locks held by apparmor_parser/1268:
[ 29.804721] #0: (sb_writers#9){.+.+.+}, at: [<
ffffffff81120a4c>] file_start_write+0x27/0x29
[ 29.804733] #1: (&ns->lock){+.+.+.}, at: [<
ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
[ 29.804744]
[ 29.804744] stack backtrace:
[ 29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
[ 29.804753] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010
[ 29.804756]
0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540
[ 29.804764]
ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000
[ 29.804772]
ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94
[ 29.804779] Call Trace:
[ 29.804786] [<
ffffffff8144eb9b>] dump_stack+0x4e/0x82
[ 29.804791] [<
ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
[ 29.804798] [<
ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62
[ 29.804804] [<
ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17
[ 29.804810] [<
ffffffff811f4f0b>] kref_put+0x36/0x40
[ 29.804816] [<
ffffffff811f5423>] __replace_profile+0x13a/0x1ed
[ 29.804822] [<
ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
[ 29.804829] [<
ffffffff811f16d4>] profile_replace+0x35/0x4c
[ 29.804835] [<
ffffffff81120fa3>] vfs_write+0xad/0x113
[ 29.804840] [<
ffffffff81121609>] SyS_write+0x44/0x7a
[ 29.804847] [<
ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
Reported-by: miles.lane@gmail.com
CC: paulmck@linux.vnet.ibm.com
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Tyler Hicks [Sun, 29 Sep 2013 15:39:21 +0000 (08:39 -0700)]
apparmor: Use shash crypto API interface for profile hashes
Use the shash interface, rather than the hash interface, when hashing
AppArmor profiles. The shash interface does not use scatterlists and it
is a better fit for what AppArmor needs.
This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a
buffer from vmalloc(). The hash interface requires callers to handle
vmalloc() buffers differently than what AppArmor was doing. Due to
vmalloc() memory not being physically contiguous, each individual page
behind the buffer must be assigned to a scatterlist with sg_set_page()
and then the scatterlist passed to crypto_hash_update().
The shash interface does not have that limitation and allows vmalloc()
and kmalloc() buffers to be handled in the same manner.
BugLink: https://launchpad.net/bugs/1216294/
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Sun, 29 Sep 2013 22:02:38 +0000 (15:02 -0700)]
Linux 3.12-rc3
Linus Torvalds [Sun, 29 Sep 2013 20:47:35 +0000 (13:47 -0700)]
Merge tag 'usb-3.12-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB driver fixes for 3.12-rc3.
These are all for host controller issues that have been reported, and
there's a fix for an annoying error message that gets printed every
time you remove a USB 3 device from the system that's been bugging me
for a while"
* tag 'usb-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: add support for Merrifield
USB: fsl/ehci: fix failure of checking PHY_CLK_VALID during reinitialization
USB: Fix breakage in ffs_fs_mount()
fsl/usb: Resolve PHY_CLK_VLD instability issue for ULPI phy
usb/core/devio.c: Don't reject control message to endpoint with wrong direction bit
usb: chipidea: USB_CHIPIDEA should depend on HAS_DMA
usb: chipidea: udc: free pending TD at removal procedure
usb: chipidea: imx: Add usb_phy_shutdown at probe's error path
usb: chipidea: Fix memleak for ci->hw_bank.regmap when removal
usb: chipidea: udc: fix the oops after rmmod gadget
USB: fix PM config symbol in uhci-hcd, ehci-hcd, and xhci-hcd
USB: OHCI: accept very late isochronous URBs
USB: UHCI: accept very late isochronous URBs
USB: iMX21: accept very late isochronous URBs
usbcore: check usb device's state before sending a Set SEL control transfer
xhci: Fix race between ep halt and URB cancellation
usb: Fix xHCI host issues on remote wakeup.
xhci: Ensure a command structure points to the correct trb on the command ring
xhci: Fix oops happening after address device timeout
Linus Torvalds [Sun, 29 Sep 2013 20:47:00 +0000 (13:47 -0700)]
Merge tag 'tty-3.12-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some serial at tty driver fixes for 3.12-rc3
The serial driver fixes some kref leaks, documentation is moved to the
proper places, and the tty and n_tty fixes resolve some reported
regressions. There is still one outstanding tty regression fix that
isn't in here yet, as I want to test it out some more, it will be sent
for 3.12-rc4 if it checks out"
* tag 'tty-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: ar933x_uart: move devicetree binding documentation
tty: Fix SIGTTOU not sent with tcflush()
n_tty: Fix EOF push index when termios changes
serial: pch_uart: remove unnecessary tty_port_tty_get
serial: pch_uart: fix tty-kref leak in dma-rx path
serial: pch_uart: fix tty-kref leak in rx-error path
serial: tegra: fix tty-kref leak
Linus Torvalds [Sun, 29 Sep 2013 20:46:18 +0000 (13:46 -0700)]
Merge tag 'staging-3.12-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some staging driver fixes, MAINTAINER updates, and a new
device id. All of these have been in the linux-next tree, and are
pretty simple patches"
* tag 'staging-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: Add new device ID
staging: imx-drm: Fix probe failure
staging: vt6656: [BUG] iwctl_siwencodeext return if device not open
staging: vt6656: [BUG] main_usb.c oops on device_close move flag earlier.
staging: vt6656: rxtx.c [BUG] s_vGetFreeContext dead lock on null apTD.
Staging: rtl8192u: r819xU_cmdpkt: checking NULL value after doing dev_alloc_skb
staging: usbip: Orphan usbip
staging: r8188eu: Add files for new drive: Cocci spatch "noderef"
staging: r8188eu: Cocci spatch "noderef"
staging: octeon-usb: Cocci spatch "noderef"
staging: r8188eu: Add files for new drive: Cocci spatch "noderef"
MAINTAINERS: staging: dgnc and dgap drivers: add maintainer
staging: lustre: Cocci spatch "noderef"
Linus Torvalds [Sun, 29 Sep 2013 20:45:23 +0000 (13:45 -0700)]
Merge tag 'driver-core-3.12-rc3' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core / sysfs fixes from Greg KH:
"Here are 2 fixes for 3.12-rc3. One fixes a sysfs problem with
mounting caused by 3.12-rc1, and the other is a bug reported by the
chromeos developers with the driver core.
Both have been in linux-next for a bit"
* tag 'driver-core-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core : Fix use after free of dev->parent in device_shutdown
sysfs: Allow mounting without CONFIG_NET
Linus Torvalds [Sun, 29 Sep 2013 20:44:48 +0000 (13:44 -0700)]
Merge tag 'char-misc-3.12-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some HyperV and MEI driver fixes for 3.12-rc3. They resolve
some issues that people have been reporting for them"
* tag 'char-misc-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: Terminate vmbus version negotiation on timeout
Drivers: hv: util: Correctly support ws2008R2 and earlier
mei: cancel stall timers in mei_reset
mei: bus: stop wait for read during cl state transition
mei: make me client counters less error prone
Anna Schumaker [Wed, 25 Sep 2013 21:02:48 +0000 (17:02 -0400)]
NFS: Give "flavor" an initial value to fix a compile warning
The previous patch introduces a compile warning by not assigning an initial
value to the "flavor" variable. This could only be a problem if the server
returns a supported secflavor list of length zero, but it's better to
fix this before it's ever hit.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Tue, 24 Sep 2013 17:58:02 +0000 (13:58 -0400)]
NFSv4.1: try SECINFO_NO_NAME flavs until one works
Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until
one works.
One example of a situation this fixes:
- server configured for krb5
- server principal somehow gets deleted from KDC
- server still thinking krb is good, sends krb5 as first entry in
SECINFO_NO_NAME response
- client tries krb5, but this fails without even sending an RPC because
gssd's requests to the KDC can't find the server's principal
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 26 Sep 2013 18:32:56 +0000 (14:32 -0400)]
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
We need to ensure that the initialisation of the data server nfs_client
structure in nfs4_ds_connect is correctly ordered w.r.t. the read of
ds->ds_clp in nfs4_fl_prepare_ds.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 26 Sep 2013 18:08:36 +0000 (14:08 -0400)]
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
- Fix an Oops when nfs4_ds_connect() returns an error.
- Always check the device status after waiting for a connect to complete.
Reported-by: Andy Adamson <andros@netapp.com>
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # v3.10+
Linus Torvalds [Sun, 29 Sep 2013 17:04:03 +0000 (10:04 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf revert from Ingo Molnar:
"This fixes the 'perf top' regression Markus Trippelsdorf reported"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf symbols: Demangle cloned functions"
Linus Torvalds [Sun, 29 Sep 2013 17:02:40 +0000 (10:02 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nothing too major, radeon still has some dpm changes for off by
default.
Radeon, intel, msm:
- radeon: a few more dpm fixes (still off by default), uvd fixes
- i915: runtime warn backtrace and regression fix
- msm: iommu changes fallout"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (27 commits)
drm/msm: use drm_gem_dumb_destroy helper
drm/msm: deal with mach/iommu.h removal
drm/msm: Remove iommu include from mdp4_kms.c
drm/msm: Odd PTR_ERR usage
drm/i915: Fix up usage of SHRINK_STOP
drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
drm/i915/tv: clear adjusted_mode.flags
drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER
drm/radeon/cik: fix overflow in vram fetch
drm/radeon: add missing hdmi callbacks for rv6xx
drm/i915: Use a temporary va_list for two-pass string handling
drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
drm/radeon: disable tests/benchmarks if accel is disabled
drm/radeon: don't set default clocks for SI when DPM is disabled
drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
drm/radeon/dpm: fetch the max clk from voltage dep tables helper
...
Ingo Molnar [Sun, 29 Sep 2013 14:12:54 +0000 (16:12 +0200)]
Revert "perf symbols: Demangle cloned functions"
This reverts commit
de95ab53645a2f0015e0f68ee723f18dce2b8b51.
Markus Trippelsdorf reported that this commit broke 'perf top':
> I just see a gray screen with no text at all. Sometimes the
> following error messages are printed:
>
> *** Error in `perf': invalid fastbin entry (free): 0x00000000029b18c0
> ***
> *** Error in `perf': malloc(): memory corruption (fast): 0x0000000000ee0b10 ***
While this code is fixable, the commit itself fails on several levels:
- it should have been a separate helper function
- why the heck does it do strchr() twice
- it casts a const char * over into char *
- sloppy style
- it's not even a regression fix!
So lets revert it and re-try the patch in v3.13.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Sun, 29 Sep 2013 00:06:28 +0000 (10:06 +1000)]
Merge branch 'msm-fixes-3.12-rc2' of git://people.freedesktop.org/~robclark/linux into drm-fixes
A small fix + deal with fallout of iommu changes + use new
drm_gem_dumb_destroy helper.
* 'msm-fixes-3.12-rc2' of git://people.freedesktop.org/~robclark/linux:
drm/msm: use drm_gem_dumb_destroy helper
drm/msm: deal with mach/iommu.h removal
drm/msm: Remove iommu include from mdp4_kms.c
drm/msm: Odd PTR_ERR usage
Linus Torvalds [Sat, 28 Sep 2013 21:22:17 +0000 (14:22 -0700)]
Merge branches 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler, timer and x86 fixes from Ingo Molnar:
- A context tracking ARM build and functional fix
- A handful of ARM clocksource/clockevent driver fixes
- An AMD microcode patch level sysfs reporting fixlet
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm: Fix build error with context tracking calls
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
clocksource: of: Respect device tree node status
clocksource: exynos_mct: Set IRQ affinity when the CPU goes online
arm: clocksource: mvebu: Use the main timer as clock source from DT
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix patch level reporting for family 15h
Linus Torvalds [Sat, 28 Sep 2013 21:21:13 +0000 (14:21 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A couple of tooling fixlets and a PMU detection printout fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix PMU detection printout when no PMU is detected
perf symbols: Demangle cloned functions
perf machine: Fix path unpopulated in machine__create_modules()
perf tools: Explicitly add libdl dependency
perf probe: Fix probing symbols with optimization suffix
perf trace: Add mmap2 handler
perf kmem: Make it work again on non NUMA machines
Linus Torvalds [Sat, 28 Sep 2013 20:52:05 +0000 (13:52 -0700)]
Merge tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
- fix for directory node collapse regression
- fix for recovery over stale on disk structures
- fix for eofblocks ioctl
- fix asserts in xfs_inode_free
- lock the ail before removing an item from it
* tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: fix node forward in xfs_node_toosmall
xfs: log recovery lsn ordering needs uuid check
xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
xfs: asserting lock not held during freeing not valid
xfs: lock the AIL before removing the buffer item
Linus Torvalds [Sat, 28 Sep 2013 20:44:09 +0000 (13:44 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some driver bugfixes for the I2C subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ismt: initialize DMA buffer
i2c: designware: 10-bit addressing mode enabling if I2C_DYNAMIC_TAR_UPDATE is set
i2c: mv64xxx: Do not use writel_relaxed()
i2c: mv64xxx: Fix some build warnings
i2c: s3c2410: fix clk_disable/clk_unprepare WARNings
Linus Torvalds [Sat, 28 Sep 2013 20:27:31 +0000 (13:27 -0700)]
Merge tag 'pm+acpi-3.12-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These fix one recent cpufreq regression, a few older bugs that may
harm users and a kerneldoc typo.
Specifics:
1) After the recent locking changes in the cpufreq core it is
possible to trigger BUG_ON(!policy) in lock_policy_rwsem_read() if
cpufreq_get() is called before registering a cpufreq driver. Fix
from Viresh Kumar.
2) If intel_pstate has been loaded already, it doesn't make sense to
do anything in acpi_cpufreq_init() and moreover doing something in
there in that case may be harmful, so make that function return
immediately if another cpufreq driver is already present. From
Yinghai Lu.
3) The ACPI IPMI driver sometimes attempts to acquire a mutex from
interrupt context, which can be avoided by replacing that mutex
with a spinlock. From Lv Zheng.
4) A NULL pointer may be dereferenced by the exynos5440 cpufreq
driver if a memory allocation made by it fails. Fix from Sachin
Kamat.
5) Hanjun Guo's commit fixes a typo in the kerneldoc comment
documenting acpi_bus_unregister_driver()"
* tag 'pm+acpi-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / scan: fix typo in comments of acpi_bus_unregister_driver()
cpufreq: exynos5440: Fix potential NULL pointer dereference
cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()
acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler()
Yinghai Lu [Sat, 28 Sep 2013 20:13:07 +0000 (13:13 -0700)]
PCI: Workaround missing pci_set_master in pci drivers
Ben Herrenschmidt found that commit
928bea964827 ("PCI: Delay enabling
bridges until they're needed") breaks PCI in some powerpc environments.
The reason is that the PCIe port driver will call pci_enable_device() on
the bridge, so the device is enabled, but skips pci_set_master because
pcie_port_auto and no acpi on powerpc.
Because of that, pci_enable_bridge() later on (called as a result of the
child device driver doing pci_enable_device) will see the bridge as
already enabled and will not call pci_set_master() on it.
Fixed by add checking in pci_enable_bridge, and call pci_set_master
if driver skip that.
That will make the code more robot and wade off problem for missing
pci_set_master in drivers.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 28 Sep 2013 19:36:19 +0000 (12:36 -0700)]
Merge branch 'lockref' of git://git./linux/kernel/git/s390/linux
Pull s390 lockref enablement from Heiko Carstens:
"Enabling the new lockless lockref variant on s390 would have been
trivial until Tony Luck added a cpu_relax() call into the
CMPXCHG_LOOP(), with commit
d472d9d98b46 ("lockref: Relax in cmpxchg
loop")
As already mentioned cpu_relax() is very expensive on s390 since it
yields() the current virtual cpu. So we are talking of several
thousand cycles. Considering this enabling the lockless lockref
variant would contradict the intention of the new semantics. And also
some quick measurements show performance regressions of 50% and more.
Simply removing the cpu_relax() call again seems also not very
desireable since Waiman Long reported that for some workloads the call
improved performance by 5%."
* 'lockref' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: enable ARCH_USE_CMPXCHG_LOCKREF
lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
Jean Delvare [Fri, 27 Sep 2013 20:17:39 +0000 (13:17 -0700)]
kernel/params: fix handling of signed integer types
Commit
6072ddc8520b ("kernel: replace strict_strto*() with kstrto*()")
broke the handling of signed integer types, fix it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 28 Sep 2013 18:57:26 +0000 (11:57 -0700)]
Merge tag 'devicetree-fixes' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Clean-up to fix some warnings for !OF builds and spelling fixes in
docs:
- Clean-up openrisc prom.h
- Fix warnings caused by of_irq.h ifdefs
- Spelling fix for Synopsys"
* tag 'devicetree-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dts: Fix misspelling of Synopsys
of: clean-up ifdefs in of_irq.h
openrisc: clean-up prom.h
Linus Torvalds [Sat, 28 Sep 2013 18:56:34 +0000 (11:56 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Just a few relatively small ARM fixes found since the last merge
window, nothing too exciting"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7837/3: fix Thumb-2 bug in AES assembler code
ARM: only allow kernel mode neon with AEABI
ARM: 7839/1: entry: fix tracing of ARM-private syscalls
ARM: 7836/1: add __get_user_unaligned/__put_user_unaligned
James Ralston [Tue, 24 Sep 2013 23:47:55 +0000 (16:47 -0700)]
i2c: ismt: initialize DMA buffer
This patch adds code to initialize the DMA buffer to compensate for
possible hardware data corruption.
Signed-off-by: James Ralston <james.d.ralston@intel.com>
[wsa: changed to use 'sizeof']
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Rob Clark [Sat, 28 Sep 2013 14:13:04 +0000 (10:13 -0400)]
drm/msm: use drm_gem_dumb_destroy helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 28 Sep 2013 14:07:06 +0000 (10:07 -0400)]
drm/msm: deal with mach/iommu.h removal
We still need an API exported by msm iommu driver (but not visible in
any public header anymore). For now, just declare the prototype
ourselves, but when msm iommu driver provides a better option, use that
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Ingo Molnar [Sat, 28 Sep 2013 13:48:48 +0000 (15:48 +0200)]
perf/x86: Fix PMU detection printout when no PMU is detected
Ran into this cryptic PMU bootup log recently:
[ 0.124047] Performance Events:
[ 0.125000] smpboot: ...
Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:
[ 0.122381] Performance Events: no PMU driver, software events only.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Joerg Roedel [Wed, 25 Sep 2013 14:49:40 +0000 (16:49 +0200)]
drm/msm: Remove iommu include from mdp4_kms.c
The include file has been removed and the file does not
need it anyway, so remove it. Fixes a compile error.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Thomas Meyer [Mon, 16 Sep 2013 21:19:54 +0000 (23:19 +0200)]
drm/msm: Odd PTR_ERR usage
The variable priv->kms is not initialized yet.
Found by "scripts/coccinelle/tests/odd_ptr_err.cocci".
PTR_ERR should access the value just tested by IS_ERR.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Heiko Carstens [Thu, 5 Sep 2013 11:26:17 +0000 (13:26 +0200)]
s390: enable ARCH_USE_CMPXCHG_LOCKREF
Enable ARCH_USE_CMPXCHG_LOCKREF since it shows performance improvements
with Linus' simple stat() test case of up to 50% on a 30 cpu system.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Heiko Carstens [Mon, 23 Sep 2013 10:59:56 +0000 (12:59 +0200)]
lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
Make use of arch_mutex_cpu_relax() so architectures can override the
default cpu_relax() semantics.
This is especially useful for s390, where cpu_relax() means that we
yield() the current (virtual) cpu and therefore is very expensive,
and would contradict the whole purpose of the lockless cmpxchg loop.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Heiko Carstens [Sat, 28 Sep 2013 09:23:59 +0000 (11:23 +0200)]
mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
Linus suggested to replace
#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
#define arch_mutex_cpu_relax() cpu_relax()
#endif
with just a simple
#ifndef arch_mutex_cpu_relax
# define arch_mutex_cpu_relax() cpu_relax()
#endif
to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can
simply define arch_mutex_cpu_relax if they want an architecture
specific function instead of having to add a select statement in
their Kconfig in addition.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Ingo Molnar [Sat, 28 Sep 2013 06:50:09 +0000 (08:50 +0200)]
Merge branch 'context_tracking/fixes' of git://git./linux/kernel/git/frederic/linux-dynticks into sched/urgent
Pull context tracking ARM fix from Frederic Weisbecker.
Signed-off-by: Ingo Molnar <mingo@kernel.org>