Pekka Enberg [Tue, 30 Jun 2015 21:59:06 +0000 (14:59 -0700)]
kernel/relay.c: use kvfree() in relay_free_page_array()
Use kvfree() instead of open-coding it.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Antonio Ospite [Tue, 30 Jun 2015 21:59:03 +0000 (14:59 -0700)]
printk: improve the description of /dev/kmsg line format
The comment about /dev/kmsg does not mention the additional values which
may actually be exported, fix that.
Also move up the part of the comment instructing the users to ignore these
additional values, this way the reading is more fluent and logically
compact.
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masanari Iida [Tue, 30 Jun 2015 21:59:00 +0000 (14:59 -0700)]
arch/unicore32/kernel/fpu-ucf64.c: remove unnecessary KERN_ERR
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Gordon [Tue, 30 Jun 2015 21:58:57 +0000 (14:58 -0700)]
drivers/scsi/scsi_debug.c: resolve sg buffer const-ness issue
do_device_access() takes a separate parameter to indicate the direction of
data transfer, which it used to use to select the appropriate function out
of sg_pcopy_{to,from}_buffer(). However these two functions now have
So this patch makes it bypass these wrappers and call the underlying
function sg_copy_buffer() directly; this has the same calling style as
do_device_access() i.e. a separate direction-of-transfer parameter and no
pointers-to-const, so skipping the wrappers not only eliminates the
warning, it also make the code simpler :)
[akpm@linux-foundation.org: fix very broken build]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Gordon [Tue, 30 Jun 2015 21:58:54 +0000 (14:58 -0700)]
lib/scatterlist: mark input buffer parameters as 'const'
The 'buf' parameter of sg(p)copy_from_buffer() can and should be
const-qualified, although because of the shared implementation of
_to_buffer() and _from_buffer(), we have to cast this away internally.
This means that callers who have a 'const' buffer containing the data to
be copied to the sg-list no longer have to cast away the const-ness
themselves. It also enables improved coverage by code analysis tools.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Gordon [Tue, 30 Jun 2015 21:58:52 +0000 (14:58 -0700)]
lib/scatterlist.c: fix kerneldoc for sg_pcopy_{to,from}_buffer()
The kerneldoc for the functions doesn't match the code; the last two
parameters (buflen, skip) have been transposed, which is confusing,
especially as they're both integral types and the compiler won't warn
about swapping them.
These functions and the kerneldoc were introduced in commit:
df642cea lib/scatterlist: introduce sg_pcopy_from_buffer() ...
Author: Akinobu Mita <akinobu.mita@gmail.com>
Date: Mon Jul 8 16:01:54 2013 -0700
The only difference between sg_pcopy_{from,to}_buffer() and
sg_copy_{from,to}_buffer() is an additional argument that
specifies the number of bytes to skip the SG list before
copying.
The functions have the extra argument at the end, but the kerneldoc
lists it in penultimate position.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:48 +0000 (14:58 -0700)]
ipc,sysv: return -EINVAL upon incorrect id/seqnum
In ipc_obtain_object_check we return -EIDRM when a bogus sequence number
is detected via ipc_checkid, while the ipc manpages state the following
return codes for such errors:
EIDRM <ID> points to a removed identifier.
EINVAL Invalid <ID> value, or unaligned, etc.
EIDRM should only be returned upon a RMID call (->deleted check), and thus
return EINVAL for wrong seq. This difference in semantics has also caused
real bugs, ie: https://bugzilla.redhat.com/show_bug.cgi?id=246509
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:45 +0000 (14:58 -0700)]
ipc,sysv: make return -EIDRM when racing with RMID consistent
The ipc_lock helper is used by all forms of sysv ipc to acquire the ipc
object's spinlock. Upon error (bogus identifier), we always return
-EINVAL, whether the problem be in the idr path or because we raced with a
task performing RMID. For the later, however, all ipc related manpages,
state the that for:
EIDRM <ID> points to a removed identifier.
And return:
EINVAL Invalid <ID> value, or unaligned, etc.
Which (EINVAL) should only return once the ipc resource is deleted. For
all types of ipc this is done immediately upon a RMID command. However,
shared memory behaves slightly different as it can merely mark a segment
for deletion, and delay the actual freeing until there are no more active
consumers. Per shmctl(IPC_RMID) manpage:
""
Mark the segment to be destroyed. The segment will only actually
be destroyed after the last process detaches it (i.e., when the
shm_nattch member of the associated structure shmid_ds is zero).
""
Unlike ipc_lock, paths that behave "correctly", at least per the manpage,
involve controlling the ipc resource via *ctl(), doing the exact same
validity check as ipc_lock after right acquiring the spinlock:
if (!ipc_valid_object()) {
err = -EIDRM;
goto out_unlock;
}
Thus make ipc_lock consistent with the rest of ipc code and return -EIDRM
in ipc_lock when !ipc_valid_object().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:42 +0000 (14:58 -0700)]
ipc: rename ipc_obtain_object
... to ipc_obtain_object_idr, which is more meaningful and makes the code
slightly easier to follow.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:39 +0000 (14:58 -0700)]
ipc,msg: provide barrier pairings for lockless receive
We currently use a full barrier on the sender side to to avoid receiver
tasks disappearing on us while still performing on the sender side wakeup.
We lack however, the proper CPU-CPU interactions pairing on the receiver
side which busy-waits for the message. Similarly, we do not need a full
smp_mb, and can relax the semantics for the writer and reader sides of the
message. This is safe as we are only ordering loads and stores to r_msg.
And in both smp_wmb and smp_rmb, there are no stores after the calls
_anyway_.
This obviously applies for pipelined_send and expunge_all, for EIRDM when
destroying a queue.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:36 +0000 (14:58 -0700)]
ipc,shm: move BUG_ON check into shm_lock
Upon every shm_lock call, we BUG_ON if an error was returned, indicating
racing either in idr or in shm_destroy. Move this logic into the locking.
[akpm@linux-foundation.org: simplify code]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pekka Enberg [Tue, 30 Jun 2015 21:58:33 +0000 (14:58 -0700)]
ipc/util.c: use kvfree() in ipc_rcu_free()
Use kvfree() instead of open-coding it.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 30 Jun 2015 21:58:30 +0000 (14:58 -0700)]
arc: use for_each_sg()
This replaces the plain loop over the sglist array with for_each_sg()
macro which consists of sg_next() function calls. Since arc doesn't
select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
order to loop over each sg element. But this can help find problems with
drivers that do not properly initialize their sg tables when
CONFIG_DEBUG_SG is enabled.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Tue, 30 Jun 2015 21:58:27 +0000 (14:58 -0700)]
devpts: if initialization failed, don't crash when opening /dev/ptmx
If devpts failed to initialize, it would store an ERR_PTR in the global
devpts_mnt. A subsequent open of /dev/ptmx would call devpts_new_index,
which would dereference devpts_mnt and crash.
Avoid storing invalid values in devpts_mnt; leave it NULL instead. Make
both devpts_new_index and devpts_pty_new fail gracefully with ENODEV in
that case, which then becomes the return value to the userspace open call
on /dev/ptmx.
[akpm@linux-foundation.org: remove unneeded static]
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiébaud Weksteen [Tue, 30 Jun 2015 21:58:24 +0000 (14:58 -0700)]
scripts/gdb: remove useless global instruction
Signed-off-by: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiébaud Weksteen [Tue, 30 Jun 2015 21:58:21 +0000 (14:58 -0700)]
scripts/gdb: add ps command
Signed-off-by: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiébaud Weksteen [Tue, 30 Jun 2015 21:58:18 +0000 (14:58 -0700)]
scripts/gdb: fix PEP8 compliance
Signed-off-by: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiébaud Weksteen [Tue, 30 Jun 2015 21:58:16 +0000 (14:58 -0700)]
scripts/gdb: fix typo in exception name
Signed-off-by: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kiszka [Tue, 30 Jun 2015 21:58:13 +0000 (14:58 -0700)]
scripts/gdb: enable completion for lx-list-check parameter
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kiszka [Tue, 30 Jun 2015 21:58:10 +0000 (14:58 -0700)]
scripts/gdb: also allow list_head pointer as lx-list-check paramter
This makes the usage more flexible.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiébaud Weksteen [Tue, 30 Jun 2015 21:58:07 +0000 (14:58 -0700)]
scripts/gdb: add command to check list consistency
Add a gdb script to verify the consistency of lists.
Signed-off-by: Thiébaud Weksteen <thiebaud@weksteen.fr>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quentin Lambert [Tue, 30 Jun 2015 21:58:04 +0000 (14:58 -0700)]
memstick: remove deprecated use of pci api
Replace occurences of the pci api by appropriate call to the dma api.
A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr)
@deprecated@
idexpression id;
position p;
@@
(
pci_dma_supported@p ( id, ...)
|
pci_alloc_consistent@p ( id, ...)
)
@bad1@
idexpression id;
position deprecated.p;
@@
...when != &id->dev
when != pci_get_drvdata ( id )
when != pci_enable_device ( id )
(
pci_dma_supported@p ( id, ...)
|
pci_alloc_consistent@p ( id, ...)
)
@depends on !bad1@
idexpression id;
expression direction;
position deprecated.p;
@@
(
- pci_dma_supported@p ( id,
+ dma_supported ( &id->dev,
...
+ , GFP_ATOMIC
)
|
- pci_alloc_consistent@p ( id,
+ dma_alloc_coherent ( &id->dev,
...
+ , GFP_ATOMIC
)
)
Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 30 Jun 2015 21:58:01 +0000 (14:58 -0700)]
fs/affs/symlink.c: remove unneeded err variable
err is only assigned to -EIO. Return that value at the end of fail
context.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 30 Jun 2015 21:57:58 +0000 (14:57 -0700)]
fs/affs/amigaffs.c: remove unneeded initialization
bh is initialized unconditionally in affs_remove_link()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 30 Jun 2015 21:57:55 +0000 (14:57 -0700)]
fs/affs/inode.c: remove unneeded initialization
bh is initialized unconditionally in affs_add_entry()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Firo Yang [Tue, 30 Jun 2015 21:57:52 +0000 (14:57 -0700)]
fs/adfs: remove unneeded cast
kmem_cache_alloc() returns void*.
Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lorenzo Stoakes [Tue, 30 Jun 2015 21:57:49 +0000 (14:57 -0700)]
gcov: add support for GCC 5.1
Fix kernel gcov support for GCC 5.1. Similar to commit
a992bf836f9
("gcov: add support for GCC 4.9"), this patch takes into account the
existence of a new gcov counter (see gcc's gcc/gcov-counter.def.)
Firstly, it increments GCOV_COUNTERS (to 10), which makes the data
structure struct gcov_info compatible with GCC 5.1.
Secondly, a corresponding counter function __gcov_merge_icall_topn (Top N
value tracking for indirect calls) is included in base.c with the other
gcov counters unused for kernel profiling.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Yuan Pengfei <coolypf@qq.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Tue, 30 Jun 2015 21:57:46 +0000 (14:57 -0700)]
kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path
Commit
f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.
The problem is that the commit overlooks panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.
To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().
Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Tue, 30 Jun 2015 21:57:43 +0000 (14:57 -0700)]
kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabled
For compatibility with the behaviour before the commit
f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.
Note that crash_kexec() returns immediately if kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KarimAllah Ahmed [Tue, 30 Jun 2015 21:57:39 +0000 (14:57 -0700)]
x86/kexec: prepend elfcorehdr instead of appending it to the crash-kernel command-line.
Any parameter passed after '--' in the kernel command-line will not be
parsed by the kernel at all, instead it will be passed directly to init
process.
Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from
kexec load, and if this command-line is used to pass parameters to init
process this means that 'elfcorehdr' will not be parsed as a kernel
parameter at all which will be a problem for vmcore subsystem since it
will know nothing about the location of the ELF structure!
Prepending 'elfcorehdr' instead of appending it fixes this problem since
it ensures that it always comes before '--' and so it's always parsed as a
kernel command-line parameter.
Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was
also used to embedd a command-line to the crash dump kernel and this
command-line contains '--' since the current behavior of the kernel is to
actually append the boot loader command-line to the embedded command-line.
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yann Droneaud [Tue, 30 Jun 2015 21:57:36 +0000 (14:57 -0700)]
fs: document seq_open()'s usage of file->private_data
seq_open() stores its struct seq_file in file->private_data, thus it must
not be modified by user of seq_file.
Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yann Droneaud [Tue, 30 Jun 2015 21:57:33 +0000 (14:57 -0700)]
fs: allocate structure unconditionally in seq_open()
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.
Commit
1abe77b0fc4b485927f1f798ae81a752677e1d05
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Nov 7 17:15:34 2005 -0500
[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
As there's no more use for such feature, as it could be easily replaced by
calls to seq_open_private() (see commit
39699037a5c9 ("[FS] seq_file:
Introduce the seq_open_private()")) and seq_release_private() (see
v2.6.0-test3), support for this uncommon feature can be removed from
seq_open().
Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yann Droneaud [Tue, 30 Jun 2015 21:57:30 +0000 (14:57 -0700)]
fs: use seq_open_private() for proc_mounts
A patchset to remove support for passing pre-allocated struct seq_file to
seq_open(). Such feature is undocumented and prone to error.
In particular, if seq_release() is used in release handler, it will
kfree() a pointer which was not allocated by seq_open().
So this patchset drops support for pre-allocated struct seq_file: it's
only of use in proc_namespace.c and can be easily replaced by using
seq_open_private()/seq_release_private().
Additionally, it documents the use of file->private_data to hold pointer
to struct seq_file by seq_open().
This patch (of 3):
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.
Commit
1abe77b0fc4b485927f1f798ae81a752677e1d05
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Nov 7 17:15:34 2005 -0500
[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
Such behavior is only used by mounts_open_common().
In order to drop support for such uncommon feature, proc_mounts is
converted to use seq_open_private(), which take care of allocating the
proc_mounts structure, making it available through ->private in struct
seq_file.
Conversely, proc_mounts is converted to use seq_release_private(), in
order to release the private structure allocated by seq_open_private().
Then, ->private is used directly instead of proc_mounts() macro to access
to the proc_mounts structure.
Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:27 +0000 (14:57 -0700)]
mm: meminit: finish initialisation of struct pages before basic setup
Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred. One approach is to initialise
memory on demand but it interferes with page allocator paths. This patch
creates dedicated threads to initialise memory before basic setup. It
then blocks on a rw_semaphore until completion as a wait_queue and counter
is overkill. This may be slower to boot but it's simplier overall and
also gets rid of a section mangling which existed so kswapd could do the
initialisation.
[akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Waiman Long <waiman.long@hp.com
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Scott Norton <scott.norton@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:23 +0000 (14:57 -0700)]
mm: meminit: remove mminit_verify_page_links
mminit_verify_page_links() is an extremely paranoid check that was
introduced when memory initialisation was being heavily reworked.
Profiles indicated that up to 10% of parallel memory initialisation was
spent on checking this for every page. The cost could be reduced but in
practice this check only found problems very early during the
initialisation rewrite and has found nothing since. This patch removes an
expensive unnecessary check.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:20 +0000 (14:57 -0700)]
mm: meminit: reduce number of times pageblocks are set during struct page init
During parallel sturct page initialisation, ranges are checked for every
PFN unnecessarily which increases boot times. This patch alters when the
ranges are checked.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:16 +0000 (14:57 -0700)]
mm: meminit: free pages in large chunks where possible
Parallel struct page frees pages one at a time. Try free pages as single
large pages where possible.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:13 +0000 (14:57 -0700)]
x86: mm: enable deferred struct page initialisation on x86-64
Subject says it all. Other architectures may enable on a case-by-case
basis after auditing early_pfn_to_nid and testing.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:09 +0000 (14:57 -0700)]
mm: meminit: minimise number of pfn->page lookups during initialisation
Deferred struct page initialisation is using pfn_to_page() on every PFN
unnecessarily. This patch minimises the number of lookups and scheduler
checks.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:05 +0000 (14:57 -0700)]
mm: meminit: initialise remaining struct pages in parallel with kswapd
Only a subset of struct pages are initialised at the moment. When this
patch is applied kswapd initialise the remaining struct pages in parallel.
This should boot faster by spreading the work to multiple CPUs and
initialising data that is local to the CPU. The user-visible effect on
large machines is that free memory will appear to rapidly increase early
in the lifetime of the system until kswapd reports that all memory is
initialised in the kernel log. Once initialised there should be no other
user-visibile effects.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:57:02 +0000 (14:57 -0700)]
mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
This patch initalises all low memory struct pages and 2G of the highest
zone on each node during memory initialisation if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. That config option cannot be set
but will be available in a later patch. Parallel initialisation of struct
page depends on some features from memory hotplug and it is necessary to
alter alter section annotations.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:56:59 +0000 (14:56 -0700)]
mm: meminit: inline some helper functions
early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
unnecessarily visible outside memory initialisation. As well as
unnecessary visibility, it's unnecessary function call overhead when
initialising pages. This patch moves the helpers inline.
[akpm@linux-foundation.org: fix build]
[mhocko@suse.cz: fix build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:56:55 +0000 (14:56 -0700)]
mm: meminit: make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid
__early_pfn_to_nid() use static variables to cache recent lookups as
memblock lookups are very expensive but it assumes that memory
initialisation is single-threaded. Parallel initialisation of struct
pages will break that assumption so this patch makes __early_pfn_to_nid()
SMP-safe by requiring the caller to cache recent search information.
early_pfn_to_nid() keeps the same interface but is only safe to use early
in boot due to the use of a global static variable. meminit_pfn_in_nid()
is an SMP-safe version that callers must maintain their own state for.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 30 Jun 2015 21:56:52 +0000 (14:56 -0700)]
mm: page_alloc: pass PFN to __free_pages_bootmem
__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised. Parallel initialisation
of struct pages defers initialisation and __free_pages_bootmem can be
called for struct pages that cannot yet map struct page to PFN. This
patch passes PFN to __free_pages_bootmem with no other functional change.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Zimmer [Tue, 30 Jun 2015 21:56:48 +0000 (14:56 -0700)]
mm: meminit: only set page reserved in the memblock region
Currently each page struct is set as reserved upon initialization. This
patch leaves the reserved bit clear and only sets the reserved bit when it
is known the memory was allocated by the bootmem allocator. This makes it
easier to distinguish between uninitialised struct pages and reserved
struct pages in later patches.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Tue, 30 Jun 2015 21:56:45 +0000 (14:56 -0700)]
mm: meminit: move page initialization into a separate function
Currently, memmap_init_zone() has all the smarts for initializing a single
page. A subset of this is required for parallel page initialisation and
so this patch breaks up the monolithic function in preparation.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Tue, 30 Jun 2015 21:56:41 +0000 (14:56 -0700)]
memblock: introduce a for_each_reserved_mem_region iterator
Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time ago
to defer initialisation until they were first used. This was rejected on
the grounds it should not be necessary to hurt the fast paths. This series
reuses much of the work from that time but defers the initialisation of
memory to kswapd so that one thread per node initialises memory local to
that node.
After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine
[ 7.383764] kswapd 0 initialised deferred memory in 188ms
[ 7.404253] kswapd 1 initialised deferred memory in 208ms
[ 7.411044] kswapd 3 initialised deferred memory in 216ms
[ 7.411551] kswapd 2 initialised deferred memory in 216ms
On a 1TB machine, I see
[ 8.406511] kswapd 3 initialised deferred memory in 1116ms
[ 8.428518] kswapd 1 initialised deferred memory in 1140ms
[ 8.435977] kswapd 0 initialised deferred memory in 1148ms
[ 8.437416] kswapd 2 initialised deferred memory in 1148ms
Once booted the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again. In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 16 seconds.
Nate Zimmer said:
: On an older 8 TB box with lots and lots of cpus the boot time, as
: measure from grub to login prompt, the boot time improved from 1484
: seconds to exactly 1000 seconds.
Waiman Long said:
: I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. From
: grub menu to ssh login, the bootup time was 453s before the patch and 265s
: after the patch - a saving of 188s (42%).
Daniel Blueman said:
: On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing
: stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction with
: this patchset. Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350)
: drops this to 1045s.
This patch (of 13):
As part of initializing struct page's in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), there was nothing which had forced the
reserved/allocated 4KiB pages to be initialized.
This helper function will be used for that expansion.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nate Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 29 Jun 2015 18:10:56 +0000 (11:10 -0700)]
Merge tag 'md/4.2' of git://neil.brown.name/md
Pull md updates from Neil Brown:
"A mixed bag
- a few bug fixes
- some performance improvement that decrease lock contention
- some clean-up
Nothing major"
* tag 'md/4.2' of git://neil.brown.name/md:
md: clear Blocked flag on failed devices when array is read-only.
md: unlock mddev_lock on an error path.
md: clear mddev->private when it has been freed.
md: fix a build warning
md/raid5: ignore released_stripes check
md/raid5: per hash value and exclusive wait_for_stripe
md/raid5: split wait_for_stripe and introduce wait_for_quiescent
wait: introduce wait_event_exclusive_cmd
md: convert to kstrto*()
md/raid10: make sync_request_write() call bio_copy_data()
Christoph Lameter [Mon, 29 Jun 2015 14:28:08 +0000 (09:28 -0500)]
Fix kmalloc slab creation sequence
This patch restores the slab creation sequence that was broken by commit
4066c33d0308f8 and also reverts the portions that introduced the
KMALLOC_LOOP_XXX macros. Those can never really work since the slab creation
is much more complex than just going from a minimum to a maximum number.
The latest upstream kernel boots cleanly on my machine with a 64 bit x86
configuration under KVM using either SLAB or SLUB.
Fixes:
4066c33d0308f8 ("support the slub_debug boot option")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 29 Jun 2015 17:34:42 +0000 (10:34 -0700)]
Merge tag 'libnvdimm-for-4.2' of git://git./linux/kernel/git/djbw/nvdimm
Pull libnvdimm subsystem from Dan Williams:
"The libnvdimm sub-system introduces, in addition to the
libnvdimm-core, 4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory
devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
Interface table).
After registering NVDIMMs the NFIT driver then registers "region"
devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block
device (disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of
persistent memory address ranges is re-worked to drive
PMEM-namespaces emitted by the libnvdimm-core.
In this update the PMEM driver, on x86, gains the ability to assert
that writes to persistent memory have been flushed all the way
through the caches and buffers in the platform to persistent media.
See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through
"Block Data Windows" as defined by the NFIT. The primary difference
of this driver to PMEM is that only a small window of persistent
memory is mapped into system address space at any given point in
time.
Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
different portions of the media. BLK-mode, by definition, does not
support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).
The sinister aspect of sector tearing is that most applications do
not know they have a atomic sector dependency. At least today's
disk's rarely ever tear sectors and if they do one almost certainly
gets a CRC error on access. NVDIMMs will always tear and always
silently. Until an application is audited to be robust in the
presence of sector-tearing the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore"
* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
arch, x86: pmem api for ensuring durability of persistent memory updates
libnvdimm: Add sysfs numa_node to NVDIMM devices
libnvdimm: Set numa_node to NVDIMM devices
acpi: Add acpi_map_pxm_to_online_node()
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
pmem: flag pmem block devices as non-rotational
libnvdimm: enable iostat
pmem: make_request cleanups
libnvdimm, pmem: fix up max_hw_sectors
libnvdimm, blk: add support for blk integrity
libnvdimm, btt: add support for blk integrity
fs/block_dev.c: skip rw_page if bdev has integrity
libnvdimm: Non-Volatile Devices
tools/testing/nvdimm: libnvdimm unit test infrastructure
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
nd_btt: atomic sector updates
libnvdimm: infrastructure for btt devices
libnvdimm: write blk label set
libnvdimm: write pmem label set
libnvdimm: blk labels and namespace instantiation
...
Linus Torvalds [Mon, 29 Jun 2015 16:44:45 +0000 (09:44 -0700)]
Merge tag 'dmaengine-4.2-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This time we have support for few new devices, few new features and
odd fixes spread thru the subsystem.
New devices added:
- support for CSRatlas7 dma controller
- Allwinner H3(sun8i) controller
- TI DMA crossbar driver on DRA7x
- new pxa driver
New features added:
- memset support is bought back now that we have a user in xdmac controller
- interleaved transfers support different source and destination strides
- supporting DMA routers and configuration thru DT
- support for reusing descriptors
- xdmac memset and interleaved transfer support
- hdmac support for interleaved transfers
- omap-dma support for memcpy
Others:
- Constify platform_device_id
- mv_xor fixes and improvements"
* tag 'dmaengine-4.2-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (46 commits)
dmaengine: xgene: fix file permission
dmaengine: fsl-edma: clear pending interrupts on initialization
dmaengine: xdmac: Add memset support
Documentation: dmaengine: document DMA_CTRL_ACK
dmaengine: virt-dma: don't always free descriptor upon completion
dmaengine: Revert "drivers/dma: remove unused support for MEMSET operations"
dmaengine: hdmac: Implement interleaved transfers
dmaengine: Move icg helpers to global header
dmaengine: mv_xor: improve descriptors list handling and reduce locking
dmaengine: mv_xor: Enlarge descriptor pool size
dmaengine: mv_xor: add support for a38x command in descriptor mode
dmaengine: mv_xor: Rename function for consistent naming
dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
dmaengine: pl330: fix wording in mcbufsz message
dmaengine: sirf: add CSRatlas7 SoC support
dmaengine: xgene-dma: Fix "incorrect type in assignement" warnings
dmaengine: fix kernel-doc documentation
dmaengine: pxa_dma: add support for legacy transition
dmaengine: pxa_dma: add debug information
dmaengine: pxa: add pxa dmaengine driver
...
Linus Torvalds [Mon, 29 Jun 2015 16:25:14 +0000 (09:25 -0700)]
Merge tag 'please-pull-misc-4.2' of git://git./linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"Pair of ia64 cleanups"
* tag 'please-pull-misc-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ia64: Use setup_timer
ia64: export flush_icache_range for module use
Linus Torvalds [Mon, 29 Jun 2015 16:11:10 +0000 (09:11 -0700)]
Merge tag 'linux-kselftest-4.2-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"This update adds two new test suites: futex and seccomp.
In addition, it includes fixes for bugs in timers, other tests, and
compile framework. It introduces new quicktest feature to enable
users to choose to run tests that complete in a short time"
* tag 'linux-kselftest-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: add quicktest support
selftests: add seccomp suite
selftest, x86: fix incorrect comment
tools selftests: Fix 'clean' target with make 3.81
selftests/futex: Add .gitignore
kselftest: Add exit code defines
selftests: Add futex tests to the top-level Makefile
selftests/futex: Increment ksft pass and fail counters
selftests/futex: Update Makefile to use lib.mk
selftests: Add futex functional tests
kselftests: timers: Check _ALARM clockids are supported before suspending
kselftests: timers: Ease alarmtimer-suspend unreasonable latency value
kselftests: timers: Increase delay between suspends in alarmtimer-suspend
selftests/exec: do not install subdir as it is already created
selftests/ftrace: install test.d
selftests: copy TEST_DIRS to INSTALL_PATH
Test compaction of mlocked memory
selftests/mount: output WARN messages when mount test skipped
selftests/timers: Make git ignore all binaries in timers test suite
Linus Torvalds [Sun, 28 Jun 2015 23:52:47 +0000 (16:52 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"Only a couple of small changes.
Improved the m68knommu MAINTAINERS entry to make it clearer which m68k
parts this applies to, and a print format clean up"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: improve m68knommu MAINTAINERS entry
m68k: Use vsprintf %pM extension
Linus Torvalds [Sun, 28 Jun 2015 20:55:08 +0000 (13:55 -0700)]
Merge branch 'for-linus-4.2-rc1' of git://git./linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- remove hppfs ("HonePot ProcFS")
- initial support for musl libc
- uaccess cleanup
- random cleanups and bug fixes all over the place
* 'for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (21 commits)
um: Don't pollute kernel namespace with uapi
um: Include sys/types.h for makedev(), major(), minor()
um: Do not use stdin and stdout identifiers for struct members
um: Do not use __ptr_t type for stack_t's .ss pointer
um: Fix mconsole dependency
um: Handle tracehook_report_syscall_entry() result
um: Remove copy&paste code from init.h
um: Stop abusing __KERNEL__
um: Catch unprotected user memory access
um: Fix warning in setup_signal_stack_si()
um: Rework uaccess code
um: Add uaccess.h to ldt.c
um: Add uaccess.h to syscalls_64.c
um: Add asm/elf.h to vma.c
um: Cleanup mem_32/64.c headers
um: Remove hppfs
um: Move syscall() declaration into os.h
um: kernel: ksyms: Export symbol syscall() for fixing modpost issue
um/os-Linux: Use char[] for syscall_stub declarations
um: Use char[] for linker script address declarations
...
Linus Torvalds [Sun, 28 Jun 2015 19:32:13 +0000 (12:32 -0700)]
Merge tag 'vfio-v4.2-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- fix race with device reference versus driver release (Alex Williamson)
- add reset hooks and Calxeda xgmac reset for vfio-platform (Eric Auger)
- enable vfio-platform for ARM64 (Eric Auger)
- tag Baptiste Reynal as vfio-platform sub-maintainer (Alex Williamson)
* tag 'vfio-v4.2-rc1' of git://github.com/awilliam/linux-vfio:
MAINTAINERS: Add vfio-platform sub-maintainer
VFIO: platform: enable ARM64 build
VFIO: platform: Calxeda xgmac reset module
VFIO: platform: populate the reset function on probe
VFIO: platform: add reset callback
VFIO: platform: add reset struct and lookup table
vfio/pci: Fix racy vfio_device_get_from_dev() call
Linus Torvalds [Sat, 27 Jun 2015 20:53:16 +0000 (13:53 -0700)]
Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
"Four small audit patches for v4.2, all bug fixes. Only 10 lines of
change this time so very unremarkable, the patch subject lines pretty
much tell the whole story"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: Fix check of return value of strnlen_user()
audit: obsolete audit_context check is removed in audit_filter_rules()
audit: fix for typo in comment to function audit_log_link_denied()
lsm: rename duplicate labels in LSM_AUDIT_DATA_TASK audit message type
Linus Torvalds [Sat, 27 Jun 2015 20:26:03 +0000 (13:26 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"The main change in this kernel is Casey's generalized LSM stacking
work, which removes the hard-coding of Capabilities and Yama stacking,
allowing multiple arbitrary "small" LSMs to be stacked with a default
monolithic module (e.g. SELinux, Smack, AppArmor).
See
https://lwn.net/Articles/636056/
This will allow smaller, simpler LSMs to be incorporated into the
mainline kernel and arbitrarily stacked by users. Also, this is a
useful cleanup of the LSM code in its own right"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add()
vTPM: set virtual device before passing to ibmvtpm_reset_crq
tpm_ibmvtpm: remove unneccessary message level.
ima: update builtin policies
ima: extend "mask" policy matching support
ima: add support for new "euid" policy condition
ima: fix ima_show_template_data_ascii()
Smack: freeing an error pointer in smk_write_revoke_subj()
selinux: fix setting of security labels on NFS
selinux: Remove unused permission definitions
selinux: enable genfscon labeling for sysfs and pstore files
selinux: enable per-file labeling for debugfs files.
selinux: update netlink socket classes
signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
selinux: Print 'sclass' as string when unrecognized netlink message occurs
Smack: allow multiple labels in onlycap
Smack: fix seq operations in smackfs
ima: pass iint to ima_add_violation()
ima: wrap event related data to the new ima_event_data structure
integrity: add validity checks for 'path' parameter
...
Linus Torvalds [Sat, 27 Jun 2015 19:44:34 +0000 (12:44 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
- Improvements to the tlb_dump code
- KVM fixes
- Add support for appended DTB
- Minor improvements to the R12000 support
- Minor improvements to the R12000 support
- Various platform improvments for BCM47xx
- The usual pile of minor cleanups
- A number of BPF fixes and improvments
- Some improvments to the support for R3000 and DECstations
- Some improvments to the ATH79 platform support
- A major patchset for the JZ4740 SOC adding support for the CI20 platform
- Add support for the Pistachio SOC
- Minor BMIPS/BCM63xx platform support improvments.
- Avoid "SYNC 0" as memory barrier when unlocking spinlocks
- Add support for the XWR-1750 board.
- Paul's __cpuinit/__cpuinitdata cleanups.
- New Malta CPU board support large memory so enable ZONE_DMA32.
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (131 commits)
MIPS: spinlock: Adjust arch_spin_lock back-off time
MIPS: asmmacro: Ensure 64-bit FP registers are used with MSA
MIPS: BCM47xx: Simplify handling SPROM revisions
MIPS: Cobalt Don't use module_init in non-modular MTD registration.
MIPS: BCM47xx: Move NVRAM driver to the drivers/firmware/
MIPS: use for_each_sg()
MIPS: BCM47xx: Don't select BCMA_HOST_PCI
MIPS: BCM47xx: Add helper variable for storing NVRAM length
MIPS: IRQ/IP27: Move IRQ allocation API to platform code.
MIPS: Replace smp_mb with release barrier function in unlocks.
MIPS: i8259: DT support
MIPS: Malta: Basic DT plumbing
MIPS: include errno.h for ENODEV in mips-cm.h
MIPS: Define GCR_GIC_STATUS register fields
MIPS: BPF: Introduce BPF ASM helpers
MIPS: BPF: Use BPF register names to describe the ABI
MIPS: BPF: Move register definition to the BPF header
MIPS: net: BPF: Replace RSIZE with SZREG
MIPS: BPF: Free up some callee-saved registers
MIPS: Xtalk: Update xwidget.h with known Xtalk device numbers
...
Linus Torvalds [Sat, 27 Jun 2015 17:14:39 +0000 (10:14 -0700)]
Merge branch 'for-4.2' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"
* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
nfsd: wrap too long lines in nfsd4_encode_read
nfsd: fput rd_file from XDR encode context
nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
nfsd: refactor nfs4_preprocess_stateid_op
nfsd: clean up raparams handling
nfsd: use swap() in sort_pacl_range()
rpcrdma: Merge svcrdma and xprtrdma modules into one
svcrdma: Add a separate "max data segs macro for svcrdma
svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
svcrdma: Keep rpcrdma_msg fields in network byte-order
svcrdma: Fix byte-swapping in svc_rdma_sendto.c
nfsd: Update callback sequnce id only CB_SEQUENCE success
nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
SUNRPC: Move EXPORT_SYMBOL for svc_process
uapi/nfs: Add NFSv4.1 ACL definitions
nfsd: Remove dead declarations
nfsd: work around a gcc-5.1 warning
nfsd: Checking for acl support does not require fetching any acls
...
Linus Torvalds [Sat, 27 Jun 2015 16:47:46 +0000 (09:47 -0700)]
Merge tag 'gfs2-merge-window' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Bob Peterson:
"Here are the patches we've accumulated for GFS2 for the current
upstream merge window. We have a good mixture this time. Here are
some of the features:
- Fix a problem with RO mounts writing to the journal.
- Further improvements to quotas on GFS2.
- Added support for rename2 and RENAME_EXCHANGE on GFS2.
- Increase performance by making glock lru_list less of a bottleneck.
- Increase performance by avoiding unnecessary buffer_head releases.
- Increase performance by using average glock round trip time from all CPUs.
- Fixes for some compiler warnings and minor white space issues.
- Other misc bug fixes"
* tag 'gfs2-merge-window' of git://git.kernel.org:/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Don't brelse rgrp buffer_heads every allocation
GFS2: Don't add all glocks to the lru
gfs2: Don't support fallocate on jdata files
gfs2: s64 cast for negative quota value
gfs2: limit quota log messages
gfs2: fix quota updates on block boundaries
gfs2: fix shadow warning in gfs2_rbm_find()
gfs2: kerneldoc warning fixes
gfs2: convert simple_str to kstr
GFS2: make sure S_NOSEC flag isn't overwritten
GFS2: add support for rename2 and RENAME_EXCHANGE
gfs2: handle NULL rgd in set_rgrp_preferences
GFS2: inode.c: indent with TABs, not spaces
GFS2: mark the journal idle to fix ro mounts
GFS2: Average in only non-zero round-trip times for congestion stats
GFS2: Use average srttb value in congestion calculations
Linus Torvalds [Sat, 27 Jun 2015 16:41:50 +0000 (09:41 -0700)]
Revert "jbd2: speedup jbd2_journal_dirty_metadata()"
This reverts commit
2143c1965a761332ae417b22fd477b636e4f54ec.
This commit seems to be the cause of the following jbd2 assertion
failure:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1325!
invalid opcode: 0000 [#1] SMP
Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ...
CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679b411 #1
Hardware name: /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
task:
ffff8803bf866040 ti:
ffff880308528000 task.ti:
ffff880308528000
RIP: jbd2_journal_dirty_metadata+0x237/0x290
Call Trace:
__ext4_handle_dirty_metadata+0x43/0x1f0
ext4_handle_dirty_dirent_node+0xde/0x160
? jbd2_journal_get_write_access+0x36/0x50
ext4_delete_entry+0x112/0x160
? __ext4_journal_start_sb+0x52/0xb0
ext4_unlink+0xfa/0x260
vfs_unlink+0xec/0x190
do_unlinkat+0x24a/0x270
SyS_unlink+0x11/0x20
entry_SYSCALL_64_fastpath+0x12/0x6a
---[ end trace
ae033ebde8d080b4 ]---
which is not easily reproducible (I've seen it just once, and then Ted
was able to reproduce it once). Revert it while Ted and Jan try to
figure out what is wrong.
Cc: Jan Kara <jack@suse.cz>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 27 Jun 2015 03:12:21 +0000 (20:12 -0700)]
Merge branch 'for-4.2' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
"Most of the changes are around implementing and fixing fallouts from
sysfs and internal interface to limit the CPUs available to all
unbound workqueues to help isolating CPUs. It needs more work as
ordered workqueues can roam unrestricted but still is a significant
improvement"
* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix typos in comments
workqueue: move flush_scheduled_work() to workqueue.h
workqueue: remove the lock from wq_sysfs_prep_attrs()
workqueue: remove the declaration of copy_workqueue_attrs()
workqueue: ensure attrs changes are properly synchronized
workqueue: separate out and refactor the locking of applying attrs
workqueue: simplify wq_update_unbound_numa()
workqueue: wq_pool_mutex protects the attrs-installation
workqueue: fix a typo
workqueue: function name in the comment differs from the real function name
workqueue: fix trivial typo in Documentation/workqueue.txt
workqueue: Allow modifying low level unbound workqueue cpumask
workqueue: Create low-level unbound workqueues cpumask
workqueue: split apply_workqueue_attrs() into 3 stages
Linus Torvalds [Sat, 27 Jun 2015 02:50:04 +0000 (19:50 -0700)]
Merge branch 'for-4.2' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- threadgroup_lock got reorganized so that its users can pick the
actual locking mechanism to use. Its only user - cgroups - is
updated to use a percpu_rwsem instead of per-process rwsem.
This makes things a bit lighter on hot paths and allows cgroups to
perform and fail multi-task (a process) migrations atomically.
Multi-task migrations are used in several places including the
unified hierarchy.
- Delegation rule and documentation added to unified hierarchy. This
will likely be the last interface update from the cgroup core side
for unified hierarchy before lifting the devel mask.
- Some groundwork for the pids controller which is scheduled to be
merged in the coming devel cycle.
* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add delegation section to unified hierarchy documentation
cgroup: require write perm on common ancestor when moving processes on the default hierarchy
cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
kernfs: make kernfs_get_inode() public
MAINTAINERS: add a cgroup core co-maintainer
cgroup: fix uninitialised iterator in for_each_subsys_which
cgroup: replace explicit ss_mask checking with for_each_subsys_which
cgroup: use bitmask to filter for_each_subsys
cgroup: add seq_file forward declaration for struct cftype
cgroup: simplify threadgroup locking
sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
sched, cgroup: reorganize threadgroup locking
cgroup: switch to unsigned long for bitmasks
cgroup: reorganize include/linux/cgroup.h
cgroup: separate out include/linux/cgroup-defs.h
cgroup: fix some comment typos
Stephen Rothwell [Mon, 25 May 2015 11:00:24 +0000 (21:00 +1000)]
power: axp288_charger: fix for API change
Caused by commit
843735b788a4 ("power: axp288_charger: axp288 charger
driver") from the battery tree interacting with commit
046050f6e623
("extcon: Update the prototype of extcon_register_notifier() with enum
extcon") from the extcon tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 27 Jun 2015 00:40:54 +0000 (17:40 -0700)]
Merge tag 'dma-buf-for-4.2' of git://git./linux/kernel/git/sumits/dma-buf
Pull dma-buf updates from Sumit Semwal:
"Minor changes for 4.2
- add ref-counting for kernel modules as exporters
- minor code style fixes"
* tag 'dma-buf-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
dma-buf: Minor coding style fixes
dma-buf: add ref counting for module as exporter
Linus Torvalds [Fri, 26 Jun 2015 22:59:26 +0000 (15:59 -0700)]
Merge tag 'usb-4.2-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB updates from Greg KH:
"Here's the big USB patchset for 4.2-rc1. As is normal these days, the
majority of changes are in the gadget drivers, with a bunch of other
small driver changes.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (175 commits)
usb: dwc3: Use ASCII space in Kconfig
usb: chipidea: add work-around for Marvell HSIC PHY startup
usb: chipidea: allow multiple instances to use default ci_default_pdata
dt-bindings: Consolidate ChipIdea USB ci13xxx bindings
phy: add Marvell HSIC 28nm PHY
phy: Add Marvell USB 2.0 OTG 28nm PHY
dt-bindings: Add Marvell PXA1928 USB and HSIC PHY bindings
USB: ssb: use devm_kzalloc
USB: ssb: fix error handling in ssb_hcd_create_pdev()
usb: isp1760: check for null return from kzalloc
cdc-acm: Add support of ATOL FPrint fiscal printers
usb: chipidea: usbmisc_imx: Remove unneeded semicolon
USB: usbtmc: add device quirk for Rigol DS6104
USB: serial: mos7840: Use setup_timer
phy: twl4030-usb: add ABI documentation
phy: twl4030-usb: remove incorrect pm_runtime_get_sync() in probe function.
phy: twl4030-usb: remove pointless 'suspended' test in 'suspend' callback.
phy: twl4030-usb: make runtime pm more reliable.
drivers:usb:fsl: Fix compilation error for fsl ehci drv
usb: renesas_usbhs: Don't disable the pipe if Control write status stage
...
Linus Torvalds [Fri, 26 Jun 2015 22:53:22 +0000 (15:53 -0700)]
Merge tag 'tty-4.2-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here's the tty and serial driver patches for 4.2-rc1.
A number of individual driver updates, some code cleanups, and other
minor things, full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'tty-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (152 commits)
Doc: serial-rs485.txt: update RS485 driver interface
Doc: tty.txt: remove mention of the BKL
MAINTAINERS: tty: add serial docs directory
serial: sprd: check for NULL after calling devm_clk_get
serial: 8250_pci: Correct uartclk for xr17v35x expansion chips
serial: 8250_pci: Add support for 12 port Exar boards
serial: 8250_uniphier: add bindings document for UniPhier UART
serial: core: cleanup in uart_get_baud_rate()
serial: stm32-usart: Add STM32 USART Driver
tty/serial: kill off set_irq_flags usage
tty: move linux/gsmmux.h to uapi
doc: dt: add documentation for nxp,lpc1850-uart
serial: 8250: add LPC18xx/43xx UART driver
serial: 8250_uniphier: add UniPhier serial driver
serial: 8250_dw: support ACPI platforms with integrated DMA engine
serial: of_serial: check the return value of clk_prepare_enable()
serial: of_serial: use devm_clk_get() instead of clk_get()
serial: earlycon: Add support for big-endian MMIO accesses
serial: sirf: use hrtimer for data rx
serial: sirf: correct the fifo empty_bit
...
Linus Torvalds [Fri, 26 Jun 2015 22:46:08 +0000 (15:46 -0700)]
Merge tag 'staging-4.2-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here's the big, really big, staging tree patches for 4.2-rc1.
Loads of stuff in here, almost all just coding style fixes / churn,
and a few new drivers as well, one of which I just disabled from the
build a few minutes ago due to way too many build warnings.
Other than the one "disable this driver" patch, all of these have been
in linux-next for quite a while with no reported issues"
* tag 'staging-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1163 commits)
staging: wilc1000: disable driver due to build warnings
Staging: rts5208: fix CHANGE_LINK_STATE value
Staging: sm750fb: ddk750_swi2c.c: Insert spaces before parenthesis
Staging: sm750fb: ddk750_swi2c.c: Place braces on correct lines
Staging: sm750fb: ddk750_swi2c.c: Insert spaces around operators
Staging: sm750fb: ddk750_swi2c.c: Replace spaces with tabs
Staging: sm750fb: ddk750_swi2c.h: Shorten lines to under 80 characters
Staging: sm750fb: ddk750_swi2c.h: Replace spaces with tabs
Staging: sm750fb: modedb.h: Shorten lines to under 80 characters
Staging: sm750fb: modedb.h: Replace spaces with tabs
staging: comedi: addi_apci_3120: rename 'this_board' variables
staging: comedi: addi_apci_1516: rename 'this_board' variables
staging: comedi: ni_atmio: cleanup ni_getboardtype()
staging: comedi: vmk80xx: sanity check context used to get the boardinfo
staging: comedi: vmk80xx: rename 'boardinfo' variables
staging: comedi: dt3000: rename 'this_board' variables
staging: comedi: adv_pci_dio: rename 'this_board' variables
staging: comedi: cb_pcidas64: rename 'thisboard' variables
staging: comedi: cb_pcidas: rename 'thisboard' variables
staging: comedi: me4000: rename 'thisboard' variables
...
Linus Torvalds [Fri, 26 Jun 2015 22:07:37 +0000 (15:07 -0700)]
Merge tag 'driver-core-4.2-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the driver core / firmware changes for 4.2-rc1.
A number of small changes all over the place in the driver core, and
in the firmware subsystem. Nothing really major, full details in the
shortlog. Some of it is a bit of churn, given that the platform
driver probing changes was found to not work well, so they were
reverted.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
Revert "base/platform: Only insert MEM and IO resources"
Revert "base/platform: Continue on insert_resource() error"
Revert "of/platform: Use platform_device interface"
Revert "base/platform: Remove code duplication"
firmware: add missing kfree for work on async call
fs: sysfs: don't pass count == 0 to bin file readers
base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
base/platform: Remove code duplication
of/platform: Use platform_device interface
base/platform: Continue on insert_resource() error
base/platform: Only insert MEM and IO resources
firmware: use const for remaining firmware names
firmware: fix possible use after free on name on asynchronous request
firmware: check for file truncation on direct firmware loading
firmware: fix __getname() missing failure check
drivers: of/base: move of_init to driver_init
drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
sysfs: disambiguate between "error code" and "failure" in comments
driver-core: fix build for !CONFIG_MODULES
driver-core: make __device_attach() static
...
Linus Torvalds [Fri, 26 Jun 2015 21:51:15 +0000 (14:51 -0700)]
Merge tag 'char-misc-4.2-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here's the big char/misc driver pull request for 4.2-rc1.
Lots of mei, extcon, coresight, uio, mic, and other driver updates in
here. Full details in the shortlog. All of these have been in
linux-next for some time with no reported problems"
* tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (176 commits)
mei: me: wait for power gating exit confirmation
mei: reset flow control on the last client disconnection
MAINTAINERS: mei: add mei_cl_bus.h to maintained file list
misc: sram: sort and clean up included headers
misc: sram: move reserved block logic out of probe function
misc: sram: add private struct device and virt_base members
misc: sram: report correct SRAM pool size
misc: sram: bump error message level on unclean driver unbinding
misc: sram: fix device node reference leak on error
misc: sram: fix enabled clock leak on error path
misc: mic: Fix reported static checker warning
misc: mic: Fix randconfig build error by including errno.h
uio: pruss: Drop depends on ARCH_DAVINCI_DA850 from config
uio: pruss: Add CONFIG_HAS_IOMEM dependence
uio: pruss: Include <linux/sizes.h>
extcon: Redefine the unique id of supported external connectors without 'enum extcon' type
char:xilinx_hwicap:buffer_icap - change 1/0 to true/false for bool type variable in function buffer_icap_set_configuration().
Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
parport: check exclusive access before register
w1: use correct lock on error in w1_seq_show()
...
Linus Torvalds [Fri, 26 Jun 2015 21:02:43 +0000 (14:02 -0700)]
Merge tag 'trace-v4.2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This patch series contains several clean ups and even a new trace
clock "monitonic raw". Also some enhancements to make the ring buffer
even faster. But the biggest and most noticeable change is the
renaming of the ftrace* files, structures and variables that have to
deal with trace events.
Over the years I've had several developers tell me about their
confusion with what ftrace is compared to events. Technically,
"ftrace" is the infrastructure to do the function hooks, which include
tracing and also helps with live kernel patching. But the trace
events are a separate entity altogether, and the files that affect the
trace events should not be named "ftrace". These include:
include/trace/ftrace.h -> include/trace/trace_events.h
include/linux/ftrace_event.h -> include/linux/trace_events.h
Also, functions that are specific for trace events have also been renamed:
ftrace_print_*() -> trace_print_*()
(un)register_ftrace_event() -> (un)register_trace_event()
ftrace_event_name() -> trace_event_name()
ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
ftrace_define_fields_##call() -> trace_define_fields_##call()
ftrace_get_offsets_##call() -> trace_get_offsets_##call()
Structures have been renamed:
ftrace_event_file -> trace_event_file
ftrace_event_{call,class} -> trace_event_{call,class}
ftrace_event_buffer -> trace_event_buffer
ftrace_subsystem_dir -> trace_subsystem_dir
ftrace_event_raw_##call -> trace_event_raw_##call
ftrace_event_data_offset_##call-> trace_event_data_offset_##call
ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call
And a few various variables and flags have also been updated.
This has been sitting in linux-next for some time, and I have not
heard a single complaint about this rename breaking anything. Mostly
because these functions, variables and structures are mostly internal
to the tracing system and are seldom (if ever) used by anything
external to that"
* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
ring_buffer: Allow to exit the ring buffer benchmark immediately
ring-buffer-benchmark: Fix the wrong type
ring-buffer-benchmark: Fix the wrong param in module_param
ring-buffer: Add enum names for the context levels
ring-buffer: Remove useless unused tracing_off_permanent()
ring-buffer: Give NMIs a chance to lock the reader_lock
ring-buffer: Add trace_recursive checks to ring_buffer_write()
ring-buffer: Allways do the trace_recursive checks
ring-buffer: Move recursive check to per_cpu descriptor
ring-buffer: Add unlikelys to make fast path the default
tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
tracing: Rename ftrace_event_name() to trace_event_name()
tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
...
Linus Torvalds [Fri, 26 Jun 2015 20:56:55 +0000 (13:56 -0700)]
Merge tag 'trace-fixes-4.1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This isn't my 4.2 pull request (yet). I found a few more bugs that I
would have sent to fix 4.1, but since 4.1 is already out, I'm sending
this before sending my 4.2 request (which is ready to go).
After fixing the previous filter issue reported by Vince Weaver, I
could not come up with a situation where the operand counter (cnt)
could go below zero, so I added a WARN_ON_ONCE(cnt < 0). Vince was
able to trigger that warn on with his fuzzer test, but didn't have a
filter input that caused it.
Later, Sasha Levin was able to trigger that same warning, and was able
to give me the filter string that triggered it. It was simply a
single operation ">".
I wrapped the filtering code in a userspace program such that I could
single step through the logic. With a single operator the operand
counter can legitimately go below zero, and should be reported to the
user as an error, but should not produce a kernel warning. The
WARN_ON_ONCE(cnt < 0) should be just a "if (cnt < 0) break;" and the
code following it will produce the error message for the user.
While debugging this, I found that there was another bug that let the
pointer to the filter string go beyond the filter string. This too
was fixed.
Finally, there was a typo in a stub function that only gets compiled
if trace events is disabled but tracing is enabled (I'm not even sure
that's possible)"
* tag 'trace-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix typo from "static inlin" to "static inline"
tracing/filter: Do not allow infix to exceed end of string
tracing/filter: Do not WARN on operand count going below zero
Linus Torvalds [Fri, 26 Jun 2015 20:18:51 +0000 (13:18 -0700)]
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.2.
I've one other new driver from freescale on my radar, it's been posted
and reviewed, I'd just like to get someone to give it a last look, so
maybe I'll send it or maybe I'll leave it.
There is no major nouveau changes in here, Ben was working on
something big, and we agreed it was a bit late, there wasn't anything
else he considered urgent to merge.
There might be another msm pull for some bits that are waiting on
arm-soc, I'll see how we time it.
This touches some "of" stuff, acks are in place except for the fixes
to the build in various configs,t hat I just applied.
Summary:
New drivers:
- virtio-gpu:
KMS only pieces of driver for virtio-gpu in qemu.
This is just the first part of this driver, enough to run
unaccelerated userspace on. As qemu merges more we'll start
adding the 3D features for the virgl 3d work.
- amdgpu:
a new driver from AMD to driver their newer GPUs. (VI+)
It contains a new cleaner userspace API, and is a clean
break from radeon moving forward, that AMD are going to
concentrate on. It also contains a set of register headers
auto generated from AMD internal database.
core:
- atomic modesetting API completed, enabled by default now.
- Add support for mode_id blob to atomic ioctl to complete interface.
- bunch of Displayport MST fixes
- lots of misc fixes.
panel:
- new simple panels
- fix some long-standing build issues with bridge drivers
radeon:
- VCE1 support
- add a GPU reset counter for userspace
- lots of fixes.
amdkfd:
- H/W debugger support module
- static user-mode queues
- support killing all the waves when a process terminates
- use standard DECLARE_BITMAP
i915:
- Add Broxton support
- S3, rotation support for Skylake
- RPS booting tuning
- CPT modeset sequence fixes
- ns2501 dither support
- enable cmd parser on haswell
- cdclk handling fixes
- gen8 dynamic pte allocation
- lots of atomic conversion work
exynos:
- Add atomic modesetting support
- Add iommu support
- Consolidate drm driver initialization
- and MIC, DECON and MIPI-DSI support for exynos5433
omapdrm:
- atomic modesetting support (fixes lots of things in rewrite)
tegra:
- DP aux transaction fixes
- iommu support fix
msm:
- adreno a306 support
- various dsi bits
- various 64-bit fixes
- NV12MT support
rcar-du:
- atomic and misc fixes
sti:
- fix HDMI timing complaince
tilcdc:
- use drm component API to access tda998x driver
- fix module unloading
qxl:
- stability fixes"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits)
drm/nouveau: Pause between setting gpu to D3hot and cutting the power
drm/dp/mst: close deadlock in connector destruction.
drm: Always enable atomic API
drm/vgem: Set unique to "vgem"
of: fix a build error to of_graph_get_endpoint_by_regs function
drm/dp/mst: take lock around looking up the branch device on hpd irq
drm/dp/mst: make sure mst_primary mstb is valid in work function
of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs
ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi'
drm/atomic: Don't set crtc_state->enable manually
drm/exynos: dsi: do not set TE GPIO direction by input
drm/exynos: dsi: add support for MIC driver as a bridge
drm/exynos: dsi: add support for Exynos5433
drm/exynos: dsi: make use of array for clock access
drm/exynos: dsi: make use of driver data for static values
drm/exynos: dsi: add macros for register access
drm/exynos: dsi: rename pll_clk to sclk_clk
drm/exynos: mic: add MIC driver
of: add helper for getting endpoint node of specific identifiers
drm/exynos: add Exynos5433 decon driver
...
Linus Torvalds [Fri, 26 Jun 2015 19:35:01 +0000 (12:35 -0700)]
Merge tag 'dm-4.2-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Apologies for not pressing this request-based DM partial completion
issue further, it was an oversight on my part. We'll have to get it
fixed up properly and revisit for a future release.
- Revert block and DM core changes the removed request-based DM's
ability to handle partial request completions -- otherwise with the
current SCSI LLDs these changes could lead to silent data
corruption.
- Fix two DM version bumps that were missing from the initial 4.2 DM
pull request (enabled userspace lvm2 to know certain changes have
been made)"
* tag 'dm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: fix "default" version to be 1.4.0
dm: bump the ioctl version to 4.32.0
Revert "block, dm: don't copy bios for request clones"
Revert "dm: do not allocate any mempools for blk-mq request-based DM"
Linus Torvalds [Fri, 26 Jun 2015 19:27:48 +0000 (12:27 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- move -O0 jitterentropy code into its own file instead of using gcc
pragma magic.
- kill testmgr warning for gcm-aes-aesni.
- fix build failure in old rsa.
Other minor fixes:
- ignore asn1 files generated by new rsa.
- remove unnecessary kzfree NULL checks in jitterentropy.
- typo fix in akcipher"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rsa - add .gitignore for crypto/*.-asn1.[ch] files
crypto: asymmetric_keys/rsa - Use non-conflicting variable name
crypto: testmgr - don't print info about missing test for gcm-aes-aesni
crypto: jitterentropy - Delete unnecessary checks before the function call "kzfree"
crypto: akcipher - fix spelling cihper -> cipher
crypto: jitterentropy - avoid compiler warnings
Linus Torvalds [Fri, 26 Jun 2015 19:20:00 +0000 (12:20 -0700)]
Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
"Bigger items included in this update are:
- A series of updates from Arnd for ARM randconfig build failures
- Updates from Dmitry for StrongARM SA-1100 to move IRQ handling to
drivers/irqchip/
- Move ARMs SP804 timer to drivers/clocksource/
- Perf updates from Mark Rutland in preparation to move the ARM perf
code into drivers/ so it can be shared with ARM64.
- MCPM updates from Nicolas
- Add support for taking platform serial number from DT
- Re-implement Keystone2 physical address space switch to conform to
architecture requirements
- Clean up ARMv7 LPAE code, which goes in hand with the Keystone2
changes.
- L2C cleanups to avoid unlocking caches if we're prevented by the
secure support to unlock.
- Avoid cleaning a potentially dirty cache containing stale data on
CPU initialisation
- Add ARM-only entry point for secondary startup (for machines that
can only call into a Thumb kernel in ARM mode). Same thing is also
done for the resume entry point.
- Provide arch_irqs_disabled via asm-generic
- Enlarge ARMv7M vector table
- Always use BFD linker for VDSO, as gold doesn't accept some of the
options we need.
- Fix an incorrect BSYM (for Thumb symbols) usage, and convert all
BSYM compiler macros to a "badr" (for branch address).
- Shut up compiler warnings provoked by our cmpxchg() implementation.
- Ensure bad xchg sizes fail to link"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (75 commits)
ARM: Fix build if CLKDEV_LOOKUP is not configured
ARM: fix new BSYM() usage introduced via for-arm-soc branch
ARM: 8383/1: nommu: avoid deprecated source register on mov
ARM: 8391/1: l2c: add options to overwrite prefetching behavior
ARM: 8390/1: irqflags: Get arch_irqs_disabled from asm-generic
ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap
ARM: 8388/1: tcm: Don't crash when TCM banks are protected by TrustZone
ARM: 8384/1: VDSO: force use of BFD linker
ARM: 8385/1: VDSO: group link options
ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
ARM: remove __bad_xchg definition
ARM: 8369/1: ARMv7M: define size of vector table for Vybrid
ARM: 8382/1: clocksource: make ARM_TIMER_SP804 depend on GENERIC_SCHED_CLOCK
ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource
ARM: 8365/1: introduce sp804_timer_disable and remove arm_timer.h inclusion
ARM: 8364/1: fix BE32 module loading
ARM: 8360/1: add secondary_startup_arm prototype in header file
ARM: 8359/1: correct secondary_startup_arm mode
ARM: proc-v7: sanitise and document registers around errata
ARM: proc-v7: clean up MIDR access
...
Linus Torvalds [Fri, 26 Jun 2015 19:11:23 +0000 (12:11 -0700)]
Merge tag 'armsoc-defconfig' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC defconfig updates from Kevin Hilman:
"We keep collecting defconfig updates in a separate branch mostly to
encourage people to handle them separately and avoid conflicts between
different topics.
Most of these are enablement of new SoCs, boards or drivers that have
come in, or minor config refreshes due to reorderings in Kconfig
files, etc. I.e. mostly minor churn of various kinds"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (55 commits)
ARM: multi_v7_defconfig: remove duplicate CONFIG_COMMON_CLK_QCOM=y
ARM: multi_v7_defconfig: Enable display on Trats2 board
ARM64: add GPIO keys to the defconfig
ARM: keystone: defconfig: enable netcp driver by default
ARM: exynos_defconfig: Enable CONFIG_SENSORS_INA2XX for Odroid-XU3
ARM: exynos_defconfig: Enable CONFIG_SENSORS_PWM_FAN for Odroid-XU3
ARM: omap2plus_defconfig: Enable TOUCHSCREEN_PIXCIR
ARM: omap2plus_defconfig: Add dm816x USB PHY as a loadable module
ARM: omap2plus_defconifg: Enable DM9000 in omap2plus_defconfig
ARM: lpc18xx: remove DEBUG_LL_UART_8250 from defconfig
ARM: multi_v7_defconfig: Make media support modular
ARM: multi_v7_defconfig: Make sound support modular
ARM: multi_v7_defconfig: Enable shmobile r8a7778/bockw platform
ARM: exynos_defconfig: savedefconfig
ARM: exynos_defconfig: Enable display on Trats2 board
ARM: multi_v7_defconfig: Enable OHCI on exynos SoCs
ARM: multi_v7_defconfig: Enable TMU for exynos SoCs
ARM: multi_v7_defconfig: Enable PMIC and MUIC drivers for exynos
ARM: multi_v7_defconfig: Enable CPU idle for exynos SoCs
ARM: multi_v7_defconfig: Enable Cypress APA I2C Trackpad support
...
Greg Kroah-Hartman [Fri, 26 Jun 2015 19:04:47 +0000 (12:04 -0700)]
staging: wilc1000: disable driver due to build warnings
The wilc1000 has just too many build warnings to be able to enable it
for the 4.2 release. Given that there have not been any patches
submitted to properly fix these obvious errors, I'm going to disable it
for now. I will enable it back when the build warning fixes are
submitted, or, if that never happens, I will remove it from the tree.
Cc: Johnny Kim <johnny.kim@atmel.com>
Cc: Rachel Kim <rachel.kim@atmel.com>
Cc: Dean Lee <dean.lee@atmel.com>
Cc: Chris Park <chris.park@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Fri, 26 Jun 2015 18:54:29 +0000 (11:54 -0700)]
Merge tag 'armsoc-drivers' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Kevin Hilman:
"Some of these are for drivers/soc, where we're now putting
SoC-specific drivers these days. Some are for other driver subsystems
where we have received acks from the appropriate maintainers.
Some highlights:
- simple-mfd: document DT bindings and misc updates
- migrate mach-berlin to simple-mfd for clock, pinctrl and reset
- memory: support for Tegra132 SoC
- memory: introduce tegra EMC driver for scaling memory frequency
- misc. updates for ARM CCI and CCN busses"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (48 commits)
drivers: soc: sunxi: Introduce SoC driver to map SRAMs
arm-cci: Add aliases for PMU events
arm-cci: Add CCI-500 PMU support
arm-cci: Sanitise CCI400 PMU driver specific code
arm-cci: Abstract handling for CCI events
arm-cci: Abstract out the PMU counter details
arm-cci: Cleanup PMU driver code
arm-cci: Do not enable CCI-400 PMU by default
firmware: qcom: scm: Add HDCP Support
ARM: berlin: add an ADC node for the BG2Q
ARM: berlin: remove useless chip and system ctrl compatibles
clk: berlin: drop direct of_iomap of nodes reg property
ARM: berlin: move BG2Q clock node
ARM: berlin: move BG2CD clock node
ARM: berlin: move BG2 clock node
clk: berlin: prepare simple-mfd conversion
pinctrl: berlin: drop SoC stub provided regmap
ARM: berlin: move pinctrl to simple-mfd nodes
pinctrl: berlin: prepare to use regmap provided by syscon
reset: berlin: drop arch_initcall initialization
...
Linus Torvalds [Fri, 26 Jun 2015 18:43:59 +0000 (11:43 -0700)]
Merge tag 'armsoc-dt' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC DT updates from Kevin Hilman:
"As usual, quite a few device-tree updates in ARM land. There was one
minor churn in DTs due to relicensing under a dual-license, and lots
of little additions of new peripherals, features etc, but nothing
really exciting to call to your attention. Some higlights, focsuing
on support for new SoCs and boards:
- AT91: new boards: Overkiz, Acme Systems' Arietta G25
- tegra: HDA support
- bcm: new platforms: Buffalo WXR-1900DHP, SmartRG SR400ac, ASUS
RT-AC87U
- mvebu: new platforms: Compulab CM-A510, Armada 385-based Linksys
boards, DLink DNS-327L
- OMAP: new platforms: Baltos IR5221, LogicPD Torpedo, Toby-Churchill
SL50
- ARM: added support for Juno r1 board
- sunxi: A33 SoC support; new boards: A23 EVB, SinA33, GA10H-A33,
Mele A1000G
- imx: i.MX7D SoC support; new boards: Armadeus Systems APF6,
Gateworks GW5510, and aristainetos2 boards
- hisilicon: hi6220 SoC support; new boards: 96boards hikey"
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (462 commits)
ARM: hisi: revert changes from hisi/hip04-dt branch
ARM: nomadik: set proper compatible for accelerometer
ARM64: juno: add GPIO keys
ARM: at91/dt: sama5d4: fix dma conf for aes, sha and tdes nodes
ARM: dts: Introduce STM32F429 MCU
ARM: socfpga: dts: enable ethernet for Arria10 devkit
ARM: dts: k2l: fix the netcp range size
ARM: dts: k2e: fix the netcp range size
ARM: dts: k2hk: fix the netcp range size
ARM: dts: k2l-evm: Add device bindings for netcp driver
ARM: dts: k2e-evm: Add device bindings for netcp driver
ARM: dts: k2hk-evm: Add device bindings for netcp driver
ARM: BCM5301X: Add DT for Asus RT-AC87U
ARM: BCM5301X: add IRQ numbers for PCIe controller
ARM: BCM5301X: add NAND flash chip description
arm64: dts: Add dts files for Hisilicon Hi6220 SoC
clk: hi6220: Document devicetree bindings for hi6220 clock
arm64: hi6220: Document devicetree bindings for Hisilicon hi6220 SoC
ARM: at91/dt: sama5d4ek: mci0 uses slot 0
ARM: at91/dt: kizbox: fix mismatch LED PWM device
...
Linus Torvalds [Fri, 26 Jun 2015 18:34:35 +0000 (11:34 -0700)]
Merge tag 'armsoc-soc' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC platform support updates from Kevin Hilman:
"Our SoC branch usually contains expanded support for new SoCs and
other core platform code. Some highlights from this round:
- sunxi: SMP support for A23 SoC
- socpga: big-endian support
- pxa: conversion to common clock framework
- bcm: SMP support for BCM63138
- imx: support new I.MX7D SoC
- zte: basic support for ZX296702 SoC"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (134 commits)
ARM: zx: Add basic defconfig support for ZX296702
ARM: dts: zx: add an initial zx296702 dts and doc
clk: zx: add clock support to zx296702
dt-bindings: Add #defines for ZTE ZX296702 clocks
ARM: socfpga: fix build error due to secondary_startup
MAINTAINERS: ARM64: EXYNOS: Extend entry for ARM64 DTS
ARM: ep93xx: simone: support for SPI-based MMC/SD cards
MAINTAINERS: update Shawn's email to use kernel.org one
ARM: socfpga: support suspend to ram
ARM: socfpga: add CPU_METHOD_OF_DECLARE for Arria 10
ARM: socfpga: use CPU_METHOD_OF_DECLARE for socfpga_cyclone5
ARM: EXYNOS: register power domain driver from core_initcall
ARM: EXYNOS: use PS_HOLD based poweroff for all supported SoCs
ARM: SAMSUNG: Constify platform_device_id
ARM: EXYNOS: Constify irq_domain_ops
ARM: EXYNOS: add coupled cpuidle support for Exynos3250
ARM: EXYNOS: add exynos_get_boot_addr() helper
ARM: EXYNOS: add exynos_set_boot_addr() helper
ARM: EXYNOS: make exynos_core_restart() less verbose
ARM: EXYNOS: fix exynos_boot_secondary() return value on timeout
...
Linus Torvalds [Fri, 26 Jun 2015 18:08:27 +0000 (11:08 -0700)]
Merge tag 'armsoc-cleanup' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Kevin Hilman:
"A relatively small setup of cleanups this time around, and similar to
last time the bulk of it is removal of legacy board support:
- OMAP: removal of legacy (non-DT) booting for several platforms
- i.MX: remove some legacy board files"
* tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (36 commits)
ARM: fix EFM32 build breakage caused by cpu_resume_arm
ARM: 8389/1: Add cpu_resume_arm() for firmwares that resume in ARM state
ARM: v7 setup function should invalidate L1 cache
mach-omap2: Remove use of deprecated marco, PTR_RET in devices.c
ARM: OMAP2+: Remove calls to deprecacted marco,PTR_RET in the files,fb.c and pmu.c
ARM: OMAP2+: Constify irq_domain_ops
ARM: OMAP2+: use symbolic defines for console loglevels instead of numbers
ARM: at91: remove useless Makefile.boot
ARM: at91: remove at91rm9200_sdramc.h
ARM: at91: remove mach/at91_ramc.h and mach/at91rm9200_mc.h
ARM: at91/pm: use the atmel-mc syscon defines
pcmcia: at91_cf: Use syscon to configure the MC/smc
ARM: at91: declare the at91rm9200 memory controller as a syscon
mfd: syscon: Add Atmel MC (Memory Controller) registers definition
ARM: at91: drop sam9_smc.c
ata: at91: use syscon to configure the smc
ARM: ux500: delete static resource defines
ARM: ux500: rename ux500_map_io
ARM: ux500: look up PRCMU resource from DT
ARM: ux500: kill off L2CC static map
...
Linus Torvalds [Fri, 26 Jun 2015 16:52:05 +0000 (09:52 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:
- most of the rest of MM
- lots of misc things
- procfs updates
- printk feature work
- updates to get_maintainer, MAINTAINERS, checkpatch
- lib/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
exit,stats: /* obey this comment */
coredump: add __printf attribute to cn_*printf functions
coredump: use from_kuid/kgid when formatting corename
fs/reiserfs: remove unneeded cast
NILFS2: support NFSv2 export
fs/befs/btree.c: remove unneeded initializations
fs/minix: remove unneeded cast
init/do_mounts.c: add create_dev() failure log
kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
fs/efs: femove unneeded cast
checkpatch: emit "NOTE: <types>" message only once after multiple files
checkpatch: emit an error when there's a diff in a changelog
checkpatch: validate MODULE_LICENSE content
checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
checkpatch: fix processing of MEMSET issues
checkpatch: suggest using ether_addr_equal*()
checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
checkpatch: remove local from codespell path
checkpatch: add --showfile to allow input via pipe to show filenames
...
Ross Zwisler [Thu, 25 Jun 2015 07:08:39 +0000 (03:08 -0400)]
arch, x86: pmem api for ensuring durability of persistent memory updates
Based on an original patch by Ross Zwisler [1].
Writes to persistent memory have the potential to be posted to cpu
cache, cpu write buffers, and platform write buffers (memory controller)
before being committed to persistent media. Provide apis,
memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to
pmem and assert that it is durable in PMEM (a persistent linear address
range). A '__pmem' attribute is added so sparse can track proper usage
of pointers to pmem.
This continues the status quo of pmem being x86 only for 4.2, but
reworks to ioremap, and wider implementation of memremap() will enable
other archs in 4.3.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[djbw: various reworks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Toshi Kani [Fri, 19 Jun 2015 18:18:34 +0000 (12:18 -0600)]
libnvdimm: Add sysfs numa_node to NVDIMM devices
Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.
An example of numa_node values on a 2-socket system with a single
NVDIMM range on each socket is shown below.
/sys/bus/nd/devices
|-- btt0.0/numa_node:0
|-- btt1.0/numa_node:1
|-- btt1.1/numa_node:1
|-- namespace0.0/numa_node:0
|-- namespace1.0/numa_node:1
|-- region0/numa_node:0
|-- region1/numa_node:1
These numa_node files are then linked under the block class of
their device names.
/sys/class/block/pmem0/device/numa_node:0
/sys/class/block/pmem1s/device/numa_node:1
This enables numactl(8) to accept 'block:' and 'file:' paths of
pmem and btt devices as shown in the examples below.
numactl --preferred block:pmem0 --show
numactl --preferred file:/dev/pmem1s --show
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Toshi Kani [Fri, 19 Jun 2015 18:18:33 +0000 (12:18 -0600)]
libnvdimm: Set numa_node to NVDIMM devices
ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.
Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.
The device core arranges for btt and namespace devices to inherit their
node from their parent region.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Toshi Kani [Fri, 19 Jun 2015 23:14:15 +0000 (17:14 -0600)]
acpi: Add acpi_map_pxm_to_online_node()
The kernel initializes CPU & memory's NUMA topology from ACPI
SRAT table. Some other ACPI tables, such as NFIT and DMAR, also
contain proximity IDs for their device's NUMA topology. This
information can be used to improve performance of these devices.
This patch introduces acpi_map_pxm_to_online_node(), which is
similar to acpi_map_pxm_to_node(), but always returns an online
node. When the mapped node from a given proximity ID is offline,
it looks up the node distance table and returns the nearest
online node.
ACPI device drivers, which are called after the NUMA initialization
has completed in the kernel, can call this interface to obtain their
device NUMA topology from ACPI tables. Such drivers do not have to
deal with offline nodes. A node may be offline when a device
proximity ID is unique, SRAT memory entry does not exist, or NUMA is
disabled, ex. "numa=off" on x86.
This patch also moves the pxm range check from acpi_get_node() to
acpi_map_pxm_to_node().
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 24 Jun 2015 00:08:34 +0000 (20:08 -0400)]
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
Upon detection of an unarmed dimm in a region, arrange for descendant
BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
"unarmed" via flags passed by platform firmware (NFIT).
The flags in the NFIT memory device sub-structure indicate the state of
the data on the nvdimm relative to its energy source or last "flush to
persistence". For the most part there is nothing the driver can do but
advertise the state of these flags in sysfs and emit a message if
firmware indicates that the contents of the device may be corrupted.
However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
the block devices incorporating that nvdimm to be marked read-only.
This is a safe default as the data is still available and new writes are
held off until the administrator either forces read-write mode, or the
energy source becomes armed.
A 'read_only' attribute is added to REGION devices to allow for
overriding the default read-only policy of all descendant block devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 16 May 2015 16:28:54 +0000 (12:28 -0400)]
pmem: flag pmem block devices as non-rotational
...since they are effectively SSDs as far as userspace is concerned.
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 16 May 2015 16:28:53 +0000 (12:28 -0400)]
libnvdimm: enable iostat
This is disabled by default as the overhead is prohibitive, but if the
user takes the action to turn it on we'll oblige.
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 16 May 2015 16:28:51 +0000 (12:28 -0400)]
pmem: make_request cleanups
Various cleanups:
1/ Kill the BUG_ON since we've already told the block layer we don't
support DISCARD on all these drivers.
2/ Kill the 'rw' variable, no need to cache it.
3/ Kill the local 'sector' variable. bio_for_each_segment() is already
advancing the iterator's sector number by the bio_vec length.
4/ Kill the check for accessing past the end of device
generic_make_request_checks() already does that.
Suggested-by: Christoph Hellwig <hch@lst.de>
[hch: kill access past end of the device check]
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 16 May 2015 16:28:50 +0000 (12:28 -0400)]
libnvdimm, pmem: fix up max_hw_sectors
There is no hardware limit to enforce on the size of the i/o that can be passed
to an nvdimm block device, so set it to UINT_MAX.
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Thu, 25 Jun 2015 08:22:39 +0000 (04:22 -0400)]
libnvdimm, blk: add support for blk integrity
Support multiple block sizes (sector + metadata) for nd_blk in the
same way as done for the BTT. Add the idea of an 'internal' lbasize,
which is properly aligned and padded, and store metadata in this space.
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Thu, 25 Jun 2015 08:21:52 +0000 (04:21 -0400)]
libnvdimm, btt: add support for blk integrity
Support multiple block sizes (sector + metadata) using the blk integrity
framework. This registers a new integrity template that defines the
protection information tuple size based on the configured metadata size,
and simply acts as a passthrough for protection information generated by
another layer. The metadata is written to the storage as-is, and read back
with each sector.
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Tue, 12 May 2015 17:48:53 +0000 (13:48 -0400)]
fs/block_dev.c: skip rw_page if bdev has integrity
If a block device has bio integrity enabled, rw_page will bypass the
integrity payload, which is undesirable. Skip rw_page if this is the
case.
Currently brd and zram provide rw_page, and the proposed 'nd' drivers
will too.
Cc: Jens Axboe <axboe@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Suggested-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 25 Jun 2015 08:48:19 +0000 (04:48 -0400)]
libnvdimm: Non-Volatile Devices
Maintainer information and documentation for drivers/nvdimm
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 17 Jun 2015 21:23:32 +0000 (17:23 -0400)]
tools/testing/nvdimm: libnvdimm unit test infrastructure
'libnvdimm' is the first driver sub-system in the kernel to implement
mocking for unit test coverage. The nfit_test module gets built as an
external module and arranges for external module replacements of nfit,
libnvdimm, nd_pmem, and nd_blk. These replacements use the linker
--wrap option to redirect calls to ioremap() + request_mem_region() to
custom defined unit test resources. The end result is a fully
functional nvdimm_bus, as far as userspace is concerned, but with the
capability to perform otherwise destructive tests on emulated resources.
Q: Why not use QEMU for this emulation?
QEMU is not suitable for unit testing. QEMU's role is to faithfully
emulate the platform. A unit test's role is to unfaithfully implement
the platform with the goal of triggering bugs in the corners of the
sub-system implementation. As bugs are discovered in platforms, or the
sub-system itself, the unit tests are extended to backstop a fix with a
reproducer unit test.
Another problem with QEMU is that it would require coordination of 3
software projects instead of 2 (kernel + libndctl [1]) to maintain and
execute the tests. The chances for bit rot and the difficulty of
getting the tests running goes up non-linearly the more components
involved.
Q: Why submit this to the kernel tree instead of external modules in
libndctl?
Simple, to alleviate the same risk that out-of-tree external modules
face. Updates to drivers/nvdimm/ can be immediately evaluated to see if
they have any impact on tools/testing/nvdimm/.
Q: What are the negative implications of merging this?
It is a unique maintenance burden because the purpose of mocking an
interface to enable a unit test is to purposefully short circuit the
semantics of a routine to enable testing. For example
__wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
resource buffer allocated by dma_alloc_coherent(). The future
maintenance burden hits when someone changes the semantics of
ioremap_cache() and wonders what the implications are for the unit test.
[1]: https://github.com/pmem/ndctl
Cc: <linux-acpi@vger.kernel.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ross Zwisler [Thu, 25 Jun 2015 08:21:02 +0000 (04:21 -0400)]
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
The libnvdimm implementation handles allocating dimm address space (DPA)
between PMEM and BLK mode interfaces. After DPA has been allocated from
a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
as a struct bio based block device. Unlike PMEM, BLK is required to
handle platform specific details like mmio register formats and memory
controller interleave. For this reason the libnvdimm generic nd_blk
driver calls back into the bus provider to carry out the I/O.
This initial implementation handles the BLK interface defined by the
ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
DCR (dimm control region), BDW (block data window), IDT (interleave
descriptor) NFIT structures and the hardware register format.
[1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
[2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Thu, 25 Jun 2015 08:20:32 +0000 (04:20 -0400)]
nd_btt: atomic sector updates
BTT stands for Block Translation Table, and is a way to provide power
fail sector atomicity semantics for block devices that have the ability
to perform byte granularity IO. It relies on the capability of libnvdimm
namespace devices to do byte aligned IO.
The BTT works as a stacked blocked device, and reserves a chunk of space
from the backing device for its accounting metadata. It is a bio-based
driver because all IO is done synchronously, and there is no queuing or
asynchronous completions at either the device or the driver level.
The BTT uses 'lanes' to index into various 'on-disk' data structures,
and lanes also act as a synchronization mechanism in case there are more
CPUs than available lanes. We did a comparison between two lane lock
strategies - first where we kept an atomic counter around that tracked
which was the last lane that was used, and 'our' lane was determined by
atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
theoretically, no CPU would be blocked waiting for a lane. The other
strategy was to use the cpu number we're scheduled on to and hash it to
a lane number. Theoretically, this could block an IO that could've
otherwise run using a different, free lane. But some fio workloads
showed that the direct cpu -> lane hash performed faster than tracking
'last lane' - my reasoning is the cache thrash caused by moving the
atomic variable made that approach slower than simply waiting out the
in-progress IO. This supports the conclusion that the driver can be a
very simple bio-based one that does synchronous IOs instead of queuing.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
[jmoyer: fix nmi watchdog timeout in btt_map_init]
[jmoyer: move btt initialization to module load path]
[jmoyer: fix memory leak in the btt initialization path]
[jmoyer: Don't overwrite corrupted arenas]
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>