Linus Torvalds [Tue, 7 May 2013 17:13:52 +0000 (10:13 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull more vhost fixes from Michael Tsirkin:
"This fixes some minor issues in the patches that have been merged.
We also finally drop the workaround disabling event_idx for scsi: it
was always questionable, and now we know it's not needed.
There's also a memory leak fix."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-scsi: Enable VIRTIO_RING_F_EVENT_IDX
vhost: drop virtio_net.h dependency
vhost-net: Cleanup vhost_ubuf and vhost_zcopy
vhost: Remove vhost_enable_zcopy in vhost.h
vhost: Remove comments for hdr in vhost.h
vhost: Move VHOST_NET_FEATURES to net.c
vhost-net: Free ubuf when vhost_dev_set_owner fails
vhost: Export vhost_dev_set_owner
Linus Torvalds [Tue, 7 May 2013 17:12:32 +0000 (10:12 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"This contains two patchsets from Maxim Patlasov.
The first reworks the request throttling so that only async requests
are throttled. Wakeup of waiting async requests is also optimized.
The second series adds support for async processing of direct IO which
optimizes direct IO and enables the use of the AIO userspace
interface."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add flag to turn on async direct IO
fuse: truncate file if async dio failed
fuse: optimize short direct reads
fuse: enable asynchronous processing direct IO
fuse: make fuse_direct_io() aware about AIO
fuse: add support of async IO
fuse: move fuse_release_user_pages() up
fuse: optimize wake_up
fuse: implement exclusive wakeup for blocked_waitq
fuse: skip blocking on allocations of synchronous requests
fuse: add flag fc->initialized
fuse: make request allocations for background processing explicit
Linus Torvalds [Tue, 7 May 2013 16:34:40 +0000 (09:34 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
"Here are a few more powerpc bits that I would like in 3.10.
Mostly remaining bolts & screw tightening of power8 support such as
actually exposing the new features via the previously added AT_HWCAP2,
and a few fixes, some of them for problems exposed recently like
irqdomain warnings or sysfs access permission issues, some exposed by
power8 hardware.
The only change outside of arch/powerpc is a small one to irqdomain.c
to allow silent failure to fix a problem on Cell where we get a dozen
WARN_ON's tripping at boot for what is basically a normal case."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Make hard_irq_disable() do the right thing vs. irq tracing
powerpc/topology: Fix spurr attribute permission
powerpc/pci: Support per-aperture memory offset
powerpc/cell/iommu: Improve error message for missing node
powerpc/cell/spufs: Fix status attribute permission
irqdomain: Allow quiet failure mode
powerpc/pnv: Fix "compatible" property for P8 PHB
powerpc/pci: Don't add bogus empty resources to PHBs
powerpc/powerpnv: Properly handle failure starting CPUs
powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8
powerpc/cputable: Advertise ISEL support on appropriate embedded processors
powerpc/cputable: Advertise DSCR support on P7/P7+
powerpc/cputable: Reserve bits in HWCAP2 for new features
powerpc/pseries: Perform proper max_bus_speed detection
powerpc/pseries: Force 32 bit MSIs for devices that require it
powerpc/tm: Fix null pointer deference in flush_hash_page
powerpc/powernv: Defer OPAL exception handler registration
powerpc: Emulate non privileged DSCR read and write
Linus Torvalds [Tue, 7 May 2013 16:22:03 +0000 (09:22 -0700)]
Merge branch 'rwsem-optimizations'
Merge rwsem optimizations from Michel Lespinasse:
"These patches extend Alex Shi's work (which added write lock stealing
on the rwsem slow path) in order to provide rwsem write lock stealing
on the fast path (that is, without taking the rwsem's wait_lock).
I have unfortunately been unable to push this through -next before due
to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
things. However, this has gotten some attention from Rik van Riel and
Davidlohr Bueso who both commented that they felt this was ready for
v3.10, and Ingo Molnar has said that he was OK with me pushing
directly to you. So, here goes :)
Davidlohr got the following test results from pgbench running on a
quad-core laptop:
| db_size | clients | tps-vanilla | tps-rwsem |
+---------+---------+-------------+-----------+
| 160 MB  |       1 |        5803 |      6906 | + 19.0%
| 160 MB  |       2 |       13092 |     15931 |
| 160 MB  |       4 |       29412 |     33021 |
| 160 MB  |       8 |       32448 |     34626 |
| 160 MB  |      16 |       32758 |     33098 |
| 160 MB  |      20 |       26940 |     31343 | + 16.3%
| 160 MB  |      30 |       25147 |     28961 |
| 160 MB  |      40 |       25484 |     26902 |
| 160 MB  |      50 |       24528 |     25760 |
+---------+---------+-------------+-----------+
| 1.6 GB  |       1 |        5733 |      7729 | + 34.8%
| 1.6 GB  |       2 |        9411 |     19009 | + 101.9%
| 1.6 GB  |       4 |       31818 |     33185 |
| 1.6 GB  |       8 |       33700 |     34550 |
| 1.6 GB  |      16 |       32751 |     33079 |
| 1.6 GB  |      20 |       30919 |     31494 |
| 1.6 GB  |      30 |       28540 |     28535 |
| 1.6 GB  |      40 |       26380 |     27054 |
| 1.6 GB  |      50 |       25241 |     25591 |
+---------+---------+-------------+-----------+
| 7.6 GB  |       1 |        5779 |      6224 |
| 7.6 GB  |       2 |       10897 |     13611 | + 24.9%
| 7.6 GB  |       4 |       32683 |     33108 |
| 7.6 GB  |       8 |       33968 |     34712 |
| 7.6 GB  |      16 |       32287 |     32895 |
| 7.6 GB  |      20 |       27770 |     31689 | + 14.1%
| 7.6 GB  |      30 |       26739 |     29003 |
| 7.6 GB  |      40 |       24901 |     26683 |
| 7.6 GB  |      50 |       17115 |     25925 | + 51.5%
+---------+---------+-------------+-----------+
(Davidlohr also has one additional patch which further improves
throughput, though I will ask him to send it directly to you as I have
suggested some minor changes)."
* emailed patches from Michel Lespinasse <walken@google.com>:
rwsem: no need for explicit signed longs
x86 rwsem: avoid taking slow path when stealing write lock
rwsem: do not block readers at head of queue if other readers are active
rwsem: implement support for write lock stealing on the fastpath
rwsem: simplify __rwsem_do_wake
rwsem: skip initial trylock in rwsem_down_write_failed
rwsem: avoid taking wait_lock in rwsem_down_write_failed
rwsem: use cmpxchg for trying to steal write lock
rwsem: more agressive lock stealing in rwsem_down_write_failed
rwsem: simplify rwsem_down_write_failed
rwsem: simplify rwsem_down_read_failed
rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
rwsem: shorter spinlocked section in rwsem_down_failed_common()
rwsem: make the waiter type an enumeration rather than a bitmask
Linus Torvalds [Tue, 7 May 2013 15:42:20 +0000 (08:42 -0700)]
Merge branch 'slab/for-linus' of git://git./linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
"The bulk of the changes are more slab unification from Christoph.
There are also a few fixes from Aaron, Glauber, and Joonsoo thrown into
the mix."
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits)
mm, slab_common: Fix bootstrap creation of kmalloc caches
slab: Return NULL for oversized allocations
mm: slab: Verify the nodeid passed to ____cache_alloc_node
slub: tid must be retrieved from the percpu area of the current processor
slub: Do not dereference NULL pointer in node_match
slub: add 'likely' macro to inc_slabs_node()
slub: correct to calculate num of acquired objects in get_partial_node()
slub: correctly bootstrap boot caches
mm/sl[au]b: correct allocation type check in kmalloc_slab()
slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections
slab: Handle ARCH_DMA_MINALIGN correctly
slab: Common definition for kmem_cache_node
slab: Rename list3/l3 to node
slab: Common Kmalloc cache determination
stat: Use size_t for sizes instead of unsigned
slab: Common function to create the kmalloc array
slab: Common definition for the array of kmalloc caches
slab: Common constants for kmalloc boundaries
slab: Rename nodelists to node
slab: Common name for the per node structures
...
Linus Torvalds [Tue, 7 May 2013 14:59:19 +0000 (07:59 -0700)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild
Pull misc kbuild updates from Michal Marek:
"Non-critical kbuild changes:
- make coccicheck improvements, but no new semantic patches this time
- make rpm improvements
- make tar-pkg change to include the architecture in the filename.
This is a deliberate incompatibility, but nobody has complained so
far and it is useful if you build for different architectures. It
also matches what the deb-pkg and rpm-pkg targets produce.
- kbuild documentation fix"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
rpm-pkg: Remove pointless set -e statements
rpm-pkg: Always regenerate the specfile
rpm-pkg: Do not write to the parent directory
rpm-pkg: Do not package the whole source directory
buildtar: Add ARCH to the archive name
Coccinelle: Fix patch output when coccicheck is used with M= and C=
Coccinelle: Add support to the SPFLAGS variable
Coccinelle: Cleanup the setting of the FLAGS and OPTIONS variables
Coccinelle: Restore coccicheck verbosity in ONLINE mode (C=1 or C=2)
scripts/package/Makefile: compare objtree with srctree instead of test KBUILD_OUTPUT
doc: change example to existing Makefile fragment
scripts/tags.sh: Add magic for OFFSET and DEFINE
Linus Torvalds [Tue, 7 May 2013 14:58:05 +0000 (07:58 -0700)]
Merge branch 'kconfig' of git://git./linux/kernel/git/mmarek/kbuild
Pull kconfig updates from Michal Marek:
- use pkg-config to detect curses libraries
- clean up the way curses headers are searched
- Some randconfig fixes, of which one had to be reverted
- KCONFIG_SEED for randconfig debugging
- menuconfig memory leak plugged
- menuconfig > breadcrumbs > navigation
- xconfig compilation fix
- Other minor fixes
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig: fix lists definition for C++
Revert "kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG"
kconfig: implement KCONFIG_PROBABILITY for randconfig
kconfig: allow specifying the seed for randconfig
kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG
kconfig: do not override symbols already set
kconfig: fix randconfig tristate detection
kconfig/lxdialog: rationalise the include paths where to find {.n}curses{,w}.h
menuconfig: Add "breadcrumbs" navigation aid
menuconfig: Fix memory leak introduced by jump keys feature
merge_config.sh: Avoid creating unnessary source softlinks
kconfig: optionally use pkg-config to detect ncurses libs
menuconfig: optionally use pkg-config to detect ncurses libs
Linus Torvalds [Tue, 7 May 2013 14:56:26 +0000 (07:56 -0700)]
Merge branch 'kbuild' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild changes from Michal Marek:
"Kbuild commits for v3.10-rc1:
- Fix make mrproper after mod/file2alias rework
- Fix ld-option Makefile function
- Rewrite headers_install to shell to drop Perl dependency.
There are some more patches I have to look at, so I might send another
pull request later. Or just queue them for 3.11."
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Fix cleaning in scripts/mod
headers_install.pl: convert to headers_install.sh
kbuild: fix ld-option function
Li Zefan [Tue, 7 May 2013 13:56:54 +0000 (15:56 +0200)]
menuconfig: fix NULL pointer dereference when searching a symbol
Searching for PPC_EFIKA results in a segmentation fault, and it's
because get_symbol_prop() returns NULL.
In this case CONFIG_PPC_EFIKA is defined in arch/powerpc/platforms/
52xx/Kconfig, so it won't be parsed if ARCH!=PPC, but menuconfig knows
this symbol when it parses sound/soc/fsl/Kconfig:
config SND_MPC52xx_SOC_EFIKA
tristate "SoC AC97 Audio support for bbplan Efika and STAC9766"
depends on PPC_EFIKA
This bug was introduced by commit bcdedcc1afd6 ("menuconfig: print more
info for symbol without prompts").
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Tested-by: Libo Chen <libo.chen@huawei.com>
Reviewed-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
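The failure mode described in the menuconfig commit above can be illustrated with a minimal sketch. The names below (sketch_symbol, sketch_prop, sketch_get_symbol_prop, sketch_describe) are hypothetical stand-ins and not the actual scripts/kconfig code; the only point shown is that a lookup which may return NULL has to be checked before its result is dereferenced.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a Kconfig symbol and its defining property. */
struct sketch_prop {
    const char *file;
    int lineno;
};

struct sketch_symbol {
    const char *name;
    const struct sketch_prop *def; /* NULL if the defining file was never parsed */
};

/* May return NULL, e.g. for a symbol that is only referenced, never defined. */
const struct sketch_prop *sketch_get_symbol_prop(const struct sketch_symbol *sym)
{
    return sym->def;
}

/* Format a description, printing the "Defined at" line only when it is known. */
int sketch_describe(const struct sketch_symbol *sym, char *buf, size_t len)
{
    const struct sketch_prop *prop = sketch_get_symbol_prop(sym);

    if (prop)
        return snprintf(buf, len, "Symbol: %s\n  Defined at %s:%d\n",
                        sym->name, prop->file, prop->lineno);

    /* No definition available: skip the location instead of dereferencing NULL. */
    return snprintf(buf, len, "Symbol: %s\n", sym->name);
}
```

With a symbol whose def pointer is NULL, sketch_describe() emits only the name line, which corresponds to the case of a symbol that is known through a "depends on" line but whose own Kconfig file was not parsed.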
Bruce Allan [Tue, 7 May 2013 05:52:47 +0000 (22:52 -0700)]
e1000e: fix scheduling while atomic bug
A scheduling while atomic bug was introduced recently (by commit
ce43a2168c59: "e1000e: cleanup USLEEP_RANGE checkpatch checks").
Revert the particular instance of usleep_range() which causes the bug.
Reported-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 7 May 2013 13:46:02 +0000 (06:46 -0700)]
rwsem: no need for explicit signed longs
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:46:01 +0000 (06:46 -0700)]
x86 rwsem: avoid taking slow path when stealing write lock
Modify __down_write[_nested] and __down_write_trylock to grab the write
lock whenever the active count is 0, even if there are queued waiters
(they must be writers pending wakeup, since the active count is 0).
Note that this is an optimization only; architectures without this
optimization will still work fine:
- __down_write() would take the slow path which would take the wait_lock
and then try stealing the lock (as in the spinlocked rwsem implementation)
- __down_write_trylock() would fail, but callers must be ready to deal
with that - since there are some writers pending wakeup, they could
have raced with us and obtained the lock before we steal it.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
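A rough userspace sketch of the fast-path idea from the commit above, written with C11 atomics rather than x86 assembly. The count encoding (ACTIVE_MASK, WAITING_BIAS, ACTIVE_WRITE_BIAS) is a simplified assumption modelled on the description, and sketch_down_write_trylock is a hypothetical name; this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: low bits count active lockers, a negative bias marks waiters. */
#define ACTIVE_BIAS        1L
#define ACTIVE_MASK        0xffffL
#define WAITING_BIAS       (-ACTIVE_MASK - 1L)
#define ACTIVE_WRITE_BIAS  (WAITING_BIAS + ACTIVE_BIAS)

/*
 * Take the write lock whenever the active count is 0, even if the
 * waiting bias shows that there are queued waiters.
 */
bool sketch_down_write_trylock(atomic_long *count)
{
    long old = atomic_load(count);

    while ((old & ACTIVE_MASK) == 0) {
        /* On failure, old is reloaded with the current value and we re-test. */
        if (atomic_compare_exchange_weak(count, &old, old + ACTIVE_WRITE_BIAS))
            return true;
    }
    return false; /* somebody is active: the caller must handle the failure */
}
```

Starting from a count equal to WAITING_BIAS (waiters queued, nobody active) the trylock succeeds, whereas a count of 1 (one active reader) makes it fail without modifying the count.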
Michel Lespinasse [Tue, 7 May 2013 13:46:00 +0000 (06:46 -0700)]
rwsem: do not block readers at head of queue if other readers are active
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.
In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.
Thanks to Peter Hurley for noticing this possible race.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:59 +0000 (06:45 -0700)]
rwsem: implement support for write lock stealing on the fastpath
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers. But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue. This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting. To that end, we grant the first reader lock
before counting how many more readers are queued.
We also require some adjustments to the wake_type semantics.
RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock. This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.
Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available. We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
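The ordering described in the commit above (grant one read share first, then count the queued readers, then grant the rest) can be sketched as follows. This is a single-threaded illustration under assumed names and an assumed count encoding; the queue is modelled as a plain array, and the retry performed when the last active locker leaves is omitted for brevity.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Assumed encoding: low bits count active lockers, a negative bias marks waiters. */
#define ACTIVE_READ_BIAS 1L
#define ACTIVE_MASK      0xffffL
#define WAITING_BIAS     (-ACTIVE_MASK - 1L)

enum sketch_waiter { SKETCH_READER, SKETCH_WRITER };

/* Returns how many readers at the head of the queue were granted the lock. */
size_t sketch_grant_readers(atomic_long *count,
                            const enum sketch_waiter *queue, size_t n)
{
    size_t woken = 0;
    long old, adjustment;

    if (n == 0 || queue[0] != SKETCH_READER)
        return 0;

    /* Step 1: grant the first read share so a fast-path writer sees an active locker. */
    old = atomic_fetch_add(count, ACTIVE_READ_BIAS);
    if (old < WAITING_BIAS) {
        /* A writer already stole the lock: undo the grant and give up. */
        atomic_fetch_sub(count, ACTIVE_READ_BIAS);
        return 0;
    }

    /* Step 2: count the readers at the head of the queue (may take a while). */
    while (woken < n && queue[woken] == SKETCH_READER)
        woken++;

    /* Step 3: grant the remaining shares; drop the waiting bias if the queue drains. */
    adjustment = (long)woken * ACTIVE_READ_BIAS - ACTIVE_READ_BIAS;
    if (woken == n)
        adjustment -= WAITING_BIAS;
    if (adjustment)
        atomic_fetch_add(count, adjustment);

    return woken;
}
```

Because the first share is taken before the counting loop, a writer that tests the active count during step 2 finds it non-zero and cannot steal the lock in the middle of the grant.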
Michel Lespinasse [Tue, 7 May 2013 13:45:58 +0000 (06:45 -0700)]
rwsem: simplify __rwsem_do_wake
This is mostly for cleanup value:
- We don't need several gotos to handle the case where the first
waiter is a writer. Two simple tests will do (and generate very
similar code).
- In the remainder of the function, we know the first waiter is a reader,
so we don't have to double check that. We can use do..while loops
to iterate over the readers to wake (generates slightly better code).
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:57 +0000 (06:45 -0700)]
rwsem: skip initial trylock in rwsem_down_write_failed
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:56 +0000 (06:45 -0700)]
rwsem: avoid taking wait_lock in rwsem_down_write_failed
In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e. the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:55 +0000 (06:45 -0700)]
rwsem: use cmpxchg for trying to steal write lock
Using rwsem_atomic_update to try stealing the write lock forced us to
undo the adjustment in the failure path. We can have simpler and faster
code by using cmpxchg instead.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
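The difference between the two approaches named in the commit above can be sketched in userspace C11. Both functions below are hypothetical illustrations, simplified to the case of a single queued writer, and the count encoding is an assumption; they are not the kernel's rwsem code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: low bits count active lockers, a negative bias marks waiters. */
#define ACTIVE_BIAS        1L
#define ACTIVE_MASK        0xffffL
#define WAITING_BIAS       (-ACTIVE_MASK - 1L)
#define ACTIVE_WRITE_BIAS  (WAITING_BIAS + ACTIVE_BIAS)

/* Add-then-undo: the shared count is perturbed even when the steal fails. */
bool sketch_steal_add_undo(atomic_long *count)
{
    long old = atomic_fetch_add(count, ACTIVE_BIAS);

    if ((old & ACTIVE_MASK) == 0)
        return true;                     /* nobody was active: the lock is ours */

    atomic_fetch_sub(count, ACTIVE_BIAS); /* failure path: undo the adjustment */
    return false;
}

/* cmpxchg: the count changes only if it held the "waiters, nobody active" value. */
bool sketch_steal_cmpxchg(atomic_long *count)
{
    long expected = WAITING_BIAS;

    return atomic_compare_exchange_strong(count, &expected, ACTIVE_WRITE_BIAS);
}
```

In the compare-and-exchange variant a failed attempt leaves the count untouched, so there is no undo step and no window in which other threads observe a transient extra active locker.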
Michel Lespinasse [Tue, 7 May 2013 13:45:54 +0000 (06:45 -0700)]
rwsem: more agressive lock stealing in rwsem_down_write_failed
Some small code simplifications can be achieved by doing more aggressive
lock stealing:
- When rwsem_down_write_failed() notices that there are no active locks
(and thus no thread to wake us if we decided to sleep), it used to wake
the first queued process. However, stealing the lock is also sufficient
to deal with this case, so we don't need this check anymore.
- In try_get_writer_sem(), we can steal the lock even when the first waiter
is a reader. This is correct because the code path that wakes readers is
protected by the wait_lock. As to the performance effects of this change,
they are expected to be minimal: readers are still granted the lock
(rather than having to acquire it themselves) when they reach the front
of the wait queue, so we have essentially the same behavior as in
rwsem-spinlock.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:53 +0000 (06:45 -0700)]
rwsem: simplify rwsem_down_write_failed
When waking writers, we never grant them the lock - instead, they have
to acquire it themselves when they run, and remove themselves from the
wait_list when they succeed.
As a result, we can do a few simplifications in rwsem_down_write_failed():
- We don't need to check for !waiter.task since __rwsem_do_wake() doesn't
remove writers from the wait_list
- There is no point releasing the wait_lock before entering the wait loop,
as we will need to reacquire it immediately. We can change the loop so
that the lock is always held at the start of each loop iteration.
- We don't need to get a reference on the task structure, since the task
is responsible for removing itself from the wait_list. There is no risk,
like in the rwsem_down_read_failed() case, that a task would wake up and
exit (thus destroying its task structure) while __rwsem_do_wake() is
still running - wait_lock protects against that.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:52 +0000 (06:45 -0700)]
rwsem: simplify rwsem_down_read_failed
When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS
adjustment doesn't cause other readers to block, so we never have to
worry about waking them back after canceling this adjustment in
rwsem_down_read_failed().
We also never want to steal the lock in rwsem_down_read_failed(), so we
don't have to grab the wait_lock either.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:51 +0000 (06:45 -0700)]
rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
Remove the rwsem_down_failed_common function and replace it with two
identical copies of its code in rwsem_down_{read,write}_failed.
This is because we want to make different optimizations in
rwsem_down_{read,write}_failed; we are adding this pure-duplication
step as a separate commit in order to make it easier to check the
following steps.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:50 +0000 (06:45 -0700)]
rwsem: shorter spinlocked section in rwsem_down_failed_common()
This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE
sections in rwsem_down_failed_common():
- We only need the sem->wait_lock to insert ourselves on the wait_list;
the waiter node can be prepared outside of the wait_lock.
- The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately
before checking if we actually need to sleep; it doesn't need to protect
the entire function.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Tue, 7 May 2013 13:45:49 +0000 (06:45 -0700)]
rwsem: make the waiter type an enumeration rather than a bitmask
We are not planning to add any new waiter flags, so we can convert the
waiter type into an enumeration.
Background: David Howells suggested I do this back when I tried adding
a new waiter type for unfair readers. However, I believe the cleanup
applies regardless of that use case.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
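As a small illustration of the conversion described in the commit above: when the waiter kinds are mutually exclusive, an enumeration states that directly, whereas independent flag bits also allow meaningless combinations. All identifiers below are hypothetical and chosen for this sketch only.

```c
#include <stdbool.h>

/* Bitmask style (sketch): nothing prevents setting both bits at once. */
#define SKETCH_FLAG_READ   0x1u
#define SKETCH_FLAG_WRITE  0x2u

/* Enumeration style (sketch): a waiter is exactly one of these. */
enum sketch_waiter_type {
    SKETCH_WAITING_FOR_WRITE,
    SKETCH_WAITING_FOR_READ
};

struct sketch_waiter {
    enum sketch_waiter_type type;
};

/* A plain equality test replaces masking the flags word. */
bool sketch_is_reader(const struct sketch_waiter *w)
{
    return w->type == SKETCH_WAITING_FOR_READ;
}

/* A switch over the enum lets the compiler warn about unhandled kinds. */
const char *sketch_type_name(enum sketch_waiter_type t)
{
    switch (t) {
    case SKETCH_WAITING_FOR_WRITE:
        return "writer";
    case SKETCH_WAITING_FOR_READ:
        return "reader";
    }
    return "unknown";
}
```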
Asias He [Tue, 7 May 2013 06:51:49 +0000 (14:51 +0800)]
vhost-scsi: Enable VIRTIO_RING_F_EVENT_IDX
It was disabled as a workaround. Now userspace bits work fine with it.
The broken version was not ever committed to QEMU, I guess the same is
true for nlkt.
So, let's enable it.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Benjamin Herrenschmidt [Mon, 6 May 2013 21:04:02 +0000 (21:04 +0000)]
powerpc: Make hard_irq_disable() do the right thing vs. irq tracing
If hard_irq_disable() is called while interrupts are already soft-disabled
(which is the most common case) all is already well.
However, you can (and in some cases want to) call it while everything is
enabled (to make sure you don't get a lazy event, for example before entry
into KVM guests) and in this case we need to inform the irq tracer that
the irqs are going off.
We have to change the inline into a macro to avoid an include circular
dependency hell hole.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pekka Enberg [Tue, 7 May 2013 06:19:47 +0000 (09:19 +0300)]
Merge branch 'slab/next' into slab/for-linus
Linus Torvalds [Mon, 6 May 2013 22:51:10 +0000 (15:51 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Just a small pile of fixes"
1) Fix race conditions in IP fragmentation LRU list handling, from
Konstantin Khlebnikov.
2) vfree() is no longer verboten in interrupts, so deferring is
pointless, from Al Viro.
3) Conversion from mutex to semaphore in netpoll left trylock test
inverted, caught by Dan Carpenter.
4) 3c59x uses wrong base address when releasing regions, from Sergei
Shtylyov.
5) Bounds checking in TIPC from Dan Carpenter.
6) Fastopen cookies should not be expired as aggressively as other TCP
metrics. From Eric Dumazet.
7) Fix retrieval of MAC address in ibmveth, from Ben Herrenschmidt.
8) Don't use "u16" in virtio user headers, from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
tipc: potential divide by zero in tipc_link_recv_fragment()
tipc: add a bounds check in link_recv_changeover_msg()
net/usb: new driver for RTL8152
3c59x: fix freeing nonexistent resource on driver unload
netpoll: inverted down_trylock() test
rps_dev_flow_table_release(): no need to delay vfree()
fib_trie: no need to delay vfree()
net: frag, fix race conditions in LRU list maintenance
tcp: do not expire TCP fastopen cookies
net/eth/ibmveth: Fixup retrieval of MAC address
virtio: don't expose u16 in userspace api
Linus Torvalds [Mon, 6 May 2013 22:41:42 +0000 (15:41 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/cooloney/linux-leds
Pull LED subsystem updates from Bryan Wu:
- move LED trigger drivers into a new directory
- lp55xx common driver updates
- other led drivers updates and bug fixing
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: leds-asic3: switch to using SIMPLE_DEV_PM_OPS
leds: leds-bd2802: add CONFIG_PM_SLEEP to suspend/resume functions
leds: lp55xx: configure the clock detection
leds: lp55xx: use common clock framework when external clock is used
leds: leds-ns2: fix oops at module removal
leds: leds-pwm: Defer led_pwm_set() if PWM can sleep
leds: lp55xx: fix the sysfs read operation
leds: lm355x, lm3642: support camera LED triggers for flash and torch
leds: add camera LED triggers
leds: trigger: use inline functions instead of macros
leds: tca6507: Use of_match_ptr() macro
leds: wm8350: Complain if we fail to reenable DCDC
leds: renesas: set gpio_request_one() flags param correctly
leds: leds-ns2: set devm_gpio_request_one() flags param correctly
leds: leds-lt3593: set devm_gpio_request_one() flags param correctly
leds: leds-bd2802: remove erroneous __exit annotation
leds: atmel-pwm: remove erroneous __exit annotation
leds: move LED trigger drivers into new subdirectory
leds: add new LP5562 LED driver
Linus Torvalds [Mon, 6 May 2013 22:40:55 +0000 (15:40 -0700)]
Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux
Pull GPIO changes from Grant Likely:
"The usual selection of bug fixes and driver updates for GPIO. Nothing
really stands out except the addition of the GRGPIO driver and some
enhancements to ACPI support"
I'm pulling this despite the earlier mess. Let's hope it compiles these
days.
* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux: (46 commits)
gpio: grgpio: Add irq support
gpio: grgpio: Add device driver for GRGPIO cores
gpiolib-acpi: introduce acpi_get_gpio_by_index() helper
GPIO: gpio-generic: remove kfree() from bgpio_remove call
gpio / ACPI: Handle ACPI events in accordance with the spec
gpio: lpc32xx: Fix off-by-one valid range checking for bank
gpio: mcp23s08: convert driver to DT
gpio/omap: force restore if context loss is not detectable
gpio/omap: optimise interrupt service routine
gpio/omap: remove extra context restores in *_runtime_resume()
gpio/omap: free irq domain in probe() failure paths
gpio: gpio-generic: Add 16 and 32 bit big endian byte order support
gpio: samsung: Add terminating entry for exynos_pinctrl_ids
gpio: mvebu: add dbg_show function
MAX7301 GPIO: Do not force SPI speed when using OF Platform
gpio: gpio-tps65910.c: fix checkpatch error
gpio: gpio-timberdale.c: fix checkpatch error
gpio: gpio-tc3589x.c: fix checkpatch errors
gpio: gpio-stp-xway.c: fix checkpatch error
gpio: gpio-sch.c: fix checkpatch error
...
Linus Torvalds [Mon, 6 May 2013 22:32:36 +0000 (15:32 -0700)]
Merge tag 'for-3.10-rc1' of git://gitorious.org/linux-pwm/linux-pwm
Pull pwm changes from Thierry Reding:
"Nothing very exciting this time around. A couple of bug fixes and a
lot of cleanup across the board. The DaVinci 8xx family of SoCs now
use the same driver as the AM33xx family.
Many thanks to Axel Lin and Jingoo Han who have done a great job
fixing various bugs and inconsistencies."
* tag 'for-3.10-rc1' of git://gitorious.org/linux-pwm/linux-pwm: (27 commits)
pwm: lpc32xx: Don't change PWM_ENABLE bit in lpc32xx_pwm_config
pwm: lpc32xx: Properly set PWM_ENABLE bit in lpc32xx_pwm_[enable|disable]
pwm: Constify OF match tables
pwm: pwm-tiehrpwm: Update device-tree binding document
pwm: pwm-tiecap: Update device-tree binding document
pwm: puv3: Remove unused enabled filed from struct puv3_pwm_chip
pwm: pxa: Remove PWM_ID_BASE macro
pwm: spear: Remove unused *dev from struct spear_pwm_chip
pwm: mxs: Remove unused *dev from struct mxs_pwm_chip
pwm: twl: Return proper error if twl6030_pwm_enable() fails
pwm: pxa: Remove clk_enabled field from struct pxa_pwm_chip
pwm: imx: Remove enabled field from struct imx_chip
pwm: twl: Add .owner to struct pwm_ops
pwm: twl-led: Add .owner to struct pwm_ops
pwm: atmel-tcb: Add .owner to struct pwm_ops
pwm: ab8500: Add .owner to struct pwm_ops
pwm: spear: Fix checking return value of clk_enable() and clk_prepare()
pwm: tiehrpwm: Staticize non-exported symbols
pwm: tiecap: Staticize non-exported symbols
pwm: ab8500: Fix trivial typo in dev_err message
...
Linus Torvalds [Mon, 6 May 2013 21:59:13 +0000 (14:59 -0700)]
Merge tag 'iommu-updates-v3.10' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates are mostly about the x86 IOMMUs this time.
Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a
PPC platform) and an extension to the IOMMU group interface.
On the x86 side this includes a workaround for VT-d to disable
interrupt remapping on broken chipsets. On the AMD-Vi side the most
important new feature is a kernel command-line interface to override
broken information in IVRS ACPI tables and get interrupt remapping
working this way.
Besides that there are small fixes all over the place."
* tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits)
iommu/tegra: Fix printk formats for dma_addr_t
iommu: Add a function to find an iommu group by id
iommu/vt-d: Remove warning for HPET scope type
iommu: Move swap_pci_ref function to drivers/iommu/pci.h.
iommu/vt-d: Disable translation if already enabled
iommu/amd: fix error return code in early_amd_iommu_init()
iommu/AMD: Per-thread IOMMU Interrupt Handling
iommu: Include linux/err.h
iommu/amd: Workaround for ERBT1312
iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters
iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
iommu/amd: Add ioapic and hpet ivrs override
iommu/amd: Add early maps for ioapic and hpet
iommu/amd: Extend IVRS special device data structure
iommu/amd: Move add_special_device() to __init
iommu: Fix compile warnings with forward declarations
iommu/amd: Properly initialize irq-table lock
iommu/amd: Use AMD specific data structure for irq remapping
iommu/amd: Remove map_sg_no_iommu()
iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
...
Andreas Schwab [Sat, 4 May 2013 14:32:53 +0000 (16:32 +0200)]
Fix cleaning in scripts/mod
Make sure devicetable-offsets.h is cleaned in the scripts/mod directory
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Christoph Lameter [Fri, 3 May 2013 18:04:18 +0000 (18:04 +0000)]
mm, slab_common: Fix bootstrap creation of kmalloc caches
For SLAB the kmalloc caches must be created in ascending sizes in order
for the OFF_SLAB sub-slab cache to work properly.
Create the non power of two caches immediately after the prior power of
two kmalloc cache. Do not create the non power of two caches before all
other caches.
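The ordering rule can be sketched in user-space C. This is only an illustration, not the kernel's code: kmalloc_creation_order() is a hypothetical helper, and the sizes 96 and 192 are assumed here to be the non-power-of-two caches in question.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code: fill `order` with kmalloc cache
 * sizes in creation order and return how many entries were written.
 * Power-of-two sizes run from min_size up to max_size. The assumed
 * non-power-of-two sizes (96 and 192) are emitted immediately after the
 * preceding power of two (64 and 128), so the sequence stays ascending.
 * min_size is assumed to be a non-zero power of two.
 */
size_t kmalloc_creation_order(size_t min_size, size_t max_size,
                              size_t *order, size_t capacity)
{
    size_t n = 0;

    if (min_size == 0)
        return 0;

    for (size_t size = min_size; size <= max_size; size *= 2) {
        if (n < capacity)
            order[n++] = size;

        /* 64 is followed by 96, and 128 is followed by 192. */
        if ((size == 64 || size == 128) &&
            size + size / 2 <= max_size && n < capacity)
            order[n++] = size + size / 2;
    }
    return n;
}
```

Called with min_size 8 and max_size 256, the sketch yields 8, 16, 32, 64, 96, 128, 192, 256.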
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/201305040348.CIF81716.OStQOHFJMFLOVF@I-love.SAKURA.ne.jp
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Dan Carpenter [Mon, 6 May 2013 09:31:17 +0000 (09:31 +0000)]
tipc: potential divide by zero in tipc_link_recv_fragment()
The worry here is that fragm_sz could be zero since it comes from
skb->data.
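A minimal sketch of this kind of guard, assuming a hypothetical helper fragments_needed() rather than the actual TIPC code: a divisor read from packet data is checked before it is used.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the TIPC code: compute how many fragments of
 * fragm_sz bytes are needed to carry msg_sz bytes. Both values are
 * assumed to come from an untrusted packet header. Returns 0 and stores
 * the result in *count on success, or -1 if fragm_sz is zero so that
 * the caller can drop the packet instead of dividing by zero.
 */
int fragments_needed(uint32_t msg_sz, uint32_t fragm_sz, uint32_t *count)
{
    if (fragm_sz == 0)
        return -1;

    /* Round up without risking overflow in msg_sz + fragm_sz. */
    *count = msg_sz / fragm_sz + (msg_sz % fragm_sz != 0);
    return 0;
}
```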
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 6 May 2013 08:28:41 +0000 (08:28 +0000)]
tipc: add a bounds check in link_recv_changeover_msg()
The bearer_id here comes from skb->data and it can be a number from 0 to
7. The problem is that the ->links[] array has only 2 elements so I
have added a range check.
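The range check can be illustrated as follows. The struct and function names are hypothetical stand-ins, not the TIPC definitions; only the array size of 2 comes from the message above.

```c
#include <stddef.h>

#define MAX_LINKS_SKETCH 2   /* the ->links[] array has only 2 elements */

/* Hypothetical stand-in for the node structure that owns the links. */
struct node_sketch {
    int *links[MAX_LINKS_SKETCH];
};

/*
 * Return the link for bearer_id, or NULL if the id (taken from packet
 * data, so possibly anywhere in 0..7) would index past the array.
 */
int *lookup_link(struct node_sketch *node, unsigned int bearer_id)
{
    if (bearer_id >= MAX_LINKS_SKETCH)
        return NULL;
    return node->links[bearer_id];
}
```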
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 2 May 2013 16:01:25 +0000 (16:01 +0000)]
net/usb: new driver for RTL8152
Add new driver for supporting Realtek RTL8152 Based USB 2.0 Ethernet Adapters
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 6 May 2013 20:11:19 +0000 (13:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph changes from Alex Elder:
"This is a big pull.
Most of it is the culmination of Alex's work to implement RBD image
layering, which is now complete (yay!).
There is also some work from Yan to fix i_mutex behavior surrounding
writes in cephfs, a sync write fix, a fix for RBD images that get
resized while they are mapped, and a few patches from me that resolve
annoying auth warnings and fix several bugs in the ceph auth code."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (254 commits)
rbd: fix image request leak on parent read
libceph: use slab cache for osd client requests
libceph: allocate ceph message data with a slab allocator
libceph: allocate ceph messages with a slab allocator
rbd: allocate image object names with a slab allocator
rbd: allocate object requests with a slab allocator
rbd: allocate name separate from obj_request
rbd: allocate image requests with a slab allocator
rbd: use binary search for snapshot lookup
rbd: clear EXISTS flag if mapped snapshot disappears
rbd: kill off the snapshot list
rbd: define rbd_snap_size() and rbd_snap_features()
rbd: use snap_id not index to look up snap info
rbd: look up snapshot name in names buffer
rbd: drop obj_request->version
rbd: drop rbd_obj_method_sync() version parameter
rbd: more version parameter removal
rbd: get rid of some version parameters
rbd: stop tracking header object version
rbd: snap names are pointer to constant data
...
Linus Torvalds [Mon, 6 May 2013 20:07:47 +0000 (13:07 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"A set of cifs cleanup fixes.
The only big one of this set optimizes the cifs error logging,
renaming cFYI and cERROR macros to cifs_dbg, and in the process makes
it clearer and reduces module size."
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: small variable name cleanup
CIFS: fix error return code in cifs_atomic_open()
cifs: store the real expected sequence number in the mid
cifs: on send failure, readjust server sequence number downward
cifs: remove ENOSPC handling in smb_sendv
[CIFS] cifs: Rename cERROR and cFYI to cifs_dbg
fs: cifs: use kmemdup instead of kmalloc + memcpy
cifs: replaced kmalloc + memset with kzalloc
cifs: ignore the unc= and prefixpath= mount options
David Jeffery [Mon, 6 May 2013 05:49:30 +0000 (13:49 +0800)]
autofs - remove autofs dentry mount check
When checking if an autofs mount point is busy it isn't sufficient to
only check if it's a mount point.
For example, if the mount of an offset mountpoint in a tree is denied
for this host by its export and the dentry becomes a process working
directory the check incorrectly returns the mount as not in use at
expire.
This can happen since the default when mounting within a tree is
nostrict, which means ignore mount failures on mounts within the tree and
continue. The nostrict option is meant to allow mounting in this case.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Claudiu Ghioc [Mon, 6 May 2013 05:47:16 +0000 (13:47 +0800)]
autofs - fix sparse warning for autofs4_d_manage()
Fixed the sparse warning:
fs/autofs4/root.c:411:5: warning: symbol 'autofs4_d_manage' was not declared. Should it be static?
[ Clearly it should be static as the function is declared static at the
top of root.c. - imk ]
Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 6 May 2013 19:34:53 +0000 (12:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
"This is the second batch of s390 patches for the 3.10 merge window.
Heiko improved the memory detection, this fixes kdump for large memory
sizes. Some kvm related memory management work, new ipldev/condev
keywords in cio and bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mem_detect: remove artificial kdump memory types
s390/mm: add pte invalidation notifier for kvm
s390/zcrypt: ap bus rescan problem when toggle crypto adapters on/off
s390/memory hotplug,sclp: get rid of per memory increment usecount
s390/memory hotplug: provide memory_block_size_bytes() function
s390/mem_detect: limit memory detection loop to "mem=" parameter
s390/kdump,bootmem: fix bootmem allocator bitmap size
s390: get rid of odd global real_memory_size
s390/kvm: Change the virtual memory mapping location for Virtio devices
s390/zcore: calculate real memory size using own get_mem_size function
s390/mem_detect: add DAT sanity check
s390/mem_detect: fix lockdep irq tracing
s390/mem_detect: move memory detection code to mm folder
s390/zfcpdump: exploit new cio_ignore keywords
s390/cio: add condev keyword to cio_ignore
s390/cio: add ipldev keyword to cio_ignore
s390/uaccess: add "fallthrough" comments
Sergei Shtylyov [Thu, 2 May 2013 11:10:22 +0000 (11:10 +0000)]
3c59x: fix freeing nonexistent resource on driver unload
When unloading the driver that drives an EISA board, a message similar to the
following one is displayed:
Trying to free nonexistent resource <0000000000013000-000000000001301f>
Then a user is unable to reload the driver because the resource it requested in
the previous load hasn't been freed. This happens most probably due to a typo in
vortex_eisa_remove() which calls release_region() with 'dev->base_addr' instead
of 'edev->base_addr'...
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 6 May 2013 02:15:13 +0000 (02:15 +0000)]
netpoll: inverted down_trylock() test
The return value is reversed from mutex_trylock().
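The two conventions can be compared with user-space stand-ins. The *_sketch functions below are hypothetical and only mimic the return values: down_trylock() returns 0 when the semaphore was acquired, while mutex_trylock() returns 1 when the mutex was acquired.

```c
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for a lock. */
struct lock_sketch {
    bool held;
};

/* Semaphore convention: 0 means acquired, 1 means contended. */
int down_trylock_sketch(struct lock_sketch *l)
{
    if (l->held)
        return 1;
    l->held = true;
    return 0;
}

/* Mutex convention: 1 means acquired, 0 means contended. */
int mutex_trylock_sketch(struct lock_sketch *l)
{
    if (l->held)
        return 0;
    l->held = true;
    return 1;
}

/*
 * Correct test for the semaphore variant: success is a zero return.
 * Writing "if (down_trylock_sketch(l))" here would invert the meaning.
 */
bool try_enter(struct lock_sketch *l)
{
    return down_trylock_sketch(l) == 0;
}
```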
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Sun, 5 May 2013 16:05:55 +0000 (16:05 +0000)]
rps_dev_flow_table_release(): no need to delay vfree()
The same story as with fib_trie patch - vfree() from RCU callbacks
is legitimate now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Sun, 5 May 2013 16:03:46 +0000 (16:03 +0000)]
fib_trie: no need to delay vfree()
Now that vfree() can be called from interrupt contexts, there's no
need to play games with schedule_work() to escape calling vfree()
from RCU callbacks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Konstantin Khlebnikov [Sun, 5 May 2013 04:56:22 +0000 (04:56 +0000)]
net: frag, fix race conditions in LRU list maintenance
This patch fixes a race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4bf92c6d2510fe5c4dc51852746f206
("net: frag, move LRU list maintenance outside of rwlock").
One CPU has already added a new fragment queue into the hash but not into the LRU.
Another CPU finds it in the hash and tries to move it to the end of the LRU.
This leads to a NULL pointer dereference inside list_move_tail().
Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): the move can happen after the deletion.
This patch initializes the LRU list head before adding the fragment into the hash, and
inet_frag_lru_move() doesn't touch it if it's empty.
I saw this kernel oops two times in a couple of days.
[119482.128853] BUG: unable to handle kernel NULL pointer dereference at (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>] [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58 EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS: 00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004] ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038] ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083] 00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009] [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921] [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803] [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527] [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302] [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147] [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998] [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826] [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648] [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403] [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103] [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809] [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515] [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219] [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460] [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675] RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000
Oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()
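The approach described in this message (initialize the list head early, and skip the move when the entry is not yet on the LRU) can be sketched with a minimal user-space list. The helpers below are simplified stand-ins for the kernel list API, lru_move() is a hypothetical name, and the locking the kernel needs around these operations is left out.

```c
#include <stdbool.h>

/* Minimal circular doubly-linked list, modelled on the kernel's. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized head points at itself, which also means "empty". */
void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

void list_add_tail(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/*
 * Hypothetical sketch of the guarded move: an entry whose list head was
 * initialized but never added to the LRU is "empty" and is left alone,
 * so a lookup that races with the add cannot dereference NULL pointers.
 * Returns true if the entry was moved to the tail.
 */
bool lru_move(struct list_head *entry, struct list_head *lru)
{
    if (list_empty(entry))
        return false;
    list_del_entry(entry);
    list_add_tail(entry, lru);
    return true;
}
```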
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Mon, 6 May 2013 10:29:36 +0000 (13:29 +0300)]
vhost: drop virtio_net.h dependency
There's no net specific code in vhost.c anymore,
don't include the virtio_net.h header.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 08:38:24 +0000 (16:38 +0800)]
vhost-net: Cleanup vhost_ubuf and vhost_zcopy
- Rename vhost_ubuf to vhost_net_ubuf
- Rename vhost_zcopy_mask to vhost_net_zcopy_mask
- Make funcs static
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 08:38:19 +0000 (16:38 +0800)]
vhost: Remove vhost_enable_zcopy in vhost.h
It is net.c specific.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 08:38:22 +0000 (16:38 +0800)]
vhost: Remove comments for hdr in vhost.h
It is supposed to be removed when hdr is moved into vhost_net_virtqueue.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 08:38:20 +0000 (16:38 +0800)]
vhost: Move VHOST_NET_FEATURES to net.c
vhost.h should not depend on device specific macros like
VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 03:16:00 +0000 (11:16 +0800)]
vhost-net: Free ubuf when vhost_dev_set_owner fails
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He [Mon, 6 May 2013 03:15:59 +0000 (11:15 +0800)]
vhost: Export vhost_dev_set_owner
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Christoph Lameter [Fri, 3 May 2013 15:43:18 +0000 (15:43 +0000)]
slab: Return NULL for oversized allocations
The inline path seems to have changed the SLAB behavior for very large
kmalloc allocations with commit e3366016 ("slab: Use common
kmalloc_index/kmalloc_size functions"). This patch restores the old
behavior but also adds diagnostics so that we can figure out where in the
code these large allocations occur.
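A user-space sketch of the described behavior, under stated assumptions: alloc_checked() and SKETCH_MAX_ALLOC are hypothetical names, the limit is arbitrary, and a one-time message on stderr stands in for the kernel's WARN_ON_ONCE.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Arbitrary limit chosen for the sketch, not the kernel's value. */
#define SKETCH_MAX_ALLOC (4u * 1024u * 1024u)

/*
 * Hypothetical allocator wrapper: requests above the limit return NULL
 * instead of being passed on, and the first such request is reported
 * once so the offending call site can be tracked down.
 */
void *alloc_checked(size_t size)
{
    static int warned;

    if (size > SKETCH_MAX_ALLOC) {
        if (!warned) {
            warned = 1;
            fprintf(stderr, "oversized allocation: %zu bytes\n", size);
        }
        return NULL;
    }
    return malloc(size);
}
```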
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/201305040348.CIF81716.OStQOHFJMFLOVF@I-love.SAKURA.ne.jp
[ penberg@kernel.org: use WARN_ON_ONCE ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 05:02:40 +0000 (15:02 +1000)]
powerpc/topology: Fix spurr attribute permission
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.
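The consistency rule behind the warning can be sketched as a small check. The struct and function names are hypothetical, and the exact condition the kernel tests is an assumption here: write permission bits without a store callback (or read bits without a show callback) are treated as inconsistent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs-style attribute. */
struct attr_sketch {
    unsigned int mode;                           /* e.g. 0600, 0400 */
    long (*show)(char *buf);                     /* read callback */
    long (*store)(const char *buf, size_t len);  /* write callback */
};

/* Trivial read callback used for illustration only. */
long show_stub(char *buf)
{
    (void)buf;
    return 0;
}

/* Return true if the permission bits match the available callbacks. */
bool attr_mode_ok(const struct attr_sketch *a)
{
    if ((a->mode & 0222) && a->store == NULL)
        return false;   /* writable but nothing handles writes */
    if ((a->mode & 0444) && a->show == NULL)
        return false;   /* readable but nothing handles reads */
    return true;
}
```

With only a show callback, mode 0600 fails this check, while a read-only mode such as 0400 passes.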
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 03:40:40 +0000 (13:40 +1000)]
powerpc/pci: Support per-aperture memory offset
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference between CPU memory addresses and PCI MMIO addresses.
This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.
This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.
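The idea of separate offsets can be illustrated as follows. This is a sketch under assumptions: struct mem_window and cpu_to_pci_addr() are hypothetical names, and the offset is taken to be the CPU address minus the PCI address, as described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-aperture description: each window has its own offset. */
struct mem_window {
    uint64_t cpu_start;  /* first CPU address covered by the window */
    uint64_t size;       /* window length in bytes */
    uint64_t offset;     /* CPU address minus PCI address */
};

/*
 * Translate a CPU address to a PCI MMIO address using the offset of the
 * window that contains it. Returns false if no window matches.
 */
bool cpu_to_pci_addr(const struct mem_window *w, size_t n,
                     uint64_t cpu, uint64_t *pci)
{
    for (size_t i = 0; i < n; i++) {
        if (cpu >= w[i].cpu_start && cpu - w[i].cpu_start < w[i].size) {
            *pci = cpu - w[i].offset;
            return true;
        }
    }
    return false;
}
```

A 32-bit window can then carry a large offset while a 64-bit window mapped 1:1 uses an offset of zero.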
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 02:03:49 +0000 (12:03 +1000)]
powerpc/cell/iommu: Improve error message for missing node
Some devices don't have a correct node ID and thus can't be
attached to an iommu.
The message displayed by the iommu code isn't very useful if
you don't have a device-tree node as it tries to print the
device-tree path but not the struct device name.
Improve this by printing the device name as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 02:02:05 +0000 (12:02 +1000)]
powerpc/cell/spufs: Fix status attribute permission
We are registering the attribute with permission 0644 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 01:37:43 +0000 (11:37 +1000)]
irqdomain: Allow quiet failure mode
Some interrupt controllers refuse to map interrupts marked as
"protected" by firmware. Since we try to map everything in the
device-tree on some platforms, we end up with a lot of nasty
WARN's in the boot log for what is a normal situation on those
machines.
This defines a specific return code (-EPERM) from the host map()
callback which causes irqdomain to fail silently.
MPIC is updated to return this when hitting a protected source,
printing only a single-line message for diagnostic purposes.
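A minimal sketch of the quiet failure mode, assuming hypothetical names throughout: the caller counts a warning for any mapping error except -EPERM, which is passed up silently.

```c
#include <errno.h>

/* Hypothetical map callback type: returns 0 or a negative errno. */
typedef int (*map_fn)(unsigned int hwirq);

/* Example callbacks for illustration only. */
int map_ok(unsigned int hwirq)        { (void)hwirq; return 0; }
int map_protected(unsigned int hwirq) { (void)hwirq; return -EPERM; }
int map_broken(unsigned int hwirq)    { (void)hwirq; return -EINVAL; }

/*
 * Call the map callback and bump *warnings (a stand-in for a WARN in
 * the boot log) on failure, except when the callback reports -EPERM,
 * which is treated as an expected, silent failure.
 */
int irq_associate_sketch(map_fn map, unsigned int hwirq, int *warnings)
{
    int ret = map(hwirq);

    if (ret != 0 && ret != -EPERM)
        (*warnings)++;
    return ret;
}
```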
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Mon, 6 May 2013 00:36:20 +0000 (17:36 -0700)]
Merge tag 'mfd-3.10-1' of git://git./linux/kernel/git/sameo/mfd-next
Pull MFD update from Samuel Ortiz:
"For 3.10 we have a few new MFD drivers for:
- The ChromeOS embedded controller which provides keyboard, battery
and power management services. This controller is accessible
through i2c or SPI.
- Silicon Laboratories 476x controller, providing access to their FM
chipset and their audio codec.
- Realtek's RTS5249, a memory stick, MMC and SD/SDIO PCI based
reader.
- Nokia's Tahvo power button and watchdog device. This device is
very similar to Retu and is thus supported by the same code base.
- STMicroelectronics STMPE1801, a keyboard and GPIO controller
supported by the stmpe driver.
- ST-Ericsson AB8540 and AB8505 power management and voltage
converter controllers through the existing ab8500 code.
Some other drivers got cleaned up or improved. In particular:
- The Linaro/STE guys got the ab8500 driver in sync with their
internal code through a series of optimizations, fixes and
improvements.
- The AS3711 and OMAP USB drivers now have DT support.
- The arizona clock and interrupt handling code got improved.
- The wm5102 register patch and boot mechanism also got improved."
* tag 'mfd-3.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next: (104 commits)
mfd: si476x: Don't use 0bNNN
mfd: vexpress: Handle pending config transactions
mfd: ab8500: Export ab8500_gpadc_sw_hw_convert properly
mfd: si476x: Fix i2c warning
mfd: si476x: Add header files and Kbuild plumbing
mfd: si476x: Add chip properties handling code
mfd: si476x: Add the bulk of the core driver
mfd: si476x: Add commands abstraction layer
mfd: rtsx: Support RTS5249
mfd: retu: Add Tahvo support
mfd: ucb1400: Pass ucb1400-gpio data through ac97 bus
mfd: wm8994: Add some OF properties
mfd: wm8994: Add device ID data to WM8994 OF device IDs
input: Export matrix_keypad_parse_of_params()
mfd: tps65090: Add compatible string for charger subnode
mfd: db8500-prcmu: Support platform dependant device selection
mfd: syscon: Fix warnings when printing resource_size_t
of: Add stub of_get_parent for non-OF builds
mfd: omap-usb-tll: Convert to devm_ioremap_resource()
mfd: omap-usb-host: Convert to devm_ioremap_resource()
...
Benjamin Herrenschmidt [Sat, 4 May 2013 14:24:32 +0000 (14:24 +0000)]
powerpc/pnv: Fix "compatible" property for P8 PHB
The property should be "ibm,power8-pciex", not "ibm,p8-pciex". The latter
was changed in FW because it was inconsistent with the rest of the nodes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Sat, 4 May 2013 14:22:57 +0000 (14:22 +0000)]
powerpc/pci: Don't add bogus empty resources to PHBs
When converting to use the new pci_add_resource_offset() we didn't
properly account for empty resources (0 flags) and add those bogons
to the PHBs. The result is some annoying messages in the log.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Fri, 3 May 2013 17:21:00 +0000 (17:21 +0000)]
powerpc/powerpnv: Properly handle failure starting CPUs
If OPAL returns an error, propagate it upward rather than spinning
seconds waiting for a CPU that will never show up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:49:59 +0000 (14:49 +0000)]
powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Sat, 4 May 2013 16:01:17 +0000 (16:01 +0000)]
powerpc/cputable: Advertise ISEL support on appropriate embedded processors
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:48:38 +0000 (14:48 +0000)]
powerpc/cputable: Advertise DSCR support on P7/P7+
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:47:56 +0000 (14:47 +0000)]
powerpc/cputable: Reserve bits in HWCAP2 for new features
Also, make HTM's presence dependent on the .config option.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kleber Sacilotto de Souza [Fri, 3 May 2013 12:43:12 +0000 (12:43 +0000)]
powerpc/pseries: Perform proper max_bus_speed detection
On pseries machines the detection for max_bus_speed should be done
through an OpenFirmware property. This patch adds a function to perform
this detection and a hook to perform dynamic adding of the function only
for pseries. This is done by overwriting the weak
pcibios_root_bridge_prepare function which is called by
pci_create_root_bus().
From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian King [Fri, 3 May 2013 11:30:59 +0000 (11:30 +0000)]
powerpc/pseries: Force 32 bit MSIs for devices that require it
The following patch implements a new PAPR change which allows
the OS to force the use of 32 bit MSIs, regardless of what
the PCI capabilities indicate. This is required for some
devices that advertise support for 64 bit MSIs but don't
actually support them.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Thu, 2 May 2013 15:36:14 +0000 (15:36 +0000)]
powerpc/tm: Fix null pointer deference in flush_hash_page
Make sure that current->thread.reg exists before we dereference it in
flush_hash_page.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: John J Miller <millerjo@us.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jeremy Kerr [Wed, 1 May 2013 22:31:50 +0000 (22:31 +0000)]
powerpc/powernv: Defer OPAL exception handler registration
Currently, the OPAL exception vectors are registered before the feature
fixups are processed. This means that the now-firmware-owned vectors
will likely be overwritten by the kernel.
This change moves the exception registration code to an early initcall,
rather than at machine_init time.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 1 May 2013 20:06:33 +0000 (20:06 +0000)]
powerpc: Emulate non privileged DSCR read and write
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.
Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.
A simple test was created to verify the fix:
http://ozlabs.org/~anton/junkcode/user_dscr_test.c
Without the patch we get a SIGILL and it passes with the patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 5 May 2013 21:47:31 +0000 (14:47 -0700)]
Merge tag 'kvm-3.10-1' of git://git./virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
David Howells [Sat, 4 May 2013 07:48:27 +0000 (08:48 +0100)]
Give the OID registry file module info to avoid kernel tainting
Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.
Reported-by: Dros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: stable@vger.kernel.org
cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Fri, 3 May 2013 19:12:45 +0000 (19:12 +0000)]
tcp: do not expire TCP fastopen cookies
TCP metric cache expires entries after one hour.
This probably makes sense for TCP RTT/RTTVAR/CWND, but not
for TCP fastopen cookies.
It's better to try the previous cookie. If it appears to be obsolete,
the server will send us a new cookie anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Herrenschmidt [Fri, 3 May 2013 17:19:01 +0000 (17:19 +0000)]
net/eth/ibmveth: Fixup retrieval of MAC address
Some ancient pHyp versions used to create an 8-byte local-mac-address
property in the device-tree instead of a 6-byte one for veth.
The Linux driver code to deal with that is an insane hack which also
happens to break with some choices of MAC addresses in qemu by testing
for a bit in the address rather than just looking at the size of the
property.
Sanitize this by doing the latter instead.
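Deciding by property size can be sketched as below. mac_from_property() is a hypothetical helper, and the layout of the 8-byte form (two leading pad bytes followed by the 6-byte address) is an assumption made for the illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN_SKETCH 6

/*
 * Hypothetical helper: copy a MAC address out of a device-tree style
 * property. The decision is based only on the property length, never on
 * the address bits. An 8-byte property is assumed to carry two leading
 * pad bytes. Returns 0 on success, -1 for any other length.
 */
int mac_from_property(const unsigned char *prop, size_t len,
                      unsigned char mac[MAC_LEN_SKETCH])
{
    if (len == 8)
        prop += 2;                  /* skip the assumed padding */
    else if (len != MAC_LEN_SKETCH)
        return -1;

    memcpy(mac, prop, MAC_LEN_SKETCH);
    return 0;
}
```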
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 3 May 2013 14:49:41 +0000 (14:49 +0000)]
virtio: don't expose u16 in userspace api
Programs using virtio headers outside of kernel will no longer
build because u16 type does not exist in userspace. All user ABI
must use __u16 typedef instead.
Bug introduced by:
commit 986a4f4d452dec004697f667439d27c3fda9c928
Author: Jason Wang <jasowang@redhat.com>
Date: Fri Dec 7 07:04:56 2012 +0000
virtio_net: multiqueue support
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 5 May 2013 20:23:27 +0000 (13:23 -0700)]
Merge branch 'timers-nohz-for-linus' of git://git./linux/kernel/git/tip/tip
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like you to make input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to a crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10, like the -rt tree this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
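The overhead figures quoted in the pull message above can be checked with a little arithmetic. The sketch below assumes a fixed cost per timer tick and ignores secondary effects such as cache pollution, so it is a back-of-the-envelope model rather than a measurement; both function names are hypothetical.

```c
#include <assert.h>

/*
 * Per-tick cost in nanoseconds implied by an overhead given in parts
 * per million (ppm) at a tick rate of hz ticks per second.
 * 1 ppm of one second is 1000 ns.
 */
static unsigned long long
example_tick_cost_ns(unsigned long long overhead_ppm, unsigned int hz)
{
	assert(hz > 0);
	return overhead_ppm * 1000ULL / hz;
}

/* Overhead in ppm caused by ticks of the given cost at a rate of hz. */
static unsigned long long
example_overhead_ppm(unsigned long long tick_cost_ns, unsigned int hz)
{
	return tick_cost_ns * hz / 1000ULL;
}
```

At HZ=1000, an overhead of 1.0% is 10000 ppm, which works out to 10000 ns (10 us) per tick. Under the same per-tick cost, the 1 Hz residual tick of CONFIG_NO_HZ_FULL would cost 10 ppm, or 0.001%.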
Linus Torvalds [Sun, 5 May 2013 18:37:16 +0000 (11:37 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes plus a small hw-enablement patch for Intel IB model 58
uncore events"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
perf/x86/intel/lbr: Fix LBR filter
perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
perf: Fix vmalloc ring buffer pages handling
perf/x86/intel: Fix unintended variable name reuse
perf/x86/intel: Add support for IvyBridge model 58 Uncore
perf/x86/intel: Fix typo in perf_event_intel_uncore.c
x86: Eliminate irq_mis_count counted in arch_irq_stat
Linus Torvalds [Sun, 5 May 2013 17:58:06 +0000 (10:58 -0700)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"We get rid of the general module prefix confusion with a binary config
option, fix a remove/insert race which Never Happens, and (my
favorite) handle the case when we have too many modules for a single
commandline. Seriously, the kernel is full, please go away!"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
X.509: Support parse long form of length octets in Authority Key Identifier
module: don't unlink the module until we've removed all exposure.
kernel: kallsyms: memory override issue, need check destination buffer length
MODSIGN: do not send garbage to stderr when enabling modules signature
modpost: handle huge numbers of modules.
modpost: add -T option to read module names from file/stdin.
modpost: minor cleanup.
genksyms: pass symbol-prefix instead of arch
module: fix symbol versioning with symbol prefixes
CONFIG_SYMBOL_PREFIX: cleanup.
Linus Torvalds [Sun, 5 May 2013 17:35:26 +0000 (10:35 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull single_open() leak fixes from Al Viro:
"A bunch of fixes for a moderately common class of bugs: file with
single_open() done by its ->open() and seq_release as its ->release().
That leaks; fortunately, it's not _too_ common (either people manage
to RTFM that says "When using single_open(), the programmer should use
single_release() instead of seq_release() in the file_operations
structure to avoid a memory leak", or they just copy a correct
instance), but grepping through the tree has caught quite a pile.
All of that is, AFAICS, -stable fodder, for as far as the patches
apply. I tried to carve it up into reasonably-sized pieces (more or
less "comes from the same tree")"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rcutrace: single_open() leaks
gadget: single_open() leaks
staging: single_open() leaks
megaraid: single_open() leak
wireless: single_open() leaks
input: single_open() leak
rtc: single_open() leaks
ds1620: single_open() leak
sh: single_open() leaks
parisc: single_open() leaks
mips: single_open() leaks
ia64: single_open() leaks
h8300: single_open() leaks
cris: single_open() leaks
arm: single_open() leaks
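The class of bug fixed by this series can be modelled in userspace. The sketch below is a simplified model, not kernel code: every example_* name is hypothetical, and an allocation counter stands in for kmalloc()/kfree(). It relies on the fact that single_open() allocates an operations table in addition to the seq_file state, which only single_release() frees.

```c
#include <assert.h>
#include <stdlib.h>

/* Count of live allocations, so that a leak becomes observable. */
static int example_live_allocs;

static void *example_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		example_live_allocs++;
	return p;
}

static void example_free(void *p)
{
	if (p) {
		example_live_allocs--;
		free(p);
	}
}

struct example_ops { int (*show)(void *data); };
struct example_seq { const struct example_ops *op; void *data; };

/* Model of seq_open(): allocates the seq state; ops belong to the caller. */
static struct example_seq *example_seq_open(const struct example_ops *op)
{
	struct example_seq *m = example_alloc(sizeof(*m));

	if (m)
		m->op = op;
	return m;
}

/* Model of seq_release(): frees the seq state only. */
static void example_seq_release(struct example_seq *m)
{
	example_free(m);
}

/* Model of single_open(): allocates an ops table on top of seq_open(). */
static struct example_seq *example_single_open(int (*show)(void *), void *data)
{
	struct example_ops *op = example_alloc(sizeof(*op));
	struct example_seq *m;

	if (!op)
		return NULL;
	op->show = show;
	m = example_seq_open(op);
	if (!m) {
		example_free(op);
		return NULL;
	}
	m->data = data;
	return m;
}

/* Model of single_release(): frees the ops table as well. */
static void example_single_release(struct example_seq *m)
{
	const struct example_ops *op;

	assert(m != NULL);
	op = m->op;
	example_seq_release(m);
	example_free((void *)op);
}
```

Pairing example_single_open() with example_seq_release() leaves one allocation live per open, which is the leak; pairing it with example_single_release() returns the counter to zero.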
Linus Torvalds [Sun, 5 May 2013 17:13:44 +0000 (10:13 -0700)]
Merge branch 'ipc-cleanups'
Merge ipc fixes and cleanups from my IPC branch.
The ipc locking has always been pretty ugly, and the scalability fixes
to some degree made it even less readable. We had two cases of double
unlocks in error paths due to this (one rcu read unlock, one semaphore
unlock), and this fixes the bugs I found while trying to clean things up
a bit so that we are less likely to have more.
* ipc-cleanups:
ipc: simplify rcu_read_lock() in semctl_nolock()
ipc: simplify semtimedop/semctl_main() common error path handling
ipc: move sem_obtain_lock() rcu locking into the only caller
ipc: fix double sem unlock in semctl error path
ipc: move the rcu_read_lock() from sem_lock_and_putref() into callers
ipc: sem_putref() does not need the semaphore lock any more
ipc: move rcu_read_unlock() out of sem_unlock() and into callers
Scott Wood [Wed, 1 May 2013 01:00:45 +0000 (20:00 -0500)]
kvm: Add compat_ioctl for device control API
This API shouldn't have 32/64-bit issues, but VFS assumes it does
unless told otherwise.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Peter Zijlstra [Fri, 3 May 2013 12:11:25 +0000 (14:11 +0200)]
perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
We should always have proper privileges when requesting kernel
data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
Al Viro [Sun, 5 May 2013 04:16:35 +0000 (00:16 -0400)]
rcutrace: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:16:11 +0000 (00:16 -0400)]
gadget: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:15:43 +0000 (00:15 -0400)]
staging: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:15:15 +0000 (00:15 -0400)]
megaraid: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:13:20 +0000 (00:13 -0400)]
wireless: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:12:56 +0000 (00:12 -0400)]
input: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:12:29 +0000 (00:12 -0400)]
rtc: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:11:29 +0000 (00:11 -0400)]
ds1620: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:11:01 +0000 (00:11 -0400)]
sh: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:44 +0000 (00:09 -0400)]
parisc: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:30 +0000 (00:09 -0400)]
mips: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:04 +0000 (00:09 -0400)]
ia64: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:08:26 +0000 (00:08 -0400)]
h8300: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:07:52 +0000 (00:07 -0400)]
cris: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:07:22 +0000 (00:07 -0400)]
arm: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>