Russ Anderson [Fri, 22 Mar 2013 22:04:43 +0000 (15:04 -0700)]
mm: zone_end_pfn is too small
Booting with 32 TBytes memory hits BUG at mm/page_alloc.c:552! (output
below).
The key hint is "page 4294967296 outside zone".
4294967296 = 0x100000000 (bit 32 is set).
The problem is in include/linux/mmzone.h:
    static inline unsigned zone_end_pfn(const struct zone *zone)
    {
            return zone->zone_start_pfn + zone->spanned_pages;
    }
zone_end_pfn is "unsigned" (32 bits). Changing it to "unsigned long"
(64 bits) fixes the problem.
zone_end_pfn() was added recently in commit 108bcc96ef70 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()").
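The truncation can be reproduced in userspace. The following is a minimal sketch, not the kernel code: struct zone_sketch and the three helpers are hypothetical names, and fixed-width types stand in for "unsigned" (32 bits) and "unsigned long" (64 bits on x86_64). The numbers are the zone bounds printed in the failure output.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two struct zone fields used here. */
struct zone_sketch {
    uint64_t zone_start_pfn;
    uint64_t spanned_pages;
};

/* Models the buggy "unsigned" return type: the sum is cut to 32 bits. */
static inline uint32_t zone_end_pfn_narrow(const struct zone_sketch *zone)
{
    return (uint32_t)(zone->zone_start_pfn + zone->spanned_pages);
}

/* Models the fixed "unsigned long" return type on a 64-bit kernel. */
static inline uint64_t zone_end_pfn_wide(const struct zone_sketch *zone)
{
    return zone->zone_start_pfn + zone->spanned_pages;
}

/* Range check of the general kind that reports "page ... outside zone". */
static inline int pfn_in_zone(uint64_t pfn, uint64_t start, uint64_t end)
{
    return pfn >= start && pfn < end;
}
```

With zone_start_pfn = 4294967296 and spanned_pages = 32501760, the wide helper returns 4327469056 while the narrow one returns 32501760, so the first pfn of the zone appears to lie beyond its end.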
Output from the failure.
No AGP bridge found
page 4294967296 outside zone [ 4294967296 - 4327469056 ]
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:552!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper Not tainted 3.9.0-rc2.dtp+ #10
RIP: free_one_page+0x382/0x430
Process swapper (pid: 0, threadinfo ffffffff81942000, task ffffffff81955420)
Call Trace:
__free_pages_ok+0x96/0xb0
__free_pages+0x25/0x50
__free_pages_bootmem+0x8a/0x8c
__free_memory_core+0xea/0x131
free_low_memory_core_early+0x4a/0x98
free_all_bootmem+0x45/0x47
mem_init+0x7b/0x14c
start_kernel+0x216/0x433
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x144/0x153
Code: 89 f1 ba 01 00 00 00 31 f6 d3 e2 4c 89 ef e8 66 a4 01 00 e9 2c fe ff ff 0f 0b eb fe 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 eb f3 <0f> 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 49
Signed-off-by: Russ Anderson <rja@sgi.com>
Reported-by: George Beshers <gbeshers@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 22 Mar 2013 22:04:41 +0000 (15:04 -0700)]
poweroff: change orderly_poweroff() to use schedule_work()
David said:
Commit 6c0c0d4d1080 ("poweroff: fix bug in orderly_poweroff()")
apparently fixes one bug in orderly_poweroff(), but introduces
another. The comments on orderly_poweroff() claim it can be called
from any context - and indeed we call it from interrupt context in
arch/powerpc/platforms/pseries/ras.c for example. But since that
commit this is no longer safe, since call_usermodehelper_fns() is not
safe in interrupt context without the UMH_NO_WAIT option.
orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
sleepable. Move the "force" logic into __orderly_poweroff() and change
orderly_poweroff() to use the global poweroff_work which simply calls
__orderly_poweroff().
While at it, remove the unneeded "int argc" and change argv_split() to
use GFP_KERNEL.
We use the global "bool poweroff_force" to pass the argument; this can
obviously affect a previous request that is still pending or running. So
we only allow the "false => true" transition, assuming that the pending
"true" should succeed anyway. If schedule_work() fails after that, we
know that work->func() has not been called yet, so it must see the new
value.
This means that orderly_poweroff() becomes async even if we do not run
the command, and that it always succeeds: schedule_work() can only fail
if the work is already pending. We can export __orderly_poweroff() and
change the non-atomic callers which want the old semantics.
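The "false => true" rule can be modelled in userspace. This is a rough sketch under stated assumptions: work_pending and schedule_work_sketch() are hypothetical stand-ins for the real work item and schedule_work(), and nothing here actually powers anything off.

```c
#include <stdbool.h>

/* Userspace stand-ins; the real code uses a work item and schedule_work(). */
static bool poweroff_force;   /* argument handed to the deferred worker */
static bool work_pending;     /* models "the work is already queued" */

/* Models schedule_work(): returns false if the work was already pending. */
static inline bool schedule_work_sketch(void)
{
    if (work_pending)
        return false;
    work_pending = true;
    return true;
}

/* Only the false => true transition is allowed; the call never fails. */
static inline int orderly_poweroff_sketch(bool force)
{
    if (force)
        poweroff_force = true;   /* never override a pending "true" */
    schedule_work_sketch();      /* a failure only means "already queued" */
    return 0;
}
```

A later call with force == false leaves an earlier "true" in place, which is the behaviour the message argues is safe.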
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Feng Hong <hongfeng@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 22 Mar 2013 22:04:40 +0000 (15:04 -0700)]
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).
If the system is configured for more than one hugepage size, which is
possible since commit a137e1cc6d6e ("hugetlbfs: per mount huge page
sizes"), then the overcommit estimation done by __vm_enough_memory()
(resp. shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory. This can lead to an
unexpected ENOMEM/EFAULT resp. SIGSEGV when memory is accounted.
Testcase:
boot: hugepagesz=1G hugepages=1
the default overcommit ratio is 50
before patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 55434168 kB
after patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 54909880 kB
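The two CommitLimit values differ by 524288 kB, which is half of one 1 GiB huge page and matches the overcommit ratio of 50. The sketch below works through that arithmetic. It assumes 4 KiB base pages, a 2 MiB default huge page size with no pages allocated, and a limit of the form (ram - hugetlb) * ratio / 100 + swap. The struct, the helper names and the RAM size used in the usage note are illustrative, not taken from the kernel or from the test machine.

```c
#include <stdint.h>

/* Hypothetical per-size record: page count and base pages per huge page. */
struct hstate_sketch {
    uint64_t nr_huge_pages;
    uint64_t pages_per_huge_page;
};

/* Sum huge pages over the first n configured sizes, in base pages. */
static inline uint64_t hugetlb_total_pages_sketch(const struct hstate_sketch *h, int n)
{
    uint64_t total = 0;

    for (int i = 0; i < n; i++)
        total += h[i].nr_huge_pages * h[i].pages_per_huge_page;
    return total;
}

/* Assumed limit formula, returned in kB (4 kB per base page). */
static inline uint64_t commit_limit_kb(uint64_t ram_pages, uint64_t hugetlb_pages,
                                       uint64_t swap_pages, unsigned ratio)
{
    return ((ram_pages - hugetlb_pages) * ratio / 100 + swap_pages) * 4;
}
```

Counting only the default size (n = 1) yields 0 huge pages here, while counting both sizes (n = 2) yields 262144 base pages. For an arbitrary RAM size such as 27000000 pages, the limit then drops by 131072 pages, i.e. 524288 kB.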
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Fri, 22 Mar 2013 22:04:39 +0000 (15:04 -0700)]
printk: Provide a wake_up_klogd() off-case
wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() is in use and there is actually no waiter on the
log_wait waitqueue. It should be a stub in this case for users like
bust_spinlocks().
Otherwise this results in the following link error when CONFIG_PRINTK=n
and CONFIG_IRQ_WORK=n:
kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
To fix this, provide an off-case for wake_up_klogd() when
CONFIG_PRINTK=n.
There is much more from console_unlock() and other console related code
in printk.c that should be moved under CONFIG_PRINTK. But for now,
focus on a minimal fix as we passed the merge window already.
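The "off-case" is the usual config-stub pattern. Below is a minimal sketch of that pattern, assuming it is written in a header; bust_spinlocks_sketch() is a hypothetical caller, and CONFIG_PRINTK is simply left undefined to model CONFIG_PRINTK=n.

```c
/* CONFIG_PRINTK is left undefined here to model CONFIG_PRINTK=n. */
#ifdef CONFIG_PRINTK
void wake_up_klogd(void);            /* real implementation lives elsewhere */
#else
static inline void wake_up_klogd(void)
{
    /* Nothing waits on log_wait, so there is nothing to wake. */
}
#endif

/* Hypothetical caller in the style of bust_spinlocks(). */
static inline int bust_spinlocks_sketch(int yes)
{
    if (!yes)
        wake_up_klogd();             /* compiles to nothing in the off-case */
    return yes;
}
```

Because the stub is empty, the caller no longer pulls in any reference to irq_work_queue() when printk support is compiled out.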
[akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Hogan [Fri, 22 Mar 2013 22:04:37 +0000 (15:04 -0700)]
irq_work.h: fix warning when CONFIG_IRQ_WORK=n
A randconfig caught repeated compiler warnings when CONFIG_IRQ_WORK=n
due to the definition of a non-inline static function in
<linux/irq_work.h>:
include/linux/irq_work.h +40 : warning: 'irq_work_needs_cpu' defined but not used
Make it inline to suppress the warning. The warning was introduced by commit
00b42959106a ("irq_work: Don't stop the tick with pending works"), merged
in v3.9-rc1.
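For illustration, a header stub of the following shape avoids the warning. It is a sketch of the pattern only; the body shown (returning false when the feature is compiled out) is an assumption about what such a stub would do.

```c
#include <stdbool.h>

/*
 * Header-style stub for the CONFIG_IRQ_WORK=n case. With plain "static",
 * every file that includes the header without calling the function can
 * trigger "defined but not used"; "static inline" does not.
 */
static inline bool irq_work_needs_cpu(void)
{
    return false;
}
```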
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Mar 2013 00:59:22 +0000 (17:59 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Three small CIFS Fixes (the most important of the three fixes a recent
problem authenticating to Windows 8 using cifs rather than SMB2)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: ignore everything in SPNEGO blob after mechTypes
cifs: delay super block destruction until all cifsFileInfo objects are gone
cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY
Linus Torvalds [Fri, 22 Mar 2013 00:56:10 +0000 (17:56 -0700)]
Merge tag 'ext4_for_linue' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of regressions and other bugs in ext4, most of which were
relatively obscure corner cases or races that were found using
regression tests."
* tag 'ext4_for_linue' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
ext4: fix data=journal fast mount/umount hang
ext4: fix ext4_evict_inode() racing against workqueue processing code
ext4: fix memory leakage in mext_check_coverage
ext4: use s_extent_max_zeroout_kb value as number of kb
ext4: use atomic64_t for the per-flexbg free_clusters count
jbd2: fix use after free in jbd2_journal_dirty_metadata()
ext4: reserve metadata block for every delayed write
ext4: update reserved space after the 'correction'
ext4: do not use yield()
ext4: remove unused variable in ext4_free_blocks()
ext4: fix WARN_ON from ext4_releasepage()
ext4: fix the wrong number of the allocated blocks in ext4_split_extent()
ext4: update extent status tree after an extent is zeroed out
ext4: fix wrong m_len value after unwritten extent conversion
ext4: add self-testing infrastructure to do a sanity check
ext4: avoid a potential overflow in ext4_es_can_be_merged()
ext4: invalidate extent status tree during extent migration
ext4: remove unnecessary wait for extent conversion in ext4_fallocate()
ext4: add warning to ext4_convert_unwritten_extents_endio
ext4: disable merging of uninitialized extents
...
Jeff Layton [Mon, 11 Mar 2013 13:52:19 +0000 (09:52 -0400)]
cifs: ignore everything in SPNEGO blob after mechTypes
We've had several reports of people attempting to mount Windows 8 shares
and getting failures with a return code of -EINVAL. The default sec=
mode changed recently to sec=ntlmssp. With that, we expect and parse a
SPNEGO blob from the server in the NEGOTIATE reply.
The current decode_negTokenInit function first parses all of the
mechTypes and then tries to parse the rest of the negTokenInit reply.
The parser however currently expects a mechListMIC or nothing to follow the
mechTypes, but Windows 8 puts a mechToken field there instead to carry
some info for the new NegoEx stuff.
In practice, we don't do anything with the fields after the mechTypes
anyway so I don't see any real benefit in continuing to parse them.
This patch just has the kernel ignore the fields after the mechTypes.
We'll probably need to reinstate some of this if we ever want to support
NegoEx.
Reported-by: Jason Burgess <jason@jacknife2.dns2go.com>
Reported-by: Yan Li <elliot.li.tech@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Linus Torvalds [Thu, 21 Mar 2013 15:37:10 +0000 (08:37 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: exynos_thermal: return a proper error code while thermal_zone_device_register fail.
thermal: rcar_thermal: propagate return value of thermal_zone_device_register
Thermal: kirkwood: Convert to devm_ioremap_resource()
Thermal: rcar: Convert to devm_ioremap_resource()
Thermal: dove: Convert to devm_ioremap_resource()
thermal: rcar: fix missing unlock on error in rcar_thermal_update_temp()
Linus Torvalds [Thu, 21 Mar 2013 15:29:11 +0000 (08:29 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A fair chunk of the linecount comes from a fix for a tracing bug that
corrupts latency tracing buffers when the overwrite mode is changed on
the fly - the rest is mostly assorted fewliner fixlets."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
kprobes/x86: Check Interrupt Flag modifier when registering probe
kprobes: Make hash_64() as always inlined
perf: Generate EXIT event only once per task context
perf: Reset hwc->last_period on sw clock events
tracing: Prevent buffer overwrite disabled for latency tracers
tracing: Keep overwrite in sync between regular and snapshot buffers
tracing: Protect tracer flags with trace_types_lock
perf tools: Fix LIBNUMA build with glibc 2.12 and older.
tracing: Fix free of probe entry by calling call_rcu_sched()
perf/POWER7: Create a sysfs format entry for Power7 events
perf probe: Fix segfault
libtraceevent: Remove hard coded include to /usr/local/include in Makefile
perf record: Fix -C option
perf tools: check if -DFORTIFY_SOURCE=2 is allowed
perf report: Fix build with NO_NEWT=1
perf annotate: Fix build with NO_NEWT=1
tracing: Fix race in snapshot swapping
Linus Torvalds [Thu, 21 Mar 2013 15:27:58 +0000 (08:27 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Radeon, intel and nouveau, along with one mgag200 fix
- intel fix for an ioctl overflow, along with a regression fix for
some phantom irqs on Ironlake.
- nouveau has a lockdep warning and a bunch of thermal fixes
- radeon has new pci ids and some minor fixes."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/mgag200: Bug fix: Modified pll algorithm for EH project
drm/i915: stop using GMBUS IRQs on Gen4 chips
drm/nv50/kms: prevent lockdep false-positive in page flipping path
drm/nouveau/core: fix return value of nouveau_object_del()
MAINTAINERS: intel-gfx is no longer subscribers-only
drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()
drm/nouveau/hwmon: do not expose a buggy temperature if it is unavailable
drm/nouveau/therm: display the availability of the internal sensor
drm/nouveau/therm: disable temperature management if the sensor isn't readable
drm/nouveau/therm: disable auto fan management if temperature is not available
drm/nv40/therm: reserve negative temperatures for errors
drm/nv40/therm: disable temperature reading if the bios misses some parameters
drm/nouveau/therm-ic: the temperature is off by sensor_constant, warn the user
drm/nouveau/therm: remove some confusion introduced by therm_mode
drm/nouveau/therm: do not make assumptions on temperature
drm/nv40/therm: increase the sensor's settling delay to 20ms
drm/nv40/therm: improve selection between the old and the new style
Revert "drm/i915: try to train DP even harder"
drm/radeon: add Richland pci ids
drm/radeon: add support for Richland APUs
...
Linus Torvalds [Thu, 21 Mar 2013 15:27:03 +0000 (08:27 -0700)]
Merge tag 'dm-3.9-fixes' of git://git./linux/kernel/git/agk/linux-dm
Pull device-mapper fixes from Alasdair G Kergon:
"Fix reported data loss with discards and thin snapshots; avoid a
deadlock observed in dm verity; fix a race in the new dm cache code
along with some other minor bugs; store the cache policy version on
disk to make the stored hints format future-proof."
* tag 'dm-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm cache: policy ignore hints if generated by different version
dm cache: policy change version from string to integer set
dm cache: fix race in writethrough implementation
dm cache: metadata clear dirty bits on clean shutdown
dm cache: avoid calling policy destructor twice on error
dm cache: detect cache_create failure
dm cache: avoid 64 bit division on 32 bit
dm verity: avoid deadlock
dm thin: fix non power of two discard granularity calc
dm thin: fix discard corruption
Dave Airlie [Thu, 21 Mar 2013 00:17:38 +0000 (10:17 +1000)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Bunch of fixes, all pretty high-priority
- Fix execbuf argument checking (Kees Cook)
- Optionally obfuscate kernel addresses in dumps (Kees Cook)
- Two patches from Takashi Iwai to fix DP link training regressions he's
seen.
- intel-gfx is no longer subscribers-only (well, just no longer moderated
in an annoying way for non-subscribers), update MAINTAINERS
- gm45 gmbus irq fallout fix (Jiri Kosina)
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: stop using GMBUS IRQs on Gen4 chips
MAINTAINERS: intel-gfx is no longer subscribers-only
drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()
Revert "drm/i915: try to train DP even harder"
drm/i915: bounds check execbuffer relocation count
drm/i915: restrict kernel address leak in debugfs
Julia Lemire [Mon, 18 Mar 2013 14:17:47 +0000 (10:17 -0400)]
drm/mgag200: Bug fix: Modified pll algorithm for EH project
While testing the mgag200 kms driver on the HP ProLiant Gen8, a
bug was seen. Once the bootloader had loaded the selected kernel,
the screen would go black. At first it was assumed that the
mgag200 kms driver was hanging. But after setting up the grub
serial output, it was seen that the driver was being loaded
properly. After trying several monitors, one finally displayed
the message "Frequency Out of Range". By comparing the kms pll
algorithm with the previous mgag200 xorg driver pll algorithm,
discrepancies were found. Once the kms pll algorithm was
modified, the expected pll values were produced. This fix was
tested on several monitors of varying native resolutions.
Signed-off-by: Julia Lemire <jlemire@matrox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mike Snitzer [Wed, 20 Mar 2013 17:21:28 +0000 (17:21 +0000)]
dm cache: policy ignore hints if generated by different version
When reading the dm cache metadata from disk, ignore the policy hints
unless they were generated by the same major version number of the same
policy module.
The hints are considered to be private data belonging to the specific
module that generated them and there is no requirement for them to make
sense to different versions of the policy that generated them.
Policy modules are all required to work fine if no previous hints are
supplied (or if existing hints are lost).
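The rule amounts to a two-part comparison. A minimal sketch, assuming the metadata records a policy name and a major version number; struct policy_id_sketch and hints_usable() are hypothetical names, not the dm cache interface.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical record of which policy wrote the on-disk hints. */
struct policy_id_sketch {
    const char *name;
    unsigned major;
};

/* Hints are used only if the same policy, at the same major version, wrote them. */
static inline bool hints_usable(const struct policy_id_sketch *on_disk,
                                const struct policy_id_sketch *running)
{
    return strcmp(on_disk->name, running->name) == 0 &&
           on_disk->major == running->major;
}
```

When the check fails the hints are simply skipped, which is safe because policies must work without them.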
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 20 Mar 2013 17:21:27 +0000 (17:21 +0000)]
dm cache: policy change version from string to integer set
Separate dm cache policy version string into 3 unsigned numbers
corresponding to major, minor and patchlevel and store them at the end
of the on-disk metadata so we know which version of the policy generated
the hints in case a future version wants to use them differently.
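As a rough sketch of the idea, the version can be carried as three numbers and copied into the tail of the metadata. All names and the layout below are hypothetical, and on-disk endianness handling is deliberately omitted.

```c
#include <stdint.h>

#define POLICY_VERSION_SIZE_SKETCH 3   /* major, minor, patchlevel */

/* Hypothetical policy descriptor: version kept as numbers, not a string. */
struct policy_type_sketch {
    const char *name;
    unsigned version[POLICY_VERSION_SIZE_SKETCH];
};

/* Hypothetical tail of the on-disk metadata. */
struct disk_tail_sketch {
    uint32_t policy_version[POLICY_VERSION_SIZE_SKETCH];
};

/* Record which policy version generated the hints being written. */
static inline void save_policy_version(struct disk_tail_sketch *disk,
                                       const struct policy_type_sketch *type)
{
    for (int i = 0; i < POLICY_VERSION_SIZE_SKETCH; i++)
        disk->policy_version[i] = type->version[i];
}
```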
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Wed, 20 Mar 2013 17:21:27 +0000 (17:21 +0000)]
dm cache: fix race in writethrough implementation
We have found a race in the optimisation used in the dm cache
writethrough implementation. Currently, dm core sends the cache target
two bios, one for the origin device and one for the cache device and
these are processed in parallel. This patch avoids the race by
changing the code back to a simpler (slower) implementation which
processes the two writes in series, one after the other, until we can
develop a complete fix for the problem.
When the cache is in writethrough mode it needs to send WRITE bios to
both the origin and cache devices.
Previously we've been implementing this by having dm core query the
cache target on every write to find out how many copies of the bio it
wants. The cache will ask for two bios if the block is in the cache,
and one otherwise.
The main problem with this is that it's racy. At the time this check is
made the bio hasn't yet been submitted and so isn't being taken into
account when quiescing a block for migration (promotion or demotion).
This means a single bio may be submitted when two were needed because
the block has since been promoted to the cache (catastrophic), or two
bios where only one is needed (harmless).
I really don't want to start entering bios into the quiescing system
(deferred_set) in the get_num_write_bios callback. Instead this patch
simplifies things; only one bio is submitted by the core, this is
first written to the origin and then the cache device in series.
Obviously this will have a latency impact.
deferred_writethrough_bios is introduced to record bios that must be
later issued to the cache device from the worker thread. This deferred
submission, after the origin bio completes, is required given that we're
in interrupt context (writethrough_endio).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Wed, 20 Mar 2013 17:21:27 +0000 (17:21 +0000)]
dm cache: metadata clear dirty bits on clean shutdown
When writing the dirty bitset to the metadata device on a clean
shutdown, clear the dirty bits. Previously they were left indicating
the cache was dirty. This led to confusion about whether there really
was dirty data in the cache or not. (This was a harmless bug.)
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Heinz Mauelshagen [Wed, 20 Mar 2013 17:21:26 +0000 (17:21 +0000)]
dm cache: avoid calling policy destructor twice on error
If the cache policy's config values are not able to be set we must
set the policy to NULL after destroying it in create_cache_policy()
so we don't attempt to destroy it a second time later.
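The pattern is "destroy, then forget the pointer". A userspace sketch under stated assumptions: every name below is hypothetical, malloc()/free() stand in for the policy constructor and destructor, and destroy_calls exists only to make the single destructor run observable.

```c
#include <stdlib.h>

struct policy_sketch { int placeholder; };
struct cache_sketch { struct policy_sketch *policy; };

static int destroy_calls;  /* counts destructor runs, for illustration only */

static inline void policy_destroy_sketch(struct policy_sketch *p)
{
    destroy_calls++;
    free(p);
}

/* Creation path: on a config failure, destroy the policy and clear the pointer. */
static inline int create_cache_policy_sketch(struct cache_sketch *c, int config_ok)
{
    c->policy = malloc(sizeof(*c->policy));
    if (!c->policy)
        return -1;
    if (!config_ok) {
        policy_destroy_sketch(c->policy);
        c->policy = NULL;   /* later teardown must not see a stale pointer */
        return -1;
    }
    return 0;
}

/* Common teardown: only destroys a policy that is still attached. */
static inline void cache_destroy_sketch(struct cache_sketch *c)
{
    if (c->policy) {
        policy_destroy_sketch(c->policy);
        c->policy = NULL;
    }
}
```

Without the assignment to NULL in the error path, the teardown would run the destructor a second time on freed memory.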
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Heinz Mauelshagen [Wed, 20 Mar 2013 17:21:26 +0000 (17:21 +0000)]
dm cache: detect cache_create failure
Return error if cache_create() fails.
A missing return check made cache_ctr continue even after an error in
cache_create() resulting in the cache object being destroyed. So a
simple failure like an odd number of cache policy config value arguments
would result in an oops.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Wed, 20 Mar 2013 17:21:25 +0000 (17:21 +0000)]
dm cache: avoid 64 bit division on 32 bit
Squash various 32bit link errors.
>> on i386:
>> drivers/built-in.o: In function `is_discarded_oblock':
>> dm-cache-target.c:(.text+0x1ea28e): undefined reference to `__udivdi3'
...
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 20 Mar 2013 17:21:25 +0000 (17:21 +0000)]
dm verity: avoid deadlock
A deadlock was found in the prefetch code in the dm verity map
function. This patch fixes this by transferring the prefetch
to a worker thread and skipping it completely if kmalloc fails.
If generic_make_request is called recursively, it queues the I/O
request on the current->bio_list without making the I/O request
and returns. The routine making the recursive call cannot wait
for the I/O to complete.
The deadlock occurs when one thread grabs the bufio_client
mutex and waits for an I/O to complete but the I/O is queued
on another thread's current->bio_list and is waiting to get
the mutex held by the first thread.
The fix recognises that prefetching is not essential. If memory
can be allocated, it queues the prefetch request to the worker thread,
but if not, it does nothing.
Signed-off-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
Joe Thornber [Wed, 20 Mar 2013 17:21:25 +0000 (17:21 +0000)]
dm thin: fix non power of two discard granularity calc
Fix a discard granularity calculation to work for non power of 2 block sizes.
In order for thinp to passdown discard bios to the underlying data
device, the data device must have a discard granularity that is a
factor of the thinp block size. Originally this check was done by
using bitops since the block_size was known to be a power of two.
The bug was introduced by commit f13945d75730081830b6f3360266950e2b7c9067
("dm thin: support a non power of 2 discard_granularity").
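The difference between a bitops test and a general "is a factor of" test can be seen with small numbers. This is a sketch, not the dm thin code: both helper names are hypothetical, which operand the original check masked is an assumption, and a plain remainder stands in for whatever division helper the kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask test: only valid when granularity is a power of two. */
static inline bool is_factor_mask(uint64_t block_size, uint32_t granularity)
{
    return (block_size & (granularity - 1)) == 0;
}

/* Remainder test: valid for any non-zero granularity. */
static inline bool is_factor_mod(uint64_t block_size, uint32_t granularity)
{
    return block_size % granularity == 0;
}
```

For a block size of 1024 sectors and a granularity of 128 both tests agree. For 384 and 192 the remainder test accepts the pair (384 = 2 * 192) while the mask test rejects it, since 384 & 191 is 128. In kernel code a 64-bit remainder on a 32-bit target would normally go through a division helper rather than a bare %.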
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Wed, 20 Mar 2013 17:21:24 +0000 (17:21 +0000)]
dm thin: fix discard corruption
Fix a bug in dm_btree_remove that could leave leaf values with incorrect
reference counts. The effect of this was that removal of a shared block
could result in the space maps thinking the block was no longer used.
More concretely, if you have a thin device and a snapshot of it, sending
a discard to a shared region of the thin could corrupt the snapshot.
Thinp uses a 2-level nested btree to store its mappings. The first
level is indexed by thin device, and the second level by logical
block.
Often when we're removing an entry in this mapping tree we need to
rebalance nodes, which can involve shadowing them, possibly creating a
copy if the block is shared. If we do create a copy then children of
that node need to have their reference counts incremented. In this
way reference counts percolate down the tree as shared trees diverge.
The rebalance functions were incrementing the children at the
appropriate time, but they were always assuming the children were
internal nodes. This meant the leaf values (in our case packed
block/flags entries) were not being incremented.
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Theodore Ts'o [Wed, 20 Mar 2013 13:42:11 +0000 (09:42 -0400)]
ext4: fix data=journal fast mount/umount hang
In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into jbd2_log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.
Arguably we should adjust ext4_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext4_evict_inode(), and so to save a bit of CPU time, and to make the
patch much more obviously correct by inspection(tm), we'll fix it by
explicitly not trying to wait for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.
This can be easily replicated via:
mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb
------------[ cut here ]------------
WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd()
Hardware name: Bochs
JBD2: bad log_start_commit: 3005630206 3005630206 0 0
Modules linked in:
Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020
Call Trace:
[<c015c0ef>] warn_slowpath_common+0x68/0x7d
[<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd
[<c015c177>] warn_slowpath_fmt+0x2b/0x2f
[<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd
[<c02b8075>] jbd2_log_start_commit+0x24/0x34
[<c0279ed5>] ext4_evict_inode+0x71/0x2e3
[<c021f0ec>] evict+0x94/0x135
[<c021f9aa>] iput+0x10a/0x110
[<c02b7836>] jbd2_journal_destroy+0x190/0x1ce
[<c0175284>] ? bit_waitqueue+0x50/0x50
[<c028d23f>] ext4_put_super+0x52/0x294
[<c020efe3>] generic_shutdown_super+0x48/0xb4
[<c020f071>] kill_block_super+0x22/0x60
[<c020f3e0>] deactivate_locked_super+0x22/0x49
[<c020f5d6>] deactivate_super+0x30/0x33
[<c0222795>] mntput_no_expire+0x107/0x10c
[<c02233a7>] sys_umount+0x2cf/0x2e0
[<c02233ca>] sys_oldumount+0x12/0x14
[<c08096b8>] syscall_call+0x7/0xb
---[ end trace 6a954cc790501c1f ]---
jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Theodore Ts'o [Wed, 20 Mar 2013 13:39:42 +0000 (09:39 -0400)]
ext4: fix ext4_evict_inode() racing against workqueue processing code
Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
regression when running xfstest #270 when the file system is mounted
with dioread_nolock.
The problem is that although ext4_ioend_wait(), called from
ext4_evict_inode(), guarantees that the last io_end structure has been
freed, it does not guarantee that the workqueue structure, which was
moved into the inode by commit 84c17543ab56, is actually finished. Once
ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
will allow ext4_ioend_wait() to return on CPU #2, at which point the
evict_inode() codepath can race against the workqueue code on CPU #1
accessing EXT4_I(inode)->i_unwritten_work to find the next item of
work to do.
Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
will be renamed ext4_ioend_shutdown(), since it is only used by
ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown()
until after truncate_inode_pages() and filemap_write_and_wait() are
called, to make sure all dirty pages have been written back and
flushed from the page cache first.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e
*pdpt = 0000000030bc3001 *pde = 0000000000000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0
EIP is at cwq_activate_delayed_work+0x3b/0x7e
EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
Stack:
f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
Call Trace:
[<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb
[<c01def1d>] process_one_work+0x5d8/0x637
[<c040a10b>] ? ext4_end_bio+0x300/0x300
[<c01e3105>] worker_thread+0x249/0x3ef
[<c01ea317>] kthread+0xd8/0xeb
[<c01e2ebc>] ? manage_workers+0x4bb/0x4bb
[<c023a370>] ? trace_hardirqs_on+0x27/0x37
[<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28
[<c01ea23f>] ? __init_kthread_worker+0x71/0x71
Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1
6c c1 00 31 c9 83
EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
CR2: 0000000000000000
---[ end trace a1923229da53d8a4 ]---
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Dave Airlie [Wed, 20 Mar 2013 06:27:05 +0000 (16:27 +1000)]
Merge branch 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux into drm-next
Alex writes:
"Mostly just small bug fixes. Big change is new pci ids
for Richland APUs."
* 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: add Richland pci ids
drm/radeon: add support for Richland APUs
drm/radeon/benchmark: allow same domains for dma copy
drm/radeon/benchmark: make sure bo blit copy exists before using it
drm/radeon: fix backend map setup on 1 RB trinity boards
drm/radeon: fix S/R on VM systems (cayman/TN/SI)
Dave Airlie [Wed, 20 Mar 2013 06:10:18 +0000 (16:10 +1000)]
Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
Lots of thermal fixes and fix a lockdep warning we've been seeing.
* 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nv50/kms: prevent lockdep false-positive in page flipping path
drm/nouveau/core: fix return value of nouveau_object_del()
drm/nouveau/hwmon: do not expose a buggy temperature if it is unavailable
drm/nouveau/therm: display the availability of the internal sensor
drm/nouveau/therm: disable temperature management if the sensor isn't readable
drm/nouveau/therm: disable auto fan management if temperature is not available
drm/nv40/therm: reserve negative temperatures for errors
drm/nv40/therm: disable temperature reading if the bios misses some parameters
drm/nouveau/therm-ic: the temperature is off by sensor_constant, warn the user
drm/nouveau/therm: remove some confusion introduced by therm_mode
drm/nouveau/therm: do not make assumptions on temperature
drm/nv40/therm: increase the sensor's settling delay to 20ms
drm/nv40/therm: improve selection between the old and the new style
Linus Torvalds [Wed, 20 Mar 2013 01:25:20 +0000 (18:25 -0700)]
Merge tag 'vfio-v3.9-rc4' of git://github.com/awilliam/linux-vfio
Pull vfio fix from Alex Williamson.
* tag 'vfio-v3.9-rc4' of git://github.com/awilliam/linux-vfio:
vfio: include <linux/slab.h> for kmalloc
Linus Torvalds [Wed, 20 Mar 2013 01:24:12 +0000 (18:24 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Marcelo Tosatti.
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
KVM: x86: fix deadlock in clock-in-progress request handling
KVM: allow host header to be included even for !CONFIG_KVM
Jiri Kosina [Tue, 19 Mar 2013 08:56:57 +0000 (09:56 +0100)]
drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to
using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
and above.
It turns out though that on many systems this leads to spurious interrupts
being generated, long after the register write to disable the IRQs has been
issued.
Typically this results in the spurious interrupt source getting
disabled:
[ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 9.637915] Pid: 4157, comm: ifup Tainted: GF 3.9.0-rc2-00341-g0863702 #422
[ 9.639484] Call Trace:
[ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
[ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
[ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
[ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c
[ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0
[ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24
[ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf
[ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
[ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b
[ 9.640731] handlers:
[ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
[ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
[ 9.640731] Disabling IRQ #16
The really curious thing is now that irq 16 is _not_ the interrupt for
the i915 driver when using MSI, but it _is_ the interrupt when not
using MSI. So by all indications it seems like gmbus is able to
generate a legacy (shared) interrupt in MSI mode on some
configurations. I've tried to reproduce this and the differentiating
thing seems to be that on unaffected systems no other device uses irq
16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
I have no idea how that even can happen.
To avoid tempting this elephant into a rage, just disable gmbus
interrupt support on gen 4.
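The gating described above can be sketched as a generation check. This is only an illustration: the message names HAS_GMBUS_IRQ but does not quote its definition, so toy_has_gmbus_irq() and the "gen >= 5" threshold are assumptions derived from "generations 4 and above" minus gen 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: the message names HAS_GMBUS_IRQ but not its body. */
bool toy_has_gmbus_irq(int gen)
{
    /* Gen 4 is left out because of the spurious legacy interrupts. */
    return gen >= 5;
}

/* Usage: gen 4 no longer takes the interrupt-driven wait path. */
void toy_gmbus_selfcheck(void)
{
    assert(!toy_has_gmbus_irq(4));
    assert(toy_has_gmbus_irq(5));
}
```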
v2: Improve the commit message with exact details of what's going on.
Also add a comment in the code to warn against this particular
elephant in the room.
v3: Move the comment explaining how gen4 blows up next to the definition
of HAS_GMBUS_IRQ to keep the code-flow straight. Suggested by Chris
Wilson.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> (v1)
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://lkml.org/lkml/2013/3/8/325
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Linus Torvalds [Tue, 19 Mar 2013 22:17:40 +0000 (15:17 -0700)]
Merge tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfs
Pull XFS fixes from Ben Myers:
- Fix for a potential infinite loop which was introduced in commit
4d559a3bcb73 ("xfs: limit speculative prealloc near ENOSPC
thresholds")
- Fix for the return type of xfs_iomap_eof_prealloc_initial_size from
commit a1e16c26660b ("xfs: limit speculative prealloc size on sparse
files")
- Fix for a failed buffer readahead causing subsequent callers to fail
incorrectly
* tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: ensure we capture IO errors correctly
xfs: fix xfs_iomap_eof_prealloc_initial_size type
xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
Matthew Garrett [Tue, 19 Mar 2013 21:26:57 +0000 (17:26 -0400)]
PCI: Use ROM images from firmware only if no other ROM source available
Mantas Mikulėnas reported that his graphics hardware failed to
initialise after commit f9a37be0f02a ("x86: Use PCI setup data").
The aim of this commit was to ensure that ROM images were available on
some Apple systems that don't expose the GPU ROM via any other source.
In this case, UEFI appears to have provided a broken ROM image that we
were using even though there was a perfectly valid ROM available via
other sources. The simplest way to handle this seems to be to just
re-order pci_map_rom() and leave any firmware-supplied ROM to last.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Tested-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 19 Mar 2013 21:47:11 +0000 (14:47 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
"Just some minor fixups, a sunsu console setup panic cure, and
recognition of a Fujitsu sun4v cpu."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused "config BITS"
sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
sparc64: correctly recognize SPARC64-X chips
sparc,leon: fix GRPCI2 device0 PCI config space access
sunsu: Fix panic in case of nonexistent port at "console=ttySY" cmdline option
Linus Torvalds [Tue, 19 Mar 2013 20:56:18 +0000 (13:56 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 fixes from Catalin Marinas:
- Fix !SMP build error.
- Fix padding computation in struct ucontext (no ABI change).
- Minor clean-up after the signal patches (unused var).
- Two old Kconfig options clean-up.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
arm64: fix padding computation in struct ucontext
arm64: Fix build error with !SMP
arm64: Removed unused variable in compat_setup_rt_frame()
Paul Bolle [Tue, 19 Mar 2013 05:58:47 +0000 (05:58 +0000)]
sparc: remove unused "config BITS"
sparc's asm/module.h got removed in commit
786d35d45cc40b2a51a18f73e14e135d47fdced7 ("Make most arch asm/module.h
files use asm-generic/module.h"). That removed the only two uses of this
Kconfig symbol. So we can remove its entry too.
> From arch/sparc/Makefile:
> ifeq ($(CONFIG_SPARC32),y)
> [...]
>
> [...]
> export BITS := 32
> [...]
>
> else
> [...]
>
> [...]
> export BITS := 64
> [...]
>
> So $(BITS) is set depending on whether CONFIG_SPARC32 is set or not.
> Using $(BITS) in sparc's Makefiles is not using CONFIG_BITS. That
> doesn't count as usage of "config BITS".
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 19 Mar 2013 20:20:51 +0000 (13:20 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang.
2) Insufficient space reserved for bridge netlink values, fix from
Stephen Hemminger.
3) Some dst_neigh_lookup*() callers don't interpret error pointer
correctly, fix from Zhouyi Zhou.
4) Fix transport match in SCTP active_path loops, from Xugeng Zhang.
5) Fix qeth driver handling of multi-order SKB frags, from Frank
Blaschka.
6) fec driver is missing napi_disable() call, resulting in crashes on
unload, from Georg Hofmann.
7) Don't try to handle PMTU events on a listening socket, fix from Eric
Dumazet.
8) Fix timestamp location calculations in IP option processing, from
David Ward.
9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig
tests, from Denis V Lunev.
10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings.
11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd
Bergmann.
12) bnx2x statistics don't handle 4GB rollover correctly, fix from
Maciej Żenczykowski.
13) Openvswitch bug fixes for vport del/new error reporting, missing
genlmsg_end() call in netlink processing, and mis-parsing of
LLC/SNAP ethernet types. From Rich Lane.
14) SKB pfmemalloc state should only be propagated from the head page of
a compound page, fix from Pavel Emelyanov.
15) Fix link handling in tg3 driver for 5715 chips when autonegotiation
is disabled. From Nithin Sujir.
16) Fix inverted test of cpdma_check_free_tx_desc return value in
davinci_emac driver, from Mugunthan V N.
17) vlan_depth is incorrectly calculated in skb_network_protocol(), from
Li RongQing.
18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM
device mode backwards compat in cdc_ncm driver. From Bjørn Mork.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
inet: limit length of fragment queue hash table bucket lists
qeth: Fix scatter-gather regression
qeth: Fix invalid router settings handling
qeth: delay feature trace
tcp: dont handle MTU reduction on LISTEN socket
bnx2x: fix occasional statistics off-by-4GB error
vhost/net: fix heads usage of ubuf_info
bridge: Add support for setting BR_ROOT_BLOCK flag.
bnx2x: add missing napi deletion in error path
drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc()
ethernet/tulip: DE4x5 needs VIRT_TO_BUS
isdn: hisax: netjet requires VIRT_TO_BUS
net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
rtnetlink: Mask the rta_type when range checking
Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
smsc75xx: configuration help incorrectly mentions smsc95xx
net: fec: fix missing napi_disable call
net: fec: restart the FEC when PHY speed changes
skb: Propagate pfmemalloc on skb from head page only
...
Paul Bolle [Tue, 12 Mar 2013 20:35:19 +0000 (21:35 +0100)]
sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
Commit 2d78d4beb64eb07d50665432867971c481192ebf ("[PATCH] bitops:
sparc64: use generic bitops") made the default of GENERIC_HWEIGHT depend
on !ULTRA_HAS_POPULATION_COUNT. But since there's no Kconfig symbol with
that name, this always evaluates to true. Delete this dependency.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Honig [Wed, 20 Feb 2013 22:49:16 +0000 (14:49 -0800)]
KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
If the guest specifies an IOAPIC_REG_SELECT with an invalid value and follows
that with a read of the IOAPIC_REG_WINDOW, KVM does not properly validate
that request. ioapic_read_indirect contains an
ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
non-debug builds. In recent kernels this allows a guest to cause a kernel
oops by reading invalid memory. In older kernels (pre-3.3) this allows a
guest to read from large ranges of host memory.
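The problem above can be sketched in user-space C. This is not the KVM code: struct toy_ioapic, TOY_NUM_PINS and toy_read_indirect() are hypothetical, and returning all ones for an invalid selector is an assumption of the sketch. The point is that the range check is an ordinary branch, so it survives builds where assert() compiles to nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_PINS 24   /* hypothetical table size */

struct toy_ioapic {
    uint32_t ioregsel;                  /* guest-controlled selector */
    uint64_t redirtbl[TOY_NUM_PINS];    /* redirection table entries */
};

/*
 * Read the 32-bit half of a redirection entry chosen by ioregsel.
 * Selectors from 0x10 upward address the table, two per entry.
 * Other registers are not modelled; they and any out-of-range
 * index return all ones instead of reading past the table.
 */
uint32_t toy_read_indirect(const struct toy_ioapic *io)
{
    uint32_t redir_index;
    uint64_t entry;

    assert(io != NULL);   /* debug-only precondition, not the bounds check */

    if (io->ioregsel < 0x10)
        return 0xffffffffu;

    redir_index = (io->ioregsel - 0x10) >> 1;
    if (redir_index >= TOY_NUM_PINS)   /* real check, kept in all builds */
        return 0xffffffffu;

    entry = io->redirtbl[redir_index];
    return (io->ioregsel & 1) ? (uint32_t)(entry >> 32) : (uint32_t)entry;
}
```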
Tested: tested against apic unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andy Honig [Wed, 20 Feb 2013 22:48:10 +0000 (14:48 -0800)]
KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
There is a potential use after free issue with the handling of
MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable
memory such as frame buffers then KVM might continue to write to that
address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins
the page in memory so it's unlikely to cause an issue, but if the user
space component re-purposes the memory previously used for the guest, then
the guest will be able to corrupt that memory.
Tested: Tested against kvmclock unit test
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andy Honig [Mon, 11 Mar 2013 16:34:52 +0000 (09:34 -0700)]
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page. The
write is done by using kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls. Well-behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.
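A minimal sketch of the straddling condition, assuming a 4096-byte page and a 32-byte time structure (the size is inferred from the "32-byte aligned" remark, not stated in the message). The helper names are hypothetical. A structure whose size is a power of two and which is aligned to that size can never cross a page boundary, which is why an alignment test is sufficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */
#define TOY_TIME_SIZE 32u     /* assumed size of the time structure */

/* True if a TOY_TIME_SIZE-byte write starting at gpa stays in one page. */
bool toy_write_fits_in_page(uint64_t gpa)
{
    uint64_t offset = gpa & (TOY_PAGE_SIZE - 1);

    return offset + TOY_TIME_SIZE <= TOY_PAGE_SIZE;
}

/* Stricter test: aligned to the structure size, so it cannot straddle. */
bool toy_gpa_is_aligned(uint64_t gpa)
{
    return (gpa & (TOY_TIME_SIZE - 1)) == 0;
}

/* Usage: an address 16 bytes before a page end would spill over. */
void toy_time_selfcheck(void)
{
    assert(!toy_write_fits_in_page(0x1ff0));
    assert(!toy_gpa_is_aligned(0x1ff0));
}
```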
Tested: Tested against kvmclock unit test.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Paul Bolle [Tue, 19 Mar 2013 15:41:37 +0000 (15:41 +0000)]
arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
The Kconfig entry for DEBUG_ERRORS is a verbatim copy of the former arm
entry for that symbol. It got removed in v2.6.39 because it wasn't
actually used anywhere. There are still no users of DEBUG_ERRORS so
remove this entry too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
[catalin.marinas@arm.com: removed option from defconfig]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Paul Bolle [Tue, 5 Mar 2013 20:43:42 +0000 (20:43 +0000)]
arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
Config option GENERIC_HARDIRQS_NO_DEPRECATED was removed in commit
78c89825649a9a5ed526c507603196f467d781a5 ("genirq: Remove the now obsolete
config options and select statements"), but the select was accidentally
reintroduced in commit 8c2c3df31e3b87cb5348e48776c366ebd1dc5a7a ("arm64:
Build infrastructure").
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Hannes Frederic Sowa [Fri, 15 Mar 2013 11:32:30 +0000 (11:32 +0000)]
inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is chosen somewhat
arbitrarily and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.
If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.
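The idea of a bounded bucket walk can be sketched as follows. The types and names are hypothetical and the exact counting in the kernel may differ; only the limit of 128 comes from the message. The lookup reports an over-long chain to its caller, which can then decide how to warn.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXDEPTH 128   /* limit named in the message */

struct toy_frag_queue {
    unsigned int key;
    struct toy_frag_queue *next;
};

/*
 * Search one hash bucket for key. Gives up and sets *too_deep once
 * more than TOY_MAXDEPTH entries have been compared without a match,
 * so a long chain cannot eat unbounded CPU time.
 */
struct toy_frag_queue *toy_bucket_find(struct toy_frag_queue *head,
                                       unsigned int key, int *too_deep)
{
    struct toy_frag_queue *q;
    int depth = 0;

    assert(too_deep != NULL);
    *too_deep = 0;

    for (q = head; q != NULL; q = q->next) {
        if (q->key == key)
            return q;
        if (++depth > TOY_MAXDEPTH) {
            *too_deep = 1;   /* caller prints the warning */
            return NULL;
        }
    }
    return NULL;
}
```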
I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Blaschka [Mon, 18 Mar 2013 20:04:44 +0000 (20:04 +0000)]
qeth: Fix scatter-gather regression
This patch fixes a scatter-gather regression introduced with
commit 5640f768 net: use a per task frag allocator
Now the qeth driver can cope with bigger fragments and split a fragment into
sub-fragments if required.
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Mon, 18 Mar 2013 20:04:43 +0000 (20:04 +0000)]
qeth: Fix invalid router settings handling
Give a bad return code when specifying a router setting that is either
invalid or not supported on the respective device type. In addition, fall back
to the previous setting instead of silently switching back to 'no routing'.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Mon, 18 Mar 2013 20:04:42 +0000 (20:04 +0000)]
qeth: delay feature trace
Delay tracing of the card features until the optional commands have been
enabled.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Skeggs [Tue, 19 Mar 2013 05:20:00 +0000 (15:20 +1000)]
drm/nv50/kms: prevent lockdep false-positive in page flipping path
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Mon, 18 Mar 2013 23:57:57 +0000 (09:57 +1000)]
drm/nouveau/core: fix return value of nouveau_object_del()
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Linus Torvalds [Tue, 19 Mar 2013 01:49:42 +0000 (18:49 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull hwmon fixes from Jean Delvare.
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (lm75) Fix tcn75 prefix
hwmon: (lm75.h) Update header inclusion
MAINTAINERS: Remove Mark M. Hoffman
Ben Collins [Mon, 18 Mar 2013 23:19:07 +0000 (19:19 -0400)]
sgy-cts1000: Remove __dev* attributes
Somehow the driver snuck in with these still in it.
Signed-off-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 19 Mar 2013 01:47:07 +0000 (18:47 -0700)]
Merge branch 'for-3.9-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Lai's patch to fix highly unlikely but still possible workqueue stall
during CPU hotunplug."
* 'for-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix possible pool stall bug in wq_unbind_fn()
Marcelo Tosatti [Mon, 18 Mar 2013 16:54:32 +0000 (13:54 -0300)]
KVM: x86: fix deadlock in clock-in-progress request handling
There is a deadlock in pvclock handling:
cpu0: cpu1:
kvm_gen_update_masterclock()
kvm_guest_time_update()
spin_lock(pvclock_gtod_sync_lock)
local_irq_save(flags)
spin_lock(pvclock_gtod_sync_lock)
kvm_make_mclock_inprogress_request(kvm)
make_all_cpus_request()
smp_call_function_many()
Now if smp_call_function_many() called by cpu0 tries to call function on
cpu1 there will be a deadlock.
Fix by moving pvclock_gtod_sync_lock protected section outside irq
disabled section.
Analyzed by Gleb Natapov <gleb@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Reported-and-Tested-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Kevin Hilman [Fri, 15 Mar 2013 00:13:46 +0000 (17:13 -0700)]
KVM: allow host header to be included even for !CONFIG_KVM
The new context tracking subsystem unconditionally includes kvm_host.h
headers for the guest enter/exit macros. This causes a compile
failure when KVM is not enabled.
Fix by adding an IS_ENABLED(CONFIG_KVM) check to kvm_host so it can
be included/compiled even when KVM is not enabled.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jean Delvare [Mon, 18 Mar 2013 20:19:49 +0000 (21:19 +0100)]
hwmon: (lm75) Fix tcn75 prefix
The TCN75 has had its own prefix for a long time now.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Mon, 18 Mar 2013 20:19:49 +0000 (21:19 +0100)]
hwmon: (lm75.h) Update header inclusion
File lm75.h used to include <linux/hwmon.h> for SENSORS_LIMIT() but
this function is gone by now. Instead we call clamp_val() so we should
include <linux/kernel.h>, where this function is declared.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Mon, 18 Mar 2013 20:19:49 +0000 (21:19 +0100)]
MAINTAINERS: Remove Mark M. Hoffman
Mark M. Hoffman stopped working on the Linux kernel several years
ago, so he should no longer be listed as a driver maintainer. I'm not
even sure if his e-mail address still works.
I can take over 3 drivers he was responsible for, the 4th one will
fall down to the subsystem maintainer.
Also give Mark credit for all the good work he did.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: Wolfram Sang <wolfram@the-dreams.de>
Dave Chinner [Tue, 12 Mar 2013 12:30:34 +0000 (23:30 +1100)]
xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorrectly
detect an error during IO completion due to the stale error value
left on the buffer.
Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected are those captured
from bio submission or completion.
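A toy model of the stale-error problem, assuming a buffer that carries a single error field. struct toy_buf and toy_submit_read() are hypothetical and stand in for the real buffer and submission path; only the b_error field name comes from the message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
    int b_error;   /* last IO error recorded on the buffer */
};

/*
 * Submit a read whose outcome is io_result (0 on success).
 * The error field is cleared first, so a value left behind by an
 * earlier failed readahead cannot be mistaken for a new failure.
 */
int toy_submit_read(struct toy_buf *bp, int io_result)
{
    assert(bp != NULL);

    bp->b_error = 0;            /* drop any stale error */
    if (io_result != 0)
        bp->b_error = io_result;
    return bp->b_error;
}
```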
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit c163f9a1760229a95d04e37b332de7d5c1c225cd)
Mark Tinguely [Sun, 24 Feb 2013 19:04:37 +0000 (13:04 -0600)]
xfs: fix xfs_iomap_eof_prealloc_initial_size type
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to
xfs_fsblock_t to reflect the fact that the return value may be an
unsigned 64 bits if XFS_BIG_BLKNOS is defined.
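Why the return type matters can be shown with a small sketch. The typedef and both functions are hypothetical, and a 32-bit unsigned type stands in for the too-narrow return type so that the truncation is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_fsblock_t;   /* assumed 64-bit block count */

/* Too-narrow return type: the upper 32 bits are silently dropped. */
uint32_t toy_initial_size_narrow(toy_fsblock_t blocks)
{
    return (uint32_t)blocks;
}

/* Return type matches the value being computed. */
toy_fsblock_t toy_initial_size_wide(toy_fsblock_t blocks)
{
    return blocks;
}

/* Usage: 2^32 + 8 blocks collapses to 8 through the narrow type. */
void toy_fsblock_selfcheck(void)
{
    toy_fsblock_t big = ((toy_fsblock_t)1 << 32) + 8;

    assert(toy_initial_size_narrow(big) == 8);
    assert(toy_initial_size_wide(big) == big);
}
```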
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit e8108cedb1c5d1dc359690d18ca997e97a0061d2)
Brian Foster [Fri, 22 Feb 2013 18:32:56 +0000 (13:32 -0500)]
xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
If freesp == 0, we could end up in an infinite loop while squashing
the preallocation. Break the loop when we've killed the prealloc
entirely.
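A sketch of the termination problem, under the assumption that the preallocation is squashed by repeated right shifts while it is not smaller than the free space. The loop shape, shift width and function name are assumptions; the message only says that freesp == 0 could loop forever and that the loop now stops once the preallocation is gone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shrink alloc_blocks until it is below freesp. Without the
 * "alloc_blocks != 0" test, freesp == 0 would never terminate,
 * because 0 >= 0 stays true after the value has been shifted away.
 */
uint64_t toy_squash_prealloc(uint64_t alloc_blocks, uint64_t freesp)
{
    while (alloc_blocks != 0 && alloc_blocks >= freesp)
        alloc_blocks >>= 4;
    return alloc_blocks;
}

/* Usage: no free space kills the preallocation and still returns. */
void toy_squash_selfcheck(void)
{
    assert(toy_squash_prealloc(4096, 0) == 0);
}
```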
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit e78c420bfc2608bb5f9a0b9165b1071c1e31166a)
Eric Dumazet [Mon, 18 Mar 2013 07:01:28 +0000 (07:01 +0000)]
tcp: dont handle MTU reduction on LISTEN socket
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
LISTEN socket, and this socket is currently owned by the user, we
set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.
This is bad because if we clone the parent before it had a chance to
clear the flag, the child inherits the tsq_flags value, and next
tcp_release_cb() on the child will decrement sk_refcnt.
Result is that we might free a live TCP socket, as reported by
Dormando.
IPv4: Attempt to release TCP socket in state 1
Fix this issue by testing sk_state against TCP_LISTEN early, so that we
set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)
This bug was introduced in commit 563d34d05786
(tcp: dont drop MTU reduction indications)
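The ordering of the checks can be sketched with a toy socket. Everything here is hypothetical except the idea of testing for LISTEN before the deferred flag is set; the state values 1 (established, as in the log line above) and 10 (listen) mirror the usual TCP state numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_ESTABLISHED = 1, TOY_LISTEN = 10 };

#define TOY_MTU_REDUCED_DEFERRED (1UL << 0)   /* hypothetical flag bit */

struct toy_sock {
    int state;
    bool owned_by_user;
    unsigned long tsq_flags;   /* would be inherited by a cloned child */
    int mtu;
};

/* Handle a "fragmentation needed" indication for sk. */
void toy_handle_frag_needed(struct toy_sock *sk, int new_mtu)
{
    assert(sk != NULL);

    if (sk->state == TOY_LISTEN)
        return;   /* early exit: a listener never gets the deferred flag */

    if (sk->owned_by_user)
        sk->tsq_flags |= TOY_MTU_REDUCED_DEFERRED;   /* act on release */
    else
        sk->mtu = new_mtu;
}
```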
Reported-by: dormando <dormando@rydia.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Fri, 15 Mar 2013 11:56:17 +0000 (11:56 +0000)]
bnx2x: fix occasional statistics off-by-4GB error
The UPDATE_QSTAT function introduced on February 15, 2012
in commit 1355b704b9ba "bnx2x: consistent statistics after
internal driver reload" incorrectly fails to handle overflow
during addition of the lower 32-bit field of a stat.
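The off-by-4GB effect follows from losing the carry when a 64-bit counter is kept as two 32-bit halves. A minimal sketch, with a hypothetical struct and helper rather than the driver's UPDATE_QSTAT macro:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_stat64 {
    uint32_t hi;   /* upper 32 bits */
    uint32_t lo;   /* lower 32 bits */
};

/* Add diff to the split counter, propagating the carry into hi. */
void toy_stat_add(struct toy_stat64 *s, uint32_t diff)
{
    assert(s != NULL);

    s->lo += diff;        /* wraps modulo 2^32 on overflow */
    if (s->lo < diff)     /* a wrapped sum is smaller than the addend */
        s->hi += 1;       /* dropping this step loses exactly 4GB */
}
```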
This bug is present since 3.4-rc1 and should thus be considered
a candidate for stable 3.4+ releases.
Google-Bug-Id: 8374428
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Mintz Yuval <yuvalmin@broadcom.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Monakhov [Mon, 18 Mar 2013 15:40:19 +0000 (11:40 -0400)]
ext4: fix memory leakage in mext_check_coverage
Regression was introduced by the following commit: 8c854473
TESTCASE (git://oss.sgi.com/xfs/cmds/xfstests.git):
#while true;do ./check 301 || break ;done
Also fix potential memory leakage in get_ext_path() once
ext4_ext_find_extent() has failed.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Mon, 18 Mar 2013 15:27:41 +0000 (08:27 -0700)]
Merge tag 'for-linus-20130318' of git://git.infradead.org/linux-mtd
Pull MTD fixes from David Woodhouse:
"This fixes a couple of problems. Firstly, some people are actually
still using old small-page flash and we broke it by removing the ready
check.
Secondly, fix the handling of partitions on Broadcom 47xx devices.
Recent changes had made it misdetect the location of the NVRAM and
scribble over the bootloader when it tried to update the variables
there. With predictably sad results."
* tag 'for-linus-20130318' of git://git.infradead.org/linux-mtd:
mtd: nand: reintroduce NAND_NO_READRDY as NAND_NEED_READRDY
mtd: bcm47xxpart: look for NVRAM at the end of device
Revert "mtd: bcm47xxpart: improve probing of nvram partition"
Linus Torvalds [Mon, 18 Mar 2013 15:26:15 +0000 (08:26 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull selinux bugfix from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: use GFP_ATOMIC under spin_lock
Linus Torvalds [Mon, 18 Mar 2013 15:19:13 +0000 (08:19 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A couple of bug fixes, the most hairy one is the flush_tlb_kernel_range
fix. Another case of "how could this ever have worked?"."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kdump: Do not add standby memory for kdump
drivers/i2c: remove !S390 dependency, add missing GENERIC_HARDIRQS dependencies
s390/scm: process availability
s390/scm_blk: suspend writes
s390/scm_drv: extend notify callback
s390/scm_blk: fix request number accounting
s390/mm: fix flush_tlb_kernel_range()
s390/mm: fix vmemmap size calculation
s390: critical section cleanup vs. machine checks
Linus Torvalds [Mon, 18 Mar 2013 15:17:14 +0000 (08:17 -0700)]
Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC bug fixes from Arnd Bergmann:
"Things are calming down for arm-soc as well. This set of bug fixes is
dominated in size by the at91 platform bug fixes. Some of them were
meant to go through the framebuffer tree during the merge window, but
since the framebuffer maintainer could not be reached, I offered to
take them here. The other notable at91 change is the addition of
pinctrl definitions to fix the NAND controller.
The rest are mostly simple regression fixes:
- Our removal of VIRT_TO_BUS conflicted with Stephen Rothwell's
renaming of the Kconfig symbol. You will get a trivial merge
conflict here, we still want to remove it.
- missing bits for clocks on imx and s5pv210
- missing header inclusions in mmp and shmobile
- typos in s5pv210 camera and vt8500 clock support code
and three trivial fixes for pre-3.8 bugs:
- an old bogus build warning in the joystick driver
- a misleading Kconfig description
- a NULL pointer check on davinci"
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: fix CONFIG_VIRT_TO_BUS handling
ARM: i.MX35: enable MAX clock
ARM: Scorpion is a v7 architecture, not v6
ARM: mmp: add platform_device head file in gplugd
input/joystick: use get_cycles on ARM
[media] s5p-fimc: fix s5pv210 build
clk: vt8500: Fix "fix device clock divisor calculations"
ARM: i.MX25: Fix DT compilation
ARM: at91: fix infinite loop in at91_irq_suspend/resume
ARM: at91: add gpio suspend/resume support when using pinctrl
ARM: at91: fix LCD-wiring mode
atmel_lcdfb: fix 16-bpp modes on older SOCs
ARM: at91: dt: at91sam9x5: complete NAND pinctrl
ARM: at91: dt: at91sam9x5: correct NAND pins comments
ARM: davinci: edma: fix dmaengine induced null pointer dereference on da830
ARM: shmobile: marzen: Include mmc/host.h
ARM: EXYNOS: Add #dma-cells for generic dma binding support for PL330
ARM: S5PV210: Fix PL330 DMA controller clkdev entries
Linus Torvalds [Mon, 18 Mar 2013 15:12:41 +0000 (08:12 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here's a few powerpc fixes for 3.9, mostly regressions (though not all
from 3.9 merge window) that we've been hammering into shape over the
last couple of weeks. They fix booting on Cell and G5 among other
things (yes, we've been a bit sloppy with older machines this time
around)."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Rename USER_ESID_BITS* to ESID_BITS*
powerpc: Update kernel VSID range
powerpc: Make VSID_BITS* dependency explicit
powerpc: Make sure that we alays include CONFIG_BINFMT_ELF
powerpc/ptrace: Fix brk.len used uninitialised
powerpc: Fix -mcmodel=medium breakage in prom_init.c
powerpc: Remove last traces of POWER4_ONLY
powerpc: Fix cputable entry for 970MP rev 1.0
powerpc: Fix STAB initialization
Linus Torvalds [Mon, 18 Mar 2013 15:11:53 +0000 (08:11 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Just three fixes this time - a fix for a fix for our memset function,
fixing the dummy clockevent so that it doesn't interfere with real
hardware clockevents, and fixing a build error for Tegra."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7675/1: amba: tegra-ahb: Fix build error w/ PM_SLEEP w/o PM_RUNTIME
ARM: 7674/1: smp: Avoid dummy clockevent being preferred over real hardware clock-event
ARM: 7670/1: fix the memset fix
Arnd Bergmann [Wed, 13 Mar 2013 16:36:37 +0000 (17:36 +0100)]
ARM: fix CONFIG_VIRT_TO_BUS handling
887cbce0 "arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS" and
4febd95a8 "Select VIRT_TO_BUS directly where needed" from
Stephen Rothwell changed globally how CONFIG_VIRT_TO_BUS is
selected, while my own a5d533ee0 "ARM: disable virt_to_bus/
virt_to_bus almost everywhere" was merged at the same time and
changed which platforms select it on ARM.
The result of this conflict was that we again see CONFIG_VIRT_TO_BUS
on all ARM systems. This patch fixes up the problem and removes
CONFIG_ARCH_NO_VIRT_TO_BUS again on ARM.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Dan Carpenter [Sat, 16 Mar 2013 09:48:11 +0000 (12:48 +0300)]
selinux: use GFP_ATOMIC under spin_lock
The call tree here is:
sk_clone_lock() <- takes bh_lock_sock(newsk);
xfrm_sk_clone_policy()
__xfrm_sk_clone_policy()
clone_policy() <- uses GFP_ATOMIC for allocations
security_xfrm_policy_clone()
security_ops->xfrm_policy_clone_security()
selinux_xfrm_policy_clone()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Arnd Bergmann [Mon, 18 Mar 2013 10:54:01 +0000 (11:54 +0100)]
Merge tag 'renesas-fixes-for-v3.9' of git://git./linux/kernel/git/horms/renesas into fixes
From Simon Horman <horms@verge.net.au>:
Resolve a build failure present since v3.9-rc1
* tag 'renesas-fixes-for-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: marzen: Include mmc/host.h
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Andreas Schwab [Tue, 26 Feb 2013 16:55:54 +0000 (16:55 +0000)]
arm64: fix padding computation in struct ucontext
The expression to compute the padding needed to fill the uc_sigmask field
to 1024 bits actually computes the padding needed for 1080 bits.
Fortunately, due to the 16-byte alignment of the following field
(uc_mcontext) the definition in glibc contains enough bytes of padding
after uc_sigmask so that the overall offsets and size match in both
definitions.
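The arithmetic behind the 1080-bit figure can be checked with a small sketch. It assumes an 8-byte kernel sigset_t, a uc_sigmask that starts at byte offset 40, and a padding expression that subtracts a byte count from a bit count before dividing. None of these is quoted in this message, so they are illustrative assumptions, as are all function names.

```c
#include <stddef.h>

/* Hypothetical buggy form: mixes bits (1024) with bytes (the sigset size). */
static size_t padding_buggy(size_t sigset_bytes)
{
	return (1024 - sigset_bytes) / 8;
}

/* Hypothetical fixed form: convert 1024 bits to bytes first, then subtract. */
static size_t padding_fixed(size_t sigset_bytes)
{
	return 1024 / 8 - sigset_bytes;
}

/* Width in bits of uc_sigmask plus the padding that follows it. */
static size_t total_bits(size_t sigset_bytes, size_t padding_bytes)
{
	return (sigset_bytes + padding_bytes) * 8;
}

/*
 * Offset of the next field when it must be 16-byte aligned.
 * sigmask_offset is an assumed starting offset for uc_sigmask.
 */
static size_t next_field_offset(size_t sigmask_offset, size_t sigset_bytes,
				size_t padding_bytes)
{
	size_t end = sigmask_offset + sigset_bytes + padding_bytes;

	return (end + 15) & ~(size_t)15;
}
```

Under these assumptions the buggy form yields 127 bytes of padding (1080 bits in total) and the fixed form 120 bytes (1024 bits), while the aligned offset of the following field is the same in both cases.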
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Mon, 18 Feb 2013 18:22:14 +0000 (18:22 +0000)]
arm64: Fix build error with !SMP
The __atomic_hash is only defined when SMP is enabled but the
arm64ksyms.c exports it even for the UP case.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Daniel Vetter [Wed, 13 Mar 2013 21:28:46 +0000 (22:28 +0100)]
MAINTAINERS: intel-gfx is no longer subscribers-only
It is though still filtered for non-subscribers, but without pissing
off people with moderation queue spam. So drop the subscribers-only
tag to make getmaintainers.pl tdrt.
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Takashi Iwai [Mon, 18 Mar 2013 10:25:36 +0000 (11:25 +0100)]
drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()
The eDP output on HP Z1 is still broken when X is started even after
fixing the infinite link-train loop. The regression was introduced in
the 3.6 kernel while cleaning up the mode clock handling code in
intel_dp.c by the commit [71244653: drm/i915: adjusted_mode->clock in
the dp mode_fix].
In the past, the clock of the reference mode was modified in
intel_dp_mode_fixup() in the case of eDP fixed clock, and this clock was
used for calculating in intel_dp_set_m_n(). This override was removed,
thus the wrong mode clock is used for the calculation, resulting in a
psychedelic smoking output in the end.
This patch corrects the clock used for that calculation.
v1->v2: Use intel_edp_target_clock() for checking eDP fixed clock
instead of open code as in ironlake_set_m_n().
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Catalin Marinas [Mon, 18 Mar 2013 10:12:56 +0000 (10:12 +0000)]
arm64: Removed unused variable in compat_setup_rt_frame()
Recent clean-up of the compat signal code left an unused 'stack'
variable.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Stephane Eranian [Sun, 17 Mar 2013 13:49:57 +0000 (14:49 +0100)]
perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
Add scheduling constraints for SNB/SNB-EP CYCLE_ACTIVITY event
as defined by SDM Jan 2013 edition. The STALLS umasks are
combinations with the NO_DISPATCH umask.
Signed-off-by: Stephane Eranian <eranian@gmail.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130317134957.GA8550@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masami Hiramatsu [Thu, 14 Mar 2013 11:52:43 +0000 (20:52 +0900)]
kprobes/x86: Check Interrupt Flag modifier when registering probe
Currently kprobes checks whether the copied instruction modifies
IF (the interrupt flag) on every probe hit. This not only adds
overhead but also pulls inat_get_opcode_attribute into the kprobes
hot path, which can cause infinite recursion (and eventually a
kernel panic).
Since the copied instruction in the buffer is never modified, there
is no need to analyze it on every probe hit.
To fix this issue, check it only once when registering the probe
and store the result in ainsn->if_modifier.
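The check-once idea can be sketched in userspace as follows. The struct, the function names and the omission of prefix handling are all assumptions made for illustration; only the notion of caching the result in an if_modifier field comes from this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-probe instruction state. */
struct probe_insn {
	uint8_t opcode;		/* first opcode byte of the copied instruction */
	bool if_modifier;	/* cached once at registration time */
};

/*
 * One-byte x86 opcodes that can change IF: cli, sti, iret, popf.
 * Prefix bytes are deliberately not handled in this sketch.
 */
static bool opcode_modifies_if(uint8_t opcode)
{
	switch (opcode) {
	case 0xfa:	/* cli */
	case 0xfb:	/* sti */
	case 0xcf:	/* iret */
	case 0x9d:	/* popf */
		return true;
	default:
		return false;
	}
}

/* Registration path: decode once and remember the answer. */
static void probe_register(struct probe_insn *p, uint8_t opcode)
{
	p->opcode = opcode;
	p->if_modifier = opcode_modifies_if(opcode);
}

/* Hit path: reads the cached flag only, with no instruction decoding. */
static bool probe_hit_modifies_if(const struct probe_insn *p)
{
	return p->if_modifier;
}
```

Because the hit path no longer calls the decoder, a probe placed on the decoder itself cannot re-enter it from the hit path.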
Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masami Hiramatsu [Thu, 14 Mar 2013 11:52:30 +0000 (20:52 +0900)]
kprobes: Make hash_64() as always inlined
Because hash_64() is called from get_kprobe() inside the int3
handler, the kernel hits int3 recursion and crashes if a kprobes
user puts a probe on it.
Usually hash_64() is inlined into its caller, but in some cases
gcc's interprocedural constant propagation emits out-of-line
instances of it.
This patch uses __always_inline instead of inline to
prevent gcc from doing such things.
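A minimal userspace sketch of the difference, assuming GCC or Clang: plain inline is only a hint, whereas the always_inline attribute asks the compiler to inline at every call site, so no separate probe-able copy of the function should remain. The macro name, the hash constant and both functions are hypothetical and are not the kernel's hash_64().

```c
#include <stdint.h>

/* Assumption: a compiler that supports the GNU always_inline attribute. */
#define sketch_always_inline inline __attribute__((always_inline))

/* Illustrative multiplier only; not taken from the kernel. */
#define SKETCH_HASH_MULT 0x9e3779b97f4a7c15ULL

/*
 * Multiplicative hash keeping the top 'bits' bits.
 * Precondition: 1 <= bits <= 64.
 */
static sketch_always_inline uint64_t sketch_hash_64(uint64_t val,
						     unsigned int bits)
{
	return (val * SKETCH_HASH_MULT) >> (64 - bits);
}

/* Caller on the hot path: the hash body is expanded in place here. */
static unsigned int sketch_bucket(uint64_t addr)
{
	return (unsigned int)sketch_hash_64(addr, 6);
}
```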
Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130314115230.19690.39387.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 18 Mar 2013 09:00:56 +0000 (10:00 +0100)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
. perf probe: Fix segfault due to testing the wrong pointer for NULL,
from Ananth N Mavinakayanahalli.
. libtraceevent: Remove hard coded include to /usr/local/include in
Makefile, which causes cross builds to include host header files,
fix from Jack Mitchell.
. perf record: Use the right target interface for synthesizing
threads when --cpu/-C option is used, fix from Jiri Olsa.
. Check if -DFORTIFY_SOURCE=2 is allowed, as gcc 4.7.2 defines
it and then the build is broken when it is redefined in perf,
fix from Marcin Slusarz.
. Fix build with NO_NEWT=1, that can happen explicitly or when
the newt-devel package is not installed, from Michael Ellerman.
. perf/POWER7: Create a sysfs format entry for Power7 events, missing
patch from a patch series already merged, from Sukadev Bhattiprolu.
. Fix LIBNUMA build with glibc 2.12 and older, from Vinson Lee.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 18 Mar 2013 08:48:29 +0000 (09:48 +0100)]
Merge branch 'tip/perf/urgent-2' of git://git./linux/kernel/git/rostedt/linux-trace into perf/urgent
Pull tracing fixes from Steven Rostedt.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Namhyung Kim [Fri, 15 Mar 2013 07:27:13 +0000 (16:27 +0900)]
perf: Generate EXIT event only once per task context
perf_event_task_event() iterates over the pmu list and generates
events for each eligible pmu context. But if task_event has a
task_ctx, as in EXIT, it generates events even though the pmu
doesn't have an eligible one. Fix it by moving the code to the
proper places.
Before this patch:
$ perf record -n true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB perf.data (~248 samples) ]
$ perf report -D | tail
Aggregated stats:
TOTAL events: 73
MMAP events: 67
COMM events: 2
EXIT events: 4
cycles stats:
TOTAL events: 73
MMAP events: 67
COMM events: 2
EXIT events: 4
After this patch:
$ perf report -D | tail
Aggregated stats:
TOTAL events: 70
MMAP events: 67
COMM events: 2
EXIT events: 1
cycles stats:
TOTAL events: 70
MMAP events: 67
COMM events: 2
EXIT events: 1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1363332433-7637-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Namhyung Kim [Mon, 18 Mar 2013 02:41:46 +0000 (11:41 +0900)]
perf: Reset hwc->last_period on sw clock events
When cpu/task clock events are initialized, their sampling
frequencies are converted to a fixed period. However, the
conversion failed to update hwc->last_period, which was set to 1
for the initial sampling frequency calibration.
Because this hwc->last_period value is used as the period in
perf_swevent_hrtimer(), every recorded sample has an incorrect
period of 1.
$ perf record -e task-clock noploop 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.158 MB perf.data (~6919 samples) ]
$ perf report -n --show-total-period --stdio
# Samples: 4K of event 'task-clock'
# Event count (approx.): 4000
#
# Overhead       Samples        Period  Command  Shared Object  Symbol
# ........  ............  ............  .......  .............  ..................
#
    99.95%          3998          3998  noploop  noploop        [.] main
     0.03%             1             1  noploop  libc-2.15.so   [.] init_cacheinfo
     0.03%             1             1  noploop  ld-2.15.so     [.] open_verify
Note that this doesn't affect non-sampling events, so perf stat
still reports the correct value with or without this patch.
$ perf stat -e task-clock noploop 1
Performance counter stats for 'noploop 1':
1000.272525 task-clock # 1.000 CPUs utilized
1.000560605 seconds time elapsed
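The effect on the reported period can be sketched as below. The struct and function names are hypothetical, and the 4000 Hz frequency used in the note that follows is an assumption chosen to match the roughly 4K samples seen over one second above.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the two relevant per-event fields. */
struct sketch_hwc {
	uint64_t sample_period;
	uint64_t last_period;
};

/*
 * Convert a sampling frequency into a fixed period in nanoseconds.
 * reset_last_period selects the behaviour with (1) or without (0)
 * the fix. Precondition: freq_hz != 0.
 */
static void sketch_init_clock(struct sketch_hwc *hwc, uint64_t freq_hz,
			      int reset_last_period)
{
	hwc->last_period = 1;	/* left over from frequency calibration */
	hwc->sample_period = SKETCH_NSEC_PER_SEC / freq_hz;
	if (reset_last_period)
		hwc->last_period = hwc->sample_period;
}

/* Total period reported when every sample carries last_period. */
static uint64_t sketch_total_period(const struct sketch_hwc *hwc,
				    uint64_t nr_samples)
{
	return nr_samples * hwc->last_period;
}
```

With 4000 samples at an assumed 4000 Hz, the unfixed path sums to 4000 (one per sample, as in the report above), while the fixed path sums to 1000000000 ns, which is consistent with the one second of task-clock time shown by perf stat.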
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1363574507-18808-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Martin Peres [Thu, 14 Mar 2013 23:59:55 +0000 (00:59 +0100)]
drm/nouveau/hwmon: do not expose a buggy temperature if it is unavailable
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Fri, 15 Mar 2013 00:47:16 +0000 (01:47 +0100)]
drm/nouveau/therm: display the availability of the internal sensor
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Thu, 14 Mar 2013 23:42:38 +0000 (00:42 +0100)]
drm/nouveau/therm: disable temperature management if the sensor isn't readable
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Thu, 14 Mar 2013 23:21:07 +0000 (00:21 +0100)]
drm/nouveau/therm: disable auto fan management if temperature is not available
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Fri, 15 Mar 2013 01:09:20 +0000 (02:09 +0100)]
drm/nv40/therm: reserve negative temperatures for errors
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Thu, 14 Mar 2013 22:51:16 +0000 (23:51 +0100)]
drm/nv40/therm: disable temperature reading if the bios misses some parameters
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Tue, 5 Mar 2013 10:24:04 +0000 (11:24 +0100)]
drm/nouveau/therm-ic: the temperature is off by sensor_constant, warn the user
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Tue, 5 Mar 2013 09:58:59 +0000 (10:58 +0100)]
drm/nouveau/therm: remove some confusion introduced by therm_mode
The kernel message "[ PTHERM][0000:01:00.0] Thermal management: disabled"
is misleading as it actually means "fan management: disabled".
This patch fixes both the source and the message to improve readability.
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Tue, 5 Mar 2013 09:44:12 +0000 (10:44 +0100)]
drm/nouveau/therm: do not make assumptions on temperature
In nouveau_therm_sensor_event, temperature is stored as a uint8_t
even though the original interface returns an int.
This change should make it more obvious when the sensor is either
very-ill-calibrated or when we selected the wrong sensor style
on the nv40 family.
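Why the narrower type hides bad readings can be seen in a small sketch (function names are hypothetical): converting an int to uint8_t keeps only the low 8 bits, so a negative or very large value turns into a plausible-looking temperature.

```c
#include <stdint.h>

/* Storing an int reading in a uint8_t reduces it modulo 256. */
static uint8_t store_as_u8(int temperature)
{
	return (uint8_t)temperature;
}

/* Keeping the int preserves negative and out-of-range readings. */
static int store_as_int(int temperature)
{
	return temperature;
}
```

For example, a reading of -5 is stored as 251 in the uint8_t version and 300 is stored as 44, whereas the int version keeps both values visible as obviously wrong.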
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Tue, 5 Mar 2013 09:35:20 +0000 (10:35 +0100)]
drm/nv40/therm: increase the sensor's settling delay to 20ms
Based on my experience, 10ms wasn't always enough. Let's bump that
to a little more.
If this turns out to be insufficient again, then an approach
based on letting the sensor settle for several seconds before starting
polling on the temperature would be better suited. This way, boot time
wouldn't be impacted by those waits too much.
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Martin Peres [Tue, 5 Mar 2013 09:26:30 +0000 (10:26 +0100)]
drm/nv40/therm: improve selection between the old and the new style
The condition to select between the old and new style was a thinko
as rnndb orders chipsets based on their release date (or general
chronology hw-wise) and not based on their chipset number.
As the nv40 family is a mess when it comes to numbers, this patch
introduces a switch-based selection between the old and new style.
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Linus Torvalds [Sun, 17 Mar 2013 22:59:32 +0000 (15:59 -0700)]
Linux 3.9-rc3
David Rientjes [Sun, 17 Mar 2013 22:49:10 +0000 (15:49 -0700)]
perf,x86: fix link failure for non-Intel configs
Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:
arch/x86/power/built-in.o: In function `restore_processor_state':
(.text+0x45c): undefined reference to `perf_restore_debug_store'
Fix it by defining the dummy function appropriately.
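The usual shape of such a dummy is a static inline no-op selected by the preprocessor. The sketch below is a userspace illustration with hypothetical names (the SKETCH_CPU_SUP_INTEL macro and the sketch_ functions are assumptions); it is not the text of the actual patch.

```c
/* Define SKETCH_CPU_SUP_INTEL at build time to select the real function. */
#ifdef SKETCH_CPU_SUP_INTEL
void sketch_restore_debug_store(void);	/* provided by vendor-specific code */
#else
/* Dummy fallback: callers still compile and link; the call does nothing. */
static inline void sketch_restore_debug_store(void) { }
#endif

/* Counts how often the generic resume path ran, for demonstration only. */
static int sketch_restore_calls;

/* Generic resume path: calls the hook unconditionally, with no #ifdef. */
static void sketch_restore_processor_state(void)
{
	sketch_restore_debug_store();
	sketch_restore_calls++;
}
```

Keeping the #ifdef next to the declaration means the caller needs no conditional code and links in either configuration.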
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 17 Mar 2013 22:44:43 +0000 (15:44 -0700)]
perf,x86: fix wrmsr_on_cpu() warning on suspend/resume
Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA MSR it also resulted in a new WARN_ON() triggering.
init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.
This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Mon, 11 Mar 2013 17:40:16 +0000 (18:40 +0100)]
Revert "drm/i915: try to train DP even harder"
This reverts commit 0d71068835e2610576d369d6d4cbf90e0f802a71.
Not only does the commit introduce a bogus check (voltage_tries == 5
can never be true on the inserted code path), it also sends the i915
driver into an endless dp-train loop on the HP Z1 desktop machine
with IVY+eDP.
At least reverting this commit recovers the framebuffer (but X is
still broken for other reasons...)
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Michael S. Tsirkin [Sun, 17 Mar 2013 02:46:09 +0000 (02:46 +0000)]
vhost/net: fix heads usage of ubuf_info
The ubuf_info allocator uses a guest-controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we would get two callbacks on the same value.
To fix this, use upend_idx, which is guaranteed to be unique.
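The difference between the two indexing schemes can be sketched in userspace as follows. The ring structure, its size and the function names are hypothetical, and completion handling is left out, so slots are unique here only while fewer than SKETCH_RING_SIZE submissions are outstanding.

```c
#include <stdbool.h>

#define SKETCH_RING_SIZE 8u

/* Hypothetical tracking state for in-flight buffers. */
struct sketch_ubuf_ring {
	bool in_flight[SKETCH_RING_SIZE];
	unsigned int upend_idx;	/* host-owned, advances once per submission */
};

/*
 * Guest-controlled index: a repeated head lands on a slot that is
 * already busy. Returns false when such a collision is detected.
 */
static bool sketch_submit_by_head(struct sketch_ubuf_ring *r,
				  unsigned int head)
{
	unsigned int slot = head % SKETCH_RING_SIZE;

	if (r->in_flight[slot])
		return false;
	r->in_flight[slot] = true;
	return true;
}

/* Host-controlled index: every submission takes the next slot in order. */
static unsigned int sketch_submit_by_upend(struct sketch_ubuf_ring *r)
{
	unsigned int slot = r->upend_idx;

	r->in_flight[slot] = true;
	r->upend_idx = (slot + 1) % SKETCH_RING_SIZE;
	return slot;
}
```

The point of the sketch is that the slot choice in the second function does not depend on any value supplied by the guest.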
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>