review.tizen.org Git - profile/common/kernel-common.git/log

md/raid10:  Fix bug when activating a hot-spare.

This is a fairly serious bug in RAID10.

When a RAID10 array is degraded and a hot-spare is activated, the
spare does not take up the empty slot, but rather replaces the first
working device.
This is likely to make the array non-functional.   It would normally
be possible to recover the data, but that would need care and is not
guaranteed.

This bug was introduced in commit
   2bb77736ae5dca0a189829fbb7379d43364a9dac
which first appeared in 3.1.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md: Fix some bugs in recovery_disabled handling.

In 3.0 we changed the way recovery_disabled was handle so that instead
of testing against zero, we test an mddev-> value against a conf->
value.
Two problems:
  1/ one place in raid1 was missed and still sets to '1'.
  2/ We didn't explicitly set the conf-> value at array creation
     time.
     It defaulted to '0' just like the mddev value does so they
     could appear equal and thus disable recovery.
     This did not affect normal 'md' as it calls bind_rdev_to_array
     which changes the mddev value.  However the dmraid interface
     doesn't call this and so doesn't change ->recovery_disabled; so at
     array start all recovery is incorrectly disabled.

So initialise the 'conf' value to one less that the mddev value, so
the will only be the same when explicitly set that way.

Reported-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: fix bug that could result in reads from a failed device.

This bug was introduced in 415e72d034c50520ddb7ff79e7d1792c1306f0c9
which was in 2.6.36.

There is a small window of time between when a device fails and when
it is removed from the array. During this time we might still read
from it, but we won't write to it - so it is possible that we could
read stale data.

We didn't need the test of 'Faulty' before because the test on
In_sync is sufficient. Since we started allowing reads from the early
part of non-In_sync devices we need a test on Faulty too.

This is suitable for any kernel from 2.6.36 onwards, though the patch
might need a bit of tweaking in 3.0 and earlier.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

lib/raid6: Fix filename emitted in generated code

The files were renamed in commit cc4589ebf; fix the name in the file
itself.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md.c: trivial comment fix

Trivial comment fix

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: NeilBrown <neilb@suse.de>

MD: Allow restarting an interrupted incremental recovery.

If an incremental recovery was interrupted, a subsequent
re-add will result in a full recovery, even though an
incremental should be possible (seen with raid1).

Solve this problem by not updating the superblock on the
recovering device until array is not degraded any longer.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: clear In_sync bit on devices added to an active array.

When we add a device to an active array it can be meaningful to set
the 'insync' flag. This indicates that the device is in-sync with the
array except for locations recorded in the bitmap.
A bitmap-based recovery can then bring it completely in-sync.

Internally we move that flag to 'saved_raid_disk' but forgot to clear
In_sync like we do in add_new_disk.

So clear In_sync after moving its value to saved_raid_disk.

Reported-by: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: add proper write-congestion reporting to RAID1 and RAID10.

RAID1 and RAID10 handle write requests by queuing them for handling by
a separate thread.  This is because when a write-intent-bitmap is
active we might need to update the bitmap first, so it is good to
queue a lot of writes, then do one big bitmap update for them all.

However writeback request devices to appear to be congested after a
while so it can make some guesstimate of throughput.  The infinite
queue defeats that (note that RAID5 has already has a finite queue so
it doesn't suffer from this problem).

So impose a limit on the number of pending write requests.  By default
it is 1024 which seems to be generally suitable.  Make it configurable
via module option just in case someone finds a regression.

Signed-off-by: NeilBrown <neilb@suse.de>

md: rename "mdk_personality" to "md_personality"

"mdk" doesn't mean anything any more.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap remove fault injection options.

These are too hard to use to be much more than noise.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: typedef removal: raid5_conf_t -> struct r5conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: typedef removal: conf_t -> struct r1conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: typedef removal: conf_t -> struct r10conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid0: typedef removal: raid0_conf_t -> struct r0conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/multipath: typedef removal: multipath_conf_t -> struct mpconf

Signed-off-by: NeilBrown <neilb@suse.de>

md/linear: typedef removal: linear_conf_t -> struct linear_conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/faulty: remove typedef: conf_t -> struct faulty_conf

Signed-off-by: NeilBrown <neilb@suse.de>

md/linear: remove typedefs: dev_info_t -> struct dev_info

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove typedefs: mirror_info_t -> struct mirror_info

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove typedefs: mdk_thread_t -> struct md_thread

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove typedefs: mddev_t -> struct mddev

Having mddev_t and 'struct mddev_s' is ugly and not preferred

Signed-off-by: NeilBrown <neilb@suse.de>

md: removing typedefs: mdk_rdev_t -> struct md_rdev

The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid0: convert some printks to pr_debug.

When md assembles a RAID0 array it prints out lots of info which
is really just for debugging, so convert that to pr_debug.
It also prints out the resulting configuration which could be
interesting, so keep that as 'printk' but tidy it up a bit.

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove PRINTK and dprintk debugging and use pr_debug

Being able to dynamically enable these make them much more useful.

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove some old DEBUGging code.

This code is not really helpful and is hard to maintain, so just
discard it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: convert to macros into inline functions.

More type-safety. Easier to read.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1/ avoid bio search in end_sync_read()

We know which device we just read from so we don't need to
search the bios to find out. Just use ->read_disk.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: factor out common bio handling code

When normal-write and sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor those common
code out to a separate function.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: remove pointless NULL test.

In the 'abort' branch of run(), 'conf' cannot possibly be NULL,
so remove the test.

Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: add documentation to r1_private_data_s data structure.

There wasn't much and it is inconsistent.
Also rearrange fields to keep related fields together.

Reported-by: Aapo Laine <aapo.laine@shiftmail.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: don't delay reboot by 1 second if no MD devices exist

The md_notify_reboot() method includes a call to mdelay(1000),
to deal with "exotic SCSI devices" which are too volatile on
reboot. The delay is unconditional. Even if the machine does
not have any block devices, let alone MD devices, the kernel
shutdown sequence is slowed down.

1 second does not matter much with physical hardware, but with
certain virtualization use cases any wasted time in the bootup
& shutdown sequence counts for alot.

* drivers/md/md.c: md_notify_reboot() - only impose a delay if
there was at least one MD device to be stopped during reboot

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

trival: md_k.h should be md.h in the beginning comment of file md.h

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: improve handling of 'allclean'.

The 'allclean' flag is used to cache the fact that there is nothing to
do, so we can avoid waking up and scanning the bitmap regularly.

The two sorts of pages that might need the attention of the bitmap
daemon are BITMAP_PAGE_PENDING and BITMAP_PAGE_NEEDWRITE pages.

So make sure allclean reflects exactly when there are none of those.
So:
  set it before scanning all pages with either bit set.
  clear it whenever these bits are set
  clear it when we desire not to clear one of these bits.
  don't clear it any other time.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: rename and tidy up BITMAP_PAGE_CLEAN

The flag 'BITMAP_PAGE_CLEAN' has a confusing name as it doesn't mean
that the page is clean, but rather that there are counters in the page
which allow bits in the bitmap to be cleared - i.e. maybe cleaning can
happen.

So change it to BITMAP_PAGE_PENDING and fix some irregularities:
- Don't set it in bitmap_init_from_disk as bitmap_set_memory_bits
   sets it when needed
- in bitmap_daemon_work, if we find a counter that is '1', but
   need_sync is set, then set BITMAP_PAGE_PENDING again (it was
   recently cleared) to ensure we don't forget about this bit.

Signed-off-by: NeilBrown <neilb@suse.de>

md: Avoid waking up a thread after it has been freed.

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappeared either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protections.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.

Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>
cc: stable@kernel.org

md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.

0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.

Also the sanity check at the end of super_90_load should include level
1 as it used ->size too. (RAID0 and Linear don't use ->size at all).

Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1,10: Remove use-after-free bug in make_request.

A single request to RAID1 or RAID10 might result in multiple
requests if there are known bad blocks that need to be avoided.

To detect if we need to submit another write request we test:
if (sectors_handled < (bio->bi_size >> 9)) {

However this is after we call **_write_done() so the 'bio' no longer
belongs to us - the writes could have completed and the bio freed.

So move the **_write_done call until after the test against
bio->bi_size.

This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862

Reported-by: Bruno Wolff III <bruno@wolff.to>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: unify handling of write completion.

A write can complete at two different places:
1/ when the last member-device write completes, through
raid10_end_write_request
2/ in make_request() when we remove the initial bias from ->remaining.

These two should do exactly the same thing and the comment says they
do, but they don't.

So factor the correct code out into a function and call it in both
places. This makes the code much more similar to RAID1.

The difference is only significant if there is an error, and they
usually take a while, so it is unlikely that there will be an error
already when make_request is completing, so this is unlikely to cause
real problems.

Signed-off-by: NeilBrown <neilb@suse.de>

Avoid dereferencing a 'request_queue' after last close.

On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed. The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock. This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.

So move the called to bdev_inode_switch_bdi before the call to
->release.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: fix a hang on device failure.

Waiting for a 'blocked' rdev to become unblocked in the raid5d thread
cannot work with internal metadata as it is the raid5d thread which
will clear the blocked flag.
This wasn't a problem in 3.0 and earlier as we only set the blocked
flag when external metadata was used then.
However we now set it always, so we need to be more careful.

Signed-off-by: NeilBrown <neilb@suse.de>

md: fix clearing of 'blocked' flag in the presence of bad blocks.

When the 'blocked' flag on a device is cleared while there are
unacknowledged bad blocks we must fail the device. This is needed for
backwards compatability of the interface.

The code currently uses the wrong test for "unacknowledged bad blocks
exist". Change it to the right test.

Signed-off-by: NeilBrown <neilb@suse.de>

md/linear: avoid corrupting structure while waiting for rcu_free to complete.

I don't know what I was thinking putting 'rcu' after a dynamically
sized array! The array could still be in use when we call rcu_free()
(That is the point) so we mustn't corrupt it.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md: use REQ_NOIDLE flag in md_super_write()

Queue idling is used for the anticipation of immediate
sequencial I/O's but md_super_write() is a kind of one-
shot operation, coupled with md_super_wait(), so the
idling in this case will be just a waste of time.

Specifying REQ_NOIDLE prevents it. Instead of adding
the flag to submit_bio() directly, use pre-defined
macro WRITE_FLUSH_FUA.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: ensure changes to 'write-mostly' are reflected in metadata.

The 'write-mostly' flag can be changed through sysfs.
With 0.90 metadata, those changes are reflected in the metadata.
For 1.x metadata, they aren't.

So fix super_1_sync to record 'write-mostly' status.

Signed-off-by: NeilBrown <neilb@suse.de>

md: report failure if a 'set faulty' request doesn't.

Sometimes a device will refuse to be set faulty. e.g. RAID1 will
never let the last working device become faulty.

So check if "md_error()" did manage to set the faulty flag and fail
with EBUSY if it didn't.

Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198
Reported-by: Mike Hommey <mh+reportbug@glandium.org>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, vdso: On system call restart after SYSENTER, use int $0x80
  x86, UV: Remove UV delay in starting slave cpus
  x86, olpc: Wait for last byte of EC command to be accepted

x86-32, vdso: On system call restart after SYSENTER, use int $0x80

When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle
the arguments to match the int $0x80 calling convention.  This was
probably a design mistake, but it's what it is now.  This causes
errors if the system call as to be restarted.

For SYSENTER, we have to invoke the instruction from the vdso as the
return address is hardcoded.  Accordingly, we can simply replace the
jump in the vdso with an int $0x80 instruction and use the slower
entry point for a post-restart.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com
Cc: <stable@kernel.org>

m68k: fix __page_to_pfn for a const struct page argument

Fixes fallout due to the removal of the cast in commit aa462abe8aaf
("mm: fix __page_to_pfn for a const struct page argument")

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix tracing builds inside the source tree
  xfs: remove subdirectories
  xfs: don't expect xfs headers to be in subdirectories

Merge git://git.infradead.org/users/cbou/battery-3.1

* git://git.infradead.org/users/cbou/battery-3.1:
  s3c-adc-battery: Fix compilation error due to missing header (module.h)
  max8997_charger: Needs module.h
  max8998_charger: Needs module.h

Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon: Extended DDC Probing for Toshiba L300D Radeon Mobility X1100 HDMI-A Connector
  drm/ttm: ensure ttm for new node is bound before calling move_notify()
  drm/ttm: unbind ttm before destroying node in accel move cleanup
  drm/ttm: fix ttm_bo_add_ttm(user) failure path
  drm/radeon: Make vramlimit parameter actually work.
  drm/radeon: Explicitly print GTT/VRAM offsets on test failure.
  drm/radeon: Take IH ring into account for test size calculation.
  drm/radeon/alpha: Add Alpha support to Radeon DRM code

Revert "irq: Always set IRQF_ONESHOT if no primary handler is specified"

This reverts commit f3637a5f2e2eb391ff5757bc83fb5de8f9726464.

It turns out that this breaks several drivers, one example being OMAP
boards which use the on-board OMAP UARTs and the omap-serial driver that
will not boot to userspace after the commit.

Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
handler type mismatch' errors:

  IRQ handler type mismatch for IRQ 74
  current handler: serial idle
  ...

and the reason is that setting IRQF_ONESHOT will now result in those
interrupt handlers having different IRQF flags, and thus being
unsharable.  So the commit log in the reverted commit:

                            "Since it is required for those users and
    there is no difference for others it makes sense to add this flag
    unconditionally."

is simply not true: there may not be any difference from a "actions at
irq time", but there is a *big* difference wrt this flag testing irq
management (see __setup_irq() in kernel/irq/manage.c).

One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
right now the safe course of action is to revert the change.  Let's
revisit this in a later merge window.

Reported-by: Paul Walmsley <paul@pwsan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drm/radeon: Extended DDC Probing for Toshiba L300D Radeon Mobility X1100 HDMI-A Connector

Toshiba Satellite L300D with ATI Mobility Radeon X1100 sends data
   to i2c bus for a HDMI connector that is not implemented/existent
   on the notebook's board.

   Fix by applying extented DDC probing for this connector.

   Requires [PATCH] drm/radeon: Extended DDC Probing for Connectors
   with Improperly Wired DDC Lines

   Tested for kernel 2.6.38 on Toshiba Satellite L300D notebook

BugLink: http://bugs.launchpad.net/bugs/826677
Signed-off-by: Thomas Reim <reimth@gmail.com>
Acked-by: Chris Routh <routhy@gmail.com>
Cc: <stable@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/ttm: ensure ttm for new node is bound before calling move_notify()

This was true for new TTM_PL_SYSTEM and new TTM_PL_TT cases, but wasn't
the case on TTM_PL_SYSTEM<->TTM_PL_TT moves, which causes trouble on some
paths as nouveau's move_notify() hook requires that the dma addresses be
valid at this point.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/ttm: unbind ttm before destroying node in accel move cleanup

Nouveau makes the assumption that if a TTM is bound there will be a mm_node
around for it and the backwards ordering here resulted in a use-after-free
on some eviction paths.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/ttm: fix ttm_bo_add_ttm(user) failure path

ttm_tt_destroy kfrees passed object, so we need to nullify
a reference to it.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: stable@kernel.org
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

xfs: fix tracing builds inside the source tree

The code really requires the current source directory to be in the
header search path. We already do this if building with an object
tree separate from the source, but it needs to be added manually
if building inside the source. The cflags addition for it accidentally
got removed when collapsing the xfs directory structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

Linux 3.1-rc3

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Add group event scheduling option to perf record/stat
  MAINTAINERS: Fix list of perf events source files
  perf tools: Fix build against newer glibc
  perf tools: Fix error handling of unknown events
  perf evlist: Fix missing event name init for default event
  perf list: Fix exit value

Merge branch 'stable/bug.fixes' of git://git./linux/kernel/git/konrad/xen

* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tracing: Fix tracing config option properly
  xen: Do not enable PV IPIs when vector callback not present
  xen/x86: replace order-based range checking of M2P table by linear one
  xen: xen-selfballoon.c needs more header files

xen/tracing: Fix tracing config option properly

Steven Rostedt says we should use CONFIG_EVENT_TRACING.

Cc:Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Do not enable PV IPIs when vector callback not present

Fix regression for HVM case on older (<4.1.1) hypervisors caused by

  commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
  Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

This change replaced the SMP operations with event based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplainable hangs early on boot for
HVM guests with more than one CPU.

BugLink: http://bugs.launchpad.net/bugs/791850
CC: stable@kernel.org
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

drm/radeon: Make vramlimit parameter actually work.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: Explicitly print GTT/VRAM offsets on test failure.

Otherwise these would need to be painstakingly calculated looking at the source
code.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: Take IH ring into account for test size calculation.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/alpha: Add Alpha support to Radeon DRM code

Alpha needs to have available the system bus address for the Radeon's
local memory, so that it can be used in ttm_bo_vm_fault(), when building
the PTEs for accessing that VRAM. So, we make bus.addr hold the ioremap()
return, and then we can modify bus.base appropriately for use during page
fault processing.

Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'fixes' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: core: handle ack_busy when fetching the Config ROM

Btrfs: fix 64 bit divide problem

This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check
if there is enough space for balancing smarter"). We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just left/right
shift respectively, and in cases where theres N number of devices use
do_div. Also make the counters u64 to match up with rw_devices.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
  ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
  ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
  ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
  ext4: Fix ext4_should_writeback_data() for no-journal mode

Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: sound/aoa/fabrics/layout.c: remove unneeded kfree
  ALSA: hda - Fix error check from snd_hda_get_conn_index() in patch_cirrus.c
  ALSA: hda - Don't spew too many ELD errors
  ALSA: usb-audio - Fix missing mixer dB information
  ALSA: hda - Add "PCM" volume to vmaster slave list
  ALSA: hda - Fix duplicated capture-volume creation for ALC268 models
  ALSA: ac97: Add HP Compaq dc5100 SFF(PT003AW) to Headphone Jack Sense whitelist
  ALSA: snd_usb_caiaq: track submitted output urbs

pci: fix new kernel-doc warning in pci.c

Fix new kernel-doc warning in pci.c:

Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps'
Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ALSA: sound/aoa/fabrics/layout.c: remove unneeded kfree

The label outnodev is only used when kzalloc has not yet taken place or has
failed, so there is no need for the call for kfree under this label.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
expression E1!=0,E2,E3,E4;
statement S;
iterator I;
@@

(
if (...) { ... when != kfree(x)
               when != x = E3
               when != E3 = x
*  return ...;
}
... when != x = E2
    when != I(...,x,...) S
if (...) { ... when != x = E4
kfree(x); ... return ...; }
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Fix error check from snd_hda_get_conn_index() in patch_cirrus.c

snd_hda_get_conn_index() returns a negative value while the current code
stores it in an unsigned int. It must be stored in a signed integer.

Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Don't spew too many ELD errors

Currently HD-audio driver shows the all error ELD byte as an error
in the kernel message. This is annoying when the video driver doesn't
set the correct ELD from the beginning. e.g. radeon sends a zero-byte
data, but we still check ELD with the fixed 128 byte as a workaround
for some broken devices, it spews 128-times errors.

For avoiding this, the driver aborts reading when the first byte is
invalid. In such a case, the whole data is certainly invalid.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/drm-intel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel:
drm/i915: set GFX_MODE to pre-Ivybridge default value even on Ivybridge

ext4: flush any pending end_io requests before DIO reads w/dioread_nolock

There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_direct read request comes in during this period, ext4 will return
zero instead of the recently written data.

This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing O_direct read to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

drm/i915: set GFX_MODE to pre-Ivybridge default value even on Ivybridge

Prior to Ivybridge, the GFX_MODE would default to 0x800, meaning that
MI_FLUSH would flush the TLBs in addition to the rest of the caches
indicated in the MI_FLUSH command. However starting with Ivybridge, the
register defaults to 0x2800 out of reset, meaning that to invalidate the
TLB we need to use PIPE_CONTROL. Since we're not doing that yet, go
back to the old default so things work.

v2: don't forget to actually *clear* the new bit

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
  Revert "cfq: Remove special treatment for metadata rqs."
  block: fix flush machinery for stacking drivers with differring flush flags
  block: improve rq_affinity placement
  blktrace: add FLUSH/FUA support
  Move some REQ flags to the common bio/request area
  allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
  xen/blkback: Make description more obvious.
  cfq-iosched: Add documentation about idling
  block: Make rq_affinity = 1 work as expected
  block: swim3: fix unterminated of_device_id table
  block/genhd.c: remove useless cast in diskstats_show()
  drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
  drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
  bsg-lib: add module.h include
  cfq-iosched: Reduce linked group count upon group destruction
  blk-throttle: correctly determine sync bio
  loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
  loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
  loop: add management interface for on-demand device allocation
  loop: replace linked list of allocated devices with an idr index
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric

s3c-adc-battery: Fix compilation error due to missing header (module.h)

Add linux/module.h to fix this compilation error:

drivers/power/s3c_adc_battery.c:435:15: error: expected declaration specifiers or ‘...’ before string constant
drivers/power/s3c_adc_battery.c:435:1: warning: data definition has no type or storage class
drivers/power/s3c_adc_battery.c:435:1: warning: type defaults to ‘int’ in declaration of ‘MODULE_AUTHOR’
drivers/power/s3c_adc_battery.c:435:15: warning: function declaration isn’t a prototype
drivers/power/s3c_adc_battery.c:436:20: error: expected declaration specifiers or ‘...’ before string constant
drivers/power/s3c_adc_battery.c:436:1: warning: data definition has no type or storage class
drivers/power/s3c_adc_battery.c:436:1: warning: type defaults to ‘int’ in declaration of ‘MODULE_DESCRIPTION’
drivers/power/s3c_adc_battery.c:436:20: warning: function declaration isn’t a prototype
drivers/power/s3c_adc_battery.c:437:16: error: expected declaration specifiers or ‘...’ before string constant
drivers/power/s3c_adc_battery.c:437:1: warning: data definition has no type or storage class
drivers/power/s3c_adc_battery.c:437:1: warning: type defaults to ‘int’ in declaration of ‘MODULE_LICENSE’
drivers/power/s3c_adc_battery.c:437:16: warning: function declaration isn’t a prototype
make[2]: *** [drivers/power/s3c_adc_battery.o] Error 1

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Ian Lartey <ian@opensource.wolfsonmicro.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>

max8997_charger: Needs module.h

power/max8997_charger.c uses interfaces from linux/module.h,
so it should include that file. This fixes build errors.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>

max8998_charger: Needs module.h

power/max8998_charger.c uses interfaces from linux/module.h,
so it should include that file. This fixes build errors.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>

PCI: OF: Don't crash when bridge parent is NULL.

In pcibios_get_phb_of_node(), we will crash while booting if
bus->bridge->parent is NULL.

Check for this case and avoid dereferencing the NULL pointer.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "cfq: Remove special treatment for metadata rqs."

We have a kernel build regression since 3.1-rc1, which is about 10%
regression. The kernel source is in an ext3 filesystem.
Alex Shi bisect it to commit:
commit a07405b7802691d29ab3b23bdc76ee6d006aad0b
Author: Justin TerAvest <teravest@google.com>
Date: Sun Jul 10 22:09:19 2011 +0200

cfq: Remove special treatment for metadata rqs.

Apparently this is caused by lack metadata preemption, where ext3/ext4
do use READ_META. I didn't see a way to fix the issue, so suggest
reverting the patch.

This reverts commit a07405b7802691d29ab3b23bdc76ee6d006aad0b.

Reported-by: Alex Shi<alex.shi@intel.com>
Reported-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

ALSA: usb-audio - Fix missing mixer dB information

The recent fix for testing dB range at the mixer creation time seems
to cause regressions in some devices. In such devices, reading the dB
info at probing time gives an error, thus both dBmin and dBmax are still
zero, and TLV flag isn't set although the later read of dB info succeeds.

This patch adds a workaround for such a case by assuming that the later
read will succeed. In future, a similar test should be performed in a
case where a wrong dB range is seen even in the later read.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>

Merge git://git./linux/kernel/git/davem/sparc

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: fix array bounds error setting up PCIC NMI trap

Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  drivers/ata/sata_dwc_460ex.c: add missing kfree
  ata: Add iMX pata support
  pata_via: disable ATAPI DMA on AVERATEC 3200
  [libata] sata_sil: fix used-uninit warning

Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
  NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
  pnfs-obj: Fix the comp_index != 0 case
  pnfs-obj: Bug when we are running out of bio
  nfs: add missing prefetch.h include

sparc: fix array bounds error setting up PCIC NMI trap

CC arch/sparc/kernel/pcic.o
arch/sparc/kernel/pcic.c: In function 'pcic_probe':
arch/sparc/kernel/pcic.c:359:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:359:8: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:360:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:360:8: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:361:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:361:8: error: array subscript is above array bounds [-Werror=array-bounds]
cc1: all warnings being treated as errors

I'm not particularly familiar with sparc but t_nmi (defined in head_32.S via
the TRAP_ENTRY macro) and pcic_nmi_trap_patch (defined in entry.S) both appear
to be 4 instructions long and I presume from the usage that instructions are
int sized.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/ata/sata_dwc_460ex.c: add missing kfree

Currently, error handling code in this function calls the function
sata_dwc_port_stop, but this function has essentially no effect if hsdevp
has not been stored in ap, which is the case throughout this function.  The
only effect is to print a debugging message including ap->print_id.

The code is rewritten to not call sata_dwc_port_stop, but instead to jump
to a local label that prints the original error message and the print_id
information.  In the case where hsdevp has been already allocated (but not
yet stored in ap), this value is freed as well.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@exists@
local idexpression x;
statement S,S1;
expression E;
identifier fl;
expression *ptr != NULL;
@@

x = $kmalloc\|kzalloc\|kcalloc$(...);
...
if (x == NULL) S
<... when != x
     when != if (...) { <+...kfree(x)...+> }
     when any
     when != true x == NULL
x->fl
...>
(
if (x == NULL) S1
|
if (...) { ... when != x
               when forall
(
return $0\|<+...x...+>\|ptr$;
|
* return ...;
)
}
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

ata: Add iMX pata support

Add basic support for pata on iMX. It has been tested only on imx51.
SDMA support will probably be added later so this version supports only
PIO.

v2:
  - enable only when needed IORDY
  - use dev_get_drvdata
v3:
  - add missing clk_put() calls
  - use platform_get_irq()
  - fix resume code to avoid disabling IORDY on resume
v4:
  - Remove EXPERIMENTAL and switch to depends on ARCH_MXC
  - Use devm_kzalloc()
  - make clock a must-have
  - Use only 1 ioremap

Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_via: disable ATAPI DMA on AVERATEC 3200

On AVERATEC 3200, pata_via causes memory corruption with ATAPI DMA,
which often leads to random kernel oops. The cause of the problem is
not well understood yet and only small subset of machines using the
controller seem affected. Blacklist ATAPI DMA on the machine.

Signed-off-by: Tejun Heo <tj@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=11426
Reported-and-tested-by: Jim Bray <jimsantelmo@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

[libata] sata_sil: fix used-uninit warning

Init 'serror' to silence the following warning:

drivers/ata/sata_sil.c: In function ‘sil_interrupt’:
drivers/ata/sata_sil.c:453:14: warning: ‘serror’ may be used uninitialized in
this function [-Wuninitialized]

This is not a 'can never happen' but is nonetheless extremely unlikely.
The easiest and cleanest warning fix is simply to init the var,
rather than worry about marking the var uninit-ok.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: set i_size properly when fallocating and we already
  btrfs: unlock on error in btrfs_file_llseek()
  btrfs: btrfs_permission's RO check shouldn't apply to device nodes
  Btrfs: truncate pages from clone ioctl target range
  Btrfs: fix uninitialized sync_pending
  Btrfs: fix wrong free space information
  btrfs: memory leak in btrfs_add_inode_defrag()
  Btrfs: use plain page_address() in header fields setget functions
  Btrfs: forced readonly when btrfs_drop_snapshot() fails
  Btrfs: check if there is enough space for balancing smarter
  Btrfs: fix a bug of balance on full multi-disk partitions
  Btrfs: fix an oops of log replay
  Btrfs: detect wether a device supports discard
  Btrfs: force unplugs when switching from high to regular priority bios

Merge git://git./linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  update cifs version to 1.75
  [CIFS] possible memory corruption on mount
  cifs: demote cERROR in build_path_from_dentry to cFYI

Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: Support iSCSI PDU padding
  IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long)
  IPoIB: Fix possible NULL dereference in ipoib_start_xmit()

Merge git://git./linux/kernel/git/hirofumi/fatfs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: fat16 support maximum 4GB file/vol size as WinXP or 7.
  fat: fix utf8 iocharset warning message
  fat: fix build warning

irqdesc: fix new kernel-doc warning

Fix kernel-doc warning in irqdesc.c:

Warning(kernel/irq/irqdesc.c:353): No description found for parameter 'owner'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i7core_edac: fixed typo in error count calculation

Based on a patch from the PaX Team, found during a clang analysis pass.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: stable@kernel.org [v2.6.35+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>