review.tizen.org Git - platform/adaptation/renesas_rcar/renesas

tracing/filter: Increase the max preds to 2^14

Now that the filter logic does not require to save the pred results
on the stack, we can increase the max number of preds we allow.
As the preds are index by a short value, and we use the MSBs as flags
we can increase the max preds to 2^14 (16384) which should be way
more than enough.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Move MAX_FILTER_PRED to local tracing directory

The MAX_FILTER_PRED is only needed by the kernel/trace/*.c files.
Move it to kernel/trace/trace.h.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Optimize filter by folding the tree

There are many cases that a filter will contain multiple ORs or
ANDs together near the leafs. Walking up and down the tree to get
to the next compare can be a waste.

If there are several ORs or ANDs together, fold them into a single
pred and allocate an array of the conditions that they check.
This will speed up the filter by linearly walking an array
and can still break out if a short circuit condition is met.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Check the created pred tree

Since the filter walks a tree to determine if a match is made or not,
if the tree was incorrectly created, it could cause an infinite loop.

Add a check to walk the entire tree before assigning it as a filter
to make sure the tree is correct.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Optimize short ciruit check

The test if we should break out early for OR and AND operations
can be optimized by comparing the current result with
(pred->op == OP_OR)

That is if the result is true and the op is an OP_OR, or
if the result is false and the op is not an OP_OR (thus an OP_AND)
we can break out early in either case. Otherwise we continue
processing.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Use a tree instead of stack for filter_match_preds()

Currently the filter_match_preds() requires a stack to push
and pop the preds to determine if the filter matches the record or not.
This has two drawbacks:

1) It requires a stack to store state information. As this is done
   in fast paths we can't allocate the storage for this stack, and
   we can't use a global as it must be re-entrant. The stack is stored
   on the kernel stack and this greatly limits how many preds we
   may allow.

2) All conditions are calculated even when a short circuit exists.
   a || b  will always calculate a and b even though a was determined
   to be true.

Using a tree we can walk a constant structure that will save
the state as we go. The algorithm is simply:

  pred = root;
  do {
switch (move) {
case MOVE_DOWN:
if (OR or AND) {
pred = left;
continue;
}
if (pred == root)
break;
match = pred->fn();
pred = pred->parent;
move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
continue;

case MOVE_UP_FROM_LEFT:
/* Only OR or AND can be a parent */
if (match && OR || !match && AND) {
/* short circuit */
if (pred == root)
break;
pred = pred->parent;
move = left child ?
MOVE_UP_FROM_LEFT :
MOVE_UP_FROM_RIGHT;
continue;
}
pred = pred->right;
move = MOVE_DOWN;
continue;

case MOVE_UP_FROM_RIGHT:
if (pred == root)
break;
pred = pred->parent;
move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
continue;
}
done = 1;
  } while (!done);

This way there's no strict limit to how many preds we allow
and it also will short circuit the logical operations when possible.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Free pred array on disabling of filter

When a filter is disabled, free the preds.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Allocate the preds in an array

Currently we allocate an array of pointers to filter_preds, and then
allocate a separate filter_pred for each item in the array.
This adds slight overhead in the filters as it needs to derefernce
twice to get to the op condition.

Allocating the preds themselves in a single array removes a dereference
as well as helps on the cache footprint.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Call synchronize_sched() just once for system filters

By separating out the reseting of the filter->n_preds to zero from
the reallocation of preds for the filter, we can reset groups of
filters first, call synchronize_sched() just once, and then reallocate
each of the filters in the system group.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Dynamically allocate preds

For every filter that is made, we create predicates to hold every
operation within the filter. We have a max of 32 predicates that we
can hold. Currently, we allocate all 32 even if we only need to
use one.

Part of the reason we do this is that the filter can be used at
any moment by any event. Fortunately, the filter is only used
with preemption disabled. By reseting the count of preds used "n_preds"
to zero, then performing a synchronize_sched(), we can safely
free and reallocate a new array of preds.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Move OR and AND logic out of fn() method

The ops OR and AND act different from the other ops, as they
are the only ones to take other ops as their arguements.
These ops als change the logic of the filter_match_preds.

By removing the OR and AND fn's we can also remove the val1 and val2
that is passed to all other fn's and are unused.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/filter: Have no filter return a match

The n_preds field of a file can change at anytime, and even can become
zero, just as the filter is about to be processed by an event.
In the case that is zero on entering the filter, return 1, telling
the caller the event matchs and should be trace.

Also use a variable and assign it with ACCESS_ONCE() such that the
count stays consistent within the function.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge branch 'perf/core' of git://git./linux/kernel/git/acme/linux-2.6 into perf/core

Merge branch 'linus' into perf/core

Merge reason: Pick up perf fixes that are now upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'ixp4xx' of git://git./linux/kernel/git/chris/linux-2.6

* 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
arm/ixp4xx: Rename FREQ macro to avoid collisions
IXP4xx: Fix qmgr_release_queue() flushing unexpected queue entries.

Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep, timer: Fix del_timer_sync() annotation
RTC: Prevents a division by zero in kernel code.

Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
m32r: Fixup last __do_IRQ leftover
genirq: Add missing status flags to modification mask

Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf stat: Fix aggreate counter reading accounting
  tracing: Replace syscall_meta_data struct array with pointer array
  tracepoints: Fix section alignment using pointer array
  tracing: Replace trace_event struct array with pointer array

Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure the stack is set up before we use it
  x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
  x86, nx: Don't force pages RW when setting NX bits

Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: use linux/io.h to fix compile warnings
  ALSA: hda - Fix memory leaks in conexant jack arrays
  ASoC: CX20442: fix NULL pointer dereference
  ASoC: Amstrad Delta: fix const related build error
  ALSA: oxygen: fix output routing on Xonar DG
  sound: silent echo'ed messages in Makefile
  ASoC: Fix mask/val_mask confusion snd_soc_dapm_put_volsw()
  ASoC: DaVinci: fix kernel panic due to uninitialized platform_data
  ALSA: HDA: Fix microphone(s) on Lenovo Edge 13
  ASoC: Fix module refcount for auxiliary devices
  ALSA: HDA: cxt5066 - Use asus model for Asus U50F, select correct SPDIF output
  ALSA: HDA: Add a new model "asus" for Conexant 5066/205xx
  ALSA: HDA: Refactor some redundant code for Conexant 5066/205xx

perf top: Ditch private annotation code, share perf annotate's

Next step: Live TUI annotation in perf top, just press enter on a symbol
line.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

arm/ixp4xx: Rename FREQ macro to avoid collisions

FREQ is a ridiculously short name for a platform-specific macro in a
generic header, and it now conflicts with an enumeration in the
gspca/ov519 driver.

Also delete conditional reference to ixp4xx_get_board_tick_rate()
which is not defined anywhere.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

IXP4xx: Fix qmgr_release_queue() flushing unexpected queue entries.

Queues should be empty when released, if not, there is a safety valve.
Make sure the queue is usable after it triggers.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

perf annotate: Separate objdump parsing from actual screen rendering

Because in 'perf top' we'll need to parse just once and then, as samples
come, render multiple times with evolving counter values.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'timers/locking'

m32r: Fixup last __do_IRQ leftover

Somehow I managed to miss the last __do_IRQ caller when I cleanup the
remaining users. m32r is fully converted to the generic irq layer, but
I managed to not commit the conversion of __do_IRQ() to
generic_handle_irq() after compile testing the quilt series :(

Pointed-out-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Paul Mundt <lethal@linux-sh.org>

perf annotate: Config options for symbol__tty_annotate

Max line# that should be printed, minimum percentage filter, just like
'perf top', alas, due to it :-)

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

genirq: Add missing status flags to modification mask

The mask which filters out the valid bits which can be set via
irq_modify_status() is missing IRQ_NO_BALANCING, which breaks UV.

Add IRQ_PER_CPU as well to avoid another one line patch for 39.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

perf annotate: Support multiple histograms in annotation

The perf annotate tool continues aggregating everything on just one
histograms, but to support the top model add support for one histogram
perf evsel in the evlist.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf annotate: Move annotate functions to util/

They will be used by perf top, so that we have just one set of routines
to do annotation.

Rename "struct sym_priv" to "struct annotation", etc, to clarify this
code a bit.

Rename "struct sym_ext" to "struct source_line", to give it a meaningful
name, that clarifies that it is a the result of an addr2line call, that
is sorted by percentage one particular source code line appeared in the
annotation.

And since we're moving things around also rename 'sym_hist->ip' to
'sym_hist->addr' as we want to do data structure annotation at some
point.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf top: Remove superfluous name_len field

From the sym_entry struct, struct symbol already has this field.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

x86-32: Make sure the stack is set up before we use it

Since checkin ebba638ae723d8a8fc2f7abce5ec18b688b791d7 we call
verify_cpu even in 32-bit mode. Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code! This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.

This retains start_stack as a virtual address, even though a physical
address would be more convenient for 32 bits; the 64-bit code wants
the other way around...

Reported-by: Matthieu Castet <castet.matthieu@free.fr>
LKML-Reference: <4D41E86D.8060205@free.fr>
Tested-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (68 commits)
  net: can: janz-ican3: world-writable sysfs termination file
  net: can: at91_can: world-writable sysfs files
  MAINTAINERS: update email ids of the be2net driver maintainers.
  bridge: Don't put partly initialized fdb into hash
  r8169: prevent RxFIFO induced loops in the irq handler.
  r8169: RxFIFO overflow oddities with 8168 chipsets.
  r8169: use RxFIFO overflow workaround for 8168c chipset.
  include/net/genetlink.h: Allow genlmsg_cancel to accept a NULL argument
  net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.
  net: Support compat SIOCGETVIFCNT ioctl in ipv4.
  net: Fix bug in compat SIOCGETSGCNT handling.
  niu: Fix races between up/down and get_stats.
  tcp_ecn is an integer not a boolean
  atl1c: Add missing PCI device ID
  s390: Fix possibly wrong size in strncmp (smsgiucv)
  s390: Fix wrong size in memcmp (netiucv)
  qeth: allow OSA CHPARM change in suspend state
  qeth: allow HiperSockets framesize change in suspend
  qeth: add more strict MTU checking
  qeth: show new mac-address if its setting fails
  ...

net: can: janz-ican3: world-writable sysfs termination file

Don't allow everybody to set terminator via sysfs.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: can: at91_can: world-writable sysfs files

Don't allow everybody to write to mb0_id file.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: update email ids of the be2net driver maintainers.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Don't put partly initialized fdb into hash

The fdb_create() puts a new fdb into hash with only addr set. This is
not good, since there are callers, that search the hash w/o the lock
and access all the other its fields.

Applies to current netdev tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

CRED: Fix kernel panic upon security_file_alloc() failure.

In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.

Fix this bug by assigning f->f_cred before calling security_file_alloc().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'tip/perf/urgent-2' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent

Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (27 commits)
  gpu/stub: fix acpi_video build error, fix stub kconfig dependencies
  drm/radeon/kms: dynamically allocate power state space
  drm/radeon/kms: fix s/r issues with bios scratch regs
  agp: ensure GART has an address before enabling it
  Revert "agp: AMD AGP is used on UP1100 & UP1500 alpha boxen"
  amd-k7-agp: remove non-x86 code
  drm/radeon/kms/evergreen: always set certain VGT regs at CP init
  drm/radeon/kms: add updated ib_execute function for evergreen
  drm/radeon: remove 0x4243 pci id
  drm/radeon/kms: Enable new pll calculation for avivo+ asics
  drm/radeon/kms: add new pll algo for avivo asics
  drm/radeon/kms: add pll debugging output
  drm/radeon/kms: switch back to min->max pll post divider iteration
  drm/radeon/kms: rv6xx+ thermal sensor fixes
  drm/nv50: fix display on 0x50
  drm/nouveau: correctly pair hwmon_init and hwmon_fini
  drm/i915: Only bind to function 0 of the PCI device
  drm/i915: Suppress spurious vblank interrupts
  drm: Avoid leak of adjusted mode along quick set_mode paths
  drm: Simplify and defend later checks when disabling a crtc
  ...

drm: Only set DPMS ON when actually configuring a mode

In drm_crtc_helper_set_config, instead of always forcing all outputs
to DRM_MODE_DPMS_ON, only set them if the CRTC is actually getting a
mode set, as any mode set will turn all outputs on.

This fixes https://lkml.org/lkml/2011/1/24/457

Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org (2.6.37)
Reported-and-tested-by: Carlos R. Mafra <crmafra2@gmail.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fix/asoc' into for-linus

r8169: prevent RxFIFO induced loops in the irq handler.

While the RxFIFO interruption is masked for most 8168, nothing prevents
it to appear in the irq status word. This is no excuse to crash.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Ivan Vecera <ivecera@redhat.com>
Cc: Hayes <hayeswang@realtek.com>

r8169: RxFIFO overflow oddities with 8168 chipsets.

Some experiment-based action to prevent my 8168 chipsets locking-up hard
in the irq handler under load (pktgen ~1Mpps). Apparently a reset is not
always mandatory (is it at all ?).

- RTL_GIGA_MAC_VER_12
- RTL_GIGA_MAC_VER_25
  Missed ~55% packets. Note:
  - this is an old SiS 965L motherboard
  - the 8168 chipset emits (lots of) control frames towards the sender

- RTL_GIGA_MAC_VER_26
  The chipset does not go into a frenzy of mac control pause when it
  crashes yet but it can still be crashed. It needs more work.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Ivan Vecera <ivecera@redhat.com>
Cc: Hayes <hayeswang@realtek.com>

r8169: use RxFIFO overflow workaround for 8168c chipset.

I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>

lockdep, timer: Fix del_timer_sync() annotation

Calling local_bh_enable() will want to actually start processing
softirqs, which isn't a good idea since this can get called with IRQs
disabled.

Cure this by using _local_bh_enable() which doesn't start processing
softirqs, and use raw_local_irq_save() to avoid any softirqs from
happening without letting lockdep think IRQs are in fact disabled.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
LKML-Reference: <20110203141548.039540914@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

include/net/genetlink.h: Allow genlmsg_cancel to accept a NULL argument

nlmsg_cancel can accept NULL as its second argument, so for similarity,
this patch extends genlmsg_cancel to be able to accept a NULL second
argument as well.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

gpu/stub: fix acpi_video build error, fix stub kconfig dependencies

The comments under "config STUB_POULSBO" are close to correct,
but they are not being followed. This patch updates them to reflect
the requirements for THERMAL.

This build error is caused by STUB_POULSBO selecting ACPI_VIDEO
when ACPI_VIDEO's config requirements are not met.

ERROR: "thermal_cooling_device_register" [drivers/acpi/video.ko] undefined!
ERROR: "thermal_cooling_device_unregister" [drivers/acpi/video.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Airlie <airlied@redhat.com>

net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Support compat SIOCGETVIFCNT ioctl in ipv4.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix bug in compat SIOCGETSGCNT handling.

Commit 709b46e8d90badda1898caea50483c12af178e96 ("net: Add compat
ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") added the
correct plumbing to handle SIOCGETSGCNT properly.

However, whilst definiting a proper "struct compat_sioc_sg_req" it
isn't actually used in ipmr_compat_ioctl().

Correct this oversight.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/hch/hfsplus

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
  hfsplus: fix up a comparism in hfsplus_file_extend
  hfsplus: fix two memory leaks in wrapper.c
  hfsplus: do not leak buffer on error
  hfsplus: fix failed mount handling

niu: Fix races between up/down and get_stats.

As reported by Flavio Leitner, there is no synchronization to protect
NIU's get_stats method from seeing a NULL pointer in either
np->rx_rings or np->tx_rings.  In fact, as far as ->ndo_get_stats
is concerned, these values are set completely asynchronously.

Flavio attempted to fix this using a RW semaphore, which in fact
works most of the time.  However, dev_get_stats() can be invoked
from non-sleepable contexts in some cases, so this fix doesn't
work in all cases.

So instead, control the visibility of the np->{rx,tx}_ring pointers
when the device is being brough up, and use properties of the device
down sequence to our advantage.

In niu_get_stats(), return immediately if netif_running() is false.
The device shutdown sequence first marks the device as not running (by
clearing the __LINK_STATE_START bit), then it performans a
synchronize_rcu() (in dev_deactive_many()), and then finally it
invokes the driver ->ndo_stop() method.

This guarentees that all invocations of niu_get_stats() either see
netif_running() as false, or they see the channel pointers before
->ndo_stop() clears them out.

If netif_running() is true, protect against startup races by loading
the np->{rx,tx}_rings pointer into a local variable, and punting if
it is NULL.  Use ACCESS_ONCE to prevent the compiler from reloading
the pointer on us.

Also, during open, control the order in which the pointers and the
ring counts become visible globally using SMP write memory barriers.
We make sure the np->num_{rx,tx}_rings value is stable and visible
before np->{rx,tx}_rings is.

Such visibility control is not necessary on the niu_free_channels()
side because of the RCU sequencing that happens during device down as
described above.  We are always guarenteed that all niu_get_stats
calls are finished, or will see netif_running() false, by the time
->ndo_stop is invoked.

Reported-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/radeon/kms: dynamically allocate power state space

We previously used a static array, but some new systems
had more states then we had array space, so dynamically
allocate space based on the number of states in the vbios.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=33851

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: fix s/r issues with bios scratch regs

The accelerate mode bit gets checked by certain atom
command tables to set up some register state. It needs
to be clear when setting modes and set when not.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=26942

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

agp: ensure GART has an address before enabling it

Some BIOSs (eg. the AMI BIOS on the Asus P4P800 motherboard) don't
initialise the GART address, and pcibios_assign_resources() can ignore it
because it can be marked as a host bridge (see
https://bugzilla.kernel.org/show_bug.cgi?id=24392#c5 for details). This
was handled correctly up to 2.6.35, but the pci_enable_device() cleanup in
2.6.36 96576a9e1a0cdb8 ("agp: intel-agp: do not use PCI resources before
pci_enable_device()") means that the kernel tries to enable the GART
before assigning it an address; in such cases the GART overlaps with other
device assignments and ends up being disabled.

This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=24392

Note that I imagine efficeon-agp.c probably has the same problem, but
I can't test that and I'd like to make sure this patch is suitable for
-stable (since 2.6.36 and 2.6.37 are affected).

Signed-off-by: Stephen Kitt <steve@sk2.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Kulikov Vasiliy <segooon@gmail.com>
Cc: Florian Mickler <florian@mickler.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Revert "agp: AMD AGP is used on UP1100 & UP1500 alpha boxen"

This reverts commit f191f144079b0083c6fa7d01a4acbd7263fb5032.

The AMD 751 and 761 chipsets are used on the UP1000, UP1100, and UP1500
OEM motherboards, but they neglect to do anything to make AGP work.

According to Ivan Kokshaysky:
There is quite fundamental conflict between the Alpha architecture and
x86 AGP implementation - Alpha is entirely cache coherent by design,
while x86 AGP is not (I mean native AGP DMA transactions, not a PCI over
AGP). There are no such things as non-cacheable mappings or software
support for cache flushing/invalidation on Alpha, so x86 AGP code won't
work on Nautilus.

So there's no point in allowing this driver to be configured on Alpha.

Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

amd-k7-agp: remove non-x86 code

amd-k7-agp can't be built on Alpha anymore, so remove now unnecessary
code.

Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms/evergreen: always set certain VGT regs at CP init

These should be handled by the clear_state setup, but set them
directly as well just to be sure.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: add updated ib_execute function for evergreen

Adds new packet to disable DX9 constant emulation.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

hfsplus: fix up a comparism in hfsplus_file_extend

Revert an incorrect hunk from commit b2837fcf4994e699a4def002e26f274d95b387c1,

"hfsplus: %L-to-%ll, macro correction, and remove unneeded braces"

revert a pointless change of comparism operation argument order, which turned
out to not even be equivalent.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>

hfsplus: fix two memory leaks in wrapper.c

Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>

hfsplus: do not leak buffer on error

Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>

hfsplus: fix failed mount handling

Currently the error handling in hfsplus_fill_super is a mess, and can
lead to accessing fields in the superblock that haven't been even set
up yet. Fix this by making sure we do not set up sb->s_root until we
have the mount fully set up, and before that do proper step by step
unwinding instead of using hfsplus_put_super as a big hammer.

Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>

Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] libsas: fix runaway error handler problem
  [SCSI] fix incorrect value of SCSI_MAX_SG_CHAIN_SEGMENTS due to include file ordering
  [SCSI] arcmsr: Fix the issue of system hangup after commands timeout on ARC-1200
  [SCSI] mpt2sas: fix Integrated Raid unsynced on shutdown problem
  [SCSI] mpt2sas: Kernel Panic during Large Topology discovery
  [SCSI] mpt2sas: Fix the race between broadcast asyn event and scsi command completion
  [SCSI] mpt2sas: Correct resizing calculation for max_queue_depth
  [SCSI] mpt2sas: fix internal device reset for older firmware prior to MPI Rev K
  [SCSI] mpt2sas: Fix device removal handshake for zoned devices

x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm

Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb
IPI's while the cr3 is still pointing to the prev mm.  And this window
can lead to the possibility of bogus TLB fills resulting in strange
failures.  One such problematic scenario is mentioned below.

T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI
     etc between the point of clearing the cpu from the mm_cpumask(mm1)
     and before reloading the cr3 with the new mm2.

T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
     flushing the TLB for mm1.  It doesn't send the flush TLB to CPU-1
     as it doesn't see that cpu listed in the mm_cpumask(mm1).

T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
     page-table pages associated with the removed vma mapping.

T4. CPU-2 now allocates those freed page-table pages for something
     else.

T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1
     can potentially speculate and walk through the page-table caches
     and can insert new TLB entries.  As the page-table pages are
     already freed and being used on CPU-2, this page walk can
     potentially insert a bogus global TLB entry depending on the
     (random) contents of the page that is being used on CPU-2.

T6. This bogus TLB entry being global will be active across future CR3
     changes and can result in weird memory corruption etc.

To avoid this issue, for the prev mm that is handing over the cpu to
another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is
changed.

Marking it for -stable, though we haven't seen any reported failure that
can be attributed to this.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org [v2.6.32+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

RTC: Prevents a division by zero in kernel code.

This patch prevents a user space program from calling the RTC_IRQP_SET
ioctl with a negative value of frequency. Also, if this call is make
with a zero value of frequency, there would be a division by zero in the
kernel code.

[jstultz: Also initialize irq_freq to 1 to catch other divbyzero issues]

CC: Alessandro Zummo <a.zummo@towertech.it>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Signed-off-by: John Stultz <john.stultz@linaro.org>

Merge branch 'perf/urgent' of git://git./linux/kernel/git/acme/linux-2.6 into perf/urgent

perf stat: Fix aggreate counter reading accounting

Introduced in: c52b12ed, when this sequence:

  count[0] = count[1] = count[2] = 0;

Was replaced with:

  aggr->val = 0;

Which is equivalent to zeroing just the first entry in the 'count'
array.

Fix it by zeroing the three entries with:

  aggr->val = aggr->ena = aggr->run = 0;

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA: Update missed conversion of flush_scheduled_work()
  RDMA/ucma: Copy iWARP route information on queries
  RDMA/amso1100: Fix compile warnings
  RDMA/cxgb4: Set the correct device physical function for iWARP connections
  RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
  IB/qib: Hold link for TX SERDES settings
  mlx4_core: Add ConnectX-3 device IDs

Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Prevent irq storm on migration

Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix update_curr_rt()
sched, docs: Update schedstats documentation to version 15

Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix reading in perf_event_read()
  watchdog: Don't change watchdog state on read of sysctl
  watchdog: Fix sysctl consistency
  watchdog: Fix broken nowatchdog logic
  perf: Fix Pentium4 raw event validation
  perf: Fix alloc_callchain_buffers()

tracing: Replace syscall_meta_data struct array with pointer array

Currently the syscall_meta structures for the syscall tracepoints are
placed in the __syscall_metadata section, and at link time, the linker
makes one large array of all these syscall metadata structures. On boot
up, this array is read (much like the initcall sections) and the syscall
data is processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the __syscall_metadata section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The __syscall_metadata section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracepoints: Fix section alignment using pointer array

Make the tracepoints more robust, making them solid enough to handle compiler
changes by not relying on anything based on compiler-specific behavior with
respect to structure alignment. Implement an approach proposed by David Miller:
use an array of const pointers to refer to the individual structures, and export
this pointer array through the linker script rather than the structures per se.
It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
for the pointers), but are less likely to break due to compiler changes.

History:

commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
added the aligned(32) type and variable attribute to the tracepoint structures
to deal with gcc happily aligning statically defined structures on 32-byte
multiples.

One attempt was to use a 8-byte alignment for tracepoint structures by applying
both the variable and type attribute to tracepoint structures definitions and
declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

The reason is that the "aligned" attribute only specify the _minimum_ alignment
for a structure, leaving both the compiler and the linker free to align on
larger multiples. Because tracepoint.c expects the structures to be placed as an
array within each section, up-alignment cause NULL-pointer exceptions due to the
extra unexpected padding.

(this patch applies on top of -tip)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20110126222622.GA10794@Krystal>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

sched: Fix update_curr_rt()

cpu_stopper_thread()
  migration_cpu_stop()
    __migrate_task()
      deactivate_task()
        dequeue_task()
          dequeue_task_rq()
            update_curr_rt()

Will call update_curr_rt() on rq->curr, which at that time is
rq->stop. The problem is that rq->stop.prio matches an RT prio and
thus falsely assumes its a rt_sched_class task.

Reported-Debuged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Cc: stable@kernel.org # .37
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf: Fix reading in perf_event_read()

It is quite possible for the event to have been disabled between
perf_event_read() sending the IPI and the CPU servicing the IPI and
calling __perf_event_read(), hence revalidate the state.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf: Cure task_oncpu_function_call() races

Oleg reported that on architectures with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
task_oncpu_function_call() can land before perf_event_task_sched_in()
and cause interesting situations for eg. perf_install_in_context().

This patch reworks the task_oncpu_function_call() interface to give a
more usable primitive as well as rework all its users to hopefully be
more obvious as well as remove the races.

While looking at the code I also found a number of races against
perf_event_task_sched_out() which can flip contexts between tasks so
plug those too.

Reported-and-reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms

Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.

commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date:   Wed Aug 19 18:05:36 2009 -0700

    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init

Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()

Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.

BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.

However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.

During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.

We didn't see this boot hang issue on this platform before the
commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.

Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.

Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393

  [ By the way, this behavior of the bios modifying MTRR's after the start
    of the OS boot is not common and the kernel is not prepared to
    handle this situation well. Irrespective of this issue, during
    suspend/resume, linux kernel will try to reprogram the BP's MTRR values
    to the values seen during the start of the OS boot. So suspend/resume might
    be already broken on this platform for all linux kernel versions. ]

Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

tracing: Replace trace_event struct array with pointer array

Currently the trace_event structures are placed in the _ftrace_events
section, and at link time, the linker makes one large array of all
the trace_event structures. On boot up, this array is read (much like
the initcall sections) and the events are processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the _ftrace_event section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The _ftrace_event section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Revert "exofs: Set i_mapping->backing_dev_info anyway"

This reverts commit 115e19c53501edc11f730191f7f047736815ae3d.

Apparently setting inode->bdi to one's own sb->s_bdi stops VFS from
sending *read-aheads*.  This problem was bisected to this commit.  A
revert fixes it.  I'll investigate farther why is this happening for the
next Kernel, but for now a revert.

I'm sending to stable@kernel.org as well, since it exists also in
2.6.37.  2.6.36 is good and does not have this patch.

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'media_fixes' of git://git./linux/kernel/git/mchehab/linux-2.6

* 'media_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  [media] fix saa7111 non-detection
  [media] rc/streamzap: fix reporting response times
  [media] mceusb: really fix remaining keybounce issues
  [media] rc: use time unit conversion macros correctly
  [media] rc/ir-lirc-codec: add back debug spew
  [media] ir-kbd-i2c: improve remote behavior with z8 behind usb
  [media] lirc_zilog: z8 on usb doesn't like back-to-back i2c_master_send
  [media] hdpvr: fix up i2c device registration
  [media] rc/mce: add mappings for missing keys
  [media] gspca - zc3xx: Discard the partial frames
  [media] gspca - zc3xx: Fix bad images with the sensor hv7131r
  [media] gspca - zc3xx: Bad delay when given by a table

Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] reset default for CONFIG_CHSC_SCH
  [S390] qdio: prevent compile warning under CONFIG_32BIT
  [S390] use asm-generic/cacheflush.h
  [S390] tlb: fix build error caused by THP
  [S390] missing sacf in uaccess
  [S390] pgtable_list corruption
  [S390] dasd: prevent panic with unresumed devices

fs: make block fiemap mapping length at least blocksize long

Some filesystems don't deal well with being asked to map less than
blocksize blocks (GFS2 for example). Since we are always mapping at least
blocksize sections anyway, just make sure len is at least as big as a
blocksize so we don't trip up any filesystems. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: sparse: add __FMODE_EXEC

FMODE_EXEC is a constant type of fmode_t but was used with normal integer
constants. This results in following warnings from sparse. Fix it using
new macro __FMODE_EXEC.

fs/exec.c:116:58: warning: restricted fmode_t degrades to integer
fs/exec.c:689:58: warning: restricted fmode_t degrades to integer
fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: sparse: remove a warning on OPEN_FMODE()

AND-ing FMODE_* constant with normal integer results in following
sparse warnings. Fix it.

fs/open.c:662:21: warning: restricted fmode_t degrades to integer
fs/anon_inodes.c:123:34: warning: restricted fmode_t degrades to integer

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix event counting breakage from recent THP update

Changes in e401f1761 ("memcg: modify accounting function for supporting
THP better") adds nr_pages to support multiple page size in
memory_cgroup_charge_statistics.

But counting the number of event nees abs(nr_pages) for increasing
counters. This patch fixes event counting.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: never OOM when charging huge pages

Huge page coverage should obviously have less priority than the continued
execution of a process.

Never kill a process when charging it a huge page fails. Instead, give up
after the first failed reclaim attempt and fall back to regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: prevent endless loop when charging huge pages to near-limit group

If reclaim after a failed charging was unsuccessful, the limits are
checked again, just in case they settled by means of other tasks.

This is all fine as long as every charge is of size PAGE_SIZE, because in
that case, being below the limit means having at least PAGE_SIZE bytes
available.

But with transparent huge pages, we may end up in an endless loop where
charging and reclaim fail, but we keep going because the limits are not
yet exceeded, although not allowing for a huge page.

Fix this up by explicitely checking for enough room, not just whether we
are within limits.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: prevent endless loop when charging huge pages

The charging code can encounter a charge size that is bigger than a
regular page in two situations: one is a batched charge to fill the
per-cpu stocks, the other is a huge page charge.

This code is distributed over two functions, however, and only the outer
one is aware of huge pages. In case the charging fails, the inner
function will tell the outer function to retry if the charge size is
bigger than regular pages--assuming batched charging is the only case.
And the outer function will retry forever charging a huge page.

This patch makes sure the inner function can distinguish between batch
charging and a single huge page charge. It will only signal another
attempt if batch charging failed, and go into regular reclaim when it is
called on behalf of a huge page.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: fix unsuitable behavior for hwpoisoned tail page

When a tail page of THP is poisoned, memory-failure will do nothing except
setting PG_hwpoison, while the expected behavior is that the process, who
is using the poisoned tail page, should be killed.

The above problem is caused by lru check of the poisoned tail page of THP.
Because PG_lru flag is only set on the head page of THP, the check always
consider the poisoned tail page as NON lru page.

So the lru check for the tail page of THP should be avoided, as like as
hugetlb.

This patch adds !PageTransCompound() before lru check for THP, because of
the check (!PageHuge() && !PageTransCompound()) the whole branch could be
optimized away at build time when both hugetlbfs and THP are set with "N"
(or in archs not supporting either of those).

[akpm@linux-foundation.org: fix unrelated typo in shake_page() comment]
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: fix the wrong reported address of hwpoisoned hugepages

When the tail page of THP is poisoned, the head page will be poisoned too.
And the wrong address, address of head page, will be sent with sigbus
always.

So when the poisoned page is used by Guest OS which is running on KVM,
after the address changing(hva->gpa) by qemu, the unexpected process on
Guest OS will be killed by sigbus.

What we expected is that the process using the poisoned tail page could be
killed on Guest OS, but not that the process using the healthy head page
is killed.

Since it is not good to poison the healthy page, avoid poisoning other
than the page which is really poisoned.
  (While we poison all pages in a huge page in case of hugetlb,
   we can do this for THP thanks to split_huge_page().)

Here we fix two parts:
  1. Isolate the poisoned page only to make sure
     the reported address is the address of poisoned page.
  2. make the poisoned page work as the poisoned regular page.

[akpm@linux-foundation.org: fix spello in comment]
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: fix splitting of hwpoisoned hugepages

The poisoned THP is now split with split_huge_page() in
collect_procs_anon().  If kmalloc() is failed in collect_procs(),
split_huge_page() could not be called.  And the work after
split_huge_page() for collecting the processes using poisoned page will
not be done, too.  So the processes using the poisoned page could not be
killed.

The condition becomes worse when CONFIG_DEBUG_VM == "Y".  Because the
poisoned THP could not be split, system panic will be caused by
VM_BUG_ON(PageTransHuge(page)) in try_to_unmap().

This patch does:
  1. move split_huge_page() to the place before collect_procs().
     This can be sure the failure of splitting THP is caused by itself.
  2. when splitting THP is failed, stop the operations after it.
     This can avoid unexpected system panic or non sense works.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: fixup Simtec support email entries

The support@simtec.co.uk address is for direct customer support only, the
EB2410ITX and EB110ATX entries should direct to the Simtec Linux Team
address of linux@simtec.co.uk

Also add correct email address for Vincent Sanders

[akpm@linux-foundation.org: fix Vincent's address]
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Vincent Sanders <vince@simtec.co.uk>
Cc: Simtec Support <support@simtec.co.uk>
Cc: Simtec Linux Team <linux@simtec.co.uk>
Cc: Jack Stone <jwjstone@fastmail.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: fixup file entries for "SIMTEC EB2410ITX (BAST)"

Add the correct files for the Simtec BAST machine, ensuring the IDE and
IRQ routing are added, and move to the machine specific file instead of
trying to catch all of arch/arm/mach-s3c2410

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Simtec Linux Team <linux@simtec.co.uk>
Cc: Simtec Support <support@simtec.co.uk>
Cc: Jack Stone <jwjstone@fastmail.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: move s3c2410 drivers to ARM/SAMSUNG ARM

There are currently two entries under the "SIMTEC EB2410ITX (BAST)"
machine entry for drivers/*/*s3c2410*, which is catching everything
s3c2410 driver related.

This entry is for a specific S3C2410 based machine, so move these two file
entries to the "ARM/SAMSUNG ARM ARCHITECTURES" entry, where it will reach
a wider audience of interested parties.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Simtec Linux Team <linux@simtec.co.uk>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: Jack Stone <jwjstone@fastmail.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

epoll: epoll_wait() should not use timespec_add_ns()

commit 95aac7b1cd224f ("epoll: make epoll_wait() use the hrtimer range
feature") added a performance regression because it uses timespec_add_ns()
with potential very large 'ns' values.

[akpm@linux-foundation.org: s/epoll_set_mstimeout/ep_set_mstimeout/, per Davide]
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Bohrer <shawn.bohrer@gmail.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org> [2.6.37.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/migration: fix page corruption during hugepage migration

If migrate_huge_page by memory-failure fails , it calls put_page in itself
to decrease page reference and caller of migrate_huge_page also calls
putback_lru_pages. It can do double free of page so it can make page
corruption on page holder.

In addtion, clean of pages on caller is consistent behavior with
migrate_pages by cf608ac19c ("mm: compaction: fix COMPACTPAGEFAILED
counting").

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: when migrate_pages returns 0, all pages must have been released

In some cases migrate_pages could return zero while still leaving a few
pages in the pagelist (and some caller wouldn't notice it has to call
putback_lru_pages after commit cf608ac19c9 ("mm: compaction: fix
COMPACTPAGEFAILED counting")).

Add one missing putback_lru_pages not added by commit cf608ac19c95 ("mm:
compaction: fix COMPACTPAGEFAILED counting").

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memsw: deprecate noswapaccount kernel parameter and schedule it for removal

noswapaccount couldn't be used to control memsw for both on/off cases so
we have added swapaccount[=0|1] parameter. This way we can turn the
feature in two ways noswapaccount resp. swapaccount=0. We have kept the
original noswapaccount but I think we should remove it after some time as
it just makes more command line parameters without any advantages and also
the code to handle parameters is uglier if we want both parameters.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Requested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>