review.tizen.org Git - platform/kernel/linux-3.10.git/log

[PATCH] Handle holes in node mask in node fallback list setup

Change the find_next_best_node algorithm to correctly skip
over holes in the node online mask. Previously it would not handle
missing nodes correctly and cause crashes at boot.

[Written by Linus, tested by AK]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] i386: fix singlestepping though a syscall

Do not mask TIF_SINGLESTEP bit in _TIF_WORK_MASK. Masking this stopped
do_notify_resume() from being called when it should have been.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Fix SGI O2 compile error in drivers/video/gbefb.c

A sysfs function call uses the wrong parameter, and thus breaks a build on
SGI O2.

CC drivers/video/gbefb.o
drivers/video/gbefb.c: In function ‘gbefb_remove’:
drivers/video/gbefb.c:1246: error: ‘dev’ undeclared (first use in this function)
drivers/video/gbefb.c:1246: error: (Each undeclared identifier is reported only once
drivers/video/gbefb.c:1246: error: for each function it appears in.)
make[2]: *** [drivers/video/gbefb.o] Error 1

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Provide an interface for getting the current tick length

This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer.  The value returned is a fixed-point number in 52.12 format
in nanoseconds.  The reason for this format is that it gives the
full precision that the timekeeping code is using internally.

The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used.  PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic.  (This method of getting the time is also
exported to userspace via the VDSO.)  However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.

Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now.  It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do.  Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.

The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer.  If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday.  In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.

Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on.  The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled.  (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.)  Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Handle all and empty zones when setting up custom zonelists for mbind

The memory allocator doesn't like empty zones (which have an
uninitialized freelist), so a x86-64 system with a node fully
in GFP_DMA32 only would crash on mbind.

Fix that up by putting all possible zones as fallback into the zonelist
and skipping the empty ones.

In fact the code always enough allocated space for all zones,
but only used it for the highest. This change just uses all the
memory that was allocated before.

This should work fine for now, but whoever implements node hot removal
needs to fix this somewhere else too (or make sure zone datastructures
by itself never go away, only their memory)

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

Merge master.kernel.org:/home/rmk/linux-2.6-serial

Merge master.kernel.org:/home/rmk/linux-2.6-arm

Merge master.kernel.org:/home/rmk/linux-2.6-mmc

[PATCH] x86_64: Always pass full number of nodes to NUMA hash computation

Previously the numa hash code would be confused by holes in the node space
and stop early. This is the first part of the fix for the non boot issue
with empty nodes on Opterons.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Relax SRAT covers all memory check a bit

Code was refusing good SRATs because about 12K got lost somewhere.
Allow less than 1MB of difference before rejecting it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Resolve the RIP of an early exception using kallsyms

But do it after everything else to risk less from recursive
crashes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Disable tsc when apicpmtimer is active

Otherwise it has no effect anyways.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Don't enable ATI apicmaintimer workaround when the machine has C2 or C3

Many laptops have problems with ticking the local APIC timer in C2/C3.
The code added earlier to use it by default on ATI didn't really work
for them. Don't enable it when the system supports C2/C3.

This doesn't fix the problem fully, but at least it's not worse than before.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Don't call do_exit with interrupts disabled after IRET exception

This caused a sigreturn with bad argument on a preemptible kernel
to complain with

Debug: sleeping function called from invalid context at /home/lsrc/quilt/linux/include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: {__might_sleep+190} {profile_task_exit+21}
{__do_exit+34} {do_wait+0}

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Add boot option to disable randomized mappings and cleanup

AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.

Also I moved the variable to a more appropiate place and make
it independent from sysctl

And marked __read_mostly which it is.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: make touch_nmi_watchdog() not touch impossible cpus' private data

Along with that, also suppress the memory touching altogether when the
watchdog is not running, to eliminate needless crosstalk. Plus ad a call
to it to make things consistent (one could also consider removing the call
in enable_timer_nmi_watchdog()).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: Update defconfig

... and enable 1394 by default.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[ARM] 3339/1: ARM EABI: make unmuxed syscalls visible

Patch from Nicolas Pitre

With EABI the multiplex sys_ipc and sys_socketcall syscalls are
unavailable and their support code even removed from the compiled
kernel, and the new unmuxed syscalls must be used instead.

Make those syscall numbers visible.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3338/1: old ABI compat: sys_socketcall

Patch from Nicolas Pitre

Commit 99595d0237926b5aba1fe4c844a011a1ba1ee1f8 forgot to intercept
sys_socketcall as well.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3337/1: Fix NSLU2 flash support according to window size configuration patch

Patch from Martin Michlmayr

ARM patch 3226/1 (IXP4xx runtime expansion bus window size configuration)
forgot to update mach-ixp4xx/nslu2-setup.c which leads to the following
compilation error.  Update NSLU2 flash support following patch 3226/1.

  CC      arch/arm/mach-ixp4xx/nslu2-setup.o
arch/arm/mach-ixp4xx/nslu2-setup.c:30: error: \91NSLU2_FLASH_BASE\92 undeclared here (not in a function)
arch/arm/mach-ixp4xx/nslu2-setup.c:31: error: \91NSLU2_FLASH_SIZE\92 undeclared here (not in a function)
make[1]: *** [arch/arm/mach-ixp4xx/nslu2-setup.o] Error 1
make: *** [arch/arm/mach-ixp4xx] Error 2

Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
---

nslu2-setup.c |    6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[IA64] Count disabled cpus as potential hot-pluggable CPUs

Minor updates to earlier patch.
- Added to documentation to add ia64 as well.
- Minor clarification on how to use disabled cpus
- used plain max instead of max_t per Andew Morton.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2

Merge /linux/kernel/git/jejb/scsi-rc-fixes-2.6

[PATCH] ocfs2: detach from heartbeat events before freeing mle

Signed-off-by: Kurt Hackel <Kurt.Hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: only checkpoint journal when asked to

Disable automatic checkpointing of the journal - this is a relic from older
ocfs2 days. Worth quite a bit of performance on longer running single node
tests.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: manually grant remote recovery lock

* fix a hang in recovery that occurred in dlmlock_remote.  the $RECOVERY
  lock was never moved to the granted queue even after getting DLM_NORMAL
  back from the master node.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: add dlm_wait_for_node_death

* add dlm_wait_for_node_death function to be used after receiving a network
  error.  this will wait for the given timeout to allow the heartbeat
  callbacks to update the domain map.  without this, some paths may spin
  and consume enough cpu that the heartbeat gets starved and never updates.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: fix release of ast never reserved

* fix a bug in dlm_convert_lock_handler where dlm_lockres_release_ast was
being called even if no ast was ever reserved

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: recheck recovery state after getting lock

* after successfully taking the $RECOVERY lock in EX mode, recheck to make
sure that recovery has not already begun or completed on another node

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

[IA64] Missing check for TIF_WORK if trace/audit enabled

It appears that if auditing is enabled, the kernel fails to
check for pending signals before returning to user mode.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[MMC] mmci: allow small data transfers

If a data transfer is small (less than a FIFO size) we would
hang waiting for the data to be read due to the PIO interrupt
not occuring. We allowed for this in our PIO interrupt handler,
but not when setting up a data transfer.

Apply the "fix" when setting up a data transfer as well.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[PATCH] Fix over-zealous tag clearing in radix_tree_delete

If a tag is set for a node being deleted from a radix_tree, then that
tag gets cleared from the parent of the node, even if it is set for some
siblings of the node begin deleted.

This patch changes the logic to include a test for any_tag_set similar
to the logic a little futher down.  Care is taken to ensure that
'nr_cleared_tags' remains equals to the number of entries in the 'tags'
array which are set to '0' (which means that this tag is not set in the
tree below pathp->node, and should be cleared at pathp->node and
possibly above.

[ Nick says: "Linus FYI, I was able to modify the radix tree test
  harness to catch the bug and can no longer trigger it after the fix.
  Resulting code passes all other harness tests as well of course." ]

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[ARM] Fix SMP initialisation oops

A change to the SMP initialisation caused the following oops:

CPU1: Booted secondary processor
CPU1: D VIPT write-back cache
CPU1: I cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
CPU1: D cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
<7>Calibrating delay loop... 83.14 BogoMIPS (lpj=415744)
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000001c
...
PC is at enqueue_task+0x1c/0x64
LR is at activate_task+0xcc/0xe4

SMP initialisation now requires cpu_possible_map to be initialised in
setup_arch(). Move this from smp_prepare_cpus() to smp_init_cpus()
and call it from our setup_arch() if CONFIG_SMP is enabled.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge /pub/scm/linux/kernel/git/dtor/input

Merge /pub/scm/linux/kernel/git/davem/net-2.6

[PATCH] mmconfig: add kernel parameter documentation

Mention the "pci=nommconf" option in kernel-parameters.txt.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] swsusp: nuke noisy message

I get about 88 squillion of these when suspending an old ad450nx server.

Cc: Pavel Roskin <proski@gnu.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: early initialization of cpu_to_node

The early initialization of cpu_to_node code as it is now only updates the
cpu_to_node array, and does not update cpu_pda()->nodemember. This will
cause numa_node_id() to return 0 on systems where CPU 0 is not on Node 0.
This leads to a kernel panic in slab.c.

I've tested the patch below on a 16 processor x86_64 ES7000-600 server, and
no longer see the panic I saw with the original 2.6.16-rc3.

Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] hrtimer: fix multiple macro argument expansion

For two macros the arguments were expanded twice, change them to inline
functions to avoid it.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] add asm-generic/mman.h

Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all
arches. The idea is to make it possible to use them portably even before
distros include them in libc headers.

Move common flags to asm-generic/mman.h

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] cpuset: oops in exit on null cpuset fix

Fix a latent bug in cpuset_exit() handling. If a task tried to allocate
memory after calling cpuset_exit(), it oops'd in
cpuset_update_task_memory_state() on a NULL cpuset pointer.

So set the exiting tasks cpuset to the root cpuset instead of to NULL.

A distro kernel hit this with an added kernel package that had just such a
hook (allocating memory) in the exit code path.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ide: touch softlockup detector during pio

We're getting some softlockup false positives during heavy PIO operations. So
poke the lockup detector.

Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] kbuild: revert "fix make -jN with multiple targets with O=..."

Commit 296e0855b0f9a4ec9be17106ac541745a55b2ce1:

    "kbuild: fix make -jN with multiple targets with O=..."

causes a ~95% increase in build time for the kernel.  Before: 4m21s
after: 8m1.403s.  Can we revert this until another approach is found?

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] neofb: avoid resetting display config on unblank (v2)

There were two mistakes in the register-read-on-(un)blank approach.

- First, without proper register (un)locking the value read back will always
  be zero, and this is what I missed entirely until just now.  Due to this,
  the logic could not be verified at all and I tried some bogus checks which
  are completely stupid.

- Second, the LCD status bit will always be set to zero when the backlight
  has been turned off.  Reading the value back during unblank will disable the
  LCD unconditionally, regardless of the state it is supposed to be in, since
  we set it to zero beforehand.

So this is what we do now:

- create a new variable in struct neofb_par, and use that to determine
  whether to read back registers (initialized to true)

- before actually blanking the screen, read back the register to sense any
  possible change made through Fn key combo

- use proper neoUnlock() / neoLock() to actually read something

- every call to neofb_blank() determines if we read back next time: blanking
  disables readback, unblanking (FB_BLANK_UNBLANK) enables it

This should give us a nice and clean state machine.  Has been thoroughly
tested on a Dell Latitude CPiA / NM220 Chip docked to a C/Dock2 with attached
CRT in all possible combinations of LCD/CRT on/off.  I changed the config via
Fn key, let the console blank, unblanked by keypress - works flawlessly.

Signed-off-by: Christian Trefzer <ctrefzer@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[NETFILTER]: nf_conntrack: Fix TCP/UDP HW checksum handling for IPv6 packet

If skb->ip_summed is CHECKSUM_HW here, skb->csum includes checksum
of actual IPv6 header and extension headers. Then such excess
checksum must be subtruct when nf_conntrack calculates TCP/UDP checksum
with pseudo IPv6 header. Spotted by Ben Skeggs.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack: attach conntrack to locally generated ICMPv6 error

Locally generated ICMPv6 errors should be associated with the conntrack
of the original packet. Since the conntrack entry may not be in the hash
tables (for the first packet), it must be manually attached.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack: attach conntrack to TCP RST generated by ip6t_REJECT

TCP RSTs generated by the REJECT target should be associated with the
conntrack of the original TCP packet. Since the conntrack entry is
usually not is the hash tables, it must be manually attached.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack: move registration of __nf_ct_attach

Move registration of __nf_ct_attach to nf_conntrack_core to make it usable
for IPv6 connection tracking as well.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: x_tables: fix dependencies of conntrack related modules

NF_CONNTRACK_MARK is bool and depends on NF_CONNTRACK which is
tristate. If a variable depends on NF_CONNTRACK_MARK and doesn't take
care about NF_CONNTRACK, it can be y even if NF_CONNTRACK isn't y.
NF_CT_ACCT have same issue, too.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Don't invoke okfn in CONFIG_NETFILTER=n variant of nf_hook()

nf_hook() is supposed to call the netfilter hook and return control of the
packet back to the caller in case it may pass, the okfn is only used for
queueing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Pull fix-cpu-possible-map into release branch

[IA64] support panic_on_oops sysctl

Trivial port of this feature from i386
As it stands, panic_on_oops but does nothing on ia64

Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[XFRM]: Fix SNAT-related crash in xfrm4_output_finish

When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.

This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IA64] ia64: simplify and fix udelay()

The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs:  a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.

The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks.  This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods.  If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().

This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible.  It implements two simple
alternative udelay() routines:  one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64-SGI] enforce proper ordering of callouts by XPC

Fix XPC so that it does not deliver any messages until the connected
callout has returned, as well as, prevent the disconnected callout to
occur before the disconnecting callout has returned.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64-SGI] fix the size of __sn_cnodeid_to_nasid

The __sn_cnodeid_to_nasid array was incorrectly sized at MAX_NUMNODES.
On a large system, this array could overflow. The following patch
corrects this by defining it to MAX_COMPACT_NODES.

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64-SGI] export sn_pcidev_info_get

Export sn_pcidev_info_get.

Signed-off-by Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64-SGI] remove compile time warning

This one falls into the "present for Andrew Morton" category to address
his wishlist for a compiler warning free build ;-)

Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] remove obsolete corporate address

Remove obsolete SGI address

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64-SGI] sn2 minor fixes and cleanups

General SN2 code cleanup:
- Do not initialize global variables to zero
- Use kzalloc instead of kmalloc+memset
- Check kmalloc return values
- Do not obfuscate spin lock calls
- Remove some unused code
- Various formatting cleanups

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] Remove duplicate EXPORT_SYMBOLs

Remove symbol exports from ia64_ksyms.c that are already exported in
lib/string.c.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[PATCH] fix zap_thread's ptrace related problems

1. The tracee can go from ptrace_stop() to do_signal_stop()
after __ptrace_unlink(p).

2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
for tasklist_lock in ptrace_detach().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix kill_proc_info() vs fork() theoretical race

copy_process:

attach_pid(p, PIDTYPE_PID, p->pid);
attach_pid(p, PIDTYPE_TGID, p->tgid);

What if kill_proc_info(p->pid) happens in between?

copy_process() holds current->sighand.siglock, so we are safe
in CLONE_THREAD case, because current->sighand == p->sighand.

Otherwise, p->sighand is unlocked, the new process is already
visible to the find_task_by_pid(), but have a copy of parent's
'struct pid' in ->pids[PIDTYPE_TGID].

This means that __group_complete_signal() may hang while doing

do ... while (next_thread() != p)

We can solve this problem if we reverse these 2 attach_pid()s:

attach_pid() does wmb()

group_send_sig_info() calls spin_lock(), which
provides a read barrier. // Yes ?

I don't think we can hit this race in practice, but still.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix kill_proc_info() vs CLONE_THREAD race

There is a window after copy_process() unlocks ->sighand.siglock
and before it adds the new thread to the thread list.

In that window __group_complete_signal(SIGKILL) will not see the
new thread yet, so this thread will start running while the whole
thread group was supposed to exit.

I beleive we have another good reason to place attach_pid(PID/TGID)
under ->sighand.siglock. We can do the same for

release_task()->__unhash_process()

de_thread()->switch_exec_pids()

After that we don't need tasklist_lock to iterate over the thread
list, and we can simplify things, see for example do_sigaction()
or sys_times().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Merge /pub/scm/linux/kernel/git/davem/net-2.6

[ARM] remove duplicate #includes

Signed-off-by: Herbert P?tzl <herbert@13thfloor.at>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[SERIAL] Fix typo in comment

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[SERIAL] Documentation/jsm.txt is a no show.

In kernel bugzilla #5176 (http://bugzilla.kernel.org/show_bug.cgi?id=5176)
Harry R\374ter <h.rueter@gmx.de> points out Documentation/jsm.txt is missing.

No one at Digi seems to care, so just remove the stale reference.

Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[BRIDGE]: Fix deadlock in br_stp_disable_bridge

Looks like somebody forgot to use the _bh spin_lock variant. We ran into a
deadlock where br->hello_timer expired while br_stp_disable_br() walked
br->port_list.

Signed-off-by: Adrian Drzewiecki <z@drze.net>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Fix xfrm lookup after SNAT

To find out if a packet needs to be handled by IPsec after SNAT, packets
are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
breaks SNAT of non-unicast packets to non-local addresses because the
packet is routed as incoming packet and no neighbour entry is bound to the
dst_entry. In general, it seems to be a bad idea to replace the dst_entry
after the packet was already sent to the output routine because its state
might not match what's expected.

This patch changes the xfrm lookup in POST_ROUTING to re-use the original
dst_entry without routing the packet again. This means no policy routing
can be used for transport mode transforms (which keep the original route)
when packets are SNATed to match the policy, but it looks like the best
we can do for now.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Input: kill remnants of 98kbd{,-io} and 98spkr

98kbd{,-io} and 98spkr all went out with PC98 subarch. Remove stale Makefile
entries that remained.

Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: ads7846 - assorted updates

This updates the ads7846 touchscreen driver:
  - to allow faster clocking (this driver doesn't push sample rates);
  - bugfixes the conversion of spi_transfer to lists;
  - some dma-unsafe command buffers are fixed.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: ads7846 - convert to to dynamic input_dev allocation

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: trackpoint - enable devices connected to external port

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: logips2pp - add new signature (99)

Add Logitech mouse type 99 (Premium Optical Wheel Mouse, model M-BT58,
plain 3 buttons + wheel) to cure the following message: logips2pp: Detected
unknown logitech mouse model 99

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: ixp4xx-beeper - fix compile error

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

[PATCH] CIFS: fix cifs_user_read oops when null SMB response on forcedirectio mount

This patch fixes an oops reported by Adrian Bunk in cifs_user_read when a null
read response is returned on a forcedirectio mount.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] neofb: avoid resetting display config on unblank

Fix issues with the NeoMagic framebuffer driver.

It nicely complements my previous fix already in linus' tree. The only
thing missing now is that the external CRT will not be activated at neofb
init when external-only is selected, either by register read or
module/kernel parameter.

Testing was done on a Dell Latitude CPi-A/NM2200 chip.

Previous behaviour:
- before booting linux, set the preferred display config X via FN+F8

- boot linux, neofb stores the register values in a private
variable

- change the display config to Y via keystroke

- leave the machine in peace until display is blanked

- touching any key will result in display config X being restored

- booting up, the BIOS will acknowledge config Y, though...

Current behaviour:
At the time of unblanking, config Y is honoured because we now read back
register contents instead of just overwriting them with outdated values.

Signed-off by: Christian Trefzer <ctrefzer@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86: gitignore some autogenerated files for i386

Add some more gitignore files for i386 architecture. This files are
created during the build process of a i386 kernel.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86: document sysenter path

This path isn't obvious. It looks as if the kernel will be taking three
args from the user stack, but it only takes one from there.

Signed-off-by: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] FRV: Use virtual interrupt disablement

Make the FRV arch use virtual interrupt disablement because accesses to the
processor status register (PSR) are relatively slow and because we will
soon have the need to deal with multiple interrupt controls at the same
time (separate h/w and inter-core interrupts).

The way this is done is to dedicate one of the four integer condition code
registers (ICC2) to maintaining a virtual interrupt disablement state
whilst inside the kernel.  This uses the ICC2.Z flag (Zero) to indicate
whether the interrupts are virtually disabled and the ICC2.C flag (Carry)
to indicate whether the interrupts are physically disabled.

ICC2.Z is set to indicate interrupts are virtually disabled.  ICC2.C is set
to indicate interrupts are physically enabled.  Under normal running
conditions Z==0 and C==1.

Disabling interrupts with local_irq_disable() doesn't then actually
physically disable interrupts - it merely sets ICC2.Z to 1.  Should an
interrupt then happen, the exception prologue will note ICC2.Z is set and
branch out of line using one instruction (an unlikely BEQ).  Here it will
physically disable interrupts and clear ICC2.C.

When it comes time to enable interrupts (local_irq_enable()), this simply
clears the ICC2.Z flag and invokes a trap #2 if both Z and C flags are
clear (the HI integer condition).  This can be done with the TIHI
conditional trap instruction.

The trap then physically reenables interrupts and sets ICC2.C again.  Upon
returning the interrupt will be taken as interrupts will then be enabled.
Note that whilst processing the trap, the whole exceptions system is
disabled, and so an interrupt can't happen till it returns.

If no pending interrupt had happened, ICC2.C would still be set, the HI
condition would not be fulfilled, and no trap will happen.

Saving interrupts (local_irq_save) is simply a matter of pulling the ICC2.Z
flag out of the CCR register, shifting it down and masking it off.  This
gives a result of 0 if interrupts were enabled and 1 if they weren't.

Restoring interrupts (local_irq_restore) is then a matter of taking the
saved value mentioned previously and XOR'ing it against 1.  If it was one,
the result will be zero, and if it was zero the result will be non-zero.
This result is then used to affect the ICC2.Z flag directly (it is a
condition code flag after all).  An XOR instruction does not affect the
Carry flag, and so that bit of state is unchanged.  The two flags can then
be sampled to see if they're both zero using the trap (TIHI) as for the
unconditional reenablement (local_irq_enable).

This patch also:

(1) Modifies the debugging stub (break.S) to handle single-stepping crossing
     into the trap #2 handler and into virtually disabled interrupts.

(2) Removes superseded fixup pointers from the second instructions in the trap
     tables (there's no a separate fixup table for this).

(3) Declares the trap #3 vector for use in .org directives in the trap table.

(4) Moves irq_enter() and irq_exit() in do_IRQ() to avoid problems with
     virtual interrupt handling, and removes the duplicate code that has now
     been folded into irq_exit() (softirq and preemption handling).

(5) Tells the compiler in the arch Makefile that ICC2 is now reserved.

(6) Documents the in-kernel ABI, including the virtual interrupts.

(7) Renames the old irq management functions to different names.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] FRV: Miscellaneous fixes

Make various alterations and fixes to the FRV arch:

(1) Resyncs the FRV system call collection with the i386 arch.

(2) Discards __iounmap() as it's not used.

(3) Fixes the use of the SWAP/SWAPI instruction to get the arguments the right
way around in atomic.h, and also to get the asm constraints correct.

(4) Moves copy_to/from_user_page() to asm/cacheflush.h to be consistent with
other archs.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] hrtimer: round up relative start time on low-res arches

CONFIG_TIME_LOW_RES is a temporary way for architectures to signal that
they simply return xtime in do_gettimeoffset(). In this corner-case we
want to round up by resolution when starting a relative timer, to avoid
short timeouts. This will go away with the GTOD framework.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] s390: fix __delay implementation

Fix __delay implementation. Called with an argument "1" or "0" it would
loop nearly forever (since (1/2)-1 = 0xffffffff).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix a typo in the CPU_H8300H dependencies

Jean-Luc Leger <reiga@dspnet.fr.eu.org> found this obvious typo.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: revert "filter affine wakeups"

Revert commit d7102e95b7b9c00277562c29aad421d2d521c5f6:

    [PATCH] sched: filter affine wakeups

Apparently caused more than 10% performance regression for aim7 benchmark.
The setup in use is 16-cpu HP rx8620, 64Gb of memory and 12 MSA1000s with 144
disks.  Each disk is 72Gb with a single ext3 filesystem (courtesy of HP, who
supplied benchmark results).

The problem is, for aim7, the wake-up pattern is random, but it still needs
load balancing action in the wake-up path to achieve best performance.  With
the above commit, lack of load balancing hurts that workload.

However, for workloads like database transaction processing, the requirement
is exactly opposite.  In the wake up path, best performance is achieved with
absolutely zero load balancing.  We simply wake up the process on the CPU that
it was previously run.  Worst performance is obtained when we do load
balancing at wake up.

There isn't an easy way to auto detect the workload characteristics.  Ingo's
earlier patch that detects idle CPU and decide whether to load balance or not
doesn't perform with aim7 either since all CPUs are busy (it causes even
bigger perf.  regression).

Revert commit d7102e95b7b9c00277562c29aad421d2d521c5f6, which causes more
than 10% performance regression with aim7.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] madvise MADV_DONTFORK/MADV_DOFORK

Currently, copy-on-write may change the physical address of a page even if the
user requested that the page is pinned in memory (either by mlock or by
get_user_pages).  This happens if the process forks meanwhile, and the parent
writes to that page.  As a result, the page is orphaned: in case of
get_user_pages, the application will never see any data hardware DMA's into
this page after the COW.  In case of mlock'd memory, the parent is not getting
the realtime/security benefits of mlock.

In particular, this affects the Infiniband modules which do DMA from and into
user pages all the time.

This patch adds madvise options to control whether memory range is inherited
across fork.  Useful e.g.  for when hardware is doing DMA from/into these
pages.  Could also be useful to an application wanting to speed up its forks
by cutting large areas out of consideration.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] kprobes: Update Documentation/kprobes.txt

Update Documentation/kprobes.txt to reflect Kprobes enhancements and other
recent developments.

Acked-by: Ananth Mavinakayanahalli <mananth@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Fix NULL pointer dereference in isdn_tty_at_cout

The changes in the tty related code introduced wrong parenthesis in a if
condition in the isdn_tty_at_cout function. This caused access to index -1
in the dev->drv[] array. This patch change it back to the correct
condition from the previous versions.

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix x86 topology export in sysfs for subarchitectures

The correct way to export hyperthreading based functions is to predicate
them on CONFIG_X86_HT. Without this, the topology exporting patch breaks
the build on all non-PC x86 subarchitectures.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] NLM: Fix the NLM_GRANTED callback checks

If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e. same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.

Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted. Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.

By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] jbd: revert checkpoint list changes

This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
  [PATCH] jbd: split checkpoint lists

This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.

And two related commits 8d3c7fce2d20ecc3264c8d8c91ae3beacdeaed1b and
43c3e6f5abdf6acac9b90c86bf03f995bf7d3d92 with the subjects:
  [PATCH] jbd: log_do_checkpoint fix
  [PATCH] jbd: remove_transaction fix

These seem to be incremental bugfixes on the original patch and as such are
no longer needed.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] HPET: handle multiple ACPI EXTENDED_IRQ resources

When the _CRS for a single HPET contains multiple EXTENDED_IRQ resources,
we overwrote hdp->hd_nirqs every time we found one.

So the driver worked when all the IRQs were described in a single
EXTENDED_IRQ resource, but failed when multiple resources were used.
(Strictly speaking, I think the latter is actually more correct, but both
styles have been used.)

Someday we should remove all the ACPI stuff from hpet.c and use PNP driver
registration instead. But currently PNP_MAX_IRQ is 2, and HPETs often have
more IRQs. Hint, hint, Adam :-)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Bob Picco <robert.picco@hp.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] tty reference count fix

Fix hole where tty structure can be released when reference count is non
zero.  Existing code can sleep without tty_sem protection between deciding
to release the tty structure (setting local variables tty_closing and
otty_closing) and setting TTY_CLOSING to prevent further opens.  An open
can occur during this interval causing release_dev() to free the tty
structure while it is still referenced.

This should fix bugzilla.kernel.org [Bug 6041] New: Unable to handle kernel
paging request

In Bug 6041, tty_open() oopes on accessing the tty structure it has
successfully claimed.  Bug was on SMP machine with the same tty being
opened and closed by multiple processes, and DEBUG_PAGEALLOC enabled.

Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] compound page: no access_process_vm check

The PageCompound check before access_process_vm's set_page_dirty_lock is no
longer necessary, so remove it. But leave the PageCompound checks in
bio_set_pages_dirty, dio_bio_complete and nfs_free_user_pages: at least some
of those were introduced as a little optimization on hugetlb pages.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] compound page: default destructor

Somehow I imagined that calling a NULL destructor would free a compound page
rather than oopsing.  No, we must supply a default destructor, __free_pages_ok
using the order noted by prep_compound_page.  hugetlb can still replace this
as before with its own free_huge_page pointer.

The case that needs this is not common: rarely does put_compound_page's
put_page_testzero bring the count down to 0.  But if get_user_pages is applied
to some part of a compound page, without immediate release (e.g.  AIO or
Infiniband), then it's possible for its put_page to come after the containing
vma has been unmapped and the driver done its free_pages.

That's just the kind of case compound pages are supposed to be guarding
against (but Nick points out, nor did PageReserved handle this right).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] compound page: use page[1].lru

If a compound page has its own put_page_testzero destructor (the only current
example is free_huge_page), that is noted in page[1].mapping of the compound
page.  But that's rather a poor place to keep it: functions which call
set_page_dirty_lock after get_user_pages (e.g.  Infiniband's
__ib_umem_release) ought to be checking first, otherwise set_page_dirty is
liable to crash on what's not the address of a struct address_space.

And now I'm about to make that worse: it turns out that every compound page
needs a destructor, so we can no longer rely on hugetlb pages going their own
special way, to avoid further problems of page->mapping reuse.  For example,
not many people know that: on 50% of i386 -Os builds, the first tail page of a
compound page purports to be PageAnon (when its destructor has an odd
address), which surprises page_add_file_rmap.

Keep the compound page destructor in page[1].lru.next instead.  And to free up
the common pairing of mapping and index, also move compound page order from
index to lru.prev.  Slab reuses page->lru too: but if we ever need slab to use
compound pages, it can easily stack its use above this.

(akpm: decoded version of the above: the tail pages of a compound page now
have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
caller to check that they're not compund pages before doing the dirty).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] pktcdvd: Reduce stack usage

Reduce stack usage in the pkt_start_write() function. Even though it's not
currently a real problem, the pages and offsets arrays can be eliminated,
which saves approximately 1000 bytes of stack space.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] pktcdvd: Don't unlock the door if the disc is in use

Unlocking the door when the disc is in use is obviously not good, because then
it's possible to eject the disc at the wrong time and cause severe disc data
corruption.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>