review.tizen.org Git - platform/adaptation/renesas_rcar/renesas

x86: Hyper-V: register clocksource only if its advertised

Enable hyperv_clocksource only if it's advertised as a feature.
XenServer 6 returns the signature which is checked in
ms_hyperv_platform(), but it does not offer all features. Currently the
clocksource is enabled unconditionally in ms_hyperv_init_platform(), and
the result is a hanging guest.

In the Hyper-V spec, bit 1 of the feature word indicates the availability
of the Partition Reference Counter. Register the clocksource only if this
bit is set.

The guest in question prints this in dmesg:
[    0.000000] Hypervisor detected: Microsoft HyperV
[    0.000000] HyperV: features 0x70, hints 0x0

This bug can be reproduced easily by setting 'viridian=1' in an HVM domU
.cfg file. A workaround without this patch is to boot the HVM guest with
'clocksource=jiffies'.
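The feature test can be sketched as a small stand-alone C model. The helper
hv_clocksource_usable() and the MODEL_ constant are hypothetical; the
constant only mirrors the kernel's bit-1 definition, and in the patch the
test sits where ms_hyperv_init_platform() registers the clocksource. With
the "features 0x70" value from the dmesg output above, bit 1 is clear, so
the clocksource would not be registered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the Hyper-V feature word: Partition Reference Counter
 * available. Stand-alone model value, not the kernel header. */
#define MODEL_TIME_REF_COUNT_AVAILABLE (1u << 1)

/* Hypothetical helper: may the Hyper-V clocksource be registered,
 * given the feature word reported by the hypervisor? */
static bool hv_clocksource_usable(uint32_t features)
{
	return (features & MODEL_TIME_REF_COUNT_AVAILABLE) != 0;
}
```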

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: http://lkml.kernel.org/r/1359940959-32168-1-git-send-email-kys@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Merge branch 'autofs-fix' of git://git./linux/kernel/git/deller/parisc-linux into akpm

Pull hp parisc automounter fix from Helge Deller:
"This unbreaks automounter support for the parisc architecture (and
probably aarch64 as well)."

* 'autofs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
unbreak automounter support on 64-bit kernel with 32-bit userspace (v2)

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux into akpm

Pull s390 regression fix from Martin Schwidefsky:
"The recent fix for the s390 sched_clock() function uncovered yet
  another bug in s390_next_ktime which causes an endless loop in KVM.
  This regression should be fixed before v3.8.

  I'm keeping my fingers crossed that this is the last one for v3.8."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/timer: avoid overflow when programming clock comparator

Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu into akpm

Pull m68knommu fix from Greg Ungerer:
"This contains a single critical fix for the non-MMU m68k platforms.

  The change of the kernel exec code path has revealed a problem in the
  start thread code that causes crashing on boot.  This is the fix for
  it."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: fix trap on execing /bin/init

htb: fix values in opt dump

In htb_change_class(), cl->buffer and cl->cbuffer are stored in ns,
so convert them back to psched ticks in the dump.
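A minimal model of the dump-side conversion, assuming PSCHED_SHIFT is 6 as
in kernels of this era (one psched tick = 64 ns). The MODEL_ macros and
structs are hypothetical stand-ins for PSCHED_NS2TICKS(), the HTB class and
the netlink options, reduced to the two fields that matter here.

```c
#include <stdint.h>

/* Assumed: PSCHED_SHIFT == 6, mirroring PSCHED_NS2TICKS(). */
#define MODEL_PSCHED_SHIFT 6
#define MODEL_NS2TICKS(x) ((x) >> MODEL_PSCHED_SHIFT)

/* Hypothetical subset of an HTB class: buffers kept internally in ns. */
struct model_htb_class {
	int64_t buffer;   /* ns */
	int64_t cbuffer;  /* ns */
};

/* Hypothetical subset of the options dumped to userspace. */
struct model_htb_opt {
	uint32_t buffer;  /* psched ticks */
	uint32_t cbuffer; /* psched ticks */
};

/* Dump path: report both buffers in psched ticks, not in ns. */
static void model_htb_dump(const struct model_htb_class *cl,
			   struct model_htb_opt *opt)
{
	opt->buffer = (uint32_t)MODEL_NS2TICKS(cl->buffer);
	opt->cbuffer = (uint32_t)MODEL_NS2TICKS(cl->cbuffer);
}
```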

Note this was introduced by:
commit 56b765b79e9a78dc7d3f8850ba5e5567205a3ecd
htb: improved accuracy at high rates

Please consider this for -net/-stable.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86, head_32: Give the 6 label a real name

When we jump here, we are about to enable paging, so rename the label
accordingly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, head_32: Remove second CPUID detection from default_entry

We now do that once, earlier, and cache the result in
new_cpu_data.cpuid_level, so there is no need for the EFLAGS.ID toggling
dance anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86: Detect CPUID support early at boot

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the EFLAGS.ID toggling game every time. C code
looks at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, head_32: Remove i386 pieces

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Merge tag 'v3.8-rc7' into x86/asm

Merge in the updates to head_32.S from the previous urgent branch, as
upcoming patches will make further changes.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile into akpm

Pull tile bugfixes from Chris Metcalf:
"This includes a variety of minor bug fixes, mostly to do with testing
  "make allyesconfig", "make allmodconfig", "make allnoconfig", inspired
  by Tejun Heo's observation about Kconfig.freezer not being included.

  The largest changes are just syntax changes removing the tile-specific
  use of a macro named INT_MASK, which is way too commonly redefined
  throughout driver code."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: tag some code with #ifdef CONFIG_COMPAT
  tile: fix memcpy_*io functions for allnoconfig
  tile: export a handful of symbols appropriately
  drm: fix compile failure by including <linux/swiotlb.h>
  tile: avoid defining INT_MASK macro in <arch/interrupts.h>
  tile: provide "screen_info" when enabling VT
  drivers/input/joystick/analog.c: enable precise timer
  tile: include kernel/Kconfig.freezer in tile Kconfig
  tile: remove an unused variable in copy_thread()

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc into akpm

Pull ARM SoC fixes from Olof Johansson:
"We had a number of fixes queued up, but taking a strict pass-through
  and weeding out any that either have been broken for a while, or are
  for platforms that need out-of-tree code to be useful anyway, or other
  fixes for problems that few users are likely to see in real life, only
  this short branch of patches remains.

  The three patches here are to make SMP boot work on the Calxeda
  platforms again.  Some of the rework for cpuids on 3.8 broke it (and
  it was discovered late, unfortunately)."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: highbank: mask cluster id from cpu_logical_map
  ARM: scu: mask cluster id from cpu_logical_map
  ARM: scu: add empty scu_enable for !CONFIG_SMP

tracing/syscalls: Allow archs to ignore tracing compat syscalls

The tracing of ia32 compat system calls has been a bit of a pain as they
use different system call numbers than the 64bit equivalents.

I wrote a simple 'lls' program that lists files. I compiled it as an i686
ELF binary and ran it on an x86_64 box. This is the result:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/syscalls/enable
echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on

grep lls /debug/tracing/trace

[.. skipping calls before TS_COMPAT is set ...]

             lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
             lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
             lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
             lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
             lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
             lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
             lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
             lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
             lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
             lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
             lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
             lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
             lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409279: sys_close(fd: 3)
             lls-1127  [005] d...   936.421550: sys_close -> 0x200
             lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
             lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
             lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
             lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
             lls-1127  [005] d...   936.421580: sys_capget -> 0x0
             lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
             lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
             lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
             lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
             lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
             lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)

Obviously I'm not calling newfstat with an fd of 4d55b085. The decoded
calls are simply wrong, and confusing.

Other efforts have been made to fix this:

https://lkml.org/lkml/2012/3/26/367

But the real solution is to rewrite the syscall internals and come up
with a proper solution, one that doesn't require all the kludges the
current solution has.

Thus, for now, instead of outputting incorrect data, simply ignore them.
With this patch, the trace now shows:

#> grep lls /debug/tracing/trace
#>

Compat system calls are simply not traced. Users who need to trace
compat syscalls should use the raw syscall tracepoints.

For an architecture to have its compat syscalls ignored, it must
define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
define an arch_trace_is_compat_syscall() function that returns true
if tracing of the current task's syscall should be skipped.
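The two hooks can be sketched as a user-space model. struct model_regs is a
hypothetical stand-in for struct pt_regs, and its compat flag models
TS_COMPAT on the task; trace_get_syscall_nr() is modelled on the generic
helper in trace_syscalls.c, with simplified types. A compat task entering
syscall 5 (open on i386, but fstat on x86_64) yields -1 and is skipped
instead of being decoded against the 64-bit table.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct pt_regs: syscall number plus a flag
 * modelling TS_COMPAT on the current task. */
struct model_regs {
	long nr;
	bool compat;
};

/* Architecture side (asm/ftrace.h in the patch): opt in. */
#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1

static inline bool arch_trace_is_compat_syscall(const struct model_regs *regs)
{
	return regs->compat;
}

/* Generic side: a negative number tells the tracer to drop the event. */
#ifdef ARCH_TRACE_IGNORE_COMPAT_SYSCALLS
static int trace_get_syscall_nr(const struct model_regs *regs)
{
	if (arch_trace_is_compat_syscall(regs))
		return -1;
	return (int)regs->nr;
}
#else
static int trace_get_syscall_nr(const struct model_regs *regs)
{
	return (int)regs->nr;
}
#endif
```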

I want to stress that this change does not affect actual syscalls in any
way, shape or form. It is only used within the tracing system and
doesn't interfere with the syscall logic at all. The changes are
consolidated nicely into trace_syscalls.c and asm/ftrace.h.

I had to make one small modification to asm/thread_info.h, which was
to remove the include of asm/ftrace.h. As asm/ftrace.h requires
current_thread_info(), that include was causing include hell. It was
added back in 2008 when the function graph tracer was added:

commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"

It does not need to be included there.

Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

mm: cma: fix accounting of CMA pages placed in high memory

The total number of low memory pages is determined as totalram_pages -
totalhigh_pages, so without this patch all CMA pageblocks placed in
highmem were accounted to low memory.
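The accounting rule can be modelled with two counters. The names are
hypothetical stand-ins for totalram_pages and totalhigh_pages; the point is
that low memory is derived as the difference, so a highmem CMA pageblock
must bump both totals or it silently inflates the lowmem figure.

```c
#include <stdbool.h>

/* Hypothetical counters modelling totalram_pages / totalhigh_pages. */
static unsigned long model_totalram_pages;
static unsigned long model_totalhigh_pages;

/* Low memory is not stored; it is derived from the two totals. */
static unsigned long model_lowmem_pages(void)
{
	return model_totalram_pages - model_totalhigh_pages;
}

/* Release a CMA pageblock to the allocator. A highmem pageblock must
 * also be added to the highmem total. */
static void model_release_cma_pageblock(unsigned long nr_pages, bool highmem)
{
	model_totalram_pages += nr_pages;
	if (highmem)
		model_totalhigh_pages += nr_pages;
}
```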

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/pid.c: reenable interrupts when alloc_pid() fails because init has exited

We're forgetting to reenable local interrupts on an error path.
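The pattern behind this fix can be shown with a small model, assuming the
lock is taken with the _irq spinlock variant: every exit path must then
release it with the matching _irq unlock. All names below are hypothetical;
the flags stand in for the local interrupt state and lock ownership.

```c
#include <stdbool.h>

/* Hypothetical model state: local interrupt flag and lock ownership. */
static bool model_irqs_enabled = true;
static bool model_lock_held;

static void model_spin_lock_irq(void)
{
	model_irqs_enabled = false;
	model_lock_held = true;
}

static void model_spin_unlock_irq(void)
{
	model_lock_held = false;
	model_irqs_enabled = true;
}

/* alloc_pid()-like flow: 0 on success, -1 when init has exited.
 * A plain unlock on the error path would leave IRQs disabled. */
static int model_alloc_pid(bool init_alive)
{
	model_spin_lock_irq();
	if (!init_alive)
		goto out_unlock;
	/* ... hash the new pid here ... */
	model_spin_unlock_irq();
	return 0;

out_unlock:
	model_spin_unlock_irq();
	return -1;
}
```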

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix kmemcg registration for late caches

The designed workflow for the caches in kmemcg is: register it with
memcg_register_cache() if kmemcg is already available or later on when a
new kmemcg appears at memcg_update_cache_sizes() which will handle all
caches in the system. The caches created at boot time will be handled
by the latter, and the memcg caches, as well as any system caches that
are registered later on, by the former.

There is a bug, however, in memcg_register_cache: we correctly set up
the array size, but do not mark the cache as a root cache.

This means that allocations for any cache appearing late in the game
will see memcg->memcg_params->is_root_cache == false, and in particular,
trigger VM_BUG_ON(!cachep->memcg_params->is_root_cache) in
__memcg_kmem_cache_get.

The obvious fix is to include the missing assignment.
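A heavily simplified model of the registration step. The structs and
model_memcg_register_cache() are hypothetical; they keep only what is
needed to show that a cache registered without a memcg is a root cache and
must be marked as such, which is the assignment the text says was missing.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_memcg;

/* Hypothetical subset of the per-cache kmemcg parameters. */
struct model_memcg_params {
	bool is_root_cache;
	struct model_memcg *memcg;   /* owner, for per-memcg caches */
};

struct model_cache {
	struct model_memcg_params *memcg_params;
};

/* Register a cache; returns 0, or -1 if allocation fails. */
static int model_memcg_register_cache(struct model_memcg *memcg,
				      struct model_cache *s)
{
	s->memcg_params = calloc(1, sizeof(*s->memcg_params));
	if (!s->memcg_params)
		return -1;
	if (memcg)
		s->memcg_params->memcg = memcg;
	else
		s->memcg_params->is_root_cache = true;  /* the missing step */
	return 0;
}
```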

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: don't overwrite mm->def_flags in do_mlockall()

With commit 8e72033f2a48 ("thp: make MADV_HUGEPAGE check for
mm->def_flags") the VM_NOHUGEPAGE flag may be set on s390 in
mm->def_flags for certain processes, to prevent future thp mappings.
This would be overwritten by do_mlockall(), which sets it back to 0 with
an optional VM_LOCKED flag set.

To fix this, instead of overwriting mm->def_flags in do_mlockall(), only
the VM_LOCKED flag should be set or cleared.
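The difference between overwriting and updating a single bit can be
sketched as follows. The flag values and the MODEL_MCL_FUTURE constant are
illustrative assumptions (only their distinctness matters), and
model_update_def_flags() is a hypothetical helper, not the kernel function.

```c
/* Illustrative, distinct flag values for the model. */
#define MODEL_VM_LOCKED     0x00002000UL
#define MODEL_VM_NOHUGEPAGE 0x40000000UL
#define MODEL_MCL_FUTURE    2

/* Touch only VM_LOCKED, preserving other bits such as VM_NOHUGEPAGE. */
static void model_update_def_flags(unsigned long *def_flags, int flags)
{
	if (flags & MODEL_MCL_FUTURE)
		*def_flags |= MODEL_VM_LOCKED;
	else
		*def_flags &= ~MODEL_VM_LOCKED;
}
```

Calling this with MCL_FUTURE and then without it leaves VM_NOHUGEPAGE set
throughout, whereas assigning 0 or VM_LOCKED would drop it.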

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-pl031.c: restore ST variant functionality

Commit e7e034e18a0a ("drivers/rtc/rtc-pl031.c: fix the missing operation
on enable") accidentally broke the ST variants of PL031.

On the ST variants, the bit being poked as the "clockwatch" enable bit
does the job that bit 0 does on the ARM variant. On the ST variants,
bit 0 selects a clock divider instead, and setting it to 1 affects
timekeeping in a very bad way.
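The enable logic amounts to choosing one bit per variant. In this sketch
the bit positions (bit 0 for the ARM enable, bit 26 for the ST clockwatch
enable) are assumptions about the driver's definitions, and
model_pl031_enable_cr() is a hypothetical helper computing the control
register value to write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed control register bits. */
#define MODEL_RTC_CR_EN   (1u << 0)   /* ARM PL031: RTC enable */
#define MODEL_RTC_CR_CWEN (1u << 26)  /* ST variants: clockwatch enable */

/* ST variants must never get bit 0 set: there it is a clock divider. */
static uint32_t model_pl031_enable_cr(uint32_t cr, bool st_clockwatch)
{
	if (st_clockwatch)
		return cr | MODEL_RTC_CR_CWEN;
	return cr | MODEL_RTC_CR_EN;
}
```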

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Mian Yousaf KAUKAB <mian.yousaf.kaukab@stericsson.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
Here is another handful of late-breaking fixes intended for the 3.8
stream... Hopefully they will still make it! :-)

There are three mac80211 fixes pulled from Johannes:

"Here are three fixes still for the 3.8 stream, the fix from Cong Ding
for the bad sizeof (Stephen Hemminger had pointed it out before but I'd
promptly forgotten), a mac80211 managed-mode channel context usage fix
where a downgrade would never stop until reaching non-HT and a bug in
the channel determination that could cause invalid channels like HT40+
on channel 11 to be used."

Also included is a mwl8k fix that avoids an oops when using mwl8k
devices that only support the 5 GHz band.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Only set gso_type to SKB_GSO_TCPV4 as RSC does not support IPv6

The original fix that was applied for setting gso_type required more changes
than necessary because it assumed that ixgbe does RSC on IPv6 frames, which
is not correct. RSC is supported only for IPv4/TCP frames. As such, we can
simplify the fix and avoid the unnecessary move of eth_type_trans.

The previous patch "ixgbe: fix gso type" and this patch reduce the entire fix
to one line that sets gso_type to TCPV4 if the frame is RSC.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix infinite loop in __skb_recv_datagram()

Tommi was fuzzing with trinity and reported the following problem :

commit 3f518bf745 (datagram: Add offset argument to __skb_recv_datagram)
missed that a raw socket receive queue can contain skbs with no payload.

We can loop in __skb_recv_datagram() with MSG_PEEK mode, because
wait_for_packet() is not prepared to skip these skbs.

[   83.541011] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=26002 jiffies, g=27673, c=27672, q=75)
[   83.541011] INFO: Stall ended before state dump start
[  108.067010] BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child31:2847]
...
[  108.067010] Call Trace:
[  108.067010]  [<ffffffff818cc103>] __skb_recv_datagram+0x1a3/0x3b0
[  108.067010]  [<ffffffff818cc33d>] skb_recv_datagram+0x2d/0x30
[  108.067010]  [<ffffffff819ed43d>] rawv6_recvmsg+0xad/0x240
[  108.067010]  [<ffffffff818c4b04>] sock_common_recvmsg+0x34/0x50
[  108.067010]  [<ffffffff818bc8ec>] sock_recvmsg+0xbc/0xf0
[  108.067010]  [<ffffffff818bf31e>] sys_recvfrom+0xde/0x150
[  108.067010]  [<ffffffff81ca4329>] system_call_fastpath+0x16/0x1b
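The skip logic of a MSG_PEEK scan with an offset can be modelled over an
array of payload lengths. model_peek_scan() is hypothetical, and that the
real fix adds the same "and the skb is non-empty" condition is an
assumption based on the description above. Without that condition, a queue
holding only empty skbs looks non-empty to the waiter yet never yields an
skb, so the caller spins.

```c
/* Return the index of the skb to peek at, or -1 if the caller would have
 * to wait. *off is the peek offset, consumed by skipped skbs. */
static int model_peek_scan(const unsigned int *lens, int n, unsigned int *off)
{
	for (int i = 0; i < n; i++) {
		/* Skip skbs lying entirely before the offset, but never skip
		 * an empty skb (a raw socket can queue those). */
		if (*off >= lens[i] && lens[i]) {
			*off -= lens[i];
			continue;
		}
		return i;
	}
	return -1;
}
```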

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: add Yota / Megafon M100-1 4g modem

Interface layout:

00 CD-ROM
01 debug COM port
02 AP control port
03 modem
04 usb-ethernet

T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#=  4 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0408 ProdID=ea42 Rev= 0.00
S:  Manufacturer=Qualcomm, Incorporated
S:  Product=Qualcomm CDMA Technologies MSM
S:  SerialNumber=353568051xxxxxx
C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=86(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
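Per the layout above, interface 4 is the usb-ethernet function, so the
driver should bind only that interface of 0408:ea42 and leave the CD-ROM
and serial ports alone. The table and matcher below are a hypothetical
model in the spirit of the driver's fixed-interface entries, not the
driver's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-ID entry: one fixed interface of a VID:PID. */
struct model_qmi_id {
	uint16_t vendor;
	uint16_t product;
	uint8_t intf;
};

static const struct model_qmi_id model_ids[] = {
	{ 0x0408, 0xea42, 4 },  /* Yota / Megafon M100-1: If#4 usb-ethernet */
};

static bool model_qmi_match(uint16_t vendor, uint16_t product, uint8_t intf)
{
	for (size_t i = 0; i < sizeof(model_ids) / sizeof(model_ids[0]); i++) {
		if (model_ids[i].vendor == vendor &&
		    model_ids[i].product == product &&
		    model_ids[i].intf == intf)
			return true;
	}
	return false;
}
```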

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'timers/for-arm' into timers/core

clockevents: Fix generic broadcast for FEAT_C3STOP

Commit 12ad100046: "clockevents: Add generic timer broadcast function"
made tick_device_uses_broadcast set up the generic broadcast function
for dummy devices (where !tick_device_is_functional(dev)), but neglected
to set up the broadcast function for devices that stop in low power
states (with the CLOCK_EVT_FEAT_C3STOP flag).

When these devices enter low power states they will not have the generic
broadcast function assigned, and will bring down the system when an
attempt is made to broadcast to them.

This patch ensures that the broadcast function is also assigned for
devices which require broadcast in low power states.
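A simplified model of the condition: the generic broadcast function is
needed both for dummy devices and for devices flagged as stopping in low
power states. The struct, the MODEL_FEAT_C3STOP flag and
model_setup_broadcast() are hypothetical stand-ins for the clockevents
structures and CLOCK_EVT_FEAT_C3STOP; an existing device-specific broadcast
function is left untouched.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FEAT_C3STOP 0x1u  /* models CLOCK_EVT_FEAT_C3STOP */

/* Hypothetical, simplified clock event device. */
struct model_ced {
	bool functional;          /* models tick_device_is_functional() */
	unsigned int features;
	void (*broadcast)(void);  /* NULL: no broadcast function assigned */
};

static int model_broadcast_calls;
static void model_tick_broadcast(void) { model_broadcast_calls++; }

/* Assign the generic broadcast function to dummy devices and, with the
 * fix, also to devices that stop in low power states. */
static void model_setup_broadcast(struct model_ced *dev)
{
	bool needs = !dev->functional || (dev->features & MODEL_FEAT_C3STOP);

	if (needs && !dev->broadcast)
		dev->broadcast = model_tick_broadcast;
}
```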

Reported-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Marc.Zyngier@arm.com
Cc: Will.Deacon@arm.com
Cc: santosh.shilimkar@ti.com
Cc: john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Three nouveau fixes, all user visible issues, and one radeon
  regression fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: enforce use of radeon_get_ib_value when reading user cmd
  drm/nouveau: add lockdep annotations
  drm/nv50/fb: Fix nullptr-deref on IGPs
  drm/nouveau: use different register to wait for secret scrubber

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

drm/radeon: enforce use of radeon_get_ib_value when reading user cmd

Whenever parsing a cmd buffer supplied by userspace, we need to use
radeon_get_ib_value() rather than directly accessing the ib, as the user
cmd might not yet have been copied into the ib. The parser might then read
a value that does not correspond to what the user is sending, possibly
allowing the user to send a malicious command undetected.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3

This patch trims the MMR register definitions after the updates for the
SGI UV3 system have been applied. Note that because these definitions
are automatically generated from the RTL we cannot control the length
of the names. Therefore there are lines that exceed 80 characters.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194509.173026880@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Check current gru hub support for SGI UV3

This patch checks current hub support to avoid panicking the
system until all the GRU changes for UV3+ are in place.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194509.035828372@gulag1.americas.sgi.com
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Update Time Support for SGI UV3

This patch updates time support for the SGI UV3 hub. Since the UV2
and UV3 time support is identical, "is_uvx_hub" is used instead of
having both "is_uv2_hub" and "is_uv3_hub".

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.893907185@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Update x2apic Support for SGI UV3

This patch adds support for the SGI UV3 hub to the common x2apic
functions.  The primary changes are to account for the similarities
between UV2 and UV3 which are encompassed within the "UVX" nomenclature.

One significant difference within UV3 is the handling of the MMIOH
regions which are redirected to the target blade (with the device) in
a different manner.  It also now has two MMIOH regions for both small and
large BARs.  This aids in limiting the amount of physical address space
removed from real memory that's used for I/O in the max config of 64TB.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.752924185@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Steffen Persvold <sp@numascale.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Update Hub Info for SGI UV3

This patch updates the UV HUB info for UV3. The "is_uv3_hub" and
"is_uvx_hub" (UV2 or UV3) functions are added as well as the addresses
and sizes of the MMR regions for UV3.
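The "UVX" predicate can be sketched as the union of the two generation
checks, so shared UV2/UV3 code paths need a single test. The enum and the
model_ helpers are hypothetical; the real code presumably compares the
hub's revision against per-generation base values.

```c
#include <stdbool.h>

/* Hypothetical hub generations. */
enum model_uv_hub { MODEL_UV1_HUB, MODEL_UV2_HUB, MODEL_UV3_HUB };

static bool model_is_uv2_hub(enum model_uv_hub h) { return h == MODEL_UV2_HUB; }
static bool model_is_uv3_hub(enum model_uv_hub h) { return h == MODEL_UV3_HUB; }

/* "UVX": UV2 or UV3. */
static bool model_is_uvx_hub(enum model_uv_hub h)
{
	return model_is_uv2_hub(h) || model_is_uv3_hub(h);
}
```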

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.610723192@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Update ACPI Check to include SGI UV3

Add UV3 to exclusion list. Instead of adding every new series of
SGI UV systems, just check oem_id to have a prefix of "SGI".
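The prefix check can be expressed with strncmp(), comparing only the first
three characters so that any future "SGIn" OEM ID also matches. The helper
name model_is_sgi_oem() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True for any OEM ID starting with "SGI" (SGI2, SGI3, ...). */
static bool model_is_sgi_oem(const char *oem_id)
{
	return strncmp(oem_id, "SGI", 3) == 0;
}
```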

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.457937455@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3)

This patch updates the MMR register definitions for the SGI UV3 system.
Note that because these definitions are automatically generated from
the RTL we cannot control the length of the names. Therefore there are
lines that exceed 80 characters.

All the new MMR definitions are added in this patch. The patches that
follow then update the references. The last patch is a "trim" patch
which reduces the size of the MMR definitions file by about a third.
This keeps "bi-sectability" in place as the intermediate patches would
not compile correctly if the trimmed MMR defines were done first.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.326204556@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

mwl8k: fix band for supported channels

The band field for the supported channels was left unpopulated, making
them default to 0 == IEEE80211_BAND_2GHZ, even for the 5GHz channels.

This resulted in NULL pointer accesses if anything tried to access
wiphy->bands[channel->band] of a 5GHz channel on 5GHz-only cards, since
wiphy->bands[2GHZ] is NULL for them (e.g. cfg80211_chandef_usable does).

Example kernel OOPS:

[  665.669993] Unable to handle kernel NULL pointer dereference at virtual address 00000016
[  665.678194] pgd = c6d58000
[  665.680941] [00000016] *pgd=06f8a831, *pte=00000000, *ppte=00000000
[  665.687303] Internal error: Oops: 17 [#1]
(...)
[  666.116373] Backtrace:
[  666.118866] [<bf0368dc>] (cfg80211_chandef_usable+0x0/0x1bc [cfg80211]) from [<bf025e64>] (nl80211_leave_mesh+0x244/0x264 [cfg80211])
[  666.130919]  r7:c6d12100 r6:0000143c r5:c0611c48 r4:c0611b98
[  666.136668] [<bf025d84>] (nl80211_leave_mesh+0x164/0x264 [cfg80211]) from [<bf02634c>] (nl80211_remain_on_channel+0x2a0/0x358 [cfg80211])
[  666.149074]  r7:c6d12000 r6:c6d12000 r5:c6f4f368 r4:00000003
[  666.154814] [<bf0262ec>] (nl80211_remain_on_channel+0x240/0x358 [cfg80211]) from [<bf02ddb0>] (nl80211_set_wiphy+0x264/0x560 [cfg80211])
[  666.167150] [<bf02db4c>] (nl80211_set_wiphy+0x0/0x560 [cfg80211]) from [<c01f94e0>] (genl_rcv_msg+0x1b8/0x1f8)
[  666.177205] [<c01f9328>] (genl_rcv_msg+0x0/0x1f8) from [<c01f89a0>] (netlink_rcv_skb+0x58/0xb4)
[  666.185949] [<c01f8948>] (netlink_rcv_skb+0x0/0xb4) from [<c01f931c>] (genl_rcv+0x20/0x2c)
[  666.194251]  r6:c6f70780 r5:0000002c r4:c6f70780 r3:00000001
[  666.199973] [<c01f92fc>] (genl_rcv+0x0/0x2c) from [<c01f8418>] (netlink_unicast+0x154/0x1f4)
[  666.208449]  r4:c785ea00 r3:c01f92fc
[  666.212057] [<c01f82c4>] (netlink_unicast+0x0/0x1f4) from [<c01f8790>] (netlink_sendmsg+0x230/0x2b0)
[  666.221240] [<c01f8560>] (netlink_sendmsg+0x0/0x2b0) from [<c01cccf8>] (sock_sendmsg+0x90/0xa4)
[  666.229986] [<c01ccc68>] (sock_sendmsg+0x0/0xa4) from [<c01cdcb0>] (__sys_sendmsg+0x290/0x298)
[  666.238637]  r9:00000000 r8:c0611ec8 r6:0000002c r5:c0610000 r4:c0611f64
[  666.245411] [<c01cda20>] (__sys_sendmsg+0x0/0x298) from [<c01cf52c>] (sys_sendmsg+0x44/0x6c)
[  666.253897] [<c01cf4e8>] (sys_sendmsg+0x0/0x6c) from [<c00090a0>] (ret_fast_syscall+0x0/0x2c)
[  666.262460]  r6:00000000 r5:beeff96c r4:00000005
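The failure mode and the fix can be modelled with a band-indexed table.
Everything below is a hypothetical stand-in for the mac80211/cfg80211 types,
keeping only the fact stated above that the 2 GHz band is value 0. A
channel whose .band is never set claims to be 2 GHz, and on a 5 GHz-only
card bands[0] is NULL; the model returns false where the kernel would
dereference NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2 GHz is value 0, so an unset band field means "2 GHz". */
enum model_band { MODEL_BAND_2GHZ = 0, MODEL_BAND_5GHZ = 1, MODEL_NUM_BANDS };

struct model_channel {
	enum model_band band;
	int center_freq;   /* MHz */
	int hw_value;      /* channel number */
};

struct model_band_info { int n_channels; };
struct model_wiphy { const struct model_band_info *bands[MODEL_NUM_BANDS]; };

/* The fix: populate .band explicitly for 5 GHz channels. */
static const struct model_channel model_5ghz_channels[] = {
	{ .band = MODEL_BAND_5GHZ, .center_freq = 5180, .hw_value = 36 },
	{ .band = MODEL_BAND_5GHZ, .center_freq = 5200, .hw_value = 40 },
};

/* Safe version of the bands[chan->band] lookup. */
static bool model_band_registered(const struct model_wiphy *w,
				  const struct model_channel *chan)
{
	return w->bands[chan->band] != NULL;
}
```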

Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211

bridge: set priority of STP packets

Spanning Tree Protocol packets should always have been marked as
control packets, which causes them to be queued in the high-priority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if the bridge
gets overloaded and can't communicate. This is a long-standing bug dating
back to the first versions of Linux bridge.
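Why the control marking matters can be sketched with the default
pfifo_fast qdisc, assuming that is the qdisc in use: it maps skb->priority
through a 16-entry table to one of three bands, and band 0 is dequeued
first. The table below reproduces pfifo_fast's default priority-to-band
map, and 7 is the value of TC_PRIO_CONTROL; the model_ names are
hypothetical.

```c
/* Mirrors TC_PRIO_CONTROL from <linux/pkt_sched.h>. */
#define MODEL_TC_PRIO_CONTROL 7

/* pfifo_fast default priority-to-band map; band 0 is served first. */
static const unsigned char model_prio2band[16] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

static unsigned char model_band_for_priority(unsigned int priority)
{
	return model_prio2band[priority & 15];
}
```

An unmarked BPDU (priority 0) lands in band 1 with ordinary traffic, while
a control-marked one lands in band 0.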

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems

When an HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.

The BIOS sets ACPI_FADT_APIC_PHYSICAL in the FADT.
As all APIC IDs are smaller than 255, the BIOS needs to pass
control to the OS in xapic mode, according to the x2apic spec,
chapter 2.9.

Current code handles x2apic as follows when the BIOS passes control
with xapic mode enabled:

When user specifies x2apic_phys, or FADT indicates PHYSICAL:

1. During the MADT OEM check, the apic driver is first set to the
   xapic logical or xapic phys driver.

2. enable_IR_x2apic() will enable x2apic_mode.

3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
   will install the correct x2apic phys driver and use x2apic phys mode.
   Otherwise it skips that driver and lets x2apic_cluster_probe()
   take over and install the x2apic cluster driver (the wrong one), even
   though the FADT indicates PHYSICAL, because x2apic_phys_probe() does
   not check FADT PHYSICAL.

Add a check of x2apic_fadt_phys to x2apic_phys_probe() to fix the
problem.
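The corrected probe decision reduces to one boolean expression. The struct
and model_x2apic_phys_probe() are a hypothetical model of the boot-time
state described in the three steps above: the physical-mode driver should
claim the system when x2apic mode is on and either the boot option or the
FADT PHYSICAL flag asks for it.

```c
#include <stdbool.h>

/* Hypothetical boot-time inputs. */
struct model_apic_state {
	bool x2apic_mode;   /* enable_IR_x2apic() switched to x2apic */
	bool x2apic_phys;   /* "x2apic_phys" given on the command line */
	bool fadt_phys;     /* FADT has ACPI_FADT_APIC_PHYSICAL set */
};

/* Before the fix, fadt_phys was not consulted here, so the cluster
 * driver's probe won on FADT-PHYSICAL systems without the boot option. */
static bool model_x2apic_phys_probe(const struct model_apic_state *s)
{
	return s->x2apic_mode && (s->x2apic_phys || s->fadt_phys);
}
```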

Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mac80211: fix channel selection bug

When trying to connect to an AP that advertises HT but not
VHT, the mac80211 code erroneously uses the configuration
from the AP as is instead of checking it against regulatory
and local capabilities. This can lead to using an invalid
or even nonexistent channel (like 11/HT40+).

Additionally, the return flags from downgrading must be
ORed together, to collect them from all of the downgrades.
Also clarify the message.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

sched, powerpc: Fix sched.h split-up build failure

Fix PowerPC/Cell build fallout from:

8bd75c77b7c6 sched/rt: Move rt specific bits into new header file

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'uprobes/core' of git://git./linux/kernel/git/oleg/misc into perf/core

Improve uprobes performance by adding 'pre-filtering' support,
by Oleg Nesterov:

# time perl -e 'syscall -1 for 1..100_000'
real    0m0.040s
user    0m0.027s
sys     0m0.010s

# perf probe -x /lib/libc.so.6 syscall
# perf record -e probe_libc:syscall sleep 100 &

Before this series:

# time perl -e 'syscall -1 for 1..100_000'
real    0m1.714s
user    0m0.103s
sys     0m1.607s

After:

# time perl -e 'syscall -1 for 1..100_000'
real    0m0.037s
user    0m0.013s
sys     0m0.023s

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'master' of git://1984.lsi.us.es/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for 3.8-rc7, they are:

* Fix an oops in IPVS state-sync caused by releasing a random memory area
  through an uninitialized pointer, from Dan Carpenter.

* Fix SCTP flow establishment, broken by bad checksum mangling in IPVS,
  from Daniel Borkmann.

* Three fixes for the recently added IPv6 NPT, all from YOSHIFUJI Hideaki,
  with an amendment from Ulrich Weber collapsed into those patches. They
  fix the adjustment calculation, fix prefix mangling and ensure the LSBs
  of prefixes are zero (as required by the RFC).

Notably, it took me a while to validate the 1's complement arithmetic/
checksumming approach in the IPv6 NPT code.
====================
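The 1's complement arithmetic mentioned above can be illustrated with a generic sketch. This is not the NPT or IPVS code. It shows the RFC 1071 Internet checksum and the RFC 1624 incremental update, the kind of arithmetic that checksum-neutral rewriting relies on. The helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit 1's complement addition with end-around carry. */
static uint16_t csum_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Internet checksum (RFC 1071) over n 16-bit words. */
static uint16_t csum_full(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = csum_add(sum, w[i]);
    return (uint16_t)~sum;
}

/* Incremental update when one word changes (RFC 1624, eqn. 3):
 * HC' = ~(~HC + ~m + m') */
static uint16_t csum_replace(uint16_t hc, uint16_t old, uint16_t new_)
{
    uint16_t s = csum_add((uint16_t)~hc, (uint16_t)~old);
    return (uint16_t)~csum_add(s, new_);
}
```

Changing one word and applying csum_replace() gives the same result as recomputing the whole sum. Checking that property is how arithmetic like this is usually validated.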

Signed-off-by: David S. Miller <davem@davemloft.net>

arp: fix possible crash in arp_rcv()

We should call skb_share_check() before pskb_may_pull(), or we
can crash in pskb_expand_head().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gso_type'

Michael S. Tsirkin says:

====================
At the moment, macvtap crashes are observed if macvtap is attached
to an interface with LRO enabled.
The crash in question is BUG() in macvtap_skb_to_vnet_hdr.
This happens because several drivers set gso_size but not gso_type
in incoming skbs.
This was not the case with Intel cards on 3.2 and older kernels, or
with QLogic cards on 3.4 and older kernels, so it is a regression, if
not a recent one.
The following patches fix this for qlogic, broadcom and intel drivers.

I tested that the patch fixes the crash for ixgbe but
don't have qlogic/broadcom hardware to test.
I also only tested TCPv4.

Please review, and consider for 3.8.

Changes from v1:
- added missing htons as suggested by Eric
- backported the relevant bits from
cbf1de72324a8105ddcc3d9ce9acbc613faea17e for bnx2x
====================
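The invariant these three patches restore can be sketched in userspace. This is a hypothetical model, not driver code. struct fake_skb, the GSO_* values and the helpers stand in for skb_shinfo(skb)->gso_size/gso_type and SKB_GSO_TCPV4/TCPV6, and the numeric values are not the kernel's. The sketch also shows why skb->protocol must be compared against htons(...): the field is in network byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-ins; values are illustrative, not the kernel's. */
#define FAKE_ETH_P_IP   0x0800
#define FAKE_ETH_P_IPV6 0x86DD
enum { GSO_NONE = 0, GSO_TCPV4 = 1 << 0, GSO_TCPV6 = 1 << 1 };

struct fake_skb {
    uint16_t protocol;   /* network byte order, like skb->protocol */
    uint16_t gso_size;
    uint32_t gso_type;
};

/* What an LRO driver should do: whenever it sets gso_size, also set
 * a gso_type matching the L3 protocol. */
static void lro_set_gso(struct fake_skb *skb, uint16_t mss)
{
    skb->gso_size = mss;
    skb->gso_type = (skb->protocol == htons(FAKE_ETH_P_IPV6)) ?
                    GSO_TCPV6 : GSO_TCPV4;
}

/* Consumer-side check modelled on the macvtap path: gso_size != 0
 * with no known gso_type is the state that used to hit BUG(). */
static int gso_state_valid(const struct fake_skb *skb)
{
    if (skb->gso_size == 0)
        return 1;
    return skb->gso_type == GSO_TCPV4 || skb->gso_type == GSO_TCPV6;
}
```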

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: set gso_type

In LRO mode, bnx2x set gso_size but not gso_type.
This leads to crashes in macvtap.
Commit cbf1de72324a8105ddcc3d9ce9acbc613faea17e
queued for 3.9 includes a more complete fix.
This is a minimal patch to avoid the crash, for 3.8.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: set gso_type

qlcnic set gso_size but not gso_type. This leads to crashes
in macvtap.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix gso type

ixgbe set gso_size but not gso_type. This leads to
crashes in macvtap.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: mdio register has to fail if the phy is not found

With this patch, stmmac fails if the PHY device is not found;
without this fix, the MDIO bus can be registered twice when the
interface is brought down and up, which is not correct.

Reported-by: Stas <stsp@list.ru>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: fix macro used for debugging the xmit

This patch fixes the name of the macro used for
debugging the transmit process. I used STMMAC_TX_DEBUG
instead of STMMAC_XMIT_DEBUG.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'davem.r8169' of git://violet.fr.zoreil.com/romieu/linux

Revert two power saving r8169 changes to fix some regressions
reported.

Reported-by: Jörg Otte <jrg.otte@gmail.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'drm-nouveau-fixes-3.8' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Fixes for one major lockdep warning, one oops reported by a few people,
and a long hang on some gpu engines.

* 'drm-nouveau-fixes-3.8' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau: add lockdep annotations
  drm/nv50/fb: Fix nullptr-deref on IGPs
  drm/nouveau: use different register to wait for secret scrubber

Merge tag 'highbank-fixes-for-3.8' of git://sources.calxeda.com/kernel/linux into fixes

From Rob Herring:
highbank fixes for 3.8

-Compile fix for !SMP
-More cpu cluster id related fixes

* tag 'highbank-fixes-for-3.8' of git://sources.calxeda.com/kernel/linux:
  ARM: highbank: mask cluster id from cpu_logical_map
  ARM: scu: mask cluster id from cpu_logical_map
  ARM: scu: add empty scu_enable for !CONFIG_SMP

wimax/i2400m: fix i2400m->wake_tx_skb handling

i2400m_net_wake_tx() sets ->wake_tx_skb with the given skb if
->wake_tx_ws is not pending; however, i2400m_wake_tx_work() could have
just started execution and not yet fetched ->wake_tx_skb. The
previous packet would then be leaked.

Update ->wake_tx_skb handling.

* i2400m_net_wake_tx() now tests whether the previous ->wake_tx_skb
  has been consumed by ->wake_tx_ws instead of testing work_pending().

* i2400m_net_wake_stop() is simplified similarly.  It always puts
  ->wake_tx_skb if non-NULL.

* Spurious ->wake_tx_skb dereference outside critical section dropped
  from i2400m_wake_tx_work().
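The ownership rule behind the first and third points can be modelled with a slot protected by a lock. This is a hypothetical sketch, not the driver. struct wake_tx and both functions are illustrative, a pthread mutex stands in for the driver's lock, and scheduling the work item is left out.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified model of the wake-up TX slot. */
struct wake_tx {
    pthread_mutex_t lock;
    void *skb;              /* packet waiting for the wake-up worker */
};

/* Stash a packet only if the previous one has been consumed. Testing
 * "is the work pending?" instead would miss a worker that has already
 * started but not yet taken the packet. */
static int net_wake_tx(struct wake_tx *w, void *skb)
{
    int ret = 0;

    pthread_mutex_lock(&w->lock);
    if (w->skb == NULL)
        w->skb = skb;       /* the driver would also schedule the work */
    else
        ret = -EBUSY;       /* caller keeps ownership; nothing leaks */
    pthread_mutex_unlock(&w->lock);
    return ret;
}

/* Worker: take the packet inside the critical section and do not
 * touch w->skb again after unlocking. */
static void *wake_tx_work(struct wake_tx *w)
{
    void *skb;

    pthread_mutex_lock(&w->lock);
    skb = w->skb;
    w->skb = NULL;
    pthread_mutex_unlock(&w->lock);
    return skb;             /* transmit or free outside the lock */
}
```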

Only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dan Williams <dcbw@redhat.com>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Cc: wimax@linuxwimax.org

kprobes: fix wait_for_kprobe_optimizer()

wait_for_kprobe_optimizer() seems largely broken. It uses
optimizer_comp which is never re-initialized, so
wait_for_kprobe_optimizer() will never wait for anything once
kprobe_optimizer() finishes all pending jobs for the first time.

Also, aside from completion, delayed_work_pending() is %false once
kprobe_optimizer() starts execution and wait_for_kprobe_optimizer()
won't wait for it.

Reimplement it so that it flushes optimizing_work until
[un]optimizing_lists are empty. Note that this also makes
optimizing_work execute immediately if someone's waiting for it, which
is the nicer behavior.
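The new waiting strategy amounts to "keep flushing until both lists are empty". In this hypothetical sketch, counters stand in for optimizing_list/unoptimizing_list and a synchronous call stands in for flushing optimizing_work. The per-run batch limit is an assumption, used only to force several worker runs.

```c
#include <stddef.h>

/* Hypothetical model: counters stand in for the two lists, and the
 * worker handles at most 'batch' entries per run (batch must be
 * nonzero, or the wait below never ends). */
struct optimizer {
    size_t optimizing;
    size_t unoptimizing;
    size_t batch;
    size_t runs;
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Stand-in for one execution of optimizing_work. */
static void run_optimizer_work(struct optimizer *o)
{
    size_t n = o->batch;
    size_t u = min_size(o->unoptimizing, n);

    o->unoptimizing -= u;
    n -= u;
    o->optimizing -= min_size(o->optimizing, n);
    o->runs++;
}

/* Re-check the lists on every iteration instead of relying on a
 * one-shot completion; each "flush" runs the work immediately. */
static void wait_for_optimizer(struct optimizer *o)
{
    while (o->optimizing != 0 || o->unoptimizing != 0)
        run_optimizer_work(o);
}
```

A completion that fires only once would stop waiting after the first run. Re-checking the lists keeps the wait correct however many runs the work needs.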

Only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>

ipw2x00: simplify scan_event handling

* Drop unnecessary delayed_work_pending() tests.

* Unify scan_event_{now|later} by using mod_delayed_work() w/ 0 delay
for scan_event_now.

* Make ipw2200 scan_event handling match ipw2100 - use
mod_delayed_work() w/ 0 delay for immediate scanning.

Only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Stanislav Yakovlev <stas.yakovlev@gmail.com>
Cc: linux-wireless@vger.kernel.org

drm/nouveau: add lockdep annotations

1) Lockdep thinks all nouveau subdevs belong to the same class and can be
locked in arbitrary order, which is not true in the general case.
Tell it to distinguish subdevs by (o)class type.
2) DRM client can be locked under user client lock - tell lockdep to put
DRM client lock in a separate class.

Reported-by: Arend van Spriel <arend@broadcom.com>
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: stable@vger.kernel.org [3.7, but needs s/const ofuncs/ofuncs/ to build]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

time, Fix setting of hardware clock in NTP code

At init time, if the system time is "warped" forward in warp_clock()
it will differ from the hardware clock by sys_tz.tz_minuteswest.  This time
difference is not taken into account when ntp updates the hardware clock,
and this causes the system time to jump forward by this offset every reboot.

The kernel must take this offset into account when writing the system time
to the hardware clock in the ntp code.  This patch adds
persistent_clock_is_local which indicates that an offset has been applied
in warp_clock() and accounts for the "warp" before writing the hardware
clock.

x86 does not have this problem as rtc writes are software limited to a
+/-15 minute window relative to the current rtc time.  Other arches, such
as powerpc, however do a full synchronization of the system time to the
rtc and will see this problem.
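The offset bookkeeping can be sketched as a userspace model. The function names are hypothetical, and the flag simply mirrors the one this patch adds. The model assumes the RTC holds local time, so warp_clock() adds sys_tz.tz_minuteswest * 60 seconds to reach UTC and the NTP write-back must subtract the same amount.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model; set once the boot-time warp has been applied. */
static bool persistent_clock_is_local;

/* Boot: the RTC is assumed to hold local time, so shift it to UTC. */
static time_t warp_clock_sketch(time_t rtc_local, int tz_minuteswest)
{
    persistent_clock_is_local = true;
    return rtc_local + (time_t)tz_minuteswest * 60;
}

/* NTP write-back: undo the warp so the RTC keeps holding local time. */
static time_t rtc_value_to_write(time_t system_utc, int tz_minuteswest)
{
    if (persistent_clock_is_local)
        return system_utc - (time_t)tz_minuteswest * 60;
    return system_utc;
}
```

Without the subtraction, UTC would be written into an RTC that is later read as local time. The next boot's warp would then push the clock forward by the same offset again.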

[v2]: generated against tip/timers/core

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>

Revert "r8169: enable internal ASPM and clock request settings".

This reverts commit d64ec841517a25f6d468bde9f67e5b4cffdc67c7.

Jörg Otte reported that this change increased boot-time link detection
on his 8168evl from 1.6 s to 10 s.

Hayes suggests reverting it for the time being.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Jörg Otte <jrg.otte@gmail.com>

Revert "r8169: enable ALDPS for power saving".

This reverts commit e0c075577965d1c01b30038d38bf637b027a1df3.

Jörg Otte reported that his 8168evl failed boot-time link detection.

Hayes suggests reverting it for the time being.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Jörg Otte <jrg.otte@gmail.com>

Linux 3.8-rc7

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"I was going to hold these off until v3.8 was out, and send them with a
  stable tag, but as everyone else is pushing much bigger fixes which
  Linus is accepting, let's save people from the hassle of having to
  patch v3.8 back into working or use a stable kernel.

  Looking at the diffstat, this really is high value for its size; this
  is minuscule compared to how the -rc6 to tip diffstat currently looks.

  So, four patches in this set:
   - Punit Agrawal reports that the kernel no longer boots on MPCore due
     to a new assumption made in the GIC code which isn't true of
     earlier GIC designs.  This is the biggest change in this set.
   - Punit's boot log also revealed a bunch of WARN_ON() dumps caused by
     the DT-ification of the GIC support without fixing up non-DT
     Realview - which now sees a greater number of interrupts than it
     did before.
   - A fix for the DMA coherent code from Marek which uses the wrong
     check for atomic allocations; this can result in spinlock lockups
     or other nasty effects.
   - A fix from Will, which will affect all Android based platforms if
     not applied (which use the 2G:2G VM split) - this causes
     particularly 'make' to misbehave unless this bug is fixed."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
  ARM: DMA mapping: fix bad atomic test
  ARM: realview: ensure that we have sufficient IRQs available
  ARM: GIC: fix GIC cpumask initialization

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
    bunch of folks.  From Emmanuel Grumbach.

2) Work limiting code in brcmsmac wifi driver can clear tx status
    without processing the event.  From Arend van Spriel.

3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.

4) l2tp tunnel delete can race with close, fix from Tom Parkin.

5) pktgen_add_device() failures are not checked at all, fix from Cong
    Wang.

6) Fix unintentional removal of carrier off from tun_detach(),
    otherwise we confuse userspace, from Michael S. Tsirkin.

7) Don't leak socket reference counts and ubufs in vhost-net driver,
    from Jason Wang.

8) vmxnet3 driver gets its initial carrier state wrong, fix from Neil
    Horman.

9) Protect against USB networking devices which spam the host with
    zero-length frames, from Bjørn Mork.

10) Prevent neighbour overflows in ipv6 for locally destined routes,
    from Marcelo Ricardo.  This is the best short-term fix for this; a
    longer-term fix has been implemented in net-next.

11) L2TP uses ipv4 datagram routines in its ipv6 code, whoops.  This
    mistake is largely because the ipv6 functions don't even have some
    kind of prefix in their names to suggest they are ipv6 specific.
    From Tom Parkin.

12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
    Yuchung Cheng.

13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
    Francois Romieu and yours truly.

14) Fix infinite loops and divides by zero in TCP congestion window
    handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.

15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
    from Phil Sutter.

16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.

17) Protect XEN netback driver against hostile frontend putting garbage
    into the rings, don't leak pages in TX GOP checking, and add proper
    resource releasing in error path of xen_netbk_get_requests().  From
    Ian Campbell.

18) SCTP authentication keys should be cleared out and released with
    kzfree(), from Daniel Borkmann.

19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
    up corrupting socket memory accounting to the point where packet
    sending is halted indefinitely.  Just remove the adjustments
    entirely, they aren't really needed.  From Eric Dumazet.

20) ATM Iphase driver uses a data type with the same name as the S390
    headers, rename to fix the build.  From Heiko Carstens.

21) Fix a typo in copying the inner network header offset from one SKB
    to another, from Pravin B Shelar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  tcp: fix for zero packets_in_flight was too broad
  brcmsmac: rework of mac80211 .flush() callback operation
  ssb: unregister gpios before unloading ssb
  bcma: unregister gpios before unloading bcma
  rtlwifi: Fix scheduling while atomic bug
  net: usbnet: fix tx_dropped statistics
  tcp: ipv6: Update MIB counters for drops
  ...

Merge branch 'sctp_keys'

Daniel Borkmann says:

====================
Keys used for cryptographic purposes should be zeroed out when the
session ends or the memory is freed, so that they are not left
lying around in memory.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: sctp_endpoint_free: zero out secret key data

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
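The same idea can be sketched in userspace. zfree() and secure_wipe() are hypothetical helpers. As I understand it, kzfree() determines the allocation size itself via ksize(), but portable C cannot query the size of an allocation, so the caller passes it here. The wipe uses volatile stores so the compiler cannot drop it as a dead store before free().

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite len bytes with zeros through a volatile pointer so the
 * stores are not optimized away. */
static void secure_wipe(void *p, size_t len)
{
    volatile unsigned char *v = p;

    for (size_t i = 0; i < len; i++)
        v[i] = 0;
}

/* Hypothetical userspace analogue of kzfree(): wipe, then free. */
static void zfree(void *p, size_t len)
{
    if (p == NULL)
        return;
    secure_wipe(p, len);
    free(p);
}
```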

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unbreak automounter support on 64-bit kernel with 32-bit userspace (v2)

automount-support is broken on the parisc architecture, because the existing
#if list does not include a check for defined(__hppa__). The HPPA (parisc)
architecture is similar to other 64bit Linux targets where we have to define
autofs_wqt_t (which is passed back and forth to user space) as int type which
has a size of 32bit across 32 and 64bit kernels.

During the discussion on the mailing list, H. Peter Anvin suggested inverting
the #if list, since only certain platforms (specifically those that do not have
a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned
long type.

This suggestion is probably the best way to go, since Arm64 (and maybe others?)
seems to have a non-working automounter. So in the long run, even for new
upcoming architectures, this inverted check seems to be the best solution, since
it will not require them to change this #if again (unless they are 64bit only).
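The inverted check can be sketched as follows. It follows the description above and is not a verbatim copy of the header. In particular, using unsigned int as the 32-bit type for all other architectures is an assumption.

```c
/* Only pure 64-bit architectures without a 32-bit userspace keep
 * unsigned long. Every other architecture, including future ones,
 * gets a type that is 32 bits wide on both 32- and 64-bit kernels
 * (unsigned int here is an assumption of this sketch). */
#if defined(__ia64__) || defined(__alpha__)
typedef unsigned long autofs_wqt_t;
#else
typedef unsigned int autofs_wqt_t;
#endif
```

A new architecture falls into the #else branch by default, so it needs no edit to this list.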

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Rolf Eike Beer <eike-kernel@sf-tec.de>

atm/iphase: rename fregt_t -> ffreg_t

We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.

So simply rename the iphase typedef to a new name. Fixes this compile error:

In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
                 from next/arch/s390/include/asm/lowcore.h:12,
                 from next/arch/s390/include/asm/thread_info.h:30,
                 from include/linux/thread_info.h:54,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:5,
                 from include/linux/stat.h:18,
                 from include/linux/module.h:10,
                 from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: tag some code with #ifdef CONFIG_COMPAT

This allows us to disable COMPAT mode without a link error.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

tile: fix memcpy_*io functions for allnoconfig

On tilepro without CONFIG_PCI, we can't provide inlines of these
functions, as we don't have readl/writel.

In addition, fix memset_io() signature to take a volatile void *.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

tile: export a handful of symbols appropriately

This showed up when building with "allmodconfig". I used
EXPORT_SYMBOL() to match existing conventions in files that
were already exporting symbols, or that were exported that way
by other architectures, and otherwise EXPORT_SYMBOL_GPL().

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

uprobes/perf: Avoid uprobe_apply() whenever possible

uprobe_perf_open/close call the costly uprobe_apply() every time,
we can avoid it if:

- "nr_systemwide != 0" is not changed.

- There is another process/thread with the same ->mm.

- copy_process() does inherit_event(). dup_mmap() preserves the
  inserted breakpoints.

- event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
  called by exec/mmap paths.

- tp_target is exiting. Only _close() checks PF_EXITING, I don't
  think TRACE_REG_PERF_OPEN can hit the dying task too often.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE

Change uprobe_trace_func() and uprobe_perf_func() to return "int". Change
uprobe_dispatcher() to return "trace_ret | perf_ret", although this is not
strictly needed: currently TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive.

The only functional change is that uprobe_perf_func() checks the filtering
too and returns UPROBE_HANDLER_REMOVE if nobody wants to trace current.
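The resulting dispatch logic can be shown with a hypothetical model. FLAG_TRACE, FLAG_PROFILE and HANDLER_REMOVE stand in for TP_FLAG_TRACE, TP_FLAG_PROFILE and UPROBE_HANDLER_REMOVE, with values chosen only for this sketch. The perf_wants_current field stands in for the perf filter's verdict on current. nhit is counted in the dispatcher, as in the "Always increment trace_uprobe->nhit" change.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; values chosen for this sketch only. */
#define FLAG_TRACE      (1u << 0)
#define FLAG_PROFILE    (1u << 1)
#define HANDLER_REMOVE  1

struct tu_stub {
    unsigned int flags;
    unsigned long nhit;
    bool perf_wants_current;  /* verdict of the perf-side filter */
};

static int trace_func(struct tu_stub *tu)
{
    (void)tu;                 /* the ftrace side never asks for removal */
    return 0;
}

static int perf_func(struct tu_stub *tu)
{
    /* Nobody wants to trace current: ask for the breakpoint to go. */
    return tu->perf_wants_current ? 0 : HANDLER_REMOVE;
}

static int dispatcher(struct tu_stub *tu)
{
    int trace_ret = 0, perf_ret = 0;

    tu->nhit++;               /* count every hit, whoever enabled it */
    if (tu->flags & FLAG_TRACE)
        trace_ret = trace_func(tu);
    if (tu->flags & FLAG_PROFILE)
        perf_ret = perf_func(tu);
    return trace_ret | perf_ret;
}
```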

Testing:

# perf probe -x /lib/libc.so.6 syscall

# perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'

# perf report --show-total-period
100.00% 10 perl libc-2.8.so [.] syscall

Before this patch:

# cat /sys/kernel/debug/tracing/uprobe_profile
/lib/libc.so.6 syscall 20

A child process doesn't have a counter, but it still hits this breakpoint
"copied" by dup_mmap().

After the patch:

# cat /sys/kernel/debug/tracing/uprobe_profile
/lib/libc.so.6 syscall 11

The child process hits this int3 only once and does unapply_uprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes/perf: Teach trace_uprobe/perf code to pre-filter

Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
->perf_events to figure out whether we need to insert the breakpoint.

uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
the new perf event comes or goes away.
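The decision made by uprobe_perf_filter() can be sketched as a pure function. The types are hypothetical stand-ins: a flat array replaces the ->perf_events list, and a plain pointer replaces tp_target->mm. Locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; locking is omitted. */
struct mm_stub { int id; };

struct perf_event_stub {
    const struct mm_stub *target_mm;        /* tp_target->mm */
};

struct filter_stub {
    int nr_systemwide;                      /* events not bound to a task */
    const struct perf_event_stub *events;   /* per-task events */
    size_t nr_events;
};

/* Should a breakpoint be present in @mm? */
static bool perf_filter_sketch(const struct filter_stub *f,
                               const struct mm_stub *mm)
{
    if (f->nr_systemwide != 0)
        return true;                   /* some event traces everything */
    for (size_t i = 0; i < f->nr_events; i++)
        if (f->events[i].target_mm == mm)
            return true;               /* a per-task event wants this mm */
    return false;                      /* nobody cares: skip the int3 */
}
```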

Note that currently this is very suboptimal:

- uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
  heavy nop, consumer->filter() always returns F at this stage.

  As it was already discussed we need uprobe_register_only() to
  avoid the costly register_for_each_vma() when possible.

- uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
  changes, we need uprobe_apply_mm(); unapply_uprobe() is almost
  what we need.

- uprobe_apply() can be simply avoided sometimes, see the next
  changes.

Testing:

# perf probe -x /lib/libc.so.6 syscall

# perl -e 'syscall -1 while 1' &
[1] 530

# perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'

# perf report --show-total-period
100.00%            10     perl  libc-2.8.so    [.] syscall

Before this patch:

# cat /sys/kernel/debug/tracing/uprobe_profile
/lib/libc.so.6 syscall 79291

The huge ->nhit == 79291 reflects the fact that the background process
530 constantly hits this breakpoint too, even though it doesn't contribute
to the output.

After the patch:

# cat /sys/kernel/debug/tracing/uprobe_profile
/lib/libc.so.6 syscall 10

This shows that only the target process was punished by int3.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's

Introduce "struct trace_uprobe_filter" which records the "active"
perf_event's attached to ftrace_event_call. For a start we simply
use list_head, we can optimize this later if needed. For example, we
do not really need to record an event with ->parent != NULL, we can
rely on parent->child_list. And we can certainly do some optimizations
for the case when 2 events have the same ->tp_target or tp_target->mm.

Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
and add/del this perf_event to the list.

We can probably avoid any locking, but let's start with the "obviously
correct" trace_uprobe_filter->rwlock which protects everything.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes: Introduce uprobe_apply()

Currently it is not possible to change the filtering constraints after
uprobe_register(), so a consumer cannot, say, start to trace a task/mm
which was previously filtered out, or remove the no longer needed bp's.

Introduce uprobe_apply() which simply does register_for_each_vma() again
to consult uprobe_consumer->filter() and install/remove the breakpoints.
The only complication is that register_for_each_vma() can no longer
assume that uprobe->consumers should be consulted if is_register == T,
so we change it to accept "struct uprobe_consumer *new" instead.

Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
register_for_each_vma() fails; it is up to the caller to handle the error.

Note: we probably need to cleanup the current interface, it is strange
that uprobe_apply/unregister need inode/offset. We should either change
uprobe_register() to return "struct uprobe *", or add a private ->uprobe
member in uprobe_consumer. And in the long term uprobe_apply() should
take a single argument, uprobe or consumer, even "bool add" should go
away.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

perf: Introduce hw_perf_event->tp_target and ->tp_list

sys_perf_event_open()->perf_init_event(event) is called before
find_get_context(event), this means that event->ctx == NULL when
class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
can't know if this event is per-task or system-wide.

This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
The patch also moves ->bp_target up so that it can overlap with the
new member, which can help the compiler generate better code.

trace_uprobe_register() will use it for prefiltering to avoid the
unnecessary breakpoints in mm's we do not want to trace.

->tp_target doesn't have its own reference, but we can rely on the
fact that either sys_perf_event_open() holds a reference, or it is
equal to event->ctx->task. So this pointer is always valid until
free_event().

Also add the "struct list_head tp_list" into this union. It is not
strictly necessary, but it can simplify the next changes and we can
add it for free.
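The layout change can be illustrated with a reduced struct. All type names are hypothetical stand-ins for struct task_struct, struct list_head and the arch breakpoint info, and the other union members of struct hw_perf_event are left out. The only point is that placing bp_target first lets it share storage with tp_target. Anonymous structs and unions require C11.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real kernel types. */
struct task_stub;
struct list_stub { struct list_stub *next, *prev; };
struct bp_info_stub { unsigned long address; int len; };

struct hw_event_sketch {
    union {
        struct {                          /* PERF_TYPE_TRACEPOINT */
            struct task_stub *tp_target;
            struct list_stub tp_list;
        };
        struct {                          /* PERF_TYPE_BREAKPOINT */
            struct task_stub *bp_target;  /* first member, so it shares
                                             storage with tp_target */
            struct bp_info_stub info;
            struct list_stub bp_list;
        };
    };
};
```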

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

x86, doc: Add a bootloader ID for OVMF

OVMF (an implementation of UEFI based on TianoCore used in virtual
environments) now has the ability to boot Linux natively; this is used
for "qemu -kernel" and similar things in a UEFI environment.

Accordingly, assign it a bootloader ID.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>

uprobes/perf: Always increment trace_uprobe->nhit

Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().

->nhit counts how many times we hit the breakpoint inserted by this
uprobe; we do not want to lose this info if the uprobe was enabled by
sys_perf_event_open().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe

trace_uprobe->consumer and "struct uprobe_trace_consumer" add the
unnecessary indirection and complicate the code for no reason.

This patch simply embeds uprobe_consumer into "struct trace_uprobe",
all other changes only fix the compilation errors.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes/tracing: Introduce is_trace_uprobe_enabled()

probe_event_enable/disable() check tu->consumer != NULL to avoid the
wrong uprobe_register/unregister().

We are going to kill this pointer and "struct uprobe_trace_consumer",
so we add the new helper, is_trace_uprobe_enabled(), which can rely
on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.

Note: the current logic doesn't look optimal, it is not clear why
TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive, we will probably
change this later.

Also kill the unused TP_FLAG_UPROBE.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/tracing: Ensure inode != NULL in create_trace_uprobe()

probe_event_enable/disable() check tu->inode != NULL at the start.
This is ugly: if igrab() can fail, create_trace_uprobe() should not
succeed and "postpone" the failure.

And S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.

Note: alloc_uprobe() should probably check igrab() != NULL as well.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/tracing: Fully initialize uprobe_trace_consumer before uprobe_register()

probe_event_enable() does uprobe_register() and only after that sets
utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
which can miss these assignments or see them out of order. Nothing
really bad can happen, but this doesn't look clean/safe.

It also makes it impossible to use uprobe_consumer->filter(), which we
are going to add: it is called by uprobe_register() and needs utc->tu.

Change this code to initialize everything before uprobe_register(), and
reset tu->consumer/flags if it fails. We can't race with event_disable(),
the caller holds event_mutex, and if we could the code would be wrong
anyway.

In fact I think uprobe_trace_consumer should die, it buys nothing but
complicates the code. We can simply add uprobe_consumer into trace_uprobe.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/tracing: Fix dentry/mount leak in create_trace_uprobe()

create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
to do path_put(). We can do this right after igrab().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Add exports for module use

The original pull message for uprobes (commit 654443e2) noted:

  This tree includes uprobes support in 'perf probe' - but SystemTap
  (and other tools) can take advantage of user probe points as well.

In order to actually be usable in module-based tools like SystemTap, the
interface needs to be exported.  This patch first adds the obvious
exports for uprobe_register and uprobe_unregister.  Then it also adds
one for task_user_regset_view, which is necessary to get the correct
state of userspace registers.

Signed-off-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uprobes: Kill the bogus IS_ERR_VALUE(xol_vaddr) check

utask->xol_vaddr is either zero or valid, remove the bogus
IS_ERR_VALUE() check in xol_free_insn_slot().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Do not allocate current->utask unnecessarily

handle_swbp() does get_utask() before can_skip_sstep() for no reason,
we do not need ->utask if can_skip_sstep() succeeds.

Move get_utask() to pre_ssout(), which actually starts to use it. Move
the initialization of utask->active_uprobe/state as well. This way
the whole initialization is consolidated in pre_ssout().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>

uprobes: Fix utask->xol_vaddr leak in pre_ssout()

pre_ssout() should do xol_free_insn_slot() if arch_uprobe_pre_xol()
fails, otherwise nobody will free the allocated slot.
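The fixed error path amounts to releasing the slot when the arch hook fails. This is a hypothetical userspace model: malloc() stands in for the XOL slot allocator, and the stub names are illustrative.

```c
#include <stdlib.h>

/* Hypothetical model of pre_ssout()'s error handling. */
struct utask_stub {
    void *xol_vaddr;            /* slot reserved for single-stepping */
};

static void *get_insn_slot_stub(void)
{
    return malloc(16);          /* stands in for xol_get_insn_slot() */
}

static void free_insn_slot_stub(struct utask_stub *u)
{
    free(u->xol_vaddr);         /* stands in for xol_free_insn_slot() */
    u->xol_vaddr = NULL;
}

/* Demo stand-ins for arch_uprobe_pre_xol() succeeding or failing. */
static int arch_pre_xol_ok(void)   { return 0; }
static int arch_pre_xol_fail(void) { return -1; }

static int pre_ssout_sketch(struct utask_stub *u, int (*arch_pre_xol)(void))
{
    u->xol_vaddr = get_insn_slot_stub();
    if (u->xol_vaddr == NULL)
        return -1;
    if (arch_pre_xol() != 0) {
        /* The fix: release the slot, or nobody ever will. */
        free_insn_slot_stub(u);
        return -1;
    }
    return 0;
}
```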

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Do not play with utask in xol_get_insn_slot()

The pre_ssout()->xol_get_insn_slot() path is confusing and buggy. This
patch cleans up the code; the next one fixes the bug.

Change xol_get_insn_slot() to only allocate the slot and do nothing more,
move the initialization of utask->xol_vaddr/vaddr into pre_ssout().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Turn add_utask() into get_utask()

Rename add_utask() to get_utask() and change it to allocate on
demand, which simplifies the caller. Like get_xol_area(), it will have
more users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Fold xol_alloc_area() into get_xol_area()

Currently only xol_get_insn_slot() does get_xol_area() + xol_alloc_area(),
but this will have more users and we do not want to copy-and-paste this
code. This patch simply moves xol_alloc_area() into get_xol_area() to
simplify the current and future code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Move alloc_page() from xol_add_vma() to xol_alloc_area()

Move alloc_page() from xol_add_vma() to xol_alloc_area() to clean up
the code. This separates the memory allocations and consolidates the
-EALREADY cleanups and the error handling.
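A single-threaded userspace sketch of the "get or allocate once" shape that get_xol_area() takes after these two patches. All names are hypothetical. `model_install()` fails if an area is already attached, which plays the role of the -EALREADY case where another thread won the race. In this single-threaded model that branch never runs; it only shows where the cleanup lives. Because all allocation and cleanup sit in one function, future callers do not need to copy it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-mm state: the XOL area is created lazily, at most once. */
struct model_xol_area { int nr_slots; };
struct model_mm { struct model_xol_area *xol_area; };

/* Attach the new area unless someone else already did (compare -EALREADY). */
static int model_install(struct model_mm *mm, struct model_xol_area *area)
{
	if (mm->xol_area)
		return -1;
	mm->xol_area = area;
	return 0;
}

/* Return the existing area, or allocate and install one on demand. */
static struct model_xol_area *model_get_xol_area(struct model_mm *mm)
{
	struct model_xol_area *area = mm->xol_area;

	if (area)
		return area;

	area = calloc(1, sizeof(*area));
	if (!area)
		return NULL;
	area->nr_slots = 8;
	if (model_install(mm, area) != 0) {
		free(area); /* lost the race: drop ours, use the winner's */
		return mm->xol_area;
	}
	return area;
}
```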

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain()

Change handle_swbp() to set regs->ip = bp_vaddr in advance. This is
what consumer->handler() needs, and the handler cannot compute it itself
because uprobe_get_swbp_addr() is not exported.

This also simplifies the code and makes it more consistent across
the supported architectures. handle_swbp() becomes the only caller
of uprobe_get_swbp_addr().
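A userspace sketch of the new ordering, assuming x86, where the breakpoint is the one-byte int3 and the trap leaves ip just past it. `model_get_swbp_addr()`, `model_handler()` and `model_handle_swbp()` are hypothetical stand-ins. Because ip is rewound before the handlers run, a handler can read the probed address straight from the registers.

```c
#include <assert.h>

/* Hypothetical register set; only the instruction pointer matters here. */
struct model_regs { unsigned long ip; };

/* x86 breakpoint insn (int3, 0xcc) is one byte long. */
#define MODEL_SWBP_INSN_SIZE 1UL

static unsigned long model_get_swbp_addr(const struct model_regs *regs)
{
	return regs->ip - MODEL_SWBP_INSN_SIZE;
}

/* A consumer handler: it just records regs->ip as the probed address. */
static unsigned long seen_addr;
static int model_handler(struct model_regs *regs)
{
	seen_addr = regs->ip;
	return 0;
}

/* Rewind ip first, then run the handler. */
static void model_handle_swbp(struct model_regs *regs)
{
	unsigned long bp_vaddr = model_get_swbp_addr(regs);

	regs->ip = bp_vaddr;
	model_handler(regs);
}
```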

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

uprobes/x86: Change __skip_sstep() to actually skip the whole insn

__skip_sstep() doesn't update regs->ip. Currently this is correct
but only "by accident" and it doesn't skip the whole insn. Change
it to advance ->ip by the length of the detected 0x66*0x90 sequence.
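A userspace sketch of the skip logic described above. `model_skip_sstep()` is a hypothetical stand-in for __skip_sstep(), not the patched function itself. It accepts any run of 0x66 operand-size prefixes followed by 0x90 (NOP), and emulates it by advancing ip past the whole sequence, prefixes included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_INSN_BYTES 16 /* x86 instructions are at most 15 bytes */

struct model_regs { unsigned long ip; };

/* Recognize 0x66* 0x90 and, if found, skip it without single-stepping. */
static bool model_skip_sstep(const unsigned char *insn, size_t len,
                             struct model_regs *regs)
{
	for (size_t i = 0; i < len && i < MODEL_MAX_INSN_BYTES; i++) {
		if (insn[i] == 0x66)
			continue;          /* operand-size prefix: keep scanning */
		if (insn[i] == 0x90) {
			regs->ip += i + 1; /* the prefixes plus the 0x90 itself */
			return true;
		}
		break;                     /* anything else must be single-stepped */
	}
	return false;
}
```

For example, `66 66 90` advances ip by 3, while `89 c0` (a mov) is left alone and ip does not change.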

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Teach handler_chain() to filter out the probed task

Currently there are 2 problems with pre-filtering:

1. It is not possible to add/remove a task (mm) after uprobe_register()

2. A forked child inherits all breakpoints, and uprobe_consumer cannot
control this.

This patch takes the first step toward better filtering: handler_chain()
removes the breakpoints installed by this uprobe from current->mm if all
handlers return UPROBE_HANDLER_REMOVE.
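A userspace sketch of the "remove only if every handler agrees" rule. `MODEL_HANDLER_REMOVE` mirrors the role of UPROBE_HANDLER_REMOVE, and the handler type and helpers are hypothetical. The handlers' return codes are AND-ed together, so one consumer that still wants the mm keeps the breakpoint in place. An empty chain never removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A handler returns this to say it no longer cares about this mm. */
#define MODEL_HANDLER_REMOVE 1

typedef int (*model_handler_t)(void);

/* Run every consumer's handler; request removal only if all of them
 * returned MODEL_HANDLER_REMOVE. */
static bool model_handler_chain(const model_handler_t *handlers, size_t n)
{
	int remove = MODEL_HANDLER_REMOVE;

	for (size_t i = 0; i < n; i++)
		remove &= handlers[i]();

	return n > 0 && remove;
}

static int wants_more(void)   { return 0; }
static int done_with_mm(void) { return MODEL_HANDLER_REMOVE; }
```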

Note that handler_chain() relies on ->register_rwsem to avoid the race
with uprobe_register/unregister which can add/del a consumer, or even
remove and then insert the new uprobe at the same address.

Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
sense anyway.

Note: instead of checking the retcode from uc->handler, we could add
uc->filter(UPROBE_FILTER_BPHIT). But I don't think it is optimal to
call 2 hooks in a row. This buys nothing, and if handler/filter do
something nontrivial they will probably do the same work twice.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Reintroduce uprobe_consumer->filter()

Finally add uprobe_consumer->filter() and change consumer_filter()
to actually call this method.

Note that ->filter() accepts mm_struct, not task_struct, because:

1. We do not have for_each_mm_user(mm, task).

2. Even if we implement for_each_mm_user(), ->filter() can
use it itself.

3. It is not clear who will actually need this interface to
do the "nontrivial" filtering.

Another argument is "enum uprobe_filter_ctx": consumer->filter() can
use it to figure out why/where it was called. For example, perhaps
we can add UPROBE_FILTER_PRE_REGISTER, used by build_map_info() to
quickly "nack" the unwanted mm's. In this case the consumer should know
that it is called under ->i_mmap_mutex.

See the previous discussion at http://marc.info/?t=135214229700002
Perhaps we should pass more arguments, vma/vaddr?

Note: this patch obviously can't help filter out the child created
by fork(); this will be addressed later.
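A userspace sketch of an mm-based filter with a context argument, as described above. Every name here is hypothetical: the enum values only echo the idea of enum uprobe_filter_ctx, and `pid_filter()` with its `target_pid` field is an invented example consumer. A consumer without a filter is treated as wanting every mm. The chain answers "yes" as soon as any consumer does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_filter_ctx { MODEL_FILTER_REGISTER, MODEL_FILTER_UNREGISTER, MODEL_FILTER_MMAP };

struct model_mm { int owner_pid; };

struct model_consumer {
	/* NULL filter means "interested in every mm". */
	bool (*filter)(struct model_consumer *self, enum model_filter_ctx ctx,
	               struct model_mm *mm);
	int target_pid; /* hypothetical per-consumer filtering data */
	struct model_consumer *next;
};

static bool model_consumer_filter(struct model_consumer *uc,
                                  enum model_filter_ctx ctx, struct model_mm *mm)
{
	return !uc->filter || uc->filter(uc, ctx, mm);
}

/* The breakpoint is wanted in this mm if at least one consumer wants it. */
static bool model_filter_chain(struct model_consumer *head,
                               enum model_filter_ctx ctx, struct model_mm *mm)
{
	for (struct model_consumer *uc = head; uc; uc = uc->next)
		if (model_consumer_filter(uc, ctx, mm))
			return true;
	return false;
}

/* Example filter: only probe the mm owned by one process. */
static bool pid_filter(struct model_consumer *self, enum model_filter_ctx ctx,
                       struct model_mm *mm)
{
	(void)ctx; /* a real filter could behave differently per context */
	return mm->owner_pid == self->target_pid;
}
```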

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Rationalize the usage of filter_chain()

filter_chain() was added into install_breakpoint/remove_breakpoint to
simplify the initial changes but this is sub-optimal.

This patch shifts the callsite to the callers, register_for_each_vma()
and uprobe_mmap(). This way:

- It will be easier to add the new arguments. This is the main reason;
  we can do more optimizations later.

- register_for_each_vma(is_register => true) can be optimized, we only
  need to consult the new consumer. The previous consumers were already
  asked when they called uprobe_register().

This patch also moves the MMF_HAS_UPROBES check out of remove_breakpoint(),
which lets us avoid the potentially costly filter_chain(). Note that
register_for_each_vma(is_register => false) doesn't really need to take
->consumer_rwsem, but I don't think it makes sense to optimize this and
introduce filter_chain_lockless().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Kill uprobes_mutex[], separate alloc_uprobe() and __uprobe_register()

uprobe_register() and uprobe_unregister() are the only users of
mutex_lock(uprobes_hash(inode)), and the only reason why we can't
simply remove it is that we need to ensure that delete_uprobe() is
not possible after alloc_uprobe() and before consumer_add().

IOW, we need to ensure that when we take uprobe->register_rwsem
this uprobe is still valid and we didn't race with _unregister()
which called delete_uprobe() in between.

With this patch uprobe_register() simply checks uprobe_is_active()
and retries if it hits this very unlikely race. uprobes_mutex[] is
no longer needed and can be removed.
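A userspace sketch of the retry loop, with hypothetical names throughout. To exercise the race, `model_alloc_uprobe()` first returns a stale uprobe that a concurrent unregister has already deleted, then the live one. Registration proceeds only when the uprobe it found is still active, and otherwise simply looks it up again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical uprobe: "active" means it is still linked into the tree. */
struct model_uprobe { bool active; int nr_consumers; };

static struct model_uprobe live_probe = { true, 0 };
static int lookups;

/* Stand-in for alloc_uprobe(): the first lookup simulates losing the race. */
static struct model_uprobe *model_alloc_uprobe(void)
{
	static struct model_uprobe stale = { false, 0 };

	return lookups++ == 0 ? &stale : &live_probe;
}

static int model_uprobe_register(void)
{
	for (;;) {
		struct model_uprobe *uprobe = model_alloc_uprobe();

		/* in the kernel this check is made under uprobe->register_rwsem */
		if (uprobe->active) {
			uprobe->nr_consumers++; /* consumer_add() and friends */
			return 0;
		}
		/* hit the unlikely race with delete_uprobe(): retry the lookup */
	}
}
```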

There is another reason for this change: prepare_uprobe() should be
folded into alloc_uprobe(), and we do not want to hold the extra locks
around read_mapping_page/etc.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Introduce uprobe_is_active()

The lifetime of uprobe->rb_node and uprobe->inode is not refcounted;
delete_uprobe() is called when we detect that the uprobe has no consumers,
and it would be deadly wrong to do this twice.

Change delete_uprobe() to WARN() if it was already called. We use
RB_CLEAR_NODE() to mark uprobe "inactive", then RB_EMPTY_NODE() can
be used to detect this case.

RB_EMPTY_NODE() is not used directly; instead we add a trivial helper,
uprobe_is_active(), for the next change.
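A userspace sketch of the "mark inactive" trick, using simplified copies of the rbtree macros under hypothetical `MODEL_` names. In the kernel's rbtree, the parent pointer and color share one word, and a node that is in no tree is marked by pointing that word at the node itself. The WARN() is modeled by a message on stderr and a false return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified node layout: parent pointer and color share one word. */
struct model_rb_node {
	unsigned long parent_color;
	struct model_rb_node *left, *right;
};

#define MODEL_RB_CLEAR_NODE(n) ((n)->parent_color = (unsigned long)(n))
#define MODEL_RB_EMPTY_NODE(n) ((n)->parent_color == (unsigned long)(n))

struct model_uprobe { struct model_rb_node rb_node; };

/* Counterpart of the uprobe_is_active() helper. */
static bool model_uprobe_is_active(struct model_uprobe *u)
{
	return !MODEL_RB_EMPTY_NODE(&u->rb_node);
}

/* Returns false (and warns) on a second call instead of corrupting the tree. */
static bool model_delete_uprobe(struct model_uprobe *u)
{
	if (!model_uprobe_is_active(u)) {
		fprintf(stderr, "WARN: delete_uprobe() called twice\n");
		return false;
	}
	/* rb_erase() would unlink the node here */
	MODEL_RB_CLEAR_NODE(&u->rb_node);
	return true;
}
```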

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Kill uprobe_events, use RB_EMPTY_ROOT() instead

uprobe_events counts the number of uprobes in uprobes_tree but
it is used as a boolean. We can use RB_EMPTY_ROOT() instead.

The no_uprobe_events() helper added by this patch can probably have more
callers, e.g. mmf_recalc_uprobes().
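A userspace sketch of replacing the counter with an emptiness test on the tree itself. `MODEL_RB_EMPTY_ROOT` mirrors the check RB_EMPTY_ROOT() performs (the root pointer is NULL). The tree variable and `model_no_uprobe_events()` are hypothetical. Because the answer is derived from the tree, there is no separate count to keep in sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_rb_node { struct model_rb_node *left, *right; };
struct model_rb_root { struct model_rb_node *rb_node; };

/* A tree is empty when its root pointer is NULL. */
#define MODEL_RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)

static struct model_rb_root uprobes_tree_model = { NULL };

/* Counterpart of no_uprobe_events(): ask the tree instead of a counter. */
static bool model_no_uprobe_events(void)
{
	return MODEL_RB_EMPTY_ROOT(&uprobes_tree_model);
}
```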

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes: Kill uprobe->copy_mutex

Now that ->register_rwsem is safe under ->mmap_sem we can kill
->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).

This makes prepare_uprobe() even uglier, but we should kill
it anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>