review.tizen.org Git - platform/adaptation/renesas_rcar/renesas

tracing/kprobes: Add trace event trigger invocations

Add code to the kprobe/kretprobe event functions that will invoke any
event triggers associated with a probe's ftrace_event_file.

The code to do this is very similar to the invocation code already
used to invoke the triggers associated with static events and
essentially replaces the existing soft-disable checks with a superset
that preserves the original behavior but adds the bits needed to
support event triggers.

Link: http://lkml.kernel.org/r/f2d49f157b608070045fdb26c9564d5a05a5a7d0.1389036657.git.tom.zanussi@linux.intel.com
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT

When kprobe-based dynamic event tracer is not enabled, it caused
following build error:

   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8dd): undefined reference to `fetch_symbol_u8'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8e9): undefined reference to `fetch_symbol_u16'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8f5): undefined reference to `fetch_symbol_u32'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c901): undefined reference to `fetch_symbol_u64'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c909): undefined reference to `fetch_symbol_string'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c913): undefined reference to `fetch_symbol_string_size'
   ...

It was due to the fetch methods are referred from CHECK_FETCH_FUNCS
macro and since it was only defined in trace_kprobe.c.  Move NULL
definition of such fetch functions to the header file.

Note, it also requires CONFIG_BRANCH_PROFILING enabled to trigger
this failure as well. This is because the "fetch_symbol_*" variables
are referenced in a "else if" statement that will only call
update_symbol_cache(), which is a static inline stub function
when CONFIG_KPROBE_EVENT is not enabled. gcc is smart enough
to optimize this "else if" out and that also removes the code that
references the undefined variables.

But when BRANCH_PROFILING is enabled, it fools gcc into keeping
the if statement around and thus references the undefined symbols
and fails to build.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/uprobes: Add @+file_offset fetch method

Enable to fetch data from a file offset. Currently it only supports
fetching from same binary uprobe set. It'll translate the file offset
to a proper virtual address in the process.

The syntax is "@+OFFSET" as it does similar to normal memory fetching
(@ADDR) which does no address translation.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

uprobes: Allocate ->utask before handler_chain() for tracing handlers

uprobe_trace_print() and uprobe_perf_print() need to pass the additional
info to call_fetch() methods, currently there is no simple way to do this.

current->utask looks like a natural place to hold this info, but we need
to allocate it before handler_chain().

This is a bit unfortunate, perhaps we will find a better solution later,
but this is simple and should work right now.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/uprobes: Add support for full argument access methods

Enable to fetch other types of argument for the uprobes.  IOW, we can
access stack, memory, deref, bitfield and retval from uprobes now.

The format for the argument types are same as kprobes (but @SYMBOL
type is not supported for uprobes), i.e:

  @ADDR   : Fetch memory at ADDR
  $stackN : Fetch Nth entry of stack (N >= 0)
  $stack  : Fetch stack address
  $retval : Fetch return value
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address

Note that the retval only can be used with uretprobes.

Original-patch-by: Hyeoncheol Lee <cheol.lee@lge.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/uprobes: Fetch args before reserving a ring buffer

Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that we can migrate during accessing user memory
thus use a per-cpu mutex to protect concurrent accesses.

This is needed since we'll be able to fetch args from an user memory
which can be swapped out.  Before that uprobes could fetch args from
registers only which saved in a kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.  And add struct uprobe_cpu_buffer and its helpers as
suggested by Oleg.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()

Currently uprobes don't pass is_return to the argument parser so that
it cannot make use of "$retval" fetch method since it only works for
return probes.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Implement 'memory' fetch method for uprobes

Use separate method to fetch from memory. Move existing functions to
trace_kprobe.c and make them static. Also add new memory fetch
implementation for uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Add fetch{,_size} member into deref fetch method

The deref fetch methods access a memory region but it assumes that
it's a kernel memory since uprobes does not support them.

Add ->fetch and ->fetch_size member in order to provide a proper
access methods for supporting uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
[namhyung@kernel.org: Split original patch into pieces as requested]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Move 'symbol' fetch method to kprobes

Move existing functions to trace_kprobe.c and add NULL entries to the
uprobes fetch type table. I don't make them static since some generic
routines like update/free_XXX_fetch_param() require pointers to the
functions.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Implement 'stack' fetch method for uprobes

Use separate method to fetch from stack. Move existing functions to
trace_kprobe.c and make them static. Also add new stack fetch
implementation for uprobes.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Split [ku]probes_fetch_type_table

Use separate fetch_type_table for kprobes and uprobes. It currently
shares all fetch methods but some of them will be implemented
differently later.

This is not to break build if [ku]probes is configured alone (like
!CONFIG_KPROBE_EVENT and CONFIG_UPROBE_EVENT). So I added '__weak'
to the table declaration so that it can be safely omitted when it
configured out.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Move fetch function helpers to trace_probe.h

Move fetch function helper macros/functions to the header file and
make them external. This is preparation of supporting uprobe fetch
table in next patch.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Integrate duplicate set_print_fmt()

The set_print_fmt() functions are implemented almost same for
[ku]probes. Move it to a common place and get rid of the duplication.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/kprobes: Move common functions to trace_probe.h

The __get_data_size() and store_trace_args() will be used by uprobes
too. Move them to a common location.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/uprobes: Convert to struct trace_probe

Convert struct trace_uprobe to make use of the common trace_probe
structure.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/kprobes: Factor out struct trace_probe

There are functions that can be shared to both of kprobes and uprobes.
Separate common data structure to struct trace_probe and use it from
the shared functions.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/probes: Fix basic print type functions

The print format of s32 type was "ld" and it's casted to "long".  So
it turned out to print 4294967295 for "-1" on 64-bit systems.  Not
sure whether it worked well on 32-bit systems.

Anyway, it doesn't need to have cast argument at all since it already
casted using type pointer - just get rid of it.  Thanks to Oleg for
pointing that out.

And print 0x prefix for unsigned type as it shows hex numbers.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing/uprobes: Fix documentation of uprobe registration syntax

The uprobe syntax requires an offset after a file path not a symbol.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tracing: Fix rcu handling of event_trigger_data filter field

The filter field of the event_trigger_data structure is protected under
RCU sched locks. It was not annotated as such, and after doing so,
sparse pointed out several locations that required fix ups.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add generic tracing_lseek() function

Trace event triggers added a lseek that uses the ftrace_filter_lseek()
function. Unfortunately, when function tracing is not configured in
that function is not defined and the kernel fails to build.

This is the second time that function was added to a file ops and
it broke the build due to requiring special config dependencies.

Make a generic tracing_lseek() that all the tracing utilities may
use.

Also, modify the old ftrace_filter_lseek() to return 0 instead of
1 on WRONLY. Not sure why it was a 1 as that does not make sense.

This also changes the old tracing_seek() to modify the file pos
pointer on WRONLY as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add documentation for trace event triggers

Provide a basic overview of trace event triggers and document the
available trigger commands, along with a few simple examples.

Link: http://lkml.kernel.org/r/2595dd9196d7b553049611f2a3f849ca75d650a2.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add and use generic set_trigger_filter() implementation

Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger.  For example, to add a filter to an
enable/disable_event command:

    echo 'enable_event:system:event if common_pid == 999' > \
              .../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

    echo 'stacktrace if common_pid == 999' > \
                   .../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.

There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers.  To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invoks the
trigger.  Once all commands have been either invoked or set their
return flag, event_triggers_call() returns.  The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().

To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdef

Now that event triggers use ftrace_event_file(), it needs to be outside
the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is
not defined.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add 'enable_event' and 'disable_event' event trigger commands

Add 'enable_event' and 'disable_event' event_command commands.

enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analagous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:

    echo 'enable_event:system:event' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event' > .../othersys/otherevent/trigger

The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger

Where N is the number of times the command will be invoked.

The above commands will will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.

This also makes the find_event_file() helper function extern, since
it's useful to use from other places, such as the event triggers code,
so make it accessible.

Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add 'stacktrace' event trigger command

Add 'stacktrace' event_command.  stacktrace event triggers are added
by the user via this command in a similar way and using practically
the same syntax as the analogous 'stacktrace' ftrace function command,
but instead of writing to the set_ftrace_filter file, the stacktrace
event trigger is written to the per-event 'trigger' files:

    echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger

The above command will turn on stacktraces for someevent i.e. whenever
someevent is hit, a stacktrace will be logged.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will log N stacktraces for someevent i.e. whenever
someevent is hit N times, a stacktrace will be logged.

Link: http://lkml.kernel.org/r/0c30c008a0828c660aa0e1bbd3255cf179ed5c30.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add 'snapshot' event trigger command

Add 'snapshot' event_command.  snapshot event triggers are added by
the user via this command in a similar way and using practically the
same syntax as the analogous 'snapshot' ftrace function command, but
instead of writing to the set_ftrace_filter file, the snapshot event
trigger is written to the per-event 'trigger' files:

    echo 'snapshot' > .../somesys/someevent/trigger

The above command will turn on snapshots for someevent i.e. whenever
someevent is hit, a snapshot will be done.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'snapshot:N' > .../somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will snapshot N times for someevent i.e. whenever
someevent is hit N times, a snapshot will be done.

Also adds a new tracing_alloc_snapshot() function - the existing
tracing_snapshot_alloc() function is a special version of
tracing_snapshot() that also does the snapshot allocation - the
snapshot triggers would like to be able to do just the allocation but
not take a snapshot; the existing tracing_snapshot_alloc() in turn now
also calls tracing_alloc_snapshot() underneath to do that allocation.

Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[ fix up from kbuild test robot <fengguang.wu@intel.com report ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add 'traceon' and 'traceoff' event trigger commands

Add 'traceon' and 'traceoff' event_command commands.  traceon and
traceoff event triggers are added by the user via these commands in a
similar way and using practically the same syntax as the analagous
'traceon' and 'traceoff' ftrace function commands, but instead of
writing to the set_ftrace_filter file, the traceon and traceoff
triggers are written to the per-event 'trigger' files:

    echo 'traceon' > .../tracing/events/somesys/someevent/trigger
    echo 'traceoff' > .../tracing/events/somesys/someevent/trigger

The above command will turn tracing on or off whenever someevent is
hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
    echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above commands will will turn tracing on or off whenever someevent
is hit, but only N times.

Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided, which add and
remove trigger instances to the per-event list of triggers, and
arm/disarm them as appropriate.  event_trigger_callback() is a
general-purpose event_command func() implementation that orchestrates
command parsing and registration for most normal commands.

Most event commands will use these, but some will override and
possibly reuse them.

The event_trigger_init(), event_trigger_free(), and
event_trigger_print() functions are meant to be common implementations
of the event_trigger_ops init(), free(), and print() ops,
respectively.

Most trigger_ops implementations will use these, but some will
override and possibly reuse them.

Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add basic event trigger framework

Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires.  Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that.  Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function.  Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, and release file operations are implemented
here.

The open() implementation sets up for the various open modes of the
'trigger' file.  It creates and attaches the trigger iterator and sets
up the command parser.  If opened for reading set up the trigger
seq_ops.

The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.

also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.

A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations.  They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event.  It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.

Every event_command func() implementation essentially does the
same thing for any command:

   - choose ops - use the value of param to choose either a number or
     count version of event_trigger_ops specific to the command
   - do the register or unregister of those ops
   - associate a filter, if specified, with the triggering event

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized.  When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite.  The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions.  This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() command corrsponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked.  The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.

This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.

Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Linux 3.13-rc4

null_blk: mem garbage on NUMA systems during init

For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.

This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh()

Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use
devm_hwmon_register_with_groups") radeon_hwmon_init() is using
hwmon_device_register_with_groups(), which sets `rdev' as a device
private driver_data, while hwmon_attributes_visible() and
radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'.

Fix them by using dev_get_drvdata(), in order to avoid this oops:

  BUG: unable to handle kernel paging request at 0000000000001e28
  IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon]
  PGD 15057e067 PUD 151a8e067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  Call Trace:
    internal_create_group+0x114/0x1d9
    sysfs_create_group+0xe/0x10
    sysfs_create_groups+0x22/0x5f
    device_add+0x34f/0x501
    device_register+0x15/0x18
    hwmon_device_register_with_groups+0xb5/0xed
    radeon_hwmon_init+0x56/0x7c [radeon]
    radeon_pm_init+0x134/0x7e5 [radeon]
    radeon_modeset_init+0x75f/0x8ed [radeon]
    radeon_driver_load_kms+0xc6/0x187 [radeon]
    drm_dev_register+0xf9/0x1b4 [drm]
    drm_get_pci_dev+0x98/0x129 [drm]
    radeon_pci_probe+0xa3/0xac [radeon]
    pci_device_probe+0x6e/0xcf
    driver_probe_device+0x98/0x1c4
    __driver_attach+0x5c/0x7e
    bus_for_each_dev+0x7b/0x85
    driver_attach+0x19/0x1b
    bus_add_driver+0x104/0x1ce
    driver_register+0x89/0xc5
    __pci_register_driver+0x58/0x5b
    drm_pci_init+0x86/0xea [drm]
    radeon_init+0x97/0x1000 [radeon]
    do_one_initcall+0x7f/0x117
    load_module+0x1583/0x1bb4
    SyS_init_module+0xa0/0xaf

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
    figure out why it breaks things.

2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
    was basically doing "x == x", from Dave Jones.

3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
    Sebastian Siewior.

4) Blackhole and prohibit routes in ipv6 were not doing the right thing
    because their ->input and ->output methods were not being assigned
    correctly.  Now they behave properly like their ipv4 counterparts.
    From Kamala R.

5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
    really do not support this feature and will send garbage packets if
    fed fraglist SKBs.  From Eric Dumazet.

6) Fix long standing user triggerable BUG_ON over loopback in RDS
    protocol stack, from Venkat Venkatsubra.

7) Several not so common code paths can potentially try to invoke
    packet scheduler actions that might be NULL without checking.  Shore
    things up by either 1) defining a method as mandatory and erroring
    on registration if that method is NULL 2) defininig a method as
    optional and the registration function hooks up a default
    implementation when NULL is seen.  From Jamal Hadi Salim.

8) Fix fragment detection in xen-natback driver, from Paul Durrant.

9) Kill dangling enter_memory_pressure method in cg_proto ops, from
    Eric W Biederman.

10) SKBs that traverse namespaces should have their local_df cleared,
    from Hannes Frederic Sowa.

11) IOCB file position is not being updated by macvtap_aio_read() and
    tun_chr_aio_read().  From Zhi Yong Wu.

12) Don't free virtio_net netdev before releasing all of the NAPI
    instances.  From Andrey Vagin.

13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.

14) IPv6 routes that are no cached routes should not count against the
    garbage collection limits.  We had this almost right, but were
    missing handling addrconf generated routes properly.  From Hannes
    Frederic Sowa.

15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
    route info when they are called, from Stefan Tomanek.

16) TUN and MACVTAP have had truncated packet signalling for some time,
    fix from Jason Wang.

17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet.

18) xen-netback does not interpret the NAPI budget properly for TX work,
    fix from Paul Durrant.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
  igb: Fix for issue where values could be too high for udelay function.
  i40e: fix null dereference
  xen-netback: fix gso_prefix check
  net: make neigh_priv_len in struct net_device 16bit instead of 8bit
  drivers: net: cpsw: fix for cpsw crash when build as modules
  xen-netback: napi: don't prematurely request a tx event
  xen-netback: napi: fix abuse of budget
  sch_tbf: use do_div() for 64-bit divide
  udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
  net:fec: remove duplicate lines in comment about errata ERR006358
  Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
  8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
  xen-netback: make sure skb linear area covers checksum field
  net: smc91x: Fix device tree based configuration so it's usable
  udp: ipv4: fix potential use after free in udp_v4_early_demux()
  macvtap: signal truncated packets
  tun: unbreak truncated packet signalling
  net: sched: htb: fix the calculation of quantum
  net: sched: tbf: fix the calculation of max_size
  micrel: add support for KSZ8041RNLI
  ...

Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"This is a pretty small batch:

  The biggest single change is to stop using EFI time services on 32-bit
  platforms.  This matches our current behavior on 64-bit platforms as
  we already had ruled them out there as being too unreliable.  Turns
  out that affects 32-bit platforms, too.

  One NULL pointer fix for SGI UV.

  Two minor build fixes, one of which only affects icc and the other
  which affects icc and future versions or nonstandard default settings
  of gcc"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Don't use (U)EFI time services on 32 bit
  x86, build, icc: Remove uninitialized_var() from compiler-intel.h
  x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used
  x86, build: Pass in additional -mno-mmx, -mno-sse options

Merge tag 'pci-v3.13-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
"PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
      Wysocki)

  Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
      Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
      (Jason Gunthorpe)

  Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
      Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
      (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
      Marek)"

* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
  PCI: Disable Bus Master only on kexec reboot
  PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
  PCI: Omit PCI ID macro strings to shorten quirk names
  PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
  Revert "workqueue: allow work_on_cpu() to be called recursively"
  PCI: Avoid unnecessary CPU switch when calling driver .probe() method

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull SELinux fixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
  selinux: look for IPsec labels on both inbound and outbound packets
  selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
  selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
  selinux: fix possible memory leak

Revert "selinux: consider filesystem subtype in policies"

This reverts commit 102aefdda4d8275ce7d7100bc16c88c74272b260.

Tom London reports that it causes sync() to hang on Fedora rawhide:

https://bugzilla.redhat.com/show_bug.cgi?id=1033965

and Josh Boyer bisected it down to this commit. Reverting the commit in
the rawhide kernel fixes the problem.

Eric Paris root-caused it to incorrect subtype matching in that commit
breaking fuse, and has a tentative patch, but by now we're better off
retrying this in 3.14 rather than playing with it any more.

Reported-by: Tom London <selinux@gmail.com>
Bisected-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Anand Avati <avati@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

igb: Fix for issue where values could be too high for udelay function.

This patch changes the igb_phy_has_link function to check the value of the
parameter before deciding to use udelay or mdelay in order to be sure that
the value is not too high for udelay function.

CC: stable kernel <stable@vger.kernel.org> # 3.9+
Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com>
Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix null dereference

If the vsi->tx_rings structure is NULL we don't want to panic.

Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'edac_fixes_for_3.13' of git://git./linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
"Silence a compiler warning in sb_edac"

* tag 'edac_fixes_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
sb_edac: Shut up compiler warning when EDAC_DEBUG is enabled

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"This resolves some further issues with the dma mask changes on ARM
  which have been found by TI and others, and also some corner cases
  with the updates to the virtual to physical address translations.

  Konstantin also found some problems with the unwinder, which now
  performs tighter verification that the stack is valid while unwinding"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: fix asm/memory.h build error
  ARM: 7917/1: cacheflush: correctly limit range of memory region being flushed
  ARM: 7913/1: fix framepointer check in unwind_frame
  ARM: 7912/1: check stack pointer in get_wchan
  ARM: 7909/1: mm: Call setup_dma_zone() post early_paging_init()
  ARM: 7908/1: mm: Fix the arm_dma_limit calculation
  ARM: another fix for the DMA mapping checks

Merge tag 'arc-fixes-for-3.13' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
"These are couple of weeks old already, but I just couldn't get them to
  you earlier.

   - couple of fixes for recently added perf code
   - build time extable sort"

* tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [perf] Fix a few thinkos
  ARC: Add guard macro to uapi/asm/unistd.h
  ARC: extable: Enable sorting at build time

Merge tag 'dm-3.13-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
"A set of device-mapper fixes for 3.13.

  A fix for possible memory corruption during DM table load, fix a
  possible leak of snapshot space in case of a crash, fix a possible
  deadlock due to a shared workqueue in the delay target, fix to
  initialize read-only module parameters that are used to export metrics
  for dm stats and dm bufio.

  Quite a few stable fixes were identified for both the thin-
  provisioning and caching targets as a result of increased regression
  testing using the device-mapper-test-suite (dmts).  The most notable
  of these are the reference counting fixes for the space map btree that
  is used by the dm-array interface -- without these the dm-cache
  metadata will leak, resulting in dm-cache devices running out of
  metadata blocks.  Also, some important fixes related to the
  thin-provisioning target's transition to read-only mode on error"

* tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm array: fix a reference counting bug in shadow_ablock
  dm space map: disallow decrementing a reference count below zero
  dm stats: initialize read-only module parameter
  dm bufio: initialize read-only module parameters
  dm cache: actually resize cache
  dm cache: update Documentation for invalidate_cblocks's range syntax
  dm cache policy mq: fix promotions to occur as expected
  dm thin: allow pool in read-only mode to transition to read-write mode
  dm thin: re-establish read-only state when switching to fail mode
  dm thin: always fallback the pool mode if commit fails
  dm thin: switch to read-only mode if metadata space is exhausted
  dm thin: switch to read only mode if a mapping insert fails
  dm space map metadata: return on failure in sm_metadata_new_block
  dm table: fail dm_table_create on dm_round_up overflow
  dm snapshot: avoid snapshot space leak on crash
  dm delay: fix a possible deadlock due to shared workqueue

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

- Genius Gx Imperator Keyboard regression fix (missing break in case),
   by Ben Hutchings

- duplicate sysfs entry error fix for hid-sensor-hub driver, by
   Srinivas Pandruvada

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-sensor-hub: fix duplicate sysfs entry error
  HID: kye: Fix missing break in kye_report_fixup()

ARM: fix asm/memory.h build error

Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is
not defined:

In file included from arch/arm/include/asm/page.h:163:0,
                 from include/linux/mm_types.h:16,
                 from include/linux/sched.h:24,
                 from arch/arm/kernel/asm-offsets.c:13:
arch/arm/include/asm/memory.h: In function '__virt_to_phys':
arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function)
arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in
arch/arm/include/asm/memory.h: In function '__phys_to_virt':
arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function)

Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions")
Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge tag 'regulator-v3.13-rc3' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A small set of driver fixes plus one larger core change which changes
  the way we check to see if we're using DT so that there aren't any
  races between deciding we're using DT and the regulator subsystem
  noticing.

  This makes the new support for substituting a dummy regulator and
  optional regulators work a lot better on DT systems since it ensures
  that we don't trigger probe deferral when we shouldn't which was
  causing bugs in clients"

* tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: pfuze100: allow misprogrammed ID
  regulator: pfuze100: Fix address of FABID
  regulator: as3722: set the correct current limit
  regulator: core: Check for DT every time we check full constraints
  regulator: core: Replace checks of have_full_constraints with a function

Merge tag 'regmap-v3.13-rc3' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"Two small changes to fix some error handling and checking (both of
  which would be quite serious if the errors trigger) plus a trivial
  documentation fix"

* tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: use IS_ERR() to check clk_get() results
  regmap: make sure we unlock on failure in regmap_bulk_write
  regmap: trivial comment fix (copy'n'paste error)

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Here are two simple but wanted fixes for the i2c subsystem"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Check the return value from clk_prepare_enable()
i2c: mux: Inherit retry count and timeout from parent for muxed bus

Merge tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
"Two MTD fixes, for the pxa3xx-nand driver:

   - This driver was not ready to fully Armada 370 NAND, with
     particularly notable problems seen on flash with 2KB page sizes.
     This "compatible" entry really should have been held back until
     3.14 or later.

   - Fix a bug seen in rare cases on the error path of a failed probe
     attempt, where we free unallocated DMA resources"

* tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd:
  mtd: nand: pxa3xx: Use info->use_dma to release DMA resources
  Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Here is the common fixes PULL for dmaengine.

  Dan has been working on fixing the build issues in bunch of drivers.
  Here we have one fixing s3c24xx-dma, along with fix from Russell on
  pl08x.  Also we have Kuninori rcar dma fixes.  The s3c24xx-dma which
  was added in last merge window missed updates to usage of DMA_COMPLETE
  so converting the last driver"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dma: fix build breakage in s3c24xx-dma
  Fix pl08x warnings
  rcar-hpbdma: initialise plane information when halted
  rcar-hpbdma: fixup channel busy check for double plane
  rcar-hpbdma: add max transfer size
  dma: mmp_pdma: add missing platform_set_drvdata() in mmp_pdma_probe()
  dmaengine: s3c24xx-dma: use DMA_COMPLETE for dma completion status

dm array: fix a reference counting bug in shadow_ablock

An old array block could have its reference count decremented below
zero when it is being replaced in the btree by a new array block.

The fix is to increment the old ablock's reference count just before
inserting a new ablock into the btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

dm space map: disallow decrementing a reference count below zero

The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc"). To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.

Add a DMERR to reflect the potential seriousness of this error.

Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:
dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

Merge remote-tracking branch 'regulator/topic/constraints' into regulator-linus

Merge branch 'master' of git://git.infradead.org/users/pcmoore/selinux_fixes into for-linus

Merge branch 'akpm' (fixes from Andrew)

Merge patches from Andrew Morton:
  "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: memcg: do not allow task about to OOM kill to bypass the limit
  mm: memcg: fix race condition between memcg teardown and swapin
  thp: move preallocated PTE page table on move_huge_pmd()
  mfd/rtc: s5m: fix register updating by adding regmap for RTC
  rtc: s5m: enable IRQ wake during suspend
  rtc: s5m: limit endless loop waiting for register update
  rtc: s5m: fix unsuccesful IRQ request during probe
  drivers/rtc/rtc-s5m.c: fix info->rtc assignment
  include/linux/kernel.h: make might_fault() a nop for !MMU
  drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap
  procfs: also fix proc_reg_get_unmapped_area() for !MMU case
  mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
  include/linux/hugetlb.h: make isolate_huge_page() an inline

mm: memcg: do not allow task about to OOM kill to bypass the limit

Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more
gracefully") allowed tasks that already entered a memcg OOM condition to
bypass the memcg limit on subsequent allocation attempts hoping this
would expedite finishing the page fault and executing the kill.

David Rientjes is worried that this breaks memcg isolation guarantees
and since there is no evidence that the bypass actually speeds up fault
processing just change it so that these subsequent charge attempts fail
outright. The notable exception being __GFP_NOFAIL charges which are
required to bypass the limit regardless.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-bt: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcg: fix race condition between memcg teardown and swapin

There is a race condition between a memcg being torn down and a swapin
triggered from a different memcg of a page that was recorded to belong
to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension).  The
result is unreclaimable pages pointing to dead memcgs, which can lead to
anything from endless loops in later memcg teardown (the page is charged
to all hierarchical parents but is not on any LRU list) or crashes from
following the dangling memcg pointer.

Memcgs with tasks in them can not be torn down and usually charges don't
show up in memcgs without tasks.  Swapin with the CONFIG_MEMCG_SWAP
extension is the notable exception because it charges the cgroup that
was recorded as owner during swapout, which may be empty and in the
process of being torn down when a task in another memcg triggers the
swapin:

  teardown:                 swapin:

                            lookup_swap_cgroup_id()
                            rcu_read_lock()
                            mem_cgroup_lookup()
                            css_tryget()
                            rcu_read_unlock()
  disable css_tryget()
  call_rcu()
    offline_css()
      reparent_charges()
                            res_counter_charge() (hierarchical!)
                            css_put()
                              css_free()
                            pc->mem_cgroup = dead memcg
                            add page to dead lru

Add a final reparenting step into css_free() to make sure any such raced
charges are moved out of the memcg before it's finally freed.

In the longer term it would be cleaner to have the css_tryget() and the
res_counter charge under the same RCU lock section so that the charge
reparenting is deferred until the last charge whose tryget succeeded is
visible.  But this will require more invasive changes that will be
harder to evaluate and backport into stable, so better defer them to a
separate change set.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: move preallocated PTE page table on move_huge_pmd()

Andrey Wagin reported crash on VM_BUG_ON() in pgtable_pmd_page_dtor() with
fallowing backtrace:

  free_pgd_range+0x2bf/0x410
  free_pgtables+0xce/0x120
  unmap_region+0xe0/0x120
  do_munmap+0x249/0x360
  move_vma+0x144/0x270
  SyS_mremap+0x3b9/0x510
  system_call_fastpath+0x16/0x1b

The crash can be reproduce with this test case:

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stdio.h>
  #include <unistd.h>

  #define MB (1024 * 1024UL)
  #define GB (1024 * MB)

  int main(int argc, char **argv)
  {
char *p;
int i;

p = mmap((void *) GB, 10 * MB, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
for (i = 0; i < 10 * MB; i += 4096)
p[i] = 1;
mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED | MREMAP_MAYMOVE, 2 * GB);
return 0;
  }

Due to split PMD lock, we now store preallocated PTE tables for THP
pages per-PMD table.  It means we need to move them to other PMD table
if huge PMD moved there.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mfd/rtc: s5m: fix register updating by adding regmap for RTC

Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and
add new regmap for RTC.

On S5M8767A registers were not properly updated and read due to usage of
the same regmap as the PMIC. This could be observed in various hangs,
e.g. in infinite loop during waiting for UDR field change.

On this chip family the RTC has different I2C address than PMIC so
additional regmap is needed.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: enable IRQ wake during suspend

Add PM suspend/resume ops to rtc-s5m driver and enable IRQ wake during
suspend so the RTC would act like a wake up source. This allows waking
up from suspend to RAM on RTC alarm interrupt.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: limit endless loop waiting for register update

After setting alarm or time the driver is waiting for UDR register to be
cleared indicating that registers data have been transferred.

Limit the endless loop to only 5 retries.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: fix unsuccesful IRQ request during probe

Probe failed for rtc-s5m:

s5m-rtc s5m-rtc: Failed to request alarm IRQ: 12: -22
s5m-rtc: probe of s5m-rtc failed with error -22

Fix rtc-s5m interrupt request by using regmap_irq_get_virq() for mapping
the IRQ.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-s5m.c: fix info->rtc assignment

Fix this warning:

drivers/rtc/rtc-s5m.c: In function `s5m_rtc_probe':
drivers/rtc/rtc-s5m.c:545: warning: assignment from incompatible pointer type

struct s5m_rtc_info.rtc has type "struct regmap *", while
struct sec_pmic_dev.rtc has type "struct i2c_client *".

Probably the author wanted to assign "struct sec_pmic_dev.regmap", which
has the correct type.

Also, as "rtc" doesn't make much sense as a name for a regmap, rename it
to "regmap".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/kernel.h: make might_fault() a nop for !MMU

The machine cannot fault if !MUU, so make might_fault() a nop for !MMU.

This fixes below build error if
!CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y):

  arch/arm/kernel/built-in.o: In function `arch_ptrace':
  arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault'
  arch/arm/kernel/built-in.o: In function `restore_sigframe':
  arch/arm/kernel/signal.c:173: undefined reference to `might_fault'
  ...
  arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow
  make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap

Update month and day of month to the alarm month/day instead of current
day/month when setting the RTC alarm mask.

Signed-off-by: Linus Pizunski <linus@narrativeteam.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: also fix proc_reg_get_unmapped_area() for !MMU case

Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on
MMU-present architectures"), as its title says, took care of only the
MMU case, leaving the !MMU side still in the regressed state (returning
-EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).

From the fad1a86e25e0 changelog:

"Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
  proc_reg_get_unmapped_area in proc_reg_file_ops and
  proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
  get_unmapped_area method is not defined for the target procfs file, which
  causes regression of mmap on /proc/vmcore.

  To address this issue, like get_unmapped_area(), call default
  current->mm->get_unmapped_area on MMU-present architectures if
  pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
  in the procfs file, is not defined"

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcg: do not declare OOM from __GFP_NOFAIL allocations

Commit 84235de394d9 ("fs: buffer: move allocation failure loop into the
allocator") started recognizing __GFP_NOFAIL in memory cgroups but
forgot to disable the OOM killer.

Any task that does not fail allocation will also not enter the OOM
completion path. So don't declare an OOM state in this case or it'll be
leaked and the task be able to bypass the limit until the next
userspace-triggered page fault cleans up the OOM state.

Reported-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/hugetlb.h: make isolate_huge_page() an inline

With CONFIG_HUGETLBFS=n:

  mm/migrate.c: In function `do_move_page_to_node_array':
  include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value]
   #define isolate_huge_page(p, l) false
                                   ^
  mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page'
      isolate_huge_page(page, &pagelist);

Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Four security fixes for KVM on x86.  Thanks to Andrew Honig and Lars
  Bull from Google for reporting them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
  KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
  KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)
  KVM: Improve create VCPU parameter (CVE-2013-4587)

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Another week, another batch of fixes.

  Again, OMAP regressions due to move to DT is the bulk of the changes
  here, but this should be the last of it for 3.13.  There are also a
  handful of OMAP hwmod changes (power management, reset handling) for
  USB on OMAP3 that fixes some longish-standing bugs around USB resets.

  There are a couple of other changes that also add up line count a bit:
  One is a long-standing bug with the keyboard layout on one of the PXA
  platforms.  The other is a fix for highbank that moves their
  power-off/reset button handling to be done in-kernel since relying on
  userspace to handle it was fragile and awkward"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: sun6i: dt: Fix interrupt trigger types
  ARM: sun7i: dt: Fix interrupt trigger types
  MAINTAINERS: merge IMX6 entry into IMX
  ARM: tegra: add missing break to fuse initialization code
  ARM: pxa: prevent PXA270 occasional reboot freezes
  ARM: pxa: tosa: fix keys mapping
  ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected
  ARM: OMAP2+: hwmod: Fix usage of invalid iclk / oclk when clock node is not present
  ARM: OMAP3: hwmod data: Don't prevent RESET of USB Host module
  ARM: OMAP2+: hwmod: Fix SOFTRESET logic
  ARM: OMAP4+: hwmod data: Don't prevent RESET of USB Host module
  ARM: dts: Fix booting for secure omaps
  ARM: OMAP2+: Fix the machine entry for am3517
  ARM: dts: Fix missing entries for am3517
  ARM: OMAP2+: Fix overwriting hwmod data with data from device tree
  ARM: davinci: Fix McASP mem resource names
  ARM: highbank: handle soft poweroff and reset key events
  ARM: davinci: fix number of resources passed to davinci_gpio_register()
  gpio: davinci: fix check for unbanked gpio

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes.  It was rebased this morning, but
  I was just fixing signed-off-by tags with the wrong email"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix access_ok() check in btrfs_ioctl_send()
  Btrfs: make sure we cleanup all reloc roots if error happens
  Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation
  Btrfs: fix an oops when doing balance relocation
  Btrfs: don't miss skinny extent items on delayed ref head contention
  btrfs: call mnt_drop_write after interrupted subvol deletion
  Btrfs: don't clear the default compression type

Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linux

Pull nfsd reply cache bugfix from Bruce Fields:
"One bugfix for nfsd crashes"

* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
nfsd: when reusing an existing repcache entry, unhash it first

mtd: nand: pxa3xx: Use info->use_dma to release DMA resources

In commit:

  commit 62e8b851783138a11da63285be0fbf69530ff73d
  Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
  Date:   Fri Oct 4 15:30:38 2013 -0300

  mtd: nand: pxa3xx: Allocate data buffer on detected flash size

the way the buffer is allocated was changed: the first READ_ID is issued
with a small kmalloc'ed buffer. Only once the flash page size is detected
the DMA buffers are allocated, and info->use_dma is set.

Currently, if the device detection fails, the driver checks the 'use_dma'
module parameter and tries to release unallocated DMA resources.

Fix this by checking the proper indicator of the DMA allocation, which
is 'info->use_dma'.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"

This partially reverts c0f3b8643a6fa2461d70760ec49d21d2b031d611.

The "armada370-nand" compatible support is not complete, and it was mistake
to add it. Revert it and postpone the support until the infrastructure is
in place.

Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>

selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()

Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: look for IPsec labels on both inbound and outbound packets

Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()

In selinux_ip_postroute() we perform access checks based on the
packet's security label.  For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets.  In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket.  Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.

See the inline comments for more explanation.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()

In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.

Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

i2c: imx: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

A guest can cause a BUG_ON() leading to a host kernel crash.
When the guest writes to the ICR to request an IPI, while in x2apic
mode the following things happen, the destination is read from
ICR2, which is a register that the guest can control.

kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
cluster id. A BUG_ON is triggered, which is a protection against
accessing map->logical_map with an out-of-bounds access and manages
to avoid that anything really unsafe occurs.

The logic in the code is correct from real HW point of view. The problem
is that KVM supports only one cluster with ID 0 in clustered mode, but
the code that has the bug does not take this into account.

Reported-by: Lars Bull <larsbull@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
potential to corrupt kernel memory if userspace provides an address that
is at the end of a page.  This patches concerts those functions to use
kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
vapic_address specified by userspace during ioctl processing and returns
an error to userspace if the address is not a valid GPA.

This is generally not guest triggerable, because the required write is
done by firmware that runs before the guest.  Also, it only affects AMD
processors and oldish Intel that do not have the FlexPriority feature
(unless you disable FlexPriority, of course; then newer processors are
also affected).

Fixes: b93463aa59d6 ('KVM: Accelerated apic support')

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)

Under guest controllable circumstances apic_get_tmcct will execute a
divide by zero and cause a crash. If the guest cpuid support
tsc deadline timers and performs the following sequence of requests
the host will crash.
- Set the mode to periodic
- Set the TMICT to 0
- Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
- Set the TMICT to non-zero.
Then the lapic_timer.period will be 0, but the TMICT will not be. If the
guest then reads from the TMCCT then the host will perform a divide by 0.

This patch ensures that if the lapic_timer.period is 0, then the division
does not occur.

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Improve create VCPU parameter (CVE-2013-4587)

In multiple functions the vcpu_id is used as an offset into a bitfield.  Ag
malicious user could specify a vcpu_id greater than 255 in order to set or
clear bits in kernel memory.  This could be used to elevate priveges in the
kernel.  This patch verifies that the vcpu_id provided is less than 255.
The api documentation already specifies that the vcpu_id must be less than
max_vcpus, but this is currently not checked.

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i2c: mux: Inherit retry count and timeout from parent for muxed bus

If a muxed i2c bus gets created the default retry count and
timeout of the muxed bus is zero. Hence it it possible that you
end up with a situation where the parent controller sets a default
retry count and timeout which gets applied and used while the muxed
bus (using the same controller) has a default retry count of zero
and a default timeout of 1s (set in i2c_add_adapter()). This can be
solved by initializing the retry count and timeout of the muxed
bus with the values used by the the parent at creation time.

Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

Merge tag 'sound-3.13-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Still a slightly high amount of changes than wished, but they are all
  good regression and/or device-specific fixes.  Majority of commits are
  for HD-audio, an HDMI ctl index fix that hits old graphics boards,
  regression fixes for AD codecs and a few quirks.

  Other than that, two major fixes are included: a 64bit ABI fix for
  compress offload, and 64bit dma_addr_t truncation fix, which had hit
  on PAE kernels"

* tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add static DAC/pin mapping for AD1986A codec
  ALSA: hda - One more Dell headset detection quirk
  ALSA: hda - hdmi: Fix IEC958 ctl indexes for some simple HDMI devices
  ALSA: hda - Mute all aamix inputs as default
  ALSA: compress: Fix 64bit ABI incompatibility
  ALSA: memalloc.h - fix wrong truncation of dma_addr_t
  ALSA: hda - Another Dell headset detection quirk
  ALSA: hda - A Dell headset detection quirk
  ALSA: hda - Remove quirk for Dell Vostro 131
  ALSA: usb-audio: fix uninitialized variable compile warning
  ALSA: hda - fix mic issues on Acer Aspire E-572

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"A fix for recent sysfs breakage in serio subsystem plus a fixup to
  adxl34x driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: adxl34x - Fix bug in definition of ADXL346_2D_ORIENT
  Input: serio - fix sysfs layout

xen-netback: fix gso_prefix check

There is a mistake in checking the gso_prefix mask when passing large
packets to a guest. The wrong shift is applied to the bit - the raw skb
gso type is used rather then the translated one. This leads to large packets
being handed to the guest without the GSO metadata. This patch fixes the
check.

The mistake manifested as errors whilst running Microsoft HCK large packet
offload tests between a pair of Windows 8 VMs. I have verified this patch
fixes those errors.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make neigh_priv_len in struct net_device 16bit instead of 8bit

neigh_priv_len is defined as u8. With all debug enabled struct
ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96
bytes and here the spinlock with 72 bytes.
The size value still fits in this u8 leaving some room for more.

On -RT struct ipoib_neigh put on weight and has 392 bytes. The main
reason is sk_buff_head with 288 and the fatty here is spinlock with 192
bytes. This does no longer fit into into neigh_priv_len and gcc
complains.

This patch changes neigh_priv_len from being 8bit to 16bit. Since the
following element (dev_id) is 16bit followed by a spinlock which is
aligned, the struct remains with a total size of 3200 (allmodconfig) /
2048 (with as much debug off as possible) bytes on x86-64.
On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug
off as possible) bytes long. The numbers were gained with and without
the patch to prove that this change does not increase the size of the
struct.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"A dvb core deadlock fix, a couple videobuf2 fixes an a series of media
  driver fixes"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (30 commits)
  [media] videobuf2-dma-sg: fix possible memory leak
  [media] vb2: regression fix: always set length field.
  [media] mt9p031: Include linux/of.h header
  [media] rtl2830: add parent for I2C adapter
  [media] media: marvell-ccic: use devm to release clk
  [media] ths7303: Declare as static a private function
  [media] em28xx-video: Swap release order to avoid lock nesting
  [media] usbtv: Add support for PAL video source
  [media] media_tree: Fix spelling errors
  [media] videobuf2: Add support for file access mode flags for DMABUF exporting
  [media] radio-shark2: Mark shark_resume_leds() inline to kill compiler warning
  [media] radio-shark: Mark shark_resume_leds() inline to kill compiler warning
  [media] af9035: unlock on error in af9035_i2c_master_xfer()
  [media] af9033: fix broken I2C
  [media] v4l: omap3isp: Don't check for missing get_fmt op on remote subdev
  [media] af9035: fix broken I2C and USB I/O
  [media] wm8775: fix broken audio routing
  [media] marvell-ccic: drop resource free in driver remove
  [media] tef6862/radio-tea5764: actually assign clamp result
  [media] cx231xx: use after free on error path in probe
  ...

Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix HIH-6130 driver to work with BeagleBone"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: HIH-6130: Support I2C bus drivers without I2C_FUNC_SMBUS_QUICK

Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging

Pull hwmon fixes from Jean Delvare.

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: Prevent some divide by zeros in FAN_TO_REG()
  hwmon: (w83l768ng) Fix fan speed control range
  hwmon: (w83l786ng) Fix fan speed control mode setting and reporting
  hwmon: (lm90) Unregister hwmon device if interrupt setup fails

drivers: net: cpsw: fix for cpsw crash when build as modules

When CPSW and Davinci MDIO are build as modules, CPSW crashes when
accessing CPSW registers in CPSW probe. The same is working in built-in
as the CPSW clocks are enabled in Davindi MDIO probe, SO Enabling the
clocks before accessing the version register and moving out the other
register access to cpsw device open.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

word-at-a-time: provide generic big-endian zero_bytemask implementation

Whilst architectures may be able to do better than this (which they can,
by simply defining their own macro), this is a generic stab at a
zero_bytemask implementation for the asm-generic, big-endian
word-at-a-time implementation.

On arm64, a clz instruction is used to implement the fls efficiently.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dcache: allow word-at-a-time name hashing with big-endian CPUs

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper-bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen-netback: napi: don't prematurely request a tx event

This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the
former call has the side effect of advancing the ring event pointer and
therefore inviting another interrupt from the frontend before the napi
poll has actually finished, thereby defeating the point of napi.

The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_poll, the napi poll function, if the work done is less than the
budget i.e. when actually transitioning back to interrupt mode.

Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: napi: fix abuse of budget

netback seems to be somewhat confused about the napi budget parameter. The
parameter is supposed to limit the number of skbs processed in each poll,
but netback has this confused with grant operations.

This patch fixes that, properly limiting the work done in each poll. Note
that this limit makes sure we do not process any more data from the shared
ring than we intend to pass back from the poll. This is important to
prevent tx_queue potentially growing without bound.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio

Pull iommu fixes from Alex Williamson:
"arm/smmu driver updates via Will Deacon fixing locking around page
  table walks and a couple other issues"

* tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio:
  iommu/arm-smmu: fix error return code in arm_smmu_device_dt_probe()
  iommu/arm-smmu: remove potential NULL dereference on mapping path
  iommu/arm-smmu: use mutex instead of spinlock for locking page tables

Merge tag 'keys-devel-20131210' of git://git./linux/kernel/git/dhowells/linux-fs

Pull misc keyrings fixes from David Howells:
"These break down into five sets:

   - A patch to error handling in the big_key type for huge payloads.
     If the payload is larger than the "low limit" and the backing store
     allocation fails, then big_key_instantiate() doesn't clear the
     payload pointers in the key, assuming them to have been previously
     cleared - but only one of them is.

     Unfortunately, the garbage collector still calls big_key_destroy()
     when sees one of the pointers with a weird value in it (and not
     NULL) which it then tries to clean up.

   - Three patches to fix the keyring type:

     * A patch to fix the hash function to correctly divide keyrings off
       from keys in the topology of the tree inside the associative
       array.  This is only a problem if searching through nested
       keyrings - and only if the hash function incorrectly puts the a
       keyring outside of the 0 branch of the root node.

     * A patch to fix keyrings' use of the associative array.  The
       __key_link_begin() function initially passes a NULL key pointer
       to assoc_array_insert() on the basis that it's holding a place in
       the tree whilst it does more allocation and stuff.

       This is only a problem when a node contains 16 keys that match at
       that level and we want to add an also matching 17th.  This should
       easily be manufactured with a keyring full of keyrings (without
       chucking any other sort of key into the mix) - except for (a)
       above which makes it on average adding the 65th keyring.

     * A patch to fix searching down through nested keyrings, where any
       keyring in the set has more than 16 keyrings and none of the
       first keyrings we look through has a match (before the tree
       iteration needs to step to a more distal node).

     Test in keyutils test suite:

        http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1

   - A patch to fix the big_key type's use of a shmem file as its
     backing store causing audit messages and LSM check failures.  This
     is done by setting S_PRIVATE on the file to avoid LSM checks on the
     file (access to the shmem file goes through the keyctl() interface
     and so is gated by the LSM that way).

     This isn't normally a problem if a key is used by the context that
     generated it - and it's currently only used by libkrb5.

     Test in keyutils test suite:

        http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e

   - A patch to add a generated file to .gitignore.

   - A patch to fix the alignment of the system certificate data such
     that it it works on s390.  As I understand it, on the S390 arch,
     symbols must be 2-byte aligned because loading the address discards
     the least-significant bit"

* tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  KEYS: correct alignment of system_certificate_list content in assembly file
  Ignore generated file kernel/x509_certificate_list
  security: shmem: implement kernel private shmem inodes
  KEYS: Fix searching of nested keyrings
  KEYS: Fix multiple key add into associative array
  KEYS: Fix the keyring hash function
  KEYS: Pre-clear struct key on allocation

Merge tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:

- fix for buffer overrun in agfl with growfs on v4 superblock

- return EINVAL if requested discard length is less than a block

- fix possible memory corruption in xfs_attrlist_by_handle()

* tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: growfs overruns AGFL buffer on V4 filesystems
  xfs: don't perform discard if the given range length is less than block size
  xfs: underflow bug in xfs_attrlist_by_handle()

futex: move user address verification up to common code

When debugging the read-only hugepage case, I was confused by the fact
that get_futex_key() did an access_ok() only for the non-shared futex
case, since the user address checking really isn't in any way specific
to the private key handling.

Now, it turns out that the shared key handling does effectively do the
equivalent checks inside get_user_pages_fast() (it doesn't actually
check the address range on x86, but does check the page protections for
being a user page). So it wasn't actually a bug, but the fact that we
treat the address differently for private and shared futexes threw me
for a loop.

Just move the check up, so that it gets done for both cases. Also, use
the 'rw' parameter for the type, even if it doesn't actually matter any
more (it's a historical artifact of the old racy i386 "page faults from
kernel space don't check write protections").

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>