Peter Hurley [Wed, 24 Jul 2013 12:29:56 +0000 (08:29 -0400)]
n_tty: Factor LNEXT processing from per-char i/o path
LNEXT processing accounts for ~15% of total cpu time in end-to-end
tty i/o; factor the lnext test/clear from the per-char i/o path.
Instead, attempt to immediately handle the literal next char if not
at the end of this received buffer; otherwise, handle the first char
of the next received buffer as the literal next char, then continue
with normal i/o.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:55 +0000 (08:29 -0400)]
n_tty: Un-inline single-use functions
gcc will likely inline these single-use functions anyway; remove
inline modifier.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:54 +0000 (08:29 -0400)]
n_tty: Remove overflow tests from receive_buf() path
Always pre-figure the space available in the read_buf and limit
the inbound receive request to that amount.
For compatibility reasons with the non-flow-controlled interface,
n_tty_receive_buf() will continue filling read_buf until all data
has been received or receive_room() returns 0.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:53 +0000 (08:29 -0400)]
n_tty: Factor PARMRK from normal per-char i/o
Handle PARMRK processing on the slow per-char i/o path.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:52 +0000 (08:29 -0400)]
n_tty: Factor ISTRIP and IUCLC receive_buf into separate fn
Convert to modal receive_buf processing; factor char receive
processing for unusual termios settings out of normal per-char
i/o path.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:51 +0000 (08:29 -0400)]
n_tty: Split n_tty_receive_char()
Factor 'special' per-char processing into standalone fn,
n_tty_receive_char_special(), which handles processing for chars
marked in the char_map.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:50 +0000 (08:29 -0400)]
n_tty: Eliminate char tests from IXANY restart test
Relocate the IXANY restart tty test to code paths where the
the received char is not START_CHAR, STOP_CHAR, INTR_CHAR,
QUIT_CHAR or SUSP_CHAR.
Fixes the condition when ISIG if off and one of INTR_CHAR,
QUIT_CHAR or SUSP_CHAR does not restart i/o.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 12:29:49 +0000 (08:29 -0400)]
n_tty: Factor standard per-char i/o into separate fn
Simplify __receive_buf() into a dispatch function; perform per-char
processing for all other modes not already handled.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Wed, 24 Jul 2013 13:30:05 +0000 (09:30 -0400)]
n_tty: Fix build breakage on ppc64
Commit
20bafb3d23d108bc0a896eb8b7c1501f4f649b77
'n_tty: Move buffers into n_tty_data'
broke the ppc64 build.
Include vmalloc.h for the required function declarations.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:27 +0000 (10:21 -0400)]
n_tty: Factor tty->closing receive_buf() into separate fn
Convert to modal receive_buf() processing; factor receive char
processing when tty->closing into n_tty_receive_buf_closing().
Note that EXTPROC when ISTRIP or IUCLC is set continues to be
handled by n_tty_receive_char().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:26 +0000 (10:21 -0400)]
n_tty: Special case EXTPROC receive_buf() as raw mode
When EXTPROC is set without ISTRIP or IUCLC, processing is
identical to raw mode; handle this receiving mode as a special-case
of raw mode.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:25 +0000 (10:21 -0400)]
n_tty: Factor raw mode receive_buf() into separate fn
Convert to modal receive_buf() processing; factor raw mode
per-char i/o into n_tty_receive_buf_raw().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:24 +0000 (10:21 -0400)]
n_tty: Factor flagged char handling into separate fn
Prepare for modal receive_buf() handling; factor handling for
TTY_BREAK, TTY_PARITY, TTY_FRAME and TTY_OVERRUN into
n_tty_receive_char_flagged().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:23 +0000 (10:21 -0400)]
n_tty: Factor signal char handling into separate fn
Reduce the monolithic n_tty_receive_char() complexity; factor the
handling of INTR_CHAR, QUIT_CHAR and SUSP_CHAR into
n_tty_receive_signal_char().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:22 +0000 (10:21 -0400)]
n_tty: Factor 'real raw' receive_buf into standalone fn
Convert to modal receive_buf() processing; factor real_raw
receive_buf() into n_tty_receive_buf_real_raw().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:21 +0000 (10:21 -0400)]
n_tty: Simplify __receive_buf loop count
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:20 +0000 (10:21 -0400)]
n_tty: Rename process_char_map to char_map
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:19 +0000 (10:21 -0400)]
n_tty: Move buffers into n_tty_data
Reduce pointer reloading and improve locality-of-reference;
allocate read_buf and echo_buf within struct n_tty_data.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:18 +0000 (10:21 -0400)]
n_tty: Remove alias ptrs in __receive_buf()
The char and flag buffer local alias pointers, p and f, are
unnecessary; remove them.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:21:17 +0000 (10:21 -0400)]
n_tty: Fix EOF push handling
In canonical mode, an EOF which is not the first character of the line
causes read() to complete and return the number of characters read so
far (commonly referred to as EOF push). However, if the previous read()
returned because the user buffer was full _and_ the next character
is an EOF not at the beginning of the line, read() must not return 0,
thus mistakenly indicating the end-of-file condition.
The TTY_PUSH flag is used to indicate an EOF was received which is not
at the beginning of the line. Because the EOF push condition is
evaluated by a thread other than the read(), multiple EOF pushes can
cause a premature end-of-file to be indicated.
Instead, discover the 'EOF push as first read character' condition
from the read() thread itself, and restart the i/o loop if detected.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:29 +0000 (10:04 -0400)]
n_tty: Avoid false-sharing echo buffer indices
Separate the head & commit indices from the tail index to avoid
cache-line contention (so called 'false-sharing') between concurrent
threads.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:28 +0000 (10:04 -0400)]
n_tty: Eliminate counter in __process_echoes
Since neither echo_commit nor echo_tail can change for the duration
of __process_echoes loop, substitute index comparison for the
snapshot counter.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:27 +0000 (10:04 -0400)]
n_tty: Only flush echo output if actually output
Don't have the driver flush received echoes if no echoes were
actually output.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:26 +0000 (10:04 -0400)]
n_tty: Process echoes in blocks
Byte-by-byte echo output is painfully slow, requiring a lock/unlock
cycle for every input byte.
Instead, perform the echo output in blocks of 256 characters, and
at least once per flip buffer receive. Enough space is reserved in
the echo buffer to guarantee a full block can be saved without
overrunning the echo output. Overrun is prevented by discarding
the oldest echoes until enough space exists in the echo buffer
to receive at least a full block of new echoes.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:25 +0000 (10:04 -0400)]
n_tty: Eliminate echo_commit memory barrier
Use output_lock mutex as a memory barrier when storing echo_commit.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:24 +0000 (10:04 -0400)]
n_tty: Remove echo_lock
Adding data to echo_buf (via add_echo_byte()) is guaranteed to be
single-threaded, since all callers are from the n_tty_receive_buf()
path. Processing the echo_buf can be called from either the
n_tty_receive_buf() path or the n_tty_write() path; however, these
callers are already serialized by output_lock.
Publish cumulative echo_head changes to echo_commit; process echo_buf
from echo_tail to echo_commit; remove echo_lock.
On echo_buf overrun, claim output_lock to serialize changes to
echo_tail.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:23 +0000 (10:04 -0400)]
n_tty: Replace echo_cnt with computed value
Prepare for lockless echo_buf handling; compute current byte count
of echo_buf from head and tail indices.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:22 +0000 (10:04 -0400)]
n_tty: Use separate head and tail indices for echo_buf
Instead of using a single index to track the current echo_buf position,
use a head index when adding to the buffer and a tail index when
consuming from the buffer. Allow these head and tail indices to wrap
at max representable value; perform modulo reduction via helper
functions when accessing the buffer.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 14:04:21 +0000 (10:04 -0400)]
n_tty: Remove unused echo_overrun field
The echo_overrun field is only assigned and never tested; remove it.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:16 +0000 (09:36 -0400)]
tty: Remove private constant from global namespace
TTY_BUFFER_PAGE is only used within drivers/tty/tty_buffer.c;
relocate to that file scope.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:15 +0000 (09:36 -0400)]
tty: Fix unsafe vt paste_selection()
Convert the tty_buffer_flush() exclusion mechanism to a
public interface - tty_buffer_lock/unlock_exclusive() - and use
the interface to safely write the paste selection to the line
discipline.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:14 +0000 (09:36 -0400)]
tty: Merge __tty_flush_buffer() into lone call site
__tty_flush_buffer() is now only called by tty_flush_buffer();
merge functions.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:13 +0000 (09:36 -0400)]
tty: Use non-atomic state to signal flip buffer flush pending
Atomic bit ops are no longer required to indicate a flip buffer
flush is pending, as the flush_mutex is sufficient barrier.
Remove the unnecessary port .iflags field and localize flip buffer
state to struct tty_bufhead.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:12 +0000 (09:36 -0400)]
tty: Avoid false-sharing flip buffer ptrs
Separate the head and tail ptrs to avoid cache-line contention
(so called 'false-sharing') between concurrent threads.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:11 +0000 (09:36 -0400)]
tty: Only perform flip buffer flush from tty_buffer_flush()
Now that dropping the buffer lock is not necessary (as result of
converting the spin lock to a mutex), the flip buffer flush no
longer needs to be handled by the buffer work.
Simply signal a flush is required; the buffer work will exit the
i/o loop, which allows tty_buffer_flush() to proceed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:10 +0000 (09:36 -0400)]
tty: Ensure single-threaded flip buffer consumer with mutex
The buffer work may race with parallel tty_buffer_flush. Use a
mutex to guarantee exclusive modify access to the head flip
buffer.
Remove the unneeded spin lock.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:09 +0000 (09:36 -0400)]
tty: Make driver-side flip buffers lockless
Driver-side flip buffer input is already single-threaded; 'publish'
the .next link as the last operation on the tail buffer so the
'consumer' sees the already-completed flip buffer.
The commit buffer index is already 'published' by driver-side functions.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:08 +0000 (09:36 -0400)]
tty: Track flip buffer memory limit atomically
Lockless flip buffers require atomically updating the bytes-in-use
watermark.
The pty driver also peeks at the watermark value to limit
memory consumption to a much lower value than the default; query
the watermark with new fn, tty_buffer_space_avail().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:07 +0000 (09:36 -0400)]
tty: Simplify flip buffer list with 0-sized sentinel
Use a 0-sized sentinel to avoid assigning the head ptr from
the driver side thread. This also eliminates testing head/tail
for NULL.
When the sentinel is first 'consumed' by the buffer work
(or by tty_buffer_flush()), it is detached from the list but not
freed nor added to the free list. Both buffer work and
tty_buffer_flush() continue to preserve at least 1 flip buffer
to which head & tail is pointed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:06 +0000 (09:36 -0400)]
tty: Use lockless flip buffer free list
In preparation for lockless flip buffers, make the flip buffer
free list lockless.
NB: using llist is not the optimal solution, as the driver and
buffer work may contend over the llist head unnecessarily. However,
test measurements indicate this contention is low.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:05 +0000 (09:36 -0400)]
tty: Use generic names for flip buffer list cursors
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:04 +0000 (09:36 -0400)]
tty: Merge tty_buffer_find() into tty_buffer_alloc()
tty_buffer_find() implements a simple free list lookaside cache.
Merge this functionality into tty_buffer_alloc() to reflect the
more traditional alloc/free symmetry.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:03 +0000 (09:36 -0400)]
tty: Factor flip buffer initialization into helper function
Factor shared code; prepare for adding 0-sized sentinel flip buffer.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:02 +0000 (09:36 -0400)]
tty: Fix flip buffer free list
Since flip buffers are size-aligned to 256 bytes and all flip
buffers 512-bytes or larger are not added to the free list, the
free list only contains 256-byte flip buffers.
Remove the list search when allocating a new flip buffer.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:36:01 +0000 (09:36 -0400)]
tty: Compute flip buffer ptrs
The char_buf_ptr and flag_buf_ptr values are trivially derived from
the .data field offset; compute values as needed.
Fixes a long-standing type-mismatch with the char and flag ptrs.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:36 +0000 (09:14 -0400)]
n_tty: Queue buffer work on any available cpu
Scheduling buffer work on the same cpu as the read() thread
limits the parallelism now possible between the receive_buf path
and the n_tty_read() path.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Tue, 23 Jul 2013 12:47:30 +0000 (08:47 -0400)]
n_tty: Special case pty flow control
The pty driver forces ldisc flow control on, regardless of available
receive buffer space, so the writer can be woken whenever unthrottle
is called. However, this 'forced throttle' has performance
consequences, as multiple atomic operations are necessary to
unthrottle and perform the write wakeup for every input line (in
canonical mode).
Instead, short-circuit the unthrottle if the tty is a pty and perform
the write wakeup directly.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:34 +0000 (09:14 -0400)]
n_tty: Move n_tty_write_wakeup() to avoid forward declaration
Prepare to special case pty flow control; avoid forward declaration.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:33 +0000 (09:14 -0400)]
n_tty: Factor throttle/unthrottle into helper functions
Prepare for special handling of pty throttle/unthrottle; factor
flow control into helper functions.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:32 +0000 (09:14 -0400)]
n_tty: Move chars_in_buffer() to factor throttle/unthrottle
Prepare to factor throttle and unthrottle into helper functions;
relocate chars_in_buffer() to avoid forward declaration.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:31 +0000 (09:14 -0400)]
tty: Only guarantee termios read safety for throttle/unthrottle
No tty driver modifies termios during throttle() or unthrottle().
Therefore, only read safety is required.
However, tty_throttle_safe and tty_unthrottle_safe must still be
mutually exclusive; introduce throttle_mutex for that purpose.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:30 +0000 (09:14 -0400)]
n_tty: Separate buffer indices to prevent cache-line sharing
If the read buffer indices are in the same cache-line, cpus will
contended over the cache-line (so called 'false sharing').
Separate the producer-published fields from the consumer-published
fields; document the locks relevant to each field.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:29 +0000 (09:14 -0400)]
n_tty: Don't wait for buffer work in read() loop
User-space read() can run concurrently with receiving from device;
waiting for receive_buf() to complete is not required.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:28 +0000 (09:14 -0400)]
n_tty: Fix type mismatches in receive_buf raw copy
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:27 +0000 (09:14 -0400)]
n_tty: Reset lnext if canonical mode changes
lnext escapes the next input character as a literal, and must
be reset when canonical mode changes (to avoid misinterpreting
a special character as a literal if canonical mode is changed
back again).
lnext is specifically not reset on a buffer flush so as to avoid
misinterpreting the next input character as a special character.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:26 +0000 (09:14 -0400)]
n_tty: Make N_TTY ldisc receive path lockless
n_tty has a single-producer/single-consumer input model;
use lockless publish instead.
Use termios_rwsem to exclude both consumer and producer while
changing or resetting buffer indices, eg., when flushing. Also,
claim exclusive termios_rwsem to safely retrieve the buffer
indices from a thread other than consumer or producer
(eg., TIOCINQ ioctl).
Note the read_tail is published _after_ clearing the newline
indicator in read_flags to avoid racing the producer.
Drop read_lock spinlock.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:25 +0000 (09:14 -0400)]
n_tty: Replace canon_data with index comparison
canon_data represented the # of lines which had been copied
to the receive buffer but not yet copied to the user buffer.
The value was tested to determine if input was available in
canonical mode (and also to force input overrun if the
receive buffer was full but a newline had not been received).
However, the actual count was irrelevent; only whether it was
non-zero (meaning 'is there any input to transfer?'). This
shared count is unnecessary and unsafe with a lockless algorithm.
The same check is made by comparing canon_head with read_tail instead.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:24 +0000 (09:14 -0400)]
n_tty: Access termios values safely
Use termios_rwsem to guarantee safe access to the termios values.
This is particularly important for N_TTY as changing certain termios
settings alters the mode of operation.
termios_rwsem must be dropped across throttle/unthrottle since
those functions claim the termios_rwsem exclusively (to guarantee
safe access to the termios and for mutual exclusion).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:23 +0000 (09:14 -0400)]
tty: Convert termios_mutex to termios_rwsem
termios is commonly accessed unsafely (especially by N_TTY)
because the existing mutex forces exclusive access.
Convert existing usage.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:22 +0000 (09:14 -0400)]
n_tty: Remove read_cnt
Storing the read_cnt creates an unnecessary shared variable
between the single-producer (n_tty_receive_buf()) and the
single-consumer (n_tty_read()).
Compute read_cnt from head & tail instead of storing.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:21 +0000 (09:14 -0400)]
n_tty: Don't wrap input buffer indices at buffer size
Wrap read_buf indices (read_head, read_tail, canon_head) at
max representable value, instead of at the N_TTY_BUF_SIZE. This step
is necessary to allow lockless reads of these shared variables
(by updating the variables atomically).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:20 +0000 (09:14 -0400)]
n_tty: Get read_cnt through accessor
Prepare for replacing read_cnt field with computed value.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:19 +0000 (09:14 -0400)]
tty: Deprecate ldisc .chars_in_buffer() method
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:18 +0000 (09:14 -0400)]
n_tty: Split n_tty_chars_in_buffer() for reader-only interface
N_TTY .chars_in_buffer() method requires serialized access if
the current thread is not the single-consumer, n_tty_read().
Separate the internal interface; prepare for lockless read-side.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:17 +0000 (09:14 -0400)]
n_tty: Line copy to user buffer in canonical mode
Instead of pushing one char per loop, pre-compute the data length
to copy and copy all at once.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:16 +0000 (09:14 -0400)]
n_tty: Factor canonical mode copy from n_tty_read()
Simplify n_tty_read(); extract complex copy algorithm
into separate function, canon_copy_to_user().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:15 +0000 (09:14 -0400)]
tty: Make ldisc input flow control concurrency-friendly
Although line discipline receiving is single-producer/single-consumer,
using tty->receive_room to manage flow control creates unnecessary
critical regions requiring additional lock use.
Instead, introduce the optional .receive_buf2() ldisc method which
returns the # of bytes actually received. Serialization is guaranteed
by the caller.
In turn, the line discipline should schedule the buffer work item
whenever space becomes available; ie., when there is room to receive
data and receive_room() previously returned 0 (the buffer work
item stops processing if receive_buf2() returns 0). Note the
'no room' state need not be atomic despite concurrent use by two
threads because only the buffer work thread can set the state and
only the read() thread can clear the state.
Add n_tty_receive_buf2() as the receive_buf2() method for N_TTY.
Provide a public helper function, tty_ldisc_receive_buf(), to use
when directly accessing the receive_buf() methods.
Line disciplines not using input flow control can continue to set
tty->receive_room to a fixed value and only provide the receive_buf()
method.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:14 +0000 (09:14 -0400)]
tty: Simplify tty buffer/ldisc interface with helper function
Ldisc interface functions must be called with interrupts enabled.
Separating the ldisc calls into a helper function simplies the
eventual removal of the spinlock.
Note that access to the buf->head ptr outside the spinlock is
safe here because;
* __tty_buffer_flush() is prevented from running while buffer work
performs i/o,
* tty_buffer_find() only assigns buf->head if the flip buffer list
is empty (which is never the case in flush_to_ldisc() since at
least one buffer is always left in the list after use)
Access to the read index outside the spinlock is safe here for the
same reasons.
Update the buffer's read index _after_ the data has been received
by the ldisc.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 13:14:13 +0000 (09:14 -0400)]
tty: Don't change receive_room for ioctl(TIOCSETD)
tty_set_ldisc() is guaranteed exclusive use of the line discipline
by tty_ldisc_lock_pair_timeout(); shutting off input by resetting
receive_room is unnecessary.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:51 +0000 (07:04 -0400)]
tty: Clarify multiple-references comment in TIOCSETD ioctl
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:50 +0000 (07:04 -0400)]
tty: Fix hangup race with TIOCSETD ioctl
The hangup may already have happened; check for that state also.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:49 +0000 (07:04 -0400)]
tty: Clarify ldisc variable
Rename o_ldisc to avoid confusion with the ldisc of the
'other' tty.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:48 +0000 (07:04 -0400)]
tty: Replace ldisc locking with ldisc_sem
Line discipline locking was performed with a combination of
a mutex, a status bit, a count, and a waitqueue -- basically,
a rw semaphore.
Replace the existing combination with an ld_semaphore.
Fixes:
1) the 'reference acquire after ldisc locked' bug
2) the over-complicated halt mechanism
3) lock order wrt. tty_lock()
4) dropping locks while changing ldisc
5) previously unidentified deadlock while locking ldisc from
both linked ttys concurrently
6) previously unidentified recursive deadlocks
Adds much-needed lockdep diagnostics.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:47 +0000 (07:04 -0400)]
tty: Add lock/unlock ldisc pair functions
Just as the tty pair must be locked in a stable sequence
(ie, independent of which is consider the 'other' tty), so must
the ldisc pair be locked in a stable sequence as well.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Sat, 15 Jun 2013 11:04:46 +0000 (07:04 -0400)]
tty: Fix tty_ldisc_lock name collision
The file scope spinlock identifier, tty_ldisc_lock, will collide
with the file scope lock function tty_ldisc_lock() so rename it.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 21 Jul 2013 19:05:29 +0000 (12:05 -0700)]
Linux 3.11-rc2
Linus Torvalds [Sun, 21 Jul 2013 17:11:04 +0000 (10:11 -0700)]
Merge tag 'acpi-video-3.11' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI video support fixes from Rafael Wysocki:
"I'm sending a separate pull request for this as it may be somewhat
controversial. The breakage addressed here is not really new and the
fixes may not satisfy all users of the affected systems, but we've had
so much back and forth dance in this area over the last several weeks
that I think it's time to actually make some progress.
The source of the problem is that about a year ago we started to tell
BIOSes that we're compatible with Windows 8, which we really need to
do, because some systems shipping with Windows 8 are tested with it
and nothing else, so if we tell their BIOSes that we aren't compatible
with Windows 8, we expose our users to untested BIOS/AML code paths.
However, as it turns out, some Windows 8-specific AML code paths are
not tested either, because Windows 8 actually doesn't use the ACPI
methods containing them, so if we declare Windows 8 compatibility and
attempt to use those ACPI methods, things break. That occurs mostly
in the backlight support area where in particular the _BCM and _BQC
methods are plain unusable on some systems if the OS declares Windows
8 compatibility.
[ The additional twist is that they actually become usable if the OS
says it is not compatible with Windows 8, but that may cause
problems to show up elsewhere ]
Investigation carried out by Matthew Garrett indicates that what
Windows 8 does about backlight is to leave backlight control up to
individual graphics drivers. At least there's evidence that it does
that if the Intel graphics driver is used, so we've decided to follow
Windows 8 in that respect and allow i915 to control backlight (Daniel
likes that part).
The first commit from Aaron Lu makes ACPICA export the variable from
which we can infer whether or not the BIOS believes that we are
compatible with Windows 8.
The second commit from Matthew Garrett prepares the ACPI video driver
by making it initialize the ACPI backlight even if it is not going to
be used afterward (that is needed for backlight control to work on
Thinkpads).
The third commit implements the actual workaround making i915 take
over backlight control if the firmware thinks it's dealing with
Windows 8 and is based on the work of multiple developers, including
Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.
The final commit from Aaron Lu makes us follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled by
GUI.
Hopefully, this approach will allow us to avoid using blacklists of
systems that should not declare Windows 8 compatibility just to avoid
backlight control problems in the future.
- Change from Aaron Lu makes ACPICA export a variable which can be
used by driver code to determine whether or not the BIOS believes
that we are compatible with Windows 8.
- Change from Matthew Garrett makes the ACPI video driver initialize
the ACPI backlight even if it is not going to be used afterward
(that is needed for backlight control to work on Thinkpads).
- Fix from Rafael J Wysocki implements Windows 8 backlight support
workaround making i915 take over bakclight control if the firmware
thinks it's dealing with Windows 8. Based on the work of multiple
developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
and Aaron Lu.
- Fix from Aaron Lu makes the kernel follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled
by GUI"
* tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: no automatic brightness changes by win8-compatible firmware
ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
ACPI / video: Always call acpi_video_init_brightness() on init
ACPICA: expose OSI version
Linus Torvalds [Sun, 21 Jul 2013 03:11:42 +0000 (20:11 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext[34] tmpfile bugfix from Ted Ts'o:
"Fix regression caused by commit
af51a2ac36d1f which added ->tmpfile()
support (along with a similar fix for ext3)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext3: fix a BUG when opening a file with O_TMPFILE flag
ext4: fix a BUG when opening a file with O_TMPFILE flag
Zheng Liu [Sun, 21 Jul 2013 02:03:20 +0000 (22:03 -0400)]
ext3: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
int main(int argc, char *argv[])
{
int fd;
fd = open(argv[1], O_TMPFILE, 0666);
if (fd < 0) {
perror("open ");
return -1;
}
close(fd);
return 0;
}
The oops message looks like this:
kernel: kernel BUG at fs/ext3/namei.c:1992!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
kernel: task:
ffff880112d30050 ti:
ffff8801124d4000 task.ti:
ffff8801124d4000
kernel: RIP: 0010:[<
ffffffffa00db5ae>] [<
ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP: 0018:
ffff8801124d5cc8 EFLAGS:
00010202
kernel: RAX:
0000000000000000 RBX:
ffff880111510128 RCX:
ffff8801114683a0
kernel: RDX:
0000000000000000 RSI:
ffff880111510128 RDI:
ffff88010fcf65a8
kernel: RBP:
ffff8801124d5d18 R08:
0080000000000000 R09:
ffffffffa00d3b7f
kernel: R10:
ffff8801114683a0 R11:
ffff8801032a2558 R12:
0000000000000000
kernel: R13:
ffff88010fcf6800 R14:
ffff8801032a2558 R15:
ffff8801115100d8
kernel: FS:
00007f5d172b5700(0000) GS:
ffff880117c00000(0000) knlGS:
0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
kernel: CR2:
00007f5d16df15d0 CR3:
0000000110b1d000 CR4:
00000000000407f0
kernel: Stack:
kernel:
000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
kernel:
ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
kernel:
ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
kernel: Call Trace:
kernel: [<
ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
kernel: [<
ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
kernel: [<
ffffffff821076f8>] path_openat+0x11f/0x5e7
kernel: [<
ffffffff821c86b4>] ? list_del+0x11/0x30
kernel: [<
ffffffff82065fa2>] ? __dequeue_entity+0x33/0x38
kernel: [<
ffffffff82107cd5>] do_filp_open+0x3f/0x8d
kernel: [<
ffffffff82112532>] ? __alloc_fd+0x50/0x102
kernel: [<
ffffffff820f9296>] do_sys_open+0x13b/0x1cd
kernel: [<
ffffffff820f935c>] SyS_open+0x1e/0x20
kernel: [<
ffffffff82398c02>] system_call_fastpath+0x16/0x1b
kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
kernel: RIP [<
ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP <
ffff8801124d5cc8>
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Zheng Liu [Sun, 21 Jul 2013 01:58:38 +0000 (21:58 -0400)]
ext4: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
int main(int argc, char *argv[])
{
int fd;
fd = open(argv[1], O_TMPFILE, 0666);
if (fd < 0) {
perror("open ");
return -1;
}
close(fd);
return 0;
}
The oops message looks like this:
kernel BUG at fs/ext4/namei.c:2572!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli
nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir
da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp
kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
task:
ffff88007dfe69a0 ti:
ffff88010f7b6000 task.ti:
ffff88010f7b6000
RIP: 0010:[<
ffffffff8125ce69>] [<
ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
RSP: 0018:
ffff88010f7b7cf8 EFLAGS:
00010202
RAX:
0000000000000000 RBX:
ffff8800966d3020 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffff88007dfe70b8 RDI:
0000000000000001
RBP:
ffff88010f7b7d40 R08:
ffff880126a3c4e0 R09:
ffff88010f7b7ca0
R10:
0000000000000000 R11:
0000000000000000 R12:
ffff8801271fd668
R13:
ffff8800966d2f78 R14:
ffff88011d7089f0 R15:
ffff88007dfe69a0
FS:
00007f70441a3740(0000) GS:
ffff88012a800000(0000) knlGS:
00000000f77c96c0
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000002834000 CR3:
0000000107964000 CR4:
00000000000007e0
DR0:
0000000000780000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000600
Stack:
0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
Call Trace:
[<
ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
[<
ffffffff811cba78>] path_openat+0x238/0x700
[<
ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
[<
ffffffff811cc647>] do_filp_open+0x47/0xa0
[<
ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
[<
ffffffff811ba2e4>] do_sys_open+0x124/0x210
[<
ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
[<
ffffffff811ba3ee>] SyS_open+0x1e/0x20
[<
ffffffff816ca8d4>] tracesys+0xdd/0xe2
[<
ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 20 Jul 2013 22:42:38 +0000 (15:42 -0700)]
Merge tag 'staging-3.11-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging tree fixes from Greg KH:
"Here are a few iio driver fixes for 3.11-rc2. They are still spread
across drivers/iio and drivers/staging/iio so they are coming in
through this tree.
I've also removed the drivers/staging/csr/ driver as the developers
who originally sent it to me have moved on to other companies, and CSR
still will not send us the specs for the device, making the driver
pretty much obsolete and impossible to fix up. Deleting it now
prevents people from sending in lots of tiny codingsyle fixes that
will never go anywhere.
It also helps to offset the large lustre filesystem merge that
happened in 3.11-rc1 in the overall 3.11.0 diffstat. :)"
* tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: csr: remove driver
iio: lps331ap: Fix wrong in_pressure_scale output value
iio staging: fix lis3l02dq, read error handling
staging:iio:ad7291: add missing .driver_module to struct iio_info
iio: ti_am335x_adc: add missing .driver_module to struct iio_info
iio: mxs-lradc: Remove useless check in read_raw
iio: mxs-lradc: Fix misuse of iio->trig
iio: inkern: fix iio_convert_raw_to_processed_unlocked
iio: Fix iio_channel_has_info
iio:trigger: device_unregister->device_del to avoid double free
iio: dac: ad7303: fix error return code in ad7303_probe()
Linus Torvalds [Sat, 20 Jul 2013 17:50:01 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"The sget() one is a long-standing bug and will need to go into -stable
(in fact, it had been originally caught in RHEL6), the other two are
3.11-only"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: constify dentry parameter in d_count()
livelock avoidance in sget()
allow O_TMPFILE to work with O_WRONLY
Linus Torvalds [Sat, 20 Jul 2013 17:48:59 +0000 (10:48 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)"
I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: call ext4_es_lru_add() after handling cache miss
ext4: yield during large unlinks
ext4: make the extent_status code more robust against ENOMEM failures
ext4: simplify calculation of blocks to free on error
ext4: fix error handling in ext4_ext_truncate()
Linus Torvalds [Sat, 20 Jul 2013 17:48:24 +0000 (10:48 -0700)]
Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
file
- Fix another regression in rpc_client_register()
* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix a regression against the FreeBSD server
SUNRPC: Fix another issue with rpc_client_register()
Linus Torvalds [Sat, 20 Jul 2013 17:47:38 +0000 (10:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/josef/btrfs-next
Pull btrfs fixes from Josef Bacik:
"I'm playing the role of Chris Mason this week while he's on vacation.
There are a few critical fixes for btrfs here, all regressions and
have been tested well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
Btrfs: fix wrong write offset when replacing a device
Btrfs: re-add root to dead root list if we stop dropping it
Btrfs: fix lock leak when resuming snapshot deletion
Btrfs: update drop progress before stopping snapshot dropping
Peng Tao [Thu, 18 Jul 2013 14:09:08 +0000 (22:09 +0800)]
vfs: constify dentry parameter in d_count()
so that it can be used in places like d_compare/d_hash
without causing a compiler warning.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 19 Jul 2013 23:13:55 +0000 (03:13 +0400)]
livelock avoidance in sget()
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1. Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0. ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock. And bumps it ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.
The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down. Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.
The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten. The right way to deal with that is obviously to fix sget()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 19 Jul 2013 23:11:32 +0000 (03:11 +0400)]
allow O_TMPFILE to work with O_WRONLY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 19 Jul 2013 22:11:09 +0000 (15:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
reporting issues!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove dead code
um: siginfo cleanup
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
um: Fix wait_stub_done() error handling
um: Mark stub pages mapping with VM_PFNMAP
um: Fix return value of strnlen_user()
Linus Torvalds [Fri, 19 Jul 2013 22:10:01 +0000 (15:10 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 3.11. Half of then is for Netlogic the remainder
touches things across arch/mips.
Nothing really dramatic and by rc1 standards MIPS will be in fairly
good shape with this applied. Tested by building all MIPS defconfigs
of which with this pull request four platforms won't build. And yes,
it boots also on my favorite test systems"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
MIPS: Octeon: Fix DT pruning bug with pip ports
MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
MIPS: tlbex: fix broken build in v3.11-rc1
MIPS: Netlogic: Add XLP PIC irqdomain
MIPS: Netlogic: Fix USB block's coherent DMA mask
MIPS: tlbex: Fix typo in r3000 tlb store handler
MIPS: BMIPS: Fix thinko to release slave TP from reset
MIPS: Delete dead invocation of exception_exit().
Linus Torvalds [Fri, 19 Jul 2013 22:08:53 +0000 (15:08 -0700)]
Merge tag 'arm64-stable' of git://git./linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 fixes from Catalin Marinas:
- Post -rc1 update to the common reboot infrastructure.
- Fixes (user cache maintenance fault handling, !COMPAT compilation,
CPU online and interrupt hanlding).
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: use common reboot infrastructure
arm64: mm: don't treat user cache maintenance faults as writes
arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
arm64: Only enable local interrupts after the CPU is marked online
Linus Torvalds [Fri, 19 Jul 2013 22:08:12 +0000 (15:08 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"An update for the BFP jit to the latest and greatest, two patches to
get kdump working again, the random-abort ptrace extention for
transactional execution, the z90crypt module alias for ap and a tiny
cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Alias for new zcrypt device driver base module
s390/kdump: Allow copy_oldmem_page() copy to virtual memory
s390/kdump: Disable mmap for s390
s390/bpf,jit: add pkt_type support
s390/bpf,jit: address randomize and write protect jit code
s390/bpf,jit: use generic jit dumper
s390/bpf,jit: call module_free() from any context
s390/qdio: remove unused variable
s390/ptrace: PTRACE_TE_ABORT_RAND
Stefan Behrens [Thu, 4 Jul 2013 14:14:23 +0000 (16:14 +0200)]
Btrfs: fix wrong write offset when replacing a device
Miao Xie reported the following issue:
The filesystem was corrupted after we did a device replace.
Steps to reproduce:
# mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
# mount <device0> <mnt>
# btrfs replace start -rfB 1 <device4> <mnt>
# umount <mnt>
# btrfsck <device4>
The reason for the issue is that we changed the write offset by mistake,
introduced by commit
625f1c8dc.
We read the data from the source device at first, and then write the
data into the corresponding place of the new device. In order to
implement the "-r" option, the source location is remapped using
btrfs_map_block(). The read takes place on the mapped location, and
the write needs to take place on the unmapped location. Currently
the write is using the mapped location, and this commit changes it
back by undoing the change to the write address that the aforementioned
commit added by mistake.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Wed, 17 Jul 2013 23:30:20 +0000 (19:30 -0400)]
Btrfs: re-add root to dead root list if we stop dropping it
If we stop dropping a root for whatever reason we need to add it back to the
dead root list so that we will re-start the dropping next transaction commit.
The other case this happens is if we recover a drop because we will add a root
without adding it to the fs radix tree, so we can leak it's root and commit root
extent buffer, adding this to the dead root list makes this cleanup happen.
Thanks,
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Mon, 15 Jul 2013 16:41:42 +0000 (12:41 -0400)]
Btrfs: fix lock leak when resuming snapshot deletion
We aren't setting path->locks[level] when we resume a snapshot deletion which
means we won't unlock the buffer when we free the path. This causes deadlocks
if we happen to re-allocate the block before we've evicted the extent buffer
from cache. Thanks,
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Mon, 15 Jul 2013 15:57:06 +0000 (11:57 -0400)]
Btrfs: update drop progress before stopping snapshot dropping
Alex pointed out a problem and fix that exists in the drop one snapshot at a
time patch. If we decide we need to exit for whatever reason (umount for
example) we will just exit the snapshot dropping without updating the drop
progress. So the next time we go to resume we will BUG_ON() because we can't
find the extent we left off at because we never updated it. This patch fixes
the problem.
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Linus Torvalds [Fri, 19 Jul 2013 17:17:12 +0000 (10:17 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fix from Paolo Bonzini:
"This single patch fixes a regression caused by one of the
optimizations introduced in 3.11, which is generally visible only on
AMD processors"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: avoid fast page fault fixing mmio page fault
Linus Torvalds [Fri, 19 Jul 2013 16:59:06 +0000 (09:59 -0700)]
Merge tag 'pm+acpi-3.11-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes collected over the last week, most importnatly two
cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
fix preventing systems using it from crashing during shutdown and two
ACPI scan fixes related to hotplug.
Specifics:
- Two cpufreq commits from the 3.10 cycle introduced regressions.
The first of them was buggy (it did way much more than it needed to
do) and the second one attempted to fix an issue introduced by the
first one. Fixes from Srivatsa S Bhat revert both.
- If autosleep triggers during system shutdown and the shutdown
callbacks of some device drivers have been called already, it may
crash the system. Fix from Liu Shuo prevents that from happening
by making try_to_suspend() check system_state.
- The ACPI memory hotplug driver doesn't clear its driver_data on
errors which may cause a NULL poiter dereference to happen later.
Fix from Toshi Kani.
- The ACPI namespace scanning code should not try to attach scan
handlers to device objects that have them already, which may
confuse things quite a bit, and it should rescan the whole
namespace branch starting at the given node after receiving a bus
check notify event even if the device at that particular node has
been discovered already. Fixes from Rafael J Wysocki.
- New ACPI video blacklist entry for a system whose initial backlight
setting from the BIOS doesn't make sense. From Lan Tianyu.
- Garbage string output avoindance for ACPI PNP from Liu Shuo.
- Two Kconfig fixes for issues introduced recently in the s3c24xx
cpufreq driver (when moving the driver to drivers/cpufreq) from
Paul Bolle.
- Trivial comment fix in pm_wakeup.h from Chanwoo Choi"
* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
PNP / ACPI: avoid garbage in resource name
cpufreq: Revert commit
2f7021a8 to fix CPU hotplug regression
cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
PM / Sleep: Fix comment typo in pm_wakeup.h
PM / Sleep: avoid 'autosleep' in shutdown progress
cpufreq: Revert commit a66b2e to fix suspend/resume regression
ACPI / memhotplug: Fix a stale pointer in error path
ACPI / scan: Always call acpi_bus_scan() for bus check notifications
ACPI / scan: Do not try to attach scan handlers to devices having them
Marc Zyngier [Thu, 11 Jul 2013 11:13:00 +0000 (12:13 +0100)]
arm64: use common reboot infrastructure
Commit
7b6d864b48d9 (reboot: arm: change reboot_mode to use enum
reboot_mode) changed the way reboot is handled on arm, which has a
direct impact on arm64 as we share the reset driver on the VE platform.
The obvious fix is to move arm64 to use the same infrastructure.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[catalin.marinas@arm.com: removed reboot_mode = REBOOT_HARD default setting]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Fri, 19 Jul 2013 14:37:12 +0000 (15:37 +0100)]
arm64: mm: don't treat user cache maintenance faults as writes
On arm64, cache maintenance faults appear as data aborts with the CM
bit set in the ESR. The WnR bit, usually used to distinguish between
faulting loads and stores, always reads as 1 and (slightly confusingly)
the instructions are treated as reads by the architecture.
This patch fixes our fault handling code to treat cache maintenance
faults in the same way as loads.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>