profile/common/kernel-common.git
13 years agopnfs-obj: pg_test check for max_io_size
Boaz Harrosh [Wed, 25 May 2011 18:25:29 +0000 (21:25 +0300)]
pnfs-obj: pg_test check for max_io_size

Implement pg_test vector to test for max IO sizes. We calculate
a max_io_size member only once, and cache it in lseg so to not
do so on every page insert.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[simplify logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: define nfs_generic_pg_test
Boaz Harrosh [Sun, 29 May 2011 08:45:39 +0000 (11:45 +0300)]
NFSv4.1: define nfs_generic_pg_test

By default, unless pnfs is used coalesce pages until pg_bsize
(rsize or wsize) is reached.

pnfs layout drivers define their own pg_test methods that use
pnfs_generic_pg_test and need to define their own I/O size
limits (e.g. based on the file stripe size).

[Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: use pnfs_generic_pg_test directly by layout driver
Benny Halevy [Wed, 25 May 2011 17:54:40 +0000 (20:54 +0300)]
NFSv4.1: use pnfs_generic_pg_test directly by layout driver

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: change pg_test return type to bool
Benny Halevy [Wed, 25 May 2011 18:03:56 +0000 (21:03 +0300)]
NFSv4.1: change pg_test return type to bool

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: unify pnfs_pageio_init functions
Benny Halevy [Wed, 25 May 2011 17:25:22 +0000 (20:25 +0300)]
NFSv4.1: unify pnfs_pageio_init functions

Use common code for pnfs_pageio_init_{read,write} and use
a common generic pg_test function.

Note that this function always assumes the the layout driver's
pg_test method is implemented.

[Fix BUG]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: objlayout_encode_layoutcommit implementation
Boaz Harrosh [Sun, 22 May 2011 16:54:13 +0000 (19:54 +0300)]
pnfs-obj: objlayout_encode_layoutcommit implementation

* Define API for io-engines to report delta_space_used in IOs
* Encode the osd-layout specific information of the layoutcommit
  XDR buffer.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: encode_layoutcommit
Benny Halevy [Sun, 22 May 2011 16:53:48 +0000 (19:53 +0300)]
pnfs: encode_layoutcommit

Add a layout driver method to encode the layout type specific
opaque part of layout commit in-line in the xdr stream.

Currently, the pnfs-objects layout driver uses it to encode metadata hints
to the MDS and the blocks layout driver to commit provisionally allocated
extents to the file.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: report errors and .encode_layoutreturn Implementation.
Boaz Harrosh [Thu, 26 May 2011 18:49:46 +0000 (21:49 +0300)]
pnfs-obj: report errors and .encode_layoutreturn Implementation.

An io_state pre-allocates an error information structure for each
possible osd-device that might error during IO. When IO is done if all
was well the io_state is freed. (as today). If the I/O has ended with an
error, the io_state is queued on a per-layout err_list. When eventually
encode_layoutreturn() is called, each error is properly encoded on the
XDR buffer and only then the io_state is removed from err_list and
de-allocated.

It is up to the io_engine to fill in the segment that fault and the type
of osd_error that occurred. By calling objlayout_io_set_result() for
each failing device.

In objio_osd:
* Allocate io-error descriptors space as part of io_state
* Use generic objlayout error reporting at end of io.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: encode_layoutreturn
Andy Adamson [Sun, 22 May 2011 16:53:10 +0000 (19:53 +0300)]
pnfs: encode_layoutreturn

Add a layout driver method to encode the layout type specific
opaque part of layout return in-line in the xdr stream.

Currently the pnfs-objects layout driver uses it to encode i/o error
information on LAYOUTRETURN.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for encode_layoutreturn]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: layoutret_on_setattr
Benny Halevy [Wed, 14 Jul 2010 19:43:57 +0000 (15:43 -0400)]
pnfs: layoutret_on_setattr

With the objects layout security model, we have object capabilities
that are associated with the layout and we anticipate that the server
will issue a cb_layoutrecall for any setattr that changes security
related attributes (user/group/mode/acl) or truncates the file.

Therefore, the layout is returned before issuing the setattr to avoid
the anticipated cb_layoutrecall.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: layoutreturn
Benny Halevy [Sun, 22 May 2011 16:52:37 +0000 (19:52 +0300)]
pnfs: layoutreturn

NFSv4.1 LAYOUTRETURN implementation

Currently, does not support layout-type payload encoding.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path.]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
[fix layout stateid in layoutreturn args]
[fixed NULL deref in _pnfs_return_layout]
[removed recaim member of nfs4_layoutreturn_args]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: osd raid engine read/write implementation
Boaz Harrosh [Sun, 22 May 2011 16:52:19 +0000 (19:52 +0300)]
pnfs-obj: osd raid engine read/write implementation

With the use of the in-kernel osd library. Implement read/write
of data from/to osd-objects according to information specified
in the objects-layout.

Support for stripping over mirrors with a received stripe_unit.
There are however a few constrains which are not supported:
 1. Stripe Unit must be a multiple of PAGE_SIZE
 2. stripe length (stripe_unit * number_of_stripes) can not be
    bigger then 32bit.

Also support raid-groups and partial-layout. Partial-layout is
when not all the groups are received on the line, addressing
only a partial range of the file.

TODO:
  Only raid0! raid 4/5/6 support will come at later stage

A none supported layout will send IO through the MDS

[Important fallout from the last rebase]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: support for non-rpc layout drivers
Benny Halevy [Sun, 22 May 2011 16:52:03 +0000 (19:52 +0300)]
pnfs: support for non-rpc layout drivers

Non-rpc layout driver such as for objects and blocks
implement their own I/O path and error handling logic.
Therefore bypass NFS-based error handling for these layout drivers.

[fix lseg ref-count bugs, and null de-refs]
[Fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: define per-inode private structure
Benny Halevy [Sun, 22 May 2011 16:51:48 +0000 (19:51 +0300)]
pnfs-obj: define per-inode private structure

allocate and deallocate per-inode private pnfs_layout_hdr
in preparation for I/O implementation.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: alloc and free layout_hdr layoutdriver methods
Benny Halevy [Sun, 22 May 2011 16:51:33 +0000 (19:51 +0300)]
pnfs: alloc and free layout_hdr layoutdriver methods

[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: objio_osd device information retrieval and caching
Boaz Harrosh [Thu, 26 May 2011 18:45:34 +0000 (21:45 +0300)]
pnfs-obj: objio_osd device information retrieval and caching

When a new layout is received in objio_alloc_lseg all device_ids
referenced are retrieved. The device information is queried for from MDS
and then the osd_device is looked-up from the osd-initiator library. The
devices are cached in a per-mount-point list, for later use. At unmount
all devices are "put" back to the library.

objlayout_get_deviceinfo(), objlayout_put_deviceinfo() middleware
API for retrieving device information given a device_id.

TODO: The device cache can get big. Cap its size. Keep an LRU and start
      to return devices which were not used, when list gets to big, or
      when new entries allocation fail.

[pnfs-obj: Bugs in new global-device-cache code]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
[use global device cache]
[use layout driver in global device cache]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: decode layout, alloc/free lseg
Boaz Harrosh [Sun, 22 May 2011 16:50:20 +0000 (19:50 +0300)]
pnfs-obj: decode layout, alloc/free lseg

objlayout_alloc_lseg prepares an xdr_stream and calls the
raid engins objio_alloc_lseg() to allocate a private
pnfs_layout_segment.

objio_osd.c::objio_alloc_lseg() uses passed xdr_stream to
decode and store the layout_segment information in an
objio_segment struct, using the pnfs_osd_xdr.h API for
the actual parsing the layout xdr.

objlayout_free_lseg calls objio_free_lseg() to free the
allocated space.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
[removed "extern" from function definitions]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: pnfs_osd XDR client implementation
Boaz Harrosh [Sun, 22 May 2011 16:49:57 +0000 (19:49 +0300)]
pnfs-obj: pnfs_osd XDR client implementation

* Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will
  include the XDR encode/decode implementations for the pNFS
  client objlayout driver.

[Wrong type in comments]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: pnfs_osd XDR definitions
Benny Halevy [Sun, 22 May 2011 16:49:32 +0000 (19:49 +0300)]
pnfs-obj: pnfs_osd XDR definitions

* Add the pnfs_osd_xdr.h header

* defintions the pnfs_osd_layout structure including all it's
  sub-types and constants.
* Declare the pnfs_osd_xdr_decode_layout API + all needed
  inline helpers.

* Define the pnfs_osd_deviceaddr structure and all its subtypes and
  constants.
* Declare API for decoding of a pnfs_osd_deviceaddr from XDR stream.

* Define the pnfs_osd_ioerr structure, its substructures and constants.
* Declare API for encoding of a pnfs_osd_ioerr into XDR stream.

* Define the pnfs_osd_layoutupdate structure and its substructures.
* Declare API for encoding of a pnfs_osd_layoutupdate into XDR stream.

[Remove server definitions]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs-obj: objlayoutdriver module skeleton
Benny Halevy [Sun, 22 May 2011 16:49:06 +0000 (19:49 +0300)]
pnfs-obj: objlayoutdriver module skeleton

* Define the PNFS_OBJLAYOUT Kconfig option in the nfs
  master Kconfig file.
* Add the objlayout driver to the Kernel's Kbuild system.
* Add the fs/nfs/objlayout/Kbuild file for building the
  objlayoutdriver.ko driver
* Define fs/nfs/objlayout/objio_osd.c, register the driver on module
  initialization and unregister on exit.

[pnfs-obj: remove of CONFIG_PNFS fallout]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[added "unsure" clause]
[depend on NFS_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: client stats
J. Bruce Fields [Sun, 22 May 2011 16:48:21 +0000 (19:48 +0300)]
pnfs: client stats

A pNFS client auto-negotiates a lot of features (minorversion level,
pNFS layout type, etc.).  This is convenient, but makes certain kinds of
failures hard for a user to detect.

For example, if the client falls back on 4.0, or falls back to MDS IO
because the user didn't connect to the right iscsi disks before
mounting, the only symptoms may be reduced performance, which may not be
noticed till long after the actual failure, and may be difficult for a
user to diagnose.

However, such "failures" may also be perfectly normal in some cases, so
we don't want to spam the system logs with them.

One approach would be to put some more information into
/proc/self/mountstats.

Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: add commit client stats]
[fixup data types for "ret" variables in pnfs_try_to* inline funcs.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix definition of show_pnfs for !CONFIG_PNFS]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Fix show_sessions in the not CONFIG_NFS_V4_1 case]
    There is a build error when CONFIG_NFS_V4 is set but
    CONFIG_NFS_V4_1 is *not* set. show_sessions() prototype
    was unbalanced between the two cases.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfs: super.c remove CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: Use byte-range for cb_layoutrecall
Benny Halevy [Sun, 22 May 2011 16:48:02 +0000 (19:48 +0300)]
pnfs: Use byte-range for cb_layoutrecall

Use recalled range to invalidate particular layout segments in the layout cache.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: align layoutget requests on page boundaries
Benny Halevy [Sun, 22 May 2011 16:47:46 +0000 (19:47 +0300)]
pnfs: align layoutget requests on page boundaries

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: Use byte-range for layoutget
Benny Halevy [Sun, 22 May 2011 16:47:26 +0000 (19:47 +0300)]
pnfs: Use byte-range for layoutget

Add offset and count parameters to pnfs_update_layout and use them to get
the layout in the pageio path.

Order cache layout segments in the following order:
* offset (ascending)
* length (descending)
* iomode (RW before READ)

Test byte range against the layout segment in use in pnfs_{read,write}_pg_test
so not to coalesce pages not using the same layout segment.

[fix lseg ordering]
[clean up pnfs_find_lseg lseg arg]
[remove unnecessary FIXME]
[fix ordering in pnfs_insert_layout]
[clean up pnfs_insert_layout]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoSUNRPC: introduce xdr_init_decode_pages
Benny Halevy [Thu, 19 May 2011 18:16:47 +0000 (14:16 -0400)]
SUNRPC: introduce xdr_init_decode_pages

Initialize xdr_stream and xdr_buf using an array of page pointers
and length of buffer.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: use layout driver in global device cache
Benny Halevy [Tue, 24 May 2011 15:04:02 +0000 (18:04 +0300)]
NFSv4.1: use layout driver in global device cache

pnfs deviceids are unique per server, per layout type.
struct nfs_client is currently used to distinguish deviceids from
different nfs servers, yet these may clash between different layout
types on the same server.  Therefore, use the layout driver associated
with each deviceid at insertion time to look it up, unhash, or
delete it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: CB_NOTIFY_DEVICEID
Marc Eshel [Sun, 22 May 2011 16:47:09 +0000 (19:47 +0300)]
pnfs: CB_NOTIFY_DEVICEID

Note: This functionlaity is incomplete as all layout segments referring to
the 'to be removed device id' need to be reaped, and all in flight I/O drained.

[use be32 res in nfs4_callback_devicenotify]
[use nfs_client to qualify deviceid for cb_notify_deviceid]
[use global deviceid cache for CB_NOTIFY_DEVICEID]
[refactor device cache _lookup_deviceid]
[refactor device cache _find_get_deviceid]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Bug in new global-device-cache code]
[layout_driver MUST set free_deviceid_node if using dev-cache]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: purge deviceid cache on nfs_free_client
Benny Halevy [Fri, 20 May 2011 11:47:33 +0000 (13:47 +0200)]
NFSv4.1: purge deviceid cache on nfs_free_client

Use the pnfs_layoutdriver_type both as a qualifier for the deviceid,
distinguishing deviceid from different layout types on the server,
and for freeing the layout-driver allocated structure containing the
nfs4_deviceid_node.

[BUG in _deviceid_purge_client]
[layout_driver MUST set free_deviceid_node if using dev-cache]
[let ver < 4.1 compile]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[removed EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client)]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: make deviceid cache global
Benny Halevy [Fri, 20 May 2011 02:14:47 +0000 (22:14 -0400)]
NFSv4.1: make deviceid cache global

Move deviceid cache from the pnfs files layout driver to the
generic layer in preparation for the objects layout driver.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agopnfs: resolve header dependency in pnfs.h
Benny Halevy [Thu, 5 May 2011 05:28:46 +0000 (08:28 +0300)]
pnfs: resolve header dependency in pnfs.h

Some definitions in the header file depend on nfs_fs.h so pnfs.h can't
be included independently.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: use struct nfs_client to qualify deviceid
Benny Halevy [Fri, 20 May 2011 08:45:05 +0000 (10:45 +0200)]
NFSv4.1: use struct nfs_client to qualify deviceid

deviceids are unique per server, per layout type.
Therefore, in the global cache in the files layout driver
deviceids from different servers may clash so we need
to qualify them with a struct nfs_client that represents
the nfs server that returned the deviceid.

Introduced in 2.6.39 commit ea8eecdd
"NFSv4.1 move deviceid cache to filelayout driver"

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoNFSv4.1: fix typo in filelayout_check_layout
Jim Rees [Wed, 23 Feb 2011 00:31:57 +0000 (19:31 -0500)]
NFSv4.1: fix typo in filelayout_check_layout

Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
13 years agoLinux 2.6.39 v2.6.39
Linus Torvalds [Thu, 19 May 2011 04:06:34 +0000 (21:06 -0700)]
Linux 2.6.39

13 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
Linus Torvalds [Wed, 18 May 2011 23:50:28 +0000 (16:50 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/jlbec/ocfs2

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  configfs: Fix race between configfs_readdir() and configfs_d_iput()
  configfs: Don't try to d_delete() negative dentries.
  ocfs2/dlm: Target node death during resource migration leads to thread spin
  ocfs2: Skip mount recovery for hard-ro mounts
  ocfs2/cluster: Heartbeat mismatch message improved
  ocfs2/cluster: Increase the live threshold for global heartbeat
  ocfs2/dlm: Use negotiated o2dlm protocol version
  ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes.
  ocfs2: Initialize data_ac (might be used uninitialized)

13 years agoMerge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
Linus Torvalds [Wed, 18 May 2011 20:25:57 +0000 (13:25 -0700)]
Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  drivercore: revert addition of of_match to struct device
  of: fix race when matching drivers

13 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Wed, 18 May 2011 20:21:43 +0000 (13:21 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/upstream-linus

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Kludge IP27 build for 2.6.39.
  MIPS: AR7: Fix GPIO register size for Titan variant.
  MIPS: Fix duplicate invocation of notify_die.
  MIPS: RB532: Fix iomap resource size miscalculation.

13 years agodrivercore: revert addition of of_match to struct device
Grant Likely [Wed, 18 May 2011 17:19:24 +0000 (11:19 -0600)]
drivercore: revert addition of of_match to struct device

Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time.  This was unsafe
because matching is not an atomic operation with probing a driver.  If
two or more drivers are attempted to be matched to a driver at the
same time, then the cached matching entry pointer could get
overwritten.

This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
13 years agoof: fix race when matching drivers
Milton Miller [Wed, 18 May 2011 15:27:39 +0000 (10:27 -0500)]
of: fix race when matching drivers

If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.

Only write the result if the device matches.

In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
From the debug suggested by Grant it was obvious another driver matched
some devices between the call to match the hme and the hme discovery
failling.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Milton Miller <miltonm@bga.com>
[grant.likely: modified to only call of_match_device() once]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
13 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Wed, 18 May 2011 13:49:02 +0000 (06:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: don't delay blk_run_queue_async
  scsi: remove performance regression due to async queue run
  blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
  block: rescan partitions on invalidated devices on -ENOMEDIA too
  cdrom: always check_disk_change() on open
  block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers

13 years agoMIPS: Kludge IP27 build for 2.6.39.
Ralf Baechle [Wed, 18 May 2011 12:14:36 +0000 (13:14 +0100)]
MIPS: Kludge IP27 build for 2.6.39.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
13 years agoMIPS: AR7: Fix GPIO register size for Titan variant.
Florian Fainelli [Fri, 13 May 2011 15:41:21 +0000 (17:41 +0200)]
MIPS: AR7: Fix GPIO register size for Titan variant.

The 'size' variable contains the correct register size for both AR7
and Titan, but we never used it to ioremap the correct register size.
This problem only shows up on Titan.

[ralf@linux-mips.org: Fixed the fix.  The original patch as in patchwork
recognizes the problem correctly then fails to fix it ...]

Reported-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/2380/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
13 years agoMIPS: Fix duplicate invocation of notify_die.
Ralf Baechle [Fri, 13 May 2011 09:33:28 +0000 (10:33 +0100)]
MIPS: Fix duplicate invocation of notify_die.

Initial patch by Yury Polyanskiy <ypolyans@princeton.edu>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2373/

13 years agoMIPS: RB532: Fix iomap resource size miscalculation.
Ralf Baechle [Thu, 12 May 2011 12:55:48 +0000 (13:55 +0100)]
MIPS: RB532: Fix iomap resource size miscalculation.

This is the MIPS portion of Joe Perches <joe@perches.com>'s
https://patchwork.linux-mips.org/patch/2172/ which seems to have been
lost in time and space.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
13 years agoconfigfs: Fix race between configfs_readdir() and configfs_d_iput()
Joel Becker [Wed, 18 May 2011 11:08:16 +0000 (04:08 -0700)]
configfs: Fix race between configfs_readdir() and configfs_d_iput()

configfs_readdir() will use the existing inode numbers of inodes in the
dcache, but it makes them up for attribute files that aren't currently
instantiated.  There is a race where a closing attribute file can be
tearing down at the same time as configfs_readdir() is trying to get its
inode number.

We want to get the inode number of open attribute files, because they
should match while instantiated.  We can't lock down the transition
where dentry->d_inode is set to NULL, so we just check for NULL there.
We can, however, ensure that an inode we find isn't iput() in
configfs_d_iput() until after we've accessed it.

Signed-off-by: Joel Becker <jlbec@evilplan.org>
13 years agoconfigfs: Don't try to d_delete() negative dentries.
Joel Becker [Tue, 22 Feb 2011 09:09:49 +0000 (01:09 -0800)]
configfs: Don't try to d_delete() negative dentries.

When configfs is faking mkdir() on its subsystem or default group
objects, it starts by adding a negative dentry.  It then tries to
instantiate the group.  If that should fail, it must clean up after
itself.

I was using d_delete() here, but configfs_attach_group() promises to
return an empty dentry on error.  d_delete() explodes with the entry
dentry.  Let's try d_drop() instead.  The unhashing is what we want for
our dentry.

Signed-off-by: Joel Becker <jlbec@evilplan.org>
13 years agoblock: don't delay blk_run_queue_async
Shaohua Li [Wed, 18 May 2011 09:22:43 +0000 (11:22 +0200)]
block: don't delay blk_run_queue_async

Let's check a scenario:
1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
2. blk_run_queue_async();
the second one will became a noop, because q->delay_work already has
WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after
SCSI_QUEUE_DELAY. But blk_run_queue_async actually hopes the delayed
work runs immediately.

Fix this by doing a cancel on potentially pending delayed work
before queuing an immediate run of the workqueue.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoMerge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 18 May 2011 10:16:38 +0000 (03:16 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-2.6

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  [media] V4L: soc-camera: regression fix: calculate .sizeimage in soc_camera.c
  [media] v4l2-subdev: fix broken subdev control enumeration
  [media] Fix cx88 remote control input
  [media] v4l: Release module if subdev registration fails

13 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 18 May 2011 10:14:34 +0000 (03:14 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, AMD: Fix ARAT feature setting again
  Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
  x86, apic: Fix spurious error interrupts triggering on all non-boot APs
  x86, mce, AMD: Fix leaving freed data in a list
  x86: Fix UV BAU for non-consecutive nasids
  x86, UV: Fix NMI handler for UV platforms

13 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 18 May 2011 10:13:46 +0000 (03:13 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf evlist: Fix per thread mmap setup
  perf tools: Honour the cpu list parameter when also monitoring a thread list
  kprobes, x86: Disable irqs during optimized callback

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
Linus Torvalds [Wed, 18 May 2011 10:13:11 +0000 (03:13 -0700)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix cifsConvertToUCS() for the mapchars case
  cifs: add fallback in is_path_accessible for old servers

13 years agoprocfs: add stub for proc_mkdir_mode()
Randy Dunlap [Tue, 17 May 2011 22:44:12 +0000 (15:44 -0700)]
procfs: add stub for proc_mkdir_mode()

Provide a stub for proc_mkdir_mode() when CONFIG_PROC_FS is not
enabled, just like the stub for proc_mkdir().

Fixes this linux-next build error:

  drivers/net/wireless/airo.c:4504: error: implicit declaration of function 'proc_mkdir_mode'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoum: fix abort
Richard Weinberger [Tue, 17 May 2011 22:44:11 +0000 (15:44 -0700)]
um: fix abort

os_dump_core() uses abort() to terminate UML in case of an fatal error.

glibc's abort() calls raise(SIGABRT) which makes use of tgkill().
tgkill() has no effect within UML's kernel threads because they are not
pthreads.  As fallback abort() executes an invalid instruction to
terminate the process.  Therefore UML gets killed by SIGSEGV and leaves a
ugly log entry in the host's kernel ring buffer.

To get rid of this we use our own abort routine.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: fix zone congestion
KAMEZAWA Hiroyuki [Tue, 17 May 2011 22:44:10 +0000 (15:44 -0700)]
memcg: fix zone congestion

ZONE_CONGESTED should be a state of global memory reclaim.  If not, a busy
memcg sets this and give unnecessary throttoling in wait_iff_congested()
against memory recalim in other contexts.  This makes system performance
bad.

I'll think about "memcg is congested!" flag is required or not, later.
But this fix is required first.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Ying Han <yinghan@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/leds/leds-lm3530.c: add MODULE_DEVICE_TABLE
Axel Lin [Tue, 17 May 2011 22:44:09 +0000 (15:44 -0700)]
drivers/leds/leds-lm3530.c: add MODULE_DEVICE_TABLE

Adding the necessary MODULE_DEVICE_TABLE() information allows the driver
to be automatically loaded by udev.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Shreshtha Kumar SAHU <shreshthakumar.sahu@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agorapidio: fix default routing initialization
Alexandre Bounine [Tue, 17 May 2011 22:44:08 +0000 (15:44 -0700)]
rapidio: fix default routing initialization

Fix switch initialization to ensure that all switches have default routing
disabled.  This guarantees that no unexpected RapidIO packets arrive to
the default port set by reset and there is no default routing destination
until it is properly configured by software.

This update also unifies handling of unmapped destinations by tsi57x, IDT
Gen1 and IDT Gen2 switches.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: <stable@kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocifs: fix cifsConvertToUCS() for the mapchars case
Jeff Layton [Tue, 17 May 2011 19:28:21 +0000 (15:28 -0400)]
cifs: fix cifsConvertToUCS() for the mapchars case

As Metze pointed out, commit 84cdf74e broke mapchars option:

    Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
    (84cdf74e8096a10dd6acbb870dd404b92f07a756) does multiple steps
    in just one commit (moving the function and changing it without
    testing).

    put_unaligned_le16(temp, &target[j]); is never called for any
    codepoint the goes via the 'default' switch statement. As a result
    we put just zero (or maybe uninitialized) bytes into the target
    buffer.

His proposed patch looks correct, but doesn't apply to the current head
of the tree. This patch should also fix it.

Cc: <stable@kernel.org> # .38.x: 581ade4: cifs: clean up various nits in unicode routines (try #2)
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
13 years agocifs: add fallback in is_path_accessible for old servers
Jeff Layton [Tue, 17 May 2011 10:40:30 +0000 (06:40 -0400)]
cifs: add fallback in is_path_accessible for old servers

The is_path_accessible check uses a QPathInfo call, which isn't
supported by ancient win9x era servers. Fall back to an older
SMBQueryInfo call if it fails with the magic error codes.

Cc: stable@kernel.org
Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
13 years agoMerge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 17 May 2011 15:02:04 +0000 (08:02 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tick: Clear broadcast active bit when switching to oneshot
  rtc: mc13xxx: Don't call rtc_device_register while holding lock
  rtc: rp5c01: Initialize drvdata before registering device
  rtc: pcap: Initialize drvdata before registering device
  rtc: msm6242: Initialize drvdata before registering device
  rtc: max8998: Initialize drvdata before registering device
  rtc: max8925: Initialize drvdata before registering device
  rtc: m41t80: Initialize clientdata before registering device
  rtc: ds1286: Initialize drvdata before registering device
  rtc: ep93xx: Initialize drvdata before registering device
  rtc: davinci: Initialize drvdata before registering device
  rtc: mxc: Initialize drvdata before registering device
  clocksource: Install completely before selecting

13 years agox86, AMD: Fix ARAT feature setting again
Borislav Petkov [Tue, 17 May 2011 12:55:19 +0000 (14:55 +0200)]
x86, AMD: Fix ARAT feature setting again

Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had wrt to that.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoRevert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
Borislav Petkov [Tue, 17 May 2011 12:55:18 +0000 (14:55 +0200)]
Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"

This reverts commit e20a2d205c05cef6b5783df339a7d54adeb50962, as it crashes
certain boxes with specific AMD CPU models.

Moving the lower endpoint of the Erratum 400 check to accomodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth to fix properly by tweaking the errata checking
framework:

* missing IntPenging MSR on revisions < CG cause #GP:

http://marc.info/?l=linux-kernel&m=130541471818831

* makes earlier revisions use the LAPIC timer instead of the C1E
idle routine which switches to HPET, thus not waking up in
deeper C-states:

http://lkml.org/lkml/2011/4/24/20

Therefore, leave the original boundary starting with K8-revF.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoscsi: remove performance regression due to async queue run
Jens Axboe [Tue, 17 May 2011 09:04:44 +0000 (11:04 +0200)]
scsi: remove performance regression due to async queue run

Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.

Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.

Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:

scsi_request_fn()
        scsi_dispatch_cmd()
                scsi_queue_insert()
                        __scsi_queue_insert()
                                scsi_run_queue()
scsi_request_fn()
...

potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.

This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Tue, 17 May 2011 01:38:08 +0000 (18:38 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: Change netdev_fix_features messages loglevel
  vmxnet3: Fix inconsistent LRO state after initialization
  sfc: Fix oops in register dump after mapping change
  IPVS: fix netns if reading ip_vs_* procfs entries
  bridge: fix forwarding of IPv6

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Linus Torvalds [Tue, 17 May 2011 01:36:47 +0000 (18:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/cjb/mmc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"

13 years agomm: fix kernel-doc warning in page_alloc.c
Randy Dunlap [Mon, 16 May 2011 20:16:54 +0000 (13:16 -0700)]
mm: fix kernel-doc warning in page_alloc.c

Fix new kernel-doc warning in mm/page_alloc.c:

  Warning(mm/page_alloc.c:2370): No description found for parameter 'nid'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoPCI: Clear bridge resource flags if requested size is 0
Yinghai Lu [Sat, 14 May 2011 01:06:17 +0000 (18:06 -0700)]
PCI: Clear bridge resource flags if requested size is 0

During pci remove/rescan testing found:

  pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
  pci 0000:c0:03.0:   bridge window [io  0x1000-0x0fff]
  pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
  pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
  pci 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
  pci 0000:c0:03.0: Error enabling bridge (-22), continuing
  pci 0000:c0:03.0: enabling bus mastering
  pci 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
  pcieport: probe of 0000:c0:03.0 failed with error -22

This bug was caused by commit c8adf9a3e873 ("PCI: pre-allocate
additional resources to devices only after successful allocation of
essential resources.")

After that commit, pci_hotplug_io_size is changed to additional_io_size
from minium size.  So it will not go through resource_size(res) != 0
path, and will not be reset.

The root cause is: pci_bridge_check_ranges will set RESOURCE_IO flag for
pci bridge, and later if children do not need IO resource.  those bridge
resources will not need to be allocated.  but flags is still there.
that will confuse the the pci_enable_bridges later.

related code:

   static void assign_requested_resources_sorted(struct resource_list *head,
                                    struct resource_list_x *fail_head)
   {
           struct resource *res;
           struct resource_list *list;
           int idx;

           for (list = head->next; list; list = list->next) {
                   res = list->res;
                   idx = res - &list->dev->resource[0];
                   if (resource_size(res) && pci_assign_resource(list->dev, idx)) {
   ...
                           reset_resource(res);
                   }
           }
   }

At last, We have to clear the flags in pbus_size_mem/io when requested
size == 0 and !add_head.  becasue this case it will not go through
adjust_resources_sorted().

Just make size1 = size0 when !add_head. it will make flags get cleared.

At the same time when requested size == 0, add_size != 0, will still
have in head and add_list.  because we do not clear the flags for it.

After this, we will get right result:

  pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
  pci 0000:c0:03.0:   bridge window [io  disabled]
  pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
  pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
  pci 0000:c0:03.0: enabling bus mastering
  pci 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: irq 160 for MSI/MSI-X
  pcieport 0000:c0:03.0: Signaling PME through PCIe PME interrupt
  pci 0000:c4:00.0: Signaling PME through PCIe PME interrupt
  pcie_pme 0000:c0:03.0:pcie01: service driver pcie_pme loaded
  aer 0000:c0:03.0:pcie02: service driver aer loaded
  pciehp 0000:c0:03.0:pcie04: Hotplug Controller:

v3: more simple fix. also fix one typo in pbus_size_mem

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agotick: Clear broadcast active bit when switching to oneshot
Thomas Gleixner [Mon, 16 May 2011 09:07:48 +0000 (11:07 +0200)]
tick: Clear broadcast active bit when switching to oneshot

The first cpu which switches from periodic to oneshot mode switches
also the broadcast device into oneshot mode. The broadcast device
serves as a backup for per cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode it marks the other cpus as broadcast active
and sets the brodcast expiry value of those cpus to the next tick.

The oneshot mode broadcast bit for the other cpus is sticky and gets
only cleared when those cpus exit idle. If a cpu was not idle while
the bit got set in consequence the bit prevents that the broadcast
device is armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.

In most cases that goes unnoticed as one of the other cpus has usually
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set then the broadcast device will not be armed on behalf of that
cpu and will fire way after the expected timer expiry. In the case of
Christians bug report it took ~145 seconds which is about half of the
wrap around time of HPET (the limit for that device) due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.

The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point
otherwise it would not be executing that code.

[ I fundamentally hate that broadcast crap. Why the heck thought some
  folks that when going into deep idle it's a brilliant concept to
  switch off the last device which brings the cpu back from that
  state? ]

Thanks to Christian for providing all the valuable debug information!

Reported-and-tested-by: Christian Hoffmann <email@christianhoffmann.info>
Cc: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
13 years agonet: Change netdev_fix_features messages loglevel
Michał Mirosław [Mon, 16 May 2011 19:14:21 +0000 (15:14 -0400)]
net: Change netdev_fix_features messages loglevel

Those reduced to DEBUG can possibly be triggered by unprivileged processes
and are nothing exceptional. Illegal checksum combinations can only be
caused by driver bug, so promote those messages to WARN.

Since GSO without SG will now only cause DEBUG message from
netdev_fix_features(), remove the workaround from register_netdevice().

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agovmxnet3: Fix inconsistent LRO state after initialization
Thomas Jarosch [Mon, 16 May 2011 06:28:15 +0000 (06:28 +0000)]
vmxnet3: Fix inconsistent LRO state after initialization

During initialization of vmxnet3, the state of LRO
gets out of sync with netdev->features.

This leads to very poor TCP performance in a IP forwarding
setup and is hitting many VMware users.

Simplified call sequence:
1. vmxnet3_declare_features() initializes "adapter->lro" to true.

2. The kernel automatically disables LRO if IP forwarding is enabled,
so vmxnet3_set_flags() gets called. This also updates netdev->features.

3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
set to true and LRO gets enabled again, even though
netdev->features shows it's disabled.

Fix it by updating "adapter->lro", too.

The private vmxnet3 adapter flags are scheduled for removal
in net-next, see commit a0d2730c9571aeba793cb5d3009094ee1d8fda35
"net: vmxnet3: convert to hw_features".

Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.

Please CC: comments.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agosfc: Fix oops in register dump after mapping change
Ben Hutchings [Mon, 16 May 2011 06:13:49 +0000 (06:13 +0000)]
sfc: Fix oops in register dump after mapping change

Commit 747df2258b1b9a2e25929ef496262c339c380009 ('sfc: Always map MCDI
shared memory as uncacheable') introduced a separate mapping for the
MCDI shared memory (MC_TREG_SMEM).  This means we can no longer easily
include it in the register dump.  Since it is not particularly useful
in debugging, substitute a recognisable dummy value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoMerge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 16 May 2011 15:55:49 +0000 (08:55 -0700)]
Merge branch 'omap-fixes-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  OMAP3: set the core dpll clk rate in its set_rate function
  omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured

13 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Mon, 16 May 2011 15:47:31 +0000 (08:47 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm: Take lock around probes for drm_fb_helper_hotplug_event
  drm/i915: Revert i915.semaphore=1 default from 47ae63e0
  vga_switcheroo: don't toggle-switch devices
  drm/radeon/kms: add some evergreen/ni safe regs
  drm/radeon/kms: fix extended lvds info parsing
  drm/radeon/kms: fix tiling reg on fusion

13 years agoRevert "mmc: fix a race between card-detect rescan and clock-gate work instances"
Chris Ball [Mon, 16 May 2011 15:32:26 +0000 (11:32 -0400)]
Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"

This reverts commit 26fc8775b51484d8c0a671198639c6d5ae60533e, which has
been reported to cause boot/resume-time crashes for some users:

https://bbs.archlinux.org/viewtopic.php?id=118751.

Signed-off-by: Chris Ball <cjb@laptop.org>
Cc: <stable@kernel.org>
13 years agoblk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
Vivek Goyal [Mon, 16 May 2011 13:24:08 +0000 (15:24 +0200)]
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup

Currentlly we first map the task to cgroup and then cgroup to
blkio_cgroup. There is a more direct way to get to blkio_cgroup
from task using task_subsys_state(). Use that.

The real reason for the fix is that it also avoids a race in generic
cgroup code. During remount/umount rebind_subsystems() is called and
it can do following with and rcu protection.

cgrp->subsys[i] = NULL;

That means if somebody got hold of cgroup under rcu and then it tried
to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which
is wrong. I was running into this race condition with ltp running on a
upstream derived kernel and that lead to crash.

So ideally we should also fix cgroup generic code to wait for rcu
grace period before setting pointer to NULL. Li Zefan is not very keen
on introducing synchronize_wait() as he thinks it will slow
down moun/remount/umount operations.

So for the time being atleast fix the kernel crash by taking a more
direct route to blkio_cgroup.

One tester had reported a crash while running LTP on a derived kernel
and with this fix crash is no more seen while the test has been
running for over 6 days.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agox86, apic: Fix spurious error interrupts triggering on all non-boot APs
Youquan Song [Thu, 21 Apr 2011 16:22:43 +0000 (00:22 +0800)]
x86, apic: Fix spurious error interrupts triggering on all non-boot APs

This patch fixes a bug reported by a customer, who found
that many unreasonable error interrupts reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of Intel Software Developer Manual
Volume 3A, Local APIC may signal an illegal vector error when
an LVT entry is set as an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 is 0), regardless of whether
the mask bit is set or an interrupt actually happen. These
errors are seen as error interrupts.

The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal deliver mode should be SMI and it is required
from the kernel to keep AP's LVT thermal monitoring register
programmed as such as well.

This issue happens when BIOS does not take over thermal throttling
interrupt, AP's LVT thermal monitor register will be restored to
0x10000 which means vector 0 and fixed deliver mode, so all APs will
signal illegal vector error interrupts.

This patch check if interrupt delivery mode is not fixed mode before
restoring AP's LVT thermal monitor register.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agodrm: Take lock around probes for drm_fb_helper_hotplug_event
Chris Wilson [Fri, 22 Apr 2011 10:03:57 +0000 (11:03 +0100)]
drm: Take lock around probes for drm_fb_helper_hotplug_event

We need to hold the dev->mode_config.mutex whilst detecting the output
status. But we also need to drop it for the call into
drm_fb_helper_single_fb_probe(), which indirectly acquires the lock when
attaching the fbcon.

Failure to do so exposes a race with normal output probing. Detected by
adding some warnings that the mutex is held to the backend detect routines:

[   17.772456] WARNING: at drivers/gpu/drm/i915/intel_crt.c:471 intel_crt_detect+0x3e/0x373 [i915]()
[   17.772458] Hardware name: Latitude E6400
[   17.772460] Modules linked in: ....
[   17.772582] Pid: 11, comm: kworker/0:1 Tainted: G        W 2.6.38.4-custom.2 #8
[   17.772584] Call Trace:
[   17.772591]  [<ffffffff81046af5>] ? warn_slowpath_common+0x78/0x8c
[   17.772603]  [<ffffffffa03f3e5c>] ? intel_crt_detect+0x3e/0x373 [i915]
[   17.772612]  [<ffffffffa0355d49>] ?  drm_helper_probe_single_connector_modes+0xbf/0x2af [drm_kms_helper]
[   17.772619]  [<ffffffffa03534d5>] ?  drm_fb_helper_probe_connector_modes+0x39/0x4d [drm_kms_helper]
[   17.772625]  [<ffffffffa0354760>] ?  drm_fb_helper_hotplug_event+0xa5/0xc3 [drm_kms_helper]
[   17.772633]  [<ffffffffa035577f>] ? output_poll_execute+0x146/0x17c [drm_kms_helper]
[   17.772638]  [<ffffffff81193c01>] ? cfq_init_queue+0x247/0x345
[   17.772644]  [<ffffffffa0355639>] ? output_poll_execute+0x0/0x17c [drm_kms_helper]
[   17.772648]  [<ffffffff8105b540>] ? process_one_work+0x193/0x28e
[   17.772652]  [<ffffffff8105c6bc>] ? worker_thread+0xef/0x172
[   17.772655]  [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[   17.772658]  [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[   17.772663]  [<ffffffff8105f767>] ? kthread+0x7a/0x82
[   17.772668]  [<ffffffff8100a724>] ? kernel_thread_helper+0x4/0x10
[   17.772671]  [<ffffffff8105f6ed>] ? kthread+0x0/0x82
[   17.772674]  [<ffffffff8100a720>] ? kernel_thread_helper+0x0/0x10

Reported-by: Frederik Himpe <fhimpe@telenet.be>
References: https://bugs.freedesktop.org/show_bug.cgi?id=36394
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm/i915: Revert i915.semaphore=1 default from 47ae63e0
Andy Lutomirski [Fri, 13 May 2011 16:14:54 +0000 (12:14 -0400)]
drm/i915: Revert i915.semaphore=1 default from 47ae63e0

My Q67 / i7-2600 box has rev09 Sandy Bridge graphics.  It hangs
instantly when GNOME loads and it hangs so hard the reset button
doesn't work.  Setting i915.semaphore=0 fixes it.

Semaphores were disabled in a1656b9090f7008d2941c314f5a64724bea2ae37
in 2.6.38 and were re-enabled by

commit 47ae63e0c2e5fdb582d471dc906eb29be94c732f
Merge: c59a333 467cffb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Mar 7 12:32:44 2011 +0000

    Merge branch 'drm-intel-fixes' into drm-intel-next

    Apply the trivial conflicting regression fixes, but keep GPU semaphores
    enabled.

    Conflicts:
        drivers/gpu/drm/i915/i915_drv.h
        drivers/gpu/drm/i915/i915_gem_execbuffer.c

(It's worth noting that the offending change is i915_drv.c,
 which is not a conflict.)

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agovga_switcheroo: don't toggle-switch devices
Florian Mickler [Sun, 15 May 2011 14:32:50 +0000 (16:32 +0200)]
vga_switcheroo: don't toggle-switch devices

If the requested device is already active, ignore the request.

This restores the original behaviour of the interface. The change was
probably an unintended side effect of

commit 66b37c6777c4 vga_switcheroo: split switching into two stages

which did not take into account to duplicate the !active check in the split-off
stage2.

Fix this by factoring that check out of stage1 into the debugfs_write routine.

References: https://bugzilla.kernel.org/show_bug.cgi?id=34252
Reported-by: Igor Murzov <e-mail@date.by>
Tested-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agoMerge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6
David S. Miller [Sun, 15 May 2011 22:14:02 +0000 (18:14 -0400)]
Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6

13 years agoMerge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme...
Ingo Molnar [Sun, 15 May 2011 17:41:00 +0000 (19:41 +0200)]
Merge branch 'perf/urgent' of git://git./linux/kernel/git/acme/linux-2.6 into perf/urgent

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Sun, 15 May 2011 17:22:10 +0000 (10:22 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix FS_IOC_SETFLAGS ioctl
  Btrfs: fix FS_IOC_GETFLAGS ioctl
  fs: remove FS_COW_FL
  Btrfs: fix easily get into ENOSPC in mixed case
  Prevent oopsing in posix_acl_valid()

13 years agoIPVS: fix netns if reading ip_vs_* procfs entries
Hans Schillstrom [Sun, 15 May 2011 15:20:29 +0000 (17:20 +0200)]
IPVS: fix netns if reading ip_vs_* procfs entries

Without this patch every access to ip_vs in procfs will increase
the netns count i.e. an unbalanced get_net()/put_net().
(ipvsadm commands also use procfs.)
The result is you can't exit a netns if reading ip_vs_* procfs entries.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
13 years agobridge: fix forwarding of IPv6
Stephen Hemminger [Fri, 13 May 2011 20:03:24 +0000 (16:03 -0400)]
bridge: fix forwarding of IPv6

The commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
    bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPV6 packets in bridge because it would
call bp_parse_ip_options with an IPV6 packet.

Reported-by: Noah Meyerhans <noahm@debian.org>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
13 years agoperf evlist: Fix per thread mmap setup
Arnaldo Carvalho de Melo [Sun, 15 May 2011 12:39:00 +0000 (09:39 -0300)]
perf evlist: Fix per thread mmap setup

The PERF_EVENT_IOC_SET_OUTPUT ioctl was returning -EINVAL when using
--pid when monitoring multithreaded apps, as we can only share a ring
buffer for events on the same thread if not doing per cpu.

Fix it by using per thread ring buffers.

Tested with:

[root@felicio ~]# tuna -t 26131 -CP | nl
  1                      thread       ctxt_switches
  2    pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
  3 26131   OTHER     0      0,1  10814276      2397830 chromium-browse
  4  642    OTHER     0      0,1     14688            0 chromium-browse
  5  26148  OTHER     0      0,1    713602       115479 chromium-browse
  6  26149  OTHER     0      0,1    801958         2262 chromium-browse
  7  26150  OTHER     0      0,1   1271128          248 chromium-browse
  8  26151  OTHER     0      0,1         3            0 chromium-browse
  9  27049  OTHER     0      0,1     36796            9 chromium-browse
 10  618    OTHER     0      0,1     14711            0 chromium-browse
 11  661    OTHER     0      0,1     14593            0 chromium-browse
 12  29048  OTHER     0      0,1     28125            0 chromium-browse
 13  26143  OTHER     0      0,1   2202789          781 chromium-browse
[root@felicio ~]#

So 11 threads under pid 26131, then:

[root@felicio ~]# perf record -F 50000 --pid 26131

[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
  1 7fa4a2538000-7fa4a25b9000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  2 7fa4a25b9000-7fa4a263a000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  3 7fa4a263a000-7fa4a26bb000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  4 7fa4a26bb000-7fa4a273c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  5 7fa4a273c000-7fa4a27bd000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  6 7fa4a27bd000-7fa4a283e000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  7 7fa4a283e000-7fa4a28bf000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  8 7fa4a28bf000-7fa4a2940000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
  9 7fa4a2940000-7fa4a29c1000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
 10 7fa4a29c1000-7fa4a2a42000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
 11 7fa4a2a42000-7fa4a2ac3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#

11 mmaps, one per thread since we didn't specify any CPU list, so we need one
mmap per thread and:

[root@felicio ~]# perf record -F 50000 --pid 26131
^M
^C[ perf record: Woken up 79 times to write data ]
[ perf record: Captured and wrote 20.614 MB perf.data (~900639 samples) ]

[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
     1  371310 26131
     2   96516 26148
     3   95694 26149
     4   95203 26150
     5    7291 26143
     6      87 27049
     7      76 661
     8      60 29048
     9      47 618
    10      43 642
[root@felicio ~]#

Ok, one of the threads, 26151 was quiescent, so no samples there, but all the
others are there.

Then, if I specify one CPU:

[root@felicio ~]# perf record -F 50000 --pid 26131 --cpu 1
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.680 MB perf.data (~29730 samples) ]

[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
     1    8444 26131
     2    2584 26149
     3    2518 26148
     4    2324 26150
     5     123 26143
     6       9 661
     7       9 29048
[root@felicio ~]#

This machine has two cores, so fewer threads appeared on the radar, and:

[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
 1 7f484b922000-7f484b9a3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#

Just one mmap, as now we can use just one per-cpu buffer instead of the
per-thread needed in the previous case.

For global profiling:

[root@felicio ~]# perf record -F 50000 -a
^C[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 7.128 MB perf.data (~311412 samples) ]

[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
     1 7fb49b435000-7fb49b4b6000 rwxs 00000000 00:09 4064                       anon_inode:[perf_event]
     2 7fb49b4b6000-7fb49b537000 rwxs 00000000 00:09 4064                       anon_inode:[perf_event]
[root@felicio ~]#

It uses per-cpu buffers.

For just one thread:

[root@felicio ~]# perf record -F 50000 --tid 26148
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.330 MB perf.data (~14426 samples) ]

[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
     1    9969 26148
[root@felicio ~]#

[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
     1 7f286a51b000-7f286a59c000 rwxs 00000000 00:09 4064                       anon_inode:[perf_event]
[root@felicio ~]#

Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf tools: Honour the cpu list parameter when also monitoring a thread list
Arnaldo Carvalho de Melo [Mon, 25 Apr 2011 19:25:20 +0000 (16:25 -0300)]
perf tools: Honour the cpu list parameter when also monitoring a thread list

The perf_evlist__create_maps was discarding the --cpu parameter when a
--pid or --tid was specified, fix that.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph...
Linus Torvalds [Sat, 14 May 2011 22:41:10 +0000 (15:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix split bio handling
  rbd: fix leak of ops struct

13 years agoBtrfs: fix FS_IOC_SETFLAGS ioctl
Li Zefan [Fri, 15 Apr 2011 03:03:17 +0000 (03:03 +0000)]
Btrfs: fix FS_IOC_SETFLAGS ioctl

Steps to reproduce the bug:

  - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL
  - Call FS_IOC_SETFLAGS ioctl with flags=0
  - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set!

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: fix FS_IOC_GETFLAGS ioctl
Li Zefan [Fri, 15 Apr 2011 03:03:06 +0000 (03:03 +0000)]
Btrfs: fix FS_IOC_GETFLAGS ioctl

As we've added per file compression/cow support.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agofs: remove FS_COW_FL
Li Zefan [Fri, 15 Apr 2011 03:02:49 +0000 (03:02 +0000)]
fs: remove FS_COW_FL

FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.

The fact is we don't have corresponding BTRFS_INODE_COW flag.

COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.

If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the
FS_NOCOW_FL flag.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: fix easily get into ENOSPC in mixed case
liubo [Fri, 8 Apr 2011 08:44:37 +0000 (08:44 +0000)]
Btrfs: fix easily get into ENOSPC in mixed case

When a btrfs disk is created by mixed data & metadata option, it will have no
pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920
(Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at
the very beginning.  The problem is this initialization does not take the mixed
case into account, which will cause btrfs will easily get into ENOSPC in mixed
case.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoPrevent oopsing in posix_acl_valid()
Daniel J Blueman [Tue, 3 May 2011 16:44:13 +0000 (16:44 +0000)]
Prevent oopsing in posix_acl_valid()

If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for error code first.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Sat, 14 May 2011 19:19:18 +0000 (12:19 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: fix oops when LPM is used with PMP

13 years agotmpfs: fix race between swapoff and writepage
Hugh Dickins [Sat, 14 May 2011 19:06:42 +0000 (12:06 -0700)]
tmpfs: fix race between swapoff and writepage

Shame on me!  Commit b1dea800ac39 "tmpfs: fix race between umount and
writepage" fixed the advertized race, but introduced another: as even
its comment makes clear, we cannot safely rely on a peek at list_empty()
while holding no lock - until info->swapped is set, shmem_unuse_inode()
may delete any formerly-swapped inode from the shmem_swaplist, which
in this case would leave a swap area impossible to swapoff.

Although I don't relish taking the mutex every time, I don't care much
for the alternatives either; and at least the peek at list_empty() in
shmem_evict_inode() (a hotter path since most inodes would never have
been swapped) remains safe, because we already truncated the whole file.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agolibata: fix oops when LPM is used with PMP
Tejun Heo [Mon, 9 May 2011 14:04:11 +0000 (16:04 +0200)]
libata: fix oops when LPM is used with PMP

ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
However, @ap is NULL if @link points to a PMP link and thus the
unconditional @ap->flags dereference leads to the following oops.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  ...
  Pid: 295, comm: scsi_eh_4 Tainted: P            2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional
  RIP: 0010:[<ffffffff813f98e1>]  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  RSP: 0018:ffff880132defbf0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
  RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
  RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
  R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
  R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
  Stack:
   0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
   ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
   ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
  Call Trace:
   [<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30
   [<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci]
   [<ffffffff813faade>] ata_scsi_error+0x5be/0x900
   [<ffffffff813cf724>] scsi_error_handler+0x124/0x650
   [<ffffffff810834b6>] kthread+0x96/0xa0
   [<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10
  Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
  RIP  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
   RSP <ffff880132defbf0>
  CR2: 0000000000000018

Fix it by testing @link->ap->flags instead.

stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
        backported to 2.6.37 and 38.  This is a fix for that and thus
        also applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Nathan A. Mourey II" <nmoureyii@ne.rr.com>
LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain>
Cc: Connor H <cmdkhh@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoMerge branch 'fbmem'
Linus Torvalds [Sat, 14 May 2011 18:24:32 +0000 (11:24 -0700)]
Merge branch 'fbmem'

* fbmem:
  Further fbcon sanity checking
  fbmem: fix remove_conflicting_framebuffers races

13 years agoRevert "libata: ahci_start_engine compliant to AHCI spec"
Tejun Heo [Sat, 14 May 2011 10:28:04 +0000 (12:28 +0200)]
Revert "libata: ahci_start_engine compliant to AHCI spec"

This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.

The commits causes command timeouts on AC plug/unplug.  It isn't yet
clear why.  As the commit was for a single rather obscure controller,
revert the change for now.

The problem was reported and bisected by Gu Rui in bug#34692.

 https://bugzilla.kernel.org/show_bug.cgi?id=34692

Also, reported by Rafael and Michael in the following thread.

 http://thread.gmane.org/gmane.linux.kernel/1138771

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Gu Rui <chaos.proton@gmail.com>
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Michael Leun <lkml20100708@newton.leun.net>
Cc: Jian Peng <jipeng2005@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoFurther fbcon sanity checking
Bruno Prémont [Sat, 14 May 2011 10:24:15 +0000 (12:24 +0200)]
Further fbcon sanity checking

This moves the

    if (num_registered_fb == FB_MAX)
            return -ENXIO;

check _AFTER_ the call to do_remove_conflicting_framebuffers() as this
would (now in a safe way) allow a native driver to replace the
conflicting one even if all slots in registered_fb[] are taken.

This also prevents unregistering a framebuffer that is no longer
registered (vga16f will unregister at module unload time even if the
frame buffer had been unregistered earlier due to being found
conflicting).

Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofbmem: fix remove_conflicting_framebuffers races
Linus Torvalds [Fri, 13 May 2011 23:16:41 +0000 (16:16 -0700)]
fbmem: fix remove_conflicting_framebuffers races

When a register_framebuffer() call results in us removing old
conflicting framebuffers, the new registration_lock doesn't protect that
situation.  And we can't just add the same locking to the function,
because these functions call each other: register_framebuffer() calls
remove_conflicting_framebuffers, which in turn calls
unregister_framebuffer for any conflicting entry.

In order to fix it, this just creates wrapper functions around all three
functions and makes the versions that actually do the work be called
"do_xxx()", leaving just the wrapper that gets the lock and calls the
worker function.

So the rule becomes simply that "do_xxxx()" has to be called with the
lock held, and now do_register_framebuffer() can just call
do_remove_conflicting_framebuffers(), and that in turn can call
_do_unregister_framebuffer(), and there is no deadlock, and we can hold
the registration lock over the whole sequence, fixing the races.

It also makes error cases simpler, and fixes one situation where we
would return from unregister_framebuffer() without releasing the lock,
pointed out by Bruno Prémont.

Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Tested-by: Anca Emanuel <anca.emanuel@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88...
Linus Torvalds [Sat, 14 May 2011 00:29:03 +0000 (17:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
  alpha: Wire up syscalls new to 2.6.39
  alpha: convert to clocksource_register_hz

13 years agoalpha: Wire up syscalls new to 2.6.39
Michael Cree [Wed, 4 May 2011 08:14:50 +0000 (08:14 +0000)]
alpha: Wire up syscalls new to 2.6.39

Wire up the syscalls:
   name_to_handle_at
   open_by_handle_at
   clock_adjtime
   syncfs
and adjust some whitespace in the neighbourhood to align commments.

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
13 years agoalpha: convert to clocksource_register_hz
John Stultz [Wed, 16 Feb 2011 06:34:49 +0000 (22:34 -0800)]
alpha: convert to clocksource_register_hz

Converts alpha to use clocksource_register_hz.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>