platform/kernel/linux-3.10.git
Al Viro [Wed, 22 May 2013 19:11:27 +0000 (15:11 -0400)]
convert ncpfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 18:59:39 +0000 (14:59 -0400)]
convert hfsplus

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 18:29:35 +0000 (14:29 -0400)]
convert hfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 17:44:05 +0000 (13:44 -0400)]
convert befs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 22 May 2013 20:17:25 +0000 (16:17 -0400)]
convert cifs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 07:15:00 +0000 (03:15 -0400)]
convert freevxfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 07:03:58 +0000 (03:03 -0400)]
convert fuse

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 06:58:57 +0000 (02:58 -0400)]
convert hpfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:58:58 +0000 (22:58 -0400)]
reiserfs: switch reiserfs_readdir_dentry to inode

... and clean the callers up a bit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:45:29 +0000 (22:45 -0400)]
reiserfs: is_privroot_deh() needs only directory inode, actually

... and that - only to get the superblock.  Privroot is a directory
and we don't allow hardlinks to those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 02:42:17 +0000 (22:42 -0400)]
convert reiserfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 01:22:31 +0000 (21:22 -0400)]
convert ntfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 18 May 2013 01:11:59 +0000 (21:11 -0400)]
convert isofs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 22:08:49 +0000 (18:08 -0400)]
convert jffs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 22:02:17 +0000 (18:02 -0400)]
convert f2fs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:51:41 +0000 (17:51 -0400)]
convert 9p

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:44:42 +0000 (17:44 -0400)]
convert affs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:30:10 +0000 (17:30 -0400)]
convert adfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:06:34 +0000 (17:06 -0400)]
convert logfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 21:00:34 +0000 (17:00 -0400)]
convert jfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:52:26 +0000 (16:52 -0400)]
convert ceph

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:34:50 +0000 (16:34 -0400)]
convert nfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 20:08:53 +0000 (16:08 -0400)]
convert ext4

and trim the living hell out of the bogosities in the inline dir case

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:32:10 +0000 (15:32 -0400)]
convert qnx6

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:17:59 +0000 (15:17 -0400)]
convert qnx4

... and use strnlen() instead of strlen() - it's done on untrusted data,
after all.
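
The point about strnlen() can be illustrated with a small user-space sketch. The structure and field width below are hypothetical (they are not the qnx4 on-disk layout); the only thing shown is that a fixed-width name field read from disk may lack a terminating NUL, so the length scan has to be bounded by the field size.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_FIELD_LEN 16   /* hypothetical fixed-width on-disk name field */

struct disk_entry {
    char name[NAME_FIELD_LEN];   /* NOT guaranteed to be NUL-terminated */
};

/* Bounded length: never reads past the end of the name field. */
size_t entry_name_len(const struct disk_entry *de)
{
    assert(de != NULL);
    return strnlen(de->name, sizeof(de->name));
}
```

With strlen() a field filled to its last byte would make the scan run into whatever follows the structure; with strnlen() the result is simply capped at the field width.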

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 May 2013 19:05:25 +0000 (15:05 -0400)]
convert omfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:36:14 +0000 (14:36 -0400)]
convert nilfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:31:02 +0000 (14:31 -0400)]
convert sysfs

get rid of the kludges in sysfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 18:14:48 +0000 (14:14 -0400)]
convert gfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 17:48:17 +0000 (13:48 -0400)]
convert exofs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 17:41:48 +0000 (13:41 -0400)]
convert bfs

... and get rid of that ridiculous mutex in bfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 16:07:31 +0000 (12:07 -0400)]
convert procfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:52:12 +0000 (01:52 -0400)]
convert openpromfs

what the hell is op_mutex for, BTW?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:41:10 +0000 (01:41 -0400)]
convert efs

* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:28:34 +0000 (01:28 -0400)]
convert configfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:22:00 +0000 (01:22 -0400)]
convert romfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:17:58 +0000 (01:17 -0400)]
convert squashfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:14:46 +0000 (01:14 -0400)]
convert ubifs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 05:09:37 +0000 (01:09 -0400)]
convert udf

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 01:02:48 +0000 (21:02 -0400)]
convert ext3

new helper: dir_relax(inode).  Call it when you are in a location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*).  Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done).
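
A rough user-space model of what the message describes for dir_relax(), written under assumptions: a pthread mutex stands in for i_mutex and a plain flag stands in for the "directory is dead" state. The struct and its fields are hypothetical. The caller is assumed to hold the mutex on entry and still holds it on return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not the kernel structure. */
struct inode_model {
    pthread_mutex_t i_mutex;
    bool dead;              /* set by a (modelled) rmdir */
};

/*
 * Briefly give up the lock so other directory operations can run,
 * retake it, and report whether the directory is still alive.
 */
bool dir_relax(struct inode_model *inode)
{
    assert(inode != NULL);
    pthread_mutex_unlock(&inode->i_mutex);
    pthread_mutex_lock(&inode->i_mutex);
    return !inode->dead;
}
```

A reader loop would call this only at points where its position stays meaningful, and stop iterating as soon as it returns false.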

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 16 May 2013 00:23:06 +0000 (20:23 -0400)]
switch dcache_readdir() users to ->iterate()

new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).
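
To show how a dots helper fits into an iteration loop, here is a simplified user-space sketch. It assumes positions 0 and 1 are reserved for "." and ".." and that real entries start at position 2. The context below records names in an array instead of calling an actor, and the helper takes only the context, so the signatures differ from the ones named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* Simplified context: collects emitted names; not the kernel's dir_context. */
struct dir_context {
    long pos;
    const char *out[MAX_ENTRIES];
    size_t n_out;
};

bool dir_emit(struct dir_context *ctx, const char *name)
{
    assert(ctx != NULL);
    if (ctx->n_out == MAX_ENTRIES)
        return false;               /* output buffer is full */
    ctx->out[ctx->n_out++] = name;
    return true;
}

/* Emit "." at position 0 and ".." at position 1, resuming where we left off. */
bool dir_emit_dots(struct dir_context *ctx)
{
    if (ctx->pos == 0) {
        if (!dir_emit(ctx, "."))
            return false;
        ctx->pos = 1;
    }
    if (ctx->pos == 1) {
        if (!dir_emit(ctx, ".."))
            return false;
        ctx->pos = 2;
    }
    return true;
}

/* Toy ->iterate(): dots first, then the entries of an in-memory array. */
int toy_iterate(struct dir_context *ctx, const char *const *names, size_t count)
{
    if (!dir_emit_dots(ctx))
        return 0;
    while ((size_t)(ctx->pos - 2) < count) {
        if (!dir_emit(ctx, names[ctx->pos - 2]))
            break;
        ctx->pos++;
    }
    return 0;
}
```

Because the helper keys off ctx->pos, a reader that resumes at position 1 gets ".." but not a second ".".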

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 22:51:49 +0000 (18:51 -0400)]
simple local unixlike: switch to ->iterate()

ext2, ufs, minix, sysv

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 22:49:12 +0000 (18:49 -0400)]
introduce ->iterate(), ctx->pos, dir_emit()

New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).
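
The division of labour described above can be sketched in user space. Everything here is a simplified stand-in: the struct layouts, the three-argument actor, and the names toy_iterate and take_two are assumptions made for illustration, not the kernel definitions. What the sketch keeps is the rule that ->iterate() only touches ctx->pos, while iterate_dir() copies the position in from f_pos and back out afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-ins only; the kernel structures and signatures differ. */
struct dir_context;
typedef int (*filldir_t)(struct dir_context *ctx, const char *name, size_t len);

struct dir_context {
    filldir_t actor;    /* callback supplied by the caller */
    long pos;           /* position used by ->iterate() instead of f_pos */
};

struct file {
    long f_pos;
    const char *const *names;   /* hypothetical in-memory directory */
    size_t count;
    int (*iterate)(struct file *file, struct dir_context *ctx);
};

/* True if the actor accepted the entry and iteration may continue. */
bool dir_emit(struct dir_context *ctx, const char *name, size_t len)
{
    return ctx->actor(ctx, name, len) == 0;
}

/* Toy ->iterate(): reads and advances ctx->pos, never file->f_pos. */
int toy_iterate(struct file *file, struct dir_context *ctx)
{
    while (ctx->pos >= 0 && (size_t)ctx->pos < file->count) {
        const char *name = file->names[ctx->pos];
        if (!dir_emit(ctx, name, strlen(name)))
            break;              /* caller cannot take more right now */
        ctx->pos++;
    }
    return 0;
}

/* The wrapper owns f_pos: seed ctx->pos from it, write it back afterwards. */
int iterate_dir(struct file *file, struct dir_context *ctx)
{
    int ret;

    assert(file->iterate != NULL);
    ctx->pos = file->f_pos;
    ret = file->iterate(file, ctx);
    file->f_pos = ctx->pos;
    return ret;
}

/* Example actor (hypothetical): accepts two entries, then reports "full". */
int accepted;

int take_two(struct dir_context *ctx, const char *name, size_t len)
{
    (void)ctx; (void)name; (void)len;
    if (accepted >= 2)
        return -1;
    accepted++;
    return 0;
}
```

An entry the actor rejects is not skipped: ctx->pos is advanced only after a successful dir_emit(), so the next call starts at the rejected entry.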

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 15 May 2013 17:52:59 +0000 (13:52 -0400)]
introduce iterate_dir() and dir_context

iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.
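
The "embedded into whatever data that callback wants" arrangement can be sketched as follows. This is a user-space model under assumptions: the two-argument actor, struct count_callback and walk_names() are hypothetical, and container_of is spelled out with offsetof rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins; the real kernel structures carry more fields. */
struct dir_context;
typedef int (*filldir_t)(struct dir_context *ctx, const char *name);

struct dir_context {
    filldir_t actor;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical caller-side state: the context is embedded, not pointed to. */
struct count_callback {
    struct dir_context ctx;
    unsigned int seen;      /* data private to this callback */
};

int count_actor(struct dir_context *ctx, const char *name)
{
    struct count_callback *cb = container_of(ctx, struct count_callback, ctx);

    (void)name;
    cb->seen++;
    return 0;
}

/* Toy directory walk: hands each name to ctx->actor, stops on error. */
int walk_names(struct dir_context *ctx, const char *const *names, size_t n)
{
    assert(ctx->actor != NULL);
    for (size_t i = 0; i < n; i++) {
        int err = ctx->actor(ctx, names[i]);
        if (err)
            return err;
    }
    return 0;
}
```

Only one pointer travels through the walk; the callback gets back to its private data without a separate void pointer argument.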

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tejun Heo [Thu, 12 Sep 2013 03:19:13 +0000 (23:19 -0400)]
sysfs: clean up sysfs_get_dirent()

The pre-existing sysfs interfaces which take an explicit namespace
argument are weird in that they place the optional @ns in front of
@name, which is contrary to the established convention.  For example,
we end up forcing the vast majority of sysfs_get_dirent() users to do
sysfs_get_dirent(parent, NULL, name), which is silly and error-prone,
especially as @ns and @name may be interchanged without causing a
compilation warning.

This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swaps the
positions of @name and @ns; sysfs_get_dirent() is now a wrapper
around sysfs_get_dirent_ns().  This makes confusion a lot less
likely.
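
The shape of the resulting interface can be sketched with a toy lookup table. The types and function names here (dirent_model, dir_model, get_dirent_ns, get_dirent) are hypothetical user-space stand-ins, not the sysfs ones; the sketch only shows the mandatory name coming first, the optional namespace tag last, and a thin wrapper for the common untagged case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a sysfs directory entry; not the kernel's struct. */
struct dirent_model {
    const char *name;
    const void *ns;     /* namespace tag, NULL when untagged */
};

struct dir_model {
    const struct dirent_model *children;
    size_t n_children;
};

/* Namespace-aware lookup: mandatory @name first, optional @ns last. */
const struct dirent_model *get_dirent_ns(const struct dir_model *parent,
                                         const char *name, const void *ns)
{
    assert(parent != NULL && name != NULL);
    for (size_t i = 0; i < parent->n_children; i++) {
        const struct dirent_model *d = &parent->children[i];

        if (d->ns == ns && strcmp(d->name, name) == 0)
            return d;
    }
    return NULL;
}

/* Common case: callers without a namespace no longer pass an explicit NULL. */
const struct dirent_model *get_dirent(const struct dir_model *parent,
                                      const char *name)
{
    return get_dirent_ns(parent, name, NULL);
}
```

The swap hazard is visible in the prototype: a const char * converts to const void * and back without a diagnostic in C, so the compiler cannot catch transposed arguments. Keeping the optional one last and hiding it behind a wrapper reduces the number of call sites where the mistake is possible.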

There are other interfaces which take @ns before @name.  They'll be
updated by following patches.

This patch doesn't introduce any functional changes.

v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
    error on module builds.  Reported by build test robot.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 12 Sep 2013 02:29:09 +0000 (22:29 -0400)]
sysfs: @name comes before @ns

Some internal sysfs functions which take an explicit namespace argument
are weird in that they place the optional @ns in front of @name, which
is contrary to the established convention.  This is confusing and
error-prone, especially as @ns and @name may be interchanged without
causing a compilation warning.

Swap the positions of @name and @ns in the following internal
functions.

 sysfs_find_dirent()
 sysfs_rename()
 sysfs_hash_and_remove()
 sysfs_name_hash()
 sysfs_name_compare()
 create_dir()

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 12 Sep 2013 02:29:06 +0000 (22:29 -0400)]
sysfs: remove ktype->namespace() invocations in symlink code

There's no reason for sysfs to be calling ktype->namespace().  It is
backwards, obfuscates what's going on and unnecessarily tangles two
separate layers.

There are two places where symlink code calls ktype->namespace().

* sysfs_do_create_link_sd() calls it to find out the namespace tag of
  the target directory.  Unless symlinking races with cross-namespace
  renaming, this equals @target_sd->s_ns.

* sysfs_rename_link() uses it to find out the new namespace to rename
  to and the new namespace can be different from the existing one.
  The function is renamed to sysfs_rename_link_ns() with an explicit
  @ns argument and the ktype->namespace() invocation is shifted to the
  device layer.

While this patch replaces the ktype->namespace() invocation with the
recorded result in @target_sd, this shouldn't result in any behavior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 12 Sep 2013 02:29:05 +0000 (22:29 -0400)]
sysfs: remove ktype->namespace() invocations in directory code

For some unrecognizable reason, namespace information is communicated
to sysfs through ktype->namespace() callback when there's *nothing*
which needs the use of a callback.  The whole sequence of operations
is completely synchronous and sysfs operations simply end up calling
back into the layer which just invoked it in order to find out the
namespace information, which is completely backwards, obfuscates
what's going on and unnecessarily tangles two separate layers.

This patch doesn't remove ktype->namespace() but shifts its handling
to kobject layer.  We probably want to get rid of the callback in the
long term.

This patch adds an explicit param to sysfs_{create|rename|move}_dir()
and renames them to sysfs_{create|rename|move}_dir_ns(), respectively.
ktype->namespace() invocations are moved to the calling sites of the
above functions.  A new helper kobject_namespace() is introduced which
directly tests kobj_ns_type_operations->type which should give the
same result as testing sysfs_fs_type(parent_sd) and returns @kobj's
namespace tag as necessary.  kobject_namespace() is extern as it will
be used from another file in the following patches.

This patch should be an equivalent conversion without any functional
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 12 Sep 2013 02:29:04 +0000 (22:29 -0400)]
sysfs: make attr namespace interface less convoluted

sysfs ns (namespace) implementation became more convoluted than
necessary while trying to hide ns information from visible interface.
The relatively recent attr ns support is a good example.

* attr ns tag is determined by sysfs_ops->namespace() callback while
  dir tag is determined by kobj_type->namespace().  The placement is
  arbitrary.

* Instead of performing operations with explicit ns tag, the namespace
  callback is routed through sysfs_attr_ns(), sysfs_ops->namespace(),
  class_attr_namespace(), class_attr->namespace().  It's not simpler
  in any sense.  The only thing this convolution does is traversing
  the whole stack backwards.

The namespace callbacks are unnecessary because the operations involved
are inherently synchronous.  The information can be provided in a
straightforward top-down direction and reversing that direction is
unnecessary and against basic design principles.

This backward interface is unnecessarily convoluted and hinders
properly separating out sysfs from driver model / kobject for proper
layering.  This patch updates attr ns support such that

* sysfs_ops->namespace() and class_attr->namespace() are dropped.

* sysfs_{create|remove}_file_ns(), which take explicit @ns param, are
  added and sysfs_{create|remove}_file() are now simple wrappers
  around the ns aware functions.

* ns handling is dropped from sysfs_chmod_file().  Nobody uses it at
  this point.  sysfs_chmod_file_ns() can be added later if necessary.

* Explicit @ns is propagated through class_{create|remove}_file_ns()
  and netdev_class_{create|remove}_file_ns().

* drivers/net/bonding which is currently the only user of attr
  namespace is updated to use netdev_class_{create|remove}_file_ns()
  with @bh->net as the ns tag instead of using the namespace callback.

This patch should be an equivalent conversion without any functional
difference.  It makes the code easier to follow, reduces lines of code
a bit and helps proper separation and layering.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 12 Sep 2013 02:29:03 +0000 (22:29 -0400)]
sysfs: drop semicolon from to_sysfs_dirent() definition

The expansion of to_sysfs_dirent() contains an unnecessary trailing
semicolon making it impossible to use in the middle of statements.
Drop it.
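
Why the stray semicolon matters can be seen in a small stand-alone sketch. The struct and macro names below are hypothetical models (the real macro wraps the kernel's rbtree entry helper); the point is only that an expression-like macro with a semicolon baked into its expansion cannot be used inside a larger expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an rbtree node embedded in a directory entry. */
struct rb_node_model { int unused; };
struct dirent_model { int id; struct rb_node_model node; };

/* Broken form: the trailing semicolon is part of the expansion. */
#define to_dirent_bad(X) \
    ((struct dirent_model *)((char *)(X) - offsetof(struct dirent_model, node)));

/* Fixed form: a pure expression, usable anywhere an expression is. */
#define to_dirent(X) \
    ((struct dirent_model *)((char *)(X) - offsetof(struct dirent_model, node)))

int dirent_id(struct rb_node_model *n)
{
    assert(n != NULL);
    /* to_dirent_bad(n)->id would not compile: it expands to "(...);->id". */
    return to_dirent(n)->id;
}
```

The broken form still works when the macro is the whole right-hand side of an assignment, which is presumably how such a semicolon goes unnoticed.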

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric W. Biederman [Tue, 26 Mar 2013 03:07:01 +0000 (20:07 -0700)]
sysfs: Restrict mounting sysfs

Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
over the net namespace.  The principle here is if you create or have
capabilities over it you can mount it, otherwise you get to live with
what other people have mounted.

Instead of testing this with a straightforward ns_capable call,
perform this check the long and torturous way with kobject helpers.
This keeps direct knowledge of namespaces out of sysfs and preserves
the existing sysfs abstractions.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Sun, 31 Mar 2013 02:57:41 +0000 (19:57 -0700)]
userns: Better restrictions on when proc and sysfs can be mounted

Rely on the fact that another flavor of the filesystem is already
mounted and do not rely on state in the user namespace.

Verify that the mounted filesystem is not covered in any significant
way.  I would love to verify that the previously mounted filesystem
has no mounts on top but there are at least the directories
/proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
for other filesystems to mount on top of.

Refactor the test into a function named fs_fully_visible and call that
function from the mount routines of proc and sysfs.  This makes this
test local to the filesystems involved and the results current of when
the mounts take place, removing a weird threading of the user
namespace, the mount namespace and the filesystems themselves.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:37:42 +0000 (16:37 -0700)]
sysfs: file.c: fix up broken string warnings

This fixes the coding style warnings in fs/sysfs/file.c for broken
strings across lines.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:34:59 +0000 (16:34 -0700)]
sysfs: fix up uaccess.h coding style warnings

This fixes the uaccess.h warnings in the sysfs.c files.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:33:34 +0000 (16:33 -0700)]
sysfs: fix up 80 column coding style issues

This fixes up the 80 column coding style issues in the sysfs .c files.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:28:26 +0000 (16:28 -0700)]
sysfs: fix up space coding style issues

This fixes up all of the space-related coding style issues for the sysfs
code.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:21:17 +0000 (16:21 -0700)]
sysfs: remove trailing whitespace

This removes all trailing whitespace errors in the sysfs code.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:17:47 +0000 (16:17 -0700)]
sysfs: fix placement of EXPORT_SYMBOL()

The export should happen after the function, not at the bottom of the
file, so fix that up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:14:11 +0000 (16:14 -0700)]
sysfs: group: update copyright to add myself and the LF

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:12:34 +0000 (16:12 -0700)]
sysfs: group.c: add kerneldoc for sysfs_remove_group

sysfs_remove_group() never had kerneldoc, so add it, and fix up the
kerneldoc for sysfs_remove_groups() which didn't specify the parameters
properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:10:02 +0000 (16:10 -0700)]
sysfs: group.c: fix up broken string coding style

checkpatch complains about the broken string in the file, and it's
correct, so fix it up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:07:29 +0000 (16:07 -0700)]
sysfs: group.c: fix up some * coding style issues

This fixes up the * coding style warnings for the group.c sysfs file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:06:14 +0000 (16:06 -0700)]
sysfs: group.c: fix trailing whitespace

There were some trailing spaces in the file; fix that up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 23:04:12 +0000 (16:04 -0700)]
sysfs: group.c: move EXPORT_SYMBOL_GPL() to the proper location

This fixes up the coding style issue of incorrectly placing the
EXPORT_SYMBOL_GPL() macro, it should be right after the function itself,
not at the end of the file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 21 Aug 2013 20:47:50 +0000 (13:47 -0700)]
sysfs: add sysfs_create/remove_groups()

These functions are being open-coded in 3 different places in the driver
core, and other driver subsystems will want to start doing this as well,
so move it to the sysfs core to keep it all in one place, where we know
it is written properly.
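
The pattern that was being open-coded is a create loop over a NULL-terminated array with rollback on failure. The sketch below models it in user space; struct group_model and its fail knob are hypothetical, and the per-group create and remove functions are trivial stand-ins for the real sysfs calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an attribute group. */
struct group_model {
    const char *name;
    int fail;       /* test knob: nonzero makes creation fail with this code */
    int created;    /* 1 while the group "exists" */
};

int create_group(struct group_model *g)
{
    assert(g != NULL);
    if (g->fail)
        return g->fail;
    g->created = 1;
    return 0;
}

void remove_group(struct group_model *g)
{
    g->created = 0;
}

/* Create every group in a NULL-terminated array, or none of them. */
int create_groups(struct group_model **groups)
{
    int error = 0;
    int i;

    if (!groups)
        return 0;

    for (i = 0; groups[i]; i++) {
        error = create_group(groups[i]);
        if (error) {
            /* Undo the groups created before the failing one. */
            while (--i >= 0)
                remove_group(groups[i]);
            break;
        }
    }
    return error;
}
```

Getting the unwind loop right (start one below the failing index, stop after index 0) is the part most easily botched when every caller writes its own copy.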

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Schinagl [Sun, 14 Jul 2013 23:05:56 +0000 (16:05 -0700)]
sysfs: prevent warning when only using binary attributes

When only using bin_attrs instead of attrs the kernel prints a warning
and refuses to create the sysfs entry. This fixes that.

Signed-off-by: Oliver Schinagl <oliver@schinagl.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sun, 14 Jul 2013 23:05:55 +0000 (16:05 -0700)]
sysfs: add support for binary attributes in groups

groups should be able to support binary attributes, just like they
support "normal" attributes.  This lets us only handle one type of
structure, groups, throughout the driver core and subsystems, making
binary attributes a "full fledged" part of the driver model, and not
something just "tacked on".

Reported-by: Oliver Schinagl <oliver@schinagl.nl>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nick Dyer [Fri, 7 Jun 2013 14:45:13 +0000 (15:45 +0100)]
sysfs_notify is only possible on file attributes

If sysfs_notify is called on a binary attribute, bad things can
happen, so prevent it.

Note, no in-kernel usage of this is currently present, but in the
future, it's good to be safe.

Changes in V2:
- Also ignore sysfs_notify on dirs, links
- Use WARN_ON rather than silently failing
- Compiled and tested (huge apologies about first submission)

Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rami Rosen [Mon, 29 Apr 2013 13:05:32 +0000 (16:05 +0300)]
sysfs: kill sysfs_sb declaration in fs/sysfs/inode.c.

This patch removes sysfs_sb declaration from fs/sysfs/inode.c
(due to 0f4288ec6fcc1a47d1fa0241ec1c6dacd5a09e96,
 "Kill unused sysfs_sb variable").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Warner Wang [Mon, 13 May 2013 03:11:05 +0000 (11:11 +0800)]
sysfs: sysfs_link_sibling(): fix typo in comment

Fix a typo subling->sibling in the comment of sysfs_link_sibling().

Signed-off-by: Warner Wang <warner.wang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anand Avati [Thu, 5 Sep 2013 09:44:44 +0000 (11:44 +0200)]
fuse: drop dentry on failed revalidate

Drop a subtree when we find that it has moved or been deleted.  This can be
done as long as there are no submounts under this location.

If the directory was moved and we come across the same directory in a
future lookup it will be reconnected by d_materialise_unique().

Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:43 +0000 (11:44 +0200)]
fuse: clean up return in fuse_dentry_revalidate()

On errors unrelated to the filesystem's state (ENOMEM, ENOTCONN) return the
error itself from ->d_revalidate() instead of returning zero (invalid).
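
The three-way outcome of a revalidate callback can be modelled as a small pure function. This is a sketch under assumptions, not the fuse code: the function name and its two inputs are hypothetical, and the convention assumed is the usual one of a positive value for "valid", zero for "invalid", and a negative errno for "could not tell".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * lookup_err:  0 or a negative errno from asking the backend.
 * same_object: whether the backend's answer still matches the cached entry.
 */
int revalidate_result(int lookup_err, bool same_object)
{
    assert(lookup_err <= 0);

    /* Transient failures say nothing about the entry: pass them up. */
    if (lookup_err == -ENOMEM || lookup_err == -ENOTCONN)
        return lookup_err;

    /* Any other error, or a mismatch, means the cached entry is stale. */
    if (lookup_err || !same_object)
        return 0;

    return 1;
}
```

Reporting a transient failure as "invalid" would throw away a cache entry that may be perfectly good, which is the behaviour the change above avoids.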

Also make a common label for invalidating the dentry.  This will be used by
the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:42 +0000 (11:44 +0200)]
fuse: use d_materialise_unique()

Use d_materialise_unique() instead of d_splice_alias().  This allows dentry
subtrees to be moved to a new place if they have moved, even if something is
referencing a dentry in the subtree (open fd, cwd, etc.).

This will also allow us to drop a subtree if it is found to be replaced by
something else.  In this case the disconnected subtree can later be
reconnected to its new location.

d_materialise_unique() ensures that a directory entry only ever has one
alias.  We keep fc->inst_mutex around the calls for d_materialise_unique()
on directories to prevent a race with mkdir "stealing" the inode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:41 +0000 (11:44 +0200)]
sysfs: use check_submounts_and_drop()

Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:40 +0000 (11:44 +0200)]
nfs: use check_submounts_and_drop()

Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:39 +0000 (11:44 +0200)]
gfs2: use check_submounts_and_drop()

Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:38 +0000 (11:44 +0200)]
afs: use check_submounts_and_drop()

Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 12:39:11 +0000 (14:39 +0200)]
vfs: check unlinked ancestors before mount

We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount.  Nor do we
prevent mounts from being added to the disconnected subtree using relative
paths after the d_drop().

This patch fixes these issues by checking for unlinked (unhashed, non-root)
ancestors before proceeding with the mount.  This is done with rename
seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
including our dentry itself.  This ensures that only one of
check_submounts_and_drop() or has_unlinked_ancestor() can succeed.
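
The mutual exclusion described above can be modelled with a single lock. This is a deliberately reduced user-space sketch: one pthread mutex stands in for the rename seqlock plus the per-dentry locks, a single dentry stands in for the whole ancestor chain, and try_mount() and the error codes chosen are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy dentry: one lock stands in for the rename lock plus ->d_lock. */
struct dentry_model {
    pthread_mutex_t lock;
    bool mounted;
    bool unhashed;
};

/* Check for mounts and drop in a single critical section. */
int check_submounts_and_drop(struct dentry_model *d)
{
    int ret = 0;

    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    if (d->mounted)
        ret = -EBUSY;
    else
        d->unhashed = true;
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* Mount side: refuse to mount on something that was already dropped. */
int try_mount(struct dentry_model *d)
{
    int ret = 0;

    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    if (d->unhashed)
        ret = -ENOENT;
    else
        d->mounted = true;
    pthread_mutex_unlock(&d->lock);
    return ret;
}
```

Because each side tests the other's flag and sets its own under the same lock, whichever runs first wins and the other fails; there is no window between the check and the action.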

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:36 +0000 (11:44 +0200)]
vfs: check submounts and drop atomically

We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount.

 Process A: have_submounts() -> returns false
 Process B: mount() -> success
 Process A: d_drop()

This patch prepares the ground for the fix by doing the following
operations all under the same rename lock:

  have_submounts()
  shrink_dcache_parent()
  d_drop()

This is actually an optimization since have_submounts() and
shrink_dcache_parent() both traverse the same dentry tree separately.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:35 +0000 (11:44 +0200)]
vfs: add d_walk()

This one replaces three instances of open-coded tree walking (have_submounts,
select_parent, d_genocide) with a common helper.

In addition to slightly reducing the kernel size, this simplifies the
callers and makes them less bug prone.
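
The idea of one walker shared by several callers can be sketched as a pre-order tree walk driven by a callback. This is a simplified user-space model: it is recursive and lock-free, whereas the kernel helper has to cope with locking and concurrent renames, and the node layout, return codes and callback names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum walk_ret { WALK_CONTINUE, WALK_QUIT, WALK_SKIP };

/* Hypothetical tree node standing in for a dentry. */
struct node {
    struct node *first_child;
    struct node *next_sibling;
    int mounted;
    int refcount;
};

typedef enum walk_ret (*enter_fn)(void *data, struct node *n);

/* Pre-order walk; returns 0 if it ran to completion, 1 if a callback quit. */
int tree_walk(struct node *parent, void *data, enter_fn enter)
{
    enum walk_ret ret;

    assert(parent != NULL && enter != NULL);
    ret = enter(data, parent);
    if (ret == WALK_QUIT)
        return 1;
    if (ret == WALK_SKIP)
        return 0;           /* do not descend into this subtree */

    for (struct node *c = parent->first_child; c; c = c->next_sibling)
        if (tree_walk(c, data, enter))
            return 1;
    return 0;
}

/* Caller 1: does anything in the subtree have a mount on it? */
enum walk_ret check_mount(void *data, struct node *n)
{
    int *found = data;

    if (n->mounted) {
        *found = 1;
        return WALK_QUIT;
    }
    return WALK_CONTINUE;
}

/* Caller 2: drop one reference on entering each node. */
enum walk_ret drop_ref(void *data, struct node *n)
{
    (void)data;
    n->refcount--;
    return WALK_CONTINUE;
}
```

Each caller shrinks to a few lines of policy, and doing the work on entry rather than on exit is what lets a single enter callback serve both.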

Change-Id: I82891c4cc0b3cd13cc4faef5656d4eb01f4f1e99
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Thu, 5 Sep 2013 09:44:34 +0000 (11:44 +0200)]
vfs: restructure d_genocide()

It shouldn't matter when we decrement the refcount during the walk as long
as we do it exactly once.

Restructure d_genocide() to do the killing on entering the dentry instead
of when leaving it.  This helps in creating a common helper for tree walking.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Yan, Zheng [Tue, 13 Aug 2013 07:42:02 +0000 (15:42 +0800)]
vfs: call d_op->d_prune() before unhashing dentry

The d_prune dentry operation is used to notify the filesystem when the
VFS is about to prune a hashed dentry from the dcache. There are three
code paths that prune dentries: shrink_dcache_for_umount_subtree(),
prune_dcache_sb() and d_prune_aliases(). In the d_prune_aliases()
case, the VFS unhashes the dentry first, then calls the d_prune dentry
operation. This confuses ceph_d_prune() (ceph uses the d_prune
dentry operation to maintain a flag indicating whether the complete
contents of a directory are in the dcache; pruning an unhashed dentry
does not affect the directory's completeness).

This patch fixes the issue by calling the d_prune dentry operation
in d_prune_aliases() before unhashing the dentry. It also makes the VFS
call the d_prune dentry operation only for hashed dentries, to avoid
calling it twice when a dentry is pruned by d_prune_aliases().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agovfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
Linus Torvalds [Mon, 2 Sep 2013 18:38:06 +0000 (11:38 -0700)]
vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()

This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
and re-implements it using the lockref infrastructure instead.  It also
adds a lot of comments about what is actually going on, because turning
a dentry that was looked up using RCU into a long-lived reference
counted entry is one of the more subtle parts of the rcu walk.

We also used to be _particularly_ subtle in unlazy_walk() where we
re-validate both the dentry and its parent using the same sequence
count.  We used to do it by nesting the locks and then verifying the
sequence count just once.

That was silly, because nested locking is expensive, but the sequence
count check is not.  So this just re-validates the dentry and the parent
separately, avoiding the nested locking, and making the lockref lookup
possible.

Acked-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agovfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
Waiman Long [Mon, 2 Sep 2013 18:29:22 +0000 (11:29 -0700)]
vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()

A valid parent pointer is always going to have a non-zero reference
count, but if we look up the parent optimistically without locking, we
have to protect against the (very unlikely) race against renaming
changing the parent from under us.

We do that by using lockref_get_not_zero(), and then re-checking the
parent pointer after getting a valid reference.
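
The get-then-recheck pattern described above can be sketched in plain C.
This is a single-threaded illustration only: toy_node, toy_get_not_zero()
and toy_get_parent_fast() are hypothetical names, the count is a plain int
rather than a lockref, and the RCU protection the real code relies on is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: a parent link and a plain count. */
struct toy_node {
	struct toy_node *parent;
	int count;
};

/* Take a reference only if the count is already nonzero. */
static bool toy_get_not_zero(struct toy_node *n)
{
	if (n->count == 0)
		return false;
	n->count++;
	return true;
}

/*
 * Optimistic path: returns the referenced parent, or NULL when the caller
 * should fall back to the locked slow path (not shown here).
 */
static struct toy_node *toy_get_parent_fast(struct toy_node *n)
{
	struct toy_node *p = n->parent;

	assert(p != NULL);	/* every node in this sketch has a parent */
	if (!toy_get_not_zero(p))
		return NULL;
	if (p == n->parent)	/* re-check: parent unchanged, reference is good */
		return p;
	p->count--;		/* parent changed under us: drop and fall back */
	return NULL;
}
```

The re-check matters because the reference is only useful if it was taken
on the node that is still the parent once the increment has happened.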

[ This is a re-implementation of a chunk from the original patch by
  Waiman Long: "dcache: Enable lockless update of dentry's refcount".
  I've completely rewritten the patch-series and split it up, but I'm
  attributing this part to Waiman as it's close enough to his earlier
  patch  - Linus ]

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agovfs: make the dentry cache use the lockref infrastructure
Waiman Long [Thu, 29 Aug 2013 01:24:59 +0000 (18:24 -0700)]
vfs: make the dentry cache use the lockref infrastructure

This just replaces the dentry count/lock combination with the lockref
structure that contains both a count and a spinlock, and does the
mechanical conversion to use the lockref infrastructure.

There are no semantic changes here, it's purely syntactic.  The
reference lockref implementation uses the spinlock exactly the same way
that the old dcache code did, and the bulk of this patch is just
expanding the internal "d_count" use in the dcache code to use
"d_lockref.count" instead.

This is purely preparation for the real change to make the reference
count updates be lockless during the 3.12 merge window.

[ As with the previous commit, this is a rewritten version of a concept
  originally from Waiman, so credit goes to him, blame for any errors
  goes to me.

  Waiman's patch had some semantic differences for taking advantage of
  the lockless update in dget_parent(), while this patch is
  intentionally a pure search-and-replace change with no semantic
  changes.     - Linus ]

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agovfs: constify dentry parameter in d_count()
Peng Tao [Thu, 18 Jul 2013 14:09:08 +0000 (22:09 +0800)]
vfs: constify dentry parameter in d_count()

so that it can be used in places like d_compare/d_hash
without causing a compiler warning.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agohelper for reading ->d_count
Al Viro [Fri, 5 Jul 2013 14:59:33 +0000 (18:59 +0400)]
helper for reading ->d_count

Change-Id: I17f408c47173052817d0fb79f8506e418e47a5de
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolockref: include mutex.h rather than reinvent arch_mutex_cpu_relax
Will Deacon [Wed, 27 Nov 2013 13:52:53 +0000 (13:52 +0000)]
lockref: include mutex.h rather than reinvent arch_mutex_cpu_relax

arch_mutex_cpu_relax is already conditionally defined in mutex.h, so
simply include that header rather than replicate the code here.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: use BLOATED_SPINLOCKS to avoid explicit config dependencies
Peter Zijlstra [Thu, 14 Nov 2013 22:31:54 +0000 (14:31 -0800)]
lockref: use BLOATED_SPINLOCKS to avoid explicit config dependencies

Avoid the fragile Kconfig construct guesstimating spinlock_t sizes; use a
friendly compile-time test to determine this.

[kirill.shutemov@linux.intel.com: drop CONFIG_CMPXCHG_LOCKREF]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoGFS2: Use lockref for glocks
Steven Whitehouse [Tue, 15 Oct 2013 14:18:08 +0000 (15:18 +0100)]
GFS2: Use lockref for glocks

Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. The intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjunction
with the spinlock to replace the previous atomic counter.

As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.

The conversion is relatively straightforward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.

A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.

The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
9 years agoGFS2: Take glock reference in examine_bucket()
Steven Whitehouse [Tue, 20 Aug 2013 08:35:09 +0000 (09:35 +0100)]
GFS2: Take glock reference in examine_bucket()

We need to check the glock ref counter in a race free way
in order to ensure that the gfs2_glock_hold() call will
succeed. The easiest way to do that is to simply take the
reference count early in the common code of examine_bucket,
skipping any glocks with zero ref count.

That means that the examiner functions all need to put their
reference on the glock once they've performed their function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
9 years agolockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
Heiko Carstens [Mon, 23 Sep 2013 10:59:56 +0000 (12:59 +0200)]
lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()

Make use of arch_mutex_cpu_relax() so architectures can override the
default cpu_relax() semantics.
This is especially useful for s390, where cpu_relax() means that we
yield() the current (virtual) cpu and therefore is very expensive,
and would contradict the whole purpose of the lockless cmpxchg loop.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
9 years agolockref: allow relaxed cmpxchg64 variant for lockless updates
Will Deacon [Thu, 26 Sep 2013 16:27:00 +0000 (17:27 +0100)]
lockref: allow relaxed cmpxchg64 variant for lockless updates

The 64-bit cmpxchg operation on the lockref is ordered by virtue of
hazarding between the cmpxchg operation and the reference count
manipulation. On weakly ordered memory architectures (such as ARM), it
can be of great benefit to omit the barrier instructions where they are
not needed.

This patch moves the lockless lockref code over to a cmpxchg64_relaxed
operation, which doesn't provide barrier semantics. If the operation
isn't defined, we simply #define it as the usual 64-bit cmpxchg macro.

Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: use cmpxchg64 explicitly for lockless updates
Will Deacon [Thu, 19 Sep 2013 18:06:46 +0000 (19:06 +0100)]
lockref: use cmpxchg64 explicitly for lockless updates

The cmpxchg() function tends not to support 64-bit arguments on 32-bit
architectures.  This could be either due to use of unsigned long
arguments (like on ARM) or lack of instruction support (cmpxchgq on
x86).  However, these architectures may implement a specific cmpxchg64()
function to provide 64-bit cmpxchg support instead.

Since the lockref code requires a 64-bit cmpxchg and relies on the
architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64
instead of cmpxchg and allow 32-bit architectures to make use of the
lockless lockref implementation.

Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: add ability to mark lockrefs "dead"
Linus Torvalds [Sat, 7 Sep 2013 22:49:18 +0000 (15:49 -0700)]
lockref: add ability to mark lockrefs "dead"

The only actual current lockref user (dcache) uses zero reference counts
even for perfectly live dentries, because it's a cache: there may not be
any users, but that doesn't mean that we want to throw away the dentry.

At the same time, the dentry cache does have a notion of a truly "dead"
dentry that we must not even increment the reference count of, because
we have pruned it and it is not valid.

Currently that distinction is not visible in the lockref itself, and the
dentry cache validation uses "lockref_get_or_lock()" to either get a new
reference to a dentry that already had existing references (and thus
cannot be dead), or get the dentry lock so that we can then verify the
dentry and increment the reference count under the lock if that
verification was successful.

That's all somewhat complicated.

This adds the concept of being "dead" to the lockref itself, by simply
using a count that is negative.  This allows a usage scenario where we
can increment the refcount of a dentry without having to validate it,
and pushing the special "we killed it" case into the lockref code.

The dentry code itself doesn't actually use this yet, and it's probably
too late in the merge window to do that code (the dentry_kill() code
with its "should I decrement the count" logic really is pretty complex
code), but let's introduce the concept at the lockref level now.
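
A minimal sketch of the "negative count means dead" idea, with the
spinlock left out and hypothetical names throughout (toy_ref, TOY_DEAD,
toy_mark_dead, toy_get_not_dead). The particular negative value is an
arbitrary choice for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DEAD (-128)	/* arbitrary negative marker chosen for this sketch */

/* Hypothetical reference count; the attached lock is omitted. */
struct toy_ref {
	int count;
};

/* Mark the object dead: no new references may be taken after this. */
static void toy_mark_dead(struct toy_ref *r)
{
	r->count = TOY_DEAD;
}

/* Take a reference unless dead.  A count of zero is still "live". */
static bool toy_get_not_dead(struct toy_ref *r)
{
	if (r->count < 0)
		return false;
	r->count++;
	return true;
}

/* Drop a reference previously taken. */
static void toy_put(struct toy_ref *r)
{
	assert(r->count > 0);
	r->count--;
}
```

With this encoding a cached object with no users (count zero) remains
distinguishable from one that has been pruned.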

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: implement lockless reference count updates using cmpxchg()
Linus Torvalds [Mon, 2 Sep 2013 19:12:15 +0000 (12:12 -0700)]
lockref: implement lockless reference count updates using cmpxchg()

Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop.  This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.

Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.

So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.

The lockref structure, in contrast, really is a *locked* reference
count.  If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.

In order to enable the cmpxchg lockless code, the architecture needs to
do three things:

 (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
     in an aligned u64, and have a "cmpxchg()" implementation that works
     on such a u64 data type.

 (2) define a helper function to test for a spinlock being unlocked
     ("arch_spin_value_unlocked()")

 (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
     Kconfig file.

This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).
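
As a userspace sketch of the cmpxchg loop described above, assume a toy
layout where a 32-bit lock word (0 meaning unlocked) and a 32-bit count
are packed into a single 64-bit value. The names packed_lockref,
toy_pack and toy_lockref_get_lockless are hypothetical, and C11 atomics
stand in for the kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits: toy lock word (0 = unlocked).  High 32 bits: count. */
struct packed_lockref {
	_Atomic uint64_t lock_count;
};

static uint32_t toy_lock_word(uint64_t v)
{
	return (uint32_t)v;
}

static int32_t toy_count(uint64_t v)
{
	return (int32_t)(v >> 32);
}

static uint64_t toy_pack(uint32_t lock, int32_t count)
{
	assert(count >= 0);	/* counts are never negative in this sketch */
	return ((uint64_t)(uint32_t)count << 32) | lock;
}

/*
 * Try to increment the count without taking the lock.  Returns false if
 * the lock word shows "held"; the caller must then fall back to taking
 * the lock (not shown here).
 */
static bool toy_lockref_get_lockless(struct packed_lockref *ref)
{
	uint64_t old = atomic_load(&ref->lock_count);

	while (toy_lock_word(old) == 0) {
		uint64_t new = toy_pack(0, toy_count(old) + 1);

		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->lock_count, &old, new))
			return true;
	}
	return false;
}
```

Because the lock word is part of the compared value, the update can only
succeed while the lock is observed to be free, which is what keeps the
count stable for a lock holder.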

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: uninline lockref helper functions
Linus Torvalds [Mon, 2 Sep 2013 18:58:20 +0000 (11:58 -0700)]
lockref: uninline lockref helper functions

They aren't very good to inline, since they already call external
functions (the spinlock code), and we're going to create rather more
complicated versions of them that can do the reference count updates
locklessly.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agolockref: add 'lockref_get_or_lock() helper
Linus Torvalds [Mon, 2 Sep 2013 18:14:19 +0000 (11:14 -0700)]
lockref: add 'lockref_get_or_lock() helper

This behaves like "lockref_get_not_zero()", but instead of doing nothing
if the count was zero, it returns with the lock held.

This allows callers to revalidate the lockref-protected data structure
if required even if the count was zero to begin with, and possibly
increment the count if it passes muster.

In particular, the dentry code wants this when it wants to turn an
RCU-protected dentry into a stable refcounted one: if the dentry count
is zero, but the sequence number still validates the dentry, we can take
a reference to it.
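
The calling convention can be sketched as follows. This is a
single-threaded model in which a bool stands in for the spinlock; the
names toy_locked_ref, toy_get_or_lock and toy_legitimize are hypothetical,
and the still_valid flag stands in for whatever revalidation the caller
performs (for the dentry case, the sequence number check).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock + count pair; 'locked' models a held spinlock. */
struct toy_locked_ref {
	bool locked;
	int count;
};

/*
 * Returns true if a reference was taken (count was nonzero).
 * Returns false WITH THE LOCK HELD if the count was zero.
 */
static bool toy_get_or_lock(struct toy_locked_ref *r)
{
	assert(!r->locked);	/* single-threaded sketch: lock must be free */
	r->locked = true;	/* stand-in for taking the spinlock */
	if (r->count == 0)
		return false;	/* caller revalidates while still locked */
	r->count++;
	r->locked = false;	/* stand-in for releasing the spinlock */
	return true;
}

/* Caller pattern: revalidate under the lock when the count was zero. */
static bool toy_legitimize(struct toy_locked_ref *r, bool still_valid)
{
	if (toy_get_or_lock(r))
		return true;
	/* The count was zero and we hold the lock. */
	if (still_valid)
		r->count++;
	r->locked = false;
	return still_valid;
}
```

The point of returning with the lock held is that the caller can make its
validity check and the increment as one step under the same lock.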

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoAdd new lockref infrastructure reference implementation
Waiman Long [Thu, 29 Aug 2013 01:13:26 +0000 (18:13 -0700)]
Add new lockref infrastructure reference implementation

This introduces a new "lockref" structure that supports the concept of
lockless updates of reference counts that still honor an attached
spinlock.

NOTE! This reference implementation is not the optimized lockless
version, rather it is the fallback implementation using standard
spinlocks.  The actual optimized versions will be merged into 3.12, but
I wanted to get the infrastructure in place and document the new
interfaces.

[ Also note that this particular commit is a drastically cut-down minimal
  version of the original patch by Waiman.  In order to properly credit
  the original author I'm marking Waiman as the author here, but in the
  end this patch bears little resemblance to the patch by Waiman.  So
  blame any errors on me editing things down to the point where I can
  introduce the infrastructure before the merge window for 3.12 actually
  opens.     - Linus ]

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agos5p-mfc: fix state check from encoder queue_setup 88/40388/1
Seung-Woo Kim [Wed, 13 May 2015 04:44:39 +0000 (13:44 +0900)]
s5p-mfc: fix state check from encoder queue_setup

The MFCINST_GOT_INST state is set on the encoder context by set_format
only for the capture buffer. In the encoder's queue_setup, called during
reqbufs, the MFCINST_GOT_INST state is checked for both the capture
and output buffers. This patch fixes the encoder to check the
MFCINST_GOT_INST state only for the capture buffer in queue_setup.

Change-Id: I58f449eb48f990a2dcd4ecc06bc775293d1b6396
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>