Christian Brauner [Fri, 28 Oct 2022 07:56:20 +0000 (09:56 +0200)]
acl: conver higher-level helpers to rely on mnt_idmap
Convert an initial portion to rely on struct mnt_idmap by converting the
high level xattr helpers.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Wed, 26 Oct 2022 10:51:27 +0000 (12:51 +0200)]
fs: introduce dedicated idmap type for mounts
Last cycle we've already made the interaction with idmapped mounts more
robust and type safe by introducing the vfs{g,u}id_t type. This cycle we
concluded the conversion and removed the legacy helpers.
Currently we still pass around the plain namespace that was attached to
a mount. This is in general pretty convenient but it makes it easy to
conflate filesystem and mount namespaces and what different roles they
have to play. Especially for filesystem developers without much
experience in this area this is an easy source for bugs.
Instead of passing the plain namespace we introduce a dedicated type
struct mnt_idmap and replace the pointer with a pointer to a struct
mnt_idmap. There are no semantic or size changes for the mount struct
caused by this.
We then start converting all places aware of idmapped mounts to rely on
struct mnt_idmap. Once the conversion is done all helpers down to the
really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a
struct mnt_idmap argument instead of two namespace arguments. This way
it becomes impossible to conflate the two, removing and thus eliminating
the possibility of any bugs. Fwiw, I fixed some issues in that area a
while ago in ntfs3 and ksmbd in the past. Afterwards, only low-level
code can ultimately use the associated namespace for any permission
checks. Even most of the vfs can be ultimately completely oblivious
about this and filesystems will never interact with it directly in any
form in the future.
A struct mnt_idmap currently encompasses a simple refcount and a pointer
to the relevant namespace the mount is idmapped to. If a mount isn't
idmapped then it will point to a static nop_mnt_idmap. If it is an
idmapped mount it will point to a new struct mnt_idmap. As usual there
are no allocations or anything happening for non-idmapped mounts.
Everthing is carefully written to be a nop for non-idmapped mounts as
has always been the case.
If an idmapped mount or mount tree is created a new struct mnt_idmap is
allocated and a reference taken on the relevant namespace. For each
mount in a mount tree that gets idmapped or a mount that inherits the
idmap when it is cloned the reference count on the associated struct
mnt_idmap is bumped. Just a reminder that we only allow a mount to
change it's idmapping a single time and only if it hasn't already been
attached to the filesystems and has no active writers.
The actual changes are fairly straightforward. This will have huge
benefits for maintenance and security in the long run even if it causes
some churn. I'm aware that there's some cost for all of you. And I'll
commit to doing this work and make this as painless as I can.
Note that this also makes it possible to extend struct mount_idmap in
the future. For example, it would be possible to place the namespace
pointer in an anonymous union together with an idmapping struct. This
would allow us to expose an api to userspace that would let it specify
idmappings directly instead of having to go through the detour of
setting up namespaces at all.
This just adds the infrastructure and doesn't do any conversions.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 28 Oct 2022 10:50:50 +0000 (12:50 +0200)]
Merge branch 'fs.acl.rework' into for-next
Christian Brauner [Fri, 28 Oct 2022 10:45:10 +0000 (12:45 +0200)]
cifs: check whether acl is valid early
Dan reported that acl is dereferenced before being checked and this is a
valid problem. Fix it be erroring out early instead of doing it later after
we've already relied on acl to be a valid pointer.
Fixes:
dc1af4c4b472 ("cifs: implement set acl method")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 28 Oct 2022 08:29:15 +0000 (10:29 +0200)]
Merge branch 'fs.acl.rework' into for-next
Christian Brauner [Fri, 28 Oct 2022 08:00:26 +0000 (10:00 +0200)]
acl: make vfs_posix_acl_to_xattr() static
After reworking posix acls this helper isn't used anywhere outside the core
posix acl paths. Make it static.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Wed, 26 Oct 2022 08:10:09 +0000 (10:10 +0200)]
Merge branch 'fs.vfsuid.conversion' into for-next
Christian Brauner [Wed, 29 Jun 2022 10:53:54 +0000 (12:53 +0200)]
fs: remove unused idmapping helpers
Now that all places can deal with the new type safe helpers remove all
of the old helpers.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 9 Sep 2022 09:07:47 +0000 (11:07 +0200)]
ovl: port to vfs{g,u}id_t and associated helpers
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 9 Sep 2022 09:40:21 +0000 (11:40 +0200)]
fuse: port to vfs{g,u}id_t and associated helpers
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched over
a good part of the VFS. Ultimately we will remove all legacy idmapped
mount helpers that operate only on k{g,u}id_t in favor of the new type safe
helpers that operate on vfs{g,u}id_t.
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Wed, 29 Jun 2022 14:28:04 +0000 (16:28 +0200)]
ima: use type safe idmapping helpers
We already ported most parts and filesystems over for v6.0 to the new
vfs{g,u}id_t type and associated helpers for v6.0. Convert the remaining
places so we can remove all the old helpers.
This is a non-functional change.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Sun, 26 Jun 2022 16:06:01 +0000 (18:06 +0200)]
apparmor: use type safe idmapping helpers
We already ported most parts and filesystems over for v6.0 to the new
vfs{g,u}id_t type and associated helpers for v6.0. Convert the remaining
places so we can remove all the old helpers.
This is a non-functional change.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 20 Oct 2022 12:50:10 +0000 (14:50 +0200)]
caps: use type safe idmapping helpers
We already ported most parts and filesystems over for v6.0 to the new
vfs{g,u}id_t type and associated helpers for v6.0. Convert the remaining
places so we can remove all the old helpers.
This is a non-functional change.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Wed, 22 Jun 2022 20:12:16 +0000 (22:12 +0200)]
fs: use type safe idmapping helpers
We already ported most parts and filesystems over for v6.0 to the new
vfs{g,u}id_t type and associated helpers for v6.0. Convert the remaining
places so we can remove all the old helpers.
This is a non-functional change.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Sun, 26 Jun 2022 18:45:07 +0000 (20:45 +0200)]
mnt_idmapping: add missing helpers
Add missing helpers needed to convert all remaining places to the type
safe idmapped mount helpers. After the conversion we will remove all the
old helpers.
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Mon, 24 Oct 2022 14:43:21 +0000 (16:43 +0200)]
Merge branch 'fs.acl.rework' into for-next
Christian Brauner [Thu, 22 Sep 2022 15:17:27 +0000 (17:17 +0200)]
acl: remove a slew of now unused helpers
Now that the posix acl api is active we can remove all the hacky helpers
we had to keep around for all these years and also remove the set and
get posix acl xattr handler methods as they aren't needed anymore.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:26 +0000 (17:17 +0200)]
9p: use stub posix acl handlers
Now that 9p supports the get and set acl inode operations and the vfs
has been switched to the new posi api, 9p can simply rely on the stub
posix acl handlers. The custom xattr handlers and associated unused
helpers can be removed.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:25 +0000 (17:17 +0200)]
cifs: use stub posix acl handlers
Now that cifs supports the get and set acl inode operations and the vfs
has been switched to the new posi api, cifs can simply rely on the stub
posix acl handlers. The custom xattr handlers and associated unused
helpers can be removed.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:24 +0000 (17:17 +0200)]
ovl: use stub posix acl handlers
Now that ovl supports the get and set acl inode operations and the vfs
has been switched to the new posi api, ovl can simply rely on the stub
posix acl handlers. The custom xattr handlers and associated unused
helpers can be removed.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:23 +0000 (17:17 +0200)]
ecryptfs: use stub posix acl handlers
Now that ecryptfs supports the get and set acl inode operations and the
vfs has been switched to the new posi api, ecryptfs can simply rely on
the stub posix acl handlers.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:15 +0000 (17:17 +0200)]
evm: remove evm_xattr_acl_change()
The security and integrity infrastructure has dedicated hooks now so
evm_xattr_acl_change() is dead code. Before this commit the callchain was:
evm_protect_xattr()
-> evm_xattr_change()
-> evm_xattr_acl_change()
where evm_protect_xattr() was hit from evm_inode_setxattr() and
evm_inode_removexattr(). But now we have evm_inode_set_acl() and
evm_inode_remove_acl() and have switched over the vfs to rely on the posix
acl api so the code isn't hit anymore.
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:22 +0000 (17:17 +0200)]
xattr: use posix acl api
In previous patches we built a new posix api solely around get and set
inode operations. Now that we have all the pieces in place we can switch
the system calls and the vfs over to only rely on this api when
interacting with posix acls. This finally removes all type unsafety and
type conversion issues explained in detail in [1] that we aim to get rid
of.
With the new posix acl api we immediately translate into an appropriate
kernel internal struct posix_acl format both when getting and setting
posix acls. This is a stark contrast to before were we hacked unsafe raw
values into the uapi struct that was stored in a void pointer relying
and having filesystems and security modules hack around in the uapi
struct as well.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:21 +0000 (17:17 +0200)]
ovl: use posix acl api
Now that posix acls have a proper api us it to copy them.
All filesystems that can serve as lower or upper layers for overlayfs
have gained support for the new posix acl api in previous patches.
So switch all internal overlayfs codepaths for copying posix acls to the
new posix acl api.
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:20 +0000 (17:17 +0200)]
ovl: implement set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
Now that we have added get and set acl inode operations that allow easy
access to the dentry we give overlayfs it's own get and set acl inode
operations.
The set acl inode operation is duplicates most of the ovl posix acl
xattr handler. The main difference being that the set acl inode
operation relies on the new posix acl api. Once the vfs has been
switched over the custom posix acl xattr handler will be removed
completely.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:19 +0000 (17:17 +0200)]
ovl: implement get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
Now that we have added get and set acl inode operations that allow easy
access to the dentry we give overlayfs it's own get and set acl inode
operations.
Since overlayfs is a stacking filesystem it will use the newly added
posix acl api when retrieving posix acls from the relevant layer.
Since overlayfs can also be mounted on top of idmapped layers. If
idmapped layers are used overlayfs must take the layer's idmapping into
account after it retrieved the posix acls from the relevant layer.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:18 +0000 (17:17 +0200)]
ecryptfs: implement set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far ecryptfs didn't implement get and set acl inode operations
because it wanted easy access to the dentry. Now that we extended the
set acl inode operation to take a dentry argument and added a new get
acl inode operation that takes a dentry argument we can let ecryptfs
implement get and set acl inode operations.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:17 +0000 (17:17 +0200)]
ecryptfs: implement get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far ecryptfs didn't implement get and set acl inode operations
because it wanted easy access to the dentry. Now that we extended the
set acl inode operation to take a dentry argument and added a new get
acl inode operation that takes a dentry argument we can let ecryptfs
implement get and set acl inode operations.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:16 +0000 (17:17 +0200)]
ksmbd: use vfs_remove_acl()
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Now that we've switched all filesystems that can serve as the lower
filesystem for ksmbd we can switch ksmbd over to rely on
the posix acl api. Note that this is orthogonal to switching the vfs
itself over.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:14 +0000 (17:17 +0200)]
acl: add vfs_remove_acl()
In previous patches we implemented get and set inode operations for all
non-stacking filesystems that support posix acls but didn't yet
implement get and/or set acl inode operations. This specifically
affected cifs and 9p.
Now we can build a posix acl api based solely on get and set inode
operations. We add a new vfs_remove_acl() api that can be used to set
posix acls. This finally removes all type unsafety and type conversion
issues explained in detail in [1] that we aim to get rid of.
After we finished building the vfs api we can switch stacking
filesystems to rely on the new posix api and then finally switch the
xattr system calls themselves to rely on the posix acl api.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:13 +0000 (17:17 +0200)]
acl: add vfs_get_acl()
In previous patches we implemented get and set inode operations for all
non-stacking filesystems that support posix acls but didn't yet
implement get and/or set acl inode operations. This specifically
affected cifs and 9p.
Now we can build a posix acl api based solely on get and set inode
operations. We add a new vfs_get_acl() api that can be used to get posix
acls. This finally removes all type unsafety and type conversion issues
explained in detail in [1] that we aim to get rid of.
After we finished building the vfs api we can switch stacking
filesystems to rely on the new posix api and then finally switch the
xattr system calls themselves to rely on the posix acl api.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:06 +0000 (17:17 +0200)]
acl: add vfs_set_acl()
In previous patches we implemented get and set inode operations for all
non-stacking filesystems that support posix acls but didn't yet
implement get and/or set acl inode operations. This specifically
affected cifs and 9p.
Now we can build a posix acl api based solely on get and set inode
operations. We add a new vfs_set_acl() api that can be used to set posix
acls. This finally removes all type unsafety and type conversion issues
explained in detail in [1] that we aim to get rid of.
After we finished building the vfs api we can switch stacking
filesystems to rely on the new posix api and then finally switch the
xattr system calls themselves to rely on the posix acl api.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 29 Sep 2022 08:47:36 +0000 (10:47 +0200)]
internal: add may_write_xattr()
Split out the generic checks whether an inode allows writing xattrs. Since
security.* and system.* xattrs don't have any restrictions and we're going
to split out posix acls into a dedicated api we will use this helper to
check whether we can write posix acls.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Wed, 28 Sep 2022 11:34:00 +0000 (13:34 +0200)]
evm: add post set acl hook
The security_inode_post_setxattr() hook is used by security modules to
update their own security.* xattrs. Consequently none of the security
modules operate on posix acls. So we don't need an additional security
hook when post setting posix acls.
However, the integrity subsystem wants to be informed about posix acl
changes in order to reset the EVM status flag.
-> evm_inode_post_setxattr()
-> evm_update_evmxattr()
-> evm_calc_hmac()
-> evm_calc_hmac_or_hash()
and evm_cacl_hmac_or_hash() walks the global list of protected xattr
names evm_config_xattrnames. This global list can be modified via
/sys/security/integrity/evm/evm_xattrs. The write to "evm_xattrs" is
restricted to security.* xattrs and the default xattrs in
evm_config_xattrnames only contains security.* xattrs as well.
So the actual value for posix acls is currently completely irrelevant
for evm during evm_inode_post_setxattr() and frankly it should stay that
way in the future to not cause the vfs any more headaches. But if the
actual posix acl values matter then evm shouldn't operate on the binary
void blob and try to hack around in the uapi struct anyway. Instead it
should then in the future add a dedicated hook which takes a struct
posix_acl argument passing the posix acls in the proper vfs format.
For now it is sufficient to make evm_inode_post_set_acl() a wrapper
around evm_inode_post_setxattr() not passing any actual values down.
This will cause the hashes to be updated as before.
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:10 +0000 (17:17 +0200)]
integrity: implement get and set acl hook
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
I spent considerate time in the security module and integrity
infrastructure and audited all codepaths. EVM is the only part that
really has restrictions based on the actual posix acl values passed
through it (e.g., i_mode). Before this dedicated hook EVM used to translate
from the uapi posix acl format sent to it in the form of a void pointer
into the vfs format. This is not a good thing. Instead of hacking around in
the uapi struct give EVM the posix acls in the appropriate vfs format and
perform sane permissions checks that mirror what it used to to in the
generic xattr hook.
IMA doesn't have any restrictions on posix acls. When posix acls are
changed it just wants to update its appraisal status to trigger an EVM
revalidation.
The removal of posix acls is equivalent to passing NULL to the posix set
acl hooks. This is the same as before through the generic xattr api.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:09 +0000 (17:17 +0200)]
smack: implement get, set and remove acl hook
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
I spent considerate time in the security module infrastructure and
audited all codepaths. Smack has no restrictions based on the posix
acl values passed through it. The capability hook doesn't need to be
called either because it only has restrictions on security.* xattrs. So
these all becomes very simple hooks for smack.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:08 +0000 (17:17 +0200)]
selinux: implement get, set and remove acl hook
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
I spent considerate time in the security module infrastructure and
audited all codepaths. SELinux has no restrictions based on the posix
acl values passed through it. The capability hook doesn't need to be
called either because it only has restrictions on security.* xattrs. So
these are all fairly simply hooks for SELinux.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:07 +0000 (17:17 +0200)]
security: add get, remove and set acl hook
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
In the next patches we implement the hooks for the few security modules
that do actually have restrictions on posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:05 +0000 (17:17 +0200)]
9p: implement set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far 9p implemented a ->get_inode_acl() operation that didn't require
access to the dentry in order to allow (limited) permission checking via
posix acls in the vfs. Now that we have get and set acl inode operations
that take a dentry argument we can give 9p get and set acl inode
operations.
This is mostly a light refactoring of the codepaths currently used in 9p
posix acl xattr handler. After we have fully implemented the posix acl
api and switched the vfs over to it, the 9p specific posix acl xattr
handler and associated code will be removed.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:04 +0000 (17:17 +0200)]
9p: implement get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far 9p implemented a ->get_inode_acl() operation that didn't require
access to the dentry in order to allow (limited) permission checking via
posix acls in the vfs. Now that we have get and set acl inode operations
that take a dentry argument we can give 9p get and set acl inode
operations.
This is mostly a refactoring of the codepaths currently used in 9p posix
acl xattr handler. After we have fully implemented the posix acl api and
switched the vfs over to it, the 9p specific posix acl xattr handler and
associated code will be removed.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:03 +0000 (17:17 +0200)]
cifs: implement set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far cifs wasn't able to implement get and set acl inode operations
because it needs access to the dentry. Now that we extended the set acl
inode operation to take a dentry argument and added a new get acl inode
operation that takes a dentry argument we can let cifs implement get and
set acl inode operations.
This is mostly a copy and paste of the codepaths currently used in cifs'
posix acl xattr handler. After we have fully implemented the posix acl
api and switched the vfs over to it, the cifs specific posix acl xattr
handler and associated code will be removed and the code duplication
will go away.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:02 +0000 (17:17 +0200)]
cifs: implement get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.
So far cifs wasn't able to implement get and set acl inode operations
because it needs access to the dentry. Now that we extended the set acl
inode operation to take a dentry argument and added a new get acl inode
operation that takes a dentry argument we can let cifs implement get and
set acl inode operations.
This is mostly a copy and paste of the codepaths currently used in cifs'
posix acl xattr handler. After we have fully implemented the posix acl
api and switched the vfs over to it, the cifs specific posix acl xattr
handler and associated code will be removed and the code duplication
will go away.
Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:01 +0000 (17:17 +0200)]
fs: add new get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on the old get
acl inode operation to retrieve posix acl and need to implement their
own custom handlers because of that.
In a previous patch we renamed the old get acl inode operation to
->get_inode_acl(). We decided to rename it and implement a new one since
->get_inode_acl() is called generic_permission() and inode_permission()
both of which can be called during an filesystem's ->permission()
handler. So simply passing a dentry argument to ->get_acl() would have
amounted to also having to pass a dentry argument to ->permission(). We
avoided that change.
This adds a new ->get_acl() inode operations which takes a dentry
argument which filesystems such as 9p, cifs, and overlayfs can implement
to get posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Thu, 22 Sep 2022 15:17:00 +0000 (17:17 +0200)]
fs: rename current get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
The current inode operation for getting posix acls takes an inode
argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
to the dentry. In contrast to the ->set_acl() inode operation we cannot
simply extend ->get_acl() to take a dentry argument. The ->get_acl()
inode operation is called from:
acl_permission_check()
-> check_acl()
-> get_acl()
which is part of generic_permission() which in turn is part of
inode_permission(). Both generic_permission() and inode_permission() are
called in the ->permission() handler of various filesystems (e.g.,
overlayfs). So simply passing a dentry argument to ->get_acl() would
amount to also having to pass a dentry argument to ->permission(). We
should avoid this unnecessary change.
So instead of extending the existing inode operation rename it from
->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
passes a dentry argument and which filesystems that need access to the
dentry can implement instead of ->get_inode_acl(). Filesystems like cifs
which allow setting and getting posix acls but not using them for
permission checking during lookup can simply not implement
->get_inode_acl().
This is intended to be a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 23 Sep 2022 08:29:39 +0000 (10:29 +0200)]
fs: pass dentry to set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
operation. But since ->set_acl() is required in order to use the generic
posix acl xattr handlers filesystems that do not implement this inode
operation cannot use the handler and need to implement their own
dedicated posix acl handlers.
Update the ->set_acl() inode method to take a dentry argument. This
allows all filesystems to rely on ->set_acl().
As far as I can tell all codepaths can be switched to rely on the dentry
instead of just the inode. Note that the original motivation for passing
the dentry separate from the inode instead of just the dentry in the
xattr handlers was because of security modules that call
security_d_instantiate(). This hook is called during
d_instantiate_new(), d_add(), __d_instantiate_anon(), and
d_splice_alias() to initialize the inode's security context and possibly
to set security.* xattrs. Since this only affects security.* xattrs this
is completely irrelevant for posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Fri, 23 Sep 2022 08:19:34 +0000 (10:19 +0200)]
orangefs: rework posix acl handling when creating new filesystem objects
When creating new filesytem objects orangefs used to create posix acls
after it had created and inserted a new inode. This made it necessary to
all posix_acl_chmod() on the newly created inode in case the mode of the
inode would be changed by the posix acls.
Instead of doing it this way calculate the correct mode directly before
actually creating the inode. So we first create posix acls, then pass
the mode that posix acls mandate into the orangefs getattr helper and
calculate the correct mode. This is needed so we can simply change
posix_acl_chmod() to take a dentry instead of an inode argument in the
next patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Tue, 18 Oct 2022 08:13:47 +0000 (10:13 +0200)]
Merge branch 'fs.ovl.setgid' into for-next
* fs.ovl.setgid:
ovl: remove privs in ovl_fallocate()
ovl: remove privs in ovl_copyfile()
attr: use consistent sgid stripping checks
attr: add setattr_should_drop_sgid()
fs: move should_remove_suid()
attr: add in_group_or_capable()
Amir Goldstein [Mon, 17 Oct 2022 15:06:39 +0000 (17:06 +0200)]
ovl: remove privs in ovl_fallocate()
Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.
This fixes some failure in fstests generic/683..687.
Fixes:
aab8848cee5e ("ovl: add ovl_fallocate()")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Amir Goldstein [Mon, 17 Oct 2022 15:06:38 +0000 (17:06 +0200)]
ovl: remove privs in ovl_copyfile()
Underlying fs doesn't remove privs because copy_range/remap_range are
called with privileged mounter credentials.
This fixes some failures in fstest generic/673.
Fixes:
8ede205541ff ("ovl: add reflink/copyfile/dedup support")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Mon, 17 Oct 2022 15:06:37 +0000 (17:06 +0200)]
attr: use consistent sgid stripping checks
Currently setgid stripping in file_remove_privs()'s should_remove_suid()
helper is inconsistent with other parts of the vfs. Specifically, it only
raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
inode isn't in the caller's groups and the caller isn't privileged over the
inode although we require this already in setattr_prepare() and
setattr_copy() and so all filesystem implement this requirement implicitly
because they have to use setattr_{prepare,copy}() anyway.
But the inconsistency shows up in setgid stripping bugs for overlayfs in
xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping works
correctly when performing various write-like operations as an unprivileged
user (fallocate, reflink, write, etc.):
echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k
The test basically creates a file with 6666 permissions. While the file has
the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
regular filesystem like xfs what will happen is:
sys_fallocate()
-> vfs_fallocate()
-> xfs_file_fallocate()
-> file_modified()
-> __file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = ATTR_FORCE | kill;
-> notify_change()
-> setattr_copy()
In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.
But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.
So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:
ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);
which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
Now we call into the filesystem's ->setattr() inode operation which will
end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
if (!vfsgid_in_group_p(vfsgid) &&
!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
mode &= ~S_ISGID;
inode->i_mode = mode;
}
and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.
But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
has the consequence that neither the setgid nor the suid bits are stripped
even though it should be stripped because the inode isn't in the caller's
groups and the caller isn't privileged over the inode.
If overlayfs is in the mix things become a bit more complicated and the bug
shows up more clearly. When e.g., ovl_setattr() is hit from
ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
ATTR_KILL_SGID might be raised but because the check in notify_change() is
questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
stripped the S_ISGID bit isn't removed even though it should be stripped:
sys_fallocate()
-> vfs_fallocate()
-> ovl_fallocate()
-> file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = ATTR_FORCE | kill;
-> notify_change()
-> ovl_setattr()
// TAKE ON MOUNTER'S CREDS
-> ovl_do_notify_change()
-> notify_change()
// GIVE UP MOUNTER'S CREDS
// TAKE ON MOUNTER'S CREDS
-> vfs_fallocate()
-> xfs_file_fallocate()
-> file_modified()
-> __file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = attr_force | kill;
-> notify_change()
The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper to perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have notify_change()
not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
first place because the caller must calculate the flags via
should_remove_suid() anyway which would raise ATTR_KILL_SGID.
While we're at it we move should_remove_suid() from inode.c to attr.c
where it belongs with the rest of the iattr helpers. Especially since it
returns ATTR_KILL_S{G,U}ID flags. We also rename it to
setattr_should_drop_suidgid() to better reflect that it indicates both
setuid and setgid bit removal and also that it returns attr flags.
Running xfstests with this doesn't report any regressions. We should really
try and use consistent checks.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Mon, 17 Oct 2022 15:06:36 +0000 (17:06 +0200)]
attr: add setattr_should_drop_sgid()
The current setgid stripping logic during write and ownership change
operations is inconsistent and strewn over multiple places. In order to
consolidate it and make more consistent we'll add a new helper
setattr_should_drop_sgid(). The function retains the old behavior where
we remove the S_ISGID bit unconditionally when S_IXGRP is set but also
when it isn't set and the caller is neither in the group of the inode
nor privileged over the inode.
We will use this helper both in write operation permission removal such
as file_remove_privs() as well as in ownership change operations.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Mon, 17 Oct 2022 15:06:35 +0000 (17:06 +0200)]
fs: move should_remove_suid()
Move the helper from inode.c to attr.c. This keeps the the core of the
set{g,u}id stripping logic in one place when we add follow-up changes.
It is the better place anyway, since should_remove_suid() returns
ATTR_KILL_S{G,U}ID flags.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Christian Brauner [Mon, 17 Oct 2022 15:06:34 +0000 (17:06 +0200)]
attr: add in_group_or_capable()
In setattr_{copy,prepare}() we need to perform the same permission
checks to determine whether we need to drop the setgid bit or not.
Instead of open-coding it twice add a simple helper the encapsulates the
logic. We will reuse this helpers to make dropping the setgid bit during
write operations more consistent in a follow up patch.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Linus Torvalds [Sun, 16 Oct 2022 22:36:24 +0000 (15:36 -0700)]
Linux 6.1-rc1
Linus Torvalds [Sun, 16 Oct 2022 22:27:07 +0000 (15:27 -0700)]
Merge tag 'random-6.1-rc1-for-linus' of git://git./linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
Linus Torvalds [Sun, 16 Oct 2022 22:14:29 +0000 (15:14 -0700)]
Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git./linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
when using bperf (perf BPF based counters) with cgroups.
- Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
monitors bandwidth, latency, bus utilization and buffer occupancy.
Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
- User space tasks can migrate between CPUs, so when tracing selected
CPUs, system-wide sideband is still needed, fix it in the setup of
Intel PT on hybrid systems.
- Fix metricgroups title message in 'perf list', it should state that
the metrics groups are to be used with the '-M' option, not '-e'.
- Sync the msr-index.h copy with the kernel sources, adding support for
using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
as decoding it when printing the MSR tracepoint arguments.
- Fix program header size and alignment when generating a JIT ELF in
'perf inject'.
- Add multiple new Intel PT 'perf test' entries, including a jitdump
one.
- Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
running on PowerPC due to an invalid topology number in that arch.
- Fix the 'perf test' for arm_coresight failures on the ARM Juno
system.
- Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
option to the or expression expected in the intercepted
perf_event_open() syscall.
- Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
'perf annotate' asm parser.
- Fix 'perf mem record -C' option processing, it was being chopped up
when preparing the underlying 'perf record -e mem-events' and thus
being ignored, requiring using '-- -C CPUs' as a workaround.
- Improvements and tidy ups for 'perf test' shell infra.
- Fix Intel PT information printing segfault in uClibc, where a NULL
format was being passed to fprintf.
* tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
perf tests stat+json_output: Include sanity check for topology
perf tests stat+csv_output: Include sanity check for topology
perf intel-pt: Fix system_wide dummy event for hybrid
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
perf test: Fix attr tests for PERF_FORMAT_LOST
perf test: test_intel_pt.sh: Add 9 tests
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
perf test: test_intel_pt.sh: Add jitdump test
perf test: test_intel_pt.sh: Tidy some alignment
perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
perf test: test_intel_pt.sh: Tidy some perf record options
perf test: test_intel_pt.sh: Fix return checking again
perf: Skip and warn on unknown format 'configN' attrs
perf list: Fix metricgroups title message
perf mem: Fix -C option behavior for perf mem record
perf annotate: Add missing condition flags for arm64
...
Linus Torvalds [Sun, 16 Oct 2022 18:12:22 +0000 (11:12 -0700)]
Merge tag 'kbuild-fixes-v6.1' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
kbuild: fix single directory build
kbuild: add -fno-discard-value-names to cmd_cc_ll_c
scripts/clang-tools: Convert clang-tidy args to list
modpost: put modpost options before argument
kbuild: Stop including vmlinux.bz2 in the rpm's
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
Linus Torvalds [Sun, 16 Oct 2022 18:08:19 +0000 (11:08 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull more clk updates from Stephen Boyd:
"This is the final part of the clk patches for this merge window.
The clk rate range series needed another week to fully bake. Maxime
fixed the bug that broke clk notifiers and prevented this from being
included in the first pull request. He also added a unit test on top
to make sure it doesn't break so easily again. The majority of the
series fixes up how the clk_set_rate_*() APIs work, particularly
around when the rate constraints are dropped and how they move around
when reparenting clks. Overall it's a much needed improvement to the
clk rate range APIs that used to be pretty broken if you looked
sideways.
Beyond the core changes there are a few driver fixes for a compilation
issue or improper data causing clks to fail to register or have the
wrong parents. These are good to get in before the first -rc so that
the system actually boots on the affected devices"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
clk: tegra: Fix Tegra PWM parent clock
clk: at91: fix the build with binutils 2.27
clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
...
Linus Torvalds [Sun, 16 Oct 2022 18:01:40 +0000 (11:01 -0700)]
Merge tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- fix a regression in guest mounts to old servers
- improvements to directory leasing (caching directory entries safely
beyond the root directory)
- symlink improvement (reducing roundtrips needed to process symlinks)
- an lseek fix (to problem where some dir entries could be skipped)
- improved ioctl for returning more detailed information on directory
change notifications
- clarify multichannel interface query warning
- cleanup fix (for better aligning buffers using ALIGN and round_up)
- a compounding fix
- fix some uninitialized variable bugs found by Coverity and the kernel
test robot
* tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: improve SMB3 change notification support
cifs: lease key is uninitialized in two additional functions when smb1
cifs: lease key is uninitialized in smb1 paths
smb3: must initialize two ACL struct fields to zero
cifs: fix double-fault crash during ntlmssp
cifs: fix static checker warning
cifs: use ALIGN() and round_up() macros
cifs: find and use the dentry for cached non-root directories also
cifs: enable caching of directories for which a lease is held
cifs: prevent copying past input buffer boundaries
cifs: fix uninitialised var in smb2_compound_op()
cifs: improve symlink handling for smb2+
smb3: clarify multichannel warning
cifs: fix regression in very old smb1 mounts
cifs: fix skipping to incorrect offset in emit_cached_dirents
Tetsuo Handa [Sat, 15 Oct 2022 15:53:51 +0000 (00:53 +0900)]
Revert "cpumask: fix checking valid cpu range".
This reverts commit
78e5a3399421 ("cpumask: fix checking valid cpu range").
syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at
cpu_max_bits_warn() [1], for commit
78e5a3399421 ("cpumask: fix checking
valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE()
when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of
"cpu" will trivially hit cpu == nr_cpumask_bits condition.
Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
this problem was not fixed immediately. As a result, that patch was
sent to linux.git before the patch author recognizes this problem, and
syzbot started failing to test changes in linux.git since 2022/10/10
[3].
Andrew Jones proposed a fix for x86 and riscv architectures [4]. But
[2] and [5] indicate that affected locations are not limited to arch
code. More delay before we find and fix affected locations, less tested
kernel (and more difficult to bisect and fix) before release.
We should have inspected and fixed basically all cpumask users before
applying that patch. We should not crash kernels in order to ask
existing cpumask users to update their code, even if limited to
CONFIG_DEBUG_PER_CPU_MAPS=y case.
Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734
Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Chancellor [Fri, 14 Oct 2022 20:42:11 +0000 (13:42 -0700)]
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:
/tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:
.Ldebug_loc0:
.byte 4 # DW_LLE_offset_pair
.uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp1-.Lfunc_begin0 # ending offset
.byte 1 # Loc expr size
.byte 90 # DW_OP_reg10
.byte 0 # DW_LLE_end_of_list
.Ldebug_ranges0:
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp6-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp27-.Lfunc_begin0 # ending offset
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp28-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp30-.Lfunc_begin0 # ending offset
.byte 0 # DW_RLE_end_of_list
There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.
To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Fri, 14 Oct 2022 20:18:11 +0000 (05:18 +0900)]
kbuild: fix single directory build
Commit
f110e5a250e3 ("kbuild: refactor single builds of *.ko") was wrong.
KBUILD_MODULES _is_ needed for single builds.
Otherwise, "make foo/bar/baz/" does not build module objects at all.
Fixes:
f110e5a250e3 ("kbuild: refactor single builds of *.ko")
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 16 Oct 2022 00:05:07 +0000 (17:05 -0700)]
Merge tag 'slab-for-6.1-rc1-hotfix' of git://git./linux/kernel/git/vbabka/slab
Pull slab hotfix from Vlastimil Babka:
"A single fix for the common-kmalloc series, for warnings on mips and
sparc64 reported by Guenter Roeck"
* tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
Linus Torvalds [Sat, 15 Oct 2022 23:47:33 +0000 (16:47 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"I have relocated to London so not much work from me while I get
settled.
Still, OpenRISC picked up two patches in this window:
- Fix for kernel page table walking from Jann Horn
- MAINTAINER entry cleanup from Palmer Dabbelt"
* tag 'for-linus' of https://github.com/openrisc/linux:
MAINTAINERS: git://github -> https://github.com for openrisc
openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
Linus Torvalds [Sat, 15 Oct 2022 23:36:38 +0000 (16:36 -0700)]
Merge tag 'pci-v6.1-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert the attempt to distribute spare resources to unconfigured
hotplug bridges at boot time.
This fixed some dock hot-add scenarios, but Jonathan Cameron reported
that it broke a topology with a multi-function device where one
function was a Switch Upstream Port and the other was an Endpoint"
* tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Distribute available resources for root buses, too"
Hyeonggon Yoo [Sat, 15 Oct 2022 04:34:29 +0000 (13:34 +0900)]
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
After commit
d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.
SLAB has been using kmalloc caches to allocate freelist_idx_t array for
off slab caches. But after the commit, freelist_size can be bigger than
KMALLOC_MAX_CACHE_SIZE.
Instead of using pointer to kmalloc cache, use kmalloc_node() and only
check if the kmalloc cache is off slab during calculate_slab_order().
If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens
as it allocates freelist_idx_t array directly from buddy.
Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes:
d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Palmer Dabbelt [Thu, 13 Oct 2022 21:46:37 +0000 (14:46 -0700)]
MAINTAINERS: git://github -> https://github.com for openrisc
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Steve French [Sat, 15 Oct 2022 05:43:22 +0000 (00:43 -0500)]
smb3: improve SMB3 change notification support
Change notification is a commonly supported feature by most servers,
but the current ioctl to request notification when a directory is
changed does not return the information about what changed
(even though it is returned by the server in the SMB3 change
notify response), it simply returns when there is a change.
This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
information structure which includes the name of the file(s) that
changed and why. See MS-SMB2 2.2.35 for details on the individual
filter flags and the file_notify_information structure returned.
To use this simply pass in the following (with enough space
to fit at least one file_notify_information structure)
struct __attribute__((__packed__)) smb3_notify {
uint32_t completion_filter;
bool watch_tree;
uint32_t data_len;
uint8_t data[];
} __packed;
using CIFS_IOC_NOTIFY_INFO 0xc009cf0b
or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info)
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Sat, 15 Oct 2022 01:00:32 +0000 (20:00 -0500)]
cifs: lease key is uninitialized in two additional functions when smb1
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized
in smb1 mounts. It is cleaner to set lease key to zero in these
places where leases are not supported (smb1 can not return lease keys
so the field was uninitialized).
Addresses-Coverity: 1514207 ("Uninitialized scalar variable")
Addresses-Coverity: 1514331 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Sat, 15 Oct 2022 00:18:32 +0000 (19:18 -0500)]
cifs: lease key is uninitialized in smb1 paths
It is cleaner to set lease key to zero in the places where leases are not
supported (smb1 can not return lease keys so the field was uninitialized).
Addresses-Coverity: 1513994 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 14 Oct 2022 23:50:20 +0000 (18:50 -0500)]
smb3: must initialize two ACL struct fields to zero
Coverity spotted that we were not initalizing Stbz1 and Stbz2 to
zero in create_sd_buf.
Addresses-Coverity: 1513848 ("Uninitialized scalar variable")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Fri, 14 Oct 2022 20:14:54 +0000 (17:14 -0300)]
cifs: fix double-fault crash during ntlmssp
The crash occurred because we were calling memzero_explicit() on an
already freed sess_data::iov[1] (ntlmsspblob) in sess_free_buffer().
Fix this by not calling memzero_explicit() on sess_data::iov[1] as
it's already by handled by callers.
Fixes:
a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Arnaldo Carvalho de Melo [Fri, 7 Aug 2020 11:45:47 +0000 (08:45 -0300)]
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:
b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked")
ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2022-10-14 18:06:34.
294561729 -0300
+++ after 2022-10-14 18:06:41.
285744044 -0300
@@ -264,6 +264,7 @@
[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
+ [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Qi Liu [Tue, 27 Sep 2022 08:14:00 +0000 (16:14 +0800)]
perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
Add support for using 'perf report --dump-raw-trace' to parse PTT packet.
Example usage:
Output will contain raw PTT data and its textual representation, such
as (8DW format):
0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0
ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
.
00000000: 00 00 00 00 Prefix
.
00000004: 08 20 00 60 Header DW0
.
00000008: ff 02 00 01 Header DW1
.
0000000c: 20 08 00 00 Header DW2
.
00000010: 10 e7 44 ab Header DW3
.
00000014: 2a a8 1e 01 Time
.
00000020: 00 00 00 00 Prefix
.
00000024: 01 00 00 60 Header DW0
.
00000028: 0f 1e 00 01 Header DW1
.
0000002c: 04 00 00 00 Header DW2
.
00000030: 40 00 81 02 Header DW3
.
00000034: ee 02 00 00 Time
....
This patch only add basic parsing support according to the definition of
the PTT packet described in Documentation/trace/hisi-ptt.rst. And the
fields of each packet can be further decoded following the PCIe Spec's
definition of TLP packet.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Qi Liu [Tue, 27 Sep 2022 08:13:59 +0000 (16:13 +0800)]
perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).
This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Qi Liu [Tue, 27 Sep 2022 08:13:58 +0000 (16:13 +0800)]
perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
Add find_pmu_for_event() and use to simplify logic in
auxtrace_record_init(). find_pmu_for_event() will be reused in
subsequent patches.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Athira Rajeev [Thu, 6 Oct 2022 15:51:49 +0000 (21:21 +0530)]
perf tests stat+json_output: Include sanity check for topology
Testcase stat+json_output.sh fails in powerpc:
86: perf stat JSON output linter : FAILED!
The testcase "stat+json_output.sh" verifies perf stat JSON output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes:
0c343af2a2f82844 ("perf test: JSON format checking")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Athira Rajeev [Thu, 6 Oct 2022 15:51:48 +0000 (21:21 +0530)]
perf tests stat+csv_output: Include sanity check for topology
Testcase stat+csv_output.sh fails in powerpc:
84: perf stat CSV output linter: FAILED!
The testcase "stat+csv_output.sh" verifies perf stat CSV output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes:
7473ee56dbc91c98 ("perf test: Add checking for perf stat CSV output.")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 12 Oct 2022 08:22:59 +0000 (11:22 +0300)]
perf intel-pt: Fix system_wide dummy event for hybrid
User space tasks can migrate between CPUs, so when tracing selected CPUs,
system-wide sideband is still needed, however evlist->core.has_user_cpus
is not set in the hybrid case, so check the target cpu_list instead.
Fixes:
7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 12 Oct 2022 08:22:58 +0000 (11:22 +0300)]
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
uClibc segfaulted because NULL was passed as the format to fprintf().
That happened because one of the format strings was missing and
intel_pt_print_info() didn't check that before calling fprintf().
Add the missing format string, and check format is not NULL before calling
fprintf().
Fixes:
11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Wed, 12 Oct 2022 09:46:32 +0000 (10:46 +0100)]
perf test: Fix attr tests for PERF_FORMAT_LOST
Since PERF_FORMAT_LOST was added, the default read format has that bit
set, so add it to the tests. Keep the old value as well so that the test
still passes on older kernels.
This fixes the following failure:
expected read_format=0|4, got 20
FAILED './tests/attr/test-record-C0' - match failure
Fixes:
85b425f31c8866e0 ("perf record: Set PERF_FORMAT_LOST by default")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221012094633.21669-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ammy Yi [Fri, 14 Oct 2022 17:09:05 +0000 (20:09 +0300)]
perf test: test_intel_pt.sh: Add 9 tests
Add tests:
Test with MTC and TSC disabled
Test with branches disabled
Test with/without CYC
Test recording with sample mode
Test with kernel trace
Test virtual LBR
Test power events
Test with TNT packets disabled
Test with event_trace
These tests mostly check that perf record works with the corresponding
Intel PT config terms, sometimes also checking that certain packets do or
do not appear in the resulting trace as appropriate.
The "Test virtual LBR" is slightly trickier, using a Python script to
check that branch stacks are actually synthesized.
Signed-off-by: Ammy Yi <ammy.yi@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-8-adrian.hunter@intel.com
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:09:04 +0000 (20:09 +0300)]
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
When a program header was added, it moved the text section but
GEN_ELF_TEXT_OFFSET was not updated.
Fix by adding the program header size and aligning.
Fixes:
babd04386b1df8c3 ("perf jit: Include program header in ELF files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lieven Hey <lieven.hey@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:09:03 +0000 (20:09 +0300)]
perf test: test_intel_pt.sh: Add jitdump test
Add a test for decoding self-modifying code using a jitdump file.
The test creates a workload that uses self-modifying code and generates its
own jitdump file. The result is processed with perf inject --jit and
checked for decoding errors.
Note the test will fail without patch "perf inject: Fix GEN_ELF_TEXT_OFFSET
for jit" applied.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:09:02 +0000 (20:09 +0300)]
perf test: test_intel_pt.sh: Tidy some alignment
Tidy alignment of test function lines to make them more readable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:09:01 +0000 (20:09 +0300)]
perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
Messages display with the perf test -v option. Add a message to show when
skipping a test because the user cannot do kernel tracing.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:09:00 +0000 (20:09 +0300)]
perf test: test_intel_pt.sh: Tidy some perf record options
When not decoding, the options "-B -N --no-bpf-event" speed up perf record.
Make a common function for them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 14 Oct 2022 17:08:59 +0000 (20:08 +0300)]
perf test: test_intel_pt.sh: Fix return checking again
count_result() does not always reset ret=0 which means the value can spill
into the next test result.
Fix by explicitly setting it to zero between tests.
Committer testing:
# perf test "Miscellaneous Intel PT testing"
110: Miscellaneous Intel PT testing : Ok
#
Tested as well with:
# perf test -v "Miscellaneous Intel PT testing"
Fixes:
fd9b45e39cfaf885 ("perf test: test_intel_pt.sh: Fix return checking")
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sat, 15 Oct 2022 01:41:41 +0000 (18:41 -0700)]
Merge tag 'libnvdimm-for-6.1' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull nvdimm updates from Dan Williams:
"Some small cleanups and fixes in and around the nvdimm subsystem. The
most significant change is a regression fix for nvdimm namespace
(volume) creation when the namespace size is smaller than 2MB/
Summary:
- Fix nvdimm namespace creation on platforms that do not publish
associated 'DIMM' metadata for a persistent memory region.
- Miscellaneous fixes and cleanups"
* tag 'libnvdimm-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ACPI: HMAT: Release platform device in case of platform_device_add_data() fails
dax: Remove usage of the deprecated ida_simple_xxx API
libnvdimm/region: Allow setting align attribute on regions without mappings
nvdimm/namespace: Fix comment typo
nvdimm: make __nvdimm_security_overwrite_query static
nvdimm/region: Fix kernel-doc
nvdimm/namespace: drop unneeded temporary variable in size_store()
nvdimm/namespace: return uuid_null only once in nd_dev_to_uuid()
Linus Torvalds [Sat, 15 Oct 2022 01:36:42 +0000 (18:36 -0700)]
Merge tag 'rtc-6.1' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"A great rework of the isl12022 driver makes up the bulk of the
changes. There is also an important fix for CMOS and then the usual
small fixes:
- switch to devm_clk_get_enabled() where relevant
- cmos: event handler registration fix
- isl12022: code improvements"
* tag 'rtc-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: rv3028: Fix codestyle errors
rtc: cmos: Fix event handler registration ordering issue
rtc: k3: Use devm_clk_get_enabled() helper
rtc: jz4740: Use devm_clk_get_enabled() helper
rtc: mpfs: Use devm_clk_get_enabled() helper
rtc: ds1685: Fix spelling of function name in comment block
rtc: isl12022: switch to using regmap API
rtc: isl12022: drop redundant write to HR register
rtc: isl12022: use dev_set_drvdata() instead of i2c_set_clientdata()
rtc: isl12022: use %ptR
rtc: isl12022: simplify some expressions
rtc: isl12022: drop a dev_info()
rtc: isl12022: specify range_min and range_max
rtc: isl12022: stop using deprecated devm_rtc_device_register()
rtc: stmp3xxx: Add failure handling for stmp3xxx_wdt_register()
rtc: mxc: Use devm_clk_get_enabled() helper
rtc: gamecube: Always reset HW_SRNPROT after read
rtc: k3: detect SoC to determine erratum fix
rtc: k3: wait until the unlock field is not zero
rtc: mpfs: Remove printing of stray CR
Linus Torvalds [Sat, 15 Oct 2022 01:31:28 +0000 (18:31 -0700)]
Merge tag 'i3c/for-6.1' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Not much this cycle, only two fixes for a rare event"
- fix device reattach issues"
* tag 'i3c/for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: Remove the wrong place of reattach.
i3c: master: Free the old_dyn_addr when reattach.
Linus Torvalds [Sat, 15 Oct 2022 01:23:23 +0000 (18:23 -0700)]
Merge tag 'for-linus-6.1-rc1' of git://git./linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
"UBI:
- Use bitmap API to allocate bitmaps
- New attach mode, disable_fm, to attach without fastmap
- Fixes for various typos in comments
UBIFS:
- Fix for a deadlock when setting xattrs for encrypted file
- Fix for an assertion failures when truncating encrypted files
- Fixes for various typos in comments"
* tag 'for-linus-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: fastmap: Add fastmap control support for 'UBI_IOCATT' ioctl
ubi: fastmap: Use the bitmap API to allocate bitmaps
ubifs: Fix AA deadlock when setting xattr for encrypted file
ubifs: Fix UBIFS ro fail due to truncate in the encrypted directory
mtd: ubi: drop unexpected word 'a' in comments
ubi: block: Fix typos in comments
ubi: fastmap: Fix typo in comments
ubi: Fix repeated words in comments
ubi: ubi-media.h: Fix comment typo
ubi: block: Remove in vain semicolon
ubifs: Fix ubifs_check_dir_empty() kernel-doc comment
Linus Torvalds [Sat, 15 Oct 2022 01:14:48 +0000 (18:14 -0700)]
Merge tag 'for-linus-6.1-rc1' of git://git./linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Move to strscpy()
- Improve panic notifiers
- Fix NR_CPUS usage
- Fixes for various comments
- Fixes for virtio driver
* tag 'for-linus-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
uml: Remove the initialization of statics to 0
um: Do not initialise statics to 0.
um: Fix comment typo
um: Improve panic notifiers consistency and ordering
um: remove unused reactivate_chan() declaration
um: mmaper: add __exit annotations to module exit funcs
um: virt-pci: add __init/__exit annotations to module init/exit funcs
hostfs: move from strlcpy with unused retval to strscpy
um: move from strlcpy with unused retval to strscpy
um: increase default virtual physical memory to 64 MiB
UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
um: read multiple msg from virtio slave request fd
Linus Torvalds [Fri, 14 Oct 2022 20:47:42 +0000 (13:47 -0700)]
Merge tag 'asm-generic-fixes-6.1-1' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic fix from Arnd Bergmann:
"A last-minute arch/alpha regression fix: the previous asm-generic
branch contained a new regression from a typo"
* tag 'asm-generic-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
alpha: fix marvel_ioread8 build regression
Linus Torvalds [Fri, 14 Oct 2022 20:44:53 +0000 (13:44 -0700)]
Merge tag 'arm-fixes-6.1-1' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are three fixes for build warnings that came in during the merge
window"
* tag 'arm-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: mmp: Make some symbols static
ARM: spear6xx: Staticize few definitions
clk: spear: Move prototype to accessible header
Stephen Boyd [Fri, 14 Oct 2022 20:44:44 +0000 (13:44 -0700)]
Merge branch 'clk-rate-range' into clk-next
- Various clk rate range fixes
- Drop clk rate range constraints on clk_put() (redux)
* clk-rate-range: (28 commits)
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
clk: tests: Add some tests for orphan with multiple parents
clk: tests: Add tests for mux with multiple parents
clk: tests: Add tests for single parent mux
...
Jon Hunter [Mon, 10 Oct 2022 10:00:46 +0000 (11:00 +0100)]
clk: tegra: Fix Tegra PWM parent clock
Commit
8c193f4714df ("pwm: tegra: Optimize period calculation") updated
the period calculation in the Tegra PWM driver and now returns an error
if the period requested is less than minimum period supported. This is
breaking PWM support on various Tegra platforms. For example, on the
Tegra210 Jetson Nano platform this is breaking the PWM fan support and
probing the PWM fan driver now fails ...
pwm-fan pwm-fan: Failed to configure PWM: -22
pwm-fan: probe of pwm-fan failed with error -22
The problem is that the default parent clock for the PWM on Tegra210 is
a 32kHz clock and is unable to support the requested PWM period.
Fix PWM support on Tegra20, Tegra30, Tegra114, Tegra124 and Tegra210 by
updating the parent clock for the PWM to be the PLL_P.
Fixes:
8c193f4714df ("pwm: tegra: Optimize period calculation")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Robert Eckelmann <longnoserob@gmail.com> # TF101 T20
Tested-by: Antoni Aloy Torrens <aaloytorrens@gmail.com> # TF101 T20
Tested-by: Svyatoslav Ryhel <clamor95@gmail.com> # TF201 T30
Tested-by: Andreas Westman Dorcsak <hedmoo@yahoo.com> # TF700T T3
Link: https://lore.kernel.org/r/20221010100046.6477-1-jonathanh@nvidia.com
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Kefeng Wang [Wed, 12 Oct 2022 03:06:35 +0000 (11:06 +0800)]
clk: at91: fix the build with binutils 2.27
There is an issue when build with older versions of binutils 2.27.0,
arch/arm/mach-at91/pm_suspend.S: Assembler messages:
arch/arm/mach-at91/pm_suspend.S:1086: Error: garbage following instruction -- `ldr tmp1,=0x00020010UL'
Use UL() macro to fix the issue in assembly file.
Fixes:
4fd36e458392 ("ARM: at91: pm: add plla disable/enable support for sam9x60")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20221012030635.13140-1-wangkefeng.wang@huawei.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Linus Walleij [Thu, 13 Oct 2022 14:07:45 +0000 (16:07 +0200)]
clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
These two clocks are now registered in the device tree as fixed clocks,
causing a regression in the driver as the clock already exists with
e.g. the name "pxo_board" as the MSM8660 GCC driver probes.
Fix this by just not hard-coding this anymore and everything works
like a charm.
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes:
baecbda52933 ("ARM: dts: qcom: msm8660: fix node names for fixed clocks")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20221013140745.7801-1-linus.walleij@linaro.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
AngeloGioacchino Del Regno [Tue, 11 Oct 2022 13:55:48 +0000 (15:55 +0200)]
clk: mediatek: clk-mux: Add .determine_rate() callback
Since commit
262ca38f4b6e ("clk: Stop forwarding clk_rate_requests
to the parent"), the clk_rate_request is .. as the title says, not
forwarded anymore to the parent: this produces an issue with the
MediaTek clock MUX driver during GPU DVFS on MT8195, but not on
MT8192 or others.
This is because, differently from others, like MT8192 where all of
the clocks in the MFG parents tree are of mtk_mux type, but in the
parent tree of MT8195's MFG clock, we have one mtk_mux clock and
one (clk framework generic) mux clock, like so:
names: mfg_bg3d -> mfg_ck_fast_ref -> top_mfg_core_tmp (or) mfgpll
types: mtk_gate -> mux -> mtk_mux (or) mtk_pll
To solve this issue and also keep the GPU DVFS clocks code working
as expected, wire up a .determine_rate() callback for the mtk_mux
ops; for that, the standard clk_mux_determine_rate_flags() was used
as it was possible to.
This commit was successfully tested on MT6795 Xperia M5, MT8173 Elm,
MT8192 Spherion and MT8195 Tomato; no regressions were seen.
For the sake of some more documentation about this issue here's the
trace of it:
[ 12.211587] ------------[ cut here ]------------
[ 12.211589] WARNING: CPU: 6 PID: 78 at drivers/clk/clk.c:1462 clk_core_init_rate_req+0x84/0x90
[ 12.211593] Modules linked in: stp crct10dif_ce mtk_adsp_common llc rfkill snd_sof_xtensa_dsp
panfrost(+) sbs_battery cros_ec_lid_angle cros_ec_sensors snd_sof_of
cros_ec_sensors_core hid_multitouch cros_usbpd_logger snd_sof gpu_sched
snd_sof_utils fuse ipv6
[ 12.211614] CPU: 6 PID: 78 Comm: kworker/u16:2 Tainted: G W 6.0.0-next-
20221011+ #58
[ 12.211616] Hardware name: Acer Tomato (rev2) board (DT)
[ 12.211617] Workqueue: devfreq_wq devfreq_monitor
[ 12.211620] pstate:
40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.211622] pc : clk_core_init_rate_req+0x84/0x90
[ 12.211625] lr : clk_core_forward_rate_req+0xa4/0xe4
[ 12.211627] sp :
ffff80000893b8e0
[ 12.211628] x29:
ffff80000893b8e0 x28:
ffffdddf92f9b000 x27:
ffff46a2c0e8bc05
[ 12.211632] x26:
ffff46a2c1041200 x25:
0000000000000000 x24:
00000000173eed80
[ 12.211636] x23:
ffff80000893b9c0 x22:
ffff80000893b940 x21:
0000000000000000
[ 12.211641] x20:
ffff46a2c1039f00 x19:
ffff46a2c1039f00 x18:
0000000000000000
[ 12.211645] x17:
0000000000000038 x16:
000000000000d904 x15:
0000000000000003
[ 12.211649] x14:
ffffdddf9357ce48 x13:
ffffdddf935e71c8 x12:
000000000004803c
[ 12.211653] x11:
00000000a867d7ad x10:
00000000a867d7ad x9 :
ffffdddf90c28df4
[ 12.211657] x8 :
ffffdddf9357a980 x7 :
0000000000000000 x6 :
0000000000000004
[ 12.211661] x5 :
ffffffffffffffc8 x4 :
00000000173eed80 x3 :
ffff80000893b940
[ 12.211665] x2 :
00000000173eed80 x1 :
ffff80000893b940 x0 :
0000000000000000
[ 12.211669] Call trace:
[ 12.211670] clk_core_init_rate_req+0x84/0x90
[ 12.211673] clk_core_round_rate_nolock+0xe8/0x10c
[ 12.211675] clk_mux_determine_rate_flags+0x174/0x1f0
[ 12.211677] clk_mux_determine_rate+0x1c/0x30
[ 12.211680] clk_core_determine_round_nolock+0x74/0x130
[ 12.211682] clk_core_round_rate_nolock+0x58/0x10c
[ 12.211684] clk_core_round_rate_nolock+0xf4/0x10c
[ 12.211686] clk_core_set_rate_nolock+0x194/0x2ac
[ 12.211688] clk_set_rate+0x40/0x94
[ 12.211691] _opp_config_clk_single+0x38/0xa0
[ 12.211693] _set_opp+0x1b0/0x500
[ 12.211695] dev_pm_opp_set_rate+0x120/0x290
[ 12.211697] panfrost_devfreq_target+0x3c/0x50 [panfrost]
[ 12.211705] devfreq_set_target+0x8c/0x2d0
[ 12.211707] devfreq_update_target+0xcc/0xf4
[ 12.211708] devfreq_monitor+0x40/0x1d0
[ 12.211710] process_one_work+0x294/0x664
[ 12.211712] worker_thread+0x7c/0x45c
[ 12.211713] kthread+0x104/0x110
[ 12.211716] ret_from_fork+0x10/0x20
[ 12.211718] irq event stamp: 7102
[ 12.211719] hardirqs last enabled at (7101): [<
ffffdddf904ea5a0>] finish_task_switch.isra.0+0xec/0x2f0
[ 12.211723] hardirqs last disabled at (7102): [<
ffffdddf91794b74>] el1_dbg+0x24/0x90
[ 12.211726] softirqs last enabled at (6716): [<
ffffdddf90410be4>] __do_softirq+0x414/0x588
[ 12.211728] softirqs last disabled at (6507): [<
ffffdddf904171d8>] ____do_softirq+0x18/0x24
[ 12.211730] ---[ end trace
0000000000000000 ]---
Fixes:
262ca38f4b6e ("clk: Stop forwarding clk_rate_requests to the parent")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20221011135548.318323-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>