Amir Goldstein [Sun, 24 Dec 2017 16:28:04 +0000 (18:28 +0200)]
ovl: decode lower file handles of unlinked but open files
Lookup overlay inode in cache by origin inode, so we can decode a file
handle of an open file even if the index has a whiteout index entry to
mark this overlay inode was unlinked.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 19 Jan 2018 19:36:20 +0000 (21:36 +0200)]
ovl: decode indexed non-dir file handles
Decoding an indexed non-dir file handle is similar to decoding a lower
non-dir file handle, but additionally, we lookup the file handle in index
dir by name to find the real upper inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 19 Jan 2018 19:33:44 +0000 (21:33 +0200)]
ovl: decode lower non-dir file handles
Decoding a lower non-dir file handle is done by decoding the lower dentry
from underlying lower fs, finding or allocating an overlay inode that is
hashed by the real lower inode and instantiating an overlay dentry with
that inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 28 Dec 2017 17:35:21 +0000 (19:35 +0200)]
ovl: encode lower file handles
For indexed or lower non-dir, encode a non-connectable lower file handle
from origin inode. For indexed or lower dir, when ofs->numlower == 1,
encode a lower file handle from lower dir.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 18 Jan 2018 11:15:26 +0000 (13:15 +0200)]
ovl: copy up before encoding non-connectable dir file handle
Decoding a merge dir, whose origin's parent is under a redirected
lower dir is not always possible. As a simple aproximation, we do
not encode lower dir file handles when overlay has multiple lower
layers and origin is below the topmost lower layer.
We should later relax this condition and copy up only the parent
that is under a redirected lower.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 18 Jan 2018 11:14:55 +0000 (13:14 +0200)]
ovl: encode non-indexed upper file handles
We only need to encode origin if there is a chance that the same object was
encoded pre copy up and then we need to stay consistent with the same
encoding also after copy up.
In case a non-pure upper is not indexed, then it was copied up before NFS
export support was enabled. In that case, we don't need to worry about
staying consistent with pre copy up encoding and we encode an upper file
handle.
This mitigates the problem that with no index, we cannot find an upper
inode from origin inode, so we cannot decode a non-indexed upper from
origin file handle.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 28 Dec 2017 16:36:16 +0000 (18:36 +0200)]
ovl: decode connected upper dir file handles
Until this change, we decoded upper file handles by instantiating an
overlay dentry from the real upper dentry. This is sufficient to handle
pure upper files, but insufficient to handle merge/impure dirs.
To that end, if decoded real upper dir is connected and hashed, we
lookup an overlay dentry with the same path as the real upper dir.
If decoded real upper is non-dir, we instantiate a disconnected overlay
dentry as before this change.
Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
exportfs never needs to call get_parent() and get_name() to reconnect an
upper overlay dir. Because connectable non-dir file handles are not
supported, exportfs will not be able to use fh_to_parent() and get_name()
methods to reconnect a disconnected non-dir to its parent. Therefore, the
methods get_parent() and get_name() are implemented just to print out a
sanity warning and the method fh_to_parent() is implemented to warn the
user that using the 'subtree_check' exportfs option is not supported.
An alternative approach could have been to implement instantiating of
an overlay directory inode from origin/index and implement get_parent()
and get_name() by calling into underlying fs operations and them
instantiating the overlay parent dir.
The reasons for not choosing the get_parent() approach were:
- Obtaining a disconnected overlay dir dentry would requires a
delicate re-factoring of ovl_lookup() to get a dentry with overlay
parent info. It was preferred to avoid doing that re-factoring unless
it was proven worthy.
- Going down the path of disconnected dir would mean that the (non
trivial) code path of d_splice_alias() could be traveled and that
meant writing more tests and introduces race cases that are very hard
to hit on purpose. Taking the path of connecting overlay dentry by
forward lookup is therefore the safe and boring way to avoid surprises.
The culprits of the chosen "connected overlay dentry" approach:
- We need to take special care to rename of ancestors while connecting
the overlay dentry by real dentry path. These subtleties are usually
handled by generic exportfs and VFS code.
- In a hypothetical workload, we could end up in a loop trying to connect,
interrupted by rename and restarting connect forever.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 18 Jan 2018 23:03:23 +0000 (01:03 +0200)]
ovl: decode pure upper file handles
Decoding an upper file handle is done by decoding the upper dentry from
underlying upper fs, finding or allocating an overlay inode that is
hashed by the real upper inode and instantiating an overlay dentry with
that inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 12 Jul 2017 11:17:16 +0000 (14:17 +0300)]
ovl: encode pure upper file handles
Encode overlay file handles as struct ovl_fh containing the file handle
encoding of the real upper inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 25 May 2017 19:39:21 +0000 (22:39 +0300)]
ovl: document NFS export
Document NFS export design.
Followup patches will implement this design.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 19 Jan 2018 10:39:52 +0000 (11:39 +0100)]
vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
Those helpers are going to be used by overlayfs to implement
NFS export decode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Sun, 14 Jan 2018 17:25:31 +0000 (19:25 +0200)]
ovl: store 'has_upper' and 'opaque' as bit flags
We need to make some room in struct ovl_entry to store information
about redirected ancestors for NFS export, so cram two booleans as
bit flags.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Sun, 15 Oct 2017 15:00:20 +0000 (18:00 +0300)]
ovl: copy up of disconnected dentries
With NFS export, some operations on decoded file handles (e.g. open,
link, setattr, xattr_set) may call copy up with a disconnected non-dir.
In this case, we will copy up lower inode to index dir without
linking it to upper dir.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 29 Sep 2017 18:43:07 +0000 (21:43 +0300)]
ovl: use d_splice_alias() in place of d_add() in lookup
This is required for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 12 Dec 2017 21:43:16 +0000 (23:43 +0200)]
ovl: do not pass overlay dentry to ovl_get_inode()
This is needed for using ovl_get_inode() for decoding file handles
for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 28 Dec 2017 18:23:05 +0000 (20:23 +0200)]
ovl: factor out ovl_get_index_fh() helper
The helper is needed to lookup an index by file handle for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 13:33:51 +0000 (15:33 +0200)]
ovl: whiteout orphan index entries on mount
Orphan index entries are non-dir index entries whose union nlink count
dropped to zero. With index=on, orphan index entries are removed on
mount. With NFS export feature enabled, orphan index entries are replaced
with white out index entries to block future open by handle from opening
the lower file.
When dir index has a stale 'upper' xattr, we assume that the upper dir
was removed and we treat the dir index as orphan entry that needs to be
whited out or removed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 24 Oct 2017 14:38:33 +0000 (17:38 +0300)]
ovl: whiteout index when union nlink drops to zero
With NFS export feature enabled, when overlay inode nlink drops to
zero, instead of removing the index entry, replace it with a whiteout
index entry.
This is needed for NFS export in order to prevent future open by handle
from opening the lower file directly.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 26 Sep 2017 04:40:37 +0000 (07:40 +0300)]
ovl: cleanup dir index when dir nlink drops to zero
When non-dir index union nlink drops to zero the non-dir index
is cleaned. Do the same for directory type index entries when
union directory is removed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 12:01:08 +0000 (14:01 +0200)]
ovl: index directories on copy up for NFS export
With the NFS export feature enabled, all dirs are indexed on copy up.
Non-dir files are copied up directly to indexdir and then hardlinked
to upper dir.
Directories are copied up to indexdir, then an index entry is created
in indexdir with 'upper' xattr pointing to the copied up dir and then
the copied up dir is moved to upper dir.
Directory index is also used for consistency verification, like
detecting multiple redirected dirs to the same lower dir on lookup.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 21 Nov 2017 22:08:21 +0000 (00:08 +0200)]
ovl: index all non-dir on copy up for NFS export
With the NFS export feature enabled, all non-dir are indexed on copy up.
The copy up origin inode of an indexed non-dir can be used as a unique
identifier of the overlay object.
The full index is also used for consistency verfication, like detecting
multiple non-hardlink uppers with the same 'origin' on lookup.
Directory index on copy up will be implemented by following patch.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 26 Sep 2017 04:55:26 +0000 (07:55 +0300)]
ovl: create ovl_need_index() helper
The helper determines which lower file needs to be indexed
on copy up and before nlink changes.
For index=on, the helper evaluates to true for lower hardlinks.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 1 Nov 2017 08:13:51 +0000 (10:13 +0200)]
ovl: cleanup temp index entries
A previous failed attempt to create or whiteout a directory index may
leave index entries named '#%x' in the index dir. Cleanup those temp
entries on mount instead of failing the mount.
In the future, we may drop 'work' dir and use 'index' dir instead.
This change is enough for cleaning up copy up leftovers 'from the future',
but it is not enough for cleaning up rmdir leftovers 'from the future'
(i.e. temp dir containing whiteouts).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 09:33:24 +0000 (11:33 +0200)]
ovl: verify directory index entries on mount
Directory index entries should have 'upper' xattr pointing to the real
upper dir. Verifying that the upper dir file handle is not stale is
expensive, so only verify stale directory index entries on mount if
NFS export feature is enabled.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 09:03:13 +0000 (11:03 +0200)]
ovl: verify whiteout index entries on mount
Whiteout index entries are used as an indication that an exported
overlay file handle should be treated as stale (i.e. after unlink
of the overlay inode).
Check on mount that whiteout index entries have a name that looks like
a valid file handle and cleanup invalid index entries.
For whiteout index entries, do not check that they also have valid
origin fh and nlink xattr, because those xattr do not exist for a
whiteout index entry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 08:47:03 +0000 (10:47 +0200)]
ovl: use directory index entries for consistency verification
A directory index is a directory type entry in index dir with a
"trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
directory upper dir inode.
On lookup of non-dir files, lower file is followed by origin file handle.
On lookup of dir entries, lower dir is found by name and then compared
to origin file handle. We only trust dir index if we verified that lower
dir matches origin file handle, otherwise index may be inconsistent and
we ignore it.
If we find an indexed non-upper dir or an indexed merged dir, whose
index 'upper' xattr points to a different upper dir, that means that the
lower directory may be also referenced by another upper dir via redirect,
so we fail the lookup on inconsistency error.
To be consistent with directory index entries format, the association of
index dir to upper root dir, that was stored by older kernels in
"trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
xattr. This also serves as an indication that overlay was mounted with a
kernel that support index directory entries. For backward compatibility,
if an 'origin' xattr exists on the index dir we also verify it on mount.
Directory index entries are going to be used for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 21 Nov 2017 11:55:51 +0000 (13:55 +0200)]
ovl: unbless lower st_ino of unverified origin
On a malformed overlay, several redirected dirs can point to the same
dir on a lower layer. This presents a similar challenge as broken
hardlinks, because different objects in the overlay can return the same
st_ino/st_dev pair from stat(2).
For broken hardlinks, we do not provide constant st_ino on copy up to
avoid this inconsistency. When NFS export feature is enabled, apply
the same logic to files and directories with unverified lower origin.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 10 Jan 2018 20:29:38 +0000 (22:29 +0200)]
ovl: verify stored origin fh matches lower dir
When the NFS export feature is enabled, overlayfs implicitly enables the
feature "verify_lower". When the "verify_lower" feature is enabled, a
directory inode found in lower layer by name or by redirect_dir is
verified against the file handle of the copy up origin that is stored in
the upper layer.
This introduces a change of behavior for the case of lower layer
modification while overlay is offline. A lower directory created or
moved offline under an exisitng upper directory, will not be merged with
that upper directory.
The NFS export feature should not be used after copying layers, because
the new lower directory inodes would fail verification and won't be
merged with upper directories.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 19 Jan 2018 09:26:53 +0000 (11:26 +0200)]
ovl: add support for "nfs_export" configuration
Introduce the "nfs_export" config, module and mount options.
The NFS export feature depends on the "index" feature and enables two
implicit overlayfs features: "index_all" and "verify_lower".
The "index_all" feature creates an index on copy up of every file and
directory. The "verify_lower" feature uses the full index to detect
overlay filesystems inconsistencies on lookup, like redirect from
multiple upper dirs to the same lower dir.
NFS export can be enabled for non-upper mount with no index. However,
because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay with no upper layer requires turning off
redirect follow (e.g. "redirect_dir=nofollow").
The full index may incur some overhead on mount time, especially when
verifying that lower directory file handles are not stale.
NFS export support, full index and consistency verification will be
implemented by following patches.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 10 Jan 2018 21:15:21 +0000 (23:15 +0200)]
ovl: update documentation of inodes index feature
Document that inode index feature solves breaking hard links on
copy up.
Simplify Kconfig backward compatibility disclaimer.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Thu, 11 Jan 2018 06:25:32 +0000 (08:25 +0200)]
ovl: generalize ovl_verify_origin() and helpers
Remove the "origin" language from the functions that handle set, get
and verify of "origin" xattr and pass the xattr name as an argument.
The same helpers are going to be used for NFS export to get, get and
verify the "upper" xattr for directory index entries.
ovl_verify_origin() is now a helper used only to verify non upper
file handle stored in "origin" xattr of upper inode.
The upper root dir file handle is still stored in "origin" xattr on
the index dir for backward compatibility. This is going to be changed
by the patch that adds directory index entries support.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 12 Dec 2017 20:40:46 +0000 (22:40 +0200)]
ovl: simplify arguments to ovl_check_origin_fh()
Pass the fs instance with lower_layers array instead of the dentry
lowerstack array to ovl_check_origin_fh(), because the dentry members
of lowerstack play no role in this helper.
This change simplifies the argument list of ovl_check_origin(),
ovl_cleanup_index() and ovl_verify_index().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 24 Oct 2017 12:12:15 +0000 (15:12 +0300)]
ovl: factor out ovl_check_origin_fh()
Re-factor ovl_check_origin() and ovl_get_origin(), so origin fh xattr is
read from upper inode only once during lookup with multiple lower layers
and only once when verifying index entry origin.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 8 Nov 2017 17:23:36 +0000 (19:23 +0200)]
ovl: store layer index in ovl_layer
Store the fs root layer index inside ovl_layer struct, so we can
get the root fs layer index from merge dir lower layer instead of
find it with ovl_find_layer() helper.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 19 Sep 2017 09:14:18 +0000 (12:14 +0300)]
ovl: force r/o mount when index dir creation fails
When work dir creation fails, a warning is emitted and overlay is
mounted r/o. Trying to remount r/w will fail with no work dir.
When index dir creation fails, the same warning is emitted and overlay
is mounted r/o, but trying to remount r/w will succeed. This may cause
unintentional corruption of filesystem consistency.
Adjust the behavior of index dir creation failure to that of work dir
creation failure and do not allow to remount r/w. User needs to state
an explicitly intention to work without an index by mounting with
option 'index=off' to allow r/w mount with no index dir.
When mounting with option 'index=on' and no 'upperdir', index is
implicitly disabled, so do not warn about no file handle support.
The issue was introduced with inodes index feature in v4.13, but this
patch will not apply cleanly before ovl_fill_super() re-factoring in
v4.15.
Fixes:
02bcd1577400 ("ovl: introduce the inodes index dir feature")
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 19 Sep 2017 09:14:18 +0000 (12:14 +0300)]
ovl: disable index when no xattr support
Overlayfs falls back to index=off if lower/upper fs does not support
file handles. Do the same if upper fs does not support xattr.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 3 Jan 2018 17:34:45 +0000 (19:34 +0200)]
ovl: fix inconsistent d_ino for legacy merge dir
For a merge dir that was copied up before v4.12 or that was hand crafted
offline (e.g. mkdir {upper/lower}/dir), upper dir does not contain the
'trusted.overlay.origin' xattr. In that case, stat(2) on the merge dir
returns the lower dir st_ino, but getdents(2) returns the upper dir d_ino.
After this change, on merge dir lookup, missing origin xattr on upper
dir will be fixed and 'impure' xattr will be fixed on parent of the legacy
merge dir.
Suggested-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 3 Jan 2018 16:54:42 +0000 (18:54 +0200)]
ovl: take mnt_want_write() for removing impure xattr
The optimization in ovl_cache_get_impure() that tries to remove an
unneeded "impure" xattr needs to take mnt_want_write() on upper fs.
Fixes:
4edb83bb1041 ("ovl: constant d_ino for non-merge dirs")
Cc: <stable@vger.kernel.org> #v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 3 Jan 2018 16:54:41 +0000 (18:54 +0200)]
ovl: take mnt_want_write() for work/index dir setup
There are several write operations on upper fs not covered by
mnt_want_write():
- test set/remove OPAQUE xattr
- test create O_TMPFILE
- set ORIGIN xattr in ovl_verify_origin()
- cleanup of index entries in ovl_indexdir_cleanup()
Some of these go way back, but this patch only applies over the
v4.14 re-factoring of ovl_fill_super().
Cc: <stable@vger.kernel.org> #v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Mon, 18 Dec 2017 12:25:56 +0000 (14:25 +0200)]
ovl: fix another overlay: warning prefix
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 10 Nov 2017 11:18:07 +0000 (13:18 +0200)]
ovl: take lower dir inode mutex outside upper sb_writers lock
The functions ovl_lower_positive() and ovl_check_empty_dir() both take
inode mutex on the real lower dir under ovl_want_write() which takes
the upper_mnt sb_writers lock.
While this is not a clear locking order or layering violation, it creates
an undesired lock dependency between two unrelated layers for no good
reason.
This lock dependency materializes to a false(?) positive lockdep warning
when calling rmdir() on a nested overlayfs, where both nested and
underlying overlayfs both use the same fs type as upper layer.
rmdir() on the nested overlayfs creates the lock chain:
sb_writers of upper_mnt (e.g. tmpfs) in ovl_do_remove()
ovl_i_mutex_dir_key[] of lower overlay dir in ovl_lower_positive()
rmdir() on the underlying overlayfs creates the lock chain in
reverse order:
ovl_i_mutex_dir_key[] of lower overlay dir in vfs_rmdir()
sb_writers of nested upper_mnt (e.g. tmpfs) in ovl_do_remove()
To rid of the unneeded locking dependency, move both ovl_lower_positive()
and ovl_check_empty_dir() to before ovl_want_write() in rmdir() and
rename() implementation.
This change spreads the pieces of ovl_check_empty_and_clear() directly
inside the rmdir()/rename() implementations so the helper is no longer
needed and removed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 8 Nov 2017 07:39:46 +0000 (09:39 +0200)]
ovl: fix failure to fsync lower dir
As a writable mount, it is not expected for overlayfs to return
EINVAL/EROFS for fsync, even if dir/file is not changed.
This commit fixes the case of fsync of directory, which is easier to
address, because overlayfs already implements fsync file operation for
directories.
The problem reported by Raphael is that new PostgreSQL 10.0 with a
database in overlayfs where lower layer in squashfs fails to start.
The failure is due to fsync error, when PostgreSQL does fsync on all
existing db directories on startup and a specific directory exists
lower layer with no changes.
Reported-by: Raphael Hertzog <raphael@ouaza.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Raphaël Hertzog <hertzog@debian.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Sun, 14 Jan 2018 16:35:40 +0000 (18:35 +0200)]
ovl: hash directory inodes for fsnotify
fsnotify pins a watched directory inode in cache, but if directory dentry
is released, new lookup will allocate a new dentry and a new inode.
Directory events will be notified on the new inode, while fsnotify listener
is watching the old pinned inode.
Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
dirs are hashes by real upper inode, merge and lower dirs are hashed by
real lower inode.
The reference to lower inode was being held by the lower dentry object
in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
may drop lower inode refcount to zero. Add a refcount on behalf of the
overlay inode to prevent that.
As a by-product, hashing directory inodes also detects multiple
redirected dirs to the same lower dir and uncovered redirected dir
target on and returns -ESTALE on lookup.
The reported issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.
Cc: <stable@vger.kernel.org> #v4.13
Reported-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Linus Torvalds [Sun, 14 Jan 2018 23:32:30 +0000 (15:32 -0800)]
Linux 4.15-rc8
Linus Torvalds [Sun, 14 Jan 2018 23:30:02 +0000 (15:30 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixlet from Thomas Gleixner.
Remove a warning about lack of compiler support for retpoline that most
people can't do anything about, so it just annoys them needlessly.
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Remove compile time warning
Linus Torvalds [Sun, 14 Jan 2018 23:03:17 +0000 (15:03 -0800)]
Merge tag 'powerpc-4.15-7' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for an oops at boot if we take a hotplug interrupt before we
are ready to handle it.
The bulk is patches to implement mitigation for Meltdown, see the
change logs for more details.
Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
Masters, Jose Ricardo Ziviani, David Gibson"
* tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Check device-tree for RFI flush settings
powerpc/pseries: Query hypervisor for RFI flush settings
powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
powerpc/64s: Add support for RFI flush of L1-D cache
powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
powerpc/64s: Simple RFI macro conversions
powerpc/64: Add macros for annotating the destination of rfid/hrfid
powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
Thomas Gleixner [Sun, 14 Jan 2018 21:13:29 +0000 (22:13 +0100)]
x86/retpoline: Remove compile time warning
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes:
76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
Linus Torvalds [Sun, 14 Jan 2018 18:22:45 +0000 (10:22 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull NVMe fix from Jens Axboe:
"Just a single fix for nvme over fabrics that should go into 4.15"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-fabrics: initialize default host->id in nvmf_host_default()
Linus Torvalds [Sun, 14 Jan 2018 17:51:25 +0000 (09:51 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
Peter Zijlstra [Sun, 14 Jan 2018 10:27:13 +0000 (11:27 +0100)]
x86,perf: Disable intel_bts when PTI
The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.
This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.
As a quick fix disable this driver when PTI is enabled to prevent
malfunction.
Fixes:
385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: greg@kroah.com
Cc: hughd@google.com
Cc: luto@amacapital.net
Cc: Vince Weaver <vince@deater.net>
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
W. Trevor King [Fri, 12 Jan 2018 23:24:59 +0000 (15:24 -0800)]
security/Kconfig: Correct the Documentation reference for PTI
When the config option for PTI was added a reference to documentation was
added as well. But the documentation did not exist at that point. The final
documentation has a different file name.
Fix it up to point to the proper file.
Fixes:
385ce0ea ("x86/mm/pti: Add Kconfig")
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
Thomas Gleixner [Sat, 13 Jan 2018 23:23:57 +0000 (00:23 +0100)]
x86/pti: Fix !PCID and sanitize defines
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.
This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.
While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.
This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.
Make sure that on non PCID machines bit 11 is not set by the page table
switching code.
Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.
That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes:
6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
Linus Torvalds [Sat, 13 Jan 2018 22:10:32 +0000 (14:10 -0800)]
Merge tag 'usb-4.15-rc8' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and device ids for 4.15-rc8
Nothing major, small fixes for various devices, some resolutions for
bugs found by fuzzers, and the usual handful of new device ids.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Documentation: usb: fix typo in UVC gadgetfs config command
usb: misc: usb3503: make sure reset is low for at least 100us
uas: ignore UAS for Norelsys NS1068(X) chips
USB: UDC core: fix double-free in usb_add_gadget_udc_release
USB: fix usbmon BUG trigger
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
USB: serial: cp210x: add new device ID ELV ALC 8xxx
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
Linus Torvalds [Sat, 13 Jan 2018 22:04:06 +0000 (14:04 -0800)]
Merge tag 'staging-4.15-rc8' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single android ashmem bugfix that resolves a reported issue
in that interface. It's been in linux-next this week with no reported
issues"
* tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Linus Torvalds [Sat, 13 Jan 2018 22:01:59 +0000 (14:01 -0800)]
Merge tag 'char-misc-4.15-rc8' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are two bugfixes for some driver bugs for 4.15-rc8
The first is a bluetooth security bug that has been ignored by the
Bluetooth developers for months for no obvious reason at all, so I've
taken it through my tree.
The second is a simple double-free bug in the mux subsystem.
Both have been in linux-next for a while with no reported issues"
* tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mux: core: fix double get_device()
Bluetooth: Prevent stack info leak from the EFS element.
Linus Torvalds [Sat, 13 Jan 2018 21:24:56 +0000 (13:24 -0800)]
Merge tag 'kbuild-fixes-v4.15' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix cross-compilation for architectures that setup CROSS_COMPILE in
their arch Makefile
- fix Kconfig rational operators for bool / tristate
- drop a gperf-generated file from .gitignore
* tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: drop *.hash.c from .gitignore
kconfig: fix relational operators for bool and tristate symbols
kbuild: move cc-option and cc-disable-warning after incl. arch Makefile
Linus Torvalds [Sat, 13 Jan 2018 21:18:15 +0000 (13:18 -0800)]
Merge tag 'apparmor-pr-2018-01-12' of git://git./linux/kernel/git/jj/linux-apparmor
Pull apparmor regression fixes from John Johansen:
"This fixes a couple bugs I have been working with Matthew Garrett on
this week. Specifically a regression in the handling of a conflicting
profile attachment and label match restrictions for ptrace when
profiles are stacked.
Summary:
- fix ptrace label match when matching stacked labels
- fix regression in profile conflict logic"
* tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in profile conflict logic
apparmor: fix ptrace label match when matching stacked labels
Linus Torvalds [Sat, 13 Jan 2018 21:14:54 +0000 (13:14 -0800)]
Merge tag 'pci-v4.15-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Fix AMD boot regression due to 64-bit window conflicting with system
memory (Christian König)"
* tag 'pci-v4.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Move and shrink AMD 64-bit window to avoid conflict
x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows
Linus Torvalds [Sat, 13 Jan 2018 19:07:55 +0000 (11:07 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixlets from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/objtool/Makefile: don't assume sync-check.sh is executable
kdump: write correct address of mem_section into vmcoreinfo
kmemleak: allow to coexist with fault injection
MAINTAINERS, nilfs2: change project home URLs
Andrew Morton [Sat, 13 Jan 2018 00:53:17 +0000 (16:53 -0800)]
tools/objtool/Makefile: don't assume sync-check.sh is executable
patch(1) loses the x bit. So if a user follows our patching
instructions in Documentation/admin-guide/README.rst, their kernel will
not compile.
Fixes:
3bd51c5a371de ("objtool: Move kernel headers/code sync check to a script")
Reported-by: Nicolas Bock <nicolasbock@gentoo.org>
Reported-by Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Sat, 13 Jan 2018 00:53:14 +0000 (16:53 -0800)]
kdump: write correct address of mem_section into vmcoreinfo
Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically. In most cases, we can continue to
refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array"
if mem_section is an array, but if mem_section is a pointer, it would
mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section)
writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.
Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes:
83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Sat, 13 Jan 2018 00:53:10 +0000 (16:53 -0800)]
kmemleak: allow to coexist with fault injection
kmemleak does one slab allocation per user allocation. So if slab fault
injection is enabled to any degree, kmemleak instantly fails to allocate
and turns itself off. However, it's useful to use kmemleak with fault
injection to find leaks on error paths. On the other hand, checking
kmemleak itself is not so useful because (1) it's a debugging tool and
(2) it has a very regular allocation pattern (basically a single
allocation site, so it either works or not).
Turn off fault injection for kmemleak allocations.
Link: http://lkml.kernel.org/r/20180109192243.19316-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Sat, 13 Jan 2018 00:53:07 +0000 (16:53 -0800)]
MAINTAINERS, nilfs2: change project home URLs
The domain of NILFS project home was changed to "nilfs.sourceforge.io"
to enable https access (the previous domain "nilfs.sourceforge.net" is
redirected to the new one). Modify URLs of the project home to reflect
this change and to replace their protocol from http to https.
Link: http://lkml.kernel.org/r/1515416141-5614-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Thu, 11 Jan 2018 09:28:08 +0000 (18:28 +0900)]
genksyms: drop *.hash.c from .gitignore
This is a left-over of commit
bb3290d91695 ("Remove gperf usage from
toolchain").
We do not generate a hash function any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Andy Lutomirski [Fri, 12 Jan 2018 01:16:51 +0000 (17:16 -0800)]
selftests/x86: Add test_vsyscall
This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.
If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native are helpful.
(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)
Note to KAISER backporters: please test this under all three
vsyscall modes. Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in. It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.
Greg, etc: please backport this to all your Meltdown-patched
kernels. It'll help make sure the patches didn't regress
vsyscalls.
CSigned-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Matthew Garrett [Thu, 11 Jan 2018 21:07:54 +0000 (13:07 -0800)]
apparmor: Fix regression in profile conflict logic
The intended behaviour in apparmor profile matching is to flag a
conflict if two profiles match equally well. However, right now a
conflict is generated if another profile has the same match length even
if that profile doesn't actually match. Fix the logic so we only
generate a conflict if the profiles match.
Fixes:
844b8292b631 ("apparmor: ensure that undecidable profile attachments fail")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Sat, 9 Dec 2017 01:43:18 +0000 (17:43 -0800)]
apparmor: fix ptrace label match when matching stacked labels
Given a label with a profile stack of
A//&B or A//&C ...
A ptrace rule should be able to specify a generic trace pattern with
a rule like
ptrace trace A//&**,
however this is failing because while the correct label match routine
is called, it is being done post label decomposition so it is always
being done against a profile instead of the stacked label.
To fix this refactor the cross check to pass the full peer label in to
the label_match.
Fixes:
290f458a4f16 ("apparmor: allow ptrace checks to be finer grained than just capability")
Cc: Stable <stable@vger.kernel.org>
Reported-by: Matthew Garrett <mjg59@google.com>
Tested-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Linus Torvalds [Fri, 12 Jan 2018 18:32:11 +0000 (10:32 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two pending (non-PTI) x86 fixes:
- an Intel-MID crash fix
- and an Intel microcode loader blacklist quirk to avoid a
problematic revision"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const"
x86/microcode/intel: Extend BDW late-loading with a revision check
Linus Torvalds [Fri, 12 Jan 2018 18:23:59 +0000 (10:23 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A Kconfig fix, a build fix and a membarrier bug fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
membarrier: Disable preemption when calling smp_call_function_many()
sched/isolation: Make CONFIG_CPU_ISOLATION=y depend on SMP or COMPILE_TEST
ia64, sched/cputime: Fix build error if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
Linus Torvalds [Fri, 12 Jan 2018 18:14:09 +0000 (10:14 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"No functional effects intended: removes leftovers from recent lockdep
and refcounts work"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry
locking/lockdep: Remove cross-release leftovers
locking/Documentation: Remove stale crossrelease_fullstack parameter
Linus Torvalds [Fri, 12 Jan 2018 18:00:15 +0000 (10:00 -0800)]
Merge tag 'for-linus-4.15-rc8-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two build fixes for clang and two fixes for rather
unlikely situations in the Xen gntdev driver"
* tag 'for-linus-4.15-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: Fix partial gntdev_mmap() cleanup
xen/gntdev: Fix off-by-one error when unmapping with holes
x86: xen: remove the use of VLAIS
x86/xen/time: fix section mismatch for xen_init_time_ops()
Linus Torvalds [Fri, 12 Jan 2018 17:56:52 +0000 (09:56 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"PPC:
- user-triggerable use-after-free in HPT resizing
- stale TLB entries in the guest
- trap-and-emulate (PR) KVM guests failing to start under pHyp
x86:
- Another "Spectre" fix.
- async pagefault fix
- Revert an old fix for x86 nested virtualization, which turned out
to do more harm than good
- Check shrinker registration return code, to avoid warnings from
upcoming 4.16 -mm patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Add memory barrier on vmcs field lookup
KVM: x86: emulate #UD while in guest mode
x86: kvm: propagate register_shrinker return code
KVM MMU: check pending exception before injecting APF
KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()
KVM: PPC: Book3S PR: Fix WIMG handling under pHyp
KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests
KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt
Linus Torvalds [Fri, 12 Jan 2018 17:47:58 +0000 (09:47 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a NULL pointer dereference in crypto_remove_spawns that can
be triggered through af_alg"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
Jens Axboe [Fri, 12 Jan 2018 17:42:36 +0000 (10:42 -0700)]
Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-linus
Pull a single NVMe fix from Christoph for 4.15.
Linus Torvalds [Fri, 12 Jan 2018 17:34:20 +0000 (09:34 -0800)]
Merge tag 'mmc-v4.15-rc2-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- s3mci: mark debug_regs[] as static
- renesas_sdhi: Add MODULE_LICENSE
* tag 'mmc-v4.15-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: s3mci: mark debug_regs[] as static
mmc: renesas_sdhi: Add MODULE_LICENSE
Linus Torvalds [Fri, 12 Jan 2018 17:28:28 +0000 (09:28 -0800)]
Merge tag 'drm-fixes-for-v4.15-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
- Nouveau: regression fix
- Tegra: regression fix
- vmwgfx: crasher + freed data leak
- i915: KASAN use after free fix, whitelist register to avoid hang fix,
GVT fixes
- vc4: irq/pm fix
* tag 'drm-fixes-for-v4.15-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Don't adjust priority on an already signaled fence
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
drm/vmwgfx: Potential off by one in vmw_view_add()
drm/tegra: sor: Fix hang on Tegra124 eDP
drm/vmwgfx: Don't cache framebuffer maps
drm/nouveau/disp/gf119: add missing drive vfunc ptr
drm/i915/gvt: Fix stack-out-of-bounds bug in cmd parser
drm/i915/gvt: Clear the shadow page table entry after post-sync
drm/vc4: Move IRQ enable to PM path
David Woodhouse [Fri, 12 Jan 2018 11:11:27 +0000 (11:11 +0000)]
x86/retpoline: Fill return stack buffer on vmexit
In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.
[ak: numbers again for the RSB stuffing labels]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk
Dave Airlie [Fri, 12 Jan 2018 01:48:06 +0000 (11:48 +1000)]
Merge tag 'drm-intel-fixes-2018-01-11-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Hopefully final drm/i915 fixes for v4.15:
- Fix a KASAN reported use after free
- Whitelist a register to avoid hangs
- GVT fixes
* tag 'drm-intel-fixes-2018-01-11-1' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Don't adjust priority on an already signaled fence
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
drm/i915/gvt: Fix stack-out-of-bounds bug in cmd parser
drm/i915/gvt: Clear the shadow page table entry after post-sync
Dave Airlie [Fri, 12 Jan 2018 01:47:40 +0000 (11:47 +1000)]
Merge branch 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Two important fixes for vmwgfx.
The off-by-one fix could cause a malicious user to potentially crash the
kernel.
The framebuffer map cache fix can under some circumstances enable a user to
read from or write to freed pages.
* 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Potential off by one in vmw_view_add()
drm/vmwgfx: Don't cache framebuffer maps
Dave Airlie [Fri, 12 Jan 2018 01:47:11 +0000 (11:47 +1000)]
Merge tag 'drm/tegra/for-4.15-rc8' of git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v4.15-rc8
A single fix for a Tegra124 eDP regression introduced by the SOR changes
in v4.15-rc1.
* tag 'drm/tegra/for-4.15-rc8' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: sor: Fix hang on Tegra124 eDP
Linus Torvalds [Fri, 12 Jan 2018 00:57:32 +0000 (16:57 -0800)]
Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
stable"
* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
rbd: set max_segments to USHRT_MAX
rbd: reacquire lock should update lock owner client id
Linus Torvalds [Fri, 12 Jan 2018 00:54:35 +0000 (16:54 -0800)]
Merge tag 'gpio-v4.15-4' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"
* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Andi Kleen [Thu, 11 Jan 2018 21:46:33 +0000 (21:46 +0000)]
x86/retpoline/irq32: Convert assembler indirect jumps
Convert all indirect jumps in 32bit irq inline asm code to use non
speculative sequences.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-12-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:32 +0000 (21:46 +0000)]
x86/retpoline/checksum32: Convert assembler indirect jumps
Convert all indirect jumps in 32bit checksum assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-11-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:31 +0000 (21:46 +0000)]
x86/retpoline/xen: Convert Xen hypercall indirect jumps
Convert indirect call in Xen hypercall to use non-speculative sequence,
when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-10-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:30 +0000 (21:46 +0000)]
x86/retpoline/hyperv: Convert assembler indirect jumps
Convert all indirect jumps in hyperv inline asm code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-9-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:29 +0000 (21:46 +0000)]
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
Convert all indirect jumps in ftrace assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-8-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:28 +0000 (21:46 +0000)]
x86/retpoline/entry: Convert entry assembler indirect jumps
Convert indirect jumps in core 32/64bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:27 +0000 (21:46 +0000)]
x86/retpoline/crypto: Convert crypto assembler indirect jumps
Convert all indirect jumps in crypto assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-6-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:26 +0000 (21:46 +0000)]
x86/spectre: Add boot time option to select Spectre v2 mitigation
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
David Woodhouse [Thu, 11 Jan 2018 21:46:25 +0000 (21:46 +0000)]
x86/retpoline: Add initial retpoline support
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
Josh Poimboeuf [Thu, 11 Jan 2018 21:46:24 +0000 (21:46 +0000)]
objtool: Allow alternatives to be ignored
Getting objtool to understand retpolines is going to be a bit of a
challenge. For now, take advantage of the fact that retpolines are
patched in with alternatives. Just read the original (sane)
non-alternative instruction, and ignore the patched-in retpoline.
This allows objtool to understand the control flow *around* the
retpoline, even if it can't yet follow what's inside. This means the
ORC unwinder will fail to unwind from inside a retpoline, but will work
fine otherwise.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.uk
Josh Poimboeuf [Thu, 11 Jan 2018 21:46:23 +0000 (21:46 +0000)]
objtool: Detect jumps to retpoline thunks
A direct jump to a retpoline thunk is really an indirect jump in
disguise. Change the objtool instruction type accordingly.
Objtool needs to know where indirect branches are so it can detect
switch statement jump tables.
This fixes a bunch of warnings with CONFIG_RETPOLINE like:
arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame
kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame
...
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-2-git-send-email-dwmw@amazon.co.uk
Dave Hansen [Wed, 10 Jan 2018 22:49:39 +0000 (14:49 -0800)]
x86/pti: Make unpoison of pgd for trusted boot work for real
The inital fix for trusted boot and PTI potentially misses the pgd clearing
if pud_alloc() sets a PGD. It probably works in *practice* because for two
adjacent calls to map_tboot_page() that share a PGD entry, the first will
clear NX, *then* allocate and set the PGD (without NX clear). The second
call will *not* allocate but will clear the NX bit.
Defer the NX clearing to a point after it is known that all top-level
allocations have occurred. Add a comment to clarify why.
[ tglx: Massaged changelog ]
Fixes:
262b6b30087 ("x86/tboot: Unbreak tboot with PTI enabled")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: "Tim Chen" <tim.c.chen@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: peterz@infradead.org
Cc: ning.sun@intel.com
Cc: tboot-devel@lists.sourceforge.net
Cc: andi@firstfloor.org
Cc: luto@kernel.org
Cc: law@redhat.com
Cc: pbonzini@redhat.com
Cc: torvalds@linux-foundation.org
Cc: gregkh@linux-foundation.org
Cc: dwmw@amazon.co.uk
Cc: nickc@redhat.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com
=?UTF-8?q?Christian=20K=C3=B6nig?= [Thu, 11 Jan 2018 13:23:30 +0000 (14:23 +0100)]
x86/PCI: Move and shrink AMD 64-bit window to avoid conflict
Avoid problems with BIOS implementations which don't report all used
resources to the OS by only allocating a 256GB window directly below the
hardware limit (from the BKDG, sec 2.4.6).
Fixes a silent reboot loop reported by Aaro Koskinen <aaro.koskinen@iki.fi>
on an AMD-based MSI MS-7699/760GA-P43(FX) system. This was apparently
caused by RAM or other unreported hardware that conflicted with the new
window.
Link: https://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf
Link: https://lkml.kernel.org/r/20180105220412.fzpwqe4zljdawr36@darkstar.musicnaut.iki.fi
Fixes:
fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)")
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Christian König <christian.koenig@amd.com>
[bhelgaas: changelog, comment, Fixes:]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Bin Liu [Tue, 9 Jan 2018 19:27:17 +0000 (13:27 -0600)]
Documentation: usb: fix typo in UVC gadgetfs config command
This seems to be a copy&paste error. With the fix the uvc gadget now can
be created by following the instrucitons.
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stefan Agner [Thu, 11 Jan 2018 13:47:40 +0000 (14:47 +0100)]
usb: misc: usb3503: make sure reset is low for at least 100us
When using a GPIO which is high by default, and initialize the
driver in USB Hub mode, initialization fails with:
[ 111.757794] usb3503 0-0008: SP_ILOCK failed (-5)
The reason seems to be that the chip is not properly reset.
Probe does initialize reset low, however some lines later the
code already set it back high, which is not long enouth.
Make sure reset is asserted for at least 100us by inserting a
delay after initializing the reset pin during probe.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
=?UTF-8?q?Christian=20K=C3=B6nig?= [Thu, 11 Jan 2018 13:23:29 +0000 (14:23 +0100)]
x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows
Only try to enable a 64-bit window on AMD CPUs when "pci=big_root_window"
is specified.
This taints the kernel because the new 64-bit window uses address space we
don't know anything about, and it may contain unreported devices or memory
that would conflict with the window.
The pci_amd_enable_64bit_bar() quirk that enables the window is specific to
AMD CPUs. The generic solution would be to have the firmware enable the
window and describe it in the host bridge's _CRS method, or at least
describe it in the _PRS method so the OS would have the option of enabling
it.
Signed-off-by: Christian König <christian.koenig@amd.com>
[bhelgaas: changelog, extend doc, mention taint in dmesg]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Paolo Bonzini [Thu, 11 Jan 2018 17:20:48 +0000 (18:20 +0100)]
Merge branch 'kvm-insert-lfence' into kvm-master
Topic branch for CVE-2017-5753, avoiding conflicts in the next merge window.
Andrew Honig [Wed, 10 Jan 2018 18:12:03 +0000 (10:12 -0800)]
KVM: x86: Add memory barrier on vmcs field lookup
This adds a memory barrier when performing a lookup into
the vmcs_field_to_offset_table. This is related to
CVE-2017-5753.
Signed-off-by: Andrew Honig <ahonig@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>