Yan, Zheng [Thu, 28 Apr 2016 09:43:35 +0000 (17:43 +0800)]
ceph: search cache postion for dcache readdir
use binary search to find cache index that corresponds to readdir
postion.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Thu, 21 Apr 2016 04:11:54 +0000 (12:11 +0800)]
ceph: use CEPH_MDS_OP_RMXATTR request to remove xattr
Setxattr with NULL value and XATTR_REPLACE flag should be equivalent
to removexattr. But current MDS does not support deleting vxattrs through
MDS_OP_SETXATTR request. The workaround is sending MDS_OP_RMXATTR request
if setxattr actually removs xattr.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Thu, 21 Apr 2016 03:09:55 +0000 (11:09 +0800)]
ceph: report mount root in session metadata
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Mon, 18 Apr 2016 08:51:37 +0000 (16:51 +0800)]
ceph: don't show symlink target in debugfs/mdsc
symlink target is useless for debug and can be very long. It's annoying
to show it in debugfs/mdsc.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 15 Apr 2016 05:56:12 +0000 (13:56 +0800)]
ceph: don't call truncate_pagecache in ceph_writepages_start
truncate_pagecache() may decrease inode's reference. This can cause
deadlock if inode's last reference is dropped and iput_final() wants
to evict the inode. (evict() calls inode_wait_for_writeback(), which
waits for ceph_writepages_start() to return).
The fix is use work thead to truncate dirty pages. Also add 'forced
umount' check to ceph_update_writeable_page(), which prevents new
pages getting dirty.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 8 Apr 2016 07:27:16 +0000 (15:27 +0800)]
ceph: renew caps for read/write if mds session got killed.
When mds session gets killed, read/write operation may hang.
Client waits for Frw caps, but mds does not know what caps client
wants. To recover this, client sends an open request to mds. The
request will tell mds what caps client wants.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Thu, 31 Mar 2016 07:53:01 +0000 (15:53 +0800)]
ceph: CEPH_FEATURE_MDSENC support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Wed, 30 Mar 2016 09:18:34 +0000 (17:18 +0800)]
ceph: multiple filesystem support
To access non-default filesystem, we just need to subscribe to
mdsmap.<MDS_NAMESPACE_ID> and add a new mount option for mds
namespace id.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: switch to a new libceph API]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 25 May 2016 22:05:01 +0000 (00:05 +0200)]
libceph: support for subscribing to "mdsmap.<id>" maps
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:28 +0000 (16:07 +0200)]
libceph: replace ceph_monc_request_next_osdmap()
... with a wrapper around maybe_request_map() - no need for two
osdmap-specific functions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:27 +0000 (16:07 +0200)]
libceph: take osdc->lock in osdmap_show() and dump flags in hex
There is now about a dozen CEPH_OSDMAP_* flags. This is a debugging
interface, so just dump in hex instead of spelling each flag out.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:27 +0000 (16:07 +0200)]
libceph: pool deletion detection
This adds the "map check" infrastructure for sending osdmap version
checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
-ENOENT if the target pool doesn't exist or has just been deleted.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:27 +0000 (16:07 +0200)]
libceph: async MON client generic requests
For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
messages asynchronously and get a callback on completion. Refactor MON
client to allow firing off generic requests asynchronously and add an
async variant of ceph_monc_get_version(). ceph_monc_do_statfs() is
switched over and remains sync.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:27 +0000 (16:07 +0200)]
libceph: support for checking on status of watch
Implement ceph_osdc_watch_check() to be able to check on status of
watch. Note that the time it takes for a watch/notify event to get
delivered through the notify_wq is taken into account.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:27 +0000 (16:07 +0200)]
libceph: support for sending notifies
Implement ceph_osdc_notify() for sending notifies.
Due to the fact that the current messenger can't do read-in into
pagelists (it can only do write-out from them), I had to go with a page
vector for a NOTIFY_COMPLETE payload, for now.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 25 May 2016 23:15:02 +0000 (01:15 +0200)]
libceph, rbd: ceph_osd_linger_request, watch/notify v2
This adds support and switches rbd to a new, more reliable version of
watch/notify protocol. As with the OSD client update, this is mostly
about getting the right structures linked into the right places so that
reconnects are properly sent when needed. watch/notify v2 also
requires sending regular pings to the OSDs - send_linger_ping().
A major change from the old watch/notify implementation is the
introduction of ceph_osd_linger_request - linger requests no longer
piggy back on ceph_osd_request. ceph_osd_event has been merged into
ceph_osd_linger_request.
All the details are now hidden within libceph, the interface consists
of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
the lifetime management simple.
ceph_osdc_notify_ack() accepts an optional data payload, which is
relayed back to the notifier.
Portions of this patch are loosely based on work by Douglas Fuller
<dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]
rbd: rbd_dev_header_unwatch_sync() variant
Introduce __rbd_dev_header_unwatch_sync(), which doesn't flush notify
callbacks. This is for the new rados_watcherrcb_t, which would be
called from a notify callback.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]
libceph: wait_request_timeout()
The unwatch timeout is currently implemented in rbd. With
watch/unwatch code moving into libceph, we are going to need
a ceph_osdc_wait_request() variant with a timeout.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]
libceph: request_init() and request_release_checks()
These are going to be used by request_reinit() code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]
libceph: a major OSD client update
This is a major sync up, up to ~Jewel. The highlights are:
- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support
The switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished. This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]
libceph: protect osdc->osd_lru list with a spinlock
OSD client is getting moved from the big per-client lock to a set of
per-session locks. The big rwlock would only be held for read most of
the time, so a global osdc->osd_lru needs additional protection.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]
libceph: allocate ceph_osd with GFP_NOFAIL
create_osd() is called way too deep in the stack to be able to error
out in a sane way; a failing create_osd() just messes everything up.
The current req_notarget list solution is broken - the list is never
traversed as it's not entirely clear when to do it, I guess.
If we were to start traversing it at regular intervals and retrying
each request, we wouldn't be far off from what __GFP_NOFAIL is doing,
so allocate OSD sessions with __GFP_NOFAIL, at least until we come up
with a better fix.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]
libceph: osd_init() and osd_cleanup()
These are going to be used by homeless OSD sessions code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]
libceph: handle_one_map()
Separate osdmap handling from decoding and iterating over a bag of maps
in a fresh MOSDMap message. This sets up the scene for the updated OSD
client.
Of particular importance here is the addition of pi->was_full, which
can be used to answer "did this pool go full -> not-full in this map?".
This is the key bit for supporting pool quotas.
We won't be able to downgrade map_sem for much longer, so drop
downgrade_write().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Linus Torvalds [Wed, 25 May 2016 22:59:09 +0000 (15:59 -0700)]
Merge branch 'work.iov_iter' of git://git./linux/kernel/git/viro/vfs
Pull vfs iov_iter regression fix from Al Viro:
"Fix for braino in 'fold checks into iterate_and_advance()'"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do "fold checks into iterate_and_advance()" right
Linus Torvalds [Wed, 25 May 2016 22:54:35 +0000 (15:54 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs xattr regression fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make xattr_resolve_handlers() safe to use with NULL ->s_xattr
xattr: Fail with -EINVAL for NULL attribute names
Linus Torvalds [Wed, 25 May 2016 22:38:56 +0000 (15:38 -0700)]
Merge tag 'acpi-4.7-rc1-more' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Additional ACPI update for v4.7-rc1
Just one fix for incorrect async_synchronize_cookie() usage in the
ACPI battery driver (Chris Wilson)"
* tag 'acpi-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / battery: Correctly serialise with the pending async probe
Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]
libceph: allocate dummy osdmap in ceph_osdc_init()
This leads to a simpler osdmap handling code, particularly when dealing
with pi->was_full, which is introduced in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]
libceph: schedule tick from ceph_osdc_init()
Both homeless OSD sessions and watch/notify v2, introduced in later
commits, require periodic ticks which don't depend on ->num_requests.
Schedule the initial tick from ceph_osdc_init() and reschedule from
handle_timeout() unconditionally.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]
libceph: move schedule_delayed_work() in ceph_osdc_init()
ceph_osdc_stop() isn't called if ceph_osdc_init() fails, so we end up
with handle_osds_timeout() running on invalid memory if any one of the
allocations fails. Call schedule_delayed_work() after everything is
setup, just before returning.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]
libceph: redo callbacks and factor out MOSDOpReply decoding
If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack. This is
very confusing. Redo this so that only one of them is called:
->r_unsafe_callback(true), on ack
->r_unsafe_callback(false), on commit
or
->r_callback, on ack|commit
Decode everything in decode_MOSDOpReply() to reduce clutter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]
libceph: drop msg argument from ceph_osdc_callback_t
finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 25 May 2016 22:29:52 +0000 (00:29 +0200)]
libceph: switch to calc_target(), part 2
The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target. Encoding now happens within ceph_osdc_start_request().
Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded. If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.
Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op(). While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all call to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:23 +0000 (16:07 +0200)]
libceph: switch to calc_target(), part 1
Replace __calc_request_pg() and most of __map_request() with
calc_target() and start using req->r_t.
ceph_osdc_build_request() however still encodes base_oid, because it's
called before calc_target() is and target_oid is empty at that point in
time; a printf in osdc_show() also shows base_oid. This is fixed in
"libceph: switch to calc_target(), part 2".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:23 +0000 (16:07 +0200)]
libceph: introduce ceph_osd_request_target, calc_target()
Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request and calc_target() for calculating mappings
and populating it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:23 +0000 (16:07 +0200)]
libceph: pi->min_size, pi->last_force_request_resend
Add and decode pi->min_size and pi->last_force_request_resend. These
are going to be used by calc_target().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:23 +0000 (16:07 +0200)]
libceph: make pgid_cmp() global
calc_target() code is going to need to know how to compare PGs. Take
lhs and rhs pgid by const * while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:23 +0000 (16:07 +0200)]
libceph: rename ceph_calc_pg_primary()
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns acting primary.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:22 +0000 (16:07 +0200)]
libceph: ceph_osds, ceph_pg_to_up_acting_osds()
Knowning just acting set isn't enough, we need to be able to record up
set as well to detect interval changes. This means returning (up[],
up_len, up_primary, acting[], acting_len, acting_primary) and passing
it around. Introduce and switch to ceph_osds to help with that.
Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return
both up and acting sets from it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:22 +0000 (16:07 +0200)]
libceph: rename ceph_oloc_oid_to_pg()
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise
that returned is raw PG and return -ENOENT instead of -EIO if the pool
doesn't exist.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:22 +0000 (16:07 +0200)]
libceph: fix ceph_eversion encoding
eversion_t is version+epoch in userspace and is encoded in that order.
ceph_eversion is defined as epoch+version in rados.h, yet we memcpy it
in __send_request(). Reoder ceph_eversion fields.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:22 +0000 (16:07 +0200)]
libceph: DEFINE_RB_FUNCS macro
Given
struct foo {
u64 id;
struct rb_node bar_node;
};
generate insert_bar(), erase_bar() and lookup_bar() functions with
DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)
The key is assumed to be an integer (u64, int, etc), compared with
< and >. nodefld has to be initialized with RB_CLEAR_NODE().
Start using it for MDS, MON and OSD requests and OSD sessions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:22 +0000 (16:07 +0200)]
libceph: open-code remove_{all,old}_osds()
They are called only once, from ceph_osdc_stop() and
handle_osds_timeout() respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 28 Apr 2016 14:07:21 +0000 (16:07 +0200)]
libceph: nuke unused fields and functions
Either unused or useless:
osdmap->mkfs_epoch
osd->o_marked_for_keepalive
monc->num_generic_requests
osdc->map_waiters
osdc->last_requested_map
osdc->timeout_tid
osd_req_op_cls_response_data()
osdmap_apply_incremental() @msgr arg
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 29 Apr 2016 18:01:25 +0000 (20:01 +0200)]
rbd: use header_oid instead of header_name
Switch to ceph_object_id and use ceph_oid_aprintf() instead of a bare
const char *. This reduces noise in rbd_dev_header_name().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 29 Apr 2016 17:54:20 +0000 (19:54 +0200)]
libceph: variable-sized ceph_object_id
Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases,
expect one - long rbd image names:
- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"
We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh. (A format 2 header name is
a small system-generated string.)
Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string. Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 27 Apr 2016 16:32:56 +0000 (18:32 +0200)]
libceph: change how osd_op_reply message size is calculated
For a message pool message, preallocate a page, just like we do for
osd_op. For a normal message, take ceph_object_id into account and
don't bother subtracting CEPH_OSD_SLAB_OPS ceph_osd_ops.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 27 Apr 2016 12:15:51 +0000 (14:15 +0200)]
libceph: move message allocation out of ceph_osdc_alloc_request()
The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed. Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:
req = ceph_osdc_alloc_request(...);
<fill in req->r_base_oid>
<fill in req->r_base_oloc>
ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Tue, 26 Apr 2016 13:39:47 +0000 (15:39 +0200)]
libceph: grab snapc in ceph_osdc_alloc_request()
ceph_osdc_build_request() is going away. Grab snapc and initialize
->r_snapid in ceph_osdc_alloc_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Tue, 26 Apr 2016 13:05:29 +0000 (15:05 +0200)]
libceph: make ceph_osdc_put_request() accept NULL
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Mon, 16 May 2016 11:18:57 +0000 (13:18 +0200)]
rbd: get/put img_request in rbd_img_request_submit()
By the time we get to checking for_each_obj_request_safe(img_request)
terminating condition, all obj_requests may be complete and img_request
ref, that rbd_img_request_submit() takes away from its caller, may be
put. Moving the next_obj_request cursor is then a use-after-free on
img_request.
It's totally benign, as the value that's read is never used, but
I think it's still worth fixing.
Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Linus Torvalds [Wed, 25 May 2016 22:29:21 +0000 (15:29 -0700)]
Merge tag 'pm-4.7-rc1-more' of git://git./linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are two stable-candidate fixes (PM core, cpuidle) and a bunch of
cpufreq cleanups.
Specifics:
- Stable-candidate cpuidle fix to make it check the right variable
when deciding whether or not to enable interrupts on the local CPU
so as to avoid enabling iterrupts too early in some cases if the
system has both coupled and per-core idle states (Daniel Lezcano).
- Stable-candidate PM core fix to make it handle failures at the
"late suspend" stage of device suspend consistently for all devices
regardless of whether or not async suspend/resume is enabled for
them (Rafael Wysocki).
- Cleanups in the cpufreq core, the schedutil governor and the
intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar)"
* tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / sleep: Handle failures in device_suspend_late() consistently
cpufreq: schedutil: Improve prints messages with pr_fmt
cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
cpufreq: simplified goto out in cpufreq_register_driver()
cpufreq: governor: CPUFREQ_GOV_STOP never fails
cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
intel_pstate: Simplify conditional in intel_pstate_set_policy()
Al Viro [Wed, 25 May 2016 21:36:19 +0000 (17:36 -0400)]
do "fold checks into iterate_and_advance()" right
the only case when we should skip the iterate_and_advance() guts
is when nothing's left in the iterator, _not_ just when requested
amount is 0. Said guts will do nothing in the latter case anyway;
the problem we tried to deal with in the aforementioned commit is
that when there's nothing left *and* the amount requested is 0,
we might end up deferencing one iovec too many; the value we fetch
from there is discarded in that case, but theoretically it might
oops if the iovec array ends exactly at the end of page with the
next page not mapped.
Bailing out on zero size requested had an unexpected side effect -
zero-length segment in the beginning of iovec array ended up
throwing do_loop_readv_writev() into infinite spin; we do not
advance past the empty segment at all. Reproducer is trivial:
echo '#include <sys/uio.h>' >a.c
echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c
cc a.c && ./a.out </proc/uptime
which should end up with the process not hanging. Probably ought to
go into LTP or xfstests...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 25 May 2016 21:34:41 +0000 (17:34 -0400)]
make xattr_resolve_handlers() safe to use with NULL ->s_xattr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Andreas Gruenbacher [Mon, 9 May 2016 11:28:49 +0000 (13:28 +0200)]
xattr: Fail with -EINVAL for NULL attribute names
Commit
98e9cb57 improved the xattr name checks in xattr_resolve_name but
didn't update the NULL attribute name check appropriately, so NULL
attribute names lead to NULL pointer dereferences. Turn that into
-EINVAL results instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
fs/xattr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Sterba [Wed, 25 May 2016 20:51:04 +0000 (22:51 +0200)]
Merge branch 'dev/comp-workspaces' into for-chris-4.7-
20160525
David Sterba [Wed, 25 May 2016 20:51:03 +0000 (22:51 +0200)]
Merge branch 'cleanups-4.7' into for-chris-4.7-
20160525
David Sterba [Wed, 25 May 2016 20:51:02 +0000 (22:51 +0200)]
Merge branch 'misc-4.7' into for-chris-4.7-
20160525
Nicholas D Steeves [Fri, 20 May 2016 01:18:45 +0000 (21:18 -0400)]
btrfs: fix string and comment grammatical issues and typos
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Tue, 17 May 2016 09:37:38 +0000 (17:37 +0800)]
btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
We usually call btrfs_put_bbio() when btrfs_map_block() failed,
btrfs_put_bbio() works right whether bbio is a valid value, or NULL.
But there is a exception, in some case, btrfs_map_block() will return
fail without touching *bbio(keeping its original value), and if bbio
was not initialized yet, invalid memory accessing will happened.
Above case is in scrub_missing_raid56_pages(), and similar case in
scrub_raid56_parity().
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rafael J. Wysocki [Wed, 25 May 2016 20:11:28 +0000 (22:11 +0200)]
Merge branch 'acpi-battery'
* acpi-battery:
ACPI / battery: Correctly serialise with the pending async probe
Rafael J. Wysocki [Wed, 25 May 2016 19:54:45 +0000 (21:54 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-core'
* pm-cpufreq:
cpufreq: schedutil: Improve prints messages with pr_fmt
cpufreq: simplified goto out in cpufreq_register_driver()
cpufreq: governor: CPUFREQ_GOV_STOP never fails
cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
intel_pstate: Simplify conditional in intel_pstate_set_policy()
* pm-cpuidle:
cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
* pm-core:
PM / sleep: Handle failures in device_suspend_late() consistently
Liu Bo [Wed, 18 May 2016 00:21:48 +0000 (17:21 -0700)]
Btrfs: fix unexpected return value of fiemap
btrfs's fiemap is supposed to return 0 on success and return < 0 on
error. however, ret becomes 1 after looking up the last file extent:
btrfs_lookup_file_extent ->
btrfs_search_slot(..., ins_len=0, cow=0)
and if the offset is beyond EOF, we'll get 'path' pointed to the place
of potentail insertion, and ret == 1.
This may confuse applications using ioctl(FIEL_IOC_FIEMAP).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Sat, 14 May 2016 00:06:59 +0000 (17:06 -0700)]
Btrfs: free sys_array eb as soon as possible
While reading sys_chunk_array in superblock, btrfs creates a temporary
extent buffer. Since we don't use it after finishing reading
sys_chunk_array, we don't need to keep it in memory.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Wed, 25 May 2016 17:40:15 +0000 (10:40 -0700)]
Merge tag 'pwm/for-4.7-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This set of changes introduces an atomic API to the PWM subsystem.
This is influenced by the DRM atomic API that was introduced a while
back, though it is obviously a lot simpler. The fundamental idea
remains the same, though: drivers provide a single callback to
implement the atomic configuration of a PWM channel.
As a side-effect the PWM subsystem gains the ability for initial state
retrieval, so that the logical state mirrors that of the hardware.
Many use-cases don't care about this, but for others it is essential.
These new features require changes in all users, which these patches
take care of. The core is transitioned to use the atomic callback if
available and provides a fallback mechanism for other drivers.
Changes to transition users and drivers to the atomic API are
postponed to v4.8"
* tag 'pwm/for-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (30 commits)
pwm: Add information about polarity, duty cycle and period to debugfs
pwm: Switch to the atomic API
pwm: Update documentation
pwm: Add core infrastructure to allow atomic updates
pwm: Add hardware readout infrastructure
pwm: Move the enabled/disabled info into pwm_state
pwm: Introduce the pwm_state concept
pwm: Keep PWM state in sync with hardware state
ARM: Explicitly apply PWM config extracted from pwm_args
drm: i915: Explicitly apply PWM config extracted from pwm_args
input: misc: pwm-beeper: Explicitly apply PWM config extracted from pwm_args
input: misc: max8997: Explicitly apply PWM config extracted from pwm_args
backlight: lm3630a: explicitly apply PWM config extracted from pwm_args
backlight: lp855x: Explicitly apply PWM config extracted from pwm_args
backlight: lp8788: Explicitly apply PWM config extracted from pwm_args
backlight: pwm_bl: Use pwm_get_args() where appropriate
fbdev: ssd1307fb: Use pwm_get_args() where appropriate
regulator: pwm: Use pwm_get_args() where appropriate
leds: pwm: Use pwm_get_args() where appropriate
input: misc: max77693: Use pwm_get_args() where appropriate
...
Tom Haynes [Wed, 25 May 2016 14:31:12 +0000 (07:31 -0700)]
nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
The mds can inform the client not to use the IOMODE_RW layout
segment for doing READs. I.e., it is basically a
IOMODE_WRITE layout segment.
It would do this to not interfere with the WRITEs.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Wed, 25 May 2016 17:19:17 +0000 (10:19 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add support for Fintek F81865 Super-IO chip
- add support for watchdogs (RWDT and SWDT) found on RCar Gen3 based
SoCs from Renesas
- octeon: Handle the FROZEN hot plug notifier actions
- f71808e_wdt fixes and cleanups
- some small improvements in code and documentation
* git://www.linux-watchdog.org/linux-watchdog:
MAINTAINERS: Add file patterns for watchdog device tree bindings
Documentation: Add ebc-c384_wdt watchdog-parameters.txt entry
watchdog: shwdt: Use setup_timer()
watchdog: cpwd: Use setup_timer()
arm64: defconfig: enable Renesas Watchdog Timer
watchdog: renesas-wdt: add driver
watchdog: remove error message when unable to allocate watchdog device
watchdog: f71808e_wdt: Fix WDTMOUT_STS register read
watchdog: f71808e_wdt: Fix typo
watchdog: f71808e_wdt: Add F81865 support
watchdog: sp5100_tco: properly check for new register layouts
watchdog: core: Fix circular locking dependency
watchdog: core: fix trivial typo in a comment
watchdog: hpwdt: Adjust documentation to match latest kernel module parameters.
watchdog: imx2_wdt: add external reset support via dt prop
watchdog: octeon: Handle the FROZEN hot plug notifier actions.
watchdog: qcom: Report reboot reason
Weston Andros Adamson [Wed, 25 May 2016 14:07:23 +0000 (10:07 -0400)]
nfs: avoid race that crashes nfs_init_commit
Since the patch "NFS: Allow multiple commit requests in flight per file"
we can run multiple simultaneous commits on the same inode. This
introduced a race over collecting pages to commit that made it possible
to call nfs_init_commit() with an empty list - which causes crashes like
the one below.
The fix is to catch this race and avoid calling nfs_init_commit and
initiate_commit when there is no work to do.
Here is the crash:
[600522.076832] BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
[600522.078475] IP: [<
ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.078745] PGD
4272b1067 PUD
4272cb067 PMD 0
[600522.078972] Oops: 0000 [#1] SMP
[600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
[600522.081380] vmw_pvscsi ata_generic pata_acpi
[600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
[600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[600522.082814] task:
ffff8800bbbfa780 ti:
ffff88042ae84000 task.ti:
ffff88042ae84000
[600522.083378] RIP: 0010:[<
ffffffffa0479e72>] [<
ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.083973] RSP: 0018:
ffff88042ae87438 EFLAGS:
00010246
[600522.084571] RAX:
0000000000000000 RBX:
ffff880003485e40 RCX:
ffff88042ae87588
[600522.085188] RDX:
0000000000000000 RSI:
ffff88042ae874b0 RDI:
ffff880003485e40
[600522.085756] RBP:
ffff88042ae87448 R08:
ffff880003486010 R09:
ffff88042ae874b0
[600522.086332] R10:
0000000000000000 R11:
0000000000000005 R12:
ffff88042ae872d0
[600522.086905] R13:
ffff88042ae874b0 R14:
ffff880003485e40 R15:
ffff88042704c840
[600522.087484] FS:
00007f4728ff2740(0000) GS:
ffff88043fd80000(0000) knlGS:
0000000000000000
[600522.088070] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[600522.088663] CR2:
0000000000000040 CR3:
000000042b6aa000 CR4:
00000000001406e0
[600522.089327] Stack:
[600522.089926]
0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
[600522.090549]
0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
[600522.091169]
ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
[600522.091789] Call Trace:
[600522.092420] [<
ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
[600522.093052] [<
ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
[600522.093696] [<
ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[600522.094359] [<
ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs]
[600522.095032] [<
ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs]
[600522.095719] [<
ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs]
[600522.096410] [<
ffffffff811a8122>] try_to_release_page+0x32/0x50
[600522.097109] [<
ffffffff811bd4f1>] shrink_page_list+0x961/0xb30
[600522.097812] [<
ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550
[600522.098530] [<
ffffffff811bea65>] shrink_lruvec+0x635/0x840
[600522.099250] [<
ffffffff811bed60>] shrink_zone+0xf0/0x2f0
[600522.099974] [<
ffffffff811bf312>] do_try_to_free_pages+0x192/0x470
[600522.100709] [<
ffffffff811bf6ca>] try_to_free_pages+0xda/0x170
[600522.101464] [<
ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970
[600522.102235] [<
ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230
[600522.103000] [<
ffffffff813a1589>] ? cpumask_any_but+0x39/0x50
[600522.103774] [<
ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490
[600522.104558] [<
ffffffff810e3438>] ? __wake_up+0x48/0x60
[600522.105357] [<
ffffffff811d7d3b>] do_wp_page+0xab/0x4f0
[600522.106137] [<
ffffffff810a1bbb>] ? release_task+0x36b/0x470
[600522.106902] [<
ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0
[600522.107659] [<
ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900
[600522.108431] [<
ffffffff81067ef1>] __do_page_fault+0x181/0x420
[600522.109173] [<
ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
[600522.109893] [<
ffffffff810681c0>] do_page_fault+0x30/0x80
[600522.110594] [<
ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120
[600522.111288] [<
ffffffff81790a58>] page_fault+0x28/0x30
[600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
[600522.113343] RIP [<
ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.114003] RSP <
ffff88042ae87438>
[600522.114636] CR2:
0000000000000040
Fixes:
af7cf057 (NFS: Allow multiple commit requests in flight per file)
CC: stable@vger.kernel.org
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Wed, 25 May 2016 16:47:26 +0000 (09:47 -0700)]
Merge tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Hide INTx on certain known broken devices (Alex Williamson)
- Additional backdoor reset detection (Alex Williamson)
- Remove unused iommudata reference (Alexey Kardashevskiy)
- Use cfg_size to avoid probing extended config space (Alexey
Kardashevskiy)
* tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio:
vfio_pci: Test for extended capabilities if config space > 256 bytes
vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata
vfio/pci: Add test for BAR restore
vfio/pci: Hide broken INTx support from user
Linus Torvalds [Wed, 25 May 2016 16:37:50 +0000 (09:37 -0700)]
Merge tag 'drm-4.7-rc1-headers-fix' of git://people.freedesktop.org/~airlied/linux
Pull header warning fix from Dave Airlie:
"Here is the C++ guards warning fix from Arnd"
[ Background: there are 'extern "C" { }' guards in include/uapi for the
GPU headers.
They should arguably be wrapped somehow, but as it is they caused
checkpatch to warn because it would trigger on the 'extern' and think
it's exporting a function or variable from the kernel to user space.
This just fixes checkpatch. Whether we wrap the C++ guards some way
in the future will be an independent issue. ]
* tag 'drm-4.7-rc1-headers-fix' of git://people.freedesktop.org/~airlied/linux:
headers_check: don't warn about c++ guards
Linus Torvalds [Wed, 25 May 2016 16:27:52 +0000 (09:27 -0700)]
Merge branch 'parisc-4.7-1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- Add native high-resolution timing code for sched_clock() and other
timing functions based on the processor internal cr16 cycle counters
- Add syscall tracepoint support
- Add regset support
- Speed up get_user() and put_user() functions
- Updated futex.h to match generic implementation (John David Anglin)
- A few smaller ftrace build fixes
- Fixed thuge-gen kernel self test to utilize architectured MAP_HUGETLB
value
- Added parisc architecture to seccomp_bpf kernel self test
- Various typo fixes (Andrea Gelmini)
* 'parisc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Whitespace cleanups in unistd.h
parisc: Use long jump to reach ftrace_return_to_handler()
parisc: Fix typo in fpudispatch.c
parisc: Fix typos in eisa_eeprom.h
parisc: Fix typo in ldcw.h
parisc: Fix typo in pdc.h
parisc: Update futex.h to match generic implementation
parisc: Merge ftrace C-helper and assembler functions into .text.hot section
selftests/thuge-gen: Use platform specific MAP_HUGETLB value
parisc: Add native high-resolution sched_clock() implementation
parisc: Add ARCH_TRACEHOOK and regset support
parisc: Add 64bit get_user() and put_user() for 32bit kernel
parisc: Simplify and speed up get_user() and put_user()
parisc: Add syscall tracepoint support
Dan Carpenter [Mon, 23 May 2016 10:21:01 +0000 (13:21 +0300)]
NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
nfs_create_request() doesn't return NULL, it returns error pointers.
Fixes:
67911c8f18b5 ('NFS: Add nfs_commit_file()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Janosch Frank [Wed, 18 May 2016 11:26:25 +0000 (13:26 +0200)]
tools: kvm_stat: Add comments
A lot of the code works with the perf events about which only sparse
documentation was available until 2012. Having that information now,
we can clarify what is done in the code.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janosch Frank [Wed, 18 May 2016 11:26:24 +0000 (13:26 +0200)]
tools: kvm_stat: Introduce pid monitoring
Having stats for single VMs can help to determine the problem of a VM
without the need of running other tools like perf.
The tracepoints already allowed pid level monitoring, but kvm_stat
didn't have support for it till now. Support for the newly implemented
debugfs vm monitoring was also implemented.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janosch Frank [Wed, 18 May 2016 11:26:23 +0000 (13:26 +0200)]
KVM: Create debugfs dir and stat files for each VM
This patch adds a kvm debugfs subdirectory for each VM, which is named
after its pid and file descriptor. The directories contain the same
kind of files that are already in the kvm debugfs directory, but the
data exported through them is now VM specific.
This makes the debugfs kvm data a convenient alternative to the
tracepoints which already have per VM data. The debugfs data is easy
to read and low overhead.
CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter]
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janosch Frank [Wed, 18 May 2016 11:26:22 +0000 (13:26 +0200)]
MAINTAINERS: Add kvm tools
The new kvm subdirectory in tools contains kvm related scripts.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hemant Kumar [Tue, 19 Apr 2016 03:24:54 +0000 (08:54 +0530)]
tools: kvm_stat: Powerpc related fixes
kvm_stat script is failing to execute on powerpc :
# ./kvm_stat
Traceback (most recent call last):
File "./kvm_stat", line 825, in <module>
main()
File "./kvm_stat", line 813, in main
providers = get_providers(options)
File "./kvm_stat", line 778, in get_providers
providers.append(TracepointProvider())
File "./kvm_stat", line 416, in __init__
self.filters = get_filters()
File "./kvm_stat", line 315, in get_filters
if ARCH.exit_reasons:
AttributeError: 'ArchPPC' object has no attribute 'exit_reasons'
This is because, its trying to access a non-defined attribute.
Also, the IOCTL number of RESET is incorrect for powerpc. The correct
number has been added.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 24 May 2016 08:41:15 +0000 (10:41 +0200)]
tools: Add kvm_stat man page
Converted from the Texinfo source in QEMU to asciidoc. The a2x
incantation was provided by Janosch Frank.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janosch Frank [Wed, 18 May 2016 11:26:21 +0000 (13:26 +0200)]
tools: Add kvm_stat vm monitor script
This tool displays kvm vm exit statistics to ease vm monitoring. It
takes its data from the kvm debugfs files or the vm tracepoints and
outputs them as a curses ui or simple text.
It was moved from qemu, as it is dependent on the kernel whereas qemu
works with a large number of kernel versions, some of which may break
the script.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Roman Kagan [Wed, 18 May 2016 14:48:20 +0000 (17:48 +0300)]
kvm:vmx: more complete state update on APICv on/off
The function to update APICv on/off state (in particular, to deactivate
it when enabling Hyper-V SynIC) is incomplete: it doesn't adjust
APICv-related fields among secondary processor-based VM-execution
controls. As a result, Windows 2012 guests get stuck when SynIC-based
auto-EOI interrupt intersected with e.g. an IPI in the guest.
In addition, the MSR intercept bitmap isn't updated every time "virtualize
x2APIC mode" is toggled. This path can only be triggered by a malicious
guest, because Windows didn't use x2APIC but rather their own synthetic
APIC access MSRs; however a guest running in a SynIC-enabled VM could
switch to x2APIC and thus obtain direct access to host APIC MSRs
(CVE-2016-4440).
The patch fixes those omissions.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Reported-by: Steve Rutherford <srutherford@google.com>
Reported-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jeff Mahoney [Tue, 24 May 2016 17:47:44 +0000 (13:47 -0400)]
drm/amd: add Kconfig dependency for ACP on DRM_AMDGPU
The DRM_AMD_ACP option doesn't have any dependencies and selects
MFD_CORE, which results in MFD_CORE=y. Since the code is only called
from DRM_AMDGPU, it should depend on it. Adding the dependency results
in MFD_CORE being selected as a module again if amdgpu is also a module.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Kleiner [Tue, 24 May 2016 16:12:43 +0000 (18:12 +0200)]
drm/amdgpu: Fix hdmi deep color support.
When porting the hdmi deep color detection code from
radeon-kms to amdgpu-kms apparently some kind of
copy and paste error happened, attaching an else
branch to the wrong if statement.
The result is that hdmi deep color mode is always
disabled, regardless of gpu and display capabilities and
user wishes, as the code mistakenly thinks that the display
doesn't provide the required max_tmds_clock limit and falls
back to 8 bpc.
This patch fixes deep color support, as tested on a
R9 380 Tonga Pro + suitable display, and should be
backported to all kernels with amdgpu-kms support.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Wed, 18 May 2016 08:15:47 +0000 (16:15 +0800)]
drm/amdgpu: fix bug in fence driver fini
Using wrong counter for walking fences. Fixes
a crash when unloading the driver.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Helge Deller [Wed, 25 May 2016 13:40:49 +0000 (15:40 +0200)]
parisc: Whitespace cleanups in unistd.h
Clean up whitespaces and mark unused syscalls as such.
Signed-off-by: Helge Deller <deller@gmx.de>
Peter Zijlstra [Mon, 23 May 2016 09:19:07 +0000 (11:19 +0200)]
sched/core: Fix remote wakeups
Commit:
b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
... introduced a bug: Mike Galbraith found that it introduced a
performance regression, while Paul E. McKenney reported lost
wakeups and bisected it to this commit.
The reason is that I mis-read ttwu_queue() such that I assumed any
wakeup that got a remote queue must have had the task migrated.
Since this is not so; we need to transfer this information between
queueing the wakeup and actually doing the wakeup. Use a new
task_struct::sched_flag for this, we already write to
sched_contributes_to_load in the wakeup path so this is a hot and
modified cacheline.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Fixes:
b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Wed, 25 May 2016 02:36:20 +0000 (12:36 +1000)]
Merge tag 'imx-drm-fixes-2016-05-24' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm probing fix
Commit
950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
broke probing of the imx-drm driver in the non-modular case because the
unset dev->of_node during probing of imx-ipuv3-crtc would cause the
component matching to fail. This patch patch instead matches against
an of_node pointer stored in platform data, allowing dev->of_node to
be left unset for the platform probed imx-ipuv3-crtc devices.
* tag 'imx-drm-fixes-2016-05-24' of git://git.pengutronix.de/git/pza/linux:
drm/imx: Match imx-ipuv3-crtc components using device node in platform data
Linus Torvalds [Tue, 24 May 2016 22:50:58 +0000 (15:50 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is a first set of bug fixes on top of what was merged for 4.7.
Two patches for lpc32xx address a harmless build warning that was just
introduced, one patch for the mediatek soc driver fixes a warning for
arm64, and the pxa changes are minor cleanups that should have been
part of the original pull requests but that I forgot to apply to the
cleanup-fixes branch earlier"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: lpc32xx: fix NR_IRQS confict
ARM: lpc32xx: remove legacy irq controller driver
soc: mtk-pmic-wrap: avoid integer overflow warning
ARM: pxa: Remove CLK_IS_ROOT
ARM: pxa: activate pinctrl for device-tree machines
Linus Torvalds [Tue, 24 May 2016 22:46:06 +0000 (15:46 -0700)]
Merge tag 'armsoc-late' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC late DT updates from Arnd Bergmann:
"This is a collection of a few late fixes and other misc stuff that had
dependencies on things being merged from other trees.
The Renesas R-Car power domain handling, and the Nvidia Tegra USB
support both hand notable changes that required changing the DT
binding in a way that only provides compatibility with old DT blobs on
new kernels but not vice versa. As a consequence, the DT changes are
based on top of the driver changes and are now in this branch.
For NXP i.MX and Samsung Exynos, the changes in here depend on other
changes that got merged through the clk maintainer tree"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (35 commits)
ARM: dts: exynos: Add support of Bus frequency using VDD_INT for exynos5422-odroidxu3
ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos542x SoC
ARM: dts: exynos: Add NoC Probe dt node for Exynos542x SoC
ARM: dts: exynos: Add support of bus frequency for exynos4412-trats/odroidu3
ARM: dts: exynos: Expand the voltage range of buck1/3 regulator for exynos4412-odroidu3
ARM: dts: exynos: Add support of bus frequency using VDD_INT for exynos3250-rinato
ARM: dts: exynos: Add exynos4412-ppmu-common dtsi to delete duplicate PPMU nodes
ARM: dts: exynos: Add bus nodes using VDD_MIF for Exynos4210
ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos4x12
ARM: dts: exynos: Add bus nodes using VDD_MIF for Exynos4x12
ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos3250
ARM: dts: exynos: Add DMC bus frequency for exynos3250-rinato/monk
ARM: dts: exynos: Add DMC bus node for Exynos3250
ARM: tegra: Enable XUSB on Nyan
ARM: tegra: Enable XUSB on Jetson TK1
ARM: tegra: Enable XUSB on Venice2
ARM: tegra: Add Tegra124 XUSB controller
ARM: tegra: Move Tegra124 to the new XUSB pad controller binding
ARM: dts: r8a7794: Use SYSC "always-on" PM Domain
ARM: dts: r8a7793: Use SYSC "always-on" PM Domain
...
Linus Torvalds [Tue, 24 May 2016 22:24:37 +0000 (15:24 -0700)]
Merge tag 'asm-generic-4.7' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanup from Arnd Bergmann:
"I have only one patch for asm-generic in this release, this one is
from James Hogan and updates the generic system call table for
renameat2 so we don't need to provide both renameat and renameat2 in
newly added architectures"
* tag 'asm-generic-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: Drop renameat syscall from default list
Linus Torvalds [Tue, 24 May 2016 21:39:20 +0000 (14:39 -0700)]
Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A very quiet cycle for nfsd, mainly just an RDMA update from Chuck
Lever"
* tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix stripping of padded MIC tokens
svcrpc: autoload rdma module
svcrdma: Generalize svc_rdma_xdr_decode_req()
svcrdma: Eliminate code duplication in svc_rdma_recvfrom()
svcrdma: Drain QP before freeing svcrdma_xprt
svcrdma: Post Receives only for forward channel requests
svcrdma: Remove superfluous line from rdma_read_chunks()
svcrdma: svc_rdma_put_context() is invoked twice in Send error path
svcrdma: Do not add XDR padding to xdr_buf page vector
svcrdma: Support IPv6 with NFS/RDMA
nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
Remove unnecessary allocation
Linus Torvalds [Tue, 24 May 2016 19:55:26 +0000 (12:55 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Fix a number of bugs, most notably a potential stale data exposure
after a crash and a potential BUG_ON crash if a file has the data
journalling flag enabled while it has dirty delayed allocation blocks
that haven't been written yet. Also fix a potential crash in the new
project quota code and a maliciously corrupted file system.
In addition, fix some DAX-specific bugs, including when there is a
transient ENOSPC situation and races between writes via direct I/O and
an mmap'ed segment that could lead to lost I/O.
Finally the usual set of miscellaneous cleanups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: pre-zero allocated blocks for DAX IO
ext4: refactor direct IO code
ext4: fix race in transient ENOSPC detection
ext4: handle transient ENOSPC properly for DAX
dax: call get_blocks() with create == 1 for write faults to unwritten extents
ext4: remove unmeetable inconsisteny check from ext4_find_extent()
jbd2: remove excess descriptions for handle_s
ext4: remove unnecessary bio get/put
ext4: silence UBSAN in ext4_mb_init()
ext4: address UBSAN warning in mb_find_order_for_block()
ext4: fix oops on corrupted filesystem
ext4: fix check of dqget() return value in ext4_ioctl_setproject()
ext4: clean up error handling when orphan list is corrupted
ext4: fix hang when processing corrupted orphaned inode list
ext4: remove trailing \n from ext4_warning/ext4_error calls
ext4: fix races between changing inode journal mode and ext4_writepages
ext4: handle unwritten or delalloc buffers before enabling data journaling
ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
ext4: do not ask jbd2 to write data for delalloc buffers
jbd2: add support for avoiding data writes during transaction commits
...
Arnd Bergmann [Thu, 19 May 2016 08:31:31 +0000 (10:31 +0200)]
ARM: lpc32xx: fix NR_IRQS confict
With the change to sparse IRQs, the lpc32xx platform gets a warning about
conflicting macros:
In file included from arch/arm/mach-lpc32xx/irq.c:31:0:
arch/arm/mach-lpc32xx/include/mach/irqs.h:115:0: warning: "NR_IRQS" redefined
#define NR_IRQS 96
arch/arm/include/asm/irq.h:9:0: note: this is the location of the previous definition
#define NR_IRQS NR_IRQS_LEGACY
One such instance was in the old irq driver that is now removed by
the previous patch, but any other file including mach/irqs.h still
has the issue. Since none of them use this constant, we can just
remove the old definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
8cb17b5ed017 ("irqchip: Add LPC32xx interrupt controller driver")
Vladimir Zapolskiy [Mon, 25 Apr 2016 01:00:44 +0000 (04:00 +0300)]
ARM: lpc32xx: remove legacy irq controller driver
New NXP LPC32xx irq chip driver is used instead of a legacy one.
[this also fixes a harmless build warning about the NR_IRQS redefinition]
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Tue, 24 May 2016 18:12:32 +0000 (11:12 -0700)]
Merge tag 'spi-v4.7' of git://git./linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Another quiet release for SPI, almost entirely driver specific changes
with the diffstat dominated by two new drivers which are about two
thirds of it in terms of lines of code:
- new drivers for PIC32 standard and SQI controllers
- the Cadence driver has had runtime PM support added and quite a few
fixes and cleanups
- flash-specific accelerated path support now has a feature query
interface
- the pxa2xx driver has been moved to use the core DMA mapping support"
* tag 'spi-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (48 commits)
spi: pic32-sqi: Fix linker error, undefined reference to `bad_dma_ops'
spi: dw-pci: Spelling s/paltforms/platforms/g
spi: pic32-sqi: Remove pic32_sqi_setup and pic32_sqi_cleanup
spi: Fix simple typo s/impelment/implement
spi: rockchip: potential NULL dereference on error
spi: zynqmp: disable clocks in error paths
spi: Drop unnecessary dependencies on relaxed I/O accessors
spi: qup: Add spi_master_put in remove function
spi: qup: Handle clocks in pm_runtime suspend and resume
spi: st-ssc4: Fix missing spi_master_put in spi_st_probe error paths
spi: st-ssc4: Allow compile test build
spi: omap2-mcspi: Use dma_request_chan() for requesting DMA channel
spi: davinci: Use dma_request_chan() for requesting DMA channel
spi: pic32: Fix checking return value of devm_ioremap_resource
spi: spi-fsl-dspi: Update DT binding documentation
spi: Drop duplicate code to set master->dev.parent
spi: pic32: Set proper bits_per_word_mask
spi: return error if kmap'd buffers passed to spi_map_buf()
spi: core: add hook flash_read_supported to spi_master
spi: pic32-sqi: silence array overflow warning
...
Linus Torvalds [Tue, 24 May 2016 18:00:20 +0000 (11:00 -0700)]
Merge tag 'for-linus-
20160523' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"First cycle with Boris as NAND maintainer! Many (most) bullets stolen
from him.
Generic:
- Migrated NAND LED trigger to be a generic MTD trigger
NAND:
- Introduction of the "ECC algorithm" concept, to avoid overloading
the ECC mode field too much more
- Replaced the nand_ecclayout infrastructure with something a little
more flexible (finally!) and future proof
- Rework of the OMAP GPMC and NAND drivers; the TI folks pulled some
of this into their own tree as well
- Prepare the sunxi NAND driver to receive DMA support
- Handle bitflips in erased pages on GPMI revisions that do not
support this in hardware.
SPI NOR:
- Start using the spi_flash_read() API for SPI drivers that support
it (i.e., SPI drivers with special memory-mapped flash modes)
And other small scattered improvments"
* tag 'for-linus-
20160523' of git://git.infradead.org/linux-mtd: (155 commits)
mtd: spi-nor: support GigaDevice gd25lq64c
mtd: nand_bch: fix spelling of "probably"
mtd: brcmnand: respect ECC algorithm set by NAND subsystem
gpmi-nand: Handle ECC Errors in erased pages
Documentation: devicetree: deprecate "soft_bch" nand-ecc-mode value
mtd: nand: add support for "nand-ecc-algo" DT property
mtd: mtd: drop NAND_ECC_SOFT_BCH enum value
mtd: drop support for NAND_ECC_SOFT_BCH as "soft_bch" mapping
mtd: nand: read ECC algorithm from the new field
mtd: nand: fsmc: validate ECC setup by checking algorithm directly
mtd: nand: set ECC algorithm to Hamming on fallback
staging: mt29f_spinand: set ECC algorithm explicitly
CRIS v32: nand: set ECC algorithm explicitly
mtd: nand: atmel: set ECC algorithm explicitly
mtd: nand: davinci: set ECC algorithm explicitly
mtd: nand: bf5xx: set ECC algorithm explicitly
mtd: nand: omap2: Fix high memory dma prefetch transfer
mtd: nand: omap2: Start dma request before enabling prefetch
mtd: nandsim: add __init attribute
mtd: nand: move of_get_nand_xxx() helpers into nand_base.c
...
Linus Torvalds [Tue, 24 May 2016 17:22:34 +0000 (10:22 -0700)]
Merge tag 'for-linus-4.7-rc0-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel.
* tag 'for-linus-4.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: use same main loop for counting and remapping pages
xen/events: Don't move disabled irqs
xen/x86: actually allocate legacy interrupts on PV guests
Xen: don't warn about 2-byte wchar_t in efi
xen/gntdev: reduce copy batch size to 16
xen/x86: don't lose event interrupts
Linus Torvalds [Tue, 24 May 2016 16:46:45 +0000 (09:46 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Looks like a quiet cycle for virtio. There's a new inorder option for
the ringtest tool, and a bugfix for balloon for ppc platforms when
using virtio 1 mode"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
ringtest: pass buf != NULL
virtio_balloon: fix PFN format for virtio-1
virtio: add inorder option
Linus Torvalds [Tue, 24 May 2016 16:41:46 +0000 (09:41 -0700)]
Merge tag 'nios2-v4.7' of git://git./linux/kernel/git/lftan/nios2
Pull nios2 update from Ley Foon Tan:
- add order-only DTC dependency to %.dtb target
- fix libgcc location detection
* tag 'nios2-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: Add order-only DTC dependency to %.dtb target
nios2: Fix libgcc location detection
Linus Torvalds [Tue, 24 May 2016 16:19:38 +0000 (09:19 -0700)]
Merge tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze updates from Michal Simek:
- Wire-up new syscalls
- Fix link error
* tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: pci: export isa_io_base to fix link errors
microblaze: Wire up userfaultfd, membarrier, mlock2 syscalls
Jiri Kosina [Tue, 24 May 2016 14:38:50 +0000 (16:38 +0200)]
MAINTAINERS: mark bcache as orphan
The submitted patches are not being reacted upon, and Jens is only picking
up stable fixes on an rather ad-hoc basis.
Link: lkml.kernel.org/r/
574462C5.40307@kernel.dk
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>