James Smart [Tue, 19 Sep 2017 23:33:56 +0000 (16:33 -0700)]
nvmet-fc: ensure target queue id within range.
When searching for queue id's ensure they are within the expected range.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Tue, 19 Sep 2017 22:13:11 +0000 (15:13 -0700)]
nvmet-fc: on port remove call put outside lock
Avoid calling the put routine, as it may traverse to free routines while
holding the target lock.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:38 +0000 (17:01 +0300)]
nvme-rdma: don't fully stop the controller in error recovery
By calling nvme_stop_ctrl on a already failed controller will wait for the
scan work to complete (only by identify timeout expiration which is 60
seconds). This is unnecessary when we already know that the controller has
failed.
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:37 +0000 (17:01 +0300)]
nvme-rdma: give up reconnect if state change fails
If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:36 +0000 (17:01 +0300)]
nvme-core: Use nvme_wq to queue async events and fw activation
async_event_work might race as it is executed from two different
workqueues at the moment.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 21 Sep 2017 15:13:49 +0000 (08:13 -0700)]
nvme: fix sqhd reference when admin queue connect fails
Fix bug in sqhd patch.
It wasn't the sq that was at risk. In the case where the admin queue
connect command fails, the sq->size field is not set. Therefore, this
becomes a divide by zero error.
Add a quick check to bypass under this failure condition.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Thu, 21 Sep 2017 19:17:16 +0000 (12:17 -0700)]
block: fix a crash caused by wrong API
part_stat_show takes a part device not a disk, so we should use
part_to_disk.
Fixes:
d62e26b3ffd2("block: pass in queue to inflight accounting")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Czerner [Thu, 21 Sep 2017 14:16:29 +0000 (08:16 -0600)]
fs: Fix page cache inconsistency when mixing buffered and AIO DIO
Currently when mixing buffered reads and asynchronous direct writes it
is possible to end up with the situation where we have stale data in the
page cache while the new data is already written to disk. This is
permanent until the affected pages are flushed away. Despite the fact
that mixing buffered and direct IO is ill-advised it does pose a thread
for a data integrity, is unexpected and should be fixed.
Fix this by deferring completion of asynchronous direct writes to a
process context in the case that there are mapped pages to be found in
the inode. Later before the completion in dio_complete() invalidate
the pages in question. This ensures that after the completion the pages
in the written area are either unmapped, or populated with up-to-date
data. Also do the same for the iomap case which uses
iomap_dio_complete() instead.
This has a side effect of deferring the completion to a process context
for every AIO DIO that happens on inode that has pages mapped. However
since the consensus is that this is ill-advised practice the performance
implication should not be a problem.
This was based on proposal from Jeff Moyer, thanks!
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Mon, 18 Sep 2017 16:08:29 +0000 (09:08 -0700)]
nvmet: implement valid sqhd values in completions
To support sqhd, for initiators that are following the spec and
paying attention to sqhd vs their sqtail values:
- add sqhd to struct nvmet_sq
- initialize sqhd to 0 in nvmet_sq_setup
- rather than propagate the 0's-based qsize value from the connect message
which requires a +1 in every sqhd update, and as nothing else references
it, convert to 1's-based value in nvmt_sq/cq_setup() calls.
- validate connect message sqsize being non-zero per spec.
- updated assign sqhd for every completion that goes back.
Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
happen with the current code.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guilherme G. Piccoli [Thu, 14 Sep 2017 16:59:28 +0000 (13:59 -0300)]
nvme-fabrics: Allow 0 as KATO value
Currently, driver code allows user to set 0 as KATO
(Keep Alive TimeOut), but this is not being respected.
This patch enforces the expected behavior.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 20:18:04 +0000 (13:18 -0700)]
nvme: allow timed-out ios to retry
Currently the nvme_req_needs_retry() applies several checks to see if
a retry is allowed. On of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check, if an io
times out, means there is never a retry allowed for the io. Which means
applications see the io failure.
Remove this check and allow the io to timeout, like it does on other
protocols, and retries to be made.
On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 14 Sep 2017 18:03:09 +0000 (11:03 -0700)]
nvme: stop aer posting if controller state not live
If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.
In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.
It's possible to hit the scenario on rdma, although much less likely
due to an aer completing right as the association is terminated and
as the association teardown reclaims the blk requests via
nvme_cancel_request() so its immediate, not a link-related action
like on FC.
Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior. The transport, when
it creates a new association with the subsystem will transition
the state back to live and is already restarting the async event
posting.
Signed-off-by: James Smart <james.smart@broadcom.com>
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Fri, 15 Sep 2017 17:05:38 +0000 (13:05 -0400)]
nvme-pci: Print invalid SGL only once
The WARN_ONCE macro returns true if the condition is true, not if the
warn was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Thu, 14 Sep 2017 17:54:39 +0000 (13:54 -0400)]
nvme-pci: initialize queue memory before interrupts
A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.
The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Mon, 11 Sep 2017 23:16:53 +0000 (16:16 -0700)]
nvmet-fc: fix failing max io queue connections
fc transport is treating NVMET_NR_QUEUES as maximum queue count, e.g.
admin queue plus NVMET_NR_QUEUES-1 io queues. But NVMET_NR_QUEUES is
the number of io queues, so maximum queue count is really
NVMET_NR_QUEUES+1.
Fix the handling in the target fc transport
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 20:20:24 +0000 (13:20 -0700)]
nvme-fc: use transport-specific sgl format
Sync with NVM Express spec change and FC-NVME 1.18.
FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.
Removed the warn-on's on the PRP fields. They are unneeded. They were
to check for values from the upper layer that weren't set right, and
for the most part were fine. But, with Async events, which reuse the
same structure and 2nd time issued the SGL overlay converted them to
the Transport SGL values - the warn-on's were errantly firing.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 20:20:23 +0000 (13:20 -0700)]
nvme: add transport SGL definitions
Add transport SGL defintions from NVMe TP 4008, required for
the final NVMe-FC standard.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 23:27:25 +0000 (16:27 -0700)]
nvme.h: remove FC transport-specific error values
The NVM express group recinded the reserved range for the transport.
Remove the FC-centric values that had been defined.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 14 Sep 2017 18:30:15 +0000 (11:30 -0700)]
qla2xxx: remove use of FC-specific error codes
The qla2xxx driver uses the FC-specific error when it needed to return an
error to the FC-NVME transport. Convert to use a generic value instead.
Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 23:27:29 +0000 (16:27 -0700)]
lpfc: remove use of FC-specific error codes
The lpfc driver uses the FC-specific error when it needed to return an
error to the FC-NVME transport. Convert to use a generic value instead.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 23:27:28 +0000 (16:27 -0700)]
nvmet-fcloop: remove use of FC-specific error codes
The FC-NVME transport loopback test module used the FC-specific error
codes in cases where it emulated a transport abort case. Instead of
using the FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 23:27:27 +0000 (16:27 -0700)]
nvmet-fc: remove use of FC-specific error codes
The FC-NVME target transport used the FC-specific error codes in
return codes when the transport or lldd failed. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Thu, 7 Sep 2017 23:27:26 +0000 (16:27 -0700)]
nvme-fc: remove use of FC-specific error codes
The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 20 Sep 2017 21:24:34 +0000 (14:24 -0700)]
loop: remove union of use_aio and ref in struct loop_cmd
When the request is completed, lo_complete_rq() checks cmd->use_aio.
However, if this is in fact an aio request, cmd->use_aio will have
already been reused as cmd->ref by lo_rw_aio*. Fix it by not using a
union. On x86_64, there's a hole after the union anyways, so this
doesn't make struct loop_cmd any bigger.
Fixes:
92d773324b7e ("block/loop: fix use after free")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Waiman Long [Wed, 20 Sep 2017 19:12:20 +0000 (13:12 -0600)]
blktrace: Fix potential deadlock between delete & sysfs ops
The lockdep code had reported the following unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#228);
lock(&bdev->bd_mutex/1);
lock(s_active#228);
lock(&bdev->bd_mutex);
*** DEADLOCK ***
The deadlock may happen when one task (CPU1) is trying to delete a
partition in a block device and another task (CPU0) is accessing
tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
partition.
The s_active isn't an actual lock. It is a reference count (kn->count)
on the sysfs (kernfs) file. Removal of a sysfs file, however, require
a wait until all the references are gone. The reference count is
treated like a rwsem using lockdep instrumentation code.
The fact that a thread is in the sysfs callback method or in the
ioctl call means there is a reference to the opended sysfs or device
file. That should prevent the underlying block structure from being
removed.
Instead of using bd_mutex in the block_device structure, a new
blk_trace_mutex is now added to the request_queue structure to protect
access to the blk_trace structure.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix typo in patch subject line, and prune a comment detailing how
the code used to work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Sat, 6 May 2017 02:25:18 +0000 (22:25 -0400)]
nbd: ignore non-nbd ioctl's
In testing we noticed that nbd would spew if you ran a fio job against
the raw device itself. This is because fio calls a block device
specific ioctl, however the block layer will first pass this back to the
driver ioctl handler in case the driver wants to do something special.
Since the device was setup using netlink this caused us to spew every
time fio called this ioctl. Since we don't have special handling, just
error out for any non-nbd specific ioctl's that come in. This fixes the
spew.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 7 Sep 2017 11:54:35 +0000 (13:54 +0200)]
bsg-lib: don't free job in bsg_prepare_job
The job structure is allocated as part of the request, so we should not
free it in the error path of bsg_prepare_job.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mikulas Patocka [Wed, 13 Sep 2017 13:17:57 +0000 (09:17 -0400)]
brd: fix overflow in __brd_direct_access
The code in __brd_direct_access multiplies the pgoff variable by page size
and divides it by 512. It can cause overflow on 32-bit architectures. The
overflow happens if we create ramdisk larger than 4G and use it as a
sparse device.
This patch replaces multiplication and division with multiplication by the
number of sectors per page.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes:
1647b9b959c7 ("brd: add dax_operations support")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Wed, 20 Sep 2017 03:09:55 +0000 (17:09 -1000)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of five small fixes: one is a null deref fix which is
pretty critical for the fc transport class and one fixes a potential
security issue of sg leaking kernel information"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
scsi: sg: factor out sg_fill_request_table()
scsi: sd: Remove unnecessary condition in sd_read_block_limits()
scsi: acornscsi: fix build error
scsi: scsi_transport_fc: fix NULL pointer dereference in fc_bsg_job_timeout
Linus Torvalds [Wed, 20 Sep 2017 03:07:18 +0000 (17:07 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull si_code fix from Eric Biederman:
"When sorting out the si_code ambiguity fcntl I accidentally overshot
and included SIGPOLL as well. Ooops! This is my trivial fix for that.
Vince Weaver caught this when it landed in your tree with his
perf_event_tests many of which started failing because the si_code
changed"
Quoth Vince Weaver:
"I've tested with this patch applied and can confirm all of my tests
now pass again"
Fixes:
d08477aa975e ("fcntl: Don't use ambiguous SIG_POLL si_codes")
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fcntl: Don't set si_code to SI_SIGIO when sig == SIGPOLL
Linus Torvalds [Wed, 20 Sep 2017 03:05:53 +0000 (17:05 -1000)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
- fix build without CONFIG_HAVE_KVM_IRQ_ROUTING
- fix NULL access in x86 CR access
- fix race with VMX posted interrups
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
KVM: VMX: do not change SN bit in vmx_update_pi_irte()
KVM: x86: Fix the NULL pointer parameter in check_cr_write()
Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"
Linus Torvalds [Tue, 19 Sep 2017 15:39:32 +0000 (08:39 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"Two small patches to fix long-lived raid5 stripe batch bugs, one from
Dennis and the other from me"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
md/raid5: fix a race condition in stripe batch
Linus Torvalds [Tue, 19 Sep 2017 15:35:42 +0000 (08:35 -0700)]
Merge tag '4.14-smb3-multidialect-support-and-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Convert default dialect to smb2.1 or later to allow connecting to
Windows 7 for example, also includes some fixes for stable"
* tag '4.14-smb3-multidialect-support-and-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
Update version of cifs module
cifs: hide unused functions
SMB3: Add support for multidialect negotiate (SMB2.1 and later)
CIFS/SMB3: Update documentation to reflect SMB3 and various changes
cifs: check rsp for NULL before dereferencing in SMB2_open
Haozhong Zhang [Mon, 18 Sep 2017 01:56:50 +0000 (09:56 +0800)]
KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
intends to detect the violation of invariant that VT-d PI notification
event is not suppressed when vcpu is in the guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.
As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes:
28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Haozhong Zhang [Mon, 18 Sep 2017 01:56:49 +0000 (09:56 +0800)]
KVM: VMX: do not change SN bit in vmx_update_pi_irte()
In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.
vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of SN field may break above the assumption.
Besides, I don't see reasons to suppress notification events here, so
remove the changes of SN field to avoid race condition.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes:
28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Yu Zhang [Mon, 18 Sep 2017 10:45:01 +0000 (18:45 +0800)]
KVM: x86: Fix the NULL pointer parameter in check_cr_write()
Routine check_cr_write() will trigger emulator_get_cpuid()->
kvm_cpuid() to get maxphyaddr, and NULL is passed as values
for ebx/ecx/edx. This is problematic because kvm_cpuid() will
dereference these pointers.
Fixes:
d1cd3ce90044 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Jan H. Schönherr [Sat, 16 Sep 2017 20:12:24 +0000 (22:12 +0200)]
Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"
This reverts commit
36ae3c0a36b7456432fedce38ae2f7bd3e01a563.
The commit broke compilation on !CONFIG_HAVE_KVM_IRQ_ROUTING. Also,
there may be cases with CONFIG_HAVE_KVM_IRQ_ROUTING, where larger
gsi values make sense.
As the commit was meant as an early indicator to user space that
something is wrong, reverting just restores the previous behavior
where overly large values are ignored when encountered (without
any direct feedback).
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Eric W. Biederman [Tue, 19 Sep 2017 03:51:14 +0000 (22:51 -0500)]
fcntl: Don't set si_code to SI_SIGIO when sig == SIGPOLL
When fixing things to avoid ambiguous cases I had a thinko
and included SIGPOLL/SIGIO in with all of the other signals
that have signal specific si_codes. Which is completely wrong.
Fix that.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Linus Torvalds [Mon, 18 Sep 2017 15:44:51 +0000 (08:44 -0700)]
Merge tag 'mmc-v4.14-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix trivial typo in Kconfig
- Fixup initialization of mmc block requests
MMC host:
- cavium: Fix use-after-free bug reported by KASAN"
* tag 'mmc-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: cavium: Fix use-after-free in of_platform_device_destroy
mmc: host: fix typo after MMC_DEBUG move
mmc: block: Fix incorrectly initialized requests
Steve French [Tue, 12 Sep 2017 23:13:11 +0000 (18:13 -0500)]
Update version of cifs module
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Arnd Bergmann [Tue, 5 Sep 2017 09:24:15 +0000 (11:24 +0200)]
cifs: hide unused functions
The newly added SMB2+ attribute support causes unused function
warnings when CONFIG_CIFS_XATTR is disabled:
fs/cifs/smb2ops.c:563:1: error: 'smb2_set_ea' defined but not used [-Werror=unused-function]
smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
fs/cifs/smb2ops.c:513:1: error: 'smb2_query_eas' defined but not used [-Werror=unused-function]
smb2_query_eas(const unsigned int xid, struct cifs_tcon *tcon,
This adds another #ifdef around the affected functions.
Fixes:
5517554e4313 ("cifs: Add support for writing attributes on SMB2+")
Fixes:
95907fea4fd8 ("cifs: Add support for reading attributes on SMB2+")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Steve French [Sun, 17 Sep 2017 15:41:35 +0000 (10:41 -0500)]
SMB3: Add support for multidialect negotiate (SMB2.1 and later)
With the need to discourage use of less secure dialect, SMB1 (CIFS),
we temporarily upgraded the dialect to SMB3 in 4.13, but since there
are various servers which only support SMB2.1 (2.1 is more secure
than CIFS/SMB1) but not optimal for a default dialect - add support
for multidialect negotiation. cifs.ko will now request SMB2.1
or later (ie SMB2.1 or SMB3.0, SMB3.02) and the server will
pick the latest most secure one it can support.
In addition since we are sending multidialect negotiate, add
support for secure negotiate to validate that a man in the
middle didn't downgrade us.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.13+
Steve French [Thu, 14 Sep 2017 19:51:20 +0000 (14:51 -0500)]
CIFS/SMB3: Update documentation to reflect SMB3 and various changes
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Linus Torvalds [Sun, 17 Sep 2017 15:16:36 +0000 (08:16 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull misc fixes from Thomas Gleixner:
- A fix for a user space regression in /proc/$PID/stat
- A couple of objtool fixes:
~ Plug a memory leak
~ Avoid accessing empty sections which upsets certain binutil
versions
~ Prevent corrupting the obj file when section sizes did not change
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/proc: Report eip/esp in /prod/PID/stat for coredumping
objtool: Fix object file corruption
objtool: Do not retrieve data from empty sections
objtool: Fix memory leak in elf_create_rela_section()
Linus Torvalds [Sun, 17 Sep 2017 15:15:11 +0000 (08:15 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"Fix for an off by one error in a cpumask result comparison"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix cpumask check in __irq_startup_managed()
Linus Torvalds [Sun, 17 Sep 2017 15:05:54 +0000 (08:05 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix addressing the missing CP8 feature bit in CPUID for a
range of AMD ZEN models/mask revisions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
Linus Torvalds [Sat, 16 Sep 2017 22:47:51 +0000 (15:47 -0700)]
Linux 4.14-rc1
Linus Torvalds [Sat, 16 Sep 2017 19:08:10 +0000 (12:08 -0700)]
Merge tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI updates from Richard Weinberger:
"Minor improvements"
* tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs:
UBI: Fix two typos in comments
ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
Linus Torvalds [Sat, 16 Sep 2017 19:03:25 +0000 (12:03 -0700)]
Merge branch 'for-linus-4.14-rc1' of git://git./linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
Linus Torvalds [Sat, 16 Sep 2017 18:28:59 +0000 (11:28 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix
from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down,
from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian
Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
Documentation: link in networking docs
tcp: fix data delivery rate
bpf/verifier: reject BPF_ALU64|BPF_END
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp: fix an use-after-free issue in sctp_sock_dump
netvsc: increase default receive buffer size
tcp: update skb->skb_mstamp more carefully
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
net: smsc911x: Quieten netif during suspend
net: systemport: Fix 64-bit stats deadlock
net: vrf: avoid gcc-4.6 warning
qed: remove unnecessary call to memset
tg3: clean up redundant initialization of tnapi
tls: make tls_sw_free_resources static
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
MAINTAINERS: review Renesas DT bindings as well
net_sched: gen_estimator: fix scaling error in bytes/packets samples
nfp: wait for the NSP resource to appear on boot
nfp: wait for board state before talking to the NSP
...
Linus Torvalds [Sat, 16 Sep 2017 18:24:26 +0000 (11:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull more input updates from Dmitry Torokhov:
"A second round of updates for the input subsystem:
- a new driver for PWM-controlled vibrators
- ucb1400 touchscreen driver had completely busted suspend/resume
handling
- we now handle "home" button found on some devices with Goodix
touchscreens
- assorted other fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Gigabyte P57 to the keyboard reset table
Input: xpad - validate USB endpoint type during probe
Input: ucb1400_ts - fix suspend and resume handling
Input: edt-ft5x06 - fix access to non-existing register
Input: elantech - make arrays debounce_packet static, reduces object code size
Input: surface3_spi - make const array header static, reduces object code size
Input: goodix - add support for capacitive home button
Input: add a driver for PWM controllable vibrators
Input: adi - make array seq static, reduces object code size
Thomas Gleixner [Wed, 13 Sep 2017 21:29:03 +0000 (23:29 +0200)]
genirq: Fix cpumask check in __irq_startup_managed()
The result of cpumask_any_and() is invalid when result greater or equal
nr_cpu_ids. The current check is checking for greater only. Fix it.
Fixes:
761ea388e8c4 ("genirq: Handle managed irqs gracefully in irq_startup()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170913213152.272283444@linutronix.de
Markus Trippelsdorf [Sat, 16 Sep 2017 09:01:16 +0000 (11:01 +0200)]
firmware: Restore support for built-in firmware
Commit
5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes:
5620a0d1aac ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ido Schimmel [Fri, 15 Sep 2017 13:31:07 +0000 (15:31 +0200)]
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes:
65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Sat, 16 Sep 2017 14:28:02 +0000 (16:28 +0200)]
Documentation: link in networking docs
Fix link in filter.txt.
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 15 Sep 2017 23:47:42 +0000 (16:47 -0700)]
tcp: fix data delivery rate
Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.
Fixes:
8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 Sep 2017 03:43:33 +0000 (20:43 -0700)]
Merge branch '4.14-features' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 4.14 for MIPS; below a summary of
the non-merge commits:
CM:
- Rename mips_cm_base to mips_gcr_base
- Specify register size when generating accessors
- Use BIT/GENMASK for register fields, order & drop shifts
- Add cluster & block args to mips_cm_lock_other()
CPC:
- Use common CPS accessor generation macros
- Use BIT/GENMASK for register fields, order & drop shifts
- Introduce register modify (set/clear/change) accessors
- Use change_*, set_* & clear_* where appropriate
- Add CM/CPC 3.5 register definitions
- Use GlobalNumber macros rather than magic numbers
- Have asm/mips-cps.h include CM & CPC headers
- Cluster support for topology functions
- Detect CPUs in secondary clusters
CPS:
- Read GIC_VL_IDENT directly, not via irqchip driver
DMA:
- Consolidate coherent and non-coherent dma_alloc code
- Don't use dma_cache_sync to implement fd_cacheflush
FPU emulation / FP assist code:
- Another series of 14 commits fixing corner cases such as NaN
propgagation and other special input values.
- Zero bits 32-63 of the result for a CLASS.D instruction.
- Enhanced statics via debugfs
- Do not use bools for arithmetic. GCC 7.1 moans about this.
- Correct user fault_addr type
Generic MIPS:
- Enhancement of stack backtraces
- Cleanup from non-existing options
- Handle non word sized instructions when examining frame
- Fix detection and decoding of ADDIUSP instruction
- Fix decoding of SWSP16 instruction
- Refactor handling of stack pointer in get_frame_info
- Remove unreachable code from force_fcr31_sig()
- Convert to using %pOF instead of full_name
- Remove the R6000 support.
- Move FP code from *_switch.S to *_fpu.S
- Remove unused ST_OFF from r2300_switch.S
- Allow platform to specify multiple its.S files
- Add #includes to various files to ensure code builds reliable and
without warning..
- Remove __invalidate_kernel_vmap_range
- Remove plat_timer_setup
- Declare various variables & functions static
- Abstract CPU core & VP(E) ID access through accessor functions
- Store core & VP IDs in GlobalNumber-style variable
- Unify checks for sibling CPUs
- Add CPU cluster number accessors
- Prevent direct use of generic_defconfig
- Make CONFIG_MIPS_MT_SMP default y
- Add __ioread64_copy
- Remove unnecessary inclusions of linux/irqchip/mips-gic.h
GIC:
- Introduce asm/mips-gic.h with accessor functions
- Use new GIC accessor functions in mips-gic-timer
- Remove counter access functions from irq-mips-gic.c
- Remove gic_read_local_vp_id() from irq-mips-gic.c
- Simplify shared interrupt pending/mask reads in irq-mips-gic.c
- Simplify gic_local_irq_domain_map() in irq-mips-gic.c
- Drop gic_(re)set_mask() functions in irq-mips-gic.c
- Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
- Convert remaining shared reg access, local int mask access and
remaining local reg access to new accessors
- Move GIC_LOCAL_INT_* to asm/mips-gic.h
- Remove GIC_CPU_INT* macros from irq-mips-gic.c
- Move various definitions to the driver
- Remove gic_get_usm_range()
- Remove __gic_irq_dispatch() forward declaration
- Remove gic_init()
- Use mips_gic_present() in place of gic_present and remove
gic_present
- Move gic_get_c0_*_int() to asm/mips-gic.h
- Remove linux/irqchip/mips-gic.h
- Inline __gic_init()
- Inline gic_basic_init()
- Make pcpu_masks a per-cpu variable
- Use pcpu_masks to avoid reading GIC_SH_MASK*
- Clean up mti, reserved-cpu-vectors handling
- Use cpumask_first_and() in gic_set_affinity()
- Let the core set struct irq_common_data affinity
microMIPS:
- Fix microMIPS stack unwinding on big endian systems
MIPS-GIC:
- SYNC after enabling GIC region
NUMA:
- Remove the unused parent_node() macro
R6:
- Constify r2_decoder_tables
- Add accessor & bit definitions for GlobalNumber
SMP:
- Constify smp ops
- Allow boot_secondary SMP op to return errors
VDSO:
- Drop gic_get_usm_range() usage
- Avoid use of linux/irqchip/mips-gic.h
Platform changes:
Alchemy:
- Add devboard machine type to cpuinfo
- update cpu feature overrides
- Threaded carddetect irqs for devboards
AR7:
- allow NULL clock for clk_get_rate
BCM63xx:
- Fix ENETDMA_6345_MAXBURST_REG offset
- Allow NULL clock for clk_get_rate
CI20:
- Enable GPIO and RTC drivers in defconfig
- Add ethernet and fixed-regulator nodes to DTS
Generic platform:
- Move Boston and NI 169445 FIT image source to their own files
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Allow filtering enabled boards by requirements
- Don't explicitly disable CONFIG_USB_SUPPORT
- Bump default NR_CPUS to 16
JZ4700:
- Probe the jz4740-rtc driver from devicetree
Lantiq:
- Drop check of boot select from the spi-falcon driver.
- Drop check of boot select from the lantiq-flash MTD driver.
- Access boot cause register in the watchdog driver through regmap
- Add device tree binding documentation for the watchdog driver
- Add docs for the RCU DT bindings.
- Convert the fpi bus driver to a platform_driver
- Remove ltq_reset_cause() and ltq_boot_select(
- Switch to a proper reset driver
- Switch to a new drivers/soc GPHY driver
- Add an USB PHY driver for the Lantiq SoCs using the RCU module
- Use of_platform_default_populate instead of __dt_register_buses
- Enable MFD_SYSCON to be able to use it for the RCU MFD
- Replace ltq_boot_select() with dummy implementation.
Loongson 2F:
- Allow NULL clock for clk_get_rate
Malta:
- Use new GIC accessor functions
NI 169445:
- Add support for NI 169445 board.
- Only include in 32r2el kernels
Octeon:
- Add support for watchdog of 78XX SOCs.
- Add support for watchdog of CN68XX SOCs.
- Expose support for mips32r1, mips32r2 and mips64r1
- Enable more drivers in config file
- Add support for accessing the boot vector.
- Remove old boot vector code from watchdog driver
- Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
- Make CSR functions node aware.
- Allow access to CIU3 IRQ domains.
- Misc cleanups in the watchdog driver
Omega2+:
- New board, add support and defconfig
Pistachio:
- Enable Root FS on NFS in defconfig
Ralink:
- Add Mediatek MT7628A SoC
- Allow NULL clock for clk_get_rate
- Explicitly request exclusive reset control in the pci-mt7620 PCI driver.
SEAD3:
- Only include in 32 bit kernels by default
VoCore:
- Add VoCore as a vendor t0 dt-bindings
- Add defconfig file"
* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
MIPS: Refactor handling of stack pointer in get_frame_info
MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: microMIPS: Fix decoding of addiusp instruction
MIPS: microMIPS: Fix detection of addiusp instruction
MIPS: Handle non word sized instructions when examining frame
MIPS: ralink: allow NULL clock for clk_get_rate
MIPS: Loongson 2F: allow NULL clock for clk_get_rate
MIPS: BCM63XX: allow NULL clock for clk_get_rate
MIPS: AR7: allow NULL clock for clk_get_rate
MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
mips: Save all registers when saving the frame
MIPS: Add DWARF unwinding to assembly
MIPS: Make SAVE_SOME more standard
MIPS: Fix issues in backtraces
MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
MIPS: Ci20: Enable RTC driver
watchdog: octeon-wdt: Add support for 78XX SOCs.
watchdog: octeon-wdt: Add support for cn68XX SOCs.
watchdog: octeon-wdt: File cleaning.
...
Linus Torvalds [Sat, 16 Sep 2017 03:25:06 +0000 (20:25 -0700)]
Merge tag 'pci-v4.14-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Revert an attempt to fix a race while enabling upstream bridges
because it broke iwlwifi firmware loading"
* tag 'pci-v4.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Avoid race while enabling upstream bridges"
Linus Torvalds [Sat, 16 Sep 2017 00:52:52 +0000 (17:52 -0700)]
Merge tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux
Pull drm AMD fixes from Dave Airlie:
"Just had a single AMD fixes pull from Alex for rc1"
* tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
drm/amdgpu: remove duplicate return statement
drm/amdgpu: check memory allocation failure
drm/amd/amdgpu: fix BANK_SELECT on Vega10 (v2)
drm/amdgpu: inline amdgpu_ttm_do_bind again
drm/amdgpu: fix amdgpu_ttm_bind
drm/amdgpu: remove the GART copy hack
drm/ttm:fix wrong decoding of bo_count
drm/ttm: fix missing inc bo_count
drm/amdgpu: set sched_hw_submission higher for KIQ (v3)
drm/amdgpu: move default gart size setting into gmc modules
drm/amdgpu: refine default gart size
drm/amd/powerplay: ACG frequency added in PPTable
drm/amdgpu: discard commands of killed processes
drm/amdgpu: fix and cleanup shadow handling
drm/amdgpu: add automatic per asic settings for gart_size
drm/amdgpu/gfx8: fix spelling typo in mqd allocation
drm/amd/powerplay: unhalt mec after loading
drm/amdgpu/virtual_dce: Virtual display doesn't support disable vblank immediately
drm/amdgpu: Fix huge page updates with CPU
Linus Torvalds [Sat, 16 Sep 2017 00:49:46 +0000 (17:49 -0700)]
Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"I2C has two more new drivers: Altera FPGA and STM32F7"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-stm32f7: add driver
i2c: i2c-stm32f4: use generic definition of speed enum
dt-bindings: i2c-stm32: Document the STM32F7 I2C bindings
i2c: altera: Add Altera I2C Controller driver
dt-bindings: i2c: Add Altera I2C Controller
Linus Torvalds [Fri, 15 Sep 2017 22:43:55 +0000 (15:43 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
- PPC bugfixes
- RCU splat fix
- swait races fix
- pointless userspace-triggerable BUG() fix
- misc fixes for KVM_RUN corner cases
- nested virt correctness fixes + one host DoS
- some cleanups
- clang build fix
- fix AMD AVIC with default QEMU command line options
- x86 bugfixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
kvm,mips: Fix potential swait_active() races
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
kvm,x86: Fix apf_task_wake_one() wq serialization
kvm,lapic: Justify use of swait_active()
kvm,async_pf: Use swq_has_sleeper()
sched/wait: Add swq_has_sleeper()
KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
kvm: nVMX: Don't allow L2 to access the hardware CR8
KVM: trace events: update list of exit reasons
KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
KVM: X86: Don't block vCPU if there is pending exception
KVM: SVM: Add irqchip_split() checks before enabling AVIC
KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
KVM: x86: fix clang build
...
Edward Cree [Fri, 15 Sep 2017 13:37:38 +0000 (14:37 +0100)]
bpf/verifier: reject BPF_ALU64|BPF_END
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes:
17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 03:02:48 +0000 (11:02 +0800)]
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill
returns err, in which case it shouldn't mark sk dumped by setting
cb->args[3] as 1 in sctp_sock_dump().
Otherwise, it could cause some asocs to have no parent's sk dumped
in 'ss --sctp'.
So this patch is to not set cb->args[3] when inet_sctp_diag_fill()
returns err in sctp_sock_dump().
Fixes:
8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 03:02:21 +0000 (11:02 +0800)]
sctp: fix an use-after-free issue in sctp_sock_dump
Commit
86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.
But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.
I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.
This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.
It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.
This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.
Fixes:
86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 14 Sep 2017 16:31:07 +0000 (09:31 -0700)]
netvsc: increase default receive buffer size
The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.
Fixes:
8b5327975ae1 ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 14 Sep 2017 03:30:39 +0000 (20:30 -0700)]
tcp: update skb->skb_mstamp more carefully
liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
in tcp_probe_timer() :
https://www.spinics.net/lists/netdev/msg454496.html
After investigations, the root cause of the problem is that we update
skb->skb_mstamp of skbs in write queue, even if the attempt to send a
clone or copy of it failed. One reason being a routing problem.
This patch prevents this, solving liujian issue.
It also removes a potential RTT miscalculation, since
__tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
been changed.
A future ACK would then lead to a very small RTT sample and min_rtt
would then be lowered to this too small value.
Tested:
# cat user_timeout.pkt
--local_ip=192.168.102.64
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
+.1 < . 1:1(0) ack 1 win 65530
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 24) = 24
+0 > P. 1:25(24) ack 1 win 29200
+.1 < . 1:1(0) ack 25 win 65530
//change the ipaddress
+1 `ifconfig tun0 192.168.0.10/16`
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+0 `ifconfig tun0 192.168.102.64/16`
+0 < . 1:2(1) ack 25 win 65530
+0 `ifconfig tun0 192.168.0.10/16`
+3 write(4, ..., 24) = -1
# ./packetdrill user_timeout.pkt
Signed-off-by: Eric Dumazet <edumazet@googl.com>
Reported-by: liujian <liujian56@huawei.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Ogness [Thu, 14 Sep 2017 09:42:17 +0000 (11:42 +0200)]
fs/proc: Report eip/esp in /prod/PID/stat for coredumping
Commit
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in
/proc/PID/stat") stopped reporting eip/esp because it is
racy and dangerous for executing tasks. The comment adds:
As far as I know, there are no use programs that make any
material use of these fields, so just get rid of them.
However, existing userspace core-dump-handler applications (for
example, minicoredumper) are using these fields since they
provide an excellent cross-platform interface to these valuable
pointers. So that commit introduced a user space visible
regression.
Partially revert the change and make the readout possible for
tasks with the proper permissions and only if the target task
has the PF_DUMPCORE flag set.
Fixes:
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in> /proc/PID/stat")
Reported-by: Marco Felsch <marco.felsch@preh.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: stable@vger.kernel.org
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/87poatfwg6.fsf@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Ahern [Thu, 14 Sep 2017 00:11:37 +0000 (17:11 -0700)]
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
rt_iif is only set to the actual egress device for the output path. The
recent change to consider the l3slave flag when returning IP_PKTINFO
works for local traffic (the correct device index is returned), but it
broke the more typical use case of packets received from a remote host
always returning the VRF index rather than the original ingress device.
Update the fixup to consider l3slave and rt_iif actually getting set.
Fixes:
1dfa76390bf05 ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Wed, 13 Sep 2017 17:42:05 +0000 (19:42 +0200)]
net: smsc911x: Quieten netif during suspend
If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
As this is a race condition with a small time window, it is not so easy
to trigger at will. Using pm_test may increase your chances:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
To fix this, make sure the network interface is quietened during
suspend.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 12 Sep 2017 20:14:26 +0000 (13:14 -0700)]
net: systemport: Fix 64-bit stats deadlock
We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.
[ 5.780350] ================================
[ 5.784679] WARNING: inconsistent lock state
[ 5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[ 5.794561] --------------------------------
[ 5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 5.810175] (&syncp->seq#2){+.?...}, at: [<
c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[ 5.818327] {SOFTIRQ-ON-W} state was registered at:
[ 5.823278] bcm_sysport_get_stats64+0x17c/0x258
[ 5.828053] dev_get_stats+0x38/0xac
[ 5.831776] rtnl_fill_stats+0x30/0x118
[ 5.835761] rtnl_fill_ifinfo+0x538/0xe24
[ 5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8
[ 5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44
[ 5.849201] rtmsg_ifinfo+0x20/0x28
[ 5.852837] register_netdevice+0x628/0x6b8
[ 5.857171] register_netdev+0x14/0x24
[ 5.861051] bcm_sysport_probe+0x30c/0x438
[ 5.865280] platform_drv_probe+0x50/0xb0
[ 5.869418] driver_probe_device+0x2e8/0x450
[ 5.873817] __driver_attach+0x104/0x120
[ 5.877871] bus_for_each_dev+0x7c/0xc0
[ 5.881834] bus_add_driver+0x1b0/0x270
[ 5.885797] driver_register+0x78/0xf4
[ 5.889675] do_one_initcall+0x54/0x190
[ 5.893646] kernel_init_freeable+0x144/0x1d0
[ 5.898135] kernel_init+0x8/0x110
[ 5.901665] ret_from_fork+0x14/0x2c
[ 5.905363] irq event stamp: 24263
[ 5.908804] hardirqs last enabled at (24262): [<
c08eecf0>] net_rx_action+0xc4/0x4e4
[ 5.916624] hardirqs last disabled at (24263): [<
c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[ 5.925143] softirqs last enabled at (24258): [<
c022a7fc>] irq_enter+0x84/0x98
[ 5.932524] softirqs last disabled at (24259): [<
c022a918>] irq_exit+0x108/0x16c
[ 5.939985]
[ 5.939985] other info that might help us debug this:
[ 5.946576] Possible unsafe locking scenario:
[ 5.946576]
[ 5.952556] CPU0
[ 5.955031] ----
[ 5.957506] lock(&syncp->seq#2);
[ 5.960955] <Interrupt>
[ 5.963604] lock(&syncp->seq#2);
[ 5.967227]
[ 5.967227] *** DEADLOCK ***
[ 5.967227]
[ 5.973222] 1 lock held by swapper/0/0:
[ 5.977092] #0: (&(&ring->lock)->rlock){..-...}, at: [<
c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54
So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.
Fixes:
10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 12 Sep 2017 20:10:53 +0000 (22:10 +0200)]
net: vrf: avoid gcc-4.6 warning
When building an allmodconfig kernel with gcc-4.6, we get a rather
odd warning:
drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]
I have no idea what this warning is even trying to say, but it does
seem like a false positive. Reordering the initialization in to match
the structure definition gets rid of the warning, and might also avoid
whatever gcc thinks is wrong here.
Fixes:
9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himanshu Jha [Tue, 12 Sep 2017 11:19:22 +0000 (16:49 +0530)]
qed: remove unnecessary call to memset
call to memset to assign 0 value immediately after allocating
memory with kzalloc is unnecesaary as kzalloc allocates the memory
filled with 0 value.
Semantic patch used to resolve this issue:
@@
expression e,e2; constant c;
statement S;
@@
e = kzalloc(e2, c);
if(e == NULL) S
- memset(e, 0, e2);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Sudarsana Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 15 Sep 2017 19:58:55 +0000 (12:58 -0700)]
Merge tag 'firmware_removal-4.14-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull firmware removal from Greg KH:
"Many many years ago (at the kernel summit in Boston), we all came to
the agreement that the firmware/ tree should be dropped from the
kernel, and everyone use the linux-firmware package instead. For some
minor reason, David Woodhouse didn't send the pull request at that
point in time, and everyone forgot about this.
The topic came up in the hallway track at the Plumbers conference this
week, so here's a single patch that drops the whole firmware tree. The
last firmware update was back in 2013, and all distros have been using
linux-firmware instead since at least that year, if not before. The
only commits to that directory since 2013 was some kbuild fixups for
various build tool issues.
So lets finally drop this, we don't need to lug them around in the
kernel source tree anymore, especially as no one wants or uses them.
This has passed build testing with 0-day, I don't think it made it
into linux-next this week, but I figured it was good to get in before
4.14-rc1 was out"
* tag 'firmware_removal-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware: delete in-kernel firmware
Linus Torvalds [Fri, 15 Sep 2017 19:47:21 +0000 (12:47 -0700)]
Merge tag 'nios2-v4.14-rc1' of git://git./linux/kernel/git/lftan/nios2
Pull arch/nios2 update from Ley Foon Tan.
* tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: time: Read timer in get_cycles only if initialized
nios2: add earlycon support to 3c120 devboard DTS
Linus Torvalds [Fri, 15 Sep 2017 19:44:59 +0000 (12:44 -0700)]
Merge tag 'powerpc-4.14-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Just one fix, for the handling of alignment interrupts on dcbz
instructions.
Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka"
* tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix handling of alignment interrupt on dcbz instruction
Hannes Reinecke [Fri, 15 Sep 2017 12:05:16 +0000 (14:05 +0200)]
scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is
returned; the remaining part will then contain stale kernel memory
information. This patch zeroes out the entire table to avoid this
issue.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Fri, 15 Sep 2017 19:16:18 +0000 (12:16 -0700)]
Merge tag 'for-linus-4.14-ofs2' of git://git./linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Some cleanups and a big bug fix for ACLs.
When I was reviewing Jan Kara's ACL patch, I realized that Orangefs
ACL code was busted, not just in the kernel module, but in the server
as well. I've been working on the code in the server mostly, but
here's one kernel patch, there will be more"
* tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Adjust three checks for null pointers
orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
orangefs: Delete error messages for a failed memory allocation in five functions
orangefs: constify xattr_handler structure
orangefs: don't call filemap_write_and_wait from fsync
orangefs: off by ones in xattr size checks
orangefs: documentation clean up
orangefs: react properly to posix_acl_update_mode's aftermath.
orangefs: Don't clear SGID when inheriting ACLs
Hannes Reinecke [Fri, 15 Sep 2017 12:05:15 +0000 (14:05 +0200)]
scsi: sg: factor out sg_fill_request_table()
Factor out sg_fill_request_table() for better readability.
[mkp: typos, applied by hand]
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Lukas Czerner [Wed, 13 Sep 2017 14:29:46 +0000 (16:29 +0200)]
scsi: sd: Remove unnecessary condition in sd_read_block_limits()
After series of changes around WRITE_SAME and UNMAP setup we ended up
with leftover unnecessary condition. Remove it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dmitry Torokhov [Fri, 15 Sep 2017 16:52:21 +0000 (09:52 -0700)]
Merge branch 'next' into for-linus
Prepare second round of input updates for 4.14 merge window.
Kai-Heng Feng [Fri, 15 Sep 2017 16:36:16 +0000 (09:36 -0700)]
Input: i8042 - add Gigabyte P57 to the keyboard reset table
Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.
BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:44 +0000 (16:31 -0700)]
kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.
The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:42 +0000 (16:31 -0700)]
kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.
On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:40 +0000 (16:31 -0700)]
kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:24 +0000 (13:08 -0700)]
kvm,mips: Fix potential swait_active() races
For example, the following could occur, making us miss a wakeup:
CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)
Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:23 +0000 (13:08 -0700)]
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:22 +0000 (13:08 -0700)]
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
This is a generic call and can be suceptible to races
in reading the wq task_list while another task is adding
itself to the list. Add a full barrier by using the
swq_has_sleeper() helper.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:21 +0000 (13:08 -0700)]
kvm,x86: Fix apf_task_wake_one() wq serialization
During code inspection, the following potential race was seen:
CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);
Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:20 +0000 (13:08 -0700)]
kvm,lapic: Justify use of swait_active()
A comment might serve future readers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:19 +0000 (13:08 -0700)]
kvm,async_pf: Use swq_has_sleeper()
... as we've got the new helper now. This caller already
does the right thing, hence no changes in semantics.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:18 +0000 (13:08 -0700)]
sched/wait: Add swq_has_sleeper()
Which is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but this also helps us get both
apis closer -- which iirc comes originally from the -net folks.
We also duplicate the comments for the lockless swait_active(),
from wait.h. Future users will make use of this interface.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan H. Schönherr [Thu, 7 Sep 2017 18:02:30 +0000 (19:02 +0100)]
KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)
Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
This fixes CVE-2017-1000252.
Fixes:
efc644048ecde54 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan H. Schönherr [Thu, 7 Sep 2017 18:02:48 +0000 (19:02 +0100)]
KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
We cannot add routes for gsi values >= KVM_MAX_IRQ_ROUTES -- see
kvm_set_irq_routing(). Hence, there is no sense in accepting them
via KVM_IRQFD. Prevent them from entering the system in the first
place.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guenter Roeck [Tue, 12 Sep 2017 03:45:26 +0000 (20:45 -0700)]
nios2: time: Read timer in get_cycles only if initialized
Mainline crashes as follows when running nios2 images.
On node 0 totalpages: 65536
free_area_init_node: node 0, pgdat
c8408fa0, node_mem_map
c8726000
Normal zone: 512 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 65536 pages, LIFO batch:15
Unable to handle kernel NULL pointer dereference at virtual address
00000000
ea =
c8003cb0, ra =
c81cbf40, cause = 15
Kernel panic - not syncing: Oops
Problem is seen because get_cycles() is called before the timer it depends
on is initialized. Returning 0 in that situation fixes the problem.
Fixes:
33d72f3822d7 ("init/main.c: extract early boot entropy from the ..")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tobias Klauser [Tue, 20 Jun 2017 07:43:18 +0000 (09:43 +0200)]
nios2: add earlycon support to 3c120 devboard DTS
Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Jim Mattson [Tue, 12 Sep 2017 20:02:54 +0000 (13:02 -0700)]
kvm: nVMX: Don't allow L2 to access the hardware CR8
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Josh Poimboeuf [Fri, 15 Sep 2017 07:17:11 +0000 (02:17 -0500)]
objtool: Fix object file corruption
Arnd Bergmann reported that a randconfig build was failing with the
following link error:
built-in.o: member arch/x86/kernel/time.o in archive is not an object
It turns out the link failed because the time.o file had been corrupted
by objtool:
nm: arch/x86/kernel/time.o: File format not recognized
In certain rare cases, when a .o file's ORC table is very small, the
.data section size doesn't change because it's page aligned. Because
all the existing sections haven't changed size, libelf doesn't detect
any section header changes, and so it doesn't update the section header
table properly. Instead it writes junk in the section header entries
for the new ORC sections.
Make sure libelf properly updates the section header table by setting
the ELF_F_DIRTY flag in the top level elf struct.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
627fce14809b ("objtool: Add ORC unwind table generation")
Link: http://lkml.kernel.org/r/e650fd0f2d8a209d1409a9785deb101fdaed55fb.1505459813.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Petr Vandrovec [Fri, 15 Sep 2017 07:15:05 +0000 (02:15 -0500)]
objtool: Do not retrieve data from empty sections
Binutils 2.29-9 in Debian return an error when elf_getdata is invoked
on empty section (.note.GNU-stack in all kernel files), causing
immediate failure of kernel build with:
elf_getdata: can't manipulate null section
As nothing is done with sections that have zero size, just do not
retrieve their data at all.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2ce30a44349065b70d0f00e71e286dc0cbe745e6.1505459652.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Martin Kepplinger [Thu, 14 Sep 2017 06:01:38 +0000 (08:01 +0200)]
objtool: Fix memory leak in elf_create_rela_section()
Let's free the allocated char array 'relaname' before returning,
in order to avoid leaking memory.
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo.kernel.org@gmail.com
Link: http://lkml.kernel.org/r/20170914060138.26472-1-martink@posteo.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Thu, 7 Sep 2017 17:08:21 +0000 (19:08 +0200)]
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>