Chao Yu [Wed, 29 Jul 2015 09:33:13 +0000 (17:33 +0800)]
f2fs: freeze filesystem when fail to update meta page due to IO error
In get_meta_page, we guarantee no failure for the returned page,
but sometimes an IO error from the device will result in returning a
non-updated page.
We then still use this page as if it were updated, so an exception could
happen when using this kind of page.
So in this condition, we'd better freeze the fs by making it readonly
and stop doing checkpoints.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Tue, 4 Aug 2015 05:27:51 +0000 (13:27 +0800)]
f2fs: change the timing of f2fs_wait_on_page_writeback
Some backing devices need pages to be stable during writeback. It doesn't
matter if the page is completely overwritten or already uptodate; it needs
to wait before write.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Jul 2015 07:52:52 +0000 (00:52 -0700)]
f2fs: handle error cases in commit_inmem_pages
This patch adds handling of error cases in commit_inmem_pages.
If an error occurs, it stops writing the pages and returns the error right
away.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 24 Jul 2015 10:26:26 +0000 (18:26 +0800)]
f2fs: fix to build free nids from readaheaded nat pages
When there are not enough free nids in the free nid cache, we try to
readahead FREE_NID_PAGES (4) nat pages into the page cache of meta_inode,
and then read the nat entries in each nat page to add free nids to the
free nid cache.
But when traversing the readaheaded nat pages in a loop, our exit
condition is not set right: one more nat page is scanned without
readahead, resulting in worse read performance.
This patch fixes the loop to read the correct number of nat pages and
avoid the bad performance.
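The off-by-one described above can be modelled in a few lines. This is a userspace Python sketch, not the f2fs code: scan_pages and both exit tests are hypothetical stand-ins chosen to illustrate this class of bug, and only FREE_NID_PAGES = 4 comes from the message.

```python
FREE_NID_PAGES = 4  # number of nat pages read ahead, per the commit message


def scan_pages(compare_before_increment):
    """Count how many nat pages a scan loop visits (hypothetical model).

    compare_before_increment=True models an exit test that compares the
    counter and only then increments it; False models incrementing first.
    """
    scanned = 0
    i = 0
    while True:
        scanned += 1  # stands in for scanning one nat page
        if compare_before_increment:
            done = i == FREE_NID_PAGES  # counter compared, then bumped
            i += 1
        else:
            i += 1
            done = i >= FREE_NID_PAGES  # counter bumped, then compared
        if done:
            return scanned


if __name__ == "__main__":
    print("compare-then-increment scans", scan_pages(True), "pages")
    print("increment-then-compare scans", scan_pages(False), "pages")
```

In this model the first variant visits one page more than was read ahead, which is the extra, non-readaheaded scan the message describes.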
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 24 Jul 2015 10:24:45 +0000 (18:24 +0800)]
f2fs: fix inline data/dentry stat number leak
If we clear the inline data/dentry flag in handle_failed_inode, we will fail
to decrease the stat count of inline data/dentry in f2fs_evict_inode because
the inode no longer has the flag. So remove the wrong clearing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:06:35 +0000 (18:06 +0800)]
f2fs: convert inline data before set atomic/volatile flag
In f2fs_ioc_start_{atomic,volatile}_write, if we fail to convert
inline data, we report the error to the user but still leave the
atomic/volatile flag set in the inode, which will affect further writes to
this file. Fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:05:21 +0000 (18:05 +0800)]
f2fs: fix to wait all atomic written pages writeback
This patch fixes the incorrect range (0, LONG_MAX) which is used
in ranged fsync. If we use LONG_MAX as the parameter indicating
the end of the file range we want to synchronize, then on a 32-bit
architecture machine, data after the 4GB offset may not be persisted in
storage after ->fsync returns.
Here, we change LONG_MAX to LLONG_MAX to fix this issue.
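The range problem above is plain arithmetic, sketched here in Python under stated assumptions: LONG_MAX_32 is the value of LONG_MAX where long is 32 bits wide, LLONG_MAX is the usual 64-bit value, and covered_by_range is a hypothetical helper, not a kernel function.

```python
LONG_MAX_32 = 2**31 - 1  # LONG_MAX when long is 32 bits wide
LLONG_MAX = 2**63 - 1    # LLONG_MAX when long long is 64 bits wide


def covered_by_range(offset, start, end):
    """Return True if a byte offset lies inside the inclusive sync range."""
    return start <= offset <= end


if __name__ == "__main__":
    offset = 5 * 2**30  # a byte 5 GiB into the file
    print(covered_by_range(offset, 0, LONG_MAX_32))
    print(covered_by_range(offset, 0, LLONG_MAX))
```

Any offset above the 32-bit LONG_MAX falls outside the (0, LONG_MAX) range, so data there is not part of the ranged sync; the (0, LLONG_MAX) range covers it.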
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:02:39 +0000 (18:02 +0800)]
f2fs: skip writing in ->writepages when no dirty pages exist
When flushing comes from background, if there is no dirty page in the
mapping of the inode, we'd better skip searching the mapping for dirty
pages to write back.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tiezhu Yang [Fri, 17 Jul 2015 04:56:00 +0000 (12:56 +0800)]
f2fs: optimize f2fs_write_cache_pages
The two if statements that execute "goto continue_unlock" have the same
effect: each condition tests whether "step" and "is_cold_data(page)"
are both 0 or both 1. That means the condition is true exactly when
the value of "step" equals "is_cold_data(page)", so the two branches
can be merged into one in which "goto continue_unlock" appears only
once, removing the duplicated code.
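The equivalence claimed above can be checked exhaustively. The Python sketch below is a model, not the kernel code: skip_duplicated assumes the earlier code had one branch per step value, and both function names are hypothetical.

```python
def skip_duplicated(step, is_cold):
    """Assumed earlier shape: one branch per value of step."""
    if step == 0 and not is_cold:
        return True  # would "goto continue_unlock"
    if step == 1 and is_cold:
        return True  # would "goto continue_unlock"
    return False


def skip_merged(step, is_cold):
    """Merged shape: skip the page when step equals is_cold_data(page)."""
    return step == int(is_cold)


if __name__ == "__main__":
    for step in (0, 1):
        for is_cold in (False, True):
            print(step, is_cold, skip_duplicated(step, is_cold),
                  skip_merged(step, is_cold))
```

Both functions agree on all four combinations of step and is_cold, which is what allows the branches to be collapsed.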
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 16 Jul 2015 10:19:02 +0000 (18:19 +0800)]
f2fs: fix double lock in handle_failed_inode
In handle_failed_inode, there is a potential deadlock which can happen
in below call path:
- f2fs_create
- f2fs_lock_op down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security failed
- truncate_blocks failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(..,true)
- write_checkpoint
- block_operations
- f2fs_lock_all down_write(cp_rwsem)
- f2fs_lock_op down_read(cp_rwsem)
So in this path, we pass parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 16 Jul 2015 10:18:11 +0000 (18:18 +0800)]
f2fs: reduce region of cp_rwsem covered in f2fs_do_collapse
In f2fs_do_collapse, the region covered by cp_rwsem is large, since the
lock is held until all blocks are left shifted. So if we try to collapse a
small area at the beginning of a large file, a checkpoint that wants to grab
the writer's lock of cp_rwsem will be delayed for a long time.
To avoid this, lock/unlock cp_rwsem around each shift operation
instead.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Wed, 15 Jul 2015 10:05:17 +0000 (18:05 +0800)]
f2fs: add new interfaces for extent tree
Add a lookup and an insertion interface for the extent tree.
The new lookup returns the insert position and the prev/next
extents closest to the offset we look up when it finds no match.
The new insertion uses the above parameters to improve performance.
There are three possible insertions after the lookup in
f2fs_update_extent_tree. Two of them insert parts of the removed extent
back into the tree; since no merge happens during this process, the new
insertion skips the merge check in this scenario. The other insertion
inserts a new extent into the tree; the new insertion uses the prev/next
extents and the insert position to insert this extent directly, saving the
time of searching down the tree.
As long as the tree remains unchanged between lookup and insertion, this
works fine. The new lookup will also be useful when adding
multi-block extent support to the insertion interface.
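The lookup/insert pairing described above can be illustrated with a sorted list standing in for the tree. This Python sketch is only a model under stated assumptions: the real structure is a tree inside f2fs, and lookup, insert_at and the (start, length) tuple layout are hypothetical names chosen for the example.

```python
from bisect import bisect_right


def lookup(extents, offset):
    """Look up offset in a sorted list of non-overlapping (start, length).

    Returns (match, prev, next, insert_pos). On a hit only match is set;
    on a miss, prev/next are the nearest neighbours and insert_pos is the
    index where a new extent containing offset would go.
    """
    starts = [start for start, _ in extents]
    pos = bisect_right(starts, offset)  # first extent starting after offset
    if pos > 0:
        start, length = extents[pos - 1]
        if offset < start + length:
            return extents[pos - 1], None, None, None
    prev_ex = extents[pos - 1] if pos > 0 else None
    next_ex = extents[pos] if pos < len(extents) else None
    return None, prev_ex, next_ex, pos


def insert_at(extents, insert_pos, extent):
    """Insert directly at the position found by lookup, with no new search."""
    extents.insert(insert_pos, extent)


if __name__ == "__main__":
    tree = [(0, 4), (10, 5), (30, 2)]
    match, prev_ex, next_ex, pos = lookup(tree, 20)
    if match is None:
        insert_at(tree, pos, (20, 1))
    print(tree)
```

As the message notes, reusing the position is only valid while the structure is unchanged between the lookup and the insertion.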
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Jul 2015 20:08:21 +0000 (13:08 -0700)]
f2fs: callers take care of the page from bio error
This patch makes the caller handle the page after its bio gets an error.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 15 Jul 2015 09:29:49 +0000 (17:29 +0800)]
f2fs: use atomic_t to record hit ratio info of extent cache
Variables for recording extent cache hit ratio info were updated without
protection. This patch changes them to atomic_t type for more
accurate stats.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 15 Jul 2015 09:28:53 +0000 (17:28 +0800)]
f2fs: stat inline xattr inode number
This patch adds a stat for the number of inline xattr inodes, to be
shown in debugfs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Jul 2015 01:31:24 +0000 (18:31 -0700)]
f2fs: use a page temporarily for encrypted gced page
That encrypted page is used temporarily, so we don't need to mark it accessed.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 14 Jul 2015 10:56:10 +0000 (18:56 +0800)]
f2fs: expose f2fs_write_cache_pages
If there are gced dirty pages and normal dirty pages in the mapping
of one inode, we might writeback them alternately with discontinuous
block address, resulting in low performance.
This patch introduces f2fs_write_cache_pages with code copied from
write_cache_pages in mm/page-writeback.c.
In this function, we refactor flow with two steps:
1) writeback all cold type pages.
2) writeback all non-cold type pages.
By using this method, f2fs will write back dirty pages with the same
temperature in batches. This gives the written blocks more
contiguous addresses, so they can be merged as much as possible
in the f2fs bio cache, and it also reduces the chance of submitting
small IO to the block layer.
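The two-step flow above amounts to two passes over the dirty pages. The Python sketch below is a simplified model, not f2fs_write_cache_pages itself: writeback_order and the (index, is_cold) tuples are hypothetical.

```python
def writeback_order(dirty_pages):
    """Return page indexes in two-pass order: cold pages, then the rest.

    dirty_pages is a list of (index, is_cold) tuples in mapping order.
    """
    order = []
    for want_cold in (True, False):  # step 1: cold, step 2: non-cold
        for index, is_cold in dirty_pages:
            if is_cold == want_cold:
                order.append(index)
    return order


if __name__ == "__main__":
    # Odd-indexed pages were moved by gc (cold), even ones written later.
    pages = [(0, False), (1, True), (2, False), (3, True)]
    print(writeback_order(pages))
```

Grouping pages of one temperature keeps each pass writing into one kind of log, which is why the resulting block addresses tend to be contiguous.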
Test environment: 8g nokia sd card (very old sd card, but it shows
better effect when testing with this patch, and with a 32g kingston
sd card, I didn't see much more improvement).
Test step:
1. touch testfile;
2. truncate -s 512K testfile;
3. write all pages with odd index;
4. trigger gc by ioctl;
5. write all pages with even index;
6. time fsync testfile.
before:
real 0m0.402s
user 0m0.000s
sys 0m0.000s
after:
real 0m0.143s
user 0m0.004s
sys 0m0.004s
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:45:19 +0000 (17:45 +0800)]
f2fs: correct return value of ->setxattr
This patch fixes ->setxattr to return the correct error number. The wrong
value was reported by xfstests tests/generic/026 as below:
generic/026 - output mismatch
--- tests/generic/026.out
+++ results/generic/026.out.bad
@@ -4,6 +4,6 @@
1 below acl max
acl max
1 above acl max
-chacl: cannot set access acl on "largeaclfile": Argument list too long
+chacl: cannot set access acl on "largeaclfile": Numerical result out of range
use 16 aces
use 17 aces
...
Ran: generic/026
Failures: generic/026
Failed 1 of 1 tests
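In the diff above, the expected message "Argument list too long" is the text for E2BIG, while "Numerical result out of range" is the glibc text for ERANGE. A minimal Python sketch of a kernel-style size check follows; check_xattr_size and max_len are hypothetical, and where f2fs performs this check is an assumption, not something the message states.

```python
import errno


def check_xattr_size(size, max_len):
    """Return 0, or a negative errno in kernel style (hypothetical helper)."""
    if size > max_len:
        return -errno.E2BIG  # the errno behind "Argument list too long"
    return 0


if __name__ == "__main__":
    ret = check_xattr_size(size=9000, max_len=4096)  # illustrative sizes
    print(ret, errno.errorcode[-ret])
```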
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:44:25 +0000 (17:44 +0800)]
f2fs: cleanup write_orphan_inodes
Previously, since commit 4531929e3922 ("f2fs: move grabing orphan
pages out of protection region") was committed, write_orphan_inodes()
has grabbed all meta pages in a batch before using them under the
spinlock, so that we avoid the long delay of grabbing meta pages under
the spinlock.
Now, commit d6c67a4fee86 ("f2fs: revmove spin_lock for
write_orphan_inodes") has removed the spinlock in write_orphan_inodes,
so the issue described above no longer exists, and we'd better move
the grab operation back to its original place for readability.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:43:19 +0000 (17:43 +0800)]
f2fs: warm up cold page after mmaped write
With the cost-benefit method, background gc will consider an old section
with fewer valid blocks as a candidate victim; the old blocks in that
section will be treated as cold data and later moved into a cold segment.
But if the page being gced is touched by the user through a buffered or
mmaped write, we should reset the page as a non-cold one, because this page
is more likely to be updated again.
So fix this by adding the clearing code for the missed 'mmap' case.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 10 Jul 2015 10:08:10 +0000 (18:08 +0800)]
f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT
When background gc is off, the only way to trigger gc is a forced gc
executed by operations that want to grab space on disk.
The executing condition is limited: to execute a forced gc, we have to
wait until there are almost no free sections left for LFS
allocation. This seems unreasonable for users who want to
control the triggering of gc themselves.
This patch introduces the F2FS_IOC_GARBAGE_COLLECT interface for
triggering garbage collection by using ioctl. It provides our users
one more option to trigger gc.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Jul 2015 09:59:36 +0000 (17:59 +0800)]
f2fs: maintain extent cache in separated file
This patch moves extent cache related code from data.c into extent_cache.c.
Since the extent cache is an independent feature and its code is not related
to the rest of data.c, it's better for us to maintain it in a separate place.
There is no functionality change, but several small coding style fixes
including:
* rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
* rename misspelled word 'untill' to 'until';
* remove unneeded 'return' in the end of f2fs_destroy_extent_tree().
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Wed, 8 Jul 2015 08:02:54 +0000 (16:02 +0800)]
f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN
Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will
be kept in extent cache after split, extents already shorter than
F2FS_MIN_EXTENT_LEN don't need to try split at all.
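The reasoning above can be sketched as follows. This Python model is an illustration only: split_extent is a hypothetical helper, the value 64 for F2FS_MIN_EXTENT_LEN is an assumption not stated in the message, and treating the threshold as inclusive is also an assumption.

```python
F2FS_MIN_EXTENT_LEN = 64  # assumed value, not stated in the message


def split_extent(start, length, offset):
    """Split [start, start + length) around block offset.

    Returns the remaining (start, length) parts that are long enough to
    keep. An extent already shorter than the minimum is not split at all,
    since neither part could reach the minimum.
    """
    if length < F2FS_MIN_EXTENT_LEN:
        return []
    parts = []
    left_len = offset - start
    right_len = start + length - (offset + 1)
    if left_len >= F2FS_MIN_EXTENT_LEN:
        parts.append((start, left_len))
    if right_len >= F2FS_MIN_EXTENT_LEN:
        parts.append((offset + 1, right_len))
    return parts


if __name__ == "__main__":
    print(split_extent(0, 200, 100))  # both parts are long enough
    print(split_extent(0, 50, 10))    # too short to be worth splitting
```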
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Jul 2015 10:24:38 +0000 (18:24 +0800)]
f2fs: fix to update page flag
This patch fixes the updating of page flags (e.g. the Uptodate/cold flag) in
->write_begin.
Otherwise, the page will be non-uptodate when we try to write the entire
page, and the cold data flag in the page will not be cleared when a gced
page is being rewritten.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 29 Jun 2015 23:34:39 +0000 (16:34 -0700)]
f2fs: shrink unreferenced extent_caches first
If an extent_tree entry has a zero reference count, we can drop it from the
cache with higher priority than entries that are currently referenced.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:31:49 +0000 (20:31 +0800)]
f2fs: enhance multithread performance
In ->writepages, we use the writepages mutex lock to serialize all block
address allocation and page submitting pairs from different inodes.
This method lets the delayed dirty pages of one inode be written
as contiguously as possible.
But there is one problem: we did not submit the current cached bio inside
the region protected by the writepages mutex lock, so there is a small
chance that we submit another thread's bio as below, resulting in
more bio splits.
thread 1                        thread 2
->writepages
lock(writepages)
->write_cache_pages
unlock(writepages)
                                lock(writepages)
                                ->write_cache_pages
->f2fs_submit_merged_bio
                                ->writepage
                                unlock(writepages)
fs_mark-6535 [002] .... 2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
fs_mark-6536 [000] .... 2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
fs_mark-6536 [000] .... 2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
fs_mark-6535 [002] .... 2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096
This may really increase the time spent in block layer work, and may cause
larger IO latency.
This patch moves the submitting operation into the region of the writepages
mutex lock to avoid bio splits when concurrent writeback is
intensive.
my test environment: virtual machine,
intel cpu i5 2500, 8GB size memory, 4GB size ramdisk
time fs_mark -t 16 -L 1 -s 524288 -S 1 -d /mnt/f2fs/
before:
real 0m4.244s
user 0m0.088s
sys 0m12.336s
after:
real 0m3.822s
user 0m0.072s
sys 0m10.760s
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:30:40 +0000 (20:30 +0800)]
f2fs: restrict multimedia filename
When testing with fs_mark, some blocks were written out as cold
data which were mixed with warm data, resulting in splitting more
bios.
This is because fs_mark creates files with random filenames as
below:
559551ee~~~~~~~~15Z29OCC05JCKQP60JQ42MKV
559551ee~~~~~~~~NZAZ6X8OA8LHIIP6XD0L58RM
559551ef~~~~~~~~B15YDSWAK789HPSDZKYTW6WM
559551f1~~~~~~~~2DAE5DPS79785BUNTFWBEMP3
559551f1~~~~~~~~1MYDY0BKSQCJPI32Q8C514RM
559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4
559551f3~~~~~~~~1WF42LPRTQJNPPGR3EINKMPE
559551f3~~~~~~~~8Y2NRK7CEPPAA02LY936PJPG
They are regarded as cold files since their filenames end with a
multimedia file extension, but this is wrong, as we only
match the extension at the end of the filename, not the whole filename.
In this patch, we fix the format of a multimedia filename to
"filename + '.' + extension", and we mark a file cold only if its
filename matches that format.
After this change, the probability that we wrongly mark a file as cold
is reduced, which also helps fs_mark's performance on f2fs
a little.
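The difference between the two matching rules can be sketched in Python. This is a model, not the f2fs function: old_is_multimedia assumes the earlier check was a case-insensitive suffix match, new_is_multimedia follows the "filename + '.' + extension" format from the message, and both names are hypothetical.

```python
def old_is_multimedia(name, ext):
    """Assumed earlier behaviour: case-insensitive suffix match only."""
    return name.lower().endswith(ext.lower())


def new_is_multimedia(name, ext):
    """Format from the message: filename + '.' + extension."""
    suffix = "." + ext.lower()
    # Require at least one character of filename before the dot.
    return len(name) > len(suffix) and name.lower().endswith(suffix)


if __name__ == "__main__":
    fs_mark_name = "559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4"
    print(old_is_multimedia(fs_mark_name, "mp4"))
    print(new_is_multimedia(fs_mark_name, "mp4"))
    print(new_is_multimedia("clip.mp4", "mp4"))
```

Under the stricter format, the random fs_mark name is no longer treated as a multimedia file, while an ordinary name with a dot and the extension still is.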
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:29:46 +0000 (20:29 +0800)]
MAINTAINERS: add missed trace file for f2fs
This patch adds the missing trace file to the f2fs entry in MAINTAINERS,
which completes the list of files maintained under f2fs
and also allows people to find the correct mailing list by using
get_maintainer.pl when patching only the trace file of f2fs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Nicholas Krause [Wed, 1 Jul 2015 01:37:21 +0000 (21:37 -0400)]
f2fs: make the function check_dnode have a return type of bool and change it's name to is_alive
This makes the function check_dnode have a return type of bool
due to this particular function only ever returning either one
or zero as its return value and changes the name of the function
to is_alive in order to better explain this function's intended
work of checking if a dnode is still in use by the filesystem.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
[Jaegeuk Kim: change the return value check for the renamed function]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 29 Jun 2015 23:01:14 +0000 (16:01 -0700)]
f2fs: check the largest extent at look-up time
Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee
that the largest extent would be cached in the tree all the time.
Instead of relying on extent_tree, we can simply check the cached one in extent
tree accordingly.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 20 Jun 2015 00:53:26 +0000 (17:53 -0700)]
f2fs: use extent_cache by default
We don't need to handle the duplicate extent information.
The integrated rule is:
- update on-disk extent with largest one tracked by in-memory extent_cache
- destroy extent_tree for the truncation case
- drop per-inode extent_cache by shrinker
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 26 Jun 2015 00:43:04 +0000 (17:43 -0700)]
f2fs: add noextent_cache mount option
This patch adds noextent_cache mount option.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 20:41:23 +0000 (13:41 -0700)]
f2fs: shrink extent_cache entries
This patch registers shrinking extent_caches.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 22:36:07 +0000 (15:36 -0700)]
f2fs: shrink nat_cache entries
This patch registers shrinking nat_cache entries.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 19:01:21 +0000 (12:01 -0700)]
f2fs: introduce a shrinker for mounted fs
This patch introduces a shrinker that aims to reduce the memory footprint
consumed by a number of in-memory f2fs data structures.
In addition, it newly adds:
- sbi->umount_mutex to avoid data races on shrinker and put_super
- sbi->shrinker_run_no to not revisit objects
Note that the basic implementation was copied from fs/ubifs/shrinker.c
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 23 Jun 2015 01:22:38 +0000 (18:22 -0700)]
f2fs: set cached_en after checking finally
This patch relocates cached_en not only to be covered by spin_lock, but also
to set once after checking out completely.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 16 Jun 2015 22:17:01 +0000 (15:17 -0700)]
f2fs: update on-disk extents even under extent_cache
Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the
time, and then finally preserves its up-to-date extent into on-disk one during
f2fs_evict_inode.
But, in the following scenario:
1. mount
2. open & write an extent X
3. f2fs_evict_inode; on-disk extent is X
4. open & update the extent X with Y
5. sync; trigger checkpoint
6. power-cut
after power-on, f2fs should serve extent Y, but we have an on-disk extent X.
This causes a failure on xfstests/311.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 18 Jun 2015 21:17:04 +0000 (14:17 -0700)]
f2fs: fix wrong block address calculation for a split extent
This patch fixes wrong calculation on block address field when an extent is
split.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 17 Jun 2015 20:59:05 +0000 (13:59 -0700)]
f2fs: convert inline_data for various fallocate
For newly added fallocate types, it should convert inline_data before handling
block swapping.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 23 Jun 2015 17:36:08 +0000 (10:36 -0700)]
f2fs: avoid to use failed inode immediately
Before iput is called, the inode number used by a bad inode can be reassigned
to another new inode, resulting in abnormal behavior on the new inode.
This should not happen for the new inode.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 15 Jun 2015 21:52:29 +0000 (14:52 -0700)]
f2fs: avoid freed stat information
The write_checkpoint can update stat information, so we should destroy the stat
structure after it.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 29 Jun 2015 10:14:10 +0000 (18:14 +0800)]
f2fs: fix to record dirty page count for symlink
Dirty pages can exist in the mapping of a newly created symlink, but previously
we did not maintain the dirty page count for symlinks as we did
for regular files and directories, so the count we looked up was wrong.
This patch adds the missing dirty page counting for symlinks to fix this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Markus Elfring [Fri, 26 Jun 2015 15:28:55 +0000 (17:28 +0200)]
f2fs crypto: delete an unnecessary check before the function call "key_put"
The key_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Tue, 4 Aug 2015 16:27:19 +0000 (09:27 -0700)]
Merge tag 'pci-v4.2-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"This is a trivial fix for a change that broke user program compilation
(QEMU in this case)"
* tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
Linus Torvalds [Tue, 4 Aug 2015 15:51:06 +0000 (08:51 -0700)]
Merge tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel
Pull drm mst fixes from Daniel Vetter:
"Special pull request for mst fixes since most of the patches touch
code outside of i915 proper. DRM parts have also been reviewed by
Thierry (nvidia) since Dave's enjoying vacations"
* tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel:
drm/atomic-helpers: Make encoder picking more robust
drm/dp-mst: Remove debug WARN_ON
drm/i915: Fixup dp mst encoder selection
drm/atomic-helper: Add an atomice best_encoder callback
Linus Torvalds [Tue, 4 Aug 2015 15:49:08 +0000 (08:49 -0700)]
Merge tag 'for-linus-4.2-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- don't lose interrupts when offlining CPUs
- fix gntdev oops during unmap
- drop the balloon lock occasionally to allow domain create/destroy
* tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events/fifo: Handle linked events when closing a port
xen: release lock occasionally during ballooning
xen/gntdevt: Fix race condition in gntdev_release()
Ross Lagerwall [Fri, 31 Jul 2015 13:30:42 +0000 (14:30 +0100)]
xen/events/fifo: Handle linked events when closing a port
An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue. If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.
When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.
If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.
This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Linus Torvalds [Tue, 4 Aug 2015 13:57:32 +0000 (06:57 -0700)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
"Two fixes for kbuild:
- The new ARCH_{CPP,A,C}FLAGS variables are reset before including
the arch Makefile
- Fix calling make modules_install twice when module compression is
enabled"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Makefile: Force gzip and xz on module install
kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment
Daniel Vetter [Mon, 3 Aug 2015 15:24:11 +0000 (17:24 +0200)]
drm/atomic-helpers: Make encoder picking more robust
We've had a few issues with atomic where subtle bugs in the encoder
picking logic lead to accidental self-stealing of the encoder,
resulting in a NULL connector_state->crtc in update_connector_routing
and subsequent.
Linus applied some duct-tape for an mst regression in
commit
27667f4744fc5a0f3e50910e78740bac5670d18b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Jul 29 22:18:16 2015 -0700
i915: temporary fix for DP MST docking station NULL pointer dereference
But that was incomplete (the code will still oops when debugging is
enabled) and mangled the state even further. So instead WARN and bail
out as the more future-proof option.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:10 +0000 (17:24 +0200)]
drm/dp-mst: Remove debug WARN_ON
Apparently been in there since forever and fairly easy to hit when
hotplugging really fast. I can do that since my mst hub has a manual
button to flick the hpd line for reprobing. The resulting WARNING spam
isn't pretty.
Cc: Dave Airlie <airlied@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:09 +0000 (17:24 +0200)]
drm/i915: Fixup dp mst encoder selection
In
commit
8c7b5ccb729870e606321b3703e2c2e698c49a95
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Tue Apr 21 17:13:19 2015 +0300
drm/i915: Use atomic helpers for computing changed flags
we've switched over to the atomic version to compute the
crtc->encoder->connector routing from the i915 variant. That one
relies upon the ->best_encoder callback, but the i915-private version
relied upon intel_find_encoder. Which didn't matter except for dp mst,
where the encoder depends upon the selected crtc.
Fix this functional bug by implementing a correct atomic-state based
encoder selector for dp mst.
Note that we can't get rid of the legacy best_encoder callback since
the fbdev emulation uses that still. That means it's incorrect there
still, but that's been the case ever since i915 dp mst support was
merged so not a regression. Best to fix that by converting fbdev over
to atomic too.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:08 +0000 (17:24 +0200)]
drm/atomic-helper: Add an atomice best_encoder callback
With legacy helpers all the routing was already set up when calling
best_encoder and so could be inspected. But with atomic it's staged,
hence we need a new atomic compliant callback for drivers which need
to inspect the requested state and can't just decide the best encoder
statically.
This is needed to fix up i915 dp mst where we need to pick the right
encoder depending upon the requested CRTC for the connector.
v2: Don't forget to amend the kerneldoc
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Linus Torvalds [Mon, 3 Aug 2015 21:51:30 +0000 (14:51 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A refcounting bugfix for the i2c-core, bugfixes for the generic bus
recovery algorithm and for its omap-user, making binary file
attributes for EEPROMs behave POSIX compliant, and a small typo fix
while we are here"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: fix leaked device refcount on of_find_i2c_* error path
i2c: Fix typo in i2c-bfin-twi.c
i2c: omap: fix bus recovery setup
i2c: core: only use set_scl for bus recovery after calling prepare_recovery
misc: eeprom: at24: clean up at24_bin_write()
i2c: slave eeprom: clean up sysfs bin attribute read()/write()
Linus Torvalds [Mon, 3 Aug 2015 18:09:07 +0000 (11:09 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There are two critical regression fixes for CephFS from Zheng, and an
RBD completion fix for layered images from Ilya"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix copyup completion race
ceph: always re-send cap flushes when MDS recovers
ceph: fix ceph_encode_locks_to_buffer()
Linus Torvalds [Mon, 3 Aug 2015 18:00:53 +0000 (11:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull security layer fix from James Morris:
"Yama initialization fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Adding YAMA hooks also when YAMA is not stacked.
Linus Torvalds [Mon, 3 Aug 2015 17:53:58 +0000 (10:53 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- a bogus BUG_ON in ixp4xx that can be triggered by a dst buffer that
is an SG list.
- the error handling in hwrngd may cause a crash in case of an error.
- fix a race condition in qat registration when multiple devices are
present"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: core - correct error check of kthread_run call
crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
crypto: qat - Fix invalid synchronization between register/unregister sym algs
Linus Torvalds [Mon, 3 Aug 2015 17:25:32 +0000 (10:25 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fix from Rusty Russell:
"Single overzealous locking assertion fix"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: weaken locking assertion for oops path.
Salvatore Mesoraca [Mon, 3 Aug 2015 10:40:51 +0000 (12:40 +0200)]
Adding YAMA hooks also when YAMA is not stacked.
Without this patch YAMA will not work at all if it is chosen
as the primary LSM instead of being "stacked".
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Mon, 3 Aug 2015 01:34:55 +0000 (18:34 -0700)]
Linux 4.2-rc5
Linus Torvalds [Mon, 3 Aug 2015 01:07:36 +0000 (18:07 -0700)]
Merge tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- TCE table memory calculation fix from Alexey
- Build fix for ans-lcd from Luis
- Unbalanced IRQ warning fix from Alistair
* tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/eeh-powernv: Fix unbalanced IRQ warning
macintosh/ans-lcd: fix build failure after module_init/exit relocation
powerpc/powernv/ioda2: Fix calculation for memory allocated for TCE table
Linus Torvalds [Thu, 30 Jul 2015 05:18:16 +0000 (22:18 -0700)]
i915: temporary fix for DP MST docking station NULL pointer dereference
Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if
attached to the docking station. This is a regression that he was able
to bisect to commit 8c7b5ccb7298: "drm/i915: Use atomic helpers for
computing changed flags:"
The reason seems to be the new call to drm_atomic_helper_check_modeset()
added to intel_modeset_compute_config(), which in turn calls
update_connector_routing(), and somehow ends up picking a NULL crtc for
the connector state, causing the subsequent drm_crtc_index() to OOPS.
Daniel Vetter says that the fundamental issue seems to be confusion in
the encoder selection, and this isn't the right fix, but while he chases
down the proper fix, this at least avoids the NULL pointer dereference
and makes Ted's docking station work again.
Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 2 Aug 2015 16:36:21 +0000 (09:36 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of three fixes for the ipr driver and one fairly major one for
memory leaks in the mq path of SCSI"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fix memory leak with scsi-mq
ipr: Fix invalid array indexing for HRRQ
ipr: Fix incorrect trace indexing
ipr: Fix locking for unit attention handling
Linus Torvalds [Sun, 2 Aug 2015 16:12:46 +0000 (09:12 -0700)]
Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Things are calming down nicely here w.r.t. fixes. This batch
includes two weeks' worth since I missed sending it before -rc4.
Nothing particularly scary to point out, smaller fixes here and there.
Shortlog describes it pretty well"
* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: keystone: fix dt bindings to use post div register for mainpll
ARM: nomadik: disable UART0 on Nomadik boards
ARM: dts: i.MX35: Fix can support.
ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc
ARM: dts: add CPU OPP and regulator supply property for exynos4210
ARM: dts: Update video-phy node with syscon phandle for exynos3250
ARM: DRA7: hwmod: fix gpmc hwmod
Linus Torvalds [Sun, 2 Aug 2015 00:42:14 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fix from Al Viro:
"Spurious ENOTDIR fix"
This should fix the problems reported by Dominique Martinet and Hugh
Dickins.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
link_path_walk(): be careful when failing with ENOTDIR
Al Viro [Sat, 1 Aug 2015 23:59:28 +0000 (19:59 -0400)]
link_path_walk(): be careful when failing with ENOTDIR
In RCU mode we might end up with a dentry evicted just as we check
that it's a directory. In such a case we should return ECHILD
rather than ENOTDIR, so that the pathwalk is retried in non-RCU
mode.
Breakage had been introduced in commit b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid. That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory. The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.
Note that all branches since 3.12 have that problem...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
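The check described in this commit message can be illustrated with a minimal userspace sketch. It assumes a simplified sequence counter; `toy_dentry` and `toy_check_dir` are hypothetical names for illustration and are not the VFS code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: a sequence counter plus a type flag. */
struct toy_dentry {
    unsigned seq;   /* bumped whenever the dentry changes under the walker */
    bool is_dir;
};

/*
 * Lockless ("RCU mode") directory check. seq_snapshot is the counter value
 * sampled when the walk first looked at the dentry.
 */
static int toy_check_dir(const struct toy_dentry *d, unsigned seq_snapshot)
{
    if (d->is_dir)
        return 0;
    /* Not a directory: only trust that answer if nothing changed meanwhile. */
    if (d->seq != seq_snapshot)
        return -ECHILD;   /* stale view, caller retries in non-RCU mode */
    return -ENOTDIR;      /* stable view, genuinely not a directory */
}
```

With a matching snapshot the sketch reports ENOTDIR; with a mismatching one it reports ECHILD, so the caller falls back to the locked walk instead of failing spuriously.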
Linus Torvalds [Sat, 1 Aug 2015 19:47:04 +0000 (12:47 -0700)]
Merge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"We had a regression due to reuse of descriptor so we have reverted
that.
The rest are driver fixes:
- at_hdmac and at_xdmac for residue, transfer width, and channel config
- pl330 final fix for dma fails and overflow issue
- xgene resource map fix
- mv_xor big endian op fix"
* tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
dmaengine: mv_xor: fix big endian operation in register mode
dmaengine: xgene-dma: Fix the resource map to handle overlapping
dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
dmaengine: at_hdmac: fix residue computation
dmaengine: at_xdmac: fix bug about channel configuration
dmaengine: pl330: Really fix choppy sound because of wrong residue calculation
dmaengine: pl330: Fix overflow when reporting residue in memcpy
Linus Torvalds [Sat, 1 Aug 2015 16:47:11 +0000 (09:47 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixlets from Thomas Gleixner:
"Just two updates to the maintainers file"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers
MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer
Linus Torvalds [Sat, 1 Aug 2015 16:16:33 +0000 (09:16 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fallout from the recent NMI fixes: make x86 LDT handling more robust.
Also some EFI fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ldt: Make modify_ldt synchronous
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
efi: Check for NULL efi kernel parameters
x86/efi: Use all 64 bit of efi_memmap in setup_e820()
Vladimir Zapolskiy [Mon, 27 Jul 2015 14:30:48 +0000 (17:30 +0300)]
i2c: fix leaked device refcount on of_find_i2c_* error path
If of_find_i2c_device_by_node() or of_find_i2c_adapter_by_node() find
a device by node, but its type does not match, a reference to that
device is still held. This change fixes the problem.
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
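The leak pattern described in this commit message can be sketched with a toy reference count. This is a hedged illustration only: `toy_device`, `toy_get`, `toy_put` and `toy_find_client` are hypothetical stand-ins, not the i2c core functions.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_CLIENT, TOY_ADAPTER };

/* Hypothetical device with a plain integer reference count. */
struct toy_device {
    int refcount;
    enum toy_type type;
};

static struct toy_device *toy_get(struct toy_device *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

static void toy_put(struct toy_device *dev)
{
    if (dev)
        dev->refcount--;
}

/* Lookup helper: takes a reference on the candidate, as a bus find helper would. */
static struct toy_device *toy_find_client(struct toy_device *candidate)
{
    struct toy_device *dev = toy_get(candidate);

    if (!dev)
        return NULL;
    if (dev->type != TOY_CLIENT) {
        toy_put(dev);   /* without this drop, the mismatch path leaks a reference */
        return NULL;
    }
    return dev;         /* caller now owns one reference */
}
```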
Linus Torvalds [Sat, 1 Aug 2015 00:10:56 +0000 (17:10 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Must teardown SR-IOV before unregistering netdev in igb driver, from
Alex Williamson.
2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.
3) Default route selection in ipv4 should take the prefix length, table
ID, and TOS into account, from Julian Anastasov.
4) sch_plug must have a reset method in order to purge all buffered
packets when the qdisc is reset, likewise for sch_choke, from WANG
Cong.
5) Fix deadlock and races in slave_changelink/br_setport in bridging.
From Nikolay Aleksandrov.
6) mlx4 bug fixes (wrong index in port event propagation to VFs,
overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
Morgenstein, and Or Gerlitz.
7) Turn off klog message about SCTP userspace interface compat that
makes no sense at all, from Daniel Borkmann.
8) Fix unbounded restarts of inet frag eviction process, causing NMI
watchdog soft lockup messages, from Florian Westphal.
9) Suspend/resume fixes for r8152 from Hayes Wang.
10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
Sabrina Dubroca.
11) Fix performance regression when removing a lot of routes from the
ipv4 routing tables, from Alexander Duyck.
12) Fix device leak in AF_PACKET, from Lars Westerhoff.
13) AF_PACKET also has a header length comparison bug due to signedness,
from Alexander Drozdov.
14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.
15) Memory leaks, TSO stats, watchdog timeout and other fixes to
thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.
16) act_bpf can leak memory when replacing programs, from Daniel
Borkmann.
17) WOL packet fixes in gianfar driver, from Claudiu Manoil.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
stmmac: fix missing MODULE_LICENSE in stmmac_platform
gianfar: Enable device wakeup when appropriate
gianfar: Fix suspend/resume for wol magic packet
gianfar: Fix warning when CONFIG_PM off
act_pedit: check binding before calling tcf_hash_release()
net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
net: sched: fix refcount imbalance in actions
r8152: reset device when tx timeout
r8152: add pre_reset and post_reset
qlcnic: Fix corruption while copying
act_bpf: fix memory leaks when replacing bpf programs
net: thunderx: Fix for crash while BGX teardown
net: thunderx: Add PCI driver shutdown routine
net: thunderx: Fix crash when changing rss with mutliple traffic flows
net: thunderx: Set watchdog timeout value
net: thunderx: Wakeup TXQ only if CQE_TX are processed
net: thunderx: Suppress alloc_pages() failure warnings
net: thunderx: Fix TSO packet statistic
net: thunderx: Fix memory leak when changing queue count
net: thunderx: Fix RQ_DROP miscalculation
...
Linus Torvalds [Sat, 1 Aug 2015 00:05:37 +0000 (17:05 -0700)]
Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Filipe fixed up a hard to trigger ENOSPC regression from our merge
window pull, and we have a few other smaller fixes"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix quick exhaustion of the system array in the superblock
btrfs: its btrfs_err() instead of btrfs_error()
btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
Linus Torvalds [Sat, 1 Aug 2015 00:00:25 +0000 (17:00 -0700)]
Merge tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a relatively big update as it includes the collected ASoC
fixes. There are a few fixes in ASoC core side, mostly for DAPM and
the new topology API. The rest are various ASoC driver-specific
fixes, as well as the usual HD-audio and USB-audio quirks"
* tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: hda - Fix MacBook Pro 5,2 quirk
ALSA: hda - Fix race between PM ops and HDA init/probe
ALSA: usb-audio: add dB range mapping for some devices
ALSA: hda - Apply a fixup to Dell Vostro 5480
ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
ALSA: hda - Apply fixup for another Toshiba Satellite S50D
ALSA: fireworks: add support for AudioFire2 quirk
ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
ALSA: hda - fix cs4210_spdif_automute()
ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
ASoC: dapm: Don't add prefix to widget stream name
ASoC: rt5645: Check if codec is initialized in workqueue handler
ASoC: Intel: Get correct usage_count value to load firmware
ASoC: topology: Fix to add dapm mixer info
ASoC: zx: spdif: Fix devm_ioremap_resource return value check
ASoC: zx: i2s: Fix devm_ioremap_resource return value check
ASoC: mediatek: Use platform_of_node for machine drivers
ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
...
Joachim Eastwood [Fri, 31 Jul 2015 17:13:22 +0000 (19:13 +0200)]
stmmac: fix missing MODULE_LICENSE in stmmac_platform
Commit 50649ab14982 ("stmmac: drop driver from stmmac platform code")
was a bit overzealous in removing code and dropped the MODULE_*
macros that are still needed since stmmac_platform can be a module.
Fix this by putting the macros removed in 50649ab14982 back.
This fixes the following errors when used as a module:
stmmac_platform: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint
stmmac_platform: Unknown symbol devm_kmalloc (err 0)
stmmac_platform: Unknown symbol stmmac_suspend (err 0)
stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
stmmac_platform: Unknown symbol platform_get_resource (err 0)
stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
stmmac_platform: Unknown symbol of_alias_get_id (err 0)
stmmac_platform: Unknown symbol stmmac_resume (err 0)
stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)
Fixes: 50649ab14982 ("stmmac: drop driver from stmmac platform code")
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 31 Jul 2015 22:41:50 +0000 (15:41 -0700)]
Merge branch 'gianfar-wol-fixes'
Claudiu Manoil says:
====================
gianfar: wol magic packet fixes
These changes were already validated as part of FSL SDK.
Patch 2 fixes occasional wake-on magic packet failures during
traffic, probably due to incorrect traffic stop/ device halt
sequence and incorrect usage of txlock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:33 +0000 (18:38 +0300)]
gianfar: Enable device wakeup when appropriate
The wol_en flag is 0 by default anyway, and we have the
following inconsistency: a MAGIC packet wol capable eth
interface is registered as a wake-up source but unable
to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
Calling set_wakeup_enable() at netdev open is just redundant
because wol_en is 0 by default.
Let only ethtool call set_wakeup_enable() for now.
The bflock is obviously obsoleted, its utility has been corroded
over time. The bitfield flags used today in gianfar are accessed
only on the init/ config path, with no real possibility of
concurrency - nothing that would justify something like bflock.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:32 +0000 (18:38 +0300)]
gianfar: Fix suspend/resume for wol magic packet
If we disable NAPI in the first place we can mask the device's
interrupts (and halt it) without fearing that imask may be
concurrently accessed from interrupt context, so there's
no need to do local_irq_save() around gfar_halt_nodisable().
lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
buggy routines. The txlock is currently used in the driver only
to manage TX congestion, it has nothing to do with halting the
device. With these changes, the TX processing is stopped before
gfar_halt().
Compact gfar_halt() is used instead of gfar_halt_nodisable(),
as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
gfar_start() re-enables all these blocks on resume. Enabling
the magic-packet mode remains the same, note that the RX block
is re-enabled just before entering sleep mode.
Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
that the interrupt line must remain active during sleep in order
to wake the system by magic packet (MAG) reception interrupt.
(On some systems the MAG interrupt did trigger w/o this flag
as well, but on others it didn't.)
Without these fixes, when suspended during fair Tx traffic the
interface occasionally failed to be woken up by magic packet.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:31 +0000 (18:38 +0300)]
gianfar: Fix warning when CONFIG_PM off
CC drivers/net/ethernet/freescale/gianfar.o
drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
defined but not used [-Wunused-function]
static void lock_tx_qs(struct gfar_private *priv)
^
drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
defined but not used [-Wunused-function]
static void unlock_tx_qs(struct gfar_private *priv)
^
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 31 Jul 2015 00:12:21 +0000 (17:12 -0700)]
act_pedit: check binding before calling tcf_hash_release()
When we share an action within a filter, the bind refcnt
should increase, therefore we should not call tcf_hash_release().
Fixes: 1a29321ed045 ("net_sched: act: Dont increment refcnt on replace")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Murali Karicheri [Fri, 29 May 2015 16:04:13 +0000 (12:04 -0400)]
ARM: dts: keystone: fix dt bindings to use post div register for mainpll
All of the keystone devices have a separate register to hold post
divider value for main pll clock. Currently the fixed-postdiv
value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to
use a value of 2 for this. Now that we have fixed this in the pll
clock driver change the dt bindings for the same.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 31 Jul 2015 19:34:10 +0000 (12:34 -0700)]
Merge tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"These fixes are all for the AMD IOMMU driver:
- A regression with HSA caused by the conversion of the driver to
default domains. The fixes make sure that an HSA device can still
be attached to an IOMMUv2 domain and that these domains also allow
non-IOMMUv2 capable devices.
- Fix iommu=pt mode which did not work because the dma_ops were set
to nommu_ops, which breaks devices that can only do 32bit DMA.
- Fix an issue with non-PCI devices not working, because there are no
dma_ops for them. This issue was discovered recently as new AMD
x86 platforms have non-PCI devices too"
* tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Allow non-ATS devices in IOMMUv2 domains
iommu/amd: Set global dma_ops if swiotlb is disabled
iommu/amd: Use swiotlb in passthrough mode
iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
iommu/amd: Use iommu core for passthrough mode
iommu/amd: Use iommu_attach_group()
Linus Torvalds [Fri, 31 Jul 2015 19:11:01 +0000 (12:11 -0700)]
Merge tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel
Pull drm intel fixes from Daniel Vetter:
"I delayed my -fixes pull a bit hoping that I could include a fix for
the dp mst stuff but looks a bit more nasty than that. So just 3
other regression fixes, one 4.2 other two cc: stable"
* tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Declare the swizzling unknown for L-shaped configurations
drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
drm/i915: Replace WARN inside I915_READ64_2x32 with retry loop
Linus Torvalds [Fri, 31 Jul 2015 19:05:02 +0000 (12:05 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This has a bunch of nouveau fixes, as Ben has been hibernating and has
lots of small fixes for lots of bugs across nouveau.
Radeon has one major fix for hdmi/dp audio regression that is larger
than Alex would like, but seems to fix up a fair few bugs, along with
some misc fixes.
And a few msm fixes, one of which is also a bit large.
But nothing in here seems insane or crazy for this stage, just more
than I'd like"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
drm/msm/mdp5: release SMB (shared memory blocks) in various cases
drm/msm: change to uninterruptible wait in atomic commit
drm/msm: mdp4: Fix drm_framebuffer dereference crash
drm/msm: fix msm_gem_prime_get_sg_table()
drm/amdgpu: add new parameter to seperate map and unmap
drm/amdgpu: hdp_flush is not needed for inside IB
drm/amdgpu: different emit_ib for gfx and compute
drm/amdgpu: information leak in amdgpu_info_ioctl()
drm/amdgpu: clean up init sequence for failures
drm/radeon/combios: add some validation of lvds values
drm/radeon: rework audio modeset to handle non-audio hdmi features
drm/radeon: rework audio detect (v4)
drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
drm/nouveau/nouveau/ttm: fix tiled system memory with Maxwell
drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
drm/nouveau/fbcon/g80: reduce PUSH_SPACE alloc, fire ring on accel init
drm/nouveau/fbcon/gf100-: reduce RING_SPACE allocation
drm/nouveau/fbcon/nv11-: correctly account for ring space usage
drm/nouveau/bios: add proper support for opcode 0x59
...
Jun Nie [Fri, 10 Jul 2015 12:02:49 +0000 (20:02 +0800)]
Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
This reverts commit b9855f03d560d351e95301b9de0bc3cad3b31fe9.
The patch breaks existing DMA use cases. For example, the audio SoC
dmaengine never releases its channel, which causes virt-dma to cache so
many descriptors that system memory is exhausted.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Thomas Petazzoni [Wed, 8 Jul 2015 14:28:14 +0000 (16:28 +0200)]
dmaengine: mv_xor: fix big endian operation in register mode
Commit 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command
in descriptor mode") introduced support for a feature that
appeared in Armada 38x: specifying the operation to be performed on a
per-descriptor basis rather than globally per channel.
However, when doing so, it changed the function mv_chan_set_mode() to
use:
if (IS_ENABLED(__BIG_ENDIAN))
instead of:
#if defined(__BIG_ENDIAN)
While IS_ENABLED() is perfectly fine for CONFIG_* symbols, it is not
for other symbols such as __BIG_ENDIAN that is provided directly by
the compiler. Consequently, the commit broke support for big-endian,
as the XOR_DESCRIPTOR_SWAP flag was not set in the XOR channel
configuration register.
The primary visible effect was some nasty warnings and failures
appearing during the self-test of the XOR unit:
[ 1.197368] mv_xor d0060900.xor: error on chan 0. intr cause 0x00000082
[ 1.197393] mv_xor d0060900.xor: config 0x00008440
[ 1.197410] mv_xor d0060900.xor: activation 0x00000000
[ 1.197427] mv_xor d0060900.xor: intr cause 0x00000082
[ 1.197443] mv_xor d0060900.xor: intr mask 0x000003f7
[ 1.197460] mv_xor d0060900.xor: error cause 0x00000000
[ 1.197477] mv_xor d0060900.xor: error addr 0x00000000
[ 1.197491] ------------[ cut here ]------------
[ 1.197513] WARNING: CPU: 0 PID: 1 at ../drivers/dma/mv_xor.c:664 mv_xor_interrupt_handler+0x14c/0x170()
See also:
http://storage.kernelci.org/next/next-20150617/arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y/lab-khilman/boot-armada-xp-openblocks-ax3-4.txt
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command in descriptor mode")
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
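The difference between the two tests in the mv_xor commit above can be reproduced with a simplified userspace copy of the preprocessor trick behind IS_ENABLED(). This is a sketch under stated assumptions: every `TOY_` name is hypothetical, and the value 4321 merely stands in for a symbol that is defined to something other than 1.

```c
#include <assert.h>

/* Simplified re-creation of the IS_ENABLED() mechanism (hypothetical TOY_ names). */
#define TOY_ARG_PLACEHOLDER_1 0,
#define TOY_SECOND_ARG(ignored, val, ...) val
#define TOY_IS_ENABLED(x)   TOY_IS_ENABLED_1(x)
#define TOY_IS_ENABLED_1(v) TOY_IS_ENABLED_2(TOY_ARG_PLACEHOLDER_##v)
/* If the symbol expanded to 1, the placeholder injects "0," and shifts 1 into val. */
#define TOY_IS_ENABLED_2(arg1_or_junk) TOY_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define TOY_CONFIG_FOO 1      /* Kconfig-style symbol: defined to exactly 1 */
#define TOY_BIG_ENDIAN 4321   /* defined, but not to 1 (illustrative value) */

/* Evaluates to 0: the pasted token TOY_ARG_PLACEHOLDER_4321 is not a macro. */
static int toy_via_is_enabled(void)
{
    return TOY_IS_ENABLED(TOY_BIG_ENDIAN);
}

/* Evaluates to 1: defined() only asks whether the symbol exists. */
static int toy_via_defined(void)
{
#if defined(TOY_BIG_ENDIAN)
    return 1;
#else
    return 0;
#endif
}
```

The sketch shows why the `if (IS_ENABLED(...))` form silently compiled out the big-endian branch: the trick recognises only symbols that expand to 1, so any other definition looks disabled.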
Rameshwar Prasad Sahu [Tue, 7 Jul 2015 10:04:25 +0000 (15:34 +0530)]
dmaengine: xgene-dma: Fix the resource map to handle overlapping
There is an overlap in dma ring cmd csr region due to sharing of ethernet
ring cmd csr region. This patch fixes the resource overlap by mapping
the entire dma ring cmd csr region.
Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cyrille Pitchen [Tue, 30 Jun 2015 12:36:57 +0000 (14:36 +0200)]
dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
This patch adds the missing update of the transfer data width in
at_xdmac_prep_slave_sg().
Indeed, for each item in the scatter-gather list, we check whether the
transfer length is aligned with the data width provided by
dmaengine_slave_config(). If so, we directly use this data width for the
current part of the transfer we are preparing. Otherwise, the data width
is reduced to 8 bits (1 byte). Of course, the actual number of register
accesses must also be updated to match the new data width.
So one chunk was missing in the original patch (see Fixes tag below): the
number of register accesses was correctly set to (len >> fixed_dwidth) in
mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg
may change for each part of the scatter-gather transfer this also explains
why the original patch used the Descriptor View 2 instead of the
Descriptor View 1.
Let's take the example of a DMA transfer to write 8bit data into an Atmel
USART with FIFOs. When FIFOs are enabled in the USART, its Transmit
Holding Register (THR) works in multidata mode, that is to say that up to
4 8bit data can be written into the THR in a single 32bit access and it is
still possible to write only one data with a 8bit access. To take
advantage of this new feature, the DMA driver was modified to allow
multiple dwidths when doing slave transfers.
For instance, when the total length is 22 bytes, the USART driver splits
the transfer into 2 parts:
First part: 20 bytes transferred through 5 32bit writes into THR
Second part: 2 bytes transferred through 2 8bit writes into THR
For the second part, the data width was first set to 4_BYTES by the USART
driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg()
reduces this data width to 1_BYTE because the 2 byte length is not aligned
with the original 4_BYTES data width. Since the data width is modified,
the actual number of writes into THR must be set accordingly.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes: 6d3a7d9e3ada ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers")
Cc: stable@vger.kernel.org #4.0 and later
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
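The 22-byte example from the at_xdmac commit above can be worked through in a few lines. This is a minimal sketch, assuming the data width is encoded as log2 of the access size in bytes; `toy_chunk` and `toy_prep_chunk` are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stddef.h>

/* dwidth is log2 of the access size in bytes: 0 -> 1 byte, 2 -> 4 bytes. */
struct toy_chunk {
    unsigned dwidth;    /* width actually used for this scatter-gather item */
    size_t accesses;    /* number of register accesses at that width */
};

/*
 * Keep the configured width when the length is aligned to it, otherwise
 * fall back to byte accesses; the access count must follow the same width.
 */
static struct toy_chunk toy_prep_chunk(size_t len, unsigned cfg_dwidth)
{
    struct toy_chunk c;
    size_t mask = ((size_t)1 << cfg_dwidth) - 1;

    c.dwidth = (len & mask) ? 0 : cfg_dwidth;
    c.accesses = len >> c.dwidth;
    return c;
}
```

For the two parts of the transfer, `toy_prep_chunk(20, 2)` yields 5 accesses at 4 bytes and `toy_prep_chunk(2, 2)` yields 2 accesses at 1 byte. The width and the access count have to be updated together, which is the pairing the original patch missed.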
Cyrille Pitchen [Thu, 18 Jun 2015 11:25:41 +0000 (13:25 +0200)]
dmaengine: at_hdmac: fix residue computation
As claimed by the programmer datasheet and confirmed by the IP designer,
the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
transfers.
Both the SRC_WIDTH and BTSIZE bitfields can be extracted from the CTRLAx
register to compute the DMA residue. So the 'tx_width' field is useless
and can be removed from the struct at_desc.
Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
correctly initialized according to the SRC_WIDTH but 'tx_width' was always
set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
bad DMA residue when 'tx_width' != SRC_WIDTH.
Also the 'tx_width' field was mostly set only in the first and last
descriptors. Depending on the kind of DMA transfer, this field remained
uninitialized for intermediate descriptors. The accurate DMA residue was
computed only when the currently processed descriptor was the first or the
last of the chain. This algorithm was a little bit odd. An accurate DMA
residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
in the CTRLAx register.
Finally, the test to check whether the currently processed descriptor is
the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
is NOT equal to zero, since set_desc_eol() is never called, but logically
equal to first_desc->txd.phys. This bug has a side effect on the
drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
to receive data. Since the DMA residue was wrong each time the DMA
transfer reaches the second (and last) period of the transfer, no more
data were received by the USART driver till the cyclic DMA transfer loops
back to the first period.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Torsten Fleischer <torfl6749@gmail.com>
Tested-by: Jirí Prchal <jiri.prchal@aksignal.cz>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
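The relation between BTSIZE and SRC_WIDTH described in the at_hdmac commit above can be sketched as a small decode helper. The bit positions used here (BTSIZE in bits 15:0, SRC_WIDTH as log2 bytes in bits 25:24) are assumptions for illustration, and the `TOY_` names are hypothetical. How the driver folds this value into the final residue is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define TOY_BTSIZE_MASK     0xFFFFu   /* number of SRC_WIDTH-sized transfers */
#define TOY_SRC_WIDTH_SHIFT 24
#define TOY_SRC_WIDTH_MASK  0x3u      /* log2 of the source width in bytes */

/* Convert a CTRLAx-style value into a byte count. */
static uint32_t toy_ctrla_to_bytes(uint32_t ctrla)
{
    uint32_t btsize = ctrla & TOY_BTSIZE_MASK;
    uint32_t src_width = (ctrla >> TOY_SRC_WIDTH_SHIFT) & TOY_SRC_WIDTH_MASK;

    /* BTSIZE counts transfers, so scale by the width to get bytes. */
    return btsize << src_width;
}
```

Because both fields come from the same register value, no separate per-descriptor width field is needed for the conversion.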
Ludovic Desroches [Wed, 17 Jun 2015 14:22:26 +0000 (16:22 +0200)]
dmaengine: at_xdmac: fix bug about channel configuration
When using descriptor view 2 or higher, we don't write the configuration
into AT_XDMAC_CC register because this configuration will be fetched from
the descriptor. Unfortunately, the PROT bit is not updated with this
method; we have to do it manually before enabling the channel.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Joerg Roedel [Thu, 30 Jul 2015 09:24:45 +0000 (11:24 +0200)]
iommu/amd: Allow non-ATS devices in IOMMUv2 domains
With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Masanari Iida [Tue, 28 Jul 2015 11:11:23 +0000 (20:11 +0900)]
i2c: Fix typo in i2c-bfin-twi.c
This patch fixes some typos found in a printk message and
MODULE_DESCRIPTION.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Jan Luebbe [Wed, 8 Jul 2015 14:35:27 +0000 (16:35 +0200)]
i2c: omap: fix bus recovery setup
At least on the AM335x, enabling OMAP_I2C_SYSTEST_ST_EN is not enough to
allow direct access to the SCL and SDA pins. In addition to ST_EN, we
need to set the TMODE to 0b11 (Loop back & SDA/SCL IO mode select).
Also, as the reset values of SCL_O and SDA_O are 0 (which means "drive
low level"), we need to set them to 1 (which means "high-impedance") to
avoid unwanted changes on the pins.
As a precaution, reset all these bits to their default values after
recovery is complete.
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Jan Luebbe [Wed, 8 Jul 2015 14:35:06 +0000 (16:35 +0200)]
i2c: core: only use set_scl for bus recovery after calling prepare_recovery
Using set_scl may be ineffective before calling the driver specific
prepare_recovery callback, which might change into a test mode. So
instead of setting SCL in i2c_generic_scl_recovery, move it to
i2c_generic_recovery (after the optional prepare_recovery).
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
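The ordering problem can be shown with a toy model. Everything below is
hypothetical (struct fake_bus and the fake_* helpers are not kernel APIs); it
only assumes a device on which set_scl has no effect until prepare_recovery
has switched it into a test mode.

```c
#include <stdbool.h>

/* Hypothetical device model: SCL can only be driven while in test mode. */
struct fake_bus {
    bool test_mode;
    int scl;
};

void fake_prepare_recovery(struct fake_bus *bus)
{
    bus->test_mode = true;
}

void fake_set_scl(struct fake_bus *bus, int level)
{
    if (bus->test_mode)     /* silently ignored outside test mode */
        bus->scl = level;
}

/* Old order: set SCL first, then prepare. The write is lost. */
int recover_old_order(struct fake_bus *bus)
{
    fake_set_scl(bus, 1);
    fake_prepare_recovery(bus);
    return bus->scl;
}

/* New order: prepare first, then set SCL. The write takes effect. */
int recover_new_order(struct fake_bus *bus)
{
    fake_prepare_recovery(bus);
    fake_set_scl(bus, 1);
    return bus->scl;
}
```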
Vladimir Zapolskiy [Sun, 26 Jul 2015 21:18:51 +0000 (00:18 +0300)]
misc: eeprom: at24: clean up at24_bin_write()
The change removes the redundant sysfs binary file boundary check, since
this task is already done on the caller side in fs/sysfs/file.c.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Vladimir Zapolskiy [Sun, 26 Jul 2015 21:16:31 +0000 (00:16 +0300)]
i2c: slave eeprom: clean up sysfs bin attribute read()/write()
The change removes the redundant sysfs binary file boundary checks,
since this task is already done on the caller side in fs/sysfs/file.c.
Note that on file size overflow read() now returns 0, which is the
correct and expected EOF notification according to POSIX.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
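A caller-side bounds check of the kind the message relies on might look
roughly like the following user-space model. The function name
clamp_read_count is hypothetical and this is not the code in
fs/sysfs/file.c; it only illustrates why a read at or past the end of a
fixed-size file yields 0.

```c
#include <stddef.h>

/*
 * Hypothetical model of a caller-side bounds check for a fixed-size
 * binary file. Returns the number of bytes the handler may transfer.
 */
size_t clamp_read_count(size_t file_size, size_t pos, size_t count)
{
    if (pos >= file_size)
        return 0;                   /* at or past the end: EOF */
    if (count > file_size - pos)
        count = file_size - pos;    /* truncate to the remaining bytes */
    return count;
}
```

With the offset and length already clamped like this, a handler can copy
count bytes starting at pos without repeating the check.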
Ilya Dryomov [Thu, 16 Jul 2015 14:36:11 +0000 (17:36 +0300)]
rbd: fix copyup completion race
For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.
rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run. Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurely
marked done request:
    <obj_request-1/2 reply>
    handle_reply()
      rbd_osd_req_callback()
        rbd_osd_trivial_callback()
        rbd_obj_request_complete()
          rbd_img_obj_copyup_callback()
            rbd_img_obj_callback()
                                    <obj_request-2/2 reply>
                                    handle_reply()
                                      rbd_osd_req_callback()
                                        rbd_osd_trivial_callback()
              for_each_obj_request(obj_request->img_request) {
                rbd_img_obj_end_request(obj_request-1/2)
                rbd_img_obj_end_request(obj_request-2/2) <--
              }
Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xferred is 0. We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xferred in rbd_img_obj_copyup_callback() yet.
To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request). So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().
Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().
Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
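The assertion quoted in the message can be modelled in isolation. The helper
below is a hypothetical stand-in, not rbd code; it only evaluates the same
exclusive-or condition so that the failing combination is easy to see.

```c
#include <stdbool.h>

/*
 * Models rbd_assert(more ^ (which == img_request->obj_request_count)).
 * Exactly one of the two must be true: either the block layer still
 * expects more data, or every object request has been ended.
 */
bool end_request_invariant_holds(bool more, unsigned int which,
                                 unsigned int obj_request_count)
{
    bool all_ended = (which == obj_request_count);

    return more != all_ended;   /* != on two bools is exclusive or */
}
```

In the race described above both sides are 1: all object requests have been
ended, yet the block layer still reports more data because 0 bytes were
reported for the request that was marked done too early.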
Yan, Zheng [Mon, 20 Jul 2015 01:50:58 +0000 (09:50 +0800)]
ceph: always re-send cap flushes when MDS recovers
Commit e548e9b93d3e565e42b938a99804114565be1f81 makes the kclient
re-send a cap flush only once during MDS failover. If the kclient sends
a cap flush after the MDS enters the reconnect stage but before the MDS
recovers, the kclient will skip re-sending the same cap flush when the
MDS recovers.
This causes a problem for newly created inodes. The MDS handles cap
flushes before replaying unsafe requests, so it is possible that the MDS
finds the corresponding inode missing when handling the cap flush. The
fix is to revert to the old behaviour: always re-send when the MDS
recovers.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Tue, 7 Jul 2015 08:18:46 +0000 (16:18 +0800)]
ceph: fix ceph_encode_locks_to_buffer()
posix locks should be in ctx->flc_posix list
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Andy Lutomirski [Thu, 30 Jul 2015 21:31:32 +0000 (14:31 -0700)]
x86/ldt: Make modify_ldt synchronous
modify_ldt() has questionable locking and does not synchronize
threads. Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.
This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.
This fixes some fallout from the CVE-2015-5157 fixes.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Thu, 30 Jul 2015 21:31:31 +0000 (14:31 -0700)]
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.
While we're at it, add comments to help explain what's going on.
This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 31 Jul 2015 07:55:26 +0000 (09:55 +0200)]
Merge tag 'efi-urgent' of git://git./linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Fix an EFI boot issue preventing a Parallels virtual machine from
booting because the upper 32 bits of the EFI memmap pointer were
being discarded in setup_e820(). (Dmitry Skorodumov)
* Validate that the "efi" kernel parameter gets used with an argument,
otherwise we will oops. (Ricardo Neri)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
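The first fix concerns a 64-bit address losing its upper half. A minimal
sketch of that class of bug follows, assuming the address is carried as two
32-bit halves; struct memmap_ptr and both helpers are hypothetical names used
only for this illustration.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit fields carrying one 64-bit physical address. */
struct memmap_ptr {
    uint32_t lo;
    uint32_t hi;
};

/* Buggy variant: only the low half is used, so bits 32..63 are lost. */
uint64_t memmap_addr_truncated(struct memmap_ptr p)
{
    return p.lo;
}

/* Fixed variant: widen the high half before shifting, then combine. */
uint64_t memmap_addr_full(struct memmap_ptr p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

The two variants agree for any address below 4 GiB, which is why a bug like
this only shows up on systems that place the memory map higher.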