review.tizen.org Git - sdk/emulator/emulator-kernel.git/log

Smack: fix backport of multi-onlycap patch

Adapt the patch for multiple labels in onlycap to older kernel version.
It was previously backported without an important dependency. There is a
difference in smk_import_entry function. In upstream it returns error codes,
but here the error is indicated by returning NULL.
Without this fix the kernel could crash when an empty string is written to
the onlycap interface file.

Change-Id: Ibadab8b78b86453526cd423100619ab0a10fa68c
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>

Smack: update patch for multi-onlycap to the final upstream version

Synchronize the patch enabling multiple labels in onlycap with the last
version that was merged upstream. The patch merged in this tree was an
earlier version, before it was updated and merged upstream.
Changes are only cosmetic (function name, comments, code formatting), but
merging them will ease future synchronization with upstream Smack code.

Change-Id: Iefe9ec32659043e62bdf2a227aad8f42c3563b9d
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>

Smack: allow multiple labels in onlycap

Smack onlycap allows limiting of CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to
processes running with the configured label. But having a single privileged
label is not enough in some real use cases. On a complex system like Tizen,
there may be a few programs that need to configure Smack policy at run time,
and running them all with a single label is not always practical.
This patch extends onlycap feature for multiple labels. They are configured
in the same smackfs "onlycap" interface, separated by spaces.
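
As a rough illustration of the interface described above, the following
user-space sketch splits a space-separated label list the way a write to
"onlycap" might be tokenized. The function names, the size limits and the
rule that an empty list restricts nothing are assumptions made for this
example; this is not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 8        /* illustrative limit, not taken from the patch */
#define MAX_LABEL_LEN 64    /* illustrative limit, not taken from the patch */

/*
 * Hypothetical helper: split a space-separated label list into out[].
 * Returns the number of labels found, or -1 if a limit is exceeded.
 * An empty or all-space input yields 0 labels.
 */
int parse_onlycap(const char *input, char out[][MAX_LABEL_LEN], int max)
{
    int count = 0;
    const char *p = input;

    while (*p != '\0') {
        size_t len;

        p += strspn(p, " ");            /* skip separators */
        if (*p == '\0')
            break;
        len = strcspn(p, " ");          /* length of the next label */
        if (count == max || len >= MAX_LABEL_LEN)
            return -1;
        memcpy(out[count], p, len);
        out[count][len] = '\0';
        count++;
        p += len;
    }
    return count;
}

/*
 * Hypothetical check: returns 1 if label is in the list. An empty list is
 * treated as "no restriction" (an assumption made for this sketch).
 */
int label_in_onlycap(const char *label, char list[][MAX_LABEL_LEN], int count)
{
    int i;

    if (count == 0)
        return 1;
    for (i = 0; i < count; i++)
        if (strcmp(label, list[i]) == 0)
            return 1;
    return 0;
}
```

Calling parse_onlycap() on a two-label string fills two slots, and an empty
string produces an empty list rather than an error.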

Change-Id: Ia95b93b4474669b7fd02926926e10b814b78405c
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
(cherry picked from commit c0a3794dfc6a153294fa90f6499a43c78a608047)

Smack: fix seq operations in smackfs

Use proper RCU functions and read locking in smackfs seq_operations.

Smack gets away with not using proper RCU functions in smackfs, because
it never removes entries from these lists. But now one list will be
needed (with interface in smackfs) that will have both elements added and
removed to it.
This change will also help any future changes implementing removal of
unneeded entries from other Smack lists.

The patch also fixes handling of pos argument in smk_seq_start and
smk_seq_next. This fixes a bug in case when smackfs is read with a small
buffer:

Kernel panic - not syncing: Kernel mode fault at addr 0xfa0000011b
CPU: 0 PID: 1292 Comm: dd Not tainted 4.1.0-rc1-00012-g98179b8 #13
Stack:
00000003 0000000d 7ff39e48 7f69fd00
7ff39ce0 601ae4b0 7ff39d50 600e587b
00000010 6039f690 7f69fd40 00612003
Call Trace:
[<601ae4b0>] load2_seq_show+0x19/0x1d
[<600e587b>] seq_read+0x168/0x331
[<600c5943>] __vfs_read+0x21/0x101
[<601a595e>] ? security_file_permission+0xf8/0x105
[<600c5ec6>] ? rw_verify_area+0x86/0xe2
[<600c5fc3>] vfs_read+0xa1/0x14c
[<600c68e2>] SyS_read+0x57/0xa0
[<6001da60>] handle_syscall+0x60/0x80
[<6003087d>] userspace+0x442/0x548
[<6001aa77>] ? interrupt_end+0x0/0x80
[<6001daae>] ? copy_chunk_to_user+0x0/0x2b
[<6002cb6b>] ? save_registers+0x1f/0x39
[<60032ef7>] ? arch_prctl+0xf5/0x170
[<6001a92d>] fork_handler+0x85/0x87

Change-Id: I032c1fc726c0670060d1cf4c419746257159b499
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
(cherry picked from commit f638effaf324d57f37453e421be87e537140e527)

virtio_blk: removed W/A for sdcard support

SD card support will be handled by udev rules and deviced.

Change-Id: I126ba1c72e1215ee28ee4da34791877d86b56435
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

package: version up

Change-Id: I629d413854a34c5f21fda4de027772d8a2a1b0a5
Signed-off-by: Munkyu Im <munkyu.im@samsung.com>

virtio-net: support MII

MII support is incomplete,
but it is enough to enable the ioctl call on the guest side.

Change-Id: Iaec1ed63fe5f0ce2fff19be42890ca1063555179
Signed-off-by: Munkyu Im <munkyu.im@samsung.com>

packaging: remove dependency.

Remove the emulator-kernel-user-headers build dependency.

Change-Id: I6aa468dab8834e72789320e4d87e8e7752d6a951
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

Revert "packaging: use linux-glibc-devel instread of emulator-kernel-headers"

This reverts commit 12fbb0687ba2d1f8ecbeeac8c3be53cc378f2b37.

Change-Id: Iac02a9324f10006953f82409283a526531d2bc9e
(cherry picked from commit 08becd57e5e8375b2e2dc2fe48de664757bfde66)

sensor: added logs for sensor capability debug purpose

Change-Id: Ibdf600bc24c9860200a171d88583ba56f560cb68
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

sensor: add mutex for get_sensor_data in accelerometer

Change-Id: I6cc802e5b2ca413984b285798a2867419cec3977
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

sensor: split scatter list initialization

Split the in/out virtqueue scatterlists so that sg_mark_end is set on each.

Change-Id: I562d51af7ab1d5e8b57b8971e7fe4d40d971a55b
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

rotary: Apply the changed specifications

The value of the REL_WHEEL event was changed as follows.
- Before
  Right direction: 1 -> 2 -> 3
  Left direction: 3 -> 2 -> 1
- After
  Departing from the current detent area: 1
  Returning to the current detent area: -1
  Right direction(CW): 2
  Left direction(CCW): -2

Change-Id: Ibd86e5c97839ea90383797096ed2c887a703e8b7
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

rotary: modified virtual rotary driver

tizen_rotary -> tizen_detent
Instead of the delta of the rotation angle,
the driver sends up the detent value every 15 degrees.

Change-Id: I1b2b7ea8e4a2ff4ac90626710ddfe7f691ad29e2
Signed-off-by: GiWoong Kim <giwoong.kim@samsung.com>

rotary: Added a new device driver

Added a new device driver for the rotary device

Change-Id: I8a388a1b40315a47e60dbf00f17ad0ad69d8414c
Signed-off-by: Jinhyung Jo <jinhyung.jo@samsung.com>

sensor driver: waited for set_sensor_data

To resolve a timing issue, set_sensor_data now waits for a callback,
as get_sensor_data does.

Change-Id: I436375ea99b5f19c07d0683a89dd35d50218f830
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>
(cherry picked from commit b603734d766fa00ef69f2c075dd6264a7ad1bdb6)

sensor: virtio memory allocation flag changed.

Change-Id: I58d577e26826a8f055a7c4e0349d8f326cb3ed7e
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>
(cherry picked from commit 44fd6adc9dbdade3a714b3efac550e8062e2b880)

config: set NOOP as a default IO scheduler

Change-Id: I252dc0d20afb20559efe02ceeb90f0b825af6ce9
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>
(cherry picked from commit df6d0674dc7883bf1a971ffb8d969e86074537da)

config: enabled CONFIG_ANDROID_INTF_ALARM_DEV

Change-Id: I1af3b208e2dbaa28487a73f02ce04a660770e187
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>
(cherry picked from commit 2b8f290969821e5602136a6abda02d6756242874)

package: restructure directories

change installed directory

Change-Id: If295b80efe71dd6090123f17b3c2df61e3d1c606
Signed-off-by: Munkyu Im <munkyu.im@samsung.com>

Package: Modified package name.

- Added platform version("2.4") to package name.

Change-Id: Ia6a5eb5eca93330bdca90f799b6e941705f3669a
Signed-off-by: minkee.lee <minkee.lee@samsung.com>

Merge branch 'tizen_2.4' into tizen_2.4_develop

Change-Id: I94112bf901574a1ae450adcc7f545a3dd9a295b5
Signed-off-by: minkee.lee <minkee.lee@samsung.com>

package: version up(3.14.5)

Change-Id: I40c6f16893d0be8747a36ff20aca4918c3f18035
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

vfs: read file_handle only once in handle_to_path

This patch is related to "[CVE-2015-1420] Race condition in fs/fhandle.c in the Linux kernel".

We used to read file_handle twice. Once to get the amount of extra bytes, and
once to fetch the entire structure.

This may be problematic since we do size verifications only after the first
read, so if the number of extra bytes changes in userspace between the first
and second calls, we'll have an incoherent view of file_handle.

Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.
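
The single-fetch idea can be sketched in user space as follows. This is only
a model: fetch() stands in for copy_from_user(), and struct fh_header,
read_handle_once() and the MAX_HANDLE_SZ value used here are names and
values chosen for the illustration, not the code in fs/fhandle.c.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HANDLE_SZ 128   /* size limit assumed for this sketch */

/* Fixed-size part of the handle, followed in memory by handle_bytes bytes. */
struct fh_header {
    unsigned int handle_bytes;
    int handle_type;
};

/* Hypothetical stand-in for copy_from_user(). */
static void fetch(void *dst, const void *user_src, size_t n)
{
    memcpy(dst, user_src, n);
}

/*
 * Read the header exactly once, validate it, then copy only the payload.
 * The validated header is reused for the final structure, so a concurrent
 * change of handle_bytes in "user" memory cannot bypass the size check.
 * Returns a heap copy (caller frees) or NULL on error.
 */
unsigned char *read_handle_once(const unsigned char *user_buf,
                                unsigned int *out_bytes)
{
    struct fh_header hdr;
    unsigned char *copy;

    fetch(&hdr, user_buf, sizeof(hdr));             /* the only header read */
    if (hdr.handle_bytes == 0 || hdr.handle_bytes > MAX_HANDLE_SZ)
        return NULL;

    copy = malloc(sizeof(hdr) + hdr.handle_bytes);
    if (copy == NULL)
        return NULL;

    memcpy(copy, &hdr, sizeof(hdr));                /* reuse validated header */
    fetch(copy + sizeof(hdr), user_buf + sizeof(hdr), hdr.handle_bytes);
    *out_bytes = hdr.handle_bytes;
    return copy;
}
```

The point of the design is that the size used for the allocation and the
size stored in the final copy both come from the same validated read.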

Change-Id: I318d7428079e323f53bc7eb1f7dc0a5dfac7eb0b
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Byungsoo Kim <bs1770.kim@samsung.com>
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

x86, mm/ASLR: Fix stack randomization on 64-bit systems

The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
           unsigned int random_variable = 0;

           if ((current->flags & PF_RANDOMIZE) &&
                   !(current->personality & ADDR_NO_RANDOMIZE)) {
                   random_variable = get_random_int() & STACK_RND_MASK;
                   random_variable <<= PAGE_SHIFT;
           }
  #ifdef CONFIG_STACK_GROWSUP
           return PAGE_ALIGN(stack_top) + random_variable;
  #else
           return PAGE_ALIGN(stack_top) - random_variable;
  #endif
  }

Note that it declares the "random_variable" variable as "unsigned int".
The value masked with STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits)
is then shifted left by PAGE_SHIFT (which is 12 on x86_64):

  random_variable <<= PAGE_SHIFT;

so the two leftmost bits are dropped when storing the result in
"random_variable". The variable needs to be at least 34 bits long to hold
the (22+12)-bit result.

These two dropped bits have an impact on the entropy of the process stack.
Concretely, the total stack entropy is reduced by a factor of four: from 2^30
to 2^28 (one fourth of the expected entropy).
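
The truncation can be reproduced outside the kernel. The sketch below is a
user-space model, assuming the x86_64 values quoted above; the function and
macro names are made up for the illustration.

```c
#include <stdint.h>

#define STACK_RND_MASK_X86_64 0x3fffffULL   /* 22 bits, value quoted above */
#define PAGE_SHIFT_X86_64 12                /* value quoted above */

/* Offset computed in a 32-bit variable, as in the code before the fix. */
uint64_t offset_32bit(uint64_t rnd)
{
    uint32_t v = (uint32_t)(rnd & STACK_RND_MASK_X86_64);

    v <<= PAGE_SHIFT_X86_64;    /* bits 32 and 33 are shifted out */
    return v;
}

/* Offset computed in a 64-bit variable, wide enough for all 34 bits. */
uint64_t offset_64bit(uint64_t rnd)
{
    uint64_t v = rnd & STACK_RND_MASK_X86_64;

    v <<= PAGE_SHIFT_X86_64;
    return v;
}
```

With all 22 random bits set, the 32-bit version yields 0xfffff000 while the
64-bit version yields 0x3fffff000, so the two top bits only survive in the
wider type.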

This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.

Change-Id: I961d7977c511e0228a92f0020021fe50589e3e95
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>

net: sctp: fix slab corruption from use after free on INIT collisions

When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c646 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c646 ...

[  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[  533.940559] PGD 5030f2067 PUD 0
[  533.957104] Oops: 0000 [#1] SMP
[  533.974283] Modules linked in: sctp mlx4_en [...]
[  534.939704] Call Trace:
[  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

... or depending on the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
[  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[  669.826424]  __slab_alloc+0x4bf/0x566
[  669.826433]  __kmalloc+0x280/0x310
[  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
[  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[  669.826635]  __slab_free+0x39/0x2a8
[  669.826643]  kfree+0x1d6/0x230
[  669.826650]  kzfree+0x31/0x40
[  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
[  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
[  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or endpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, for which we
eventually double drop the ref, comes from sctp_auth_asoc_set_secret() with
an initial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independent of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76
("net: sctp: fix memory leak in auth key management") is independent of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.
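
The double drop can be modelled with a toy refcount. This is a heavily
simplified sketch, not the SCTP code: struct auth_bytes_sketch and key_put()
are hypothetical, and "freeing" is simulated by writing the 0x6b poison
value into the counter instead of releasing memory.

```c
#define POISON_FREE 0x6b    /* slab poison value seen in the log above */

/* Toy model of a refcounted key; not the kernel's struct sctp_auth_bytes. */
struct auth_bytes_sketch {
    int refcnt;
    int freed;
};

/*
 * Drop one reference. When the count reaches zero the object is marked
 * freed and its counter is overwritten with the poison value, mimicking
 * what slab debugging does to released memory. A further put on the freed
 * object decrements the poison itself.
 */
void key_put(struct auth_bytes_sketch *k)
{
    k->refcnt--;
    if (!k->freed && k->refcnt == 0) {
        k->freed = 1;
        k->refcnt = POISON_FREE;
    }
}
```

One put on a key with refcount 1 leaves the poison at 0x6b. A second put on
the same key turns it into 0x6a, which matches the "First byte 0x6a instead
of 0x6b" report in this simplified model.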

Change-Id: I07e48e69eaa9bc6699d75957c75244849e0b5b46
Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VIGS: add base dmabuf import/export support

Change-Id: I04b3c9558d99b096a8a54b57ab89a7a8b5b225b6
Signed-off-by: Vasiliy Ulyanov <v.ulyanov@samsung.com>

VIGS: enable render-nodes feature

Change-Id: I7304e19f61b10869dc3433d68405d5edb8645de4
Signed-off-by: Vasiliy Ulyanov <v.ulyanov@samsung.com>

package: add changelog for 3.14.4

Change-Id: I489e815d248313e748d4962c3cbc1b8ba941ea98
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

Smack Bring-up mode including unconfined feature

Change-Id: Ie875af7fcde4af981aab4eb0219eb7ece74ff558
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

package: version up

version update to 3.14.4

Change-Id: I106f66a67ab625559d377bb14e17fc92f415b113
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

Smack: Lock mode for the floor and hat labels

The lock access mode allows setting a read lock on a file
for which the process has only read access. The floor label is
defined to make it easy to have the basic system installed such
that everyone can read it. Once there's a desire to read lock
(rationally or otherwise) a floor file a rule needs to get set.
This happens all the time, so make the floor label a little bit
more special and allow everyone lock access, too. By implication,
give processes with the hat label (hat can read everything)
lock access as well. This reduces clutter in the Smack rule set.

Change-Id: I09b6d234701b3efc67aad30bc3ea09da35c61792
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

Security: smack: replace kzalloc with kmem_cache for inode_smack

The patch uses kmem_cache to allocate/free inode_smack since they are
allocated in high volumes, making it a perfect case for kmem_cache.

As per analysis, 24 bytes of memory is wasted per allocation due
to internal fragmentation. With kmem_cache, this can be avoided.

Accounting of memory allocation is below :
total       slack            net      count-alloc/free        caller
Before (with kzalloc)
1919872      719952          1919872      29998/0          new_inode_smack+0x14

After (with kmem_cache)
1201680          0           1201680      30042/0          new_inode_smack+0x18

From the above data, we found that 719952 bytes (~700 KB) of memory are
saved on allocation of 29998 smack inodes.
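
The per-object figures behind this table can be recomputed with a one-line
helper. The function name is made up for this sketch; the inputs are the
numbers from the accounting table above.

```c
/* Average number of bytes per allocation (integer division). */
unsigned long bytes_per_alloc(unsigned long total_bytes, unsigned long count)
{
    return count ? total_bytes / count : 0;
}
```

Applied to the table, bytes_per_alloc(1919872, 29998) gives 64 and
bytes_per_alloc(719952, 29998) gives 24, which suggests that kzalloc served
each object from a 64-byte chunk with 24 bytes of slack. After the change,
bytes_per_alloc(1201680, 30042) gives 40, the remaining 64 - 24 bytes.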

Change-Id: Ia930c48bb06eb9c461eb3cf449e25a3a53be7299
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

security: smack: add kmem_cache for smack_master_list allocations

On ARM, sizeof(struct smack_master_list) == 12. Allocation by kmalloc() uses a
32-byte-long chunk to allocate 12 bytes. Just ask ksize(). It means that 63%
of memory is simply wasted for padding bytes.

The problem is fixed in this patch by using kmem_cache. The cache allocates
struct smack_master_list using 16-byte-long chunks according to ksize(). This
reduces amount of used memory by 50%.

Change-Id: Ice10c7eb1099931a82200081110275c5717b12be
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

security: smack: add kmem_cache for smack_rule allocations

On ARM, sizeof(struct smack_rule)==20. Allocation by kmalloc() uses a
32-byte-long chunk to allocate 20 bytes. Just ask ksize(). It means that 40%
of memory is simply wasted for padding bytes.

The problem is fixed in this patch by using kmem_cache. The cache allocates
struct smack_rule using 24-byte-long chunks according to ksize(). This reduces
amount of used memory by 25%.
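
The percentages quoted here and in the smack_master_list patch follow from
simple ratios. The helpers below are a sketch with made-up names; they work
in tenths of a percent to stay in integer arithmetic.

```c
/* Share of a chunk lost to padding, in tenths of a percent. */
unsigned int waste_permille(unsigned int obj_size, unsigned int chunk_size)
{
    return (chunk_size - obj_size) * 1000u / chunk_size;
}

/* Memory saved by moving from old_chunk to new_chunk, in tenths of a percent. */
unsigned int saving_permille(unsigned int old_chunk, unsigned int new_chunk)
{
    return (old_chunk - new_chunk) * 1000u / old_chunk;
}
```

For smack_rule, waste_permille(20, 32) is 375 (37.5%, rounded to 40% in the
message) and saving_permille(32, 24) is 250 (25%). For smack_master_list,
waste_permille(12, 32) is 625 (about 63%) and saving_permille(32, 16) is
500 (50%).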

Change-Id: I2753cabc78c31b695ac07bf76cc8861232b64b1d
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

Fix a bidirectional UDS connect check

Change-Id: Ib074a4e8ea27fdfff3e30fb74ee90f32d68d37c9
Signed-off-by: jooseong.lee <jooseong.lee@samsung.com>

initramfs: mount devtmpfs before switching root

Change-Id: I553ba3997f6873657787b404e3fe3cefa2867435

package: version up (3.14.3)

Change-Id: I66f3abba94b7cacf2b4f1a7a2e65bfee722a413e
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

config: enabled some kernel config for Tizen Zone

Change-Id: Icc49948b8eb9ef8e0c1a477ec6fa43a0f162c21b
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

Merge "packaging: use linux-glibc-devel instread of emulator-kernel-headers" into tizen_2.4

packaging: use linux-glibc-devel instread of emulator-kernel-headers

Toolchain packages will be upgraded on the tizen_2.4 branch soon. After
the upgrade, the linux-glibc-devel package, which provides kernel headers
based on linux-3.10, will be used as the default kernel header package.

Change-Id: Ia8fb8cf78c2c9c288f2faaa0ac4a42711f1efe80
Signed-off-by: Chanho Park <chanho61.park@samsung.com>

Package: change maintainer

Change-Id: I354790d812ff9ee08b582de43d4d68c7439f8a08
Signed-off-by: Sangho Park <sangho1206.park@samsung.com>

Merge "packaging: apply spec file to build emulator-kernel-user-headers" into tizen_2.4

Merge "build: package version up (3.14.2)" into tizen_2.4

Merge "netfilter: conntrack: disable generic tracking for known protocols" into tizen_2.4

Merge "isofs: Fix unchecked printing of ER records" into tizen_2.4

Merge "userns: Document what the invariant required for safe unprivileged mappings." into tizen_2.4

Merge "isofs: Fix infinite looping over CE entries" into tizen_2.4

Merge "KEYS: close race between key lookup and freeing" into tizen_2.4

packaging: apply spec file to build emulator-kernel-user-headers

this header package is for SWAP module build

Change-Id: Ib27108b391903ddf7eb79959a9117c7b4218c125
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

build: package version up (3.14.2)

Change-Id: I1196a65af8298d2630f2f1230c9970aed0d77e28
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

sensor: a global sensor mutex for message transfer

Change-Id: I49f9506634aeb2d87cc24b931c186a8d24e25161
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

netfilter: conntrack: disable generic tracking for known protocols

Given the following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

[1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Change-Id: Ic099f9cb24e84946e6d0f4352ce1201380cef2e9
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

isofs: Fix unchecked printing of ER records

We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.

Change-Id: Ie5dbb6d4e0773320442e26ab1fbd01a52f3f8042
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

userns: Document what the invariant required for safe unprivileged mappings.

The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privilege that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Change-Id: I66a4970dab52327190bc2c4540c4558219703267
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

isofs: Fix infinite looping over CE entries

Rock Ridge extensions define so-called Continuation Entries (CE) which
define where further space with Rock Ridge data is located. A corrupted isofs
image can contain an arbitrarily long chain of these, including one
containing a loop, thus causing the kernel to end up in an infinite loop when
traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.
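
A bounded traversal of this kind can be sketched as follows. The table-based
chain, struct ce_entry_sketch and walk_ce_chain() are hypothetical stand-ins
for the on-disk records; only the limit of 32 comes from the text above.

```c
#include <stddef.h>

#define MAX_CE_ENTRIES 32   /* limit stated in the commit message */

/* Toy continuation entry: index of the next entry, or -1 at the end. */
struct ce_entry_sketch {
    int next;
};

/*
 * Follow the chain starting at index start. Returns the number of entries
 * visited, or -1 if more than MAX_CE_ENTRIES would have to be followed,
 * which also covers chains that loop back on themselves.
 */
int walk_ce_chain(const struct ce_entry_sketch *tbl, size_t n, int start)
{
    int cur = start;
    int hops = 0;

    while (cur >= 0 && (size_t)cur < n) {
        if (++hops > MAX_CE_ENTRIES)
            return -1;              /* give up instead of looping forever */
        cur = tbl[cur].next;
    }
    return hops;
}
```

A well-formed chain of three entries returns 3, while a chain whose last
entry points back to the first returns -1 after 32 hops.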

Change-Id: Ia0475fc07ee3e8ecc1d53673439f32c175acc10b
Reported-by: P J P <ppandit@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

KEYS: close race between key lookup and freeing

When a key is being garbage collected, its key->user would get put before
the ->destroy() callback is called, where the key is removed from its
respective tracking structures.

This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try and access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).

This would cause either a panic, or corrupt memory.

Fixes CVE-2014-9529.

Change-Id: I878791feeef1325bec21f3b69f4c0e449cdf32f2
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>

package: version up(3.14.1)

Change-Id: Id1b14d75ade23a897789cfb4f35b994d2771d0c2
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

VIGS: workaround for qHD (540x960) video mode

Horizontal resolution was rounded up to 544 (GTF algorithm). It was
causing wrong rendering on emulator (black screen).

Change-Id: I15de1b24773c955c470db49f0ebb080f2e823989
Signed-off-by: Vasiliy Ulyanov <v.ulyanov@samsung.com>

VIGS: add new plane for cursor support

Change-Id: I20dcfa298a7223d6fb58e5781748b09d0978dc3d
Signed-off-by: Vasiliy Ulyanov <v.ulyanov@samsung.com>

initramfs: introduced prerun scripts.

It is very useful for manipulating some files on rootfs before init.
(Configuration files, ...)

Change-Id: I356a9190b80d3c46fb8f84e8f91d346c24263901
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

evdi: added IOCTL for booting done log

Change-Id: I08bc0f9ff1122efc84925c0af60d359c881a8ac1
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

evdi: removed emuld connection notification.

- The connection message is sent by emuld.

Change-Id: I3d40c422c44e74bdf79b86e6ffe7c1ebe2e1a653
Signed-off-by: Jinhyung Choi <jinhyung2.choi@samsung.com>

initramfs: support new-partitioned image

Support new-partitioned image.
Introduce rescue shell for emergency situation.

Change-Id: I4e81f3c7cbbf440a6812a85385869972892a491f
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

initramfs: using new built busybox

Using new built busybox for rich usage.

Change-Id: I8dc348dc5d8a1bc7d20a2aa6e5dd9bcfd79193e1
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

initramfs: generate /dev/console to avoid warning

Change-Id: Icbb2b0da9c2ddd846bd1d6853f987eb58808d2c1
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

initramfs: respect "init=", "root=" in kernel parameters

Change-Id: Id35d3b96deefc9df2b22471b734f6cebd9c79b97
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

initramfs: cleaned up

Using devtmpfs for populating /dev.
Removed pre-made dev nodes.
Removed pre-linked binaries to busybox.

Change-Id: I10be92fe41da52d3b8b05edbffd2a798a199b9ca
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

build: package version up (2.0.8)

Change-Id: I3e8e1f5b24167c2b12cb25232026a9e0a5cd7ba4
Signed-off-by: Alice Liu <alice.liu@intel.com>

Merge branch 'tizen_next_linux_3.14' into tizen_next

code conventions: apply the code conventions

You should use tab instead of space for indentation.

Change-Id: I43fb0cf0d397eeb84ffde490c19689e98f41ff16
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

permission: remove useless execute permission

Change-Id: I0e44e362f0cbfd16b788613be637954b21111da7
Signed-off-by: Sooyoung Ha <yoosah.ha@samsung.com>

hwkey: added virtqueue_add_inbuf before virtqueue_kick

Change-Id: Icf632a24426b6fb51cf77d59cea07e4bc6e73b23
Signed-off-by: sungmin ha <sungmin82.ha@samsung.com>

brillcodec: add ioctl/mmio command for profile module.

- To check the profile status, add ioctl/mmio command
in brillcodec driver.

Change-Id: Ia03f1b65f16ad8240214f267bf5839202f36ded3
Signed-off-by: gunsoo83.kim <gunsoo83.kim@samsung.com>

brillcodec: introduce new feature MEMORY_MONOPOLIZING

Change-Id: I9a35dce462efee60149ee5b168d3f2ba4c65ffff
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

brillcodec: re-arrange device command

Re-arrange device command.
Apply strict version checking.

Change-Id: Ia473254cef42b077662922a151314dbf51c6def4
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

brillcodec: introduce brillcodec version 3

Change-Id: Ie08b8b4ee8094ce20b9f80f52c72aadcd4f34df5
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

Smack: Fix setting label on successful file open

When a file is opened with CAP_MAC_OVERRIDE, the file label is not set.
Other calls may access it after CAP_MAC_OVERRIDE is dropped from the process.

Change-Id: I1d9cdeb325c397dfb0b97e60eb7b2842c1819d99
Signed-off-by: Marcin Niesluchowski <m.niesluchow@samsung.com>

Smack: Verify read access on file open - v3

Smack believes that many of the operations that can
be performed on an open file descriptor are read operations.
The fstat and lseek system calls are examples.
An implication of this is that files shouldn't be open
if the task doesn't have read access even if it has
write access and the file is being opened write only.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Change-Id: I63d57bc62cd08fa4e1f128b544e7ed7316456e4c
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

Smack: bidirectional UDS connect check

Smack IPC policy requires that the sender have write access
to the receiver. UDS streams don't do per-packet checks. The
only check is done at connect time. The existing code checks
if the connecting process can write to the other, but not the
other way around. This change adds a check that the other end
can write to the connecting process.
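
A minimal sketch of the two-way check, assuming a simple rule table.
The struct demo_rule type and both function names are hypothetical and
exist only for this illustration; the kernel uses its own rule lists
and access functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: tasks labelled 'subject' may write to 'object'. */
struct demo_rule {
	const char *subject;
	const char *object;
};

/* Return true if subject has write access to object. */
static bool demo_may_write(const struct demo_rule *rules, size_t n,
			   const char *subject, const char *object)
{
	assert(subject != NULL && object != NULL);
	if (strcmp(subject, object) == 0)
		return true;	/* equal labels always have access */
	for (size_t i = 0; i < n; i++)
		if (strcmp(rules[i].subject, subject) == 0 &&
		    strcmp(rules[i].object, object) == 0)
			return true;
	return false;
}

/* Connect succeeds only if each end can write to the other. */
static bool demo_uds_connect_allowed(const struct demo_rule *rules, size_t n,
				     const char *client, const char *server)
{
	return demo_may_write(rules, n, client, server) &&
	       demo_may_write(rules, n, server, client);
}
```

With only an "App may write to Daemon" rule the connect is refused; it
is allowed once the reverse rule exists as well.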

Targeted for git://git.gitorious.org/smack-next/kernel.git

Change-Id: I42bb66ba2f73c8e604bee85002fc9e419337732c
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

Smack: Correctly remove SMACK64TRANSMUTE attribute

Sam Henderson points out that removing the SMACK64TRANSMUTE
attribute from a directory does not stop the directory from
transmuting. This is because the inode flag indicating that
the directory is transmuting isn't cleared. The fix is a tad
less than trivial because smk_task and smk_mmap should have
been broken out, too.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Change-Id: I73e29d988fd5ca7502aeab01e340189420a95c75
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>

SMACK: Fix handling value==NULL in post setxattr

The function `smack_inode_post_setxattr` is called each
time that a setxattr is done, for any value of name.
The kernel allows passing value==NULL when size==0
to set an empty attribute value. The systematic
call to smk_import_entry was causing the dereference
of a NULL pointer hence a KERNEL PANIC!

The problem can be reproduced easily by issuing the
command `setfattr -n user.data file` under bash prompt
when SMACK is active.

Moving the call to smk_import_entry as proposed by this
patch is correcting the behaviour because the function
smack_inode_post_setxattr is called for the SMACK's
attributes only if the function smack_inode_setxattr validated
the value and its size (which will not be the case when size==0).

It also has the beneficial effect of not filling the smack hash
with garbage values coming from any extended attribute
write.
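
The triggering call can also be issued directly from C. The sketch
below uses the Linux setxattr(2) system call wrapper with value==NULL
and size==0; the helper name demo_set_empty_xattr() is made up for this
example. On an unpatched kernel with SMACK active it may panic the
machine, so it should only be tried in a disposable VM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Set an extended attribute with an empty value by passing
 * value == NULL and size == 0, the case described above.
 * Returns 0 on success or a negative errno value on failure.
 */
static int demo_set_empty_xattr(const char *path, const char *name)
{
	assert(path != NULL && name != NULL);
	if (setxattr(path, name, NULL, 0, 0) != 0)
		return -errno;
	return 0;
}
```

Calling demo_set_empty_xattr("file", "user.data") is intended to mirror
the setfattr command quoted above.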

Change-Id: Iaf0039c2be9bccb6cee11c24a3b44d209101fe47
Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>

bugfix patch for SMACK

1. In order to remove any SMACK extended attribute from a file, a user
should have the CAP_MAC_ADMIN capability. But a user without this
capability is able to remove the SMACK64MMAP security attribute.

2. While validating the size and value of a smack extended attribute in
the smack_inode_setsecurity hook, the wrong error code is returned.

Change-Id: Ie71e86840f47b6810aaf4ff9a577cdea8274925b
Signed-off-by: Pankaj Kumar <pamkaj.k2@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>

Smack: adds smackfs/ptrace interface

This allows limiting ptrace beyond the regular smack access rules.
It adds a smackfs/ptrace interface that allows smack to be configured
to require equal smack labels for PTRACE_MODE_ATTACH access.
See the changes in Documentation/security/Smack.txt below for details.

Change-Id: I5459ff414e96dde0430ed8febd92c361c9dc1d81
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>

Smack: unify all ptrace accesses in the smack

The decision whether we can trace a process is made in the following
functions:
smack_ptrace_traceme()
smack_ptrace_access_check()
smack_bprm_set_creds() (in case the process is traced)

This patch unifies all those decisions by introducing one function that
checks whether ptrace is allowed: smk_ptrace_rule_check().

This makes it possible to actually trace with TRACEME where first the
TRACEME itself must be allowed and then exec() on a traced process.

Additional bugs fixed:
- The decision is made according to the mode parameter that is now correctly
  translated from PTRACE_MODE_* to MAY_* instead of being treated 1:1.
  PTRACE_MODE_READ requires MAY_READ.
  PTRACE_MODE_ATTACH requires MAY_READWRITE.
- Add a smack audit log in case of exec() refused by bprm_set_creds().
- Honor the PTRACE_MODE_NOAUDIT flag and don't put smack audit info
  in case this flag is set.
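
The mode translation and the NOAUDIT handling listed above can be
sketched as follows. The DEMO_* constants are stand-ins with
illustrative values and both function names are hypothetical; this is
not the code of smk_ptrace_rule_check().

```c
#include <assert.h>

/* Illustrative values; stand-ins for the kernel's definitions. */
#define DEMO_PTRACE_MODE_READ    0x01u
#define DEMO_PTRACE_MODE_ATTACH  0x02u
#define DEMO_PTRACE_MODE_NOAUDIT 0x04u
#define DEMO_PTRACE_MODE_MASK \
	(DEMO_PTRACE_MODE_READ | DEMO_PTRACE_MODE_ATTACH | \
	 DEMO_PTRACE_MODE_NOAUDIT)

#define DEMO_MAY_WRITE     0x2
#define DEMO_MAY_READ      0x4
#define DEMO_MAY_READWRITE (DEMO_MAY_READ | DEMO_MAY_WRITE)

/* Translate a ptrace mode into the access it requires. */
static int demo_ptrace_mode_to_may(unsigned int mode)
{
	assert((mode & ~DEMO_PTRACE_MODE_MASK) == 0);
	switch (mode & ~DEMO_PTRACE_MODE_NOAUDIT) {
	case DEMO_PTRACE_MODE_READ:
		return DEMO_MAY_READ;
	case DEMO_PTRACE_MODE_ATTACH:
		return DEMO_MAY_READWRITE;
	}
	return 0;	/* unrecognised mode: no access bits derived */
}

/* Audit records are suppressed when the NOAUDIT flag is set. */
static int demo_ptrace_should_audit(unsigned int mode)
{
	return (mode & DEMO_PTRACE_MODE_NOAUDIT) == 0;
}
```

The explicit switch is the point of the sketch: the mode value is
mapped to an access mask rather than being passed through unchanged.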

Change-Id: I43d82d480f331e8ef90da7c287b1e414d55ff394
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>

Smack: fix the subject/object order in smack_ptrace_traceme()

The order of subject/object is currently reversed in
smack_ptrace_traceme(). It is currently checked if the tracee has a
capability to trace tracer and according to this rule a decision is made
whether the tracer will be allowed to trace tracee.

Change-Id: I8ed75ceabe822c70cf9bdccda004139c4c817248
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>

Minor improvement of 'smack_sb_kern_mount'

Fix a possible memory access fault when transmute is true and isp is NULL.

Change-Id: Ib922cfec405067ec5592880c4ae447969ba96633
Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>

Smack: Cgroup filesystem access

The cgroup filesystems are not mounted using conventional
mechanisms. This prevents the use of mount options to
set Smack attributes. This patch makes the behavior
of cgroup filesystems compatible with the way systemd
uses them.

Change-Id: I1e0429f133db9e14117dc754d682dec08221354c
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

Merge branch 'linux-3.14.25' into tizen_next_linux_3.14

Merge "config: enable specific capabilities again" into tizen_next_linux_3.14

config: enable specific capabilities again

Enable the CONFIG_MARU and CONFIG_YAGL capabilities. They were disabled during the merge.

Change-Id: I4a3b108697fb1c03254b07de9ff786a47a464e09
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>

VIGS: use helpers for ttm_tt_populate/ttm_tt_unpopulate

Original dummy implementation was causing NULL pointer dereference in
ttm_tt_clear_mapping

Change-Id: I4e468e41a515855c8631527a1d58fc71f80e8a4b
Signed-off-by: Vasiliy Ulyanov <v.ulyanov@samsung.com>

Linux 3.14.25

mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

commit 5bcc9f86ef09a933255ee66bd899d4601785dad5 upstream.

For the MIGRATE_RESERVE pages, it is useful when they do not get
misplaced on free_list of other migratetype, otherwise they might get
allocated prematurely and e.g.  fragment the MIGRATE_RESERVE pageblocks.
While this cannot be avoided completely when allocating new
MIGRATE_RESERVE pageblocks in min_free_kbytes sysctl handler, we should
prevent the misplacement where possible.

Currently, it is possible for the misplacement to happen when a
MIGRATE_RESERVE page is allocated on pcplist through rmqueue_bulk() as a
fallback for other desired migratetype, and then later freed back
through free_pcppages_bulk() without being actually used.  This happens
because free_pcppages_bulk() uses get_freepage_migratetype() to choose
the free_list, and rmqueue_bulk() calls set_freepage_migratetype() with
the *desired* migratetype and not the page's original MIGRATE_RESERVE
migratetype.

This patch fixes the problem by moving the call to
set_freepage_migratetype() from rmqueue_bulk() down to
__rmqueue_smallest() and __rmqueue_fallback() where the actual page's
migratetype (e.g.  from which free_list the page is taken from) is used.
Note that this migratetype might be different from the pageblock's
migratetype due to freepage stealing decisions.  This is OK, as page
stealing never uses MIGRATE_RESERVE as a fallback, and also takes care
to leave all MIGRATE_CMA pages on the correct freelist.

Therefore, as an additional benefit, the call to
get_pageblock_migratetype() from rmqueue_bulk() when CMA is enabled, can
be removed completely.  This relies on the fact that MIGRATE_CMA
pageblocks are created only during system init, and the above.  The
related is_migrate_isolate() check is also unnecessary, as memory
isolation has other ways to move pages between freelists, and drain pcp
lists containing pages that should be isolated.  The buffered_rmqueue()
can also benefit from calling get_freepage_migratetype() instead of
get_pageblock_migratetype().
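
A toy model of the misplacement, assuming a page carries a single tag
that decides which free list it returns to. All names here are invented
for the illustration and the real allocator is considerably more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_mt { DEMO_MOVABLE, DEMO_RESERVE };

struct demo_page {
	enum demo_mt freepage_mt;	/* tag read when the page is freed */
};

/*
 * Take a page for 'desired' from the free list 'from' (a fallback
 * when the two differ). The tag is set where the page is taken.
 * With tag_in_bulk set, it is then overwritten with the desired
 * type (old behaviour); otherwise it is kept (new behaviour).
 */
static void demo_rmqueue(struct demo_page *page, enum demo_mt desired,
			 enum demo_mt from, bool tag_in_bulk)
{
	assert(page != NULL);
	page->freepage_mt = from;
	if (tag_in_bulk)
		page->freepage_mt = desired;	/* clobbers the real type */
}

/* An unused pcp page goes back to the list named by its tag. */
static enum demo_mt demo_free_list_for(const struct demo_page *page)
{
	assert(page != NULL);
	return page->freepage_mt;
}
```

In this model a reserve page handed out as a fallback for a movable
request returns to the movable list under the old tagging and to the
reserve list under the new one.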

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: "Wang, Yalin" <Yalin.Wang@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY

commit 1a501907bbea8e6ebb0b16cf6db9e9cbf1d2c813 upstream.

Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
ensured that file/anon lists were scanned proportionally for reclaim from
kswapd but ignored it for direct reclaim.  The intent was to minimise
direct reclaim latency but Yuanhan Liu pointed out that it substitutes one
long stall for many small stalls and distorts aging for normal workloads
like streaming readers/writers.  Hugh Dickins pointed out that a
side-effect of the same commit was that when one LRU list dropped to zero
that the entirety of the other list was shrunk leading to excessive
reclaim in memcgs.  This patch scans the file/anon lists proportionally
for direct reclaim to age pages similarly whether reclaimed by kswapd or
direct reclaim but takes care to abort reclaim if one LRU drops to zero
after reclaiming the requested number of pages.

Based on ext4 and using the Intel VM scalability test

                                              3.15.0-rc5            3.15.0-rc5
                                                shrinker            proportion
Unit  lru-file-readonce    elapsed      5.3500 (  0.00%)      5.4200 ( -1.31%)
Unit  lru-file-readonce time_range      0.2700 (  0.00%)      0.1400 ( 48.15%)
Unit  lru-file-readonce time_stddv      0.1148 (  0.00%)      0.0536 ( 53.33%)
Unit lru-file-readtwice    elapsed      8.1700 (  0.00%)      8.1700 (  0.00%)
Unit lru-file-readtwice time_range      0.4300 (  0.00%)      0.2300 ( 46.51%)
Unit lru-file-readtwice time_stddv      0.1650 (  0.00%)      0.0971 ( 41.16%)

The test cases are running multiple dd instances reading sparse files. The results are within
the noise for the small test machine. The impact of the patch is more noticeable from the vmstats

                            3.15.0-rc5  3.15.0-rc5
                              shrinker  proportion
Minor Faults                     35154       36784
Major Faults                       611        1305
Swap Ins                           394        1651
Swap Outs                         4394        5891
Allocation stalls               118616       44781
Direct pages scanned           4935171     4602313
Kswapd pages scanned          15921292    16258483
Kswapd pages reclaimed        15913301    16248305
Direct pages reclaimed         4933368     4601133
Kswapd efficiency                  99%         99%
Kswapd velocity             670088.047  682555.961
Direct efficiency                  99%         99%
Direct velocity             207709.217  193212.133
Percentage direct scans            23%         22%
Page writes by reclaim        4858.000    6232.000
Page writes file                   464         341
Page writes anon                  4394        5891

Note that there are fewer allocation stalls even though the amount
of direct reclaim scanning is very approximately the same.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/superblock: avoid locking counting inodes and dentries before reclaiming them

commit d23da150a37c9fe3cc83dbaf71b3e37fd434ed52 upstream.

We remove the call to grab_super_passive in call to super_cache_count.
This becomes a scalability bottleneck as multiple threads are trying to do
memory reclamation, e.g.  when we are doing large amount of file read and
page cache is under pressure.  The cached objects quickly got reclaimed
down to 0 and we are aborting the cache_scan() reclaim.  But counting
creates a log jam acquiring the sb_lock.

We are holding the shrinker_rwsem which ensures the safety of call to
list_lru_count_node() and s_op->nr_cached_objects.  The shrinker is
unregistered now before ->kill_sb() so the operation is safe when we are
doing unmount.

The impact will depend heavily on the machine and the workload but for a
small machine using postmark tuned to use 4xRAM size the results were

                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla         shrinker-v1r1
Ops/sec Transactions         21.00 (  0.00%)       24.00 ( 14.29%)
Ops/sec FilesCreate          39.00 (  0.00%)       44.00 ( 12.82%)
Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
Ops/sec DataRead/MB          25.97 (  0.00%)       29.10 ( 12.05%)
Ops/sec DataWrite/MB         49.99 (  0.00%)       56.02 ( 12.06%)

ffsb running in a configuration that is meant to simulate a mail server showed

                                 3.15.0-rc5             3.15.0-rc5
                                    vanilla          shrinker-v1r1
Ops/sec readall           9402.63 (  0.00%)      9567.97 (  1.76%)
Ops/sec create            4695.45 (  0.00%)      4735.00 (  0.84%)
Ops/sec delete             173.72 (  0.00%)       179.83 (  3.52%)
Ops/sec Transactions     14271.80 (  0.00%)     14482.81 (  1.48%)
Ops/sec Read                37.00 (  0.00%)        37.60 (  1.62%)
Ops/sec Write               18.20 (  0.00%)        18.30 (  0.55%)

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/superblock: unregister sb shrinker before ->kill_sb()

commit 28f2cd4f6da24a1aa06c226618ed5ad69e13df64 upstream.

This series is aimed at regressions noticed during reclaim activity.  The
first two patches are shrinker patches that were posted ages ago but never
merged for reasons that are unclear to me.  I'm posting them again to see
if there was a reason they were dropped or if they just got lost.  Dave?
Tim?  The last patch adjusts proportional reclaim.  Yuanhan Liu, can you
retest the vm scalability test cases on a larger machine?  Hugh, does this
work for you on the memcg test cases?

Based on ext4, I get the following results but unfortunately my larger
test machines are all unavailable so this is based on a relatively small
machine.

postmark
                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla       proportion-v1r4
Ops/sec Transactions         21.00 (  0.00%)       25.00 ( 19.05%)
Ops/sec FilesCreate          39.00 (  0.00%)       45.00 ( 15.38%)
Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
Ops/sec DataRead/MB          25.97 (  0.00%)       30.02 ( 15.59%)
Ops/sec DataWrite/MB         49.99 (  0.00%)       57.78 ( 15.58%)

ffsb (mail server simulator)
                                 3.15.0-rc5             3.15.0-rc5
                                    vanilla        proportion-v1r4
Ops/sec readall           9402.63 (  0.00%)      9805.74 (  4.29%)
Ops/sec create            4695.45 (  0.00%)      4781.39 (  1.83%)
Ops/sec delete             173.72 (  0.00%)       177.23 (  2.02%)
Ops/sec Transactions     14271.80 (  0.00%)     14764.37 (  3.45%)
Ops/sec Read                37.00 (  0.00%)        38.50 (  4.05%)
Ops/sec Write               18.20 (  0.00%)        18.50 (  1.65%)

dd of a large file
                                3.15.0-rc5            3.15.0-rc5
                                   vanilla       proportion-v1r4
WallTime DownloadTar       75.00 (  0.00%)       61.00 ( 18.67%)
WallTime DD               423.00 (  0.00%)      401.00 (  5.20%)
WallTime Delete             2.00 (  0.00%)        5.00 (-150.00%)

stutter (times mmap latency during large amounts of IO)

                            3.15.0-rc5            3.15.0-rc5
                               vanilla       proportion-v1r4
Unit >5ms Delays  80252.0000 (  0.00%)  81523.0000 ( -1.58%)
Unit Mmap min         8.2118 (  0.00%)      8.3206 ( -1.33%)
Unit Mmap mean       17.4614 (  0.00%)     17.2868 (  1.00%)
Unit Mmap stddev     24.9059 (  0.00%)     34.6771 (-39.23%)
Unit Mmap max      2811.6433 (  0.00%)   2645.1398 (  5.92%)
Unit Mmap 90%        20.5098 (  0.00%)     18.3105 ( 10.72%)
Unit Mmap 93%        22.9180 (  0.00%)     20.1751 ( 11.97%)
Unit Mmap 95%        25.2114 (  0.00%)     22.4988 ( 10.76%)
Unit Mmap 99%        46.1430 (  0.00%)     43.5952 (  5.52%)
Unit Ideal  Tput     85.2623 (  0.00%)     78.8906 (  7.47%)
Unit Tput min        44.0666 (  0.00%)     43.9609 (  0.24%)
Unit Tput mean       45.5646 (  0.00%)     45.2009 (  0.80%)
Unit Tput stddev      0.9318 (  0.00%)      1.1084 (-18.95%)
Unit Tput max        46.7375 (  0.00%)     46.7539 ( -0.04%)

This patch (of 3):

We would like to unregister the sb shrinker before ->kill_sb().  This will
allow cached objects to be counted without call to grab_super_passive() to
update ref count on sb.  We want to avoid locking during memory
reclamation especially when we are skipping the memory reclaim when we are
out of cached objects.

This is safe because grab_super_passive does a try-lock on the
sb->s_umount now, and so if we are in the unmount process, it won't ever
block.  That means what used to be a deadlock and races we were avoiding
by using grab_super_passive() is now:

        shrinker                        umount

        down_read(shrinker_rwsem)
                                        down_write(sb->s_umount)
                                        shrinker_unregister
                                          down_write(shrinker_rwsem)
                                            <blocks>
        grab_super_passive(sb)
          down_read_trylock(sb->s_umount)
            <fails>
        <shrinker aborts>
        ....
        <shrinkers finish running>
        up_read(shrinker_rwsem)
                                          <unblocks>
                                          <removes shrinker>
                                          up_write(shrinker_rwsem)
                                        ->kill_sb()
                                        ....

So it is safe to deregister the shrinker before ->kill_sb().
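
The key step in the sequence above is that the shrinker uses a try-lock
and gives up instead of blocking. A single-threaded toy model of that
behaviour, with an invented struct demo_rwsem standing in for
sb->s_umount (this is not the kernel's rwsem implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader/writer semaphore state; no real blocking or atomicity. */
struct demo_rwsem {
	int readers;
	bool writer;
};

/* Non-blocking read lock: fails at once if a writer holds the lock. */
static bool demo_down_read_trylock(struct demo_rwsem *sem)
{
	assert(sem != NULL);
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void demo_up_read(struct demo_rwsem *sem)
{
	assert(sem != NULL && sem->readers > 0);
	sem->readers--;
}

/*
 * Shrinker step: returns the number of objects scanned, or -1 if it
 * aborted because an unmount holds s_umount for writing.
 */
static long demo_shrink(struct demo_rwsem *s_umount, long nr_to_scan)
{
	if (!demo_down_read_trylock(s_umount))
		return -1;	/* <fails>, so <shrinker aborts> */
	/* ... the caches would be scanned here ... */
	demo_up_read(s_umount);
	return nr_to_scan;
}
```

Because the shrinker never waits on s_umount, the unmount side can wait
for shrinker_rwsem without the two paths deadlocking.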

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: fix direct reclaim writeback regression

commit 8bdd638091605dc66d92c57c4b80eb87fffc15f7 upstream.

Shortly before 3.16-rc1, Dave Jones reported:

  WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
           xfs_vm_writepage+0x5ce/0x630 [xfs]()
  CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
  Call Trace:
    xfs_vm_writepage+0x5ce/0x630 [xfs]
    shrink_page_list+0x8f9/0xb90
    shrink_inactive_list+0x253/0x510
    shrink_lruvec+0x563/0x6c0
    shrink_zone+0x3b/0x100
    shrink_zones+0x1f1/0x3c0
    try_to_free_pages+0x164/0x380
    __alloc_pages_nodemask+0x822/0xc90
    alloc_pages_vma+0xaf/0x1c0
    handle_mm_fault+0xa31/0xc50
  etc.

970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
971                   PF_MEMALLOC))

I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.
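
For readers unfamiliar with the quoted check: it fires when the task is
in memory reclaim (PF_MEMALLOC set) but is not kswapd (PF_KSWAPD
clear), i.e. in direct reclaim. A small sketch, where the flag values
and struct demo_task are illustrative assumptions rather than the
kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; stand-ins for the kernel's PF_* bits. */
#define DEMO_PF_MEMALLOC 0x00000800u
#define DEMO_PF_KSWAPD   0x00040000u

/* Hypothetical stand-in for the 'current' task. */
struct demo_task {
	unsigned int flags;
};

/* Same shape as the condition inside the quoted WARN_ON_ONCE(). */
static bool demo_is_direct_reclaim(const struct demo_task *task)
{
	assert(task != NULL);
	return (task->flags & (DEMO_PF_MEMALLOC | DEMO_PF_KSWAPD)) ==
	       DEMO_PF_MEMALLOC;
}
```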

However, my own /var/log/messages now shows similar complaints

  WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
  WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()

from stressing some mmotm trees during July.

Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?

Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.

Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.

Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it.  Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.
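
A toy model of the stale flag and of the fix, assuming a page with a
single swap_backed bit. The type and function names are invented for
the illustration and do not match the kernel's page flag helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
	bool swap_backed;	/* models PageSwapBacked */
};

/* A page counts as file cache only when it is not swap backed. */
static bool demo_page_is_file_cache(const struct demo_page *page)
{
	assert(page != NULL);
	return !page->swap_backed;
}

/*
 * Migration failed: hand the unused destination page back for later
 * reuse. With apply_fix set, the flag is cleared first, so the next
 * user of the page does not inherit it.
 */
static void demo_putback_newpage(struct demo_page *newpage, bool apply_fix)
{
	assert(newpage != NULL);
	if (apply_fix)
		newpage->swap_backed = false;
	/* ... the page would be returned to its free list here ... */
}
```

Without the clearing step, a page later reused for a file mapping fails
the file-cache test, which is how reclaim ends up calling ->writepage().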

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB

commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 upstream.

We use the accessed bit to age a page at page reclaim time,
and currently we also flush the TLB when doing so.

But in some workloads TLB flush overhead is very heavy. In my
simple multithreaded app with a lot of swap to several pcie
SSDs, removing the tlb flush gives about 20% ~ 30% swapout
speedup.

Fortunately just removing the TLB flush is a valid optimization:
on x86 CPUs, clearing the accessed bit without a TLB flush
doesn't cause data corruption.

It could cause incorrect page aging and the (mistaken) reclaim of
hot pages, but the chance of that should be relatively low.

So as a performance optimization don't flush the TLB when
clearing the accessed bit, it will eventually be flushed by
a context switch or a VM operation anyway. [ In the rare
event of it not getting flushed for a long time the delay
shouldn't really matter because there's no real memory
pressure for swapout to react to. ]
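
In outline, the aging step then reduces to a test-and-clear of the
accessed bit with no flush afterwards. A non-atomic user-space sketch,
in which the PTE is modelled as a plain 64-bit integer and the function
name is made up (the kernel uses an atomic bit operation on the real
page table entry):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 5 is the accessed bit in an x86 page table entry. */
#define DEMO_PAGE_ACCESSED (UINT64_C(1) << 5)

/*
 * Report whether the page was referenced since the last check and
 * clear the accessed bit. No TLB flush is issued, so a cached
 * translation may delay the next setting of the bit.
 */
static bool demo_test_and_clear_young(uint64_t *pte)
{
	bool young;

	assert(pte != NULL);
	young = (*pte & DEMO_PAGE_ACCESSED) != 0;
	*pte &= ~DEMO_PAGE_ACCESSED;
	return young;
}
```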

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
[ Rewrote the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm, compaction: properly signal and act upon lock and need_sched() contention

commit be9765722e6b7ece8263cbab857490332339bd6f upstream.

Compaction uses compact_checklock_irqsave() function to periodically check
for lock contention and need_resched() to either abort async compaction,
or to free the lock, schedule and retake the lock.  When aborting,
cc->contended is set to signal the contended state to the caller.  Two
problems have been identified in this mechanism.

First, compaction also calls directly cond_resched() in both scanners when
no lock is yet taken.  This call neither aborts async compaction
nor sets cc->contended appropriately.  This patch introduces a new
compact_should_abort() function to achieve both.  In isolate_freepages(),
the check frequency is reduced to once per SWAP_CLUSTER_MAX pageblocks to
match what the migration scanner does in the preliminary page checks.  In
case a pageblock is found suitable for calling isolate_freepages_block(),
the checks within it are done at a higher frequency.

Second, isolate_freepages() does not check if isolate_freepages_block()
aborted due to contention, and advances to the next pageblock.  This
violates the principle of aborting on contention, and might result in
pageblocks not being scanned completely, since the scanning cursor is
advanced.  This problem has been noticed in the code by Joonsoo Kim when
reviewing related patches.  This patch makes isolate_freepages_block()
check the cc->contended flag and abort.

In case isolate_freepages() has already isolated some pages before
aborting due to contention, page migration will proceed, which is OK since
we do not want to waste the work that has been done, and page migration
has its own checks for contention.  However, we do not want another isolation
attempt by either of the scanners, so cc->contended flag check is added
also to compaction_alloc() and compact_finished() to make sure compaction
is aborted right after the migration.
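
The behaviour described for compact_should_abort() can be modelled
roughly as below. This is a user-space sketch under assumptions: the
struct, the enum and the resched_needed parameter (standing in for
need_resched()) are invented here, and the real function may differ in
detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_mode { DEMO_ASYNC, DEMO_SYNC };

struct demo_compact_control {
	enum demo_mode mode;
	bool contended;		/* reported back to the caller */
};

/*
 * Async compaction records the contention and aborts; sync
 * compaction would yield the CPU and then carry on.
 */
static bool demo_compact_should_abort(struct demo_compact_control *cc,
				      bool resched_needed)
{
	assert(cc != NULL);
	if (resched_needed) {
		if (cc->mode == DEMO_ASYNC) {
			cc->contended = true;
			return true;
		}
		/* sync mode: a real implementation would reschedule here */
	}
	return false;
}
```

Callers that see cc->contended set afterwards are expected to stop
isolating further pages, which is what the added checks in the
paragraph above are for.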

The outcome of the patch should be reduced lock contention by async
compaction and lower latencies for higher-order allocations where direct
compaction is involved.

[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>