Linus Torvalds [Tue, 18 Feb 2014 23:52:43 +0000 (15:52 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) kvaser CAN driver has fixed limits of some of it's table, validate
that we won't exceed those limits at probe time. Fix from Olivier
Sobrie.
2) Fix rtl8192ce disabling interrupts for too long, from Olivier
Langlois.
3) Fix botched shift in ath5k driver, from Dan Carpenter.
4) Fix corruption of deferred packets in TIPC, from Erik Hugne.
5) Fix newlink error path in macvlan driver, from Cong Wang.
6) Fix netpoll deadlock in bonding, from Ding Tianhong.
7) Handle GSO packets properly in forwarding path when fragmentation is
necessary on egress, from Florian Westphal.
8) Fix axienet build errors, from Michal Simek.
9) Fix refcounting of ubufs on tx in vhost net driver, from Michael S
Tsirkin.
10) Carrier status isn't set properly in hyperv driver, from Haiyang
Zhang.
11) Missing pci_disable_device() in tulip_remove_one), from Ingo Molnar.
12) AF_PACKET qdisc bypass mode doesn't adhere to driver provided TX
queue selection method. Add a fallback method mechanism to fix this
bug, from Daniel Borkmann.
13) Fix regression in link local route handling on GRE tunnels, from
Nicolas Dichtel.
14) Bonding can assign dup aggregator IDs in some sequences of
configuration, fix by making the allocation counter per-bond instead
of global. From Jiri Bohac.
15) sctp_connectx() needs compat translations, from Daniel Borkmann.
16) Fix of_mdio PHY interrupt parsing, from Ben Dooks
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
MAINTAINERS: add entry for the PHY library
of_mdio: fix phy interrupt passing
net: ethernet: update dependency and help text of mvneta
NET: fec: only enable napi if we are successful
af_packet: remove a stray tab in packet_set_ring()
net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
ipv4: fix counter in_slow_tot
irtty-sir.c: Do not set_termios() on irtty_close()
bonding: 802.3ad: make aggregator_identifier bond-private
usbnet: remove generic hard_header_len check
gre: add link local route when local addr is any
batman-adv: fix potential kernel paging error for unicast transmissions
batman-adv: avoid double free when orig_node initialization fails
batman-adv: free skb on TVLV parsing success
batman-adv: fix TT CRC computation by ensuring byte order
batman-adv: fix potential orig_node reference leak
batman-adv: avoid potential race condition when adding a new neighbour
batman-adv: properly check pskb_may_pull return value
batman-adv: release vlan object after checking the CRC
batman-adv: fix TT-TVLV parsing on OGM reception
...
Linus Torvalds [Tue, 18 Feb 2014 23:49:58 +0000 (15:49 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A range of ARM fixes. Biggest change is the stage-2 attributes used
for for hyp mode which were wrong. I've killed some bits in a couple
of DT files which turned out not to be required, and a few other
fixes.
One fix touches code outside of arch/arm, which is related to sorting
out the DMA masks correctly. There is a long standing issue with the
conversion from PFNs to addresses where people assume that shifting an
unsigned long left by PAGE_SHIFT results in a correct address. This
is not the case with C: the integer promotion happens at assignment
after evaluation. This fixes the recently introduced dma_max_pfn()
function, but there's a number of other places where we try this
directly on an unsigned long in the mm code"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 7957/1: add DSB after icache flush in __flush_icache_all()
Fix uses of dma_max_pfn() when converting to a limiting address
ARM: 7955/1: spinlock: ensure we have a compiler barrier before sev
ARM: 7953/1: mm: ensure TLB invalidation is complete before enabling MMU
ARM: 7952/1: mm: Fix the memblock allocation for LPAE machines
ARM: 7950/1: mm: Fix stage-2 device memory attributes
ARM: dts: fix spdif pinmux configuration
Linus Torvalds [Tue, 18 Feb 2014 23:49:40 +0000 (15:49 -0800)]
Merge tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy
Pull jfs fix from David Kleikamp:
"Another ACL regression. This one more subtle"
* tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy:
jfs: set i_ctime when setting ACL
Florian Fainelli [Tue, 18 Feb 2014 17:47:49 +0000 (09:47 -0800)]
MAINTAINERS: add entry for the PHY library
The PHY library has been subject to some changes, new drivers and DT
interactions over the past few months. Add myself as a maintainer for
the core PHY library parts and drivers. Make sure the PHY library entry
also covers the Device Tree files which have a close interaction with
the MDIO bus, PHY connection and Ethernet PHY mode parsing.
CC: Grant Likely <grant.likely@linaro.org>
CC: Shaohui Xie <shaohui.xie@freescale.com>
CC: Andy Fleming <afleming@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Tue, 18 Feb 2014 12:16:58 +0000 (12:16 +0000)]
of_mdio: fix phy interrupt passing
The of_mdiobus_register_phy() is not setting phy->irq thus causing
some drivers to incorrectly assume that the PHY does not have an
IRQ associated with it. Not only do some drivers report no IRQ
they do not install an interrupt handler for the PHY.
Simplify the code setting irq and set the phy->irq at the same
time so that we cover the following issues, which should cover
all the cases the code will find:
- Set phy->irq if node has irq property and mdio->irq is NULL
- Set phy->irq if node has no irq and mdio->irq is not NULL
- Leave phy->irq as PHY_POLL default if none of the above
This fixes the issue:
net eth0: attached PHY 1 (IRQ -1) to driver Micrel KSZ8041RNLI
to the correct:
net eth0: attached PHY 1 (IRQ 416) to driver Micrel KSZ8041RNLI
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Tue, 18 Feb 2014 13:18:11 +0000 (14:18 +0100)]
net: ethernet: update dependency and help text of mvneta
With the introduction of the support for Armada 375 and Armada 38x,
the hidden Kconfig option MACH_ARMADA_370_XP is being renamed to
MACH_MVEBU_V7. Therefore, the dependency that was used for the mvneta
driver can no longer work. This commit replaces this dependency by a
dependency on PLAT_ORION, which is used similarly for the mv643xx_eth
driver.
In addition to this, it takes this opportunity to adjust the
description and help text to indicate that the driver can is also used
for Armada 38x. Note that Armada 375 cannot use this driver as it has
a completely different networking unit, which will require a separate
driver.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 18 Feb 2014 12:55:42 +0000 (12:55 +0000)]
NET: fec: only enable napi if we are successful
If napi is left enabled after a failed attempt to bring the interface
up, we BUG:
fec 2188000.ethernet eth0: no PHY, assuming direct connection to switch
libphy: PHY fixed-0:00 not found
fec 2188000.ethernet eth0: could not attach to PHY
------------[ cut here ]------------
kernel BUG at include/linux/netdevice.h:502!
Internal error: Oops - BUG: 0 [#1] SMP ARM
...
PC is at fec_enet_open+0x4d0/0x500
LR is at __dev_open+0xa4/0xfc
Only enable napi after we are past all the failure paths.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 18 Feb 2014 12:20:51 +0000 (15:20 +0300)]
af_packet: remove a stray tab in packet_set_ring()
At first glance it looks like there is a missing curly brace but
actually the code works the same either way. I have adjusted the
indenting but left the code the same.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Feb 2014 21:57:42 +0000 (16:57 -0500)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless
John W. Linville says:
====================
Please pull this batch of fixes intended for the 3.14 stream...
For the iwlwifi one, Emmanuel says:
"As explicitly written in the commit message, we prefer to disable Tx
AMPDU on NICs supported by iwldvm. This feature gives a big boost in
Tx performance, but the firmware is buggy and we can't rely on it.
Our hope is that most of the users out there want wifi to surf on
the web which means that they care more for Rx traffic than for Tx.
People who want to enable it can do so with the help of a module
parameter."
On top of that...
Dan Carpenter fixes a typo/thinko in ath5k.
Olivier Langlois fixes a couple of rtlwifi issues, one which leaves
IRQs disabled too long (causing a variety of problems elsewhere),
and one which fixes an incorrect return code when failing to enable
the NIC.
Russell King fixes a NULL pointer dereference in hostap.
Stanislaw Gruszka fixes a DMA coherence issue in the rtl8187 driver.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Feb 2014 11:11:11 +0000 (12:11 +0100)]
net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
sizeof(param) check will always fail in kernel as the structure in
64bit kernel space is 4bytes larger than for user binaries compiled
in 32bit mode. Thus, applications making use of sctp_connectx() won't
be able to run under such circumstances.
Introduce a compat interface in the kernel to deal with such
situations by using a 'struct compat_sctp_getaddrs_old' structure
where user data is copied into it, and then sucessively transformed
into a 'struct sctp_getaddrs_old' structure with the help of
compat_ptr(). That fixes sctp_connectx() abi without any changes
needed in user space, and lets the SCTP test suite pass when compiled
in 32bit and run on 64bit kernels.
Fixes:
f9c67811ebc0 ("sctp: Fix regression introduced by new sctp_connectx api")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Feb 2014 20:40:50 +0000 (15:40 -0500)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix soft-interface MTU computation
- fix bogus pointer mangling when parsing the TT-TVLV
container. This bug led to a wrong memory access.
- fix memory leak by properly releasing the VLAN object
after CRC check
- properly check pskb_may_pull() return value
- avoid potential race condition while adding new neighbour
- fix potential memory leak by removing all the references
to the orig_node object in case of initialization failure
- fix the TT CRC computation by ensuring that every node uses
the same byte order when hosts with different endianess are
part of the same network
- fix severe memory leak by freeing skb after a successful
TVLV parsing
- avoid potential double free when orig_node initialization
fails
- fix potential kernel paging error caused by the usage of
the old value of skb->data after skb reallocation
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Feb 2014 18:04:09 +0000 (10:04 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for v3.14"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix use after free in jbd2_journal_start_reserved()
ext4: don't leave i_crtime.tv_sec uninitialized
ext4: fix online resize with a non-standard blocks per group setting
ext4: fix online resize with very large inode tables
ext4: don't try to modify s_flags if the the file system is read-only
ext4: fix error paths in swap_inode_boot_loader()
ext4: fix xfstest generic/299 block validity failures
Dan Carpenter [Tue, 18 Feb 2014 01:33:01 +0000 (20:33 -0500)]
jbd2: fix use after free in jbd2_journal_start_reserved()
If start_this_handle() fails then it leads to a use after free of
"handle".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Vinayak Kale [Wed, 12 Feb 2014 06:30:01 +0000 (07:30 +0100)]
ARM: 7957/1: add DSB after icache flush in __flush_icache_all()
Add DSB after icache flush to complete the cache maintenance operation.
Signed-off-by: Vinayak Kale <vkale@apm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Tue, 11 Feb 2014 17:11:04 +0000 (17:11 +0000)]
Fix uses of dma_max_pfn() when converting to a limiting address
We must use a 64-bit for this, otherwise overflowed bits get lost, and
that can result in a lower than intended value set.
Fixes:
8e0cb8a1f6ac ("ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations")
Fixes:
7d35496dd982 ("ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations")
Tested-Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Duan Jiong [Mon, 17 Feb 2014 07:23:43 +0000 (15:23 +0800)]
ipv4: fix counter in_slow_tot
since commit
89aef8921bf("ipv4: Delete routing cache."), the counter
in_slow_tot can't work correctly.
The counter in_slow_tot increase by one when fib_lookup() return successfully
in ip_route_input_slow(), but actually the dst struct maybe not be created and
cached, so we can increase in_slow_tot after the dst struct is created.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 17 Feb 2014 21:51:00 +0000 (13:51 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"We have some patches fixing up ACL support issues from Zheng and
Guangliang and a mount option to enable/disable this support. (These
fixes were somewhat delayed by the Chinese holiday.)
There is also a small fix for cached readdir handling when directories
are fragmented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix __dcache_readdir()
ceph: add acl, noacl options for cephfs mount
ceph: make ceph_forget_all_cached_acls() static inline
ceph: add missing init_acl() for mkdir() and atomic_open()
ceph: fix ceph_set_acl()
ceph: fix ceph_removexattr()
ceph: remove xattr when null value is given to setxattr()
ceph: properly handle XATTR_CREATE and XATTR_REPLACE
Linus Torvalds [Mon, 17 Feb 2014 21:50:11 +0000 (13:50 -0800)]
Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Three cifs fixes, the most important fixing the problem with passing
bogus pointers with writev (CVE-2014-0069).
Two additional cifs fixes are still in review (including the fix for
an append problem which Al also discovered)"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix too big maxBuf size for SMB3 mounts
cifs: ensure that uncached writes handle unmapped areas correctly
[CIFS] Fix cifsacl mounts over smb2 to not call cifs
David Howells [Mon, 17 Feb 2014 15:01:47 +0000 (15:01 +0000)]
FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree
When FS-Cache allocates an object, the following sequence of events can
occur:
-->fscache_alloc_object()
-->cachefiles_alloc_object() [via cache->ops->alloc_object]
<--[returns new object]
-->fscache_attach_object()
<--[failed]
-->cachefiles_put_object() [via cache->ops->put_object]
-->fscache_object_destroy()
-->fscache_objlist_remove()
-->rb_erase() to remove the object from fscache_object_list.
resulting in a crash in the rbtree code.
The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().
So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree. We do, however, unconditionally try to remove the
object from the tree.
Thanks to NeilBrown for finding this and suggesting this solution.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones [Mon, 17 Feb 2014 21:21:24 +0000 (16:21 -0500)]
reiserfs: fix utterly brain-damaged indentation.
This has been this way for years, and every time I stumble across it I
lose my lunch. After coming across it for the nth time in the Coverity
results, I had to overcome the bystander effect and do something about
it.
This ignores the 79 column limit in favor of making it look like C
instead of gibberish.
The correct thing to do here would be to lose some of the indentation by
breaking this function up into several smaller ones. I might do that at
some point if I have the stomach to look at this again.
(Also some of those overlong ternary operations would likely be more
readable as regular if's)
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tommie Gannert [Mon, 17 Feb 2014 20:46:04 +0000 (20:46 +0000)]
irtty-sir.c: Do not set_termios() on irtty_close()
Issuing set_termios() from irtty_close() causes kernel Oops for
unplugged usb-serial devices.
Since no other tty_ldisc calls set_termios() on close and no tty driver
seem to check if tty->device_data is NULL or not on entry to set_termios(),
the only solution I can come up with is to remove the irtty_stop_receiver()
call, which only updates termios.
Signed-off-by: Tommie Gannert <tommie@gannert.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Mon, 17 Feb 2014 20:54:31 +0000 (15:54 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Linus Torvalds [Mon, 17 Feb 2014 20:42:45 +0000 (12:42 -0800)]
Merge tag 'dma-buf-for-3.14' of git://git./linux/kernel/git/sumits/dma-buf
Pull dma-buf fix from Sumit Semwal:
"Just some debugfs output updates.
There's another patch related to dma-buf, but it'll get upstreamed via
Greg KH's pull request"
* tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
dma-buf: update debugfs output
Linus Torvalds [Mon, 17 Feb 2014 20:40:36 +0000 (12:40 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/linux-avr32
Pull AVR32 fixes from Hans-Christian Egtvedt.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
avr32: add generic vga.h to Kbuild
avr32: add generic ioremap_wc() definition in io.h
avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 use
avr32: fix missing module.h causing build failure in mimc200/fram.c
Yan, Zheng [Thu, 13 Feb 2014 11:40:26 +0000 (19:40 +0800)]
ceph: fix __dcache_readdir()
If directory is fragmented, readdir() read its dirfrags one by one.
After reading all dirfrags, the corresponding dentries are sorted in
(frag_t, off) order in the dcache. If dentries of a directory are all
cached, __dcache_readdir() can use the cached dentries to satisfy
readdir syscall. But when checking if a given dentry is after the
position of readdir, __dcache_readdir() compares numerical value of
frag_t directly. This is wrong, it should use ceph_frag_compare().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Sage Weil [Sun, 16 Feb 2014 18:05:29 +0000 (10:05 -0800)]
ceph: add acl, noacl options for cephfs mount
Make the 'acl' option dependent on having ACL support compiled in. Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Guangliang Zhao [Sun, 16 Feb 2014 16:35:52 +0000 (08:35 -0800)]
ceph: make ceph_forget_all_cached_acls() static inline
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sage Weil <sage@inktank.com>
Yan, Zheng [Tue, 11 Feb 2014 04:55:05 +0000 (12:55 +0800)]
ceph: add missing init_acl() for mkdir() and atomic_open()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Yan, Zheng [Tue, 11 Feb 2014 05:08:51 +0000 (13:08 +0800)]
ceph: fix ceph_set_acl()
If acl is equivalent to file mode permission bits, ceph_set_acl()
needs to remove any existing acl xattr. Use __ceph_setxattr() to
handle both setting and removing acl xattr cases, it doesn't return
-ENODATA when there is no acl xattr.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Yan, Zheng [Tue, 11 Feb 2014 05:23:09 +0000 (13:23 +0800)]
ceph: fix ceph_removexattr()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Yan, Zheng [Tue, 11 Feb 2014 05:04:19 +0000 (13:04 +0800)]
ceph: remove xattr when null value is given to setxattr()
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Yan, Zheng [Tue, 11 Feb 2014 05:01:19 +0000 (13:01 +0800)]
ceph: properly handle XATTR_CREATE and XATTR_REPLACE
return -EEXIST if XATTR_CREATE is set and xattr alread exists.
return -ENODATA if XATTR_REPLACE is set but xattr does not exist.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Linus Torvalds [Mon, 17 Feb 2014 20:36:49 +0000 (12:36 -0800)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are some more powerpc fixes for 3.14
The main one is a nasty issue with the NUMA balancing support which
requires a small generic change and the addition of a new accessor to
set _PAGE_NUMA. Both have been reviewed and acked by Mel and Rik.
The changelog should have plenty of details but basically, without
this fix, we get random user segfaults and/or corruptions due to
missing TLB/hash flushes. Aneesh series of 3 patches fixes it.
We have some vDSO vs. perf fixes from Anton, some small EEH fixes
from Gavin, a ppc32 regression vs the stack overflow detector, and a
fix for the way we handle PCIe host bridge speed settings on pseries
(which is needed for proper operations of AMD graphics cards on
Power8)"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/eeh: Disable EEH on reboot
powerpc/eeh: Cleanup on eeh_subsystem_enabled
powerpc/powernv: Rework EEH reset
powerpc: Use unstripped VDSO image for more accurate profiling data
powerpc: Link VDSOs at 0x0
mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
mm: Dirty accountable change only apply to non prot numa case
powerpc/mm: Add new "set" flag argument to pte/pmd update function
powerpc/pseries: Add Gen3 definitions for PCIE link speed
powerpc/pseries: Fix regression on PCI link speed
powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
Linus Torvalds [Mon, 17 Feb 2014 20:24:45 +0000 (12:24 -0800)]
printk: fix syslog() overflowing user buffer
This is not a buffer overflow in the traditional sense: we don't
overflow any *kernel* buffers, but we do mis-count the amount of data we
copy back to user space for the SYSLOG_ACTION_READ_ALL case.
In particular, if the user buffer is too small to hold everything, and
*if* there is a continuation line at just the right place, we can end up
giving the user more data than he asked for.
The reason is that we first count up the number of bytes all the log
records contains, then we walk the records again until we've skipped the
records at the beginning that won't fit, and then we walk the rest of
the records and copy them to the user space buffer.
And in between that "skip the initial records that won't fit" and the
"copy the records that *will* fit to user space", we reset the 'prev'
variable that contained the record information for the last record not
copied. That meant that when we started copying to user space, we now
had a different character count than what we had originally calculated
in the first record walk-through.
The fix is to simply not clear the 'prev' flags value (in both cases
where we had the same logic: syslog_print_all and kmsg_dump_get_buffer:
the latter is used for pstore-like dumping)
Reported-and-tested-by: Debabrata Banerjee <dbanerje@akamai.com>
Acked-by: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Bohac [Fri, 14 Feb 2014 17:13:50 +0000 (18:13 +0100)]
bonding: 802.3ad: make aggregator_identifier bond-private
aggregator_identifier is used to assign unique aggregator identifiers
to aggregators of a bond during device enslaving.
aggregator_identifier is currently a global variable that is zeroed in
bond_3ad_initialize().
This sequence will lead to duplicate aggregator identifiers for eth1 and eth3:
create bond0
change bond0 mode to 802.3ad
enslave eth0 to bond0 //eth0 gets agg id 1
enslave eth1 to bond0 //eth1 gets agg id 2
create bond1
change bond1 mode to 802.3ad
enslave eth2 to bond1 //aggregator_identifier is reset to 0
//eth2 gets agg id 1
enslave eth3 to bond0 //eth3 gets agg id 2
Fix this by making aggregator_identifier private to the bond.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Goode [Thu, 13 Feb 2014 16:50:19 +0000 (17:50 +0100)]
usbnet: remove generic hard_header_len check
This patch removes a generic hard_header_len check from the usbnet
module that is causing dropped packages under certain circumstances
for devices that send rx packets that cross urb boundaries.
One example is the AX88772B which occasionally send rx packets that
cross urb boundaries where the remaining partial packet is sent with
no hardware header. When the buffer with a partial packet is of less
number of octets than the value of hard_header_len the buffer is
discarded by the usbnet module.
With AX88772B this can be reproduced by using ping with a packet
size between 1965-1976.
The bug has been reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=29082
This patch introduces the following changes:
- Removes the generic hard_header_len check in the rx_complete
function in the usbnet module.
- Introduces a ETH_HLEN check for skbs that are not cloned from
within a rx_fixup callback.
- For safety a hard_header_len check is added to each rx_fixup
callback function that could be affected by this change.
These extra checks could possibly be removed by someone
who has the hardware to test.
- Removes a call to dev_kfree_skb_any() and instead utilizes the
dev->done list to queue skbs for cleanup.
The changes place full responsibility on the rx_fixup callback
functions that clone skbs to only pass valid skbs to the
usbnet_skb_return function.
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Mon, 17 Feb 2014 13:22:21 +0000 (14:22 +0100)]
gre: add link local route when local addr is any
This bug was reported by Steinar H. Gunderson and was introduced by commit
f7cb8886335d ("sit/gre6: don't try to add the same route two times").
root@morgental:~# ip tunnel add foo mode gre remote 1.2.3.4 ttl 64
root@morgental:~# ip link set foo up mtu 1468
root@morgental:~# ip -6 route show dev foo
fe80::/64 proto kernel metric 256
but after the above commit, no such route shows up.
There is no link local route because dev->dev_addr is 0 (because local ipv4
address is 0), hence no link local address is configured.
In this scenario, the link local address is added manually: 'ip -6 addr add
fe80::1 dev foo' and because prefix is /128, no link local route is added by the
kernel.
Even if the right things to do is to add the link local address with a /64
prefix, we need to restore the previous behavior to avoid breaking userpace.
Reported-by: Steinar H. Gunderson <sesse@samfundet.no>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antonio Quartulli [Sat, 15 Feb 2014 20:50:37 +0000 (21:50 +0100)]
batman-adv: fix potential kernel paging error for unicast transmissions
batadv_send_skb_prepare_unicast(_4addr) might reallocate the
skb's data. If it does then our ethhdr pointer is not valid
anymore in batadv_send_skb_unicast(), resulting in a kernel
paging error.
Fixing this by refetching the ethhdr pointer after the
potential reallocation.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Antonio Quartulli [Sat, 15 Feb 2014 01:17:20 +0000 (02:17 +0100)]
batman-adv: avoid double free when orig_node initialization fails
In the failure path of the orig_node initialization routine
the orig_node->bat_iv.bcast_own field is free'd twice: first
in batadv_iv_ogm_orig_get() and then later in
batadv_orig_node_free_rcu().
Fix it by removing the kfree in batadv_iv_ogm_orig_get().
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Tue, 11 Feb 2014 16:05:07 +0000 (17:05 +0100)]
batman-adv: free skb on TVLV parsing success
When the TVLV parsing routine succeed the skb is left
untouched thus leading to a memory leak.
Fix this by consuming the skb in case of success.
Introduced by
ef26157747d42254453f6b3ac2bd8bd3c53339c3
("batman-adv: tvlv - basic infrastructure")
Reported-by: Russel Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Tested-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Tue, 11 Feb 2014 16:05:06 +0000 (17:05 +0100)]
batman-adv: fix TT CRC computation by ensuring byte order
When computing the CRC on a 2byte variable the order of
the bytes obviously alters the final result. This means
that computing the CRC over the same value on two archs
having different endianess leads to different numbers.
The global and local translation table CRC computation
routine makes this mistake while processing the clients
VIDs. The result is a continuous CRC mismatching between
nodes having different endianess.
Fix this by converting the VID to Network Order before
processing it. This guarantees that every node uses the same
byte order.
Introduced by
7ea7b4a142758deaf46c1af0ca9ceca6dd55138b
("batman-adv: make the TT CRC logic VLAN specific")
Reported-by: Russel Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Tested-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Simon Wunderlich [Sat, 8 Feb 2014 15:45:06 +0000 (16:45 +0100)]
batman-adv: fix potential orig_node reference leak
Since batadv_orig_node_new() sets the refcount to two, assuming that
the calling function will use a reference for putting the orig_node into
a hash or similar, both references must be freed if initialization of
the orig_node fails. Otherwise that object may be leaked in that error
case.
Reported-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Antonio Quartulli [Wed, 29 Jan 2014 10:25:12 +0000 (11:25 +0100)]
batman-adv: avoid potential race condition when adding a new neighbour
When adding a new neighbour it is important to atomically
perform the following:
- check if the neighbour already exists
- append the neighbour to the proper list
If the two operations are not performed in an atomic context
it is possible that two concurrent insertions add the same
neighbour twice.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Wed, 29 Jan 2014 23:12:24 +0000 (00:12 +0100)]
batman-adv: properly check pskb_may_pull return value
pskb_may_pull() returns 1 on success and 0 in case of failure,
therefore checking for the return value being negative does
not make sense at all.
This way if the function fails we will probably read beyond the current
skb data buffer. Fix this by doing the proper check.
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Tue, 28 Jan 2014 01:06:47 +0000 (02:06 +0100)]
batman-adv: release vlan object after checking the CRC
There is a refcounter unbalance in the CRC checking routine
invoked on OGM reception. A vlan object is retrieved (thus
its refcounter is increased by one) but it is never properly
released. This leads to a memleak because the vlan object
will never be free'd.
Fix this by releasing the vlan object after having read the
CRC.
Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Daniel <daniel@makrotopia.org>
Reported-by: cmsv <cmsv@wirelesspt.net>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Mon, 27 Jan 2014 11:23:28 +0000 (12:23 +0100)]
batman-adv: fix TT-TVLV parsing on OGM reception
When accessing a TT-TVLV container in the OGM RX path
the variable pointing to the list of changes to apply is
altered by mistake.
This makes the TT component read data at the wrong position
in the OGM packet buffer.
Fix it by removing the bogus pointer alteration.
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Tue, 21 Jan 2014 10:22:05 +0000 (11:22 +0100)]
batman-adv: fix soft-interface MTU computation
The current MTU computation always returns a value
smaller than 1500bytes even if the real interfaces
have an MTU large enough to compensate the batman-adv
overhead.
Fix the computation by properly returning the highest
admitted value.
Introduced by
a19d3d85e1b854e4a483a55d740a42458085560d
("batman-adv: limit local translation table max size")
Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Chen Gang [Sun, 16 Feb 2014 11:36:06 +0000 (19:36 +0800)]
avr32: add generic vga.h to Kbuild
Need add generic "vga.h", or can not pass building for allmodconfig,
the related error:
CC [M] drivers/gpu/drm/drm_irq.o
In file included from include/linux/vgaarb.h:34,
from drivers/gpu/drm/drm_irq.c:42:
include/video/vga.h:22:21: error: asm/vga.h: No such file or directory
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
Chen Gang [Sun, 16 Feb 2014 11:39:30 +0000 (19:39 +0800)]
avr32: add generic ioremap_wc() definition in io.h
Need generic ioremap_wc(), or can not pass compiling with allmodconfig,
the related error:
CC [M] drivers/gpu/drm/drm_bufs.o
drivers/gpu/drm/drm_bufs.c: In function 'drm_addmap_core':
drivers/gpu/drm/drm_bufs.c:217: error: implicit declaration of function 'ioremap_wc'
drivers/gpu/drm/drm_bufs.c:218: warning: assignment makes pointer from integer without a cast
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
Chen Gang [Sat, 1 Feb 2014 12:35:54 +0000 (20:35 +0800)]
avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 use
For avr32 cross compiler, do not define '__linux__' internally, so it
will cause issue with allmodconfig.
The related error:
CC [M] fs/coda/psdev.o
In file included from include/linux/coda.h:64,
from fs/coda/psdev.c:45:
include/uapi/linux/coda.h:221: error: expected specifier-qualifier-list before 'u_quad_t'
The related toolchain version (which only download, not re-compile):
[root@gchen linux-next]# /upstream/toolchain/download/avr32-gnu-toolchain-linux_x86/bin/avr32-gcc -v
Using built-in specs.
Target: avr32
Configured with: /data2/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/src/gcc/configure --target=avr32 --host=i686-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-languages=c,c++ --disable-nls --disable-libssp --disable-libstdcxx-pch --with-dwarf2 --enable-version-specific-runtime-libs --disable-shared --enable-doc --with-mpfr-lib=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/lib --with-mpfr-include=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/include --with-gmp=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --with-mpc=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-__cxa_atexit --disable-shared --with-newlib --with-pkgversion=AVR_32_bit_GNU_Toolchain_3.4.2_435 --with-bugurl=http://www
.atmel.com/avr
Thread model: single
gcc version 4.4.7 (AVR_32_bit_GNU_Toolchain_3.4.2_435)
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
Cc: stable@vger.kernel.org
Paul Gortmaker [Fri, 10 Jan 2014 14:29:39 +0000 (09:29 -0500)]
avr32: fix missing module.h causing build failure in mimc200/fram.c
Causing this:
In file included from arch/avr32/boards/mimc200/fram.c:13:
include/linux/miscdevice.h:51: error: field 'list' has incomplete type
include/linux/miscdevice.h:55: error: expected specifier-qualifier-list before 'mode_t'
arch/avr32/boards/mimc200/fram.c:42: error: 'THIS_MODULE' undeclared here (not in a function)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: stable@vger.kernel.org
Daniel Borkmann [Sun, 16 Feb 2014 14:55:22 +0000 (15:55 +0100)]
packet: check for ndo_select_queue during queue selection
Mathias reported that on an AMD Geode LX embedded board (ALiX)
with ath9k driver PACKET_QDISC_BYPASS, introduced in commit
d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket
option"), triggers a WARN_ON() coming from the driver itself
via
066dae93bdf ("ath9k: rework tx queue selection and fix
queue stopping/waking").
The reason why this happened is that ndo_select_queue() call
is not invoked from direct xmit path i.e. for ieee80211 subsystem
that sets queue and TID (similar to 802.1d tag) which is being
put into the frame through 802.11e (WMM, QoS). If that is not
set, pending frame counter for e.g. ath9k can get messed up.
So the WARN_ON() in ath9k is absolutely legitimate. Generally,
the hw queue selection in ieee80211 depends on the type of
traffic, and priorities are set according to ieee80211_ac_numbers
mapping; working in a similar way as DiffServ only on a lower
layer, so that the AP can favour frames that have "real-time"
requirements like voice or video data frames.
Therefore, check for presence of ndo_select_queue() in netdev
ops and, if available, invoke it with a fallback handler to
__packet_pick_tx_queue(), so that driver such as bnx2x, ixgbe,
or mlx4 can still select a hw queue for transmission in
relation to the current CPU while e.g. ieee80211 subsystem
can make their own choices.
Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 16 Feb 2014 14:55:21 +0000 (15:55 +0100)]
netdevice: move netdev_cap_txqueue for shared usage to header
In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 16 Feb 2014 14:55:20 +0000 (15:55 +0100)]
netdevice: add queue selection fallback handler for ndo_select_queue
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).
This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo Molnar [Fri, 14 Feb 2014 14:32:20 +0000 (15:32 +0100)]
drivers/net: tulip_remove_one needs to call pci_disable_device()
Otherwise the device is not completely shut down.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matija Glavinic Pecotic [Fri, 14 Feb 2014 13:51:18 +0000 (14:51 +0100)]
net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.
1) Sudden drop of a_rwnd and incomplete window recovery afterwards
Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).
An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.
Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)
Logic is implemented in sctp_assoc_rwnd_decrease:
socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)
A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:
We start by condition which has to be met in order to enter pressure state:
socket_buffer < currently_alloced;
currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):
socket_buffer < x*each_sctp_packet;
each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:
2*rwnd < x*(payload+sizeof(struc sk_buff));
sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43
20000 < x(43+190);
x > 20000/233;
x ~> 84;
After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:
IP A.34340 > B.12345: sctp (1) [INIT] [init tag:
1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN:
1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag:
3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN:
902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]
--> Sudden drop
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
--> At this point userspace read everything, rwnd recovered only to 3569
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN:
1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack
1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.
2) Minute size window for associations sharing the same socket buffer
In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).
Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.
Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:
>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.
--> Association 1, between A.y and B.12345
IP A.55915 > B.12345: sctp (1) [INIT] [init tag:
836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN:
4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag:
2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN:
3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]
--> Association 2, between A.z and B.12346
IP A.55915 > B.12346: sctp (1) [INIT] [init tag:
534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN:
2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag:
516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN:
3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]
--> Deplete socket buffer by sending messages of size 43B over association 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
<...>
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]
--> Sudden drop on 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
association 1 so there is place in buffer for only one more packet
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
--> Socket buffer is almost depleted, but there is space for one more packet,
send them over association 2, size 43B
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack
3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Immediate drop
IP A.60995 > B.12346: sctp (1) [SACK] [cum ack
387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Read everything from the socket, both association recover up to maximum rwnd
they are capable of reaching, note that association 1 recovered up to 3698,
and association 2 recovered only to 43
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack
3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN:
3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack
3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.
Proposed solution:
Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.
Proposed solution simplifies whole algorithm having on mind definition from rfc:
o Receiver Window (rwnd): This gives the sender an indication of the space
available in the receiver's inbound buffer.
Core of the proposed solution is given with these lines:
sctp_assoc_rwnd_update:
if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
else
asoc->rwnd = 0;
We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.
Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duan Jiong [Fri, 14 Feb 2014 10:26:22 +0000 (18:26 +0800)]
ipv4: distinguish EHOSTUNREACH from the ENETUNREACH
since commit
251da413("ipv4: Cache ip_error() routes even when not forwarding."),
the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of
err was always set to ENETUNREACH.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 13 Feb 2014 00:54:27 +0000 (16:54 -0800)]
hyperv: Fix the carrier status setting
Without this patch, the "cat /sys/class/net/ethN/operstate" shows
"unknown", and "ethtool ethN" shows "Link detected: yes", when VM
boots up with or without vNIC connected.
This patch fixed the problem.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 14 Feb 2014 02:02:33 +0000 (19:02 -0700)]
dccp: re-enable debug macro
dccp tfrc: revert
This reverts
6aee49c558de ("dccp: make local variable static") since
the variable tfrc_debug is referenced by the tfrc_pr_debug(fmt, ...)
macro when TFRC debugging is enabled. If it is enabled, use of the
macro produces a compilation error.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Theodore Ts'o [Mon, 17 Feb 2014 00:29:32 +0000 (19:29 -0500)]
ext4: don't leave i_crtime.tv_sec uninitialized
If the i_crtime field is not present in the inode, don't leave the
field uninitialized.
Fixes:
ef7f38359 ("ext4: Add nanosecond timestamps")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Gavin Shan [Wed, 12 Feb 2014 07:24:56 +0000 (15:24 +0800)]
powerpc/eeh: Disable EEH on reboot
We possiblly detect EEH errors during reboot, particularly in kexec
path, but it's impossible for device drivers and EEH core to handle
or recover them properly.
The patch registers one reboot notifier for EEH and disable EEH
subsystem during reboot. That means the EEH errors is going to be
cleared by hardware reset or second kernel during early stage of
PCI probe.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Wed, 12 Feb 2014 07:24:55 +0000 (15:24 +0800)]
powerpc/eeh: Cleanup on eeh_subsystem_enabled
The patch cleans up variable eeh_subsystem_enabled so that we needn't
refer the variable directly from external. Instead, we will use
function eeh_enabled() and eeh_set_enable() to operate the variable.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Wed, 12 Feb 2014 07:24:54 +0000 (15:24 +0800)]
powerpc/powernv: Rework EEH reset
When doing reset in order to recover the affected PE, we issue
hot reset on PE primary bus if it's not root bus. Otherwise, we
issue hot or fundamental reset on root port or PHB accordingly.
For the later case, we didn't cover the situation where PE only
includes root port and it potentially causes kernel crash upon
EEH error to the PE.
The patch reworks the logic of EEH reset to improve the code
readability and also avoid the kernel crash.
Cc: stable@vger.kernel.org
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 12 Feb 2014 06:18:50 +0000 (17:18 +1100)]
powerpc: Use unstripped VDSO image for more accurate profiling data
We are seeing a lot of hits in the VDSO that are not resolved by perf.
A while(1) gettimeofday() loop shows the issue:
27.64% [vdso] [.] 0x000000000000060c
22.57% [vdso] [.] 0x0000000000000628
16.88% [vdso] [.] 0x0000000000000610
12.39% [vdso] [.] __kernel_gettimeofday
6.09% [vdso] [.] 0x00000000000005f8
3.58% test [.]
00000037.plt_call.gettimeofday@@GLIBC_2.18
2.94% [vdso] [.] __kernel_datapage_offset
2.90% test [.] main
We are using a stripped VDSO image which means only symbols with
relocation info can be resolved. There isn't a lot of point to
stripping the VDSO, the debug info is only about 1kB:
4680 arch/powerpc/kernel/vdso64/vdso64.so
5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
By using the unstripped image, we can resolve all the symbols in the
VDSO and the perf profile data looks much better:
76.53% [vdso] [.] __do_get_tspec
12.20% [vdso] [.] __kernel_gettimeofday
5.05% [vdso] [.] __get_datapage
3.20% test [.] main
2.92% test [.]
00000037.plt_call.gettimeofday@@GLIBC_2.18
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 12 Feb 2014 06:17:05 +0000 (17:17 +1100)]
powerpc: Link VDSOs at 0x0
perf is failing to resolve symbols in the VDSO. A while (1)
gettimeofday() loop shows:
93.99% [vdso] [.] 0x00000000000005e0
3.12% test [.]
00000037.plt_call.gettimeofday@@GLIBC_2.18
2.81% test [.] main
The reason for this is that we are linking our VDSO shared libraries
at 1MB, which is a little weird. Even though this is uncommon, Alan
points out that it is valid and we should probably fix perf userspace.
Regardless, I can't see a reason why we are doing this. The code
is all position independent and we never rely on the VDSO ending
up at 1M (and we never place it there on 64bit tasks).
Changing our link address to 0x0 fixes perf VDSO symbol resolution:
73.18% [vdso] [.] 0x000000000000060c
12.39% [vdso] [.] __kernel_gettimeofday
3.58% test [.]
00000037.plt_call.gettimeofday@@GLIBC_2.18
2.94% [vdso] [.] __kernel_datapage_offset
2.90% test [.] main
We still have some local symbol resolution issues that will be
fixed in a subsequent patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 12 Feb 2014 03:43:38 +0000 (09:13 +0530)]
mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
a hash table MMU for various reasons (the flush is handled as part of
the PTE modification when necessary).
ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.
Additionally ppc64 require the tlb flushing to be batched within ptl locks.
The reason to do that is to ensure that the hash page table is in sync with
linux page table.
We track the hpte index in linux pte and if we clear them without flushing
hash and drop the ptl lock, we can have another cpu update the pte and can
end up with duplicate entry in the hash table, which is fatal.
We also want to keep set_pte_at simpler by not requiring them to do hash
flush for performance reason. We do that by assuming that set_pte_at() is
never *ever* called on a PTE that is already valid.
This was the case until the NUMA code went in which broke that assumption.
Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
using set_pte_at() and a powerpc specific one using the appropriate
mechanism needed to keep the hash table in sync.
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 12 Feb 2014 03:43:37 +0000 (09:13 +0530)]
mm: Dirty accountable change only apply to non prot numa case
So move it within the if loop
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 12 Feb 2014 03:43:36 +0000 (09:13 +0530)]
powerpc/mm: Add new "set" flag argument to pte/pmd update function
pte_update() is a powerpc-ism used to change the bits of a PTE
when the access permission is being restricted (a flush is
potentially needed).
It uses atomic operations on when needed and handles the hash
synchronization on hash based processors.
It is currently only used to clear PTE bits and so the current
implementation doesn't provide a way to also set PTE bits.
The new _PAGE_NUMA bit, when set, is actually restricting access
so it must use that function too, so this change adds the ability
for pte_update() to also set bits.
We will use this later to set the _PAGE_NUMA bit.
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kleber Sacilotto de Souza [Fri, 17 Jan 2014 13:56:52 +0000 (11:56 -0200)]
powerpc/pseries: Add Gen3 definitions for PCIE link speed
Rev3 of the PCI Express Base Specification defines a Supported Link
Speeds Vector where the bit definitions within this field are:
Bit 0 - 2.5 GT/s
Bit 1 - 5.0 GT/s
Bit 2 - 8.0 GT/s
This vector definition is used by the platform firmware to export the
maximum and current link speeds of the PCI bus via the
"ibm,pcie-link-speed-stats" device-tree property.
This patch updates pseries_root_bridge_prepare() to detect Gen3
speed buses (defined by 0x04).
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kleber Sacilotto de Souza [Fri, 17 Jan 2014 13:56:51 +0000 (11:56 -0200)]
powerpc/pseries: Fix regression on PCI link speed
Commit 5091f0c (powerpc/pseries: Fix PCIE link speed endian issue)
introduced a regression on the PCI link speed detection using the
device-tree property. The ibm,pcie-link-speed-stats property is composed
of two 32-bit integers, the first one being the maxinum link speed and
the second the current link speed. The changes introduced by the
aforementioned commit are considering just the first integer.
Fix this issue by changing how the property is accessed, using the
helper functions to properly access the array of values. The explicit
byte swapping is not needed anymore here, since it's done by the helper
functions.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kevin Hao [Fri, 17 Jan 2014 04:25:28 +0000 (12:25 +0800)]
powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
Guenter Roeck has got the following call trace on a p2020 board:
Kernel stack overflow in process
eb3e5a00, r1=
eb79df90
CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
task:
eb3e5a00 ti:
c0616000 task.ti:
ef440000
NIP:
c003a420 LR:
c003a410 CTR:
c0017518
REGS:
eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
MSR:
00029000 <CE,EE,ME> CR:
24008444 XER:
00000000
GPR00:
c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
GPR08:
00000000 020b8000 00000000 00000000 44008442
NIP [
c003a420] __do_softirq+0x94/0x1ec
LR [
c003a410] __do_softirq+0x84/0x1ec
Call Trace:
[
eb79df90] [
c003a410] __do_softirq+0x84/0x1ec (unreliable)
[
eb79dfe0] [
c003a970] irq_exit+0xbc/0xc8
[
eb79dff0] [
c000cc1c] call_do_irq+0x24/0x3c
[
ef441f20] [
c00046a8] do_IRQ+0x8c/0xf8
[
ef441f40] [
c000e7f4] ret_from_except+0x0/0x18
--- Exception: 501 at 0xfcda524
LR = 0x10024900
Instruction dump:
7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
5463103a 7d3b182e 7e89b92e 7c008146 <
3ba00000>
7e7e9b78 48000014 57fff87f
Kernel panic - not syncing: kernel stack overflow
CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
Call Trace:
The reason is that we have used the wrong register to calculate the
ksp_limit in commit
cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
Just fix it.
As suggested by Benjamin Herrenschmidt, also add the C prototype of the
function in the comment in order to avoid such kind of errors in the
future.
Cc: stable@vger.kernel.org # 3.12
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 16 Feb 2014 21:30:25 +0000 (13:30 -0800)]
Linux 3.14-rc3
Linus Torvalds [Sun, 16 Feb 2014 19:05:27 +0000 (11:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We have a small collection of fixes in my for-linus branch.
The big thing that stands out is a revert of a new ioctl. Users
haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
to export the information"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: use right clone root offset for compressed extents
btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
Btrfs: fix max_inline mount option
Btrfs: fix a lockdep warning when cleaning up aborted transaction
Revert "btrfs: add ioctl to export size of global metadata reservation"
Linus Torvalds [Sun, 16 Feb 2014 19:03:58 +0000 (11:03 -0800)]
Merge tag 'dt-fixes-for-3.14' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Fix booting on PPC boards. Changes to of_match_node matching caused
the serial port on some PPC boards to stop working. Reverted the
change and reimplement to split matching between new style compatible
only matching and fallback to old matching algorithm"
* tag 'dt-fixes-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: search the best compatible match first in __of_match_node()
Revert "OF: base: match each node compatible against all given matches first"
Theodore Ts'o [Sun, 16 Feb 2014 03:42:25 +0000 (22:42 -0500)]
ext4: fix online resize with a non-standard blocks per group setting
The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always! Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:
mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jon Bernard <jbernard@tuxion.com>
Cc: stable@vger.kernel.org
Theodore Ts'o [Sun, 16 Feb 2014 02:33:13 +0000 (21:33 -0500)]
ext4: fix online resize with very large inode tables
If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group. Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:
kernel bug at fs/ext4/resize.c:409!
...
call trace:
[<
ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
[<
ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
[<
ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
[<
ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
[<
ffffffff811b9df2>] ? final_putname+0x22/0x50
[<
ffffffff811c1371>] sys_ioctl+0x81/0xa0
[<
ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip [<
ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180
This can be reproduced with the following command sequence:
mke2fs -t ext4 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:
1) Not clearing the BLOCK_UNINIT flag in the second block group in
ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.
2) Incorrectly determining how many block groups contained contiguous
free blocks in ext4_alloc_group_tables().
3) Incorrectly setting the start of the next block range to be marked
in use after a discontinuity in setup_new_flex_group_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Kevin Hao [Fri, 14 Feb 2014 05:22:46 +0000 (13:22 +0800)]
of: search the best compatible match first in __of_match_node()
Currently, of_match_node compares each given match against all node's
compatible strings with of_device_is_compatible.
To achieve multiple compatible strings per node with ordering from
specific to generic, this requires given matches to be ordered from
specific to generic. For most of the drivers this is not true and also
an alphabetical ordering is more sane there.
Therefore, this patch introduces a function to match each of the node's
compatible strings against all given compatible matches without type and
name first, before checking the next compatible string. This implies
that node's compatibles are ordered from specific to generic while
given matches can be in any order. If we fail to find such a match
entry, then fall-back to the old method in order to keep compatibility.
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Stephen Chivers <schivers@csc.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Linus Torvalds [Sun, 16 Feb 2014 00:18:47 +0000 (16:18 -0800)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"Mostly minor fixes this time to v3.14-rc1 related changes. Also
included is one fix for a free after use regression in persistent
reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
Target/sbc: Fix protection copy routine
IB/srpt: replace strict_strtoul() with kstrtoul()
target: Simplify command completion by removing CMD_T_FAILED flag
iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
target: Fix missing length check in spc_emulate_evpd_83()
qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
target: Fix free-after-use regression in PR unregister
Linus Torvalds [Sun, 16 Feb 2014 00:17:51 +0000 (16:17 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c has a bugfix and documentation improvements for you"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Documentation: i2c: mention ACPI method for instantiating devices
Documentation: i2c: describe devicetree method for instantiating devices
i2c: mv64xxx: refactor message start to ensure proper initialization
Linus Torvalds [Sun, 16 Feb 2014 00:06:12 +0000 (16:06 -0800)]
Merge branches 'irq-urgent-for-linus' and 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq update from Thomas Gleixner:
"Fix from the urgent branch: a trivial oneliner adding the missing
Kconfig dependency curing build failures which have been discovered by
several build robots.
The update in the irq-core branch provides a new function in the
irq/devres code, which is a prerequisite for driver developers to get
rid of boilerplate code all over the place.
Not a bugfix, but it has zero impact on the current kernel due to the
lack of users. It's simpler to provide the infrastructure to
interested parties via your tree than fulfilling the wishlist of
driver maintainers on which particular commit or tag this should be
based on"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Add devm_request_any_context_irq()
Linus Torvalds [Sun, 16 Feb 2014 00:04:42 +0000 (16:04 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"The following trilogy of patches brings you:
- fix for a long standing math overflow issue with HZ < 60
- an onliner fix for a corner case in the dreaded tick broadcast
mechanism affecting a certain range of AMD machines which are
infested with the infamous automagic C1E power control misfeature
- a fix for one of the ARM platforms which allows the kernel to
proceed and boot instead of stupidly panicing for no good reason.
The patch is slightly larger than necessary, but it's less ugly
than the alternative 5 liner"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Clear broadcast pending bit when switching to oneshot
clocksource: Kona: Print warning rather than panic
time: Fix overflow when HZ is smaller than 60
Linus Torvalds [Sat, 15 Feb 2014 23:03:34 +0000 (15:03 -0800)]
Merge tag 'trace-fixes-v3.14-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull twi tracing fixes from Steven Rostedt:
"Two urgent fixes in the tracing utility.
The first is a fix for the way the ring buffer stores timestamps.
After a restructure of the code was done, the ring buffer timestamp
logic missed the fact that the first event on a sub buffer is to have
a zero delta, as the full timestamp is stored on the sub buffer
itself. But because the delta was not cleared to zero, the timestamp
for that event will be calculated as the real timestamp + the delta
from the last timestamp. This can skew the timestamps of the events
and have them say they happened when they didn't really happen.
That's bad.
The second fix is for modifying the function graph caller site. When
the stop machine was removed from updating the function tracing code,
it missed updating the function graph call site location. It is still
modified as if it is being done via stop machine. But it's not. This
can lead to a GPF and kernel crash if the function graph call site
happens to lie between cache lines and one CPU is executing it while
another CPU is doing the update. It would be a very hard condition to
hit, but the result is severe enough to have it fixed ASAP"
* tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/x86: Use breakpoints for converting function graph caller
ring-buffer: Fix first commit on sub-buffer having non-zero delta
Linus Torvalds [Sat, 15 Feb 2014 23:02:28 +0000 (15:02 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 EFI fixes from Peter Anvin:
"A few more EFI-related fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Check status field to validate BGRT header
x86/efi: Fix 32-bit fallout
Linus Torvalds [Sat, 15 Feb 2014 23:01:33 +0000 (15:01 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"A collection of ARM SoC fixes for v3.14-rc1.
Mostly a collection of Kconfig, device tree data and compilation fixes
along with fix to drivers/phy that fixes a boot regression on some
Marvell mvebu platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
dma: mv_xor: Silence a bunch of LPAE-related warnings
ARM: ux500: disable msp2 device tree node
ARM: zynq: Reserve not DMAable space in front of the kernel
ARM: multi_v7_defconfig: Select CONFIG_SOC_DRA7XX
ARM: imx6: Initialize low-power mode early again
ARM: pxa: fix various compilation problems
ARM: pxa: fix compilation problem on AM300EPD board
ARM: at91: add Atmel's SAMA5D3 Xplained board
spi/atmel: document clock properties
mmc: atmel-mci: document clock properties
ARM: at91: enable USB host on at91sam9n12ek board
ARM: at91/dt: fix sama5d3 ohci hclk clock reference
ARM: at91/dt: sam9263: fix compatibility string for the I2C
ata: sata_mv: Fix probe failures with optional phys
drivers: phy: Add support for optional phys
drivers: phy: Make NULL a valid phy reference
ARM: fix HAVE_ARM_TWD selection for OMAP and shmobile
ARM: moxart: move DMA_OF selection to driver
ARM: hisi: fix kconfig warning on HAVE_ARM_TWD
Wolfram Sang [Sat, 15 Feb 2014 14:58:35 +0000 (15:58 +0100)]
Documentation: i2c: mention ACPI method for instantiating devices
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Wolfram Sang [Mon, 10 Feb 2014 10:03:55 +0000 (11:03 +0100)]
Documentation: i2c: describe devicetree method for instantiating devices
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Filipe David Borba Manana [Sat, 15 Feb 2014 15:53:16 +0000 (15:53 +0000)]
Btrfs: use right clone root offset for compressed extents
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.
Issue reproducible with the following excerpt from the test I made for
xfstests:
_scratch_mkfs
_scratch_mount "-o compress-force=lzo"
$XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
$XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
$XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
# will be used for incremental send to be able to issue clone operations
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
$FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
-x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
-x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
$BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
$BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
-c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
_scratch_unmount
_scratch_mkfs
_scratch_mount
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
$FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
$FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
$FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Anand Jain [Wed, 15 Jan 2014 09:22:28 +0000 (17:22 +0800)]
btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
bdev is null when disk has disappeared and mounted with
the degrade option
stack trace
---------
btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
open_ctree+0x15f3/0x1fe0 [btrfs]
btrfs_mount+0x5db/0x790 [btrfs]
? alloc_pages_current+0xa4/0x160
mount_fs+0x34/0x1b0
vfs_kern_mount+0x62/0xf0
do_mount+0x22e/0xa80
? __get_free_pages+0x9/0x40
? copy_mount_options+0x31/0x170
SyS_mount+0x7e/0xc0
system_call_fastpath+0x16/0x1b
---------
reproducer:
-------
mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
(detach a disk)
devmgt detach /dev/sdc [1]
mount -o degrade /dev/sdd /btrfs
-------
[1] github.com/anajain/devmgt.git
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Tested-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Wolfram Sang [Thu, 13 Feb 2014 20:36:29 +0000 (21:36 +0100)]
i2c: mv64xxx: refactor message start to ensure proper initialization
Because the offload mechanism can fall back to a standard transfer,
having two seperate initialization states is unfortunate. Let's just
have one state which does things consistently. This fixes a bug where
some preparation was missing when the fallback happened. And it makes
the code much easier to follow. To implement this, we put the check
if offload is possible at the top of the offload setup function.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: stable@vger.kernel.org # v3.12+
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Linus Torvalds [Sat, 15 Feb 2014 00:15:45 +0000 (16:15 -0800)]
Merge tag 'usb-3.14-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here is a bunch of USB fixes for 3.14-rc3. Most of these are xhci
reverts, fixing a bunch of reported issues with USB 3 host controller
issues that loads of people have been hitting (with the exception of
kernel developers, all of our machines seem to be working fine, which
is why these took so long to get resolved...)
There are some other minor fixes and new device ids, as ususal. All
have been in linux-next successfully"
* tag 'usb-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
usb: option: blacklist ZTE MF667 net interface
Revert "usb: xhci: Link TRB must not occur within a USB payload burst"
Revert "xhci: Avoid infinite loop when sg urb requires too many trbs"
Revert "xhci: Set scatter-gather limit to avoid failed block writes."
xhci 1.0: Limit arbitrarily-aligned scatter gather.
Modpost: fixed USB alias generation for ranges including 0x9 and 0xA
usb: core: Fix potential memory leak adding dyn USBdevice IDs
USB: ftdi_sio: add Tagsys RFID Reader IDs
usb: qcserial: add Netgear Aircard 340U
usb-storage: enable multi-LUN scanning when needed
USB: simple: add Dynastream ANT USB-m Stick device support
usb-storage: add unusual-devs entry for BlackBerry 9000
usb-storage: restrict bcdDevice range for Super Top in Cypress ATACB
usb: phy: move some error messages to debug
usb: ftdi_sio: add Mindstorms EV3 console adapter
usb: dwc2: fix memory corruption in dwc2 driver
usb: dwc2: fix role switch breakage
usb: dwc2: bail out early when booting with "nousb"
Revert "xhci: replace xhci_read_64() with readq()"
Revert "xhci: replace xhci_write_64() with writeq()"
...
Linus Torvalds [Sat, 15 Feb 2014 00:15:03 +0000 (16:15 -0800)]
Merge tag 'tty-3.14-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are a small number of tty/serial driver fixes to resolve reported
issues with 3.14-rc and earlier (in the case of the vt bugfix). Some
of these have been tested and reported by a number of people as the
tty bugfix was pretty commonly hit on some platforms.
All have been in linux-next for a while"
* tag 'tty-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: Fix secure clear screen
serial: 8250: Support XR17V35x fraction divisor
n_tty: Fix stale echo output
serial: sirf: fix kernel panic caused by unpaired spinlock
serial: 8250_pci: unbreak last serial ports on NetMos 9865 cards
n_tty: Fix poll() when TIME_CHAR and MIN_CHAR == 0
serial: omap: fix rs485 probe on defered pinctrl
serial: 8250_dw: fix compilation warning when !CONFIG_PM_SLEEP
serial: omap-serial: Move info message to probe function
tty: Set correct tty name in 'active' sysfs attribute
tty: n_gsm: Fix for modems with brk in modem status control
drivers/tty/hvc: don't use module_init in non-modular hyp. console code
Linus Torvalds [Sat, 15 Feb 2014 00:14:11 +0000 (16:14 -0800)]
Merge tag 'staging-3.14-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are a number (lots, I know) of fixes for staging drivers to
resolve a bunch of reported issues.
The largest patches here is one revert of a patch that is in 3.14-rc1
to fix reported problems, and a sync of a usb host driver that
required some ARM patches to go in before it could be accepted (which
is why it missed -rc1)
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (56 commits)
staging/rtl8821ae: fix build, depends on MAC80211
iio: max1363: Use devm_regulator_get_optional for optional regulator
iio:accel:bma180: Use modifier instead of index in channel specification
iio: adis16400: Set timestamp as the last element in chan_spec
iio: ak8975: Fix calculation formula for convert micro tesla to gauss unit
staging:iio:ad799x fix typo in ad799x_events[]
iio: mxs-lradc: remove useless scale_available files
iio: mxs-lradc: fix buffer overflow
iio:magnetometer:mag3110: Fix output of decimal digits in show_int_plus_micros()
iio:magnetometer:mag3110: Report busy in _read_raw() / write_raw() when buffer is enabled
wlags49_h2: Fix overflow in wireless_set_essid()
xlr_net: Fix missing trivial allocation check
staging: r8188eu: overflow in rtw_p2p_get_go_device_address()
staging: r8188eu: array overflow in rtw_mp_ioctl_hdl()
staging: r8188eu: Fix typo in USB_DEVICE list
usbip/userspace/libsrc/names.c: memory leak
gpu: ion: dereferencing an ERR_PTR
staging: comedi: usbduxsigma: fix unaligned dereferences
staging: comedi: fix too early cleanup in comedi_auto_config()
staging: android: ion: dummy: fix an error code
...
Linus Torvalds [Sat, 15 Feb 2014 00:13:40 +0000 (16:13 -0800)]
Merge tag 'driver-core-3.14-rc3' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core patch for 3.14-rc3 for the component code
that Russell has found and fixed"
* tag 'driver-core-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers/base: fix devres handling for master device
Linus Torvalds [Sat, 15 Feb 2014 00:13:00 +0000 (16:13 -0800)]
Merge tag 'char-misc-3.14-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small char/misc driver fixes, along with some
documentation updates, for 3.14-rc3. Nothing major, just a number of
fixes for reported issues"
* tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "misc: eeprom: sunxi: Add new compatibles"
Revert "ARM: sunxi: dt: Convert to the new SID compatibles"
misc: mic: fix possible signed underflow (undefined behavior) in userspace API
ARM: sunxi: dt: Convert to the new SID compatibles
misc: eeprom: sunxi: Add new compatibles
misc: genwqe: Fix potential memory leak when pinning memory
Documentation:Update Documentation/zh_CN/arm64/memory.txt
Documentation:Update Documentation/zh_CN/arm64/booting.txt
Documentation:Chinese translation of Documentation/arm64/tagged-pointers.txt
raw: set range for MAX_RAW_DEVS
raw: test against runtime value of max_raw_minors
Drivers: hv: vmbus: Don't timeout during the initial connection with host
Drivers: hv: vmbus: Specify the target CPU that should receive notification
VME: Correct read/write alignment algorithm
mei: don't unset read cb ptr on reset
mei: clear write cb from waiting list on reset
Pavel Shilovsky [Fri, 14 Feb 2014 09:31:02 +0000 (13:31 +0400)]
CIFS: Fix too big maxBuf size for SMB3 mounts
SMB3 servers can respond with MaxTransactSize of more than 4M
that can cause a memory allocation error returned from kmalloc
in a lock codepath. Also the client doesn't support multicredit
requests now and allows buffer sizes of 65536 bytes only. Set
MaxTransactSize to this maximum supported value.
Cc: stable@vger.kernel.org # 3.7+
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Jeff Layton [Fri, 14 Feb 2014 12:20:35 +0000 (07:20 -0500)]
cifs: ensure that uncached writes handle unmapped areas correctly
It's possible for userland to pass down an iovec via writev() that has a
bogus user pointer in it. If that happens and we're doing an uncached
write, then we can end up getting less bytes than we expect from the
call to iov_iter_copy_from_user. This is CVE-2014-0069
cifs_iovec_write isn't set up to handle that situation however. It'll
blindly keep chugging through the page array and not filling those pages
with anything useful. Worse yet, we'll later end up with a negative
number in wdata->tailsz, which will confuse the sending routines and
cause an oops at the very least.
Fix this by having the copy phase of cifs_iovec_write stop copying data
in this situation and send the last write as a short one. At the same
time, we want to avoid sending a zero-length write to the server, so
break out of the loop and set rc to -EFAULT if that happens. This also
allows us to handle the case where no address in the iovec is valid.
[Note: Marking this for stable on v3.4+ kernels, but kernels as old as
v2.6.38 may have a similar problem and may need similar fix]
Cc: <stable@vger.kernel.org> # v3.4+
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Josef Bacik [Fri, 14 Feb 2014 18:43:48 +0000 (13:43 -0500)]
Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
A user was running into errors from an NFS export of a subvolume that had a
default subvol set. When we mount a default subvol we will use d_obtain_alias()
to find an existing dentry for the subvolume in the case that the root subvol
has already been mounted, or a dummy one is allocated in the case that the root
subvol has not already been mounted. This allows us to connect the dentry later
on if we wander into the path. However if we don't ever wander into the path we
will keep DCACHE_DISCONNECTED set for a long time, which angers NFS. It doesn't
appear to cause any problems but it is annoying nonetheless, so simply unset
DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
use d_materialise_unique() instead which will make everything play nicely
together and reconnect stuff if we wander into the defaul subvol path from a
different way. With this patch I'm no longer getting the NFS errors when
exporting a volume that has been mounted with a default subvol set. Thanks,
cc: bfields@fieldses.org
cc: ebiederm@xmission.com
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Chris Mason <clm@fb.com>
Mitch Harder [Thu, 13 Feb 2014 15:13:16 +0000 (09:13 -0600)]
Btrfs: fix max_inline mount option
Currently, the only mount option for max_inline that has any effect is
max_inline=0. Any other value that is supplied to max_inline will be
adjusted to a minimum of 4k. Since max_inline has an effective maximum
of ~3900 bytes due to page size limitations, the current behaviour
only has meaning for max_inline=0.
This patch will allow the the max_inline mount option to accept non-zero
values as indicated in the documentation.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <clm@fb.com>
Liu Bo [Sat, 8 Feb 2014 07:33:08 +0000 (15:33 +0800)]
Btrfs: fix a lockdep warning when cleaning up aborted transaction
Given now we have 2 spinlock for management of delayed refs,
CONFIG_DEBUG_SPINLOCK=y helped me find this,
[ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
[ 4723.414882] lock: 0xffff880048377670, .magic:
dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
[ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G W O 3.12.0+ #4
[ 4723.421321] Call Trace:
[ 4723.421872] [<
ffffffff81680fe7>] dump_stack+0x54/0x74
[ 4723.422753] [<
ffffffff81681093>] spin_dump+0x8c/0x91
[ 4723.424979] [<
ffffffff816810b9>] spin_bug+0x21/0x26
[ 4723.425846] [<
ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
[ 4723.434424] [<
ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
[ 4723.438747] [<
ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
[ 4723.443321] [<
ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
[ 4723.444692] [<
ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 4723.450336] [<
ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
[ 4723.451332] [<
ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
[ 4723.452543] [<
ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 4723.457833] [<
ffffffff81079efa>] kthread+0xea/0xf0
[ 4723.458990] [<
ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.460133] [<
ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[ 4723.460865] [<
ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.496521] ------------[ cut here ]------------
----------------------------------------------------------------------
The reason is that we get to call cond_resched_lock(&head_ref->lock) while
still holding @delayed_refs->lock.
So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
dance before and after actually processing delayed refs.
Here we don't drop the lock, others are not able to add new delayed refs to
head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Chris Mason [Fri, 14 Feb 2014 21:42:13 +0000 (13:42 -0800)]
Revert "btrfs: add ioctl to export size of global metadata reservation"
This reverts commit
01e219e8069516cdb98594d417b8bb8d906ed30d.
David Sterba found a different way to provide these features without adding a new
ioctl. We haven't released any progs with this ioctl yet, so I'm taking this out
for now until we finalize things.
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
CC: Jeff Mahoney <jeffm@suse.com>