Rik van Riel [Thu, 28 May 2015 13:52:49 +0000 (09:52 -0400)]
sched/numa: Only consider less busy nodes as numa balancing destinations
Changeset
a43455a1d572 ("sched/numa: Ensure task_numa_migrate() checks
the preferred node") fixes an issue where workloads would never
converge on a fully loaded (or overloaded) system.
However, it introduces a regression on less than fully loaded systems,
where workloads converge on a few NUMA nodes, instead of properly
staying spread out across the whole system. This leads to a reduction
in available memory bandwidth, and usable CPU cache, with predictable
performance problems.
The root cause appears to be an interaction between the load balancer
and NUMA balancing, where the short term load represented by the load
balancer differs from the long term load the NUMA balancing code would
like to base its decisions on.
Simply reverting
a43455a1d572 would re-introduce the non-convergence
of workloads on fully loaded systems, so that is not a good option. As
an aside, the check done before
a43455a1d572 only applied to a task's
preferred node, not to other candidate nodes in the system, so the
converge-on-too-few-nodes problem still happens, just to a lesser
degree.
Instead, try to compensate for the impedance mismatch between the load
balancer and NUMA balancing by only ever considering a lesser loaded
node as a destination for NUMA balancing, regardless of whether the
task is trying to move to the preferred node, or to another node.
This patch also addresses the issue that a system with a single
runnable thread would never migrate that thread to near its memory,
introduced by
095bebf61a46 ("sched/numa: Do not move past the balance
point if unbalanced").
A test where the main thread creates a large memory area, and spawns a
worker thread to iterate over the memory (placed on another node by
select_task_rq_fair), after which the main thread goes to sleep and
waits for the worker thread to loop over all the memory now sees the
worker thread migrated to where the memory is, instead of having all
the memory migrated over like before.
Jirka has run a number of performance tests on several systems: single
instance SpecJBB 2005 performance is 7-15% higher on a 4 node system,
with higher gains on systems with more cores per socket.
Multi-instance SpecJBB 2005 (one per node), linpack, and stream see
little or no changes with the revert of
095bebf61a46 and this patch.
Reported-by: Artem Bityutski <dedekind1@gmail.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Artem Bityutskiy <dedekind1@gmail.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150528095249.3083ade0@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel [Wed, 27 May 2015 19:04:27 +0000 (15:04 -0400)]
Revert
095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced")
Commit
095bebf61a46 ("sched/numa: Do not move past the balance point
if unbalanced") broke convergence of workloads with just one runnable
thread, by making it impossible for the one runnable thread on the
system to move from one NUMA node to another.
Instead, the thread would remain where it was, and pull all the memory
across to its location, which is much slower than just migrating the
thread to where the memory is.
The next patch has a better fix for the issue that
095bebf61a46 tried
to address.
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dedekind1@gmail.com
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/1432753468-7785-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ben Segall [Mon, 6 Apr 2015 22:28:10 +0000 (15:28 -0700)]
sched/fair: Prevent throttling in early pick_next_task_fair()
The optimized task selection logic optimistically selects a new task
to run without first doing a full put_prev_task(). This is so that we
can avoid a put/set on the common ancestors of the old and new task.
Similarly, we should only call check_cfs_rq_runtime() to throttle
eligible groups if they're part of the common ancestry, otherwise it
is possible to end up with no eligible task in the simple task
selection.
Imagine:
/root
/prev /next
/A /B
If our optimistic selection ends up throttling /next, we goto simple
and our put_prev_task() ends up throttling /prev, after which we're
going to bug out in set_next_entity() because there aren't any tasks
left.
Avoid this scenario by only throttling common ancestors.
Reported-by: Mohammed Naser <mnaser@vexxhost.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Ben Segall <bsegall@google.com>
[ munged Changelog ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pjt@google.com
Fixes:
678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()")
Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frederic Weisbecker [Thu, 4 Jun 2015 15:39:09 +0000 (17:39 +0200)]
preempt: Reorganize the notrace definitions a bit
preempt.h has two seperate "#ifdef CONFIG_PREEMPT" sections: one to
define preempt_enable() and another to define preempt_enable_notrace().
Lets gather both.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frederic Weisbecker [Thu, 4 Jun 2015 15:39:08 +0000 (17:39 +0200)]
preempt: Use preempt_schedule_context() as the official tracing preemption point
preempt_schedule_context() is a tracing safe preemption point but it's
only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing
recursion issues since commit:
b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers")
introduced function based preemp_count_*() ops.
Lets make it available on all configs and give it a more appropriate
name for its new position.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frederic Weisbecker [Thu, 4 Jun 2015 15:39:07 +0000 (17:39 +0200)]
sched: Make preempt_schedule_context() function-tracing safe
Since function tracing disables preemption, it needs a safe preemption
point to use when preemption is re-enabled without worrying about tracing
recursion. Ie: to avoid tracing recursion, that preemption point can't
be traced (use of notrace qualifier) and it can't call any traceable
function before that preemption point disables preemption itself, which
disarms the recursion.
preempt_schedule() was fine until commit:
b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers")
because PREEMPT_ACTIVE (which has the property to disable preemption
and this disarm tracing preemption recursion) was set before calling
any further function.
But that commit introduced the use of preempt_count_add/sub() functions
to set PREEMPT_ACTIVE and because these functions are called before
preemption gets a chance to be disabled, we have a tracing recursion.
preempt_schedule_context() is one of the possible preemption functions
used by tracing. Its special purpose is to avoid tracing recursion
against context tracking. Lets enhance this function to become more
generally tracing safe by disabling preemption with raw accessors, such
that no function is called before preemption gets disabled and disarm
the tracing recursion.
This function is going to become the specific tracing-safe preemption
point in further commit.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 2 Jun 2015 06:05:42 +0000 (08:05 +0200)]
Merge branch 'linus' into sched/core, to resolve conflict
Conflicts:
arch/sparc/include/asm/topology_64.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 2 Jun 2015 03:51:18 +0000 (20:51 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Various VTI tunnel (mark handling, PMTU) bug fixes from Alexander
Duyck and Steffen Klassert.
2) Revert ethtool PHY query change, it wasn't correct. The PHY address
selected by the driver running the PHY to MAC connection decides
what PHY address GET ethtool operations return information from.
3) Fix handling of sequence number bits for encryption IV generation in
ESP driver, from Herbert Xu.
4) UDP can return -EAGAIN when we hit a bad checksum on receive, even
when there are other packets in the receive queue which is wrong.
Just respect the error returned from the generic socket recv
datagram helper. From Eric Dumazet.
5) Fix BNA driver firmware loading on big-endian systems, from Ivan
Vecera.
6) Fix regression in that we were inheriting the congestion control of
the listening socket for new connections, the intended behavior
always was to use the default in this case. From Neal Cardwell.
7) Fix NULL deref in brcmfmac driver, from Arend van Spriel.
8) OTP parsing fix in iwlwifi from Liad Kaufman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
vti6: Add pmtu handling to vti6_xmit.
Revert "net: core: 'ethtool' issue with querying phy settings"
bnx2x: Move statistics implementation into semaphores
xen: netback: read hotplug script once at start of day.
xen: netback: fix printf format string warning
Revert "netfilter: ensure number of counters is >0 in do_replace()"
net: dsa: Properly propagate errors from dsa_switch_setup_one
tcp: fix child sockets to use system default congestion control if not set
udp: fix behavior of wrong checksums
sfc: free multiple Rx buffers when required
bna: fix soft lock-up during firmware initialization failure
bna: remove unreasonable iocpf timer start
bna: fix firmware loading on big-endian machines
bridge: fix br_multicast_query_expired() bug
via-rhine: Resigning as maintainer
brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails
mac80211: Fix mac80211.h docbook comments
iwlwifi: nvm: fix otp parsing in 8000 hw family
iwlwifi: pcie: fix tracking of cmd_in_flight
ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call
...
Linus Torvalds [Tue, 2 Jun 2015 03:44:51 +0000 (20:44 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull Sparc fixes from David Miller:
1) Setup the core/threads/sockets bitmaps correctly so that 'lscpus'
and friends operate properly. Frtom Chris Hyser.
2) The bit that normally means "Cached Virtually" on sun4v systems,
actually changes meaning in M7 and later chips. Fix from Khalid
Aziz.
3) One some PCI-E systems we need to probe different OF properties to
fill in the PCI slot information properly, from Eric Snowberg.
4) Kill an extraneous memset after kzalloc(), from Christophe Jaillet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
sparc64: pci slots information is not populated in sysfs
sparc: kernel: GRPCI2: Remove a useless memset
sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly
Linus Torvalds [Tue, 2 Jun 2015 01:49:45 +0000 (18:49 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"Last-minute virtio fix for 4.1
This tweaks an exported user-space header to fix build breakage for
userspace using it"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h
David S. Miller [Mon, 1 Jun 2015 23:56:43 +0000 (16:56 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fix for net
The following patch reverts the ebtables chunk that enforces counters that was
introduced in the recently applied
d26e2c9ffa38 ('Revert "netfilter: ensure
number of counters is >0 in do_replace()"') since this breaks ebtables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Jun 2015 23:06:29 +0000 (16:06 -0700)]
Merge tag 'wireless-drivers-for-davem-2015-06-01' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
iwlwifi:
* fix OTP parsing 8260
* fix powersave handling for 8260
brcmfmac:
* fix null pointer crash
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Fri, 29 May 2015 18:28:26 +0000 (11:28 -0700)]
vti6: Add pmtu handling to vti6_xmit.
We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Jun 2015 21:43:50 +0000 (14:43 -0700)]
Revert "net: core: 'ethtool' issue with querying phy settings"
This reverts commit
f96dee13b8e10f00840124255bed1d8b4c6afd6f.
It isn't right, ethtool is meant to manage one PHY instance
per netdevice at a time, and this is selected by the SET
command. Therefore by definition the GET command must only
return the settings for the configured and selected PHY.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Mon, 1 Jun 2015 12:08:18 +0000 (15:08 +0300)]
bnx2x: Move statistics implementation into semaphores
Commit
dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the
bnx2x locking around statistics state into using a mutex - but the lock
is being accessed via a timer which is forbidden.
[If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about
accessing the mutex in interrupt context]
This moves the implementation into using a semaphore [with size '1']
instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Mon, 1 Jun 2015 10:30:24 +0000 (11:30 +0100)]
xen: netback: read hotplug script once at start of day.
When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).
In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.
A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).
Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.
The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shutdown). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principal it might be possible to
arrange to unregister the device sooner (e.g on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...
A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Mon, 1 Jun 2015 10:30:04 +0000 (11:30 +0100)]
xen: netback: fix printf format string warning
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
(txreq.offset&~PAGE_MASK) + txreq.size);
^
PAGE_MASK's type can vary by arch, so a cast is needed.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
----
v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bernhard Thaler [Thu, 28 May 2015 08:26:18 +0000 (10:26 +0200)]
Revert "netfilter: ensure number of counters is >0 in do_replace()"
This partially reverts commit
1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.
Setting rules with ebtables does not work any more with
1086bbe97a07 place.
There is an error message and no rules set in the end.
e.g.
~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running
Reverting the ebtables part of
1086bbe97a07 makes this work again.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mikko Rapeli [Sat, 30 May 2015 15:39:25 +0000 (17:39 +0200)]
include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h
Fixes userspace compilation error:
error: unknown type name ‘__virtio16’
__virtio16 tag;
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Khalid Aziz [Wed, 27 May 2015 16:00:46 +0000 (10:00 -0600)]
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while
the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7
processor. This creates a conflicting usage of the same bit. Kernel
sets TTE.cv bit on all pages for sun4v architecture which works well
for sparc v9 but enables memory corruption detection on M7 processor
which is not the intent. This patch adds code to determine if kernel
is running on M7 processor and takes steps to not enable memory
corruption detection in TTE erroneously.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Snowberg [Wed, 27 May 2015 15:59:19 +0000 (11:59 -0400)]
sparc64: pci slots information is not populated in sysfs
Add PCI slot numbers within sysfs for PCIe hardware. Larger
PCIe systems with nested PCI bridges and slots further
down on these bridges were not being populated within sysfs.
This will add ACPI style PCI slot numbers for these systems
since the OF 'slot-names' information is not available on
all PCIe platforms.
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Jaillet [Fri, 1 May 2015 12:05:39 +0000 (14:05 +0200)]
sparc: kernel: GRPCI2: Remove a useless memset
grpci2priv is allocated using kzalloc, so there is no need to memset it.
Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 29 May 2015 17:29:46 +0000 (10:29 -0700)]
net: dsa: Properly propagate errors from dsa_switch_setup_one
While shuffling some code around, dsa_switch_setup_one() was introduced,
and it was modified to return either an error code using ERR_PTR() or a
NULL pointer when running out of memory or failing to setup a switch.
This is a problem for its caler: dsa_switch_setup() which uses IS_ERR()
and expects to find an error code, not a NULL pointer, so we still try
to proceed with dsa_switch_setup() and operate on invalid memory
addresses. This can be easily reproduced by having e.g: the bcm_sf2
driver built-in, but having no such switch, such that drv->setup will
fail.
Fix this by using PTR_ERR() consistently which is both more informative
and avoids for the caller to use IS_ERR_OR_NULL().
Fixes:
df197195a5248 ("net: dsa: split dsa_switch_setup into two functions")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Fri, 29 May 2015 17:47:07 +0000 (13:47 -0400)]
tcp: fix child sockets to use system default congestion control if not set
Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.
This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in
4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.
In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.
Fixes:
55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 30 May 2015 16:16:53 +0000 (09:16 -0700)]
udp: fix behavior of wrong checksums
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 1 Jun 2015 02:01:07 +0000 (19:01 -0700)]
Linux 4.1-rc6
Daniel Pieczko [Fri, 29 May 2015 11:25:54 +0000 (12:25 +0100)]
sfc: free multiple Rx buffers when required
When Rx packet data must be dropped, all the buffers
associated with that Rx packet must be freed. Extend
and rename efx_free_rx_buffer() to efx_free_rx_buffers()
and loop through all the fragments.
By doing so this patch fixes a possible memory leak.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 31 May 2015 23:00:34 +0000 (16:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
"Off-by-one in d_walk()/__dentry_kill() race fix.
It's very hard to hit; possible in the same conditions as the original
bug, except that you need the skipped branch to contain all the
remaining evictables, so that the d_walk()-calling loop in
d_invalidate() decides there's nothing more to do and doesn't go for
another pass - otherwise that next pass will sweep the sucker.
So it's not too urgent, but seeing that the fix is obvious and the
original commit has spread into all -stable branches..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
d_walk() might skip too much
Linus Torvalds [Sun, 31 May 2015 19:20:59 +0000 (12:20 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Three fixes this time around:
- fix a memory leak which occurs when probing performance monitoring
unit interrupts
- fix handling of non-PMD aligned end of RAM causing boot failures
- fix missing syscall trace exit path with syscall tracing enabled
causing a kernel oops in the audit code"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8357/1: perf: fix memory leak when probing PMU PPIs
ARM: fix missing syscall trace exit
ARM: 8356/1: mm: handle non-pmd-aligned end of RAM
Linus Torvalds [Sun, 31 May 2015 19:03:42 +0000 (12:03 -0700)]
Merge branch 'upstream' of git://git./linux/kernel/git/ralf/linux
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 4.1 all across the tree"
* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux:
MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
MIPS: BMIPS: Fix bmips_wr_vec()
MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not set
MIPS: Fuloong 2E: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
MIPS: irq: Use DECLARE_BITMAP
ttyFDC: Fix to use native endian MMIO reads
MIPS: Fix CDMM to use native endian MMIO reads
Linus Torvalds [Sun, 31 May 2015 18:39:25 +0000 (11:39 -0700)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat tool fixes from Len Brown:
"Just one minor kernel dependency in this batch -- added a #define to
msr-index.h"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number to 4.7
tools/power turbostat: allow running without cpu0
tools/power turbostat: correctly decode of ENERGY_PERFORMANCE_BIAS
tools/power turbostat: enable turbostat to support Knights Landing (KNL)
tools/power turbostat: correctly display more than 2 threads/core
Linus Torvalds [Sun, 31 May 2015 18:31:42 +0000 (11:31 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes, with the exception of the following that
address fall-out from recent v4.1-rc1 changes:
- regression fix related to the big fabric API registration changes
and configfs_depend_item() usage, that required cherry-picking one
of HCH's patches from for-next to address the issue for v4.1 code.
- remaining TCM-USER -v2 related changes to enforce full CDB
passthrough from Andy + Ilias.
Also included is a target_core_pscsi driver fix from Andy that
addresses a long standing issue with a Scsi_Host reference being
leaked on PSCSI device shutdown"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iser-target: Fix error path in isert_create_pi_ctx()
target: Use a PASSTHROUGH flag instead of transport_types
target: Move passthrough CDB parsing into a common function
target/user: Only support full command pass-through
target/user: Update example code for new ABI requirements
target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem
target: Drop signal_pending checks after interruptible lock acquire
target: Add missing parentheses
target: Fix bidi command handling
target/user: Disallow full passthrough (pass_level=0)
ISCSI: fix minor memory leak
Linus Torvalds [Sun, 31 May 2015 18:24:49 +0000 (11:24 -0700)]
Merge tag 'hwmon-for-linus-v4.1-rc6' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Some late hwmon patches, all headed for -stable
- fix sysfs attribute initialization in nct6775 and nct6683 drivers
- do not attempt to auto-detect tmp435 on I2C address 0x37
- ensure iio channel is of type IIO_VOLTAGE in ntc_thermistor driver"
* tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6683) Add missing sysfs attribute initialization
hwmon: (nct6775) Add missing sysfs attribute initialization
hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37
hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
David S. Miller [Sun, 31 May 2015 06:46:54 +0000 (23:46 -0700)]
Merge branch 'bna-fixes'
Ivan Vecera says:
====================
bna: misc bugfixes
These patches fix several bugs found during device initialization debugging.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 28 May 2015 21:10:08 +0000 (23:10 +0200)]
bna: fix soft lock-up during firmware initialization failure
Bug in the driver initialization causes soft-lockup if firmware
initialization timeout is reached. Polling function bfa_ioc_poll_fwinit()
incorrectly calls bfa_nw_iocpf_timeout() when the timeout is reached.
The problem is that bfa_nw_iocpf_timeout() calls again
bfa_ioc_poll_fwinit()... etc. The bfa_ioc_poll_fwinit() should directly
send timeout event for iocpf and the same should be done if firmware
download into HW fails.
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 28 May 2015 21:10:07 +0000 (23:10 +0200)]
bna: remove unreasonable iocpf timer start
Driver starts iocpf timer prior bnad_ioceth_enable() call and this is
unreasonable. This piece of code probably originates from Brocade/Qlogic
out-of-box driver during initial import into upstream. This driver uses
only one timer and queue to implement multiple timers and this timer is
started at this place. The upstream driver uses multiple timers instead
of this.
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 28 May 2015 21:10:06 +0000 (23:10 +0200)]
bna: fix firmware loading on big-endian machines
Firmware required by bna is stored in appropriate files as sequence
of LE32 integers. After loading by request_firmware() they need to be
byte-swapped on big-endian arches. Without this conversion the NIC
is unusable on big-endian machines.
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2015 06:37:46 +0000 (23:37 -0700)]
Merge tag 'mac80211-for-davem-2015-05-28' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This just has a single docbook build fix. In my confusion
I'd already sent the same fix for -next, but Ben Hutchings
noted it's necessary in 4.1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 28 May 2015 11:42:54 +0000 (04:42 -0700)]
bridge: fix br_multicast_query_expired() bug
br_multicast_query_expired() querier argument is a pointer to
a struct bridge_mcast_querier :
struct bridge_mcast_querier {
struct br_ip addr;
struct net_bridge_port __rcu *port;
};
Intent of the code was to clear port field, not the pointer to querier.
Fixes:
2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Cc: Linus Lüssing <linus.luessing@web.de>
Cc: Steinar H. Gunderson <sesse@samfundet.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roland Dreier [Sat, 30 May 2015 06:12:10 +0000 (23:12 -0700)]
iser-target: Fix error path in isert_create_pi_ctx()
We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed
in the function. That means the cleanup path should use the local
pi_ctx variable, not desc->pi_ctx.
This was detected by Coverity (CID 1260062).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:41 +0000 (14:44 -0700)]
target: Use a PASSTHROUGH flag instead of transport_types
It seems like we only care if a transport is passthrough or not. Convert
transport_type to a flags field and replace TRANSPORT_PLUGIN_* with a
flag, TRANSPORT_FLAG_PASSTHROUGH.
Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:40 +0000 (14:44 -0700)]
target: Move passthrough CDB parsing into a common function
Aside from whether they handle BIDI ops or not, parsing of the CDB by
kernel and user SCSI passthrough modules should be identical. Move this
into a new passthrough_parse_cdb() and call it from tcm-pscsi and tcm-user.
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:39 +0000 (14:44 -0700)]
target/user: Only support full command pass-through
After much discussion, give up on only passing a subset of SCSI commands
to userspace and pass them all. Based on what pscsi is doing, make sure
to set SCF_SCSI_DATA_CDB for I/O ops, and define attributes identical to
pscsi.
Make hw_block_size configurable via dev param.
Remove mention of command filtering from tcmu-design.txt.
Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:38 +0000 (14:44 -0700)]
target/user: Update example code for new ABI requirements
We now require that the userspace handler set a bit if the command is not
handled.
Update calls to tcmu_hdr_get_op for v2.
Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Fri, 22 May 2015 21:07:44 +0000 (14:07 -0700)]
target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
See https://bugzilla.redhat.com/show_bug.cgi?id=1025672
We need to put() the reference to the scsi host that we got in
pscsi_configure_device(). In VIRTUAL_HOST mode it is associated with
the dev_virt, not the hba_virt.
Signed-off-by: Andy Grover <agrover@redhat.com>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Christoph Hellwig [Sun, 3 May 2015 06:50:52 +0000 (08:50 +0200)]
target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem
There is just one configfs subsystem in the target code, so we might as
well add two helpers to reference / unreference it from the core code
instead of passing pointers to it around.
This fixes a regression introduced for v4.1-rc1 with commit
9ac8928e6,
where configfs_depend_item() callers using se_tpg_tfo->tf_subsys would
fail, because the assignment from the original target_core_subsystem[]
is no longer happening at target_register_template() time.
(Fix target_core_exit_configfs pointer dereference - Sagi)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Guenter Roeck [Thu, 28 May 2015 16:12:23 +0000 (09:12 -0700)]
hwmon: (nct6683) Add missing sysfs attribute initialization
The following error message is seen when loading the nct6683 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key
ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
Reported-by: Alexey Orishko <alexey.orishko@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Guenter Roeck [Thu, 28 May 2015 16:08:09 +0000 (09:08 -0700)]
hwmon: (nct6775) Add missing sysfs attribute initialization
The following error message is seen when loading the nct6775 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key
ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
Reported-by: Alexey Orishko <alexey.orishko@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Sat, 30 May 2015 00:09:39 +0000 (17:09 -0700)]
Merge tag 'acpi-pci-4.1-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull PCI / ACPI fix from Rafael Wysocki:
"This fixes a bug uncovered by a recent driver core change that
modified the implementation of the ACPI_COMPANION_SET() macro to
strictly rely on its second argument to be either NULL or a valid
pointer to struct acpi_device.
As it turns out, pcibios_root_bridge_prepare() on x86 and ia64 works
with the assumption that the only code path calling pci_create_root_bus()
is pci_acpi_scan_root() and therefore the sysdata argument passed to
it will always match the expectations of pcibios_root_bridge_prepare().
That need not be the case, however, and in particular it is not the
case for the Xen pcifront driver that passes a pointer to its own
private data strcture as sysdata to pci_scan_bus_parented() which then
passes it to pci_create_root_bus() and it ends up being used incorrectly
by pcibios_root_bridge_prepare()"
* tag 'acpi-pci-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / ACPI: Do not set ACPI companions for host bridges with parents
Linus Torvalds [Fri, 29 May 2015 23:45:45 +0000 (16:45 -0700)]
Merge tag 'xfs-for-linus-4.1-rc6' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This is a little larger than I'd like late in the release cycle, but
all the fixes are for regressions introduced in the 4.1-rc1 merge, or
are needed back in -stable kernels fairly quickly as they are
filesystem corruption or userspace visible correctness issues.
Changes in this update:
- regression fix for new rename whiteout code
- regression fixes for new superblock generic per-cpu counter code
- fix for incorrect error return sign introduced in 3.17
- metadata corruption fixes that need to go back to -stable kernels"
* tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: fix broken i_nlink accounting for whiteout tmpfile inode
xfs: xfs_iozero can return positive errno
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs: extent size hints can round up extents past MAXEXTLEN
xfs: inode and free block counters need to use __percpu_counter_compare
percpu_counter: batch size aware __percpu_counter_compare()
xfs: use percpu_counter_read_positive for mp->m_icount
Linus Torvalds [Fri, 29 May 2015 21:48:58 +0000 (14:48 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Two weeks worth of small bug fixes this time, nothing sticking out
this time:
- one defconfig change to adapt to a modified Kconfig symbol
- two fixes for i.MX for backwards compatibility with older DT files
that was accidentally broken
- one regression fix for irq handling on pxa
- three small dt files on omap, and one each for imx and exynos"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: multi_v7_defconfig: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
ARM: imx6: gpc: don't register power domain if DT data is missing
ARM: imx6: allow booting with old DT
ARM: dts: set display clock correctly for exynos4412-trats2
ARM: pxa: pxa_cplds: signedness bug in probe
ARM: dts: Fix WLAN interrupt line for AM335x EVM-SK
ARM: dts: omap3-devkit8000: Fix NAND DT node
ARM: dts: am335x-boneblack: disable RTC-only sleep
ARM: dts: fix imx27 dtb build rule
ARM: dts: imx27: only map 4 Kbyte for fec registers
Linus Torvalds [Fri, 29 May 2015 21:39:24 +0000 (14:39 -0700)]
Merge tag 'dm-4.1-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device-mapper fixes from Mike Snitzer:
"Quite a few fixes for DM's blk-mq support thanks to extra DM multipath
testing from Junichi Nomura and Bart Van Assche.
Also fix a casting bug in dm_merge_bvec() that could cause only a
single page to be added to a bio (Joe identified this while testing
dm-cache writeback)"
* tag 'dm-4.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix casting bug in dm_merge_bvec()
dm: fix reload failure of 0 path multipath mapping on blk-mq devices
dm: fix false warning in free_rq_clone() for unmapped requests
dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path
dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED
dm: run queue on re-queue
Guenter Roeck [Wed, 27 May 2015 23:11:48 +0000 (16:11 -0700)]
hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37
I2C address 0x37 may be used by EEPROMs, which can result in false
positives. Do not attempt to detect a chip at this address.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Fri, 29 May 2015 20:28:57 +0000 (13:28 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/gdb: fix lx-lsmod refcnt
omfs: fix potential integer overflow in allocator
omfs: fix sign confusion for bitmap loop counter
omfs: set error return when d_make_root() fails
fs, omfs: add NULL terminator in the end up the token list
MAINTAINERS: update CAPABILITIES pattern
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
tracing/mm: don't trace mm_page_pcpu_drain on offline cpus
tracing/mm: don't trace mm_page_free on offline cpus
tracing/mm: don't trace kmem_cache_free on offline cpus
Linus Torvalds [Fri, 29 May 2015 18:24:28 +0000 (11:24 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull fixes for cpumask and modules from Rusty Russell:
"** NOW WITH TESTING! **
Two fixes which got lost in my recent distraction. One is a weird
cpumask function which needed to be rewritten, the other is a module
bug which is cc:stable"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
cpumask_set_cpu_local_first => cpumask_local_spread, lament
module: Call module notifier on failure after complete_formation()
Maciej W. Rozycki [Thu, 28 May 2015 16:46:49 +0000 (17:46 +0100)]
MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
Correct a regression introduced with
8453eebd [MIPS: Fix strnlen_user()
return value in case of overlong strings.] causing assembler warnings
and broken code generated in __strnlen_kernel_nocheck_asm:
arch/mips/lib/strnlen_user.S: Assembler messages:
arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
with the CPU_DADDI_WORKAROUNDS option set, resulting in the function
looping indefinitely upon mounting NFS root.
Use conditional assembly to avoid a microMIPS code size regression.
Using $at unconditionally would cause such a regression as there are no
16-bit instruction encodings available for ALU operations using this
register. Using $v1 unconditionally would produce short microMIPS
encodings, but would prevent this register from being used across calls
to this function.
The extra LI operation introduced is free, replacing a NOP originally
scheduled into the delay slot of the branch that follows.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10205/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Petri Gynther [Wed, 27 May 2015 06:25:08 +0000 (23:25 -0700)]
MIPS: BMIPS: Fix bmips_wr_vec()
bmips_wr_vec() copies exception vector code from start to dst.
The call to dma_cache_wback() needs to flush (end-start) bytes,
starting at dst, from write-back cache to memory.
Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kevin Cernekee <cernekee@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10193/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Laurent Fasnacht [Wed, 27 May 2015 17:50:00 +0000 (19:50 +0200)]
MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not set
initrd_start is defined in init/do_mounts_initrd.c, which is only
included in kernel if CONFIG_BLK_DEV_INITRD=y.
Signed-off-by: Laurent Fasnacht <l@libres.ch>
Cc: linux-mips@linux-mips.org
Cc: trivial@kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10198/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Fri, 29 May 2015 17:52:15 +0000 (10:52 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is made up 4 groups of fixes detailed below.
vgem:
Due to some misgivings about possible bad use cases this allow,
backout a chunk of the interface to stop those use cases for now.
radeon:
Fix for an oops regression in the audio code, and a partial revert
for a fix that was cauing problems.
nouveau:
regression fix for Fermi, and display-less Maxwell boot fixes.
drm core:
a fix for i915 cursor vblank waiting in the atomic helpers"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/gr/gm204: remove a stray printk
drm/nouveau/devinit/gm100-: force devinit table execution on boards without PDISP
drm/nouveau/devinit/gf100: make the force-post condition more obvious
drm/nouveau/gr/gf100-: fix wrong constant definition
drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling"
drm/radeon/audio: make sure connector is valid in hotplug case
Revert "drm/radeon: only mark audio as connected if the monitor supports it (v3)"
drm/radeon: don't share plls if monitors differ in audio support
drm/vgem: drop DRIVER_PRIME (v2)
drm/plane-helper: Adapt cursor hack to transitional helpers
Linus Torvalds [Fri, 29 May 2015 17:43:27 +0000 (10:43 -0700)]
Merge tag 'sound-4.1-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"No big surprise here, just a bunch of small fixes for HD-audio and
USB-audio:
- partial revert of widget power-saving for IDT codecs
- revert mute-LED enum ctl for Thinkpads due to confusion
- a quirk for a new Radeon HDMI controller
- Realtek codec name fix for Dell
- a workaround for headphone mic boost on some laptops
- stream_pm ops setup (and its fix for regression)
- another quirk for MS LifeCam USB-audio"
* tag 'sound-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix lost sound due to stream_pm ops cleanup
ALSA: hda - Disable Headphone Mic boost for ALC662
ALSA: hda - Disable power_save_node for IDT92HD71bxx
ALSA: hda - Fix noise on AMD radeon 290x controller
ALSA: hda - Set stream_pm ops automatically by generic parser
ALSA: hda/realtek - Add ALC256 alias name for Dell
Revert "ALSA: hda - Add mute-LED mode control to Thinkpad"
ALSA: usb-audio: Add quirk for MS LifeCam HD-3000
Joe Thornber [Fri, 29 May 2015 13:52:51 +0000 (14:52 +0100)]
dm: fix casting bug in dm_merge_bvec()
dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn"). In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int. This code made
assumptions about the value of BIO_MAX_SECTORS.
A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At
this point the cast from sector_t to int resulted in a zero value. The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.
This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
Junichi Nomura [Fri, 29 May 2015 08:51:03 +0000 (08:51 +0000)]
dm: fix reload failure of 0 path multipath mapping on blk-mq devices
dm-multipath accepts 0 path mapping.
# echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev
Such a mapping can be used to release underlying devices while still
holding requests in its queue until working paths come back.
However, once the multipath device is created over blk-mq devices,
it rejects reloading of 0 path mapping:
# echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
| dmsetup create mpath1
# echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
device-mapper: reload ioctl on mpath1 failed: Invalid argument
Command failed
With following kernel message:
device-mapper: ioctl: can't change device type after initial table load.
DM tries to inherit the current table type using dm_table_set_type()
but it doesn't work as expected because of unnecessary check about
whether the target type is hybrid or not.
Hybrid type is for targets that work as either request-based or bio-based
and not required for blk-mq or non blk-mq checking.
Fixes:
65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Fri, 29 May 2015 17:35:21 +0000 (10:35 -0700)]
Merge tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md
Pull m,ore md bugfixes gfrom Neil Brown:
"Assorted fixes for new RAID5 stripe-batching functionality.
Unfortunately this functionality was merged a little prematurely. The
necessary testing and code review is now complete (or as complete as
it can be) and to code passes a variety of tests and looks quite
sensible.
Also a fix for some recent locking changes - a race was introduced
which causes a reshape request to sometimes fail. No data safety
issues"
* tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md:
md: fix race when unfreezing sync_action
md/raid5: break stripe-batches when the array has failed.
md/raid5: call break_stripe_batch_list from handle_stripe_clean_event
md/raid5: be more selective about distributing flags across batch.
md/raid5: add handle_flags arg to break_stripe_batch_list.
md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list
md/raid5: remove condition test from check_break_stripe_batch_list.
md/raid5: Ensure a batch member is not handled prematurely.
md/raid5: close race between STRIPE_BIT_DELAY and batching.
md/raid5: ensure whole batch is delayed for all required bitmap updates.
Mike Snitzer [Thu, 28 May 2015 19:12:52 +0000 (15:12 -0400)]
dm: fix false warning in free_rq_clone() for unmapped requests
When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.
The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request"). But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.
Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).
Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Geert Uytterhoeven [Tue, 26 May 2015 10:58:37 +0000 (12:58 +0200)]
ARM: multi_v7_defconfig: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
Since commit
100832abf065bc18 ("usb: isp1760: Make HCD support
optional"), CONFIG_USB_ISP1760_HCD is automatically selected when
needed. Enabling that option in the defconfig is now a no-op, and no
longer enables ISP1760 HCD support.
Re-enable the ISP1760 driver in the defconfig by enabling
USB_ISP1760_HOST_ROLE instead.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 29 May 2015 12:30:35 +0000 (14:30 +0200)]
Merge tag 'imx-fixes-4.1-3' of git://git./linux/kernel/git/shawnguo/linux into fixes
Merge "The i.MX fixes for 4.1, 3rd round" from Shawn Guo:
It includes a couple of fixes for i.MX6 GPC code to let the new kernel
be able to boot with old DTBs:
- Booting v4.1-rc kernel with old DTBs will fail with a fat warning
(require low-level debug to be seen), due to the adoption of stacked
IRQ domain. The first fix improves the situation by allowing kernel
boot up with old DTBs, although suspend/resume still breaks.
- Booting new kernel with old DTBs that do not have power-domain info
will result in a hang. The second patch fixes the hang by skipping
the kernel power-domain registration if DTB has no power-domain info.
* tag 'imx-fixes-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx6: gpc: don't register power domain if DT data is missing
ARM: imx6: allow booting with old DT
Arnd Bergmann [Fri, 29 May 2015 12:29:37 +0000 (14:29 +0200)]
Merge tag 'samsung-fixes-3' of git://git./linux/kernel/git/kgene/linux-samsung into fixes
Merge "Samsung fix for v4.1" from Kukjin Kim:
- Set display clock correctly for exynos4412-trats2
: fix the following error
exynos-drm: No connectors reported connected with modes
[drm] Cannot find any crtc or sizes - going 1024x768
* tag 'samsung-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: dts: set display clock correctly for exynos4412-trats2
Takashi Iwai [Fri, 29 May 2015 07:43:29 +0000 (09:43 +0200)]
ALSA: hda - Fix lost sound due to stream_pm ops cleanup
The commit [
49fb18972581: ALSA: hda - Set stream_pm ops automatically
by generic parser] resulted in regressions on some Realtek and VIA
codecs because these drivers set patch_ops after calling the generic
parser, thus stream_pm got cleared to NULL again. I haven't noticed
since I tested with IDT codec.
Restore (partial revert) the stream_pm ops for them to fix the
regression.
Fixes:
49fb18972581 ('ALSA: hda - Set stream_pm ops automatically by generic parser')
Reported-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Al Viro [Fri, 29 May 2015 03:09:19 +0000 (23:09 -0400)]
d_walk() might skip too much
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.
Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.
Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David S. Miller [Fri, 29 May 2015 03:41:35 +0000 (20:41 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2015-05-28
1) Fix a race in xfrm_state_lookup_byspi, we need to take
the refcount before we release xfrm_state_lock.
From Li RongQing.
2) Fix IV generation on ESN state. We used just the
low order sequence numbers for IV generation on
ESN, as a result the IV can repeat on the same
state. Fix this by using the high order sequence
number bits too and make sure to always initialize
the high order bits with zero. These patches are
serious stable candidates. Fixes from Herbert Xu.
3) Fix the skb->mark handling on vti. We don't
reset skb->mark in skb_scrub_packet anymore,
so vti must care to restore the original
value back after it was used to lookup the
vti policy and state. Fixes from Alexander Duyck.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Luethi [Tue, 26 May 2015 17:12:54 +0000 (19:12 +0200)]
via-rhine: Resigning as maintainer
I don't have enough time to look after via-rhine anymore.
Signed-off-by: Roger Luethi <rl@hellgate.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrien Schildknecht [Thu, 28 May 2015 22:44:40 +0000 (15:44 -0700)]
scripts/gdb: fix lx-lsmod refcnt
Commit
2f35c41f58a9 ("module: Replace module_ref with atomic_t refcnt")
changes the way refcnt is handled but did not update the gdb script to
use the new variable.
Since refcnt is not per-cpu anymore, we can directly read its value.
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Pantelis Koukousoulas <pktoss@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Copeland [Thu, 28 May 2015 22:44:37 +0000 (15:44 -0700)]
omfs: fix potential integer overflow in allocator
Both 'i' and 'bits_per_entry' are signed integers but the result is a
u64 block number. Cast i to u64 to avoid truncation on 32-bit targets.
Found by Coverity (CID 200679).
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Copeland [Thu, 28 May 2015 22:44:35 +0000 (15:44 -0700)]
omfs: fix sign confusion for bitmap loop counter
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block. The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.
Unfortunately, a recent change made this unsigned along with some other
related fields. The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.
Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Copeland [Thu, 28 May 2015 22:44:32 +0000 (15:44 -0700)]
omfs: set error return when d_make_root() fails
A static checker found the following issue in the error path for
omfs_fill_super:
fs/omfs/inode.c:552 omfs_fill_super()
warn: missing error code here? 'd_make_root()' failed. 'ret' = '0'
Fix by returning -ENOMEM in this case.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin [Thu, 28 May 2015 22:44:29 +0000 (15:44 -0700)]
fs, omfs: add NULL terminator in the end up the token list
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop. Not having one causes it to overrun
to invalid memory.
In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 28 May 2015 22:44:27 +0000 (15:44 -0700)]
MAINTAINERS: update CAPABILITIES pattern
Commit
1ddd3b4e07a4 ("LSM: Remove unused capability.c") removed the
file, remove the file pattern.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 28 May 2015 22:44:24 +0000 (15:44 -0700)]
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
load_elf_binary() returns `retval', not `error'.
Fixes:
a87938b2e246b81b4fb ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shreyas B. Prabhu [Thu, 28 May 2015 22:44:22 +0000 (15:44 -0700)]
tracing/mm: don't trace mm_page_pcpu_drain on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/5/0:
#0: (&(&zone->lock)->rlock){..-...}, at: [<
c0000000002073b0>] .free_pcppages_bulk+0x70/0x920
stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.free_pcppages_bulk+0x60c/0x920
.free_hot_cold_page+0x208/0x280
.destroy_context+0x90/0xd0
.__mmdrop+0x58/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_pcpu_drain trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shreyas B. Prabhu [Thu, 28 May 2015 22:44:19 +0000 (15:44 -0700)]
tracing/mm: don't trace mm_page_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in this
scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.free_pages_prepare+0x494/0x680
.free_hot_cold_page+0x50/0x280
.destroy_context+0x90/0xd0
.__mmdrop+0x58/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_free trace point into TRACE_EVENT_CONDITION
where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shreyas B. Prabhu [Thu, 28 May 2015 22:44:16 +0000 (15:44 -0700)]
tracing/mm: don't trace kmem_cache_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.kmem_cache_free+0x344/0x4b0
.__mmdrop+0x4c/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting kmem_cache_free trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Fri, 29 May 2015 01:13:52 +0000 (11:13 +1000)]
Merge branch 'linux-4.1' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
Regression fix for Fermi acceleration, and fixes important to bringing
up display-less Maxwell boards.
* 'linux-4.1' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/gr/gm204: remove a stray printk
drm/nouveau/devinit/gm100-: force devinit table execution on boards without PDISP
drm/nouveau/devinit/gf100: make the force-post condition more obvious
drm/nouveau/gr/gf100-: fix wrong constant definition
Ben Skeggs [Mon, 4 May 2015 02:26:00 +0000 (12:26 +1000)]
drm/nouveau/gr/gm204: remove a stray printk
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 21 May 2015 05:47:16 +0000 (15:47 +1000)]
drm/nouveau/devinit/gm100-: force devinit table execution on boards without PDISP
Should fix fdo#89558
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 21 May 2015 05:44:15 +0000 (15:44 +1000)]
drm/nouveau/devinit/gf100: make the force-post condition more obvious
And also more generic, so it can be used on newer chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Lars Seipel [Thu, 21 May 2015 01:04:10 +0000 (03:04 +0200)]
drm/nouveau/gr/gf100-: fix wrong constant definition
Commit
3740c82590d8 ("drm/nouveau/gr/gf100-: add symbolic names for
classes") introduced a wrong macro definition causing acceleration setup
to fail. Fix it.
Signed-off-by: Lars Seipel <ls@slrz.net>
Fixes:
3740c82590d8 ("drm/nouveau/gr/gf100-: add symbolic names for classes")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Dave Airlie [Thu, 28 May 2015 23:05:53 +0000 (09:05 +1000)]
Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
one more regression fix, partial revert.
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling"
Brian Foster [Thu, 28 May 2015 22:14:55 +0000 (08:14 +1000)]
xfs: fix broken i_nlink accounting for whiteout tmpfile inode
XFS uses the internal tmpfile() infrastructure for the whiteout inode
used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
the inode, drops di_nlink, adds the inode to the agi unlinked list,
calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
and then finishes the common inode setup (e.g., clear I_NEW and unlock).
The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
pulled up out of that function as part of the following commit to
resolve a deadlock issue:
330033d6 xfs: fix tmpfile/selinux deadlock and initialize security
As a result, callers of xfs_create_tmpfile() are responsible for either
calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
it back into the source dentry and calls xfs_bumplink().
Update the assert in xfs_rename() to help detect this problem in the
future and update xfs_rename_alloc_whiteout() to decrement the link
count as part of the manual tmpfile inode setup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:40:32 +0000 (07:40 +1000)]
xfs: xfs_iozero can return positive errno
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.
Thanks to Brian Foster for noticing it.
cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:40:08 +0000 (07:40 +1000)]
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.
This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.
To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of it's state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.
Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.
cc: <stable@vger.kernel.org> # 3.12 to 4.0
Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:40:06 +0000 (07:40 +1000)]
xfs: extent size hints can round up extents past MAXEXTLEN
This results in BMBT corruption, as seen by this test:
# mkfs.xfs -f -d size=
40051712b,agcount=4 /dev/vdc
....
# mount /dev/vdc /mnt/scratch
# xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo
which results in this failure on a debug kernel:
XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
....
Call Trace:
[<
ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
[<
ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
[<
ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
[<
ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
[<
ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
[<
ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
[<
ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
[<
ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
[<
ffffffff81ddcc29>] ? down_write+0x29/0x60
[<
ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
[<
ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
[<
ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
[<
ffffffff811cd680>] vfs_fallocate+0x140/0x260
[<
ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
[<
ffffffff81ddec09>] system_call_fastpath+0x12/0x17
The tracepoint that indicates the extent that triggered the assert
failure is:
xfs_iext_insert: idx 0 offset 0 block
16777224 count 2097152 flag 1
Clearly indicating that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:
xfs_alloc_size_done: agno 1 agbno 8 minlen 2097152 maxlen 2097152
^^^^^^^ ^^^^^^^
We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in it's allocation
sizes and so needs help here.
The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.
Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact be the underlying bug that
occassionally causes transaction overruns in delayed allocation
extent conversion, so now we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.
Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:39:34 +0000 (07:39 +1000)]
xfs: inode and free block counters need to use __percpu_counter_compare
Because the counters use a custom batch size, the comparison
functions need to be aware of that batch size otherwise the
comparison does not work correctly. This leads to ASSERT failures
on generic/027 like this:
XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
------------[ cut here ]------------
....
Call Trace:
[<
ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
[<
ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
[<
ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
[<
ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
[<
ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
[<
ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
[<
ffffffff8151dbe0>] xfs_inactive+0x130/0x200
[<
ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
[<
ffffffff811f3958>] evict+0xb8/0x190
[<
ffffffff811f433b>] iput+0x18b/0x1f0
[<
ffffffff811e8853>] do_unlinkat+0x1f3/0x320
[<
ffffffff811d548a>] ? filp_close+0x5a/0x80
[<
ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
[<
ffffffff81e0892e>] system_call_fastpath+0x12/0x71
This is a regression introduced by commit 501ab32 ("xfs: use generic
percpu counters for inode counter").
This patch fixes the same problem for both the inode counter and the
free block counter in the superblocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:39:34 +0000 (07:39 +1000)]
percpu_counter: batch size aware __percpu_counter_compare()
XFS uses non-stanard batch sizes for avoiding frequent global
counter updates on it's allocated inode counters, as they increment
or decrement in batches of 64 inodes. Hence the standard percpu
counter batch of 32 means that the counter is effectively a global
counter. Currently Xfs uses a batch size of 128 so that it doesn't
take the global lock on every single modification.
However, Xfs also needs to compare accurately against zero, which
means we need to use percpu_counter_compare(), and that has a
hard-coded batch size of 32, and hence will spuriously fail to
detect when it is supposed to use precise comparisons and hence
the accounting goes wrong.
Add __percpu_counter_compare() to take a custom batch size so we can
use it sanely in XFS and factor percpu_counter_compare() to use it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
George Wang [Thu, 28 May 2015 21:39:34 +0000 (07:39 +1000)]
xfs: use percpu_counter_read_positive for mp->m_icount
Function percpu_counter_read just return the current counter, which can be
negative. This will cause the checking of "allocated inode
counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
solve this problem, and be consistent with the purpose to introduce percpu
mechanism to xfs.
Signed-off-by: George Wang <xuw2015@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Linus Torvalds [Thu, 28 May 2015 20:28:24 +0000 (13:28 -0700)]
Merge tag 'xtensa-
20150526' of git://github.com/czankel/xtensa-linux
Pull Xtensa fix from Chris Zankel:
"This fixes allmodconfig, which fails to build due to missing
dma_alloc_attrs() and dma_free_attrs() functions"
* tag 'xtensa-
20150526' of git://github.com/czankel/xtensa-linux:
xtensa: Provide dummy dma_alloc_attrs() and dma_free_attrs()
Linus Torvalds [Thu, 28 May 2015 20:19:53 +0000 (13:19 -0700)]
Merge tag 'platform-drivers-x86-v4.1-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver fix from Darren Hart:
"thinkpad_acpi: Revert unintentional device attribute renaming"
* tag 'platform-drivers-x86-v4.1-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
thinkpad_acpi: Revert unintentional device attribute renaming
Christian König [Thu, 28 May 2015 13:51:59 +0000 (15:51 +0200)]
drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling"
We have that bug for years and some users report side effects when fixing it on older hardware.
So revert it for VM_CONTEXT0_PAGE_TABLE_END_ADDR, but keep it for VM 1-15.
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kalle Valo [Thu, 28 May 2015 13:28:03 +0000 (16:28 +0300)]
Merge tag 'iwlwifi-for-kalle-2015-05-28' of https://git./linux/kernel/git/iwlwifi/iwlwifi-fixes
* fix OTP parsing 8260
* fix powersave handling for 8260
Arend van Spriel [Tue, 26 May 2015 11:19:46 +0000 (13:19 +0200)]
brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails
The function brcmf_msgbuf_get_pktid() may return a NULL pointer so
the callers should check the return pointer before accessing it to
avoid the crash below (see [1]):
brcmfmac: brcmf_msgbuf_get_pktid: Invalid packet id 273 (not in use)
BUG: unable to handle kernel NULL pointer dereference at
0000000000000080
IP: [<
ffffffff8145b225>] skb_pull+0x5/0x50
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O)
snd_hda_codec_hdmi bnep mousedev hid_generic ushwmon msr ext4 crc16 mbcache
jbd2 sd_mod uas usb_storage ahci libahci libata scsi_mod xhci_pci xhci_hcd
usbcore usb_common
CPU: 0 PID: 1661 Comm: irq/61-brcmf_pc Tainted: G O 4.0.1-MacbookPro-ARCH #1
Hardware name: Apple Inc. MacBookPro12,1/Mac-
E43C1C25D4880AD6,
BIOS MBP121.88Z.0167.B02.
1503241251 03/24/2015
task:
ffff880264203cc0 ti:
ffff88025ffe4000 task.ti:
ffff88025ffe4000
RIP: 0010:[<
ffffffff8145b225>] [<
ffffffff8145b225>] skb_pull+0x5/0x50
RSP: 0018:
ffff88025ffe7d40 EFLAGS:
00010202
RAX:
0000000000000000 RBX:
ffff88008a33c000 RCX:
0000000000000044
RDX:
0000000000000000 RSI:
000000000000004a RDI:
0000000000000000
RBP:
ffff88025ffe7da8 R08:
0000000000000096 R09:
000000000000004a
R10:
0000000000000000 R11:
000000000000048e R12:
ffff88025ff14f00
R13:
0000000000000000 R14:
ffff880263b48200 R15:
ffff88008a33c000
FS:
0000000000000000(0000) GS:
ffff88026ec00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000080 CR3:
000000000180b000 CR4:
00000000003407f0
Stack:
ffffffffa06aed74 ffff88025ffe7dc8 ffff880263b48270 ffff880263b48278
05ea88020000004a 0002ffff81014635 000000001720b2f6 ffff88026ec116c0
ffff880263b48200 0000000000010000 ffff880263b4ae00 ffff880264203cc0
Call Trace:
[<
ffffffffa06aed74>] ? brcmf_msgbuf_process_rx+0x404/0x480 [brcmfmac]
[<
ffffffff810cea60>] ? irq_finalize_oneshot.part.30+0xf0/0xf0
[<
ffffffffa06afb55>] brcmf_proto_msgbuf_rx_trigger+0x35/0xf0 [brcmfmac]
[<
ffffffffa06baf2a>] brcmf_pcie_isr_thread_v2+0x8a/0x130 [brcmfmac]
[<
ffffffff810cea80>] irq_thread_fn+0x20/0x50
[<
ffffffff810ceddf>] irq_thread+0x13f/0x170
[<
ffffffff810cebf0>] ? wake_threads_waitq+0x30/0x30
[<
ffffffff810ceca0>] ? irq_thread_dtor+0xb0/0xb0
[<
ffffffff81092a08>] kthread+0xd8/0xf0
[<
ffffffff81092930>] ? kthread_create_on_node+0x1c0/0x1c0
[<
ffffffff8156d898>] ret_from_fork+0x58/0x90
[<
ffffffff81092930>] ? kthread_create_on_node+0x1c0/0x1c0
Code: 01 83 e2 f7 88 50 01 48 83 c4 08 5b 5d f3 c3 0f 1f 80 00 00 00 00 83 e2
f7 88 50 01 c3 66 0f 1f 84 00 00 00 00 00 0f 1f
RIP [<
ffffffff8145b225>] skb_pull+0x5/0x50
RSP <
ffff88025ffe7d40>
CR2:
0000000000000080
---[ end trace
b074c0f90e7c997d ]---
[1] http://mid.gmane.org/
20150430193259.GA5630@googlemail.com
Cc: <stable@vger.kernel.org> # v3.18, v3.19, v4.0, v4.1
Reported-by: Michael Hornung <mhornung.linux@gmail.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Jonathan Corbet [Mon, 13 Apr 2015 16:27:35 +0000 (18:27 +0200)]
mac80211: Fix mac80211.h docbook comments
A couple of enums in mac80211.h became structures recently, but the
comments didn't follow suit, leading to errors like:
Error(.//include/net/mac80211.h:367): Cannot parse enum!
Documentation/DocBook/Makefile:93: recipe for target 'Documentation/DocBook/80211.xml' failed
make[1]: *** [Documentation/DocBook/80211.xml] Error 1
Makefile:1361: recipe for target 'mandocs' failed
make: *** [mandocs] Error 2
Fix the comments comments accordingly. Added a couple of other small
comment fixes while I was there to silence other recently-added docbook
warnings.
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>