Alexander Gordeev [Mon, 13 May 2013 00:57:49 +0000 (00:57 +0000)]
powerpc: Fix irq_set_affinity() return values
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
David Woodhouse [Mon, 13 May 2013 00:23:38 +0000 (00:23 +0000)]
powerpc: Provide __bswapdi2
Some versions of GCC apparently expect this to be provided by libgcc.
Updated by Mikey to fix the 32-bit version and to add the "r" prefix to register names.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
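For reference, __bswapdi2 is the libgcc helper that reverses the byte order of a 64-bit value. The powerpc version added here is written in assembly; the following is only a portable C sketch of the same computation. The names bswap32_sketch and bswap64_sketch are hypothetical, and the 64-bit value is built from two swapped 32-bit halves, much as a 32-bit implementation has to do it.

```c
#include <stdint.h>

/* Reverse the bytes of a 32-bit value with shifts and masks. */
static uint32_t bswap32_sketch(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

/*
 * Hypothetical C equivalent of __bswapdi2: byte-swap each 32-bit half
 * and exchange the halves.
 */
uint64_t bswap64_sketch(uint64_t x)
{
	uint32_t hi = (uint32_t)(x >> 32);
	uint32_t lo = (uint32_t)x;

	return ((uint64_t)bswap32_sketch(lo) << 32) | bswap32_sketch(hi);
}
```

For example, bswap64_sketch(0x0102030405060708) yields 0x0807060504030201.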
Benjamin Herrenschmidt [Tue, 14 May 2013 05:12:31 +0000 (15:12 +1000)]
powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
The current code fails to handle kexec on OPALv2. This fixes it
and adds code to improve the situation on OPALv3 where we can
query the CPU status from the firmware and decide what to do
based on that.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 14 May 2013 05:10:02 +0000 (15:10 +1000)]
powerpc/powernv: Detect OPAL v3 API version
Future firmware will support that new version.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Li Zhong [Mon, 6 May 2013 22:44:41 +0000 (22:44 +0000)]
powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
Saw this warning again, and this time from the ret_from_fork path.
It seems we could clear the back chain earlier in copy_thread(), which
would cover both paths, and also fix potential lockdep usage in
schedule_tail(), or exceptions occurring before we clear the back chain.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
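On powerpc each stack frame starts with a back-chain word pointing at the caller's frame, and stack unwinders walk that chain until they reach NULL. The userspace model below (struct frame and walk_stack are hypothetical names) illustrates why a stale back chain in a new thread's initial frame can inflate a trace: the walk runs on into leftover data until it exhausts its entry budget, whereas a cleared back chain ends it immediately.

```c
#include <stddef.h>

/* Hypothetical model of a powerpc stack frame: word 0 is the back chain. */
struct frame {
	struct frame *back_chain;
};

/*
 * Count the frames reachable from 'sp', stopping at a NULL back chain
 * or after 'max_entries' frames, whichever comes first.
 */
size_t walk_stack(const struct frame *sp, size_t max_entries)
{
	size_t n = 0;

	while (sp != NULL && n < max_entries) {
		n++;
		sp = sp->back_chain;
	}
	return n;
}
```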
Michael Ellerman [Mon, 6 May 2013 18:43:39 +0000 (18:43 +0000)]
powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
We are getting build errors with CONFIG_PROC_FS=n:
arch/powerpc/kernel/rtas_flash.c: In function 'rtas_flash_init':
arch/powerpc/kernel/rtas_flash.c:745:33: error: unused variable 'f' [-Werror=unused-variable]
But rtas_flash.c should not be built when CONFIG_PROC_FS=n, because all
it does is provide a /proc interface to the RTAS flash routines.
CONFIG_RTAS_FLASH already depends on CONFIG_RTAS_PROC, to indicate that
it depends on the RTAS proc support, but CONFIG_RTAS_PROC does not
depend on CONFIG_PROC_FS. So fix that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Robert Jennings [Tue, 7 May 2013 04:34:11 +0000 (04:34 +0000)]
powerpc: Bring all threads online prior to migration/hibernation
This patch brings online all threads which are present but not online
prior to migration/hibernation. After migration/hibernation those
threads are taken back offline.
During migration/hibernation all online CPUs must call H_JOIN, this is
required by the hypervisor. Without this patch, threads that are offline
(H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
deadlocked (all threads either JOIN'd or CEDE'd).
Cc: <stable@kernel.org>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
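The bookkeeping described above amounts to remembering exactly which present CPUs were offline, onlining them so they can make the H_JOIN call, and offlining only those afterwards. A minimal bitmask model, assuming at most 64 CPUs; struct cpu_state, online_all_present and restore_offline are hypothetical stand-ins for the kernel's cpumask and CPU hotplug machinery.

```c
#include <stdint.h>

/* Hypothetical per-system CPU state, one bit per CPU (max 64 CPUs). */
struct cpu_state {
	uint64_t present;
	uint64_t online;
};

/*
 * Bring every present-but-offline CPU online so it can join the
 * hypervisor call. Returns the mask of CPUs onlined here.
 */
uint64_t online_all_present(struct cpu_state *s)
{
	uint64_t was_offline = s->present & ~s->online;

	s->online |= was_offline;
	return was_offline;
}

/* After migration/hibernation, offline only the CPUs we onlined. */
void restore_offline(struct cpu_state *s, uint64_t onlined)
{
	s->online &= ~onlined;
}
```

Returning the mask, rather than recomputing it later, ensures that CPUs which were online before the operation stay online afterwards.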
Vasant Hegde [Tue, 7 May 2013 16:54:47 +0000 (16:54 +0000)]
powerpc/rtas_flash: Fix validate_flash buffer overflow issue
The output buffer of the ibm,validate-flash-image RTAS call contains
150-200 bytes of data on the latest systems. Presently our local output
buffer is only 64 bytes and we use sprintf to copy data from the RTAS
buffer into it. This causes a kernel oops (see the call trace below).
This patch increases the local buffer size to 256 bytes and uses
snprintf instead of sprintf to copy data from the RTAS buffer.
Kernel call trace :
-------------------
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod
Supported: Yes
NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000
REGS: c0000001b91779b0 TRAP: 0400 Tainted: G X (3.0.13-0.27-ppc64)
MSR: 8000000040009032 <EE,ME,IR,DR> CR: 44022488 XER: 20000018
TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36
GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b
GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031
GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4
GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90
GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000
GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000
GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031
GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130
NIP [4520323031333130] 0x4520323031333130
LR [4520323031333130] 0x4520323031333130
Call Trace:
[c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
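The fix follows the usual pattern for copying variable-length firmware output into a fixed local buffer: size the buffer for the worst case and bound the copy with snprintf, so oversized output is truncated instead of written past the end. A userspace sketch; format_validate_result and VALIDATE_BUF_SIZE are hypothetical names, with the 256-byte size taken from the commit message.

```c
#include <stdio.h>

#define VALIDATE_BUF_SIZE 256	/* local buffer size chosen by the fix */

/*
 * Copy the NUL-terminated firmware output into 'dst'. snprintf writes
 * at most dstlen bytes including the terminator, so long input is
 * truncated rather than overflowing. Returns the number of bytes
 * stored, excluding the terminator.
 */
size_t format_validate_result(char *dst, size_t dstlen, const char *rtas_out)
{
	int n;

	if (dstlen == 0)
		return 0;
	n = snprintf(dst, dstlen, "%s", rtas_out);
	if (n < 0)
		return 0;
	/* n is the length that would have been written; clamp it. */
	return (size_t)n < dstlen ? (size_t)n : dstlen - 1;
}
```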
Anton Blanchard [Sun, 12 May 2013 15:04:53 +0000 (15:04 +0000)]
powerpc/kexec: Fix kexec when using VMX optimised memcpy
commit b3f271e86e5a (powerpc: POWER7 optimised memcpy using VMX and
enhanced prefetch) uses VMX when it is safe to do so (i.e. not in
interrupt). It also looks at the task struct to decide if we have to
save the current task's VMX state.
kexec calls memcpy() at a point where the task struct may have been
overwritten by the new kexec segments. If it has been overwritten
then when memcpy -> enable_altivec looks up current->thread.regs->msr
we get a cryptic oops or lockup.
I also noticed we aren't initialising thread_info->cpu, which means
smp_processor_id() is broken. Fix that too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 6 May 2013 10:51:00 +0000 (10:51 +0000)]
powerpc: Fix build errors STRICT_MM_TYPECHECKS
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Sat, 11 May 2013 22:33:19 +0000 (22:33 +0000)]
powerpc/mm: Use the correct mask value when looking at pgtable address
Our page tables are 2*sizeof(pte_t)*PTRS_PER_PTE bytes, which is PTE_FRAG_SIZE.
Instead of depending on the frag size, mask with PMD_MASKED_BITS.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Scott Wood [Fri, 10 May 2013 03:09:41 +0000 (22:09 -0500)]
powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
lockdep.c has this:
/*
* So we're supposed to get called after you mask local IRQs,
* but for some reason the hardware doesn't quite think you did
* a proper job.
*/
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return;
Since irqs_disabled() is based on soft_enabled(), that (not just the
hard EE bit) needs to be 0 before we call trace_hardirqs_off.
Signed-off-by: Scott Wood <scottwood@freescale.com>
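The ordering requirement is easiest to see in a model. Below, soft_enabled and hard_ee stand in for the paca soft-enable flag and the MSR[EE] bit, and trace_hardirqs_off_model reproduces only the lockdep sanity check quoted above; all names are hypothetical. Both the hard and the soft state must say "off" before the tracer is called, and the tracer is told only about a real enabled-to-disabled transition.

```c
#include <stdbool.h>

/* Hypothetical model of per-CPU interrupt state on ppc64. */
struct irq_model {
	bool soft_enabled;	/* lazy (soft) enable flag */
	bool hard_ee;		/* MSR[EE] */
	int trace_off_calls;	/* times the tracer recorded irqs going off */
	int lockdep_warnings;	/* DEBUG_LOCKS_WARN_ON() hits */
};

/* irqs_disabled() is derived from the soft state, not the EE bit. */
static bool irqs_disabled_model(const struct irq_model *m)
{
	return !m->soft_enabled;
}

/* Mirrors the lockdep check: warn if called while irqs look enabled. */
static void trace_hardirqs_off_model(struct irq_model *m)
{
	if (!irqs_disabled_model(m)) {
		m->lockdep_warnings++;
		return;
	}
	m->trace_off_calls++;
}

void hard_irq_disable_model(struct irq_model *m)
{
	bool was_enabled = m->soft_enabled;

	m->hard_ee = false;		/* hard-disable first */
	m->soft_enabled = false;	/* then make irqs_disabled() true */
	if (was_enabled)		/* report only a real on->off change */
		trace_hardirqs_off_model(m);
}
```

Calling trace_hardirqs_off_model() before clearing soft_enabled would increment lockdep_warnings instead, which is the situation the change avoids.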
Benjamin Herrenschmidt [Fri, 10 May 2013 06:59:18 +0000 (16:59 +1000)]
powerpc/powernv: Improve kexec reliability
We add a machine_shutdown hook that frees the OPAL interrupts
(so they get masked at the source and don't fire while kexec'ing)
and which triggers an IODA reset on all the PCIe host bridges
which will have the effect of blocking all DMAs and subsequent
PCI interrupts.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Wed, 8 May 2013 04:14:26 +0000 (14:14 +1000)]
powerpc/powernv: Properly drop characters if console is closed
If the firmware returns an error such as "closed" (or hardware
error), we should drop characters.
Currently we only do that when a firmware compatible with OPAL v2
APIs is detected, in the code that calls opal_console_write_buffer_space(),
which didn't exist with OPAL v1 (or didn't work).
However, when enabling early debug consoles, the flag indicating
that v2 is supported isn't set yet, causing us, in case of errors
or closed console, to spin forever.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
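The behaviour described here amounts to treating any firmware error as "drop the characters", regardless of which OPAL version has been detected. A userspace model of such a write loop; the status codes, the example backends and console_write_model() are hypothetical and are not the OPAL interfaces.

```c
#include <stddef.h>

/* Hypothetical firmware status codes for this model. */
enum fw_status { FW_OK = 0, FW_BUSY = 1, FW_CLOSED = -1, FW_HW_ERROR = -2 };

/* Hypothetical backend: write up to len bytes, report the count in *written. */
typedef enum fw_status (*fw_write_fn)(const char *buf, size_t len,
				      size_t *written);

/*
 * Push 'len' bytes to the firmware console. FW_BUSY is retried; any
 * error (closed console, hardware error) drops the remaining bytes
 * instead of spinning. Returns the number of bytes dropped.
 */
size_t console_write_model(fw_write_fn fw, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t written = 0;
		enum fw_status rc = fw(buf + done, len - done, &written);

		if (rc == FW_BUSY)
			continue;		/* transient: try again */
		if (rc != FW_OK)
			return len - done;	/* closed or broken: drop the rest */
		done += written;
	}
	return 0;
}

/* Example backend: the firmware reports the console as closed. */
enum fw_status closed_console(const char *buf, size_t len, size_t *written)
{
	(void)buf;
	(void)len;
	*written = 0;
	return FW_CLOSED;
}

/* Example backend: accepts at most two bytes per call. */
enum fw_status slow_console(const char *buf, size_t len, size_t *written)
{
	(void)buf;
	*written = len < 2 ? len : 2;
	return FW_OK;
}
```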
Alistair Popple [Mon, 29 Apr 2013 18:07:47 +0000 (18:07 +0000)]
powerpc: Add an in memory udbg console
This patch adds a new udbg early debug console which utilises
statically defined input and output buffers stored within the kernel
BSS. It is primarily designed to assist with bring up of new hardware
which may not have a working console but which has a method of
reading/writing kernel memory.
This version incorporates comments made by Ben H (thanks!).
Changes from v1:
- Add memory barriers.
- Ensure updating of read/write positions is atomic.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
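An in-memory console of this kind can be pictured as single-producer/single-consumer ring buffers that an external tool reads and writes through memory. A minimal C11 sketch of the output side only, assuming one writer (the kernel) and one reader (the debugger). The structure layout and names are hypothetical; C11 release/acquire atomics stand in for the explicit memory barriers mentioned in the changelog. This sketch refuses writes when the ring is full, and the real console's policy may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MEMCONS_SIZE 64		/* hypothetical buffer size, power of two */

/* Hypothetical single-producer/single-consumer byte ring in the BSS. */
struct memcons_ring {
	char buf[MEMCONS_SIZE];
	atomic_size_t write_pos;	/* advanced only by the producer */
	atomic_size_t read_pos;		/* advanced only by the consumer */
};

/* Producer (kernel side): returns false if the ring is full. */
bool memcons_putc(struct memcons_ring *r, char c)
{
	size_t w = atomic_load_explicit(&r->write_pos, memory_order_relaxed);
	size_t rd = atomic_load_explicit(&r->read_pos, memory_order_acquire);

	if (w - rd == MEMCONS_SIZE)
		return false;
	r->buf[w % MEMCONS_SIZE] = c;
	/* Publish the byte before the new position becomes visible. */
	atomic_store_explicit(&r->write_pos, w + 1, memory_order_release);
	return true;
}

/* Consumer (debugger side): returns -1 if the ring is empty. */
int memcons_getc(struct memcons_ring *r)
{
	size_t rd = atomic_load_explicit(&r->read_pos, memory_order_relaxed);
	size_t w = atomic_load_explicit(&r->write_pos, memory_order_acquire);
	char c;

	if (rd == w)
		return -1;
	c = r->buf[rd % MEMCONS_SIZE];
	atomic_store_explicit(&r->read_pos, rd + 1, memory_order_release);
	return (unsigned char)c;
}
```

Each position is written by only one side, so a single atomic store per update is enough to keep the indices consistent.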
Benjamin Herrenschmidt [Mon, 6 May 2013 21:04:02 +0000 (21:04 +0000)]
powerpc: Make hard_irq_disable() do the right thing vs. irq tracing
If hard_irq_disable() is called while interrupts are already soft-disabled
(which is the most common case) all is already well.
However you can (and in some cases want to) call it while everything is
enabled (to make sure you don't get a lazy event, for example before entry
into KVM guests) and in this case we need to inform the irq tracer that
the irqs are going off.
We have to change the inline into a macro to avoid an include circular
dependency hell hole.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 05:02:40 +0000 (15:02 +1000)]
powerpc/topology: Fix spurr attribute permission
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
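The WARN_ON's come from a driver-core sanity check that flags attributes carrying write permission but no store callback. A userspace model of that check, in which struct attr_model and attr_needs_warning are hypothetical and 0222 is the user/group/other write mask, shows why removing the write bits (as here and in the spufs status fix) silences the warning.

```c
#include <stdbool.h>

#define S_IWUGO_MODEL 0222	/* write bits for user, group and other */

/* Hypothetical model of a device attribute. */
struct attr_model {
	const char *name;
	unsigned int mode;
	bool has_show;
	bool has_store;
};

/*
 * Return true if registering 'a' would trigger the warning:
 * a write bit is set but there is no store callback.
 */
bool attr_needs_warning(const struct attr_model *a)
{
	return (a->mode & S_IWUGO_MODEL) && !a->has_store;
}
```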
Benjamin Herrenschmidt [Mon, 6 May 2013 03:40:40 +0000 (13:40 +1000)]
powerpc/pci: Support per-aperture memory offset
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference between CPU memory addresses and PCI MMIO addresses.
This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.
This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
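With a single offset per host bridge, every window must satisfy pci_addr = cpu_addr - offset with the same offset. Keeping one offset per aperture lets a 64-bit window be mapped 1:1 while the 32-bit window keeps its own translation. A sketch with a hypothetical struct aperture and cpu_to_pci(); the addresses in the usage note are illustrative, not taken from any real machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MMIO aperture: CPU-side range plus its own offset. */
struct aperture {
	uint64_t cpu_start;
	uint64_t size;
	uint64_t offset;	/* cpu_addr - pci_addr for this window */
};

/*
 * Translate a CPU physical address to a PCI bus address using the
 * offset of whichever aperture contains it. Returns false if none does.
 */
bool cpu_to_pci(const struct aperture *ap, int n, uint64_t cpu_addr,
		uint64_t *pci_addr)
{
	for (int i = 0; i < n; i++) {
		if (cpu_addr >= ap[i].cpu_start &&
		    cpu_addr - ap[i].cpu_start < ap[i].size) {
			*pci_addr = cpu_addr - ap[i].offset;
			return true;
		}
	}
	return false;
}
```

For instance, a 32-bit window at CPU address 0x3fe80000000 with offset 0x3fe00000000 maps to PCI address 0x80000000, while a 64-bit window with offset 0 passes addresses through unchanged.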
Benjamin Herrenschmidt [Mon, 6 May 2013 02:03:49 +0000 (12:03 +1000)]
powerpc/cell/iommu: Improve error message for missing node
Some devices don't have a correct node ID and thus can't be
attached to an iommu.
The message displayed by the iommu code isn't very useful if
you don't have a device-tree node as it tries to print the
device-tree path but not the struct device name.
Improve this by printing the device name as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 02:02:05 +0000 (12:02 +1000)]
powerpc/cell/spufs: Fix status attribute permission
We are registering the attribute with permission 0644 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 6 May 2013 01:37:43 +0000 (11:37 +1000)]
irqdomain: Allow quiet failure mode
Some interrupt controllers refuse to map interrupts marked as
"protected" by firmware. Since we try to map everything in the
device-tree on some platforms, we end up with a lot of nasty
WARN's in the boot log for what is a normal situation on those
machines.
This defines a specific return code (-EPERM) from the host map()
callback which causes irqdomain to fail silently.
MPIC is updated to return this when hitting a protected source
printing only a single line message for diagnostic purposes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
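The convention lets the map() callback distinguish "refused on purpose" from a genuine failure. A sketch of the caller side: protected_aware_map is a hypothetical MPIC-like callback (which sources count as protected is invented for the example), and a warning counter stands in for the WARN in the irqdomain core.

```c
#include <errno.h>

/* Hypothetical host map() callback: 0 on success, -errno on failure. */
typedef int (*map_fn)(unsigned int hwirq);

/* Hypothetical MPIC-like callback: sources >= 16 are firmware-protected. */
int protected_aware_map(unsigned int hwirq)
{
	if (hwirq >= 16)
		return -EPERM;	/* protected: refuse quietly */
	if (hwirq == 3)
		return -EINVAL;	/* pretend source 3 is genuinely broken */
	return 0;
}

/*
 * Map one interrupt. -EPERM means the controller refused deliberately,
 * so fail silently; any other error is unexpected and gets a warning.
 */
int map_one(map_fn map, unsigned int hwirq, int *warnings)
{
	int ret = map(hwirq);

	if (ret != 0 && ret != -EPERM)
		(*warnings)++;
	return ret;
}
```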
Benjamin Herrenschmidt [Sat, 4 May 2013 14:24:32 +0000 (14:24 +0000)]
powerpc/pnv: Fix "compatible" property for P8 PHB
The property should be "ibm,power8-pciex", not "ibm,p8-pciex". The latter
was changed in FW because it was inconsistent with the rest of the nodes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Sat, 4 May 2013 14:22:57 +0000 (14:22 +0000)]
powerpc/pci: Don't add bogus empty resources to PHBs
When converting to use the new pci_add_resource_offset() we didn't
properly account for empty resources (0 flags) and add those bogons
to the PHBs. The result is some annoying messages in the log.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
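The fix boils down to skipping resources whose flags are zero before they are handed to the PCI core. A hypothetical model of that loop; struct res_model and add_host_resources are illustrative only, while the real code passes each surviving resource to pci_add_resource_offset().

```c
#include <stddef.h>

/* Hypothetical, simplified resource descriptor. */
struct res_model {
	unsigned long start, end;
	unsigned long flags;	/* 0 means the slot was never filled in */
};

/*
 * Collect only populated resources (non-zero flags) into 'out'.
 * Returns how many were added.
 */
size_t add_host_resources(const struct res_model *in, size_t n,
			  const struct res_model **out)
{
	size_t added = 0;

	for (size_t i = 0; i < n; i++) {
		if (!in[i].flags)
			continue;	/* empty slot: don't add a bogus resource */
		out[added++] = &in[i];
	}
	return added;
}
```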
Benjamin Herrenschmidt [Fri, 3 May 2013 17:21:00 +0000 (17:21 +0000)]
powerpc/powerpnv: Properly handle failure starting CPUs
If OPAL returns an error, propagate it upward rather than spinning for
seconds waiting for a CPU that will never show up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:49:59 +0000 (14:49 +0000)]
powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Sat, 4 May 2013 16:01:17 +0000 (16:01 +0000)]
powerpc/cputable: Advertise ISEL support on appropriate embedded processors
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:48:38 +0000 (14:48 +0000)]
powerpc/cputable: Advertise DSCR support on P7/P7+
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 3 May 2013 14:47:56 +0000 (14:47 +0000)]
powerpc/cputable: Reserve bits in HWCAP2 for new features
Also, make HTM's presence dependent on the .config option.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kleber Sacilotto de Souza [Fri, 3 May 2013 12:43:12 +0000 (12:43 +0000)]
powerpc/pseries: Perform proper max_bus_speed detection
On pseries machines the detection for max_bus_speed should be done
through an Open Firmware property. This patch adds a function to perform
this detection and a hook that installs it dynamically, only on
pseries. This is done by overriding the weak
pcibios_root_bridge_prepare function, which is called by
pci_create_root_bus().
From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian King [Fri, 3 May 2013 11:30:59 +0000 (11:30 +0000)]
powerpc/pseries: Force 32 bit MSIs for devices that require it
The following patch implements a new PAPR change which allows
the OS to force the use of 32 bit MSIs, regardless of what
the PCI capabilities indicate. This is required for some
devices that advertise support for 64 bit MSIs but don't
actually support them.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Thu, 2 May 2013 15:36:14 +0000 (15:36 +0000)]
powerpc/tm: Fix null pointer deference in flush_hash_page
Make sure that current->thread.regs exists before we dereference it in
flush_hash_page.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: John J Miller <millerjo@us.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jeremy Kerr [Wed, 1 May 2013 22:31:50 +0000 (22:31 +0000)]
powerpc/powernv: Defer OPAL exception handler registration
Currently, the OPAL exception vectors are registered before the feature
fixups are processed. This means that the now-firmware-owned vectors
will likely be overwritten by the kernel.
This change moves the exception registration code to an early initcall,
rather than at machine_init time.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 1 May 2013 20:06:33 +0000 (20:06 +0000)]
powerpc: Emulate non privileged DSCR read and write
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.
Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.
A simple test was created to verify the fix:
http://ozlabs.org/~anton/junkcode/user_dscr_test.c
Without the patch we get a SIGILL and it passes with the patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
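The two SPRs differ only in the SPR number encoded in the mfspr/mtspr instruction, so the emulation has to recognise both encodings. A decoding sketch, assuming SPR 17 for the privileged DSCR, SPR 3 for the problem-state DSCR, and the standard X-form layout (primary opcode 31, extended opcodes 339 for mfspr and 467 for mtspr). The helper names are hypothetical and the kernel's own emulation may match instructions differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_31		31u
#define XO_MFSPR	339u
#define XO_MTSPR	467u
#define SPRN_DSCR	17u	/* privileged DSCR */
#define SPRN_DSCR_USR	3u	/* problem-state (user) DSCR */

/* The 10-bit SPR field is stored with its two 5-bit halves swapped. */
static unsigned int decode_spr(uint32_t insn)
{
	unsigned int f = (insn >> 11) & 0x3ff;

	return ((f & 0x1f) << 5) | (f >> 5);
}

/*
 * Return true if 'insn' is mfspr/mtspr on either DSCR SPR, setting
 * *is_write for mtspr and *rt to the GPR operand.
 */
bool is_dscr_access(uint32_t insn, bool *is_write, unsigned int *rt)
{
	unsigned int xo = (insn >> 1) & 0x3ff;
	unsigned int spr;

	if ((insn >> 26) != OP_31 || (xo != XO_MFSPR && xo != XO_MTSPR))
		return false;
	spr = decode_spr(insn);
	if (spr != SPRN_DSCR && spr != SPRN_DSCR_USR)
		return false;
	*is_write = (xo == XO_MTSPR);
	*rt = (insn >> 21) & 0x1f;
	return true;
}
```

Under these assumptions, "mfspr r0,3" encodes as 0x7c0302a6 and "mfspr r0,17" as 0x7c1102a6; an emulator that matches only the second encoding would leave the first to raise SIGILL.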
Linus Torvalds [Sun, 5 May 2013 21:47:31 +0000 (14:47 -0700)]
Merge tag 'kvm-3.10-1' of git://git./virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
David Howells [Sat, 4 May 2013 07:48:27 +0000 (08:48 +0100)]
Give the OID registry file module info to avoid kernel tainting
Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.
Reported-by: Dros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: stable@vger.kernel.org
cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 5 May 2013 20:23:27 +0000 (13:23 -0700)]
Merge branch 'timers-nohz-for-linus' of git://git./linux/kernel/git/tip/tip
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like you to give input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to a crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10, like the -rt tree this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
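The three tick modes described in the pull request differ in when the periodic tick may be stopped or slowed. A decision-table sketch follows; the enum, tick_rate() and the returned rates are a hypothetical model. It assumes every CPU is eligible for full dynticks and ignores pending timers, RCU and the other conditions the real tick code checks.

```c
/* Hypothetical model of the three tick configurations. */
enum tick_mode { HZ_PERIODIC, NO_HZ_IDLE, NO_HZ_FULL };

/*
 * Ticks per second a CPU receives in this simplified model, given the
 * configured HZ and the number of runnable tasks on the CPU.
 */
unsigned int tick_rate(enum tick_mode mode, unsigned int hz,
		       unsigned int nr_running)
{
	switch (mode) {
	case HZ_PERIODIC:
		return hz;			/* always ticking */
	case NO_HZ_IDLE:
		return nr_running == 0 ? 0 : hz; /* stop only when idle */
	case NO_HZ_FULL:
		if (nr_running == 0)
			return 0;		/* idle: tick stopped */
		if (nr_running == 1)
			return 1;		/* single task: slowed to 1 Hz */
		return hz;			/* several tasks: normal tick */
	}
	return hz;
}
```

In this model a single busy task at HZ=1000 sees 1000 ticks per second under HZ_PERIODIC and NO_HZ_IDLE, but only 1 under NO_HZ_FULL.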
Linus Torvalds [Sun, 5 May 2013 18:37:16 +0000 (11:37 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes plus a small hw-enablement patch for Intel IB model 58
uncore events"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
perf/x86/intel/lbr: Fix LBR filter
perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
perf: Fix vmalloc ring buffer pages handling
perf/x86/intel: Fix unintended variable name reuse
perf/x86/intel: Add support for IvyBridge model 58 Uncore
perf/x86/intel: Fix typo in perf_event_intel_uncore.c
x86: Eliminate irq_mis_count counted in arch_irq_stat
Linus Torvalds [Sun, 5 May 2013 17:58:06 +0000 (10:58 -0700)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"We get rid of the general module prefix confusion with a binary config
option, fix a remove/insert race which Never Happens, and (my
favorite) handle the case when we have too many modules for a single
commandline. Seriously, the kernel is full, please go away!"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
X.509: Support parse long form of length octets in Authority Key Identifier
module: don't unlink the module until we've removed all exposure.
kernel: kallsyms: memory override issue, need check destination buffer length
MODSIGN: do not send garbage to stderr when enabling modules signature
modpost: handle huge numbers of modules.
modpost: add -T option to read module names from file/stdin.
modpost: minor cleanup.
genksyms: pass symbol-prefix instead of arch
module: fix symbol versioning with symbol prefixes
CONFIG_SYMBOL_PREFIX: cleanup.
Linus Torvalds [Sun, 5 May 2013 17:35:26 +0000 (10:35 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull single_open() leak fixes from Al Viro:
"A bunch of fixes for a moderately common class of bugs: file with
single_open() done by its ->open() and seq_release as its ->release().
That leaks; fortunately, it's not _too_ common (either people manage
to RTFM that says "When using single_open(), the programmer should use
single_release() instead of seq_release() in the file_operations
structure to avoid a memory leak", or they just copy a correct
instance), but grepping through the tree has caught quite a pile.
All of that is, AFAICS, -stable fodder, for as far as the patches
apply. I tried to carve it up into reasonably-sized pieces (more or
less "comes from the same tree")"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rcutrace: single_open() leaks
gadget: single_open() leaks
staging: single_open() leaks
megaraid: single_open() leak
wireless: single_open() leaks
input: single_open() leak
rtc: single_open() leaks
ds1620: single_open() leak
sh: single_open() leaks
parisc: single_open() leaks
mips: single_open() leaks
ia64: single_open() leaks
h8300: single_open() leaks
cris: single_open() leaks
arm: single_open() leaks
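The leak follows from how the helpers pair up. single_open() allocates a small seq_operations structure in addition to the seq_file that seq_open() creates, and only single_release() frees it; seq_release() frees just the seq_file. The userspace model below counts live allocations to make that visible. The *_model functions are hypothetical simplifications, not the kernel implementations.

```c
#include <stdlib.h>

static int live_allocs;	/* number of outstanding model allocations */

static void *alloc_model(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void free_model(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct seq_ops_model { int (*show)(void *m, void *v); };
struct seq_file_model { const struct seq_ops_model *op; };

/* Like single_open(): allocate a private ops struct, then the seq_file. */
struct seq_file_model *single_open_model(int (*show)(void *, void *))
{
	struct seq_ops_model *op = alloc_model(sizeof(*op));
	struct seq_file_model *m;

	if (!op)
		return NULL;
	op->show = show;
	m = alloc_model(sizeof(*m));
	if (!m) {
		free_model(op);
		return NULL;
	}
	m->op = op;
	return m;
}

/* Like seq_release(): frees only the seq_file itself. */
void seq_release_model(struct seq_file_model *m)
{
	free_model(m);
}

/* Like single_release(): also frees the ops struct single_open allocated. */
void single_release_model(struct seq_file_model *m)
{
	const struct seq_ops_model *op = m->op;

	seq_release_model(m);
	free_model((void *)op);
}

int live_allocations(void) { return live_allocs; }
```

Pairing single_open_model() with seq_release_model() leaves one allocation behind on every open/close cycle, which is the class of bug these patches fix.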
Linus Torvalds [Sun, 5 May 2013 17:13:44 +0000 (10:13 -0700)]
Merge branch 'ipc-cleanups'
Merge ipc fixes and cleanups from my IPC branch.
The ipc locking has always been pretty ugly, and the scalability fixes
to some degree made it even less readable. We had two cases of double
unlocks in error paths due to this (one rcu read unlock, one semaphore
unlock), and this fixes the bugs I found while trying to clean things up
a bit so that we are less likely to have more.
* ipc-cleanups:
ipc: simplify rcu_read_lock() in semctl_nolock()
ipc: simplify semtimedop/semctl_main() common error path handling
ipc: move sem_obtain_lock() rcu locking into the only caller
ipc: fix double sem unlock in semctl error path
ipc: move the rcu_read_lock() from sem_lock_and_putref() into callers
ipc: sem_putref() does not need the semaphore lock any more
ipc: move rcu_read_unlock() out of sem_unlock() and into callers
Scott Wood [Wed, 1 May 2013 01:00:45 +0000 (20:00 -0500)]
kvm: Add compat_ioctl for device control API
This API shouldn't have 32/64-bit issues, but VFS assumes it does
unless told otherwise.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Peter Zijlstra [Fri, 3 May 2013 12:11:25 +0000 (14:11 +0200)]
perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
We should always have proper privileges when requesting kernel
data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
Al Viro [Sun, 5 May 2013 04:16:35 +0000 (00:16 -0400)]
rcutrace: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:16:11 +0000 (00:16 -0400)]
gadget: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:15:43 +0000 (00:15 -0400)]
staging: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:15:15 +0000 (00:15 -0400)]
megaraid: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:13:20 +0000 (00:13 -0400)]
wireless: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:12:56 +0000 (00:12 -0400)]
input: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:12:29 +0000 (00:12 -0400)]
rtc: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:11:29 +0000 (00:11 -0400)]
ds1620: single_open() leak
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:11:01 +0000 (00:11 -0400)]
sh: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:44 +0000 (00:09 -0400)]
parisc: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:30 +0000 (00:09 -0400)]
mips: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:09:04 +0000 (00:09 -0400)]
ia64: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:08:26 +0000 (00:08 -0400)]
h8300: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:07:52 +0000 (00:07 -0400)]
cris: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 May 2013 04:07:22 +0000 (00:07 -0400)]
arm: single_open() leaks
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
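The leak pattern behind this series: single_open() allocates a private struct seq_operations, and only single_release() frees it. A file opened with single_open() but released with plain seq_release() leaks that allocation. The userspace model below uses hypothetical model_* names and counts live allocations so the leak is visible.

```c
#include <stdlib.h>

/* Number of live allocations, so a leak is observable. */
static int live_allocs;

static void *model_alloc(size_t n)
{
	void *p = malloc(n);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct model_seq_ops { int (*show)(void); };
struct model_seq_file { struct model_seq_ops *op; };

/* Like single_open(): allocates a private ops struct plus the seq_file. */
static struct model_seq_file *model_single_open(int (*show)(void))
{
	struct model_seq_ops *op = model_alloc(sizeof(*op));
	struct model_seq_file *m;

	if (!op)
		return NULL;
	m = model_alloc(sizeof(*m));
	if (!m) {
		model_free(op);
		return NULL;
	}
	op->show = show;
	m->op = op;
	return m;
}

/* Like seq_release(): frees only the seq_file, so m->op leaks. */
static void model_seq_release(struct model_seq_file *m)
{
	model_free(m);
}

/* Like single_release(): frees the ops struct, then the seq_file. */
static void model_single_release(struct model_seq_file *m)
{
	model_free(m->op);
	model_seq_release(m);
}

static int show_nothing(void) { return 0; }
```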
Linus Torvalds [Sun, 5 May 2013 03:10:04 +0000 (20:10 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Several routines do not use netdev_features_t to hold such bitmasks,
fixes from Patrick McHardy and Bjørn Mork.
2) Update cpsw IRQ software state and the actual HW irq enabling in the
correct order. From Mugunthan V N.
3) When sending tipc packets to multiple bearers, we have to make
copies of the SKB rather than just giving the original SKB directly.
Fix from Gerlando Falauto.
4) Fix race with bridging topology change timer, from Stephen
Hemminger.
5) Fix TCPv6 segmentation handling in GRE and VXLAN, from Pravin B
Shelar.
6) Endian bug in USB pegasus driver, from Dan Carpenter.
7) Fix crashes on MTU reduction in USB asix driver, from Holger
Eitzenberger.
8) Don't allow the kernel to BUG() just because the user puts some crap
in an AF_PACKET mmap() ring descriptor. Fix from Daniel Borkmann.
9) Don't use variable sized arrays on the stack in xen-netback, from
Wei Liu.
10) Fix stats reporting and an unbalanced napi_disable() in be2net
driver. From Somnath Kotur and Ajit Khaparde.
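Item 1 in a nutshell: netdev_features_t is a 64-bit type, so holding features in a 32-bit variable silently drops every bit above 31. A small sketch follows. The feature bit is a hypothetical value chosen above bit 31, and the cast makes explicit the truncation that the buggy code did implicitly.

```c
#include <stdint.h>

typedef uint64_t netdev_features_t;

/* Hypothetical feature flag living above bit 31. */
#define EXAMPLE_HIGH_FEATURE ((netdev_features_t)1 << 40)

/* Buggy pattern: a 32-bit local loses bits 32..63. */
static int has_feature_u32(netdev_features_t features, netdev_features_t bit)
{
	uint32_t f = (uint32_t)features;	/* truncation happens here */
	return (f & bit) != 0;
}

/* Fixed pattern: keep the full netdev_features_t width. */
static int has_feature(netdev_features_t features, netdev_features_t bit)
{
	netdev_features_t f = features;
	return (f & bit) != 0;
}
```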
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
cxgb4: fix error recovery when t4_fw_hello returns a positive value
sky2: Fix crash on receiving VLAN frames
packet: tpacket_v3: do not trigger bug() on wrong header status
asix: fix BUG in receive path when lowering MTU
net: qmi_wwan: Add Telewell TW-LTE 4G
usbnet: pegasus: endian bug in write_mii_word()
vxlan: Fix TCPv6 segmentation.
gre: Fix GREv4 TCPv6 segmentation.
bridge: fix race with topology change timer
tipc: pskb_copy() buffers when sending on more than one bearer
tipc: tipc_bcbearer_send(): simplify bearer selection
tipc: cosmetic: clean up comments and break a long line
drivers: net: cpsw: irq not disabled in cpsw isr in particular sequence
xen-netback: better names for thresholds
xen-netback: avoid allocating variable size array on stack
xen-netback: remove redundant parameter in netbk_count_requests
be2net: Fix to fail probe if MSI-X enable fails for a VF
be2net: avoid napi_disable() when it has not been enabled
be2net: Fix firmware download for Lancer
be2net: Fix to receive Multicast Packets when Promiscuous mode is enabled on certain devices
...
Linus Torvalds [Sun, 5 May 2013 03:08:49 +0000 (20:08 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:
1) Hibernation support, as well as removal of excess interrupt
twiddling in MMU context allocation on sparc64 from Kirill Tkhai.
2) Kill references to __ARCH_WANT_UNLOCKED_CTXSW.
3) Sparc32 LEON bug fixes from Daniel Hellstrom and Andreas Larsson.
4) Provide cmpxchg64(), from Geert Uytterhoeven.
5) Device refcount and registry bug fixes from Federico Vaga and Wei
Yongjun.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
serial: sunsu: add missing platform_driver_unregister() when module exit
sparc32, leon: Do not overwrite previously set irq flow handlers
sparc/kernel/vio.c: add put_device() after device_find_child()
sparc64: Do not save/restore interrupts in get_new_mmu_context()
sparc: Consistently use 'wr' and 'rd' instructions for ASRs.
sparc64: Kill __ARCH_WANT_UNLOCKED_CTXSW
sparc64: Provide cmpxchg64()
sparc64: Do not change num_physpages during initmem freeing
sparc64: Hibernation support
sparc,leon: updated GRPCI2 config name
sparc,leon: support for GRPCI1 PCI host bridge controller
sparc32,leon: add support for PCI busn resource for GRPCI2
David S. Miller [Sun, 5 May 2013 01:34:13 +0000 (18:34 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Merge sparc bug fixes that didn't make it into v3.9 into
sparc-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 27 Apr 2013 00:13:16 +0000 (00:13 +0000)]
serial: sunsu: add missing platform_driver_unregister() when module exit
The platform driver is registered at module init, so it also needs
to be unregistered at module exit.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Larsson [Sun, 21 Apr 2013 21:23:06 +0000 (21:23 +0000)]
sparc32, leon: Do not overwrite previously set irq flow handlers
This is needed because when scan_of_devices finds the GAISLER_GPTIMER
core that corresponds to the SMP "ticker" timer, the previously set
proper irq flow handler gets overwritten with an incorrect one. This
leads to very flaky timer interrupt handling on some hardware. Proper
updates to handlers can still be done using leon_update_virq_handling.
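A hypothetical model of the rule "do not overwrite a handler that is already set". The descriptor and function names are illustrative only and do not claim to mirror the LEON code. The generic scan installs a default handler only on an unconfigured IRQ, while an explicit update call, in the spirit of leon_update_virq_handling(), can still replace it.

```c
#include <stddef.h>

typedef void (*flow_handler_t)(unsigned int irq);

/* Hypothetical, trimmed-down per-IRQ descriptor. */
struct irq_desc_model {
	flow_handler_t handle_irq;
};

static void handle_percpu_model(unsigned int irq) { (void)irq; }
static void handle_simple_model(unsigned int irq) { (void)irq; }

/* Generic device scan: only fill in a handler if none is set yet,
 * so one chosen earlier (e.g. for the SMP ticker) survives. */
static void build_device_irq(struct irq_desc_model *desc, flow_handler_t dflt)
{
	if (desc->handle_irq == NULL)
		desc->handle_irq = dflt;
}

/* Deliberate updates go through an explicit call. */
static void update_irq_handling(struct irq_desc_model *desc, flow_handler_t h)
{
	desc->handle_irq = h;
}
```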
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Federico Vaga [Mon, 15 Apr 2013 04:42:52 +0000 (04:42 +0000)]
sparc/kernel/vio.c: add put_device() after device_find_child()
The vio_remove() function uses device_find_child() but it does not drop
the reference of the retrieved child.
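The refcounting contract at stake: device_find_child() returns the matching child with an extra reference held, and the caller must drop it with put_device() when done. Below is a userspace sketch with a plain counter. The dev_model type and helpers are hypothetical stand-ins.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device with a plain reference count. */
struct dev_model {
	const char *name;
	int refcount;
};

static struct dev_model *get_dev(struct dev_model *d) { d->refcount++; return d; }
static void put_dev(struct dev_model *d) { d->refcount--; }

/* Like device_find_child(): a match is returned with a reference held. */
static struct dev_model *find_child(struct dev_model *children, size_t n,
				    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(children[i].name, name) == 0)
			return get_dev(&children[i]);
	return NULL;
}

/* Caller pattern: drop the reference once the child is no longer needed. */
static int remove_child(struct dev_model *children, size_t n, const char *name)
{
	struct dev_model *child = find_child(children, n, name);

	if (!child)
		return -1;
	/* ... unregister or otherwise use child here ... */
	put_dev(child);		/* balances the get in find_child() */
	return 0;
}
```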
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 4 May 2013 18:04:29 +0000 (11:04 -0700)]
ipc: simplify rcu_read_lock() in semctl_nolock()
This trivially combines the two rcu_read_lock() calls on both sides of an
if-statement into a single one in front of the if-statement.
Split out as an independent cleanup from the previous commit.
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 May 2013 18:04:29 +0000 (11:04 -0700)]
ipc: simplify semtimedop/semctl_main() common error path handling
With various straight RCU lock/unlock movements, one common exit path
pattern had become
rcu_read_unlock();
goto out_wakeup;
and in fact there were no cases where we wanted to exit to out_wakeup
_without_ releasing the RCU read lock.
So replace that pattern with "goto out_rcu_wakeup", and remove the old
out_wakeup.
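A sketch of the resulting exit-path shape, using hypothetical stand-ins that count RCU read-side nesting so an imbalance would show up. Every early exit jumps to one label that drops the RCU read lock and then does the wakeup, instead of repeating the unlock before each goto.

```c
/* Hypothetical stand-ins that track read-side nesting depth. */
static int rcu_depth;
static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { rcu_depth--; }

static int wakeups;
static void wake_up_model(void) { wakeups++; }

/* One shared exit: the unlock lives under the label, not before each goto. */
static int op_model(int fail_early)
{
	int err = 0;

	rcu_read_lock_model();
	if (fail_early) {
		err = -1;
		goto out_rcu_wakeup;
	}
	/* ... main work under the RCU read lock ... */

out_rcu_wakeup:
	rcu_read_unlock_model();
	wake_up_model();
	return err;
}
```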
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 May 2013 17:47:57 +0000 (10:47 -0700)]
ipc: move sem_obtain_lock() rcu locking into the only caller
sem_obtain_lock() was another of those functions that returned with the
RCU lock held for reading in the success case. Move the RCU locking to
the caller (semtimedop()), making it more obvious. We already did RCU
locking elsewhere in that function.
Side note: why does semtimedop() re-do the semaphore lookup after the
sleep, rather than just getting a reference to the semaphore it already
looked up originally?
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 May 2013 17:25:11 +0000 (10:25 -0700)]
ipc: fix double sem unlock in semctl error path
Fix another ipc locking buglet introduced by the scalability patches:
when semctl_down() was changed to delay the semaphore locking, one error
path for security_sem_semctl() went through the semaphore unlock logic
even though the semaphore had never been locked.
Introduced by commit 16df3674efe3 ("ipc,sem: do not hold ipc lock more
than necessary").
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 May 2013 17:13:40 +0000 (10:13 -0700)]
ipc: move the rcu_read_lock() from sem_lock_and_putref() into callers
This is another ipc semaphore locking cleanup, trying to make the
locking more straightforward. We move the rcu read locking into the
callers of sem_lock_and_putref(), which in general means that we now
mostly do the rcu_read_lock() and rcu_read_unlock() in the same
function.
Mostly. We still have the ipc_addid/newary/freeary mess, and things
like ipcctl_pre_down_nolock().
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 May 2013 20:45:17 +0000 (13:45 -0700)]
Merge tag 'mmc-updates-for-3.10-rc1' of git://git./linux/kernel/git/cjb/mmc
Pull MMC update from Chris Ball:
"MMC highlights for 3.10:
Core:
- Introduce MMC_CAP2_NO_PRESCAN_POWERUP to allow skipping
mmc_power_up() at boot/initialization time if it's already
happened, for performance (faster boot time) reasons.
- Fix a bit width test failure that resulted in old eMMC cards being
put into 1-bit mode when 4-bit mode was available.
- Expose fwrev/hwrev for MMCv4 parts.
- Improve card removal logic in the case where the card's removed
slowly; we were missing card removal events if the card retained
contact with the slot pads for long enough to reply to a CMD13
while being removed.
Drivers:
- davinci_mmc: Support using PIO instead of DMA.
- dw_mmc: Add support for Exynos4412.
- mxcmmc: DT support, use slot-gpio API.
- mxs-mmc: Add broken-cd/cd-inverted/non-removable DT property
support.
- sdhci-sirf: New sdhci-pltfm driver for CSR SiRF SoCs:
SiRFprimaII: unicore ARM Cortex-A9
SiRFatlas6: unicore ARM Cortex-A9
SiRFmarco: dual core ARM Cortex-A9 SMP
- sdhci-tegra: Add support for Tegra114 platforms, use
mmc_of_parse()"
* tag 'mmc-updates-for-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (66 commits)
mmc: sdhci-tegra: fix MODULE_DEVICE_TABLE
mmc: core: fix init controller performance regression, updated patch
mmc: mxcmmc: enable DMA support on mpc512x
mmc: mxcmmc: constify mxcmci_devtype
mmc: mxcmmc: use slot-gpio API for write-protect detection
mmc: mxcmmc: add mpc512x SDHC support
mmc: mxcmmc: fix race conditions for host->req and host->data access
mmc: mxcmmc: DT support
mmc: dw_mmc: let device core setup the default pin configuration
mmc: mxs-mmc: add broken-cd property
mmc: mxs-mmc: add non-removable property
mmc: mxs-mmc: add cd-inverted property
mmc: core: call pm_runtime_put_noidle in pm_runtime_get_sync failed case
mmc: mxcmmc: Fix bug when card is present during boot
mmc: core: fix performance regression initializing MMC host controllers
Revert "mmc: core: wait while adding MMC host to ensure root mounts successfully"
mmc: atmel-mci: pio hang on block errors
mmc: core: Fix bit width test failing on old eMMC cards
mmc: dw_mmc: Use pr_info instead of printk
mmc: dw_mmc: Check return value of regulator_enable
...
Linus Torvalds [Sat, 4 May 2013 20:44:38 +0000 (13:44 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull hwmon update from Jean Delvare:
"Only lm75 driver updates this time"
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (lm75) Add support for the Dallas/Maxim DS7505
hwmon: (lm75) Tune resolution and sample time per chip
hwmon: (lm75) Prepare to support per-chip resolution and sample time
hwmon: (lm75) Per-chip configuration register initialization
Linus Torvalds [Sat, 4 May 2013 20:29:38 +0000 (13:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull second round of VFS updates from Al Viro:
"Assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xtensa simdisk: fix braino in "xtensa simdisk: switch to proc_create_data()"
hostfs: use kmalloc instead of kzalloc
hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h>
hostfs: remove "will unlock" comment
vfs: use list_move instead of list_del/list_add
proc_devtree: Replace include linux/module.h with linux/export.h
create_mnt_ns: unidiomatic use of list_add()
fs: remove dentry_lru_prune()
Removed unused typedef to avoid "unused local typedef" warnings.
kill fs/read_write.h
fs: Fix hang with BSD accounting on frozen filesystem
sun3_scsi: add ->show_info()
nubus: Kill nubus_proc_detach_device()
more mode_t whack-a-mole...
do_coredump(): don't wait for thaw if coredump has already been interrupted
do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks")
Al Viro [Sat, 4 May 2013 20:00:50 +0000 (16:00 -0400)]
xtensa simdisk: fix braino in "xtensa simdisk: switch to proc_create_data()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
James Hogan [Wed, 27 Mar 2013 10:47:14 +0000 (10:47 +0000)]
hostfs: use kmalloc instead of kzalloc
The inode info structure is zeroed at allocation with kzalloc, and then
all but one of the fields (including the largest, vfs_inode) are
initialised explicitly. Switch to using kmalloc and initialise the
remaining field too.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
James Hogan [Wed, 27 Mar 2013 10:47:13 +0000 (10:47 +0000)]
hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h>
Move HOSTFS_SUPER_MAGIC to <linux/magic.h> to be with its magical
friends from other file systems.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
James Hogan [Wed, 27 Mar 2013 10:47:12 +0000 (10:47 +0000)]
hostfs: remove "will unlock" comment
A "will unlock" comment was added to hostfs, along with a spinlock, in
commit e9193059b1b3733695d5b80e667778311695aa73 ("hostfs: fix races in
dentry_name() and inode_name()").
But the spinlock was subsequently removed in commit
ec2447c278ee973d35f38e53ca16ba7f965ae33d ("hostfs: simplify locking").
Since the comment is no longer applicable, remove it.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Wei Yongjun [Mon, 11 Mar 2013 16:10:50 +0000 (00:10 +0800)]
vfs: use list_move instead of list_del/list_add
Use list_move() instead of list_del() + list_add().
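A minimal userspace re-implementation of the kernel's circular doubly linked list, following the include/linux/list.h semantics (without the pointer poisoning that list_del() adds). It shows that list_move() is the one-call form of unlinking an entry and re-adding it after another head.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void __list_add(struct list_head *item, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
}

/* Insert 'item' right after 'head'. */
static void list_add(struct list_head *item, struct list_head *head)
{
	__list_add(item, head, head->next);
}

static void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/* Unlink 'entry' from its current list and add it after 'head'. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	__list_del(entry->prev, entry->next);
	list_add(entry, head);
}
```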
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 4 May 2013 19:34:30 +0000 (12:34 -0700)]
Merge tag 'boards-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC board specific changes (part 1) from Olof Johansson:
"These changes are all for board specific files. These used to make up
a large portion of the ARM changes in the past, but as we are
generalizing the support and moving to device tree probing, this has
gotten significantly smaller.
The only platform actually adding new code here at the moment is
Renesas shmobile, as they are still busy converting their code to
device tree and have not come far enough to not need it."
* tag 'boards-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
ARM: msm: USB_MSM_OTG needs USB_PHY
ARM: davinci: da850 evm: fix const qualifier placement
ARM: davinci: da850 board: add remoteproc support
ARM: pxa: move debug uart code
ARM: pxa: select PXA935 on saar & tavorevb
ARM: mmp: add more compatible names in gpio driver
ARM: pxa: move PXA_GPIO_TO_IRQ macro
ARM: pxa: remove cpu_is_xxx in gpio driver
ARM: Kirkwood: update Network Space Mini v2 description
ARM: Kirkwood: DT board setup for CloudBox
ARM: Kirkwood: sort board entries by ASCII-code order
ARM: OMAP: board-4430sdp: Provide regulator to pwm-backlight
ARM: OMAP: zoom: Use pwm stack for lcd and keyboard backlight
ARM: OMAP2+: omap2plus_defconfig: Add support for BMP085 pressure sensor
omap2+: Remove useless Makefile line
omap2+: Remove useless Makefile line
ARM: OMAP: RX-51: add missing regulator supply definitions for lis3lv02d
ARM: OMAP1: fix omap_udc registration
ARM: davinci: use IS_ENABLED macro
ARM: kirkwood: add MACH_GURUPLUG_DT to defconfig
...
Linus Torvalds [Sat, 4 May 2013 19:33:36 +0000 (12:33 -0700)]
Merge tag 'firmware-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM platform specific firmware interfaces from Olof Johansson:
"Two platforms, bcm and exynos, have their own firmware interfaces using
the "secure monitor call"; this adds support for those.
We had originally planned to have a third set of patches in here,
which would extend support for the existing generic "psci" call that
is used on multiple platforms as well as Xen and KVM guests, but that
ended up getting dropped because the patches were not ready in time."
* tag 'firmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: bcm: mark bcm_kona_smc_init as __init
ARM: bcm281xx: Add DT support for SMC handler
ARM: bcm281xx: Add L2 cache enable code
ARM: EXYNOS: Add secure firmware support to secondary CPU bring-up
ARM: EXYNOS: Add IO mapping for non-secure SYSRAM.
ARM: EXYNOS: Add support for Exynos secure firmware
ARM: EXYNOS: Add support for secure monitor calls
ARM: Add interface for registering and calling firmware-specific operations
Linus Torvalds [Sat, 4 May 2013 19:32:41 +0000 (12:32 -0700)]
Merge tag 'renesas-pinctrl-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC pinctrl changes for Renesas from Olof Johansson:
"This is yet another driver change, which is split out just because of
its size. As already in 3.9, a lot of changes are going on here, as
the shmobile platform gets converted from its own pin control API to
the generic drivers/pinctrl subsystem.
Based on agreements with Paul Mundt, we are merging the sh-arch-side
changes here as well"
* tag 'renesas-pinctrl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (142 commits)
ARM: shmobile: r8a7779: Remove INTC function GPIOs
ARM: shmobile: r8a7779: Remove LBSC function GPIOs
ARM: shmobile: r8a7779: Remove USB function GPIOs
ARM: shmobile: r8a7779: Remove HSPI function GPIOs
ARM: shmobile: r8a7779: Remove SCIF function GPIOs
ARM: shmobile: r8a7779: Remove SDHI and MMCIF function GPIOs
ARM: shmobile: r8a7779: Remove DU function GPIOs
ARM: shmobile: r8a7779: Remove DU1_DOTCLKOUT1 GPIO
ARM: shmobile: r8a7740: Remove SDHI and MMCIF function GPIOs
ARM: shmobile: r8a7740: Remove LCD0 and LCD1 function GPIOs
ARM: shmobile: sh73a0: Remove IrDA function GPIOs
ARM: shmobile: sh73a0: Remove USB function GPIOs
ARM: shmobile: sh73a0: Remove BSC function GPIOs
ARM: shmobile: sh73a0: Remove KEYSC function GPIOs
ARM: shmobile: sh73a0: Remove pull-up function GPIOS
ARM: shmobile: sh73a0: Remove FSI function GPIOs
ARM: shmobile: sh73a0: Remove I2C function GPIOs
ARM: shmobile: sh73a0: Remove SCIFA and SCIFB function GPIOs
ARM: shmobile: sh73a0: Remove LCDC and LCDC2 function GPIOs
ARM: shmobile: sh7372: Remove SDHI and MMCIF function GPIOs
...
Linus Torvalds [Sat, 4 May 2013 19:31:18 +0000 (12:31 -0700)]
Merge tag 'drivers-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC driver changes from Olof Johansson:
"This is a rather large set of patches for device drivers that for one
reason or another the subsystem maintainer preferred to get merged
through the arm-soc tree. There are both new drivers as well as
existing drivers that are getting converted from platform-specific
code into standalone drivers using the appropriate subsystem specific
interfaces.
In particular, we can now have pinctrl, clk, clksource and irqchip
drivers in one file per driver, without the need to call into platform
specific interfaces, or to get called from platform specific code, as
long as all information about the hardware is provided through a
device tree.
Most of the drivers we touch this time are for clocksource. Since now
most of them are part of drivers/clocksource, I expect that we won't
have to touch these again from arm-soc and can let the clocksource
maintainers take care of these in the future.
Another larger part of this series is specific to the exynos platform,
which is seeing some significant effort in upstreaming and
modernization of its device drivers this time around, which
unfortunately is also the cause for the churn and a lot of the merge
conflicts.
There is one new subsystem that gets merged as part of this series:
the reset controller interface, which is a very simple interface for
taking devices on the SoC out of reset or back into reset. Patches to
use this interface on i.MX follow later in this merge window, and we
are going to have other platforms (at least tegra and sirf) get
converted in 3.11. This will let us get rid of platform specific
callbacks in a number of platform independent device drivers."
* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (256 commits)
irqchip: s3c24xx: add missing __init annotations
ARM: dts: Disable the RTC by default on exynos5
clk: exynos5250: Fix parent clock for sclk_mmc{0,1,2,3}
ARM: exynos: restore mach/regs-clock.h for exynos5
clocksource: exynos_mct: fix build error on non-DT
pinctrl: vt8500: wmt: Fix checking return value of pinctrl_register()
irqchip: vt8500: Convert arch-vt8500 to new irqchip infrastructure
reset: NULL deref on allocation failure
reset: Add reset controller API
dt: describe base reset signal binding
ARM: EXYNOS: Add arm-pmu DT binding for exynos421x
ARM: EXYNOS: Add arm-pmu DT binding for exynos5250
ARM: EXYNOS: Enable PMUs for exynos4
irqchip: exynos-combiner: Correct combined IRQs for exynos4
irqchip: exynos-combiner: Add set_irq_affinity function for combiner_irq
ARM: EXYNOS: fix compilation error introduced due to common clock migration
clk: exynos5250: Fix divider values for sclk_mmc{0,1,2,3}
clk: exynos4: export clocks required for fimc-is
clk: samsung: Fix compilation error
clk: tegra: fix enum tegra114_clk to match binding
...
Syam Sidhardhan [Thu, 14 Feb 2013 20:54:32 +0000 (02:24 +0530)]
proc_devtree: Replace include linux/module.h with linux/export.h
Since it uses only the THIS_MODULE macro, <linux/export.h> is the
right header to include here.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 May 2013 19:18:53 +0000 (15:18 -0400)]
create_mnt_ns: unidiomatic use of list_add()
While list_add(A, B) and list_add(B, A) are equivalent when both A and B
are guaranteed to be empty, the usual idiom is list_add(what, where),
not the other way round... Not a bug per se, but only by accident and
it makes RTFS harder for no good reason.
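Why the two calls happen to coincide: list_add(what, where) inserts `what` right after `where`. When both heads start out empty, either order produces the same two-node ring. The sketch below re-implements just enough of the kernel's list_add() semantics in userspace to check that.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* list_add(what, where): link 'what' in directly after 'where'. */
static void list_add(struct list_head *what, struct list_head *where)
{
	struct list_head *next = where->next;

	next->prev = what;
	what->next = next;
	what->prev = where;
	where->next = what;
}
```

With two empty heads A and B, both orders leave A.next == A.prev == &B and B.next == B.prev == &A. The idiomatic order simply states the intent.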
Spotted-by: Rajat Sharma <fs.rajat@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Yan, Zheng [Mon, 15 Apr 2013 06:13:21 +0000 (14:13 +0800)]
fs: remove dentry_lru_prune()
When pruning a dentry, its ancestor dentry can also be pruned. But
the ancestor dentry does not go through dput(), so it does not get
put on the dentry LRU. Hence associating d_prune with removing the
dentry from the LRU is wrong.
The fix is to remove dentry_lru_prune() and call the file system's
d_prune() callback directly when pruning dentries.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Han Shen [Fri, 12 Apr 2013 23:26:58 +0000 (16:26 -0700)]
Removed unused typedef to avoid "unused local typedef" warnings.
Fix warnings about unused local typedefs (reported by gcc 4.8).
Signed-off-by: Han Shen (shenhan@google.com)
Change-Id: I4bccc234f1390daa808d2b309ed112e20c0ac096
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 May 2013 19:00:54 +0000 (15:00 -0400)]
kill fs/read_write.h
fs/compat.c doesn't need it anymore, so let's just move the remaining
contents (two typedefs) into fs/read_write.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jan Kara [Fri, 3 May 2013 22:11:23 +0000 (00:11 +0200)]
fs: Fix hang with BSD accounting on frozen filesystem
When BSD process accounting is enabled and logs information to a
filesystem which gets frozen, the system easily becomes unusable because
each attempt to account process information blocks. Thus, e.g., every task
gets blocked in exit.
It seems better to drop accounting information (which can already happen
when filesystem is running out of space) instead of locking system up.
So we just skip the write if the filesystem is frozen.
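A userspace sketch of "skip the write if the filesystem is frozen", assuming a trylock-style entry point. All names here are hypothetical. A frozen filesystem makes the accounting write return immediately and drop the record instead of sleeping until the thaw.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filesystem state; 'frozen' models a frozen superblock. */
struct fs_model {
	bool frozen;
	int writers;
	size_t bytes_written;
};

/* Trylock-style entry: refuse instead of waiting for the thaw. */
static bool start_write_trylock(struct fs_model *fs)
{
	if (fs->frozen)
		return false;
	fs->writers++;
	return true;
}

static void end_write(struct fs_model *fs)
{
	fs->writers--;
}

/* Accounting write: drop the record rather than block the exiting task. */
static bool acct_write_record(struct fs_model *fs, size_t len)
{
	if (!start_write_trylock(fs))
		return false;	/* record dropped */
	fs->bytes_written += len;
	end_write(fs);
	return true;
}
```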
Reported-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Geert Uytterhoeven [Wed, 10 Apr 2013 11:52:09 +0000 (13:52 +0200)]
sun3_scsi: add ->show_info()
Based on Al's changes to atari_scsi.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Geert Uytterhoeven [Fri, 3 May 2013 20:20:38 +0000 (22:20 +0200)]
nubus: Kill nubus_proc_detach_device()
Commit
59d8053f1e16904d54ed7469d4b36801ea6b8f2c ("proc: Move non-public
stuff from linux/proc_fs.h to fs/proc/internal.h") broke Apple NuBus
support:
drivers/nubus/proc.c: In function ‘nubus_proc_detach_device’:
drivers/nubus/proc.c:156: error: dereferencing pointer to incomplete type
drivers/nubus/proc.c:158: error: dereferencing pointer to incomplete type
Fortunately nubus_proc_detach_device() is unused, and appears to have never
been used, so just remove it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 May 2013 18:46:28 +0000 (14:46 -0400)]
more mode_t whack-a-mole...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 May 2013 18:45:54 +0000 (14:45 -0400)]
do_coredump(): don't wait for thaw if coredump has already been interrupted
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 May 2013 18:40:51 +0000 (14:40 -0400)]
do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks")
Cc: stable@vger.kernel.org
Bisected-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 3 May 2013 22:22:00 +0000 (15:22 -0700)]
ipc: sem_putref() does not need the semaphore lock any more
ipc_rcu_putref() uses atomics for the refcount, and the games to lock
and unlock the semaphore just to try to keep the reference counting
working are no longer useful.
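Why no lock is needed around the put: with an atomic counter, the decrement alone decides which caller dropped the last reference and must free the object. A C11 sketch with hypothetical names follows. Here the free is immediate, whereas the kernel defers it with RCU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object carrying an atomic reference count. */
struct rcu_obj_model {
	atomic_int refcount;
};

static struct rcu_obj_model *obj_alloc(void)
{
	struct rcu_obj_model *o = malloc(sizeof(*o));
	if (o)
		atomic_init(&o->refcount, 1);
	return o;
}

static void obj_getref(struct rcu_obj_model *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Returns true if this call dropped the last reference and freed 'o'. */
static bool obj_putref(struct rcu_obj_model *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		free(o);	/* immediate here; deferred via RCU in the kernel */
		return true;
	}
	return false;
}
```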
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 3 May 2013 22:04:40 +0000 (15:04 -0700)]
ipc: move rcu_read_unlock() out of sem_unlock() and into callers
The IPC locking is a mess, and sem_unlock() unlocks not only the
semaphore spinlock, it also drops the rcu read lock. That is unlike
sem_lock(), which just gets the spin-lock and expects the caller to get
the rcu read lock.
This all makes things very hard to follow, and it's very confusing when
you take the rcu read lock in one function, and then release it in
another. And it has caused actual bugs: the sem_obtain_lock() function
ended up dropping the RCU read lock twice in one error path, because it
first did the sem_unlock(), and then did an rcu_read_unlock() to match
the rcu_read_lock() it had done.
This is just a totally mindless "remove rcu_read_unlock() from
sem_unlock() and add it immediately after each caller" (except for the
aforementioned bug where we did too many rcu_read_unlock(), and in
find_alloc_undo() where we just got the rcu_read_lock() to correct for
the fact that sem_unlock would immediately drop it again).
We can (and should) clean things up further, but this fixes the bug with
the minimal amount of subtlety.
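A userspace model of the bug class, with hypothetical lock stand-ins that record nesting depth. When the unlock helper silently drops the RCU read lock too, a caller that also calls rcu_read_unlock() unlocks twice. Once the helper only drops the spinlock, lock and unlock pair up in the same function.

```c
/* Hypothetical lock models that record nesting depth. */
static int rcu_depth;
static int spin_held;

static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { rcu_depth--; }
static void sem_lock_model(void)        { spin_held = 1; }

/* Old shape: drops the spinlock AND the RCU read lock. */
static void sem_unlock_old(void) { spin_held = 0; rcu_read_unlock_model(); }

/* New shape: drops only the spinlock; RCU is the caller's job. */
static void sem_unlock_new(void) { spin_held = 0; }

/* Error path written against the old helper: unlocks RCU twice. */
static void error_path_old(void)
{
	rcu_read_lock_model();
	sem_lock_model();
	sem_unlock_old();
	rcu_read_unlock_model();	/* second unlock: the bug */
}

/* Same path after the refactor: balanced within one function. */
static void error_path_new(void)
{
	rcu_read_lock_model();
	sem_lock_model();
	sem_unlock_new();
	rcu_read_unlock_model();
}
```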
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Sat, 4 May 2013 12:49:36 +0000 (14:49 +0200)]
hwmon: (lm75) Add support for the Dallas/Maxim DS7505
Basically it's the same as the original DS75 but much faster.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Sat, 4 May 2013 12:49:36 +0000 (14:49 +0200)]
hwmon: (lm75) Tune resolution and sample time per chip
Most LM75-compatible chips can either sample much faster or with a
much better resolution than the original LM75 chip. So far the lm75
driver did not let the user take advantage of these improvements. Do it
now.
I decided to almost always configure the chip to use the best
resolution possible, which also means the longest sample time. The
only chips for which I didn't are the DS75, DS1775 and STDS75, because
they are really too slow in 12-bit mode (1.2 to 1.5 second worst case)
so I went for 11-bit mode as a more reasonable tradeoff. This choice is
dictated by the fact that the hwmon subsystem is meant for system
monitoring, it has never been supposed to be ultra-fast, and as a
matter of fact we do cache the sampled values in almost all drivers.
If anyone isn't pleased with these default settings, they can always
introduce a platform data structure or DT support for the lm75. That
being said, it seems nobody ever complained that the driver wouldn't
refresh the value faster than every 1.5 seconds, and the change made
it faster for all chips even in 12-bit mode, so I don't expect any
complaint.
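What the extra resolution buys, as a sketch. It assumes the LM75-style 16-bit, left-aligned, two's-complement temperature register, where one unit of the top byte is 1 degree C. The function name is hypothetical. Keeping 9 bits gives 0.5 degree steps and keeping 12 bits gives 0.0625 degree steps.

```c
#include <stdint.h>

/* Convert a raw temperature register to millidegrees C, keeping only
 * 'resolution' significant bits (9..12). Assumes '>>' on negative values
 * is an arithmetic shift, as with GCC and Clang. */
static long reg_to_millicelsius(int16_t reg, unsigned int resolution)
{
	long temp = reg >> (16 - resolution);		/* drop unused low bits */
	return (temp * 1000) >> (resolution - 8);	/* scale to millidegrees */
}
```

For example, raw 0x1910 reads as 25000 at 9-bit resolution but 25062 at 12-bit resolution.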
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Sat, 4 May 2013 12:49:36 +0000 (14:49 +0200)]
hwmon: (lm75) Prepare to support per-chip resolution and sample time
Prepare the lm75 driver to support per-chip resolution and sample
time. For now we only make the code generic enough to support it, but
we still use the same, unchanged resolution (9-bit) and sample time
(1.5 s) for all chips.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Sat, 4 May 2013 12:49:36 +0000 (14:49 +0200)]
hwmon: (lm75) Per-chip configuration register initialization
There is no standard for the configuration register bits of LM75-like
chips. We shouldn't blindly clear bits setting the resolution as they
are either unused or used for something else on some of the supported
chips.
So, switch to per-chip configuration initialization. This will allow
for better tuning later, for example using more resolution bits when
available.
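A sketch of per-chip configuration initialization, assuming a hypothetical description with a clear mask and a set mask per chip. The example values are illustrative: bit 0 as a shutdown bit, and bits 5 to 6 as resolution bits on parts that have them. Bits a chip uses for something else are left untouched.

```c
#include <stdint.h>

/* Hypothetical per-chip description of config bits to clear and set. */
struct chip_cfg {
	uint8_t clr_mask;
	uint8_t set_mask;
};

/* Apply the chip's own masks instead of clearing a fixed set of bits
 * on every LM75-like part. */
static uint8_t init_config(uint8_t current, const struct chip_cfg *cfg)
{
	return (uint8_t)((current & ~cfg->clr_mask) | cfg->set_mask);
}
```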
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Peter Zijlstra [Fri, 3 May 2013 12:11:24 +0000 (14:11 +0200)]
perf/x86/intel/lbr: Fix LBR filter
The LBR 'from' address is under full userspace control; ensure
we validate it before reading from it.
Note: is_module_text_address() can potentially be quite
expensive; anyone who sees high overhead from it in modules
could optimize it using an RCU-backed rb-tree.
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org
Peter Zijlstra [Fri, 3 May 2013 12:11:23 +0000 (14:11 +0200)]
perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
Erratum BV98 states that all MEM_*_RETIRED events corrupt the
counter value of the SMT sibling's counters. Blacklist these
events.
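One way to express a blacklist in a constraint table: give the event an allowed-counter mask of 0, so the scheduler can never place it on any counter. A sketch with a hypothetical table type follows. Using event codes 0xd0 to 0xd3 to stand for the MEM_*_RETIRED family is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constraint entry: event-select code and the bitmask of
 * counters the event may be scheduled on. */
struct event_constraint_model {
	uint8_t event_code;
	uint64_t counter_mask;
};

/* Mask 0 = no counter may host the event, i.e. blacklisted. */
static const struct event_constraint_model ivb_constraints[] = {
	{ 0xd0, 0x0 },
	{ 0xd1, 0x0 },
	{ 0xd2, 0x0 },
	{ 0xd3, 0x0 },
};

/* Allowed-counter mask for an event; unconstrained events may use any
 * of the 'num_counters' general-purpose counters. */
static uint64_t allowed_counters(uint8_t event_code, unsigned int num_counters)
{
	size_t n = sizeof(ivb_constraints) / sizeof(ivb_constraints[0]);

	for (size_t i = 0; i < n; i++)
		if (ivb_constraints[i].event_code == event_code)
			return ivb_constraints[i].counter_mask;
	return (1ULL << num_counters) - 1;
}
```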
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.083340271@chello.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-jwra43mujrv1oq9xk6mfe57v@git.kernel.org
Frederic Weisbecker [Fri, 3 May 2013 01:39:05 +0000 (03:39 +0200)]
sched: Keep at least 1 tick per second for active dynticks tasks
The scheduler doesn't yet fully support environments
with a single task running without a periodic tick.
In order to ensure we still maintain the duties of scheduler_tick(),
keep at least 1 tick per second.
This makes sure that we keep the progression of various scheduler
accounting and background maintenance even with a very low granularity.
Examples include cpu load, sched average, CFS entity vruntime,
avenrun and events such as load balancing, amongst other details
handled in sched_class::task_tick().
This limitation will be removed in the future once we get
these individual items to work in full dynticks CPUs.
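The "at least 1 tick per second" rule as arithmetic, in a minimal sketch. The function name is hypothetical, and the one-second window is measured from "now" for simplicity, whereas the kernel's own bookkeeping may measure it differently. The next tick fires at the next pending event, but never later than one second out.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Next tick on a CPU running a single task with the periodic tick
 * stopped: follow the next pending event, capped at one second. */
static uint64_t next_tick_ns(uint64_t now_ns, uint64_t next_event_ns)
{
	uint64_t max_deferment = now_ns + NSEC_PER_SEC;

	return next_event_ns < max_deferment ? next_event_ns : max_deferment;
}
```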
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>