Russ Dill [Tue, 15 Dec 2009 04:45:35 +0000 (21:45 -0700)]
USB: Close usb_find_interface race v3
USB drivers that create character devices call usb_register_dev in their
probe function. This associates the usb_interface device with that minor
number and creates the character device and announces it to the world.
However, the driver's probe function is called before the new
usb_interface is added to the driver's klist_devices.
This is a problem because userspace will respond to the character device
creation announcement by opening the character device. The driver's open
function will the call usb_find_interface to find the usb_interface
associated with that minor number. usb_find_interface will walk the
driver's list of devices and find the usb_interface with the matching
minor number.
Because the announcement happens before the usb_interface is added to the
driver's klist_devices, a race condition exists. A straightforward fix
is to walk the list of devices on usb_bus_type instead since the device
is added to that list before the announcement occurs.
bus_find_device calls get_device to bump the reference count on the found
device. It is arguable that the reference count should be dropped by the
caller of usb_find_interface instead of usb_find_interface, however,
the current users of usb_find_interface do not expect this.
The original version of this patch only matched against minor number
instead of driver and minor number. This version matches against both.
Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Tue, 15 Dec 2009 15:47:28 +0000 (07:47 -0800)]
Revert "USB: Close usb_find_interface race"
This reverts commit
a2582bd478c13c574d4c16ef1209d333f2a25935.
It turned out to be buggy and broke USB printers from working.
Cc: Russ Dill <Russ.Dill@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Linus Torvalds [Mon, 14 Dec 2009 20:50:25 +0000 (12:50 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: Avoid IO in udf_clear_inode
udf: Try harder when looking for VAT inode
udf: Fix compilation with UDFFS_DEBUG enabled
Jan Kara [Thu, 3 Dec 2009 12:39:28 +0000 (13:39 +0100)]
udf: Avoid IO in udf_clear_inode
It is not very good to do IO in udf_clear_inode. First, VFS does not really
expect inode to become dirty there and thus we have to write it ourselves,
second, memory reclaim gets blocked waiting for IO when it does not really
expect it, third, the IO pattern (e.g. on umount) resulting from writes in
udf_clear_inode is bad and it slows down writing a lot.
The reason why UDF needed to do IO in udf_clear_inode is that UDF standard
mandates extent length to exactly match inode size. But when we allocate
extents to a file or directory, we don't really know what exactly the final
file size will be and thus temporarily set it to block boundary and later
truncate it to exact length in udf_clear_inode. Now, this is changed to
truncate to final file size in udf_release_file for regular files. For
directories and symlinks, we do the truncation at the moment when learn
what the final file size will be.
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 30 Nov 2009 18:47:55 +0000 (19:47 +0100)]
udf: Try harder when looking for VAT inode
Some disks do not contain VAT inode in the last recorded block as required
by the standard but a few blocks earlier (or the number of recorded blocks
is wrong). So look for the VAT inode a bit before the end of the media.
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 30 Nov 2009 18:47:10 +0000 (19:47 +0100)]
udf: Fix compilation with UDFFS_DEBUG enabled
Signed-off-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Mon, 14 Dec 2009 20:36:46 +0000 (12:36 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Clean up thermal init by introducing intel_thermal_supported()
x86, mce: Thermal monitoring depends on APIC being enabled
x86: Gart: fix breakage due to IOMMU initialization cleanup
x86: Move swiotlb initialization before dma32_free_bootmem
x86: Fix build warning in arch/x86/mm/mmio-mod.c
x86: Remove usedac in feature-removal-schedule.txt
x86: Fix duplicated UV BAU interrupt vector
nvram: Fix write beyond end condition; prove to gcc copy is safe
mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe
x86: Limit the number of processor bootup messages
x86: Remove enabling x2apic message for every CPU
doc: Add documentation for bootloader_{type,version}
x86, msr: Add support for non-contiguous cpumasks
x86: Use find_e820() instead of hard coded trampoline address
x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks
Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
Linus Torvalds [Mon, 14 Dec 2009 20:33:02 +0000 (12:33 -0800)]
Merge git://git./linux/kernel/git/brodo/pcmcia-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
pcmcia: CodingStyle fixes
pcmcia: remove unused IRQ_FIRST_SHARED
Linus Torvalds [Mon, 14 Dec 2009 18:22:11 +0000 (10:22 -0800)]
Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
spi: fix probe/remove section markings
Add OMAP spi100k driver
spi-imx: don't access struct device directly but use dev_get_platdata
spi-imx: Add mx25 support
spi-imx: use positive logic to distinguish cpu variants
spi-imx: correct check for platform_get_irq failing
ARM: NUC900: Add spi driver support for nuc900
spi: SuperH MSIOF SPI Master driver V2
spi: fix spidev compilation failure when VERBOSE is defined
spi/au1550_spi: fix setupxfer not to override cfg with zeros
spi/mpc8xxx: don't use __exit_p to wrap plat_mpc8xxx_spi_remove
spi/i.MX: fix broken error handling for gpio_request
spi/i.mx: drain MXC SPI transfer buffer when probing device
MAINTAINERS: add SPI co-maintainer.
spi/xilinx_spi: fix incorrect casting
spi/mpc52xx-spi: minor cleanups
xilinx_spi: add a platform driver using the xilinx_spi common module.
xilinx_spi: add support for the DS570 IP.
xilinx_spi: Switch to iomem functions and support little endian.
xilinx_spi: Split into of driver and generic part.
...
Linus Torvalds [Mon, 14 Dec 2009 18:13:22 +0000 (10:13 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf sched: Fix build failure on sparc
perf bench: Add "all" pseudo subsystem and "all" pseudo suite
perf tools: Introduce perf_session class
perf symbols: Ditch dso->find_symbol
perf symbols: Allow lookups by symbol name too
perf symbols: Add missing "Variables" entry to map_type__name
perf symbols: Add support for 'variable' symtabs
perf symbols: Introduce ELF counterparts to symbol_type__is_a
perf symbols: Introduce symbol_type__is_a
perf symbols: Rename kthreads to kmaps, using another abstraction for it
perf tools: Allow building for ARM
hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value
perf tools: Allow cross compiling
tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE
tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING
Trivial conflict due to different fixes to modify_user_hw_breakpoint()
in include/linux/hw_breakpoint.h
David Howells [Mon, 14 Dec 2009 14:13:44 +0000 (14:13 +0000)]
PCI: Global variable decls must match the defs in section attributes
Global variable declarations must match the definitions in section attributes
as the compiler is at liberty to vary the method it uses to access a variable,
depending on the section it is in.
When building the FRV arch, I now see:
drivers/built-in.o: In function `pci_apply_final_quirks':
drivers/pci/quirks.c:2606: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
drivers/pci/quirks.c:2623: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
drivers/pci/quirks.c:2630: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
because the declaration of pci_dfl_cache_line_size in linux/pci.h does not
match the definition in drivers/pci/pci.c.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Mon, 14 Dec 2009 14:03:27 +0000 (14:03 +0000)]
FRV: Fix no-hardware-breakpoint case
If there is no hardware breakpoint support, modify_user_hw_breakpoint()
tries to return a NULL pointer through as an 'int' return value:
In file included from kernel/exit.c:53:
include/linux/hw_breakpoint.h: In function 'modify_user_hw_breakpoint':
include/linux/hw_breakpoint.h:96: warning: return makes integer from pointer without a cast
Return 0 instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 14 Dec 2009 18:04:04 +0000 (10:04 -0800)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (46 commits)
microblaze: Remove rt_sigsuspend wrapper
microblaze: nommu: Don't clobber R11 on syscalls
microblaze: Remove show_tmem function
microblaze: Support for WB cache
microblaze: Add PVR for Microblaze v7.30.a
microblaze: Remove ancient and fake microblaze version from cpu_ver table
microblaze: Remove panic_timeout init value
microblaze: Do not count system calls in default
microblaze: Enable DTC compilation
microblaze: Core oprofile configs and hooks
microblaze: Fix level interrupt ACKing
microblaze: Enable futimesat syscall
microblaze: Checking DTS against PVR for write-back cache
microblaze: Remove duplicity from pgalloc.h
microblaze: Futex support
microblaze: Adding dev_arch_data functions
microblaze: Fix the heartbeat gpio to be more robust
microblaze: Simple __copy_tofrom_user for noMMU
microblaze: Export memory_start for modules
microblaze: Use lowest-common-denominator default CPU settings
...
Linus Torvalds [Mon, 14 Dec 2009 18:03:36 +0000 (10:03 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (27 commits)
md: add 'recovery_start' per-device sysfs attribute
md: rcu_read_lock() walk of mddev->disks in md_do_sync()
md: integrate spares into array at earliest opportunity.
md: move compat_ioctl handling into md.c
md: revise Kconfig help for MD_MULTIPATH
md: add MODULE_DESCRIPTION for all md related modules.
raid: improve MD/raid10 handling of correctable read errors.
md/raid10: print more useful messages on device failure.
md/bitmap: update dirty flag when bitmap bits are explicitly set.
md: Support write-intent bitmaps with externally managed metadata.
md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
md: support updating bitmap parameters via sysfs.
md: factor out parsing of fixed-point numbers
md: support bitmap offset appropriate for external-metadata arrays.
md: remove needless setting of thread->timeout in raid10_quiesce
md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
md: move offset, daemon_sleep and chunksize out of bitmap structure
md: collect bitmap-specific fields into one structure.
md/raid1: add takeover support for raid5->raid1
md: add honouring of suspend_{lo,hi} to raid1.
...
Linus Torvalds [Mon, 14 Dec 2009 18:02:35 +0000 (10:02 -0800)]
Merge branch 'for-next' of git://git./linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (58 commits)
mfd: Add twl6030 regulator subdevices
regulator: Add support for twl6030 regulators
rtc: Add twl6030 RTC support
mfd: Add support for twl6030 irq framework
mfd: Rename twl4030_ routines in twl-regulator.c
mfd: Rename twl4030_ routines in rtc-twl.c
mfd: Rename all twl4030_i2c*
mfd: Rename twl4030* driver files to enable re-use
mfd: Clarify twl4030 return value for read and write
mfd: Add all twl4030 regulators to the twl4030 mfd driver
mfd: Don't set mc13783 ADREFMODE for touch conversions
mfd: Remove ezx-pcap defines for custom led gpio encoding
mfd: Near complete mc13783 rewrite
mfd: Remove build time warning for WM835x register default tables
mfd: Force I2C to be built in when building WM831x
mfd: Don't allow wm831x to be built as a module
mfd: Fix incorrect error check for wm8350-core
mfd: Fix twl4030 warning
gpiolib: Implement gpio_to_irq() for wm831x
mfd: Remove default selection of AB4500
...
Linus Torvalds [Mon, 14 Dec 2009 18:01:15 +0000 (10:01 -0800)]
Merge branch 'devel' of /home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: fix lh7a40x build
ARM: fix sa1100 build
ARM: fix clps711x, footbridge, integrator, ixp2000, ixp2300 and s3c build bug
ARM: VFP: fix vfp thread init bug and document vfp notifier entry conditions
ARM: pxa: fix now incorrect reference of skt->irq by using skt->socket.pci_irq
[ARM] pxa/zeus: default configuration for Arcom Zeus SBC.
[ARM] pxa/zeus: make Viper pcmcia support more generic to support Zeus
[ARM] pxa/zeus: basic support for Arcom Zeus SBC
[ARM] pxa/em-x270: fix usb hub power up/reset sequence
PCMCIA: fix pxa2xx_lubbock modular build error
ARM: RealView: Fix typo in the RealView/PBX Kconfig entry
ARM: Do not allow the probing of the local timer
ARM: Add an earlyprintk debug console
Linus Torvalds [Mon, 14 Dec 2009 18:00:24 +0000 (10:00 -0800)]
Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits)
NFS: Fix nfs_migrate_page()
rpc: remove unneeded function parameter in gss_add_msg()
nfs41: Invoke RECLAIM_COMPLETE on all new client ids
SUNRPC: IS_ERR/PTR_ERR confusion
NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare
nfs41: Handle NFSv4.1 session errors in the delegation recall code
nfs41: Retry delegation return if it failed with session error
nfs41: Handle session errors during delegation return
nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured
nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID
nfs41: nfs41_setup_state_renewal
NFSv41: More cleanups
NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code
NFSv41: Clean up slot table management
NFSv41: Fix nfs4_proc_create_session
nfs41: Invoke RECLAIM_COMPLETE
nfs41: RECLAIM_COMPLETE functionality
nfs41: RECLAIM_COMPLETE XDR functionality
Cleanup some NFSv4 XDR decode comments
...
Linus Torvalds [Mon, 14 Dec 2009 17:58:24 +0000 (09:58 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
William Allen Simpson [Sun, 13 Dec 2009 20:12:46 +0000 (15:12 -0500)]
Documentation: rw_lock lessons learned
In recent months, two different network projects erroneously
strayed down the rw_lock path. Update the Documentation
based upon comments by Eric Dumazet and Paul E. McKenney in
those threads.
Further updates await somebody else with more expertise.
Changes:
- Merged with extensive content by Stephen Hemminger.
- Fix one of the comments by Linus Torvalds.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidetoshi Seto [Mon, 14 Dec 2009 08:57:00 +0000 (17:57 +0900)]
x86, mce: Clean up thermal init by introducing intel_thermal_supported()
It looks better to have a common function. No change in functionality.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <
4B25FDDC.407@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cyrill Gorcunov [Mon, 14 Dec 2009 08:56:34 +0000 (17:56 +0900)]
x86, mce: Thermal monitoring depends on APIC being enabled
Add check if APIC is not disabled since thermal
monitoring depends on it. As only apic gets disabled
we should not try to install "thermal monitor" vector,
print out that thermal monitoring is enabled and etc...
Note that "Intel Correct Machine Check Interrupts" already
has such a check.
Also I decided to not add cpu_has_apic check into
mcheck_intel_therm_init since even if it'll call apic_read on
disabled apic -- it's safe here and allow us to save a few code
bytes.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <
4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Miller [Mon, 14 Dec 2009 07:56:22 +0000 (23:56 -0800)]
perf sched: Fix build failure on sparc
Here, tvec->tv_usec is "unsigned int" not "unsigned long".
Since the type is different on every platform, it's probably
best to just use long printf formats and cast.
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
20091213.235622.
53363059.davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Mon, 14 Dec 2009 02:52:15 +0000 (11:52 +0900)]
x86: Gart: fix breakage due to IOMMU initialization cleanup
This fixes the following breakage of the commit
75f1cdf1dda92cae037ec848ae63690d91913eac:
- GART systems that don't AGP with broken BIOS and more than 4GB
memory are forced to use swiotlb. They can allocate aperture by
hand and use GART.
- GART systems without GAP must disable GART on shutdown.
- swiotlb usage is forced by the boot option,
gart_iommu_hole_init() is not called, so we disable GART
early_gart_iommu_check().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <
1260759135-6450-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FUJITA Tomonori [Mon, 14 Dec 2009 02:52:14 +0000 (11:52 +0900)]
x86: Move swiotlb initialization before dma32_free_bootmem
The commit
75f1cdf1dda92cae037ec848ae63690d91913eac introduced a
bug that we initialize SWIOTLB right after dma32_free_bootmem so
we wrongly steal memory area allocated for GART with broken BIOS
earlier.
This moves swiotlb initialization before dma32_free_bootmem().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <
1260759135-6450-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Joe Perches [Mon, 14 Dec 2009 07:24:03 +0000 (23:24 -0800)]
x86: Fix build warning in arch/x86/mm/mmio-mod.c
Stephen Rothwell reported these warnings:
arch/x86/mm/mmio-mod.c: In function 'print_pte':
arch/x86/mm/mmio-mod.c:100: warning: too many arguments for format
arch/x86/mm/mmio-mod.c:106: warning: too many arguments for format
The 'fmt' was left out accidentally.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus <torvalds@linux-foundation.org>
LKML-Reference: <
1260775443.18538.16.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FUJITA Tomonori [Mon, 14 Dec 2009 02:06:15 +0000 (11:06 +0900)]
x86: Remove usedac in feature-removal-schedule.txt
The reason of removal, "replaced by allowdac and no dac
combination" is incorrect. There is no way to do the same thing
with "allowdac" and "nodac" combination.
The usedac option enables us to stop via_no_dac() setting
forbid_dac to 1. That is, someone who uses VIA bridges can use
DAC with this option even if some of VIA bridges seem to be
broken about DAC.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: WANG Cong <amwang@redhat.com>
Cc: gcosta@redhat.com
LKML-Reference: <20091214104423X.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hitoshi Mitake [Sun, 13 Dec 2009 08:01:59 +0000 (17:01 +0900)]
perf bench: Add "all" pseudo subsystem and "all" pseudo suite
This patch adds a new "all" pseudo subsystem and an "all" pseudo
suite. These are for testing all subsystem and its all suite, or
all suite of one subsystem.
(This patch also contains a few trivial comment fixes for
bench/* and output style fixes. I judged that there are no
necessity to make them into individual patch.)
Example of use:
| % ./perf bench sched all # Test all suites of sched subsystem
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.414 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 10.999 [sec]
|
| 10.999317 usecs/op
| 90914 ops/sec
|
| % ./perf bench all # Test all suites of all subsystems
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.420 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 11.741 [sec]
|
| 11.741346 usecs/op
| 85169 ops/sec
|
| # Running mem/memcpy benchmark...
| # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ...
|
| 808.407437 MB/Sec
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1260691319-4683-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Michal Simek [Fri, 11 Dec 2009 11:54:04 +0000 (12:54 +0100)]
microblaze: Remove rt_sigsuspend wrapper
Generic rt_sigsuspend syscalls doesn't need any asm wrapper.
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Wed, 9 Dec 2009 23:13:42 +0000 (17:13 -0600)]
microblaze: nommu: Don't clobber R11 on syscalls
The noMMU syscall trap has a bug that causes R11 to be zero on return to
userland. Remove the extra "save" of R11 responsible for the bug.
Remove reloading of mode indicator because r11 already contains it.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 11:06:03 +0000 (12:06 +0100)]
microblaze: Remove show_tmem function
show_tmem function do nothing that's why I removed it.
There is also cleaning of commented ancient code.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 10:43:57 +0000 (11:43 +0100)]
microblaze: Support for WB cache
Microblaze version 7.20.d is the first MB version which can be run
on MMU linux. Please do not used previous version because they contain
HW bug.
Based on WB support was necessary to redesign whole cache design.
Microblaze versions from 7.20.a don't need to disable IRQ and cache
before working with them that's why there are special structures for it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:54:07 +0000 (17:54 +0100)]
microblaze: Add PVR for Microblaze v7.30.a
Microblaze v7.30.a will have 0x10 version string.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:51:06 +0000 (17:51 +0100)]
microblaze: Remove ancient and fake microblaze version from cpu_ver table
We need to continue with next microblaze PVR version that's why
I have to remove that ancient version. These version strings not match
any versions. From Microblaze v5.00.a is possible to use this style.
I believe that none use ancients versions. If yes they will be just
labeled as unknown version.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:49:21 +0000 (17:49 +0100)]
microblaze: Remove panic_timeout init value
panic_timeout is in BSS section and it is cleared with BSS section.
This means that value is setup to 0.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 7 Dec 2009 07:21:34 +0000 (08:21 +0100)]
microblaze: Do not count system calls in default
There is not necessary to count system calls that's why
I added DEBUG macro
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 30 Nov 2009 08:26:09 +0000 (09:26 +0100)]
microblaze: Enable DTC compilation
For simpleImage format we need to compile DTC. There is still possibility
to compile only Linux kernel without DTB compiled-in.
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Tue, 24 Nov 2009 10:27:54 +0000 (20:27 +1000)]
microblaze: Core oprofile configs and hooks
Microblaze uses timer interrupt mode. Microblaze don't have
any performance counter that's why we use just simple implementation.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Tue, 17 Nov 2009 14:43:39 +0000 (08:43 -0600)]
microblaze: Fix level interrupt ACKing
Level interrupts need to be ack'd in the unmask handler, as in powerpc.
Among other issues, this bug causes the system clock to appear to run at
double-speed.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 19 Oct 2009 11:50:02 +0000 (13:50 +0200)]
microblaze: Enable futimesat syscall
Futimesat was disabled. LTP testing shows that MB has no
problem with this syscall.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 21 Oct 2009 10:29:46 +0000 (12:29 +0200)]
microblaze: Checking DTS against PVR for write-back cache
WB cache has special flag in PVR. There is added checking mechanism
for PVR and DTS.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 23 Nov 2009 09:15:00 +0000 (10:15 +0100)]
microblaze: Remove duplicity from pgalloc.h
just file cleanup
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 19 Oct 2009 09:58:44 +0000 (11:58 +0200)]
microblaze: Futex support
Microblaze v7.20 provides new lwx, swx instructions which bring
possibility to implement lock rutines.
There are some tests in open posix thread LTP part but current
toolchain not support it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 23 Nov 2009 09:07:51 +0000 (10:07 +0100)]
microblaze: Adding dev_arch_data functions
The functions, dev_arch_data_set_node and get_node are missing
and are needed by some device drivers such as I2C.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Linn [Fri, 5 Jun 2009 17:36:31 +0000 (11:36 -0600)]
microblaze: Fix the heartbeat gpio to be more robust
The device tree handling for the gpio in the heart beat was not handling
the system when there was no gpio and it wasn't working with a newer version
of the gpio core which does not have the is-bidir property.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Fri, 14 Aug 2009 02:06:46 +0000 (12:06 +1000)]
microblaze: Simple __copy_tofrom_user for noMMU
This is first patch which clear part of uaccess.h.
uaccess.h will be clear later.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 23 Jul 2009 06:23:53 +0000 (08:23 +0200)]
microblaze: Export memory_start for modules
memory_start symbol is needed by kernel modules.
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Mon, 24 Aug 2009 03:52:33 +0000 (13:52 +1000)]
microblaze: Use lowest-common-denominator default CPU settings
This will ensure that kernels built with no custom CPU settings will still boot
OK on hardware that has additional CPU hardware instructions etc.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 21 Aug 2009 11:47:09 +0000 (13:47 +0200)]
microblaze: Update default generic DTS
It is generated with longer compatible list
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 26 Oct 2009 08:56:48 +0000 (09:56 +0100)]
microblaze: Enable asm optimization only for HW with barrel-shifter
Asm code uses barrel-shifter instruction that's why we have
to protect cases when HW don't have it.
Reported-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Mon, 24 Aug 2009 03:52:32 +0000 (13:52 +1000)]
microblaze: Remove the buggy ALLOW_EDIT_AUTO config option
This was intended to allow manual override of CPU settings copied automatically
to Kconfig.auto, however it's problematic for several reasons, but mostly:
* If the defconfig doesn't have ALLOW_EDIT_AUTO=y, then it's impossible for
that defconfig to iverride the values in the kernel source tree. This leads
to very strange errors where the kernel is compiled with the wrong CPUFLAGS.
Next patch in the series will back out the default in Kconfig.auto to baseline
settings, so a kernel built with no default values will at least boot on any
hardware, just not make use of additional CPU features.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 13:18:13 +0000 (15:18 +0200)]
microblaze: Move cache macro from cache.h to cacheflush.h
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 14 Oct 2009 15:38:26 +0000 (17:38 +0200)]
microblaze: support U-BOOT image format
Two version are generated.
linux.bin.ub which is created from linux.bin file
and
simpleImage.<dts>.ub which is created from stripped simpleImage.<dts> file
Load address and entry point is for microblaze first instruction
which is CONFIG_KERNEL_BASE_ADDR variable.
There is possible for simpleImage format parse _start symbol too.
simpleImage.<dts> is still stripped elf file
I cleared simpleImage.<dts>.unstrip file because there are so big.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 09:32:25 +0000 (11:32 +0200)]
microblaze: Ptrace notifying from signal code
After the signal frame is set up on the userspace stack, ptrace() should
be given an opportunity to single-step into the signal handler
FRV, Blackfin, mn10300 and UM. Worth to look at that patches.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 14 Oct 2009 09:12:50 +0000 (11:12 +0200)]
microblaze: Extend cpuinfo for support write-back caches
There is missing checking agains PVR but this is not important
for now. There are some missing checking too.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 8 Oct 2009 11:06:42 +0000 (13:06 +0200)]
microblaze: Fix cache_line_lenght
We used cache_line as cache_line_lenght. For this reason
we did cache flushing 4 times longer than was necessary.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 11:34:31 +0000 (13:34 +0200)]
microblaze: Detect new 7.20.d version
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 29 Oct 2009 09:12:59 +0000 (10:12 +0100)]
microblaze: Support both levels for reset
Till this patch reset always perform writen to 1.
Now we can use negative logic and perform reset write to 0.
It is opposite level than is currently read from that pin
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 29 Oct 2009 07:58:15 +0000 (08:58 +0100)]
microblaze: Fix announce message for reset gpio
I had to change message for gpio-reset because I always
not to see it. Prefix RESET is big and visible.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 13 Nov 2009 07:26:49 +0000 (08:26 +0100)]
microblaze: Remove saving and restoring before calling signal code
Saving is done in SAVE_STATE macros that's why another save discard
previous saved value.
This change has no effect to normal programs because they ends in any exception
and they are killed. On the other side has effect on debugging.
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Fri, 13 Nov 2009 22:08:29 +0000 (16:08 -0600)]
microblaze: Fix pfn_valid() for noMMU
Configuring DEBUG_SLAB causes a noMMU kernel to die during initialization
with an invalid virtual address panic in kfree_debugcheck().
The panic is due to an improper definition of pfn_valid().
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 09:34:15 +0000 (10:34 +0100)]
microblaze: ftrace: Add dynamic function graph tracer
This patch add support for dynamic function graph tracer.
There is one my expactation that I can do flush_icache after
all code modification. On microblaze is this safer than do
flush for every entry. For icache is used name flush but
correct should be invalidation - this will be fix in upcomming
new cache implementaion and WB support.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 09:32:10 +0000 (10:32 +0100)]
microblaze: ftrace: add function graph support
For more information look at Documentation/trace folder.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 13:15:44 +0000 (14:15 +0100)]
microblaze: ftrace: Add dynamic trace support
With dynamic function tracer, by default, _mcount is defined as an
"empty" function, it returns directly without any more action. When
enabling it in user-space, it will jump to a real tracing
function(ftrace_caller), and do the real job for us.
Differ from the static function tracer, dynamic function tracer provides
two functions ftrace_make_call()/ftrace_make_nop() to enable/disable the
tracing of some indicated kernel functions(set_ftrace_filter).
In the kernel version, there is only one "_mcount" string for every
kernel function, so, we just need to match this one in mcount_regex of
scripts/recordmcount.pl.
For more information please look at code and Documentation/trace folder.
Steven ACK that scripts/recordmcount.pl part.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:55:08 +0000 (09:55 +0100)]
microblaze: ftrace: enable HAVE_FUNCTION_TRACE_MCOUNT_TEST
Implement MCOUNT_TEST in asm code - it is faster than use
generic code
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:40:14 +0000 (09:40 +0100)]
microblaze: ftrace: add static function tracer
If -pg of gcc is enabled with CONFIG_FUNCTION_TRACER=y. a calling to
_mcount will be inserted into each kernel function. so, there is a
possibility to trace the kernel functions in _mcount.
This patch add the specific _mcount support for static function
tracing. by default, ftrace_trace_function is initialized as
ftrace_stub(an empty function), so, the default _mcount will introduce
very little overhead. after enabling ftrace in user-space, it will jump
to a real tracing function and do static function tracing for us.
Commit message from Wu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 30 Oct 2009 11:26:53 +0000 (12:26 +0100)]
microblaze: Add TRACE_IRQFLAGS_SUPPORT
There are just two major changes
Renamed local_irq functions to raw_local_irq in irq.c.
Added TRACE_IRQFLAGS_SUPPORT to Kconfig.debug.
Look at Documentation/irqflags-tracing.txt
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:09:47 +0000 (09:09 +0100)]
microblaze: preliminary enabling for LATENCYTOP support in Kconfig
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 11:07:02 +0000 (12:07 +0100)]
microblaze: Lockdep support
Microblaze needs to do lock_init very soon because MMU init calls lock functions.
Here is the explanation from Peter Zijlstra why we have to enable
__ARCH_WANTS_INTERRUPTS_ON_CTSW.
"So we schedule while holding rq->lock (for obvious reasons), but since
lockdep tracks held locks per tasks, we need to transfer the held state
from the prev to the next task. We do this by explicity calling
spin_release(&rq->lock) in context_switch() right before switch_to(),
and calling spin_acquire(&rq->lock) in
finish_task_switch()->finish_lock_switch().
Now, for some reason lockdep thinks that interrupts got enabled over the
context switch (git grep __ARCH_WANTS_INTERRUPTS_ON_CTSW arch/microblaze
doesn't seem to turn up anything).
Clearly trying to acquire the rq->lock with interrupts enabled is a bad
idea and lockdep warns you about this."
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 6 Nov 2009 11:31:00 +0000 (12:31 +0100)]
microblaze: Register timecounter/cyclecounter
It is the same counter as we use as free running one.
I would like to use it for ftrace.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 10 Nov 2009 14:57:01 +0000 (15:57 +0100)]
microblaze: Stack trace support
This is working implemetation but the problem is that
Microblaze misses frame pointer that's why is there
big loop which trace and show all addresses which are in text.
It shows addresses which are in registers, etc.
This is problem and this is the reason why all Microblaze
traces are wrong. There is an option to do hacks and trace
the kernel code but this is too complicated.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 6 Nov 2009 11:27:25 +0000 (12:27 +0100)]
microblaze: Add IRQENTRY_TEXT to lds
It is important for ftrace irqsoff support
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 30 Oct 2009 13:41:52 +0000 (14:41 +0100)]
microblaze: __init_begin symbol must be aligned
The problem was that free_initmem pass to free_initrd_mem got
bad aligned __init_begin symbol and free_initrd_mem don't care
about __init_end but take PAGE_SIZE instead.
Here is behavior in kernel bootlog.
ramdisk_execute_command from (init/main.c) was rewrite
Freeing unused kernel memory: 6224k freed
Failed to execute ��������������{���
Failed to execute ��������������{����. Attempting defaults...
Mounting proc:
Mounting var:
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 2 Oct 2009 10:48:47 +0000 (12:48 +0200)]
microblaze: GPIO reset support
Signed-off-by: Michal Simek <monstr@monstr.eu>
Dan Williams [Sun, 13 Dec 2009 04:17:12 +0000 (21:17 -0700)]
md: add 'recovery_start' per-device sysfs attribute
Enable external metadata arrays to manage rebuild checkpointing via a
md/dev-XXX/recovery_start attribute which reflects rdev->recovery_offset
Also update resync_start_store to allow 'none' to be written, for
consistency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Sun, 13 Dec 2009 04:17:06 +0000 (21:17 -0700)]
md: rcu_read_lock() walk of mddev->disks in md_do_sync()
Other walks of this list are either under rcu_read_lock() or the list
mutation lock (mddev_lock()). This protects against the improbable case of a
disk being removed from the array at the start of md_do_sync().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Mon, 14 Dec 2009 01:50:06 +0000 (12:50 +1100)]
md: integrate spares into array at earliest opportunity.
As v1.x metadata can record that a member of the array is
not completely recovered, it make sense to record that a
spare has become a regular member of the array at the earliest
opportunity.
So remove the tests on "recovery_offset > 0" in super_1_sync
as they really aren't needed, and schedule a metadata update
immediately after adding spares to a degraded array.
This means that if a crash happens immediately after a recovery
starts, the new device will be included in the array and recovery will
continue from wherever it was up to. Previously this didn't happen
unless recovery was at least 1/16 of the way through.
Signed-off-by: NeilBrown <neilb@suse.de>
Arnd Bergmann [Mon, 14 Dec 2009 01:50:05 +0000 (12:50 +1100)]
md: move compat_ioctl handling into md.c
The RAID ioctls are only implemented in md.c, so the
handling for them should also be moved there from
fs/compat_ioctl.c.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Andre Noll <maan@systemlinux.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:59 +0000 (12:49 +1100)]
md: revise Kconfig help for MD_MULTIPATH
Make it clear in the config message that MD_MULTIPATH is not under
active development.
Cc: Oren Held <orenhe@il.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:58 +0000 (12:49 +1100)]
md: add MODULE_DESCRIPTION for all md related modules.
Suggested by Oren Held <orenhe@il.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Robert Becker [Mon, 14 Dec 2009 01:49:58 +0000 (12:49 +1100)]
raid: improve MD/raid10 handling of correctable read errors.
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.
With this patch I propose adding a per md device max number of
corrected read attempts. Each rdev will maintain a count of
read correction attempts in the rdev->read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.
In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev->read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.
For testing I used debugfs->fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.
Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Robert Becker [Mon, 14 Dec 2009 01:49:57 +0000 (12:49 +1100)]
md/raid10: print more useful messages on device failure.
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.
Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:56 +0000 (12:49 +1100)]
md/bitmap: update dirty flag when bitmap bits are explicitly set.
There is a sysfs file which allows bits in the write-intent
bitmap to be explicit set - indicating that the block is thought
to be 'dirty'.
When this happens we should really set recovery_cp backwards
to include the block to reflect this dirtiness.
In particular, a 'resync' process will refuse to start if
recovery_cp is beyond the end of the array, so this is needed
to allow a resync to be triggered.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:56 +0000 (12:49 +1100)]
md: Support write-intent bitmaps with externally managed metadata.
In this case, the metadata needs to not be in the same
sector as the bitmap.
md will not read/write any bitmap metadata. Config must be
done via sysfs and when a recovery makes the array non-degraded
again, writing 'true' to 'bitmap/can_clear' will allow bits in
the bitmap to be cleared again.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:56 +0000 (12:49 +1100)]
md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
Setting daemon_lastrun really has nothing to do with reading
the bitmap superblock, it just happens to be needed at the same time.
bitmap_read_sb is about to become options, so move that code out
to after the call to bitmap_read_sb.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:55 +0000 (12:49 +1100)]
md: support updating bitmap parameters via sysfs.
A new attribute directory 'bitmap' in 'md' is created which
contains files for configuring the bitmap.
'location' identifies where the bitmap is, either 'none',
or 'file' or 'sector offset from metadata'.
Writing 'location' can create or remove a bitmap.
Adding a 'file' bitmap this way is not yet supported.
'chunksize' and 'time_base' must be set before 'location'
can be set.
'chunksize' can be set before creating a bitmap, but is
currently always over-ridden by the bitmap superblock.
'time_base' and 'backlog' can be updated at any time.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
NeilBrown [Mon, 14 Dec 2009 01:49:55 +0000 (12:49 +1100)]
md: factor out parsing of fixed-point numbers
safe_delay_store can parse fixed point numbers (for fractions
of a second). We will want to do that for another sysfs
file soon, so factor out the code.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:54 +0000 (12:49 +1100)]
md: support bitmap offset appropriate for external-metadata arrays.
For md arrays were metadata is managed externally, the kernel does not
know about a superblock so the superblock offset is 0.
If we want to have a write-intent-bitmap near the end of the
devices of such an array, we should support sector_t sized offset.
We need offset be possibly negative for when the bitmap is before
the metadata, so use loff_t instead.
Also add sanity check that bitmap does not overlap with data.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:54 +0000 (12:49 +1100)]
md: remove needless setting of thread->timeout in raid10_quiesce
As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:53 +0000 (12:49 +1100)]
md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
This removes a lot of multiplications by HZ.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:53 +0000 (12:49 +1100)]
md: move offset, daemon_sleep and chunksize out of bitmap structure
... and into bitmap_info. These are all configuration parameters
that need to be set before the bitmap is created.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:52 +0000 (12:49 +1100)]
md: collect bitmap-specific fields into one structure.
In preparation for making bitmap fields configurable via sysfs,
start tidying up by making a single structure to contain the
configuration fields.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:51 +0000 (12:49 +1100)]
md/raid1: add takeover support for raid5->raid1
A 2-device raid5 array can now be converted to raid1.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:51 +0000 (12:49 +1100)]
md: add honouring of suspend_{lo,hi} to raid1.
This will allow us to stop writeout to portions of the array
while they are resynced by someone else - e.g. another node in
a cluster.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:50 +0000 (12:49 +1100)]
md/raid5: don't complete make_request on barrier until writes are scheduled
The post-barrier-flush is sent by md as soon as make_request on the
barrier write completes. For raid5, the data might not be in the
per-device queues yet. So for barrier requests, wait for any
pre-reading to be done so that the request will be in the per-device
queues.
We use the 'preread_active' count to check that nothing is still in
the preread phase, and delay the decrement of this count until after
write requests have been submitted to the underlying devices.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:49 +0000 (12:49 +1100)]
md: support barrier requests on all personalities.
Previously barriers were only supported on RAID1. This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.
When a barrier arrives, we send a zero-length barrier to every active
device. When that completes - and if the original request was not
empty - we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.
The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.
The reason for clearing the barrier flag is that a barrier request is
allowed to fail. If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail. That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.
RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet. So we flush the stripe cache before
proceeding with the barrier.
Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted. Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.
That will be done in later patches.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
NeilBrown [Mon, 14 Dec 2009 01:49:49 +0000 (12:49 +1100)]
md: don't reset curr_resync_completed after an interrupted resync
If a resync/recovery/check/repair is interrupted for some reason, it
can be useful to know exactly where it got up to.
So in that case, do not clear curr_resync_completed.
Initialise it when starting a resync/recovery/... instead.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:48 +0000 (12:49 +1100)]
md: adjust resync_min usefully when resync aborts.
When a 'check' or 'repair' finished we should clear resync_min
so that a future check/repair will cover the whole array (by default).
However if it is interrupted, we should update resync_min to
where we got up to, so that when the check/repair continues it
just does the remainder of the array.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:47 +0000 (12:49 +1100)]
md: remove sparse warning:symbol XXX was not declared.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:47 +0000 (12:49 +1100)]
md/raid5: remove some sparse warnings.
qd_idx is previously declared and given exactly the same value!
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 14 Dec 2009 01:49:46 +0000 (12:49 +1100)]
md/bitmap: protect against bitmap removal while being updated.
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap. If it is, it can dereference the
bitmap after it has been freed.
So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.
This is suitable for any current -stable kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org