Linus Torvalds [Thu, 14 Aug 2008 03:48:46 +0000 (20:48 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
netns: Fix crash by making igmp per namespace
bnx2x: Version update
bnx2x: Checkpatch compliance
bnx2x: Spelling mistakes
bnx2x: Minor code improvements
bnx2x: Driver info
bnx2x: 1G LED does not turn off
bnx2x: 8073 PHY changes
bnx2x: Change GPIO for any port
bnx2x: Pause settings
bnx2x: Link order with external PHY
bnx2x: No LRO without Rx checksum
bnx2x: Wrong structure size
bnx2x: WoL capability
bnx2x: Clearing MAC addresses filters
bnx2x: Delay in while loops
bnx2x: PBA Table Page Alignment Workaround
bnx2x: Self-test false positive
bnx2x: Memory allocation
bnx2x: HW attention lock
...
Linus Torvalds [Thu, 14 Aug 2008 03:48:25 +0000 (20:48 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Handle stack trace attempts before irqstacks are setup.
sparc64: Implement IRQ stacks.
sparc: remove include of linux/of_device.h from asm/of_device.h
sparc64: Fix recursion in stack overflow detection handling.
sparc/drivers: use linux/of_device.h instead of asm/of_device.h
sparc64: Don't MAGIC_SYSRQ ifdef smp_fetch_global_regs and support code.
David S. Miller [Thu, 14 Aug 2008 00:17:52 +0000 (17:17 -0700)]
sparc64: Handle stack trace attempts before irqstacks are setup.
Things like lockdep can try to do stack backtraces before
the irqstack blocks have been setup. So don't try to match
their ranges so early on.
Also, remove unused variable in save_stack_trace().
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lezcano [Wed, 13 Aug 2008 23:15:57 +0000 (16:15 -0700)]
netns: Fix crash by making igmp per namespace
This patch makes the multicast socket to be per namespace.
When a network namespace is created, other than the init_net and a
multicast packet is received, the kernel goes to a hang or a kernel panic.
How to reproduce ?
* create a child network namespace
* create a pair virtual device veth
* ip link add type veth
* move one side to the pair network device to the child namespace
* ip link set netns <childpid> dev veth1
* ping -I veth0 224.0.0.1
The bug appears because the function ip_mc_init_dev does not initialize
the different multicast fields as it exits because it is not the init_net.
BUG: soft lockup - CPU#0 stuck for 61s! [avahi-daemon:2695]
Modules linked in:
irq event stamp: 50350
hardirqs last enabled at (50349): [<
c03ee949>] _spin_unlock_irqrestore+0x34/0x39
hardirqs last disabled at (50350): [<
c03ec639>] schedule+0x9f/0x5ff
softirqs last enabled at (45712): [<
c0374d4b>] ip_setsockopt+0x8e7/0x909
softirqs last disabled at (45710): [<
c03ee682>] _spin_lock_bh+0x8/0x27
Pid: 2695, comm: avahi-daemon Not tainted (2.6.27-rc2-00029-g0872073 #3)
EIP: 0060:[<
c03ee47c>] EFLAGS:
00000297 CPU: 0
EIP is at __read_lock_failed+0x8/0x10
EAX:
c4f38810 EBX:
c4f38810 ECX:
00000000 EDX:
c04cc22e
ESI:
fb0000e0 EDI:
00000011 EBP:
0f02000a ESP:
c4e3faa0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0:
8005003b CR2:
44618a40 CR3:
04e37000 CR4:
000006d0
DR0:
00000000 DR1:
00000000 DR2:
00000000 DR3:
00000000
DR6:
ffff0ff0 DR7:
00000400
[<
c02311f8>] ? _raw_read_lock+0x23/0x25
[<
c0390666>] ? ip_check_mc+0x1c/0x83
[<
c036d478>] ? ip_route_input+0x229/0xe92
[<
c022e2e4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<
c0104c9c>] ? do_IRQ+0x69/0x7d
[<
c0102e64>] ? restore_nocheck_notrace+0x0/0xe
[<
c036fdba>] ? ip_rcv+0x227/0x505
[<
c0358764>] ? netif_receive_skb+0xfe/0x2b3
[<
c03588d2>] ? netif_receive_skb+0x26c/0x2b3
[<
c035af31>] ? process_backlog+0x73/0xbd
[<
c035a8cd>] ? net_rx_action+0xc1/0x1ae
[<
c01218a8>] ? __do_softirq+0x7b/0xef
[<
c0121953>] ? do_softirq+0x37/0x4d
[<
c035b50d>] ? dev_queue_xmit+0x3d4/0x40b
[<
c0122037>] ? local_bh_enable+0x96/0xab
[<
c035b50d>] ? dev_queue_xmit+0x3d4/0x40b
[<
c012181e>] ? _local_bh_enable+0x79/0x88
[<
c035fcb8>] ? neigh_resolve_output+0x20f/0x239
[<
c0373118>] ? ip_finish_output+0x1df/0x209
[<
c0373364>] ? ip_dev_loopback_xmit+0x62/0x66
[<
c0371db5>] ? ip_local_out+0x15/0x17
[<
c0372013>] ? ip_push_pending_frames+0x25c/0x2bb
[<
c03891b8>] ? udp_push_pending_frames+0x2bb/0x30e
[<
c038a189>] ? udp_sendmsg+0x413/0x51d
[<
c038a1a9>] ? udp_sendmsg+0x433/0x51d
[<
c038f927>] ? inet_sendmsg+0x35/0x3f
[<
c034f092>] ? sock_sendmsg+0xb8/0xd1
[<
c012d554>] ? autoremove_wake_function+0x0/0x2b
[<
c022e6de>] ? copy_from_user+0x32/0x5e
[<
c022e6de>] ? copy_from_user+0x32/0x5e
[<
c034f238>] ? sys_sendmsg+0x18d/0x1f0
[<
c0175e90>] ? pipe_write+0x3cb/0x3d7
[<
c0170347>] ? do_sync_write+0xbe/0x105
[<
c012d554>] ? autoremove_wake_function+0x0/0x2b
[<
c03503b2>] ? sys_socketcall+0x176/0x1b0
[<
c01085ea>] ? syscall_trace_enter+0x6c/0x7b
[<
c0102e1a>] ? syscall_call+0x7/0xb
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:59:45 +0000 (15:59 -0700)]
bnx2x: Version update
Version update
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:59:25 +0000 (15:59 -0700)]
bnx2x: Checkpatch compliance
Checkpatch compliance
The latest version of checkpatch found the following style errors in the
code
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:59:08 +0000 (15:59 -0700)]
bnx2x: Spelling mistakes
Spelling mistakes
Spelling has to L's in it...
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:58:49 +0000 (15:58 -0700)]
bnx2x: Minor code improvements
Minor code improvements
Small changes to make the code a little bit more efficient and mostly
more readable:
- Using unified macros for EMAC_RD/WR which looks like normal REG_RD/WR
- Removing the NIG_WR since it did nothing and was only confusing
- On bnx2x_panic_dump, print only the used parts of the rings
- define parameters only on the branch they are needed and not at the
beginning of the function
- using NETIF_MSG_INTR and not private BNX2X_MSG_SP for debug prints
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:58:30 +0000 (15:58 -0700)]
bnx2x: Driver info
Driver info
The internal FW which is downloaded by the driver should not be
displayed - it is only causing confusion and it is redundant since it
can be concluded from the driver version. Display only FW which is
burned on the board nvram
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:58:12 +0000 (15:58 -0700)]
bnx2x: 1G LED does not turn off
1G LED does not turn off
The 1G LED was not switched to off when the link was lost
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Wed, 13 Aug 2008 22:57:28 +0000 (15:57 -0700)]
bnx2x: 8073 PHY changes
8073 PHY changes
The initial support we had for this PHY needs some serious changing. The
major change is that this PHY should be initialized only when the first
function is loaded and not for each function. The official SPI-ROM of
this PHY was released and it requires some changes in the initialization
code as well
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:56:59 +0000 (15:56 -0700)]
bnx2x: Change GPIO for any port
Change GPIO for any port
The set GPIO function should receive the port index to allow changing
the GPIO of another port. This is needed for the common init phase (one
the first driver is loaded for the chip)
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Wed, 13 Aug 2008 22:56:17 +0000 (15:56 -0700)]
bnx2x: Pause settings
Pause settings
- 1G pause was not working due to missing write to the emac block
(TX_MODE_FLOW_EN)
- The flow control should use the negotiated result (after autoneg) so
we should save both the requested autoneg and the result
- The HW credits with flow control at 1G speed were not optimized and
caused low throughput
- It is recommended to turn off flow control if the MTU is bigger than
5000B due to internal buffers size
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Wed, 13 Aug 2008 22:55:28 +0000 (15:55 -0700)]
bnx2x: Link order with external PHY
Link order with external PHY
When external PHY exists (second chip with the PHY to translate to
another physical medium) the link with the eternal PHY and the network
should be established before setting the link between the 5771x and the
PHY. This is the right order and it is important when using autoneg -
the link to the network should use the autoneg and the link between the
two chips should be forced to the network result.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladislav Zolotarov [Wed, 13 Aug 2008 22:53:38 +0000 (15:53 -0700)]
bnx2x: No LRO without Rx checksum
No LRO without Rx checksum
Disabling LRO when Rx checksum is disabled
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:53:12 +0000 (15:53 -0700)]
bnx2x: Wrong structure size
Wrong structure size
The wrong structure was used in the sizeof to clear (luckily both
structures have the same size in this version...)
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:52:46 +0000 (15:52 -0700)]
bnx2x: WoL capability
WoL capability
All designs reported WoL capability regardless of HW limitations - check
if this device is actually capable of WoL
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:52:28 +0000 (15:52 -0700)]
bnx2x: Clearing MAC addresses filters
Clearing MAC addresses filters
When the driver unloads, it should clear the MAC addresses filters in
the HW - this prevents packets from entering the chip when the driver is
re-loaded before initializing the right filters
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:52:08 +0000 (15:52 -0700)]
bnx2x: Delay in while loops
Delay in while loops
The delay in the loop should be after the change. This has very little
effect (can save one delay) but it is the right thing to do
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:51:48 +0000 (15:51 -0700)]
bnx2x: PBA Table Page Alignment Workaround
PBA Table Page Alignment Workaround
The PBA table starts on the middle of the page and that's causing very
low performance with virtualization. The solution is not to update via
the BAR directly but via chip access to the same memory
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:51:28 +0000 (15:51 -0700)]
bnx2x: Self-test false positive
Self-test false positive
- The memory test should use a mask according to the chip type
- In the register test, check the port only once and not inside the for
loop (not causing a failure - just ugly)
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:51:07 +0000 (15:51 -0700)]
bnx2x: Memory allocation
Memory allocation
- The CQE ring was allocated to the max size even for a chip that does
not support it. Fixed to allocate according to the chip type to save
memory
- The rx_page_ring was not freed on driver unload
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:50:45 +0000 (15:50 -0700)]
bnx2x: HW attention lock
HW attention lock
Making sure that only one function will handle the HW attention. This
makes the device parameter aeu_mask redundant so it is removed
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:50:23 +0000 (15:50 -0700)]
bnx2x: HW lock mechanism
HW lock mechanism
Enhancing the HW lock to work per function and not only per port - this
is needed for the next patch that protects races over HW attention
detection between the different functions. At this chance, changing the
functions names to be more inline with the current naming convention
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladislav Zolotarov [Wed, 13 Aug 2008 22:50:00 +0000 (15:50 -0700)]
bnx2x: Load/Unload under traffic
Load/Unload under traffic
Few issues were found when loading and unloading under traffic:
- When receiving Tx interrupt call netif_wake_queue if the queue is
stopped but the state is open
- Check that interrupts are enabled before doing anything else on the
msix_fp_int function
- In nic_load, enable the interrupts only when needed and ready for it
- Function stop_leading returns status since it can fail
- Add 1ms delay when unloading the driver to validate that there are no
open transactions that already started by the FW
- Splitting the "has work" function into Tx and Rx so the same function
will be used on unload and interrupts
- Do not request for WoL if only resetting the device (save the time
that it takes the FW to set the link after reset)
- Fixing the device reset after iSCSI boot and before driver load - all
internal buffers must be cleared before the driver is loaded
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:49:35 +0000 (15:49 -0700)]
bnx2x: FW Internal Memory structure
FW Internal Memory structure
The FW uses data structures on the chip internal memory to aggregate the
connections when TPA is enabled. The driver was clearing the wrong offsets
and therefore one function could cause another function to loose packets.
Changing the initialization of the chip internal memory to clear only the
relevant memory for each function which is being loaded
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yitchak Gertner [Wed, 13 Aug 2008 22:49:05 +0000 (15:49 -0700)]
bnx2x: Statistics
Statistics
- Making sure that each drop is accounted for in the driver statistics
- Clearing the FW statistics when driver is loaded to prevent
inconsistency with HW statistics
- Once error is detected (bnx2x_panic_dump), stop the statistics
before other actions (currently it is stopped last and can corrupt
the data) - Adding HW checksum error counter to the statistics
- Removing unused variable stats_ticks
- Using macros instead of magic numbers to indicate which statistics are
shared per port and which are per function
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:48:29 +0000 (15:48 -0700)]
bnx2x: Not dropping packets with L3/L4 checksum error
Not dropping packets with L3/L4 checksum error
Those packets should be passed to the OS. The problem is clear in
forwarding mode.
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eilon Greenstein [Wed, 13 Aug 2008 22:47:33 +0000 (15:47 -0700)]
bnx2x: FW (bootcode) interface fixes
FW (bootcode) interface fixes
- Making sure that the device will not cause kernel panic of the
bootcode is corrupted or missing
- Removing module debug parameter "nomcp" since no one should work
without the bootcode (this is a left over from the chip bring up days)
- Instead of waiting fix amount of time for bootcode response, sample it
every 10ms (usually the answer is ready after less than 10ms)
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 13 Aug 2008 22:24:35 +0000 (15:24 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore()
crypto: hash - Add missing top-level functions
crypto: hash - Fix digest size check for digest type
crypto: tcrypt - Fix AEAD chunk testing
crypto: talitos - Add handling for SEC 3.x treatment of link table
Jarek Poplawski [Wed, 13 Aug 2008 22:20:24 +0000 (15:20 -0700)]
pkt_sched: Protect gen estimators under est_lock.
gen_kill_estimator() required rtnl_lock() protection, but since it is
moved to an RCU callback __qdisc_destroy() let's use est_lock instead.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Aug 2008 22:18:38 +0000 (15:18 -0700)]
pkt_sched: Fix queue quiescence testing in dev_deactivate().
Based upon discussions with Jarek P. and Herbert Xu.
First, we're testing the wrong qdisc. We just reset the device
queue qdiscs to &noop_qdisc and checking it's state is completely
pointless here.
We want to wait until the previous qdisc that was sitting at
the ->qdisc pointer is not busy any more. And that would be
->qdisc_sleeping.
Because of how we propagate the samples qdisc pointer down into
qdisc_run and friends via per-cpu ->output_queue and netif_schedule,
we have to wait also for the __QDISC_STATE_SCHED bit to clear as
well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 13 Aug 2008 22:17:49 +0000 (15:17 -0700)]
Merge git://oss.sgi.com:8090/xfs/linux-2.6
* git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits)
[XFS] Fix use after free in xfs_log_done().
[XFS] Make xfs_bmap_*_count_leaves void.
[XFS] Use KM_NOFS for debug trace buffers
[XFS] use KM_MAYFAIL in xfs_mountfs
[XFS] refactor xfs_mount_free
[XFS] don't call xfs_freesb from xfs_unmountfs
[XFS] xfs_unmountfs should return void
[XFS] cleanup xfs_mountfs
[XFS] move root inode IRELE into xfs_unmountfs
[XFS] stop using file_update_time
[XFS] optimize xfs_ichgtime
[XFS] update timestamp in xfs_ialloc manually
[XFS] remove the sema_t from XFS.
[XFS] replace dquot flush semaphore with a completion
[XFS] replace inode flush semaphore with a completion
[XFS] extend completions to provide XFS object flush requirements
[XFS] replace the XFS buf iodone semaphore with a completion
[XFS] clean up stale references to semaphores
[XFS] use get_unaligned_* helpers
[XFS] Fix compile failure in xfs_buf_trace()
...
Jarek Poplawski [Wed, 13 Aug 2008 22:16:43 +0000 (15:16 -0700)]
pkt_sched: Fix oops in htb_delete.
Recent changes introduced a bug in htb_delete(): cl->parent->children
counter update misses checking cl->parent for NULL, which is used for
root classes, so deleting them causes an oops.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 13 Aug 2008 22:16:10 +0000 (15:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
dlm: rename structs
dlm: add missing kfrees
Andrew Gallatin [Wed, 13 Aug 2008 22:16:00 +0000 (15:16 -0700)]
pktgen: prevent pktgen from using bad tx queue
With the new multi-queue transmit code, it is possible to accidentally
make pktgen pick a non-existing tx queue simply by using a stale
script to drive pktgen. Access to this non-existing tx queue will
then trigger a bad memory access and kill the machine.
For example, setting "queue_map_max 2" will cause my machine to die
when accessing a garbage spinlock in the non-existing tx queue:
BUG: spinlock bad magic on CPU#0, kpktgend_0/564
lock:
ffff88001ddf6718, .magic:
ffffffff, .owner: /-1, .owner_cpu: 0
Pid: 564, comm: kpktgend_0 Not tainted 2.6.27-rc3 #35
Call Trace:
[<
ffffffff803a1228>] spin_bug+0xa4/0xac
[<
ffffffff803a1253>] _raw_spin_lock+0x23/0x123
[<
ffffffff8055b06f>] _spin_lock_bh+0x17/0x1b
[<
ffffffff804cb57d>] pktgen_thread_worker+0xa97/0x1002
[<
ffffffff8022874d>] ? finish_task_switch+0x38/0x97
[<
ffffffff80242077>] ? autoremove_wake_function+0x0/0x36
[<
ffffffff80242077>] ? autoremove_wake_function+0x0/0x36
[<
ffffffff804caae6>] ? pktgen_thread_worker+0x0/0x1002
[<
ffffffff80241a40>] kthread+0x44/0x6d
[<
ffffffff8020c399>] child_rip+0xa/0x11
[<
ffffffff802419fc>] ? kthread+0x0/0x6d
[<
ffffffff8020c38f>] ? child_rip+0x0/0x11
The attached patch adds some sanity checking to prevent
these sorts of configuration errors.
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 13 Aug 2008 21:26:22 +0000 (14:26 -0700)]
[h8300] move include/asm-h8300 to arch/h8300/include/asm
Done as a script (well, a single "git mv" actually) on request from
Yoshinori Sato as a way to avoid a huge diff.
Requested-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnaldo Carvalho de Melo [Wed, 13 Aug 2008 20:48:39 +0000 (13:48 -0700)]
dccp: change L/R must have at least one byte in the dccpsf_val field
Thanks to Eugene Teo for reporting this problem.
Signed-off-by: Eugene Teo <eugenete@kernel.sg>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean-Christophe DUBOIS [Wed, 13 Aug 2008 20:35:37 +0000 (13:35 -0700)]
xfrm: remove unnecessary variable in xfrm_output_resume() 2nd try
Small fix removing an unnecessary intermediate variable.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Teigland [Thu, 31 Jul 2008 14:31:53 +0000 (09:31 -0500)]
dlm: rename structs
Add a dlm_ prefix to the struct names in config.c. This resolves a
conflict with struct node in particular, when include/linux/node.h
happens to be included.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Tue, 29 Jul 2008 20:21:19 +0000 (15:21 -0500)]
dlm: add missing kfrees
A couple of unlikely error conditions were missing a kfree on the error
exit path.
Reported-by: Juha Leppanen <juha_motorsportcom@luukku.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Suresh Siddha [Wed, 13 Aug 2008 12:02:26 +0000 (22:02 +1000)]
crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore()
Wolfgang Walter reported this oops on his via C3 using padlock for
AES-encryption:
##################################################################
BUG: unable to handle kernel NULL pointer dereference at
000001f0
IP: [<
c01028c5>] __switch_to+0x30/0x117
*pde =
00000000
Oops: 0002 [#1] PREEMPT
Modules linked in:
Pid: 2071, comm: sleep Not tainted (2.6.26 #11)
EIP: 0060:[<
c01028c5>] EFLAGS:
00010002 CPU: 0
EIP is at __switch_to+0x30/0x117
EAX:
00000000 EBX:
c0493300 ECX:
dc48dd00 EDX:
c0493300
ESI:
dc48dd00 EDI:
c0493530 EBP:
c04cff8c ESP:
c04cff7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process sleep (pid: 2071, ti=
c04ce000 task=
dc48dd00 task.ti=
d2fe6000)
Stack:
dc48df30 c0493300 00000000 00000000 d2fe7f44 c03b5b43 c04cffc8 00000046
c0131856 0000005a dc472d3c c0493300 c0493470 d983ae00 00002696 00000000
c0239f54 00000000 c04c4000 c04cffd8 c01025fe c04f3740 00049800 c04cffe0
Call Trace:
[<
c03b5b43>] ? schedule+0x285/0x2ff
[<
c0131856>] ? pm_qos_requirement+0x3c/0x53
[<
c0239f54>] ? acpi_processor_idle+0x0/0x434
[<
c01025fe>] ? cpu_idle+0x73/0x7f
[<
c03a4dcd>] ? rest_init+0x61/0x63
=======================
Wolfgang also found out that adding kernel_fpu_begin() and kernel_fpu_end()
around the padlock instructions fix the oops.
Suresh wrote:
These padlock instructions though don't use/touch SSE registers, but it behaves
similar to other SSE instructions. For example, it might cause DNA faults
when cr0.ts is set. While this is a spurious DNA trap, it might cause
oops with the recent fpu code changes.
This is the code sequence that is probably causing this problem:
a) new app is getting exec'd and it is somewhere in between
start_thread() and flush_old_exec() in the load_xyz_binary()
b) At pont "a", task's fpu state (like TS_USEDFPU, used_math() etc) is
cleared.
c) Now we get an interrupt/softirq which starts using these encrypt/decrypt
routines in the network stack. This generates a math fault (as
cr0.ts is '1') which sets TS_USEDFPU and restores the math that is
in the task's xstate.
d) Return to exec code path, which does start_thread() which does
free_thread_xstate() and sets xstate pointer to NULL while
the TS_USEDFPU is still set.
e) At the next context switch from the new exec'd task to another task,
we have a scenarios where TS_USEDFPU is set but xstate pointer is null.
This can cause an oops during unlazy_fpu() in __switch_to()
Now:
1) This should happen with or with out pre-emption. Viro also encountered
similar problem with out CONFIG_PREEMPT.
2) kernel_fpu_begin() and kernel_fpu_end() will fix this problem, because
kernel_fpu_begin() will manually do a clts() and won't run in to the
situation of setting TS_USEDFPU in step "c" above.
3) This was working before the fpu changes, because its a spurious
math fault which doesn't corrupt any fpu/sse registers and the task's
math state was always in an allocated state.
With out the recent lazy fpu allocation changes, while we don't see oops,
there is a possible race still present in older kernels(for example,
while kernel is using kernel_fpu_begin() in some optimized clear/copy
page and an interrupt/softirq happens which uses these padlock
instructions generating DNA fault).
This is the failing scenario that existed even before the lazy fpu allocation
changes:
0. CPU's TS flag is set
1. kernel using FPU in some optimized copy routine and while doing
kernel_fpu_begin() takes an interrupt just before doing clts()
2. Takes an interrupt and ipsec uses padlock instruction. And we
take a DNA fault as TS flag is still set.
3. We handle the DNA fault and set TS_USEDFPU and clear cr0.ts
4. We complete the padlock routine
5. Go back to step-1, which resumes clts() in kernel_fpu_begin(), finishes
the optimized copy routine and does kernel_fpu_end(). At this point,
we have cr0.ts again set to '1' but the task's TS_USEFPU is stilll
set and not cleared.
6. Now kernel resumes its user operation. And at the next context
switch, kernel sees it has do a FP save as TS_USEDFPU is still set
and then will do a unlazy_fpu() in __switch_to(). unlazy_fpu()
will take a DNA fault, as cr0.ts is '1' and now, because we are
in __switch_to(), math_state_restore() will get confused and will
restore the next task's FP state and will save it in prev tasks's FP state.
Remember, in __switch_to() we are already on the stack of the next task
but take a DNA fault for the prev task.
This causes the fpu leakage.
Fix the padlock instruction usage by calling them inside the
context of new routines irq_ts_save/restore(), which clear/restore cr0.ts
manually in the interrupt context. This will not generate spurious DNA
in the context of the interrupt which will fix the oops encountered and
the possible FPU leakage issue.
Reported-and-bisected-by: Wolfgang Walter <wolfgang.walter@stwm.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 5 Aug 2008 05:34:30 +0000 (13:34 +0800)]
crypto: hash - Add missing top-level functions
The top-level functions init/update/final were missing for ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 3 Aug 2008 13:19:43 +0000 (21:19 +0800)]
crypto: hash - Fix digest size check for digest type
The changeset
ca786dc738f4f583b57b1bba7a335b5e8233f4b0
crypto: hash - Fixed digest size check
missed one spot for the digest type. This patch corrects that
error.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 30 Jul 2008 08:23:51 +0000 (16:23 +0800)]
crypto: tcrypt - Fix AEAD chunk testing
My changeset
4b22f0ddb6564210c9ded7ba25b2a1007733e784
crypto: tcrpyt - Remove unnecessary kmap/kunmap calls
introduced a typo that broke AEAD chunk testing. In particular,
axbuf should really be xbuf.
There is also an issue with testing the last segment when encrypting.
The additional part produced by AEAD wasn't tested. Similarly, on
decryption the additional part of the AEAD input is mistaken for
corruption.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lee Nipper [Wed, 30 Jul 2008 08:26:57 +0000 (16:26 +0800)]
crypto: talitos - Add handling for SEC 3.x treatment of link table
Later SEC revision requires the link table (used for scatter/gather)
to have an extra entry to account for the total length in descriptor [4],
which contains cipher Input and ICV.
This only applies to decrypt, not encrypt.
Without this change, on 837x, a gather return/length error results
when a decryption uses a link table to gather the fragments.
This is observed by doing a ping with size of 1447 or larger with AES,
or a ping with size 1455 or larger with 3des.
So, add check for SEC compatible "fsl,3.0" for using extra link table entry.
Signed-off-by: Lee Nipper <lee.nipper@freescale.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jamal Hadi Salim [Wed, 13 Aug 2008 09:41:45 +0000 (02:41 -0700)]
net-sched: fix Action flushing return code
Flushing must consistently return ENOMEM on failure of any allocation
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Wed, 13 Aug 2008 09:41:22 +0000 (02:41 -0700)]
net-sched: Fix actions flushing
Flushing of actions has been broken since we changed
the semantics of netlink parsed tb[X] to mean X is an attribute type.
This makes the flushing work.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Brunel [Wed, 13 Aug 2008 09:40:48 +0000 (02:40 -0700)]
net/rxrpc: Use an IS_ERR test rather than a NULL test
In case of error, the function rxrpc_get_transport returns an ERR
pointer, but never returns a NULL pointer. So after a call to this
function, a NULL test should be replaced by an IS_ERR test.
A simplified version of the semantic patch that makes this change is
as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = rxrpc_get_transport(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
)
S1
else S2
...>
? x = E;
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Wed, 13 Aug 2008 09:39:56 +0000 (02:39 -0700)]
wext: Send name on events
In the minimal the wireless extensions oughta send at least
the name in addition to the ifindex.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rami Rosen [Wed, 13 Aug 2008 09:35:39 +0000 (02:35 -0700)]
ipv6: Kill unused ip6_prohibit_entry and ip6_blk_hole_entry declarations.
This patch removes ip6_prohibit_entry and ip6_blk_hole_entry
declarations from include/net/ip6_route.h as they are unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rami Rosen [Wed, 13 Aug 2008 09:34:39 +0000 (02:34 -0700)]
ipv6: ip6_route.h cleanup.
This patch removes rt6_lock declaration from include/net/ip6_route.h
as it is unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Wed, 13 Aug 2008 09:32:06 +0000 (02:32 -0700)]
net/tipc/subscr.c: don't use ___constant_swab32
It's an internal implementation detail which we _should_ be free to change.
So we did, and it promptly broke.
The compiler shold be able to work out when to use the __constant version
anyway.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Aug 2008 09:13:34 +0000 (02:13 -0700)]
pkt_sched: Add queue stopped test back to qdisc_run().
Based upon a bug report by Andrew Gallatin on netdev
with subject "CPU utilization increased in 2.6.27rc"
In commit
37437bb2e1ae8af470dfcd5b4ff454110894ccaf
("pkt_sched: Schedule qdiscs instead of netdev_queue.")
the test of the queue being stopped was erroneously
removed from qdisc_run().
When the TX queue of the device fills up, this omission
causes lots of extraneous useless work to be queued up
to softirq context, where we'll just return immediately
because the device is still stuffed up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Haley [Wed, 13 Aug 2008 08:58:57 +0000 (01:58 -0700)]
ipv6: Fix OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, ip6_route_output, rt6_fill_node+0x175
Alexey Dobriyan wrote:
> On Thu, Aug 07, 2008 at 07:00:56PM +0200, John Gumb wrote:
>> Scenario: no ipv6 default route set.
>
>> # ip -f inet6 route get fec0::1
>>
>> BUG: unable to handle kernel NULL pointer dereference at
00000000
>> IP: [<
c0369b85>] rt6_fill_node+0x175/0x3b0
>> EIP is at rt6_fill_node+0x175/0x3b0
>
> 0xffffffff80424dd3 is in rt6_fill_node (net/ipv6/route.c:2191).
> 2186 } else
> 2187 #endif
> 2188 NLA_PUT_U32(skb, RTA_IIF, iif);
> 2189 } else if (dst) {
> 2190 struct in6_addr saddr_buf;
> 2191 ====> if (ipv6_dev_get_saddr(ip6_dst_idev(&rt->u.dst)->dev,
> ^^^^^^^^^^^^^^^^^^^^^^^^
> NULL
>
> 2192 dst, 0, &saddr_buf) == 0)
> 2193 NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
> 2194 }
The commit that changed this can't be reverted easily, but the patch
below works for me.
Fix NULL de-reference in rt6_fill_node() when there's no IPv6 input
device present in the dst entry.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lachlan McIlroy [Wed, 13 Aug 2008 06:52:50 +0000 (16:52 +1000)]
[XFS] Fix use after free in xfs_log_done().
The ticket allocation code got reworked in 2.6.26 and we now free tickets
whereas before we used to cache them so the use-after-free went
undetected.
SGI-PV: 985525
SGI-Modid: xfs-linux-melb:xfs-kern:31877a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Ruben Porras [Wed, 13 Aug 2008 06:52:25 +0000 (16:52 +1000)]
[XFS] Make xfs_bmap_*_count_leaves void.
xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return always
0, make them void.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31844a
Signed-off-by: Ruben Porras <ruben.porras@linworks.de>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Wed, 13 Aug 2008 06:51:57 +0000 (16:51 +1000)]
[XFS] Use KM_NOFS for debug trace buffers
Use KM_NOFS to prevent recursion back into the filesystem which can cause
deadlocks.
In the case of xfs_iread() we hold the lock on the inode cluster buffer
while allocating memory for the trace buffers. If we recurse back into XFS
to flush data that may require a transaction to allocate extents which
needs log space. This can deadlock with the xfsaild thread which can't
push the tail of the log because it is trying to get the inode cluster
buffer lock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31838a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:51:29 +0000 (16:51 +1000)]
[XFS] use KM_MAYFAIL in xfs_mountfs
Use KM_MAYFAIL for the m_perag allocation, we can deal with the error
easily and blocking forever during mount is not a good idea either.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31837a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:50:47 +0000 (16:50 +1000)]
[XFS] refactor xfs_mount_free
xfs_mount_free mostly frees the perag data, which is something that is
duplicated in the mount error path.
Move the XFS_QM_DONE call to the caller and remove the useless
mutex_destroy/spinlock_destroy calls so that we can re-use it for the
mount error path. Also rename it to xfs_free_perag to reflect what it
does.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31836a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:50:21 +0000 (16:50 +1000)]
[XFS] don't call xfs_freesb from xfs_unmountfs
xfs_readsb is called before xfs_mount so xfs_freesb should be called after
xfs_unmountfs, too. This means it now happens after a few things during
the of xfs_unmount which all have nothing to do with the superblock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31835a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:49:57 +0000 (16:49 +1000)]
[XFS] xfs_unmountfs should return void
xfs_unmounts can't and shouldn't return errors so declare it as returning
void.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31833a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:49:32 +0000 (16:49 +1000)]
[XFS] cleanup xfs_mountfs
Remove all the useless flags and code keyed off it in xfs_mountfs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31831a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:49:04 +0000 (16:49 +1000)]
[XFS] move root inode IRELE into xfs_unmountfs
The root inode is allocated in xfs_mountfs so it should be release in
xfs_unmountfs. For the unmount case that means we do it after the the
xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
dmapi unmount event. Note that both reference the rip variable which might
be freed by that time in case inode flushing has kicked in, so strictly
speaking this might count as a bug fix
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31830a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:48:12 +0000 (16:48 +1000)]
[XFS] stop using file_update_time
xfs_ichtime updates the xfs_inode and Linux inode timestamps just fine, no
need to call file_update_time and then copy the values over to the XFS
inode. The only additional thing in file_update_time are checks not
applicable to the write path.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31829a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:45:13 +0000 (16:45 +1000)]
[XFS] optimize xfs_ichgtime
Port a little optmization from file_update_time to xfs_ichgtime, and only
update the timestamp and mark the inode dirty if the timestamp actually
changes in the timer tick resultion supported by the running kernel.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31827a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:44:15 +0000 (16:44 +1000)]
[XFS] update timestamp in xfs_ialloc manually
In xfs_ialloc we just want to set all timestamps to the current time. We
don't need to mark the inode dirty like xfs_ichgtime does, and we don't
need nor want the opimizations in xfs_ichgtime that I will introduce in
the next patch.
So just opencode the timestamp update in xfs_ialloc, and remove the new
unused XFS_ICHGTIME_ACC case in xfs_ichgtime.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31825a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:42:10 +0000 (16:42 +1000)]
[XFS] remove the sema_t from XFS.
Now that all users of the sema_t are gone from XFS we can finally kill it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31823a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:41:43 +0000 (16:41 +1000)]
[XFS] replace dquot flush semaphore with a completion
Use the new completion flush code to implement the dquot flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31822a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:41:16 +0000 (16:41 +1000)]
[XFS] replace inode flush semaphore with a completion
Use the new completion flush code to implement the inode flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31817a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:40:43 +0000 (16:40 +1000)]
[XFS] extend completions to provide XFS object flush requirements
XFS object flushing doesn't quite match existing completion semantics. It
mixed exclusive access with completion. That is, we need to mark an object as
being flushed before flushing it to disk, and then block any other attempt to
flush it until the completion occurs. We do this but adding an extra count to
the completion before we start using them. However, we still need to
determine if there is a completion in progress, and allow no-blocking attempts
fo completions to decrement the count.
To do this we introduce:
int try_wait_for_completion(struct completion *x)
returns a failure status if done == 0, otherwise decrements done
to zero and returns a "started" status. This is provided
to allow counted completions to begin safely while holding
object locks in inverted order.
int completion_done(struct completion *x)
returns 1 if there is no waiter, 0 if there is a waiter
(i.e. a completion in progress).
This replaces the use of semaphores for providing this exclusion
and completion mechanism.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31816a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:36:11 +0000 (16:36 +1000)]
[XFS] replace the XFS buf iodone semaphore with a completion
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31815a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:34:31 +0000 (16:34 +1000)]
[XFS] clean up stale references to semaphores
A lot of code has been converted away from semaphores, but there are still
comments that reference semaphore behaviour. The log code is the worst
offender. Update the comments to reflect what the code really does now.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31814a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Harvey Harrison [Wed, 13 Aug 2008 06:29:21 +0000 (16:29 +1000)]
[XFS] use get_unaligned_* helpers
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31813a
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Wed, 13 Aug 2008 06:28:40 +0000 (16:28 +1000)]
[XFS] Fix compile failure in xfs_buf_trace()
SGI-PV: 957103
SGI-Modid: xfs-linux-melb:xfs-kern:31804a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:25:27 +0000 (16:25 +1000)]
[XFS] Use the same btree_cur union member for alloc and inobt trees.
The alloc and inobt btree use the same agbp/agno pair in the btree_cur
union. Make them use the same bc_private.a union member so that code for
these two short form btree implementations can be shared.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31788a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:23:50 +0000 (16:23 +1000)]
[XFS] small cleanups in xfs_btree.c
Remove unneeded xfs_btree_get_block forward declaration. Move
xfs_btree_firstrec next to xfs_btree_lastrec.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31787a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:23:13 +0000 (16:23 +1000)]
[XFS] sanitize xfs_initialize_vnode
Sanitize setting up the Linux indode.
Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now
because that's the only place it needs to be done, xfs_initialize_vnode is
renamed to xfs_setup_inode and loses all superflous paramaters. The check
for I_NEW is removed because it always is true and the di_mode check moves
into xfs_iget_core because it's only needed there.
xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
and the whole things is moved into xfs_iops.c where it belongs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31782a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:22:40 +0000 (16:22 +1000)]
[XFS] kill bhv_vnode_t
All remaining bhv_vnode_t instance are in code that's more or less Linux
specific. (Well, for xfs_acl.c that could be argued, but that code is on
the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
whole tree. We can clean up variable naming and some useless helpers
later.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31781a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:22:09 +0000 (16:22 +1000)]
[XFS] remove some easy bhv_vnode_t instances
In various places we can just move a VFS_I call into the argument list of
called functions/macros instead of having a local bhv_vnode_t.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31776a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:18:07 +0000 (16:18 +1000)]
[XFS] kill xfs_lock_dir_and_entry
When multiple inodes are locked in XFS it happens in order of the inode
number, with the everything but the first inode trylocked if any of the
previous inodes is in the AIL.
Except for the sorting of the inodes this logic is implemented in
xfs_lock_inodes, but also partially duplicated in xfs_lock_dir_and_entry
in a particularly stupid way adds a lock roundtrip if the inode ordering
is not optimal.
This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
locks them in the most optimal way according to the above locking protocol
and uses it for all places that want to lock two inodes.
The only caller of xfs_lock_inodes is xfs_rename which might lock up to
four inodes.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31772a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:17:37 +0000 (16:17 +1000)]
[XFS] kill INDUCE_IO_ERROR
All the error injection is already enabled through ifdef DEBUG, so kill
the never set second cpp symbol to activate it without the rest of the
debugging infrastructure.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31771a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:13:45 +0000 (16:13 +1000)]
[XFS] implement IHOLD/IRELE directly
Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
IHOLD/IRELE directly.
For the IHOLD case also replace igrab with a direct increment of i_count
because we are guaranteed to already have a live and referenced inode by
the VFS. Also remove the vn_hold statistic because it's been rather
meaningless for some time with most references done by other callers.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31764a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:13:09 +0000 (16:13 +1000)]
[XFS] remove remaining VN_HOLD calls
Use IHOLD(ip) instead of VN_HOLD(VFS_I(ip)).
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31765a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:12:37 +0000 (16:12 +1000)]
[XFS] remove spurious VN_HOLD/VN_RELE calls from xfs_acl.c
All the ACL routines are called from inode operations which are guaranteed
to have a referenced inode by the VFS, so there's no need for the ACL code
to grab another temporary one.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31763a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:12:05 +0000 (16:12 +1000)]
[XFS] kill vn_to_inode
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31761a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:11:26 +0000 (16:11 +1000)]
[XFS] Remove vn_from_inode()
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31760a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Eric Sandeen [Wed, 13 Aug 2008 06:10:52 +0000 (16:10 +1000)]
[XFS] remove shouting-indirection macros from xfs_trans.h
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31758a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Eric Sandeen [Wed, 13 Aug 2008 06:09:25 +0000 (16:09 +1000)]
[XFS] convert xfs to use ERR_CAST
Looks like somehow xfs got missed in the conversion that took place in
e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f, "Convert ERR_PTR(PTR_ERR(p))
instances to ERR_CAST(p)
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit
diff;h=
e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f>"
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31757a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Eric Sandeen [Wed, 13 Aug 2008 06:07:53 +0000 (16:07 +1000)]
[XFS] remove INT_GET and friends
Thanks to hch's endian work, INT_GET etc are no longer used, and may as
well be removed. INT_SET is still used in the acl code, though.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31756a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Niv Sardi [Wed, 13 Aug 2008 06:05:49 +0000 (16:05 +1000)]
[XFS] Move xfs_attr_rolltrans to xfs_trans_roll
Move it from the attr code to the transaction code and make
the attr code call the new function.
We rolltrans is really usefull whenever we want to use rolling
transaction, should be generic, it isn't dependent on any part
of the attr code anyway.
We use this excuse to change all the:
if ((error = xfs_attr_rolltrans()))
calls into:
error = xfs_trans_roll();
if (error)
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31729a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Christoph Hellwig [Wed, 13 Aug 2008 06:04:05 +0000 (16:04 +1000)]
[XFS] don't leak m_fsname/m_rtname/m_logname
Add a helper to free the m_fsname/m_rtname/m_logname allocations and use
it properly for all mount failure cases. Also switch the allocations for
these to kstrdup while we're at it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31728a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Niv Sardi [Wed, 13 Aug 2008 06:03:35 +0000 (16:03 +1000)]
[XFS] Move attr log alloc size calculator to another function.
We will need that to be able to calculate the size of log we need for a
specific attr (for Create+EA). The local flag is needed so that we can
fail if we run into ENOSPC when trying to alloc blocks.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31727a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:02:51 +0000 (16:02 +1000)]
[XFS] Use KM_NOFS for incore inode extent tree allocation V2
If we allow incore extent tree allocations to recurse into the
filesystem under memory pressure, new delayed allocations through
xfs_iomap_write_delay() can deadlock on themselves if memory
reclaim tries to write back dirty pages from that inode.
It will deadlock in xfs_iomap_write_allocate() trying to take the
ilock we already hold. This can also show up as complex ABBA deadlocks
when multiple threads are triggering memory reclaim when trying to
allocate extents.
The main cause of this is the fact that delayed allocation is not done in
a transaction, so KM_NOFS is not automatically added to the allocations to
prevent this recursion.
Mark all allocations done for the incore inode extent tree as KM_NOFS to
ensure they never recurse back into the filesystem.
Version 2: o KM_NOFS implies KM_SLEEP, so just use KM_NOFS
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31726a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:01:45 +0000 (16:01 +1000)]
[XFS] XFS: Kill xfs_vtoi()
xfs_vtoi() is redundant and only unsed in small sections of code.
Replace them with widely used XFS_I() inline and kill xfs_vtoi().
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31725a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 06:00:45 +0000 (16:00 +1000)]
[XFS] Kill shouty XFS_ITOV() macro
Replace XFS_ITOV() with the new VFS_I() inline.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31724a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 05:47:43 +0000 (15:47 +1000)]
[XFS] kill shouty XFS_ITOV_NULL macro
Replace XFS_ITOV_NULL() with the new VFS_I() inline.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31722a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
David Chinner [Wed, 13 Aug 2008 05:45:15 +0000 (15:45 +1000)]
[XFS] Avoid directly referencing the VFS inode.
In several places we directly convert from the XFS inode
to the linux (VFS) inode by a simple deference of ip->i_vnode.
We should not do this - a helper function should be used to
extract the VFS inode from the XFS inode.
Introduce the function VFS_I() to extract the VFS inode
from the XFS inode. The name was chosen to match XFS_I() which
is used to extract the XFS inode from the VFS inode.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31720a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Wed, 13 Aug 2008 05:42:10 +0000 (15:42 +1000)]
[XFS] Do not access buffers after dropping reference count
We should not access a buffer after dropping it's reference count
otherwise we could race with another thread that releases the final
reference count and frees the buffer causing us to access potentially
unmapped memory. The bug this change fixes only occured on DEBUG XFS since
the offending code was in an ASSERT.
SGI-PV: 984429
SGI-Modid: xfs-linux-melb:xfs-kern:31715a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
David Chinner [Wed, 13 Aug 2008 05:41:12 +0000 (15:41 +1000)]
[XFS] Use the generic bitops rather than implementing them ourselves.
This keeps xfs_lowbit64 as it was since there aren't good generic helpers
there ... Patch inspired by Andi Kleen.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31472a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>