Ralph Campbell [Mon, 7 Jan 2008 05:02:34 +0000 (21:02 -0800)]
IB/ipath: Export hardware counters more consistently
Various hardware counters are exported via the ipath file system (since
it is binary data). The old file format was very dependent on the HW
offsets for these registers. Newer HCA chips can have different
counters at different offsets. This patch adds a level of indirection
to make the file format consistent across HCAs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Mon, 7 Jan 2008 05:02:33 +0000 (21:02 -0800)]
IB/ipath: MAD performance sampling registers support
Add support for QLogic HCAs which have hardware performance sampling
registers for PortSamplesControl and PortSamplesResult MADs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
David Dillow [Mon, 7 Jan 2008 23:23:41 +0000 (18:23 -0500)]
IB/srp: Add identifying information to log messages
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports. So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pradeep Satyanarayana [Fri, 21 Dec 2007 21:08:23 +0000 (13:08 -0800)]
IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.
This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
David Dillow [Wed, 19 Dec 2007 22:09:15 +0000 (17:09 -0500)]
IB/srp: Enable SG list chaining
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.
I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining. ipath looks like it should be OK, but I have no
way to test it.
Signed-off-by: David Dillow <dillowda@ornl.gov>
[ Tested on ipath. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
David Dillow [Wed, 19 Dec 2007 22:08:43 +0000 (17:08 -0500)]
IB/srp: Respect target credit limit
The current SRP initiator will send requests even if it has no credits
available. The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s. Other devices may just drop
the requests.
This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining. The mid-layer will retry
the queued command once an outstanding command completes.
The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rolf Manderscheid [Mon, 10 Dec 2007 20:38:41 +0000 (13:38 -0700)]
IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dave Olson [Thu, 6 Dec 2007 08:28:02 +0000 (00:28 -0800)]
IB/ipath: Changes for fields moving from devdata to portdata
This patch moves some arrays that were defined per-device to be
variables defined in the per context data structure, thus avoiding extra
kzalloc() calls.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dave Olson [Fri, 21 Dec 2007 09:50:59 +0000 (01:50 -0800)]
IB/ipath: Generalize some xxx_SHIFT macros
In preparation for upcoming chips that have different values for
INFINIPATH_R_PORTENABLE_SHIFT, INFINIPATH_R_INTRAVAIL_SHIFT,
INFINIPATH_R_TAILUPD_SHIFT, and portcfg_shift, remove the shared
#defines and use device-specific variables instead.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Thu, 20 Dec 2007 10:43:23 +0000 (02:43 -0800)]
IB/ipath: kreceive uses portdata rather than devdata
kreceive is now portdata * instead of devdata * and other kreceive
related cleanups....
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Sat, 22 Dec 2007 00:47:04 +0000 (16:47 -0800)]
IB/ipath: Cleanup ipath_get_egrbuf()
Remove an unused parameter and fix up the comment.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Sat, 15 Dec 2007 03:22:34 +0000 (19:22 -0800)]
IB/ipath: Fix RNR NAK handling
This patch fixes a couple of minor problems with RNR NAK handling:
- The insertion sort was causing extra delay when inserting ahead
vs. behind an existing entry on the list.
- A resend of a first packet of a message which is still not ready,
needs another RNR NAK (i.e., it was suppressed when it shouldn't).
- Also, the resend tasklet doesn't need to be woken up unless the
ACK/NAK actually indicates progress has been made.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Hoang-Nam Nguyen [Thu, 20 Dec 2007 14:06:33 +0000 (15:06 +0100)]
IB/ehca: Forward event client-reregister-required to registered clients
This patch allows ehca to forward event client-reregister-required to
registered clients. One such event is generated by a switch eg. after
its reboot.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 25 Jan 2008 22:15:34 +0000 (14:15 -0800)]
IB/mlx4: Micro-optimize mlx4_ib_poll_one()
Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
field from it, byte-swap it once into a temporary variable. This
results in smaller, better code -- eg, on 32-bit x86:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function old new delta
mlx4_ib_poll_cq 1188 1183 -5
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Adrian Bunk [Tue, 1 Jan 2008 13:47:10 +0000 (15:47 +0200)]
IB/mthca: Remove MSI support as scheduled
Remove MSI support from the mthca driver, as scheduled. There is no
reason to use MSI instead of MSI-X, since MSI-X performs better. No
one has spoken up since MSI support was deprecated in commit
f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Oliver Pinter [Fri, 25 Jan 2008 22:15:32 +0000 (14:15 -0800)]
IB/iser: Typo fix (s/destory/destroy/)
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Erez Zilber [Fri, 25 Jan 2008 22:15:32 +0000 (14:15 -0800)]
IB/iser: update URLs of iSER docs
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Sean Hefty [Wed, 14 Nov 2007 08:29:50 +0000 (00:29 -0800)]
RDMA/cma: add support for rdma_migrate_id()
This is based on user feedback from Doug Ledford at RedHat:
Events that occur on an rdma_cm_id are reported to userspace through an
event channel. Connection request events are reported on the event
channel associated with the listen. When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel. This is suboptimal where the user only wants listen events on
that channel.
Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.
Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Vladimir Sokolovsky [Sat, 8 Dec 2007 04:32:03 +0000 (20:32 -0800)]
RDMA/cma: Reenable device removal on passive side
Enable conn_id remove on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.
This problem is limited to device removal, and only occurs on the
passive side if there are established connections.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Sat, 1 Dec 2007 01:30:18 +0000 (17:30 -0800)]
IB/mad: Fix incorrect access to items on local_list
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing. However, the structures on these two
lists are not the same. The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private. Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private. This leads to a system
crash when requests are moved from the local_list to the cancel_list.
Fix this by leaving local_list alone. All requests on the local_list
have completed are just awaiting processing by a queued worker thread.
Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Tue, 17 Jul 2007 04:49:35 +0000 (21:49 -0700)]
IB/cm: Add basic performance counters
Add performance/debug counters to track sent/received messages, retries,
and duplicates. Counters are tracked per CM message type, per port.
The counters are always enabled, so intrusive state tracking is not done.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Tue, 27 Nov 2007 08:11:04 +0000 (00:11 -0800)]
IB/mad: Report number of times a mad was retried
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.
For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sean Hefty [Tue, 23 Oct 2007 04:52:54 +0000 (21:52 -0700)]
IB/multicast: Report errors on multicast groups if P_key changes
P_key changes can invalidate multicast groups. Report errors on all
multicast groups affected by a pkey change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Joe Perches [Mon, 17 Dec 2007 19:30:36 +0000 (11:30 -0800)]
IB: Spelling fixes in comments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Jack Morgenstein [Mon, 10 Dec 2007 03:25:23 +0000 (05:25 +0200)]
mlx4_core: Fix max_eqs masking in QUERY_DEV_CAP
log_max_eqs is a 4-bit field, not a 3-bit field in the response to the
QUERY_DEV_CAP FW command, so we should mask with 0xf instead of 0x7
when reading it.
Found by Yossi Leybovitch of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Nick Piggin [Thu, 13 Dec 2007 23:58:57 +0000 (15:58 -0800)]
IB/ipath: Convert from .nopage to .fault
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Sat, 3 Nov 2007 00:40:36 +0000 (17:40 -0700)]
IB/ipath: Add the work completion error code to the QP error debug output
Add the work completion error code to the QP error debug output.
This makes it easier to determine the cause of the error.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Arthur Jones [Thu, 18 Oct 2007 01:18:29 +0000 (18:18 -0700)]
IB/ipath: Better comment for rmb() in ipath_intr()
An internal code review found the comment here lacking -- update it with
more specifics of how and why the rmb() is there.
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Tue, 27 Nov 2007 07:44:15 +0000 (23:44 -0800)]
IB/ipath: Fix comments for ipath_create_srq()
During a code review, someone noticed the comments didn't match the code.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Fri, 9 Nov 2007 03:53:01 +0000 (19:53 -0800)]
IB/ipath: Fix error returned from ib_resize_cq if new size smaller than # entries
The gen2_basic tests check for the errno value when a CQ is resized
smaller than the number of outstanding completions queue on the CQ.
This patch changes ib_ipath to return EINVAL which is what ib_mthca
returns and what gen2_basic expects.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
John Gregor [Wed, 5 Sep 2007 08:57:14 +0000 (01:57 -0700)]
IB/ipath: Fix sendctrl locking
Code review pointed out that the locking around uses of ipath_sendctrl
and kr_sendctrl were, in several places, incorrect and/or inconsistent.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Sat, 13 Oct 2007 03:48:06 +0000 (20:48 -0700)]
IB/ipath: Remove dead code for user process waiting for send buffer
At one point in time there was code to allow a user process to
wait for a send buffer if none were available. This feature was
never used and most of the code was removed. This removes
some missed unused code.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Mon, 26 Nov 2007 17:28:46 +0000 (11:28 -0600)]
RDMA/cxgb3: Support version 5.0 firmware
The 5.0 firmware now supports translating sgls in recv work requests,
so remove the host driver logic currently doing the translation.
Note: this change requires 5.0 firmware.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Mon, 26 Nov 2007 17:28:44 +0000 (11:28 -0600)]
RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call
Currently the call into cxgb3 to get the driver info is not serialized.
The iw_cxgb3 module needs to hold the rtnl_lock around the ethtool ops
call like dev_ioctl() does.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Joe Perches [Tue, 20 Nov 2007 01:48:11 +0000 (17:48 -0800)]
drivers/infiniband: Add missing "space"
Add missing spaces in the middle of format strings.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Matthias Kaehlcke [Thu, 15 Nov 2007 23:23:25 +0000 (15:23 -0800)]
IB/ipath: Convert ipath_eep_sem semaphore to a mutex
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Acked-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Tested-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Welch [Tue, 23 Oct 2007 22:06:10 +0000 (15:06 -0700)]
IB/mad: Enable loopback of DR SMP responses from userspace
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function. This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local managment stack. In this specific case the
driver process_mad() function does not consume or process the MAD, so
a reponse mad has not be created and the original MAD must manually be
copied to the MAD buffer that is to be handed off to the local agent.
Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Tue, 23 Oct 2007 22:07:41 +0000 (15:07 -0700)]
IB/ipath: Enable loopback of DR SMP responses from userspace
This patch is in response to reviewing a patch to the core MAD
processing which fixes loopback of directed route packets to/from user
level MAD agents. This change enables the core code to work for
ib_ipath by fixing the return code from the ipath process_mad method.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Tue, 23 Oct 2007 22:04:15 +0000 (15:04 -0700)]
IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it. It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Mon, 29 Oct 2007 16:34:05 +0000 (11:34 -0500)]
RDMA/iwcm: Set initiator depth and responder resources to device max values
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dave Olson [Wed, 10 Oct 2007 12:10:35 +0000 (05:10 -0700)]
IB/ipath: Improve interrupt handler cache footprint
Improve interrupt handler cache footprint by noinline'ing error
functions that are rarely called.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pradeep Satyanarayana [Fri, 25 Jan 2008 22:15:24 +0000 (14:15 -0800)]
IPoIB/cm: Add connected mode support for devices without SRQs
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues). The current IPoIB connected mode support only works
on devices that support SRQs.
Fix this by adding support for using the receive queue of each
connected mode receive QP. The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs. To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.
The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions. Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.
Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ. We just move the QP to the error
state and wait for all pending receives to be flushed.
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
[ Completely rewritten and split up, based on Pradeep's work. Several
bugs fixed and no doubt several bugs introduced. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 25 Jan 2008 22:15:24 +0000 (14:15 -0800)]
IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()
Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one. This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 25 Jan 2008 22:15:24 +0000 (14:15 -0800)]
IPoIB/cm: Factor out ipoib_cm_create_srq()
Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq(). This
will make the code neater when support for devices that don't implement
SRQs is added.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 25 Jan 2008 22:15:24 +0000 (14:15 -0800)]
IPoIB/cm: Factor out ipoib_cm_free_rx_ring()
Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Wed, 24 Oct 2007 02:57:54 +0000 (19:57 -0700)]
IPoIB: Trivial formatting cleanups
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Fri, 25 Jan 2008 22:15:21 +0000 (14:15 -0800)]
IB/ipath: Fix crash on unload introduced by sysfs changes
Commit
23b9c1ab ("Infiniband: make ipath driver use default driver
groups.") introduced a bug in the ipath driver where
ipath_device_create_group() fell through into the error path, even on
success, which meant that the sysfs groups it created would always get
removed right away. This made ipath_device_remove_group() hit the
BUG_ON() in sysfs_remove_group() when it tried to remove those groups a
second time.
Correct the return path so that the groups stick around until they are
supposed to be cleaned up.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Linus Torvalds [Fri, 25 Jan 2008 16:44:29 +0000 (08:44 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
selinux: make mls_compute_sid always polyinstantiate
security/selinux: constify function pointer tables and fields
security: add a secctx_to_secid() hook
security: call security_file_permission from rw_verify_area
security: remove security_sb_post_mountroot hook
Security: remove security.h include from mm.h
Security: remove security_file_mmap hook sparse-warnings (NULL as 0).
Security: add get, set, and cloning of superblock security information
security/selinux: Add missing "space"
Linus Torvalds [Fri, 25 Jan 2008 16:40:02 +0000 (08:40 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hskinnemoen/avr32-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
[AVR32] extint: Set initial irq type to low level
[AVR32] extint: change set_irq_type() handling
[AVR32] NMI debugging
[AVR32] constify function pointer tables
[AVR32] ATNGW100: Update defconfig
[AVR32] ATSTK1002: Update defconfig
[AVR32] Kconfig: Choose daughterboard instead of CPU
[AVR32] Add support for ATSTK1003 and ATSTK1004
[AVR32] Clean up external DAC setup code
[AVR32] ATSTK1000: Move gpio-leds setup to setup.c
[AVR32] Add support for AT32AP7001 and AT32AP7002
[AVR32] Provide more CPU information in /proc/cpuinfo and dmesg
[AVR32] Oprofile support
[AVR32] Include instrumentation menu
Disable VGA text console for AVR32 architecture
[AVR32] Enable debugging only when needed
ptrace: Call arch_ptrace_attach() when request=PTRACE_TRACEME
[AVR32] Remove redundant try_to_freeze() call from do_signal()
[AVR32] Drop GFP_COMP for DMA memory allocations
Linus Torvalds [Fri, 25 Jan 2008 16:39:18 +0000 (08:39 -0800)]
Merge git://git./linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (56 commits)
[GFS2] Allow journal recovery on read-only mount
[GFS2] Lockup on error
[GFS2] Fix page_mkwrite truncation race path
[GFS2] Fix typo
[GFS2] Fix write alloc required shortcut calculation
[GFS2] gfs2_alloc_required performance
[GFS2] Remove unneeded i_spin
[GFS2] Reduce inode size by moving i_alloc out of line
[GFS2] Fix assert in log code
[GFS2] Fix problems relating to execution of files on GFS2
[GFS2] Initialize extent_list earlier
[GFS2] Allow page migration for writeback and ordered pages
[GFS2] Remove unused variable
[GFS2] Fix log block mapper
[GFS2] Minor correction
[GFS2] Eliminate the no longer needed sd_statfs_mutex
[GFS2] Incremental patch to fix compiler warning
[GFS2] Function meta_read optimization
[GFS2] Only fetch the dinode once in block_map
[GFS2] Reorganize function gfs2_glmutex_lock
...
Linus Torvalds [Fri, 25 Jan 2008 16:38:25 +0000 (08:38 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (125 commits)
[CRYPTO] twofish: Merge common glue code
[CRYPTO] hifn_795x: Fixup container_of() usage
[CRYPTO] cast6: inline bloat--
[CRYPTO] api: Set default CRYPTO_MINALIGN to unsigned long long
[CRYPTO] tcrypt: Make xcbc available as a standalone test
[CRYPTO] xcbc: Remove bogus hash/cipher test
[CRYPTO] xcbc: Fix algorithm leak when block size check fails
[CRYPTO] tcrypt: Zero axbuf in the right function
[CRYPTO] padlock: Only reset the key once for each CBC and ECB operation
[CRYPTO] api: Include sched.h for cond_resched in scatterwalk.h
[CRYPTO] salsa20-asm: Remove unnecessary dependency on CRYPTO_SALSA20
[CRYPTO] tcrypt: Add select of AEAD
[CRYPTO] salsa20: Add x86-64 assembly version
[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)
[CRYPTO] gcm: Introduce rfc4106
[CRYPTO] api: Show async type
[CRYPTO] chainiv: Avoid lock spinning where possible
[CRYPTO] seqiv: Add select AEAD in Kconfig
[CRYPTO] scatterwalk: Handle zero nbytes in scatterwalk_map_and_copy
[CRYPTO] null: Allow setkey on digest_null
...
Linus Torvalds [Fri, 25 Jan 2008 16:34:42 +0000 (08:34 -0800)]
Merge git://git./linux/kernel/git/gregkh/driver-2.6
This can be broken down into these major areas:
- Documentation updates (language translations and fixes, as
well as kobject and kset documenatation updates.)
- major kset/kobject/ktype rework and fixes. This cleans up the
kset and kobject and ktype relationship and architecture,
making sense of things now, and good documenation and samples
are provided for others to use. Also the attributes for
kobjects are much easier to handle now. This cleaned up a LOT
of code all through the kernel, making kobjects easier to use
if you want to.
- struct bus_type has been reworked to now handle the lifetime
rules properly, as the kobject is properly dynamic.
- struct driver has also been reworked, and now the lifetime
issues are resolved.
- the block subsystem has been converted to use struct device
now, and not "raw" kobjects. This patch has been in the -mm
tree for over a year now, and finally all the issues are
worked out with it. Older distros now properly work with new
kernels, and no userspace updates are needed at all.
- nozomi driver is added. This has also been in -mm for a long
time, and many people have asked for it to go in. It is now
in good enough shape to do so.
- lots of class_device conversions to use struct device instead.
The tree is almost all cleaned up now, only SCSI and IB is the
remaining code to fix up...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (196 commits)
Driver core: coding style fixes
Kobject: fix coding style issues in kobject c files
Kobject: fix coding style issues in kobject.h
Driver core: fix coding style issues in device.h
spi: use class iteration api
scsi: use class iteration api
rtc: use class iteration api
power supply : use class iteration api
ieee1394: use class iteration api
Driver Core: add class iteration api
Driver core: Cleanup get_device_parent() in device_add() and device_move()
UIO: constify function pointer tables
Driver Core: constify the name passed to platform_device_register_simple
driver core: fix build with SYSFS=n
sysfs: make SYSFS_DEPRECATED depend on SYSFS
Driver core: use LIST_HEAD instead of call to INIT_LIST_HEAD in __init
kobject: add sample code for how to use ksets/ktypes/kobjects
kobject: add sample code for how to use kobjects in a simple manner.
kobject: update the kobject/kset documentation
kobject: remove old, outdated documentation.
...
Pekka Enberg [Fri, 25 Jan 2008 06:20:51 +0000 (08:20 +0200)]
slab: fix bootstrap on memoryless node
If the node we're booting on doesn't have memory, bootstrapping kmalloc()
caches resorts to fallback_alloc() which requires ->nodelists set for all
nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in
kmem_cache_init().
As kmem_getpages() is called with GFP_THISNODE set, this used to work before
because of breakage in 2.6.22 and before with GFP_THISNODE returning pages from
the wrong node if a node had no memory. So it may have worked accidentally and
in an unsafe manner because the pages would have been associated with the wrong
node which could trigger bug ons and locking troubles.
Tested-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
[ With additional one-liner by Olaf Hering - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Fri, 25 Jan 2008 10:55:28 +0000 (11:55 +0100)]
fix oops on rmmod capidrv
Fix overwriting the stack with the version string
(it is currently 10 bytes + zero) when unloading the
capidrv module. Safeguard against overwriting it
should the version string grow in the future.
Should fix Kernel Bug Tracker Bug 9696.
Signed-off-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Abhijith Das [Fri, 18 Jan 2008 20:06:37 +0000 (14:06 -0600)]
[GFS2] Allow journal recovery on read-only mount
This patch allows gfs2 to perform journal recovery even if it is mounted
read-only. Strictly speaking, a read-only mount should not be writing to
the filesystem, but we do this only to perform journal recovery. A
read-only mount will fail if we don't recover the dirty journal. Also,
when gfs2 is used as a root filesystem, it will be mounted read-only
before being mounted read-write during the boot sequence. A failed
read-only mount will panic the machine during bootup.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Sun, 20 Jan 2008 03:50:24 +0000 (21:50 -0600)]
[GFS2] Lockup on error
I spotted this bug while I was digging around. Looks like it could cause
a lockup in some rare error condition.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 17 Jan 2008 15:12:03 +0000 (15:12 +0000)]
[GFS2] Fix page_mkwrite truncation race path
There was a bug in the truncation/invalidation race path for
->page_mkwrite for gfs2. It ought to return 0 so that the effect is the
same as if the page was truncated at any of the other points at which
the page_lock is dropped. This will result in the restart of the whole
page fault path. If it was due to a real truncation (as opposed to an
invalidate because we let a glock go) then the ->fault path will pick
that up when it gets called again.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 16 Jan 2008 14:45:39 +0000 (08:45 -0600)]
[GFS2] Fix typo
This patch fixes a minor typo. Surprisingly, it still compiled.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Wed, 16 Jan 2008 14:24:05 +0000 (14:24 +0000)]
[GFS2] Fix write alloc required shortcut calculation
The comparison was being made against the wrong quantity.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Fri, 11 Jan 2008 19:44:50 +0000 (13:44 -0600)]
[GFS2] gfs2_alloc_required performance
This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of
an earlier version I got wrong). The idea here is to check if the write extends
past the last block in the file. If so, the function can save itself a lot of
time and trouble because it knows an allocate will be required. Benchmarks like
iozone should see better performance.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Fri, 11 Jan 2008 19:31:12 +0000 (13:31 -0600)]
[GFS2] Remove unneeded i_spin
This patch removes a vestigial variable "i_spin" from the gfs2_inode
structure. This not only saves us memory (>300000 of these in memory
for the oom test) it also saves us time because we don't have to
spend time initializing it (i.e. slightly better performance).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 10 Jan 2008 15:18:55 +0000 (15:18 +0000)]
[GFS2] Reduce inode size by moving i_alloc out of line
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.
In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 10 Jan 2008 14:49:43 +0000 (14:49 +0000)]
[GFS2] Fix assert in log code
Although the values were all being calculated correctly, there was a
race in the assert due to the way it was using atomic variables. This
changes the value we assert on so that we get the same effect by testing
a different variable. This prevents the assert triggering when it shouldn't.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Tue, 8 Jan 2008 08:14:30 +0000 (08:14 +0000)]
[GFS2] Fix problems relating to execution of files on GFS2
This patch fixes a couple of problems which affected the execution of files
on GFS2. The first is that there was a corner case where inodes were not
always uptodate at the point at which permissions checks were being carried
out, this was resulting in refusal of execute permission, but only on the
first lookup, subsequent requests worked correctly. The second was a problem
relating to incorrect updating of file sizes which was introduced with the
write_begin/end code for GFS2 a little while back.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
Bob Peterson [Thu, 3 Jan 2008 15:24:53 +0000 (09:24 -0600)]
[GFS2] Initialize extent_list earlier
Here is a patch for the latest upstream GFS2 code:
The journal extent map needs to be initialized sooner than it
currently is. Otherwise failed mount attempts (e.g. not enough
journals, etc.) may panic trying to access the uninitialized list.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 3 Jan 2008 11:31:38 +0000 (11:31 +0000)]
[GFS2] Allow page migration for writeback and ordered pages
To improve performance on NUMA, we use the VM's standard page
migration for writeback and ordered pages. Probably we could
also do the same for journaled data, but that would need a
careful audit of the code, so will be the subject of a later
patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Wed, 2 Jan 2008 10:16:56 +0000 (10:16 +0000)]
[GFS2] Remove unused variable
The go_drop_th function is never called or referenced.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 14 Dec 2007 14:04:34 +0000 (14:04 +0000)]
[GFS2] Fix log block mapper
A missing offset in the calculation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 23:52:13 +0000 (17:52 -0600)]
[GFS2] Minor correction
This is a small correction to my previously posted patch1.
It just changes a divide to a shift. It's faster and doesn't
introduce odd dependencies on 32-bit compiles.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 17:44:41 +0000 (11:44 -0600)]
[GFS2] Eliminate the no longer needed sd_statfs_mutex
This patch eliminates the unneeded sd_statfs_mutex mutex but preserves
the ordering as discussed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 15:24:08 +0000 (09:24 -0600)]
[GFS2] Incremental patch to fix compiler warning
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 01:29:17 +0000 (19:29 -0600)]
[GFS2] Function meta_read optimization
This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait
was being called regardless of whether a disk read was requested.
This just pulls that wait into the if that triggers the read.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 01:16:09 +0000 (19:16 -0600)]
[GFS2] Only fetch the dinode once in block_map
Function gfs2_block_map was often looking up the disk inode twice.
This optimizes it so that only does it once.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 01:13:54 +0000 (19:13 -0600)]
[GFS2] Reorganize function gfs2_glmutex_lock
This patch optimizes the function gfs2_glmutex_lock.
The basic theory is: Why bother initializing a holder, setting up
wait bits and then waiting on them, if you know the glock can be
yours. So the holder stuff is placed inside the if checking if the
glock is locked. This one needs careful scrutiny because changing
anything to do with locking should strike terror into one's heart.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 01:00:16 +0000 (19:00 -0600)]
[GFS2] Run through full bitmaps quicker in gfs2_bitfit
I eliminated the passing of an unused parameter into gfs2_bitfit called rgd.
This also changes the gfs2_bitfit code that searches for free (or used) blocks.
Before, the code was trying to check for bytes that indicated 4 blocks in
the undesired state. The problem is, it was spending more time trying to
do this than it actually was saving. This version only optimizes the case
where we're looking for free blocks, and it checks a machine word at a time.
So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit
machines, it will check 64-bits (32 blocks) at a time. The compiler
optimizes that quite well and we save some time, especially when running
through full bitmaps (like the bitmaps allocated for the journals).
There's probably a more elegant or optimized way to do this, but I haven't
thought of it yet. I'm open to suggestions.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 00:51:25 +0000 (18:51 -0600)]
[GFS2] Get rid of useless "found" variable in quota.c
This just eliminates an unused variable from the quota code.
Not likely to be a time saver.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Dec 2007 00:49:21 +0000 (18:49 -0600)]
[GFS2] Journal extent mapping
This patch saves a little time when gfs2 writes to the journals by
keeping a mapping between logical and physical blocks on disk.
That's better than constantly looking up indirect pointers in
buffers, when the journals are several levels of indirection
(which they typically are).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Mon, 10 Dec 2007 20:13:27 +0000 (14:13 -0600)]
[GFS2] Remove function gfs2_get_block
This patch is just a cleanup. Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters. By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is done for every block operation,
this streamlines the code and makes it a little bit more efficient.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
David Teigland [Thu, 6 Dec 2007 15:35:25 +0000 (09:35 -0600)]
[GFS2] use pid for plock owner for nfs clients
The fl_owner is that of lockd when posix locks arrive from nfs
clients, so it can't be used to distinguish between lock holders.
Use fl_pid as owner instead; it's the pid of the process on the
nfs client.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 30 Nov 2007 08:17:15 +0000 (08:17 +0000)]
[GFS2] Remove unused variable
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Abhijith Das [Thu, 29 Nov 2007 20:13:54 +0000 (14:13 -0600)]
[GFS2] patch to check for recursive lock requests in gfs2_rename code path
A certain scenario in the rename code path triggers a kernel BUG()
because it accidentally does recursive locking The first lock is
requested to unlink an already existing inode (replacing a file) and the
second lock is requested when the destination directory needs to alloc
some space. It is rare that these two
events happen during the same rename call, and even more rare that these
two instances try to lock the same rgrp. It is, however, possible.
https://bugzilla.redhat.com/show_bug.cgi?id=404711
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Wendy Cheng [Thu, 29 Nov 2007 22:56:51 +0000 (17:56 -0500)]
[GFS2] Remove lock methods for lock_nolock protocol
GFS2 supports two modes of locking - lock_nolock for single node filesystem
and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from
file operation table for lock_nolock protocol. This would allow VFS to handle
posix lock and flock logics just like other in-tree filesystems without
duplication.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fabio M. Di Nitto [Wed, 28 Nov 2007 15:22:09 +0000 (16:22 +0100)]
[GFS2] Remove unrequired code
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fabio Massimo Di Nitto [Tue, 27 Nov 2007 05:16:42 +0000 (06:16 +0100)]
[GFS2] Fix build warnings
Hi Steven,
Steven Whitehouse wrote:
> Hi,
>
> Now in the -nmw git tree. Thanks,
>
> Steve.
>
> On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote:
this patch introduces a bunch of build warnings by leaving around
struct inode *inode = &ip->i_inode;
The patch in attachment cleans them up. Please apply.
Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Ryan O'Hara [Wed, 21 Nov 2007 17:54:54 +0000 (11:54 -0600)]
[GFS2] remove unnecessary permission checks
Remove read/write permission() checks from xattr operations.
VFS layer is already handling permission for xattrs via the
xattr_permission() call, so there is no need for gfs2 to
check permissions. Futhermore, using permission() for SELinux
xattrs ops is incorrect.
Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fabio Massimo Di Nitto [Fri, 16 Nov 2007 09:50:40 +0000 (09:50 +0000)]
[GFS2] Fix runtime issue with UP kernels
The issue is indeed UP vs SMP and it is totally random.
spin_is_locked() is a bad assertion because there is no correct answer on UP.
on UP spin_is_locked() has to return either one value or another, always.
This means that in my setup I am lucky enough to trigger the issue and your you
are lucky enough not to.
the patch in attachment removes the bogus calls to BUG_ON and according to David
(in CC and thanks for the long explanation on the problem) we can rely upon
things like lockdep to find problem that might be trying to catch.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
David Teigland [Thu, 15 Nov 2007 15:01:13 +0000 (09:01 -0600)]
[GFS2] tidy up error message
Print error with log_error() to be consistent with others.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fabio Massimo Di Nitto [Thu, 15 Nov 2007 13:48:52 +0000 (13:48 +0000)]
[GFS2] Check for installation of mount helpers for DLM mounts
The patch is a fix to abort mount if the mount.gfs* and possible
umount.* are missing from /sbin.
While we do what we can to guarantee that they are installed properly in
userland (CVS HEAD), we want to make sure that mount still aborts properly.
The only sign of missing helpers is that lock_dlm will receive no mount options
at all. According to David the problem does not exist for lock_nolock as the
helpers are not required.
The patch has been tested for both gfs and gfs2 and it works as expected. The
lack of mount.gfs* will generate an error that is propagated to mount:
oot@node1:~# mount -t gfs2 /dev/nbd2 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/nbd2,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
[ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly!
[ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata=
You might want to notice that it will also avoid mount to hang or fail silently
or with strange errors that will require the cluster to reboot/restart before
you can actually mount the filesystem again.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 9 Nov 2007 10:07:21 +0000 (10:07 +0000)]
[GFS2] Don't periodically update the jindex
We only care about the content of the jindex in two cases,
one is when we mount the fs and the other is when we need
to recover another journal. In both cases we have to update
the jindex anyway, so there is no point in updating it
periodically between times, so this removes it to simplify
gfs2_logd.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 9 Nov 2007 10:01:41 +0000 (10:01 +0000)]
[GFS2] Move gfs2_logd into log.c
This means that we can mark gfs2_ail1_empty static and prepares
the way for further changes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 8 Nov 2007 14:55:03 +0000 (14:55 +0000)]
[GFS2] Use atomic_t for journal free blocks counter
This patch changes the counter which keeps track of the free
blocks in the journal to an atomic_t in preparation for the
following patch which will update the log reservation code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 8 Nov 2007 14:25:12 +0000 (14:25 +0000)]
[GFS2] Don't add glocks to the journal
The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.
This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
David Teigland [Wed, 7 Nov 2007 15:03:56 +0000 (09:03 -0600)]
[GFS2] check kthread_should_stop when waiting
Use wait_event_interruptible() in the lock_dlm thread instead
of an open coded equivalent, and include a kthread_should_stop()
check in the wait test so we don't miss a kthread_stop().
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Fri, 2 Nov 2007 14:37:15 +0000 (09:37 -0500)]
[GFS2] Given device ID rather than s_id in "id" sysfs file
This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device
id "major:minor" rather than the s_id. That enables gfs2_tool to
match devices properly (by id, not name) when locating the tuning files.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 2 Nov 2007 09:14:31 +0000 (09:14 +0000)]
[GFS2] Remove flags no longer required
The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.
Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Fri, 2 Nov 2007 08:39:34 +0000 (08:39 +0000)]
[GFS2] Reorder writeback for glock sync
Previously we were doing (write data, wait for data, write metadata, wait
for metadata). After this patch we so (write metadata, write data, wait for
data, wait for metadata) which should be more efficient.
Also I noticed that the drop_bh and xmote_bh functions were almost
identical. In fact the only difference was a single test, and that
test is such that in the drop_bh case, it would always evaluate to
the correct result. As such we can use the xmote_bh functions in
all the places where we were using the drop_bh function and remove
the drop_bh functions.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 1 Nov 2007 09:34:14 +0000 (09:34 +0000)]
[GFS2] Add sync_page to metadata address space operations
This set of address space operations was missing a sync_page
operation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 1 Nov 2007 09:26:54 +0000 (09:26 +0000)]
[GFS2] Remove "reclaim limit"
This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Wed, 31 Oct 2007 14:24:33 +0000 (14:24 +0000)]
[GFS2] Remove unused variables
These haven't been used for some time, remove them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Thu, 18 Oct 2007 10:15:50 +0000 (11:15 +0100)]
[GFS2] Use correct include file in ops_address.c
Something changed in the upstream kernel, and it needs this
one-liner to allow ops_address.c to build.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>