platform/kernel/linux-stable.git
12 years agomlx4_core: Fix mtt profile issue
Marcel Apfelbaum [Thu, 19 Jan 2012 09:45:31 +0000 (09:45 +0000)]
mlx4_core: Fix mtt profile issue

Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomlx4_core: removed function index from vf.
Marcel Apfelbaum [Thu, 19 Jan 2012 09:45:19 +0000 (09:45 +0000)]
mlx4_core: removed function index from vf.

The Virtual Functions should not be aware their function number.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomlx4_en: eth statistics modification
Eugenia Emantayev [Thu, 19 Jan 2012 09:45:05 +0000 (09:45 +0000)]
mlx4_en: eth statistics modification

In native mode display all available staticstics.
In SRIOV mode on VF display only SW counters statistics,
in SRIOV mode on hypervisor display SW counters and errors (got from FW)
statistics.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomlx4: VF is not allowed to perform dump stats
Eugenia Emantayev [Thu, 19 Jan 2012 09:44:37 +0000 (09:44 +0000)]
mlx4: VF is not allowed to perform dump stats

In multifunction mode - DUMP_STATS command is not executed
for VFs.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomlx4_en: clear all eth statistics when port goes up
Eugenia Emantayev [Thu, 19 Jan 2012 09:42:37 +0000 (09:42 +0000)]
mlx4_en: clear all eth statistics when port goes up

Bug fix: Not all stats fields were cleared.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petriln <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: fix undo after RTO for CUBIC
Neal Cardwell [Wed, 18 Jan 2012 17:47:59 +0000 (17:47 +0000)]
tcp: fix undo after RTO for CUBIC

This patch fixes CUBIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).

When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:

(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.

(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.

This patch makes the following changes:

(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"

(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events

(3) use ca->last_max_cwnd to check if we're in slow start

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Sangtae Ha <sangtae.ha@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: fix undo after RTO for BIC
Neal Cardwell [Wed, 18 Jan 2012 17:47:58 +0000 (17:47 +0000)]
tcp: fix undo after RTO for BIC

This patch fixes BIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).

When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:

(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.

(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.

This patch makes the following changes:

(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"

(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events

(3) use ca->last_max_cwnd to check if we're in slow start

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: fix compile when CONFIG_PCI_IOV is not enabled
Roopa Prabhu [Thu, 19 Jan 2012 22:25:36 +0000 (22:25 +0000)]
enic: fix compile when CONFIG_PCI_IOV is not enabled

reverting back change that access enic->num_vfs outside
CONFIG_PCI_IOV

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbevf: make operations tables const
Stephen Hemminger [Wed, 18 Jan 2012 22:13:34 +0000 (22:13 +0000)]
ixgbevf: make operations tables const

The arrays of function pointers should be const to make life harder
for rootkits.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbevf: fix sparse warnings
Stephen Hemminger [Wed, 18 Jan 2012 22:13:33 +0000 (22:13 +0000)]
ixgbevf: fix sparse warnings

Fixes sparse warnings:
drivers/net/ethernet/intel/ixgbevf/vf.c:418:21: warning: symbol 'ixgbevf_82599_vf_info' was not declared. Should it be static?
drivers/net/ethernet/intel/ixgbevf/vf.c:423:21: warning: symbol 'ixgbevf_X540_vf_info' was not declared. Should it be static?
drivers/net/ethernet/intel/ixgbevf/mbx.c:331:29: warning: symbol 'ixgbevf_mbx_ops' was not declared. Should it be static?

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbevf: make ethtool ops and strings const
Stephen Hemminger [Wed, 18 Jan 2012 22:13:32 +0000 (22:13 +0000)]
ixgbevf: make ethtool ops and strings const

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbevf: Prevent possible race condition by checking for message
Greg Rose [Wed, 18 Jan 2012 22:13:31 +0000 (22:13 +0000)]
ixgbevf: Prevent possible race condition by checking for message

The mailbox interrupt routine might cause a race condition sometimes
and cause a message to be missed.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: Fix register defines to correctly handle complex expressions
Alexander Duyck [Wed, 18 Jan 2012 22:13:30 +0000 (22:13 +0000)]
ixgbe: Fix register defines to correctly handle complex expressions

This patch is meant to address possible issues with the IXGBE register
defines generating incorrect values when given a complex expression for the
register offset.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoigbvf: Remove unnecessary irq disable/enable
Mitch A Williams [Wed, 18 Jan 2012 22:13:29 +0000 (22:13 +0000)]
igbvf: Remove unnecessary irq disable/enable

This irq disable/enable pair used to wrap access to the driver's vlgrp
struct, which is no longer present. So, then, this could also so no longer
be present.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoigbvf: remove unneeded cast
Stephen Hemminger [Wed, 18 Jan 2012 22:13:28 +0000 (22:13 +0000)]
igbvf: remove unneeded cast

The cast and comment are unnecessary in the current upstream kernel.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Garrett, RobertX E <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoigb: Update Copyright on all Intel copyrighted files.
Carolyn Wyborny [Wed, 18 Jan 2012 22:13:27 +0000 (22:13 +0000)]
igb: Update Copyright on all Intel copyrighted files.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoigb: make local functions static
Stephen Hemminger [Wed, 18 Jan 2012 22:13:26 +0000 (22:13 +0000)]
igb: make local functions static

Sparse caught two functions that were only being used in one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: ftgmac100/ftmac100: add missing interrupt.h include
Thomas Faber [Wed, 18 Jan 2012 13:45:44 +0000 (13:45 +0000)]
net: ftgmac100/ftmac100: add missing interrupt.h include

Fixes compilation failure of these modules due to missing
irqreturn_t type for the ft(g)mac100_interrupt definition.

Signed-off-by: Thomas Faber <thfabba@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobonding: fix enslaving in alb mode when link down
Jiri Bohac [Wed, 18 Jan 2012 12:24:54 +0000 (12:24 +0000)]
bonding: fix enslaving in alb mode when link down

bond_alb_init_slave() is called from bond_enslave() and sets the slave's MAC
address. This is done differently for TLB and ALB modes.
bond->alb_info.rlb_enabled is used to discriminate between the two modes but
this flag may be uninitialized if the slave is being enslaved prior to calling
bond_open() -> bond_alb_initialize() on the master.

It turns out all the callers of alb_set_slave_mac_addr() pass
bond->alb_info.rlb_enabled as the hw parameter.

This patch cleans up the unnecessary parameter of alb_set_slave_mac_addr() and
makes the function decide based on the bonding mode instead, which fixes the
above problem.

Reported-by: Narendra K <Narendra_K@Dell.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agopch_gbe: Do not abort probe on bad MAC
Darren Hart [Mon, 16 Jan 2012 09:50:19 +0000 (09:50 +0000)]
pch_gbe: Do not abort probe on bad MAC

If the MAC is invalid or not implemented, do not abort the probe. Issue
a warning and prevent bringing the interface up until a MAC is set manually
(via ifconfig $IFACE hw ether $MAC).

Tested on two platforms, one with a valid MAC, the other without a MAC. The real
MAC is used if present, the interface fails to come up until the MAC is set on
the other. They successfully get an IP over DHCP and pass a simple ping and
login over ssh test.

This is meant to allow the Inforce SYS940X development board:
http://www.inforcecomputing.com/SYS940X_ECX.html
(and others suffering from a missing MAC) to work with the mainline kernel.
Without this patch, the probe will fail and the interface will not be created,
preventing the user from configuring the MAC manually.

This does not make any attempt to address a missing or invalid MAC for the
pch_phub driver.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Alan Cox <alan@linux.intel.com>
CC: Tomoya MORINAGA <tomoya.rohm@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Jon Mason <jdmason@kudzu.us>
CC: netdev@vger.kernel.org
CC: Mark Brown <broonie@opensource.wolfsonmicro.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: race condition in ipv6 forwarding and disable_ipv6 parameters
Francesco Ruggeri [Mon, 16 Jan 2012 10:40:10 +0000 (10:40 +0000)]
net: race condition in ipv6 forwarding and disable_ipv6 parameters

There is a race condition in addrconf_sysctl_forward() and
addrconf_sysctl_disable().
These functions change idev->cnf.forwarding (resp. idev->cnf.disable_ipv6)
and then try to grab the rtnl lock before performing any actions.
If that fails they restore the original value and restart the syscall.
This creates race conditions if ipv6 code tries to access
these parameters, or if multiple instances try to do the same operation.
As an example of the former, if __ipv6_ifa_notify() finds a 0 in
idev->cnf.forwarding when invoked by addrconf_ifdown() it may not free
anycast addresses, ultimately resulting in the net_device not being freed.
This patch reads the user parameters into a temporary location and only
writes the actual parameters when the rtnl lock is acquired.
Tested in 2.6.38.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: fix location of vnic dev unregister in enic_probe cleanup code
Roopa Prabhu [Wed, 18 Jan 2012 04:24:12 +0000 (04:24 +0000)]
enic: fix location of vnic dev unregister in enic_probe cleanup code

The vnic_dev_unregister is erroneously under CONFIG_PCI_IOV. This patch moves
it out of CONFIG_PCI_IOV

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: rearrange some of the port profile code
Roopa Prabhu [Wed, 18 Jan 2012 04:24:07 +0000 (04:24 +0000)]
enic: rearrange some of the port profile code

This patch rearranges some of the port profile code in enic_probe.
It moves out some lines of port profile related code currently
inside CONFIG_PCI_IOV. This is only done to move all port profile
related code together so that it can help isolate the port profile
handling code under a separate #ifdef in our internal build scripts.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: Add sriov vf device id checks in port profile code
Roopa Prabhu [Wed, 18 Jan 2012 04:24:02 +0000 (04:24 +0000)]
enic: Add sriov vf device id checks in port profile code

This patch adds checks for sriov vf's in enic port profile handling code.
sriov vf's are same as dynamic vnics but with a different device id.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: This patch adds pci id 0x71 for SRIOV VF's
Roopa Prabhu [Wed, 18 Jan 2012 04:23:55 +0000 (04:23 +0000)]
enic: This patch adds pci id 0x71 for SRIOV VF's

This patch adds pci id 0x71 for sriov VF's.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomlx4_en: set number of rx rings used by RSS using ethtool
Yevgeny Petrilin [Tue, 17 Jan 2012 22:54:55 +0000 (22:54 +0000)]
mlx4_en: set number of rx rings used by RSS using ethtool

Value must be a power of 2 due to HW limitation.
Driver supports only 'equal' mode in ethtool and can't be set by using weights.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: bpf_jit: fix divide by 0 generation
Eric Dumazet [Wed, 18 Jan 2012 07:21:42 +0000 (07:21 +0000)]
net: bpf_jit: fix divide by 0 generation

Several problems fixed in this patch :

1) Target of the conditional jump in case a divide by 0 is performed
   by a bpf is wrong.

2) Must 'generate' the full function prologue/epilogue at pass=0,
   or else we can stop too early in pass=1 if the proglen doesnt change.
   (if the increase of prologue/epilogue equals decrease of all
    instructions length because some jumps are converted to near jumps)

3) Change the wrong length detection at the end of code generation to
   issue a more explicit message, no need for a full stack trace.

Reported-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
David S. Miller [Wed, 18 Jan 2012 20:59:32 +0000 (15:59 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless

12 years agob43: add option to avoid duplicating device support with brcmsmac
John W. Linville [Wed, 11 Jan 2012 20:50:15 +0000 (15:50 -0500)]
b43: add option to avoid duplicating device support with brcmsmac

Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomac80211: fix work removal on deauth request
Johannes Berg [Wed, 18 Jan 2012 13:10:25 +0000 (14:10 +0100)]
mac80211: fix work removal on deauth request

When deauth is requested while an auth or assoc
work item is in progress, we currently delete it
without regard for any state it might need to
clean up. Fix it by cleaning up for those items.

In the case Pontus found, the problem manifested
itself as such:

authenticate with 00:23:69:aa:dd:7b (try 1)
authenticated
failed to insert Dummy STA entry for the AP (error -17)
deauthenticating from 00:23:69:aa:dd:7b by local choice (reason=2)

It could also happen differently if the driver
uses the tx_sync callback.

We can't just call the ->done() method of the work
items because that will lock up due to the locking
in cfg80211. This fix isn't very clean, but that
seems acceptable since I have patches pending to
remove this code completely.

Cc: stable@vger.kernel.org
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Tested-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomac80211: Use the right headroom size for mesh mgmt frames
Javier Cardona [Wed, 18 Jan 2012 02:17:46 +0000 (18:17 -0800)]
mac80211: Use the right headroom size for mesh mgmt frames

Use local->tx_headroom instad of local->hw.extra_tx_headroom.
local->tx_headroom is the max of hw.extra_tx_headroom required by the
driver and the headroom required by mac80211 for status reporting.  On
drivers where hw.extra_tx_headroom is smaller than what mac80211
requires (e.g. ath5k), we would not reserve sufficient buffer space to
report tx status.

Also, don't reserve local->tx_headroom + local->hw.extra_tx_headroom.

Reported-by: Simon Morgenthaler <s.morgenthaler@students.unibe.ch>
Reported-by: Kai Scharwies <kai@scharwies.de>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agobrcmsmac: fix tx queue flush infinite loop
Stanislaw Gruszka [Tue, 17 Jan 2012 11:38:50 +0000 (12:38 +0100)]
brcmsmac: fix tx queue flush infinite loop

This patch workaround live deadlock problem caused by infinite loop
in brcms_c_wait_for_tx_completion(). I do not consider the patch as
the proper fix, which should fix the real reason of tx queue flush
failure, but patch helps with system lockup.

Reference:
https://bugzilla.kernel.org/show_bug.cgi?id=42576

Reported-and-tested-by: Patrick <ragamuffin@datacomm.ch>
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomac80211: fix debugfs key->station symlink
Johannes Berg [Tue, 17 Jan 2012 09:32:01 +0000 (10:32 +0100)]
mac80211: fix debugfs key->station symlink

Since stations moved into a virtual interface
subdirectory, this link has been broken. Fix it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 18 Jan 2012 06:26:41 +0000 (22:26 -0800)]
Merge git://git./linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  tg3: Fix single-vector MSI-X code
  openvswitch: Fix multipart datapath dumps.
  ipv6: fix per device IP snmp counters
  inetpeer: initialize ->redirect_genid in inet_getpeer()
  net: fix NULL-deref in WARN() in skb_gso_segment()
  net: WARN if skb_checksum_help() is called on skb requiring segmentation
  caif: Remove bad WARN_ON in caif_dev
  caif: Fix typo in Vendor/Product-ID for CAIF modems
  bnx2x: Disable AN KR work-around for BCM57810
  bnx2x: Remove AutoGrEEEn for BCM84833
  bnx2x: Remove 100Mb force speed for BCM84833
  bnx2x: Fix PFC setting on BCM57840
  bnx2x: Fix Super-Isolate mode for BCM84833
  net: fix some sparse errors
  net: kill duplicate included header
  net: sh-eth: Fix build error by the value which is not defined
  net: Use device model to get driver name in skb_gso_segment()
  bridge: BH already disabled in br_fdb_cleanup()
  net: move sock_update_memcg outside of CONFIG_INET
  mwl8k: Fixing Sparse ENDIAN CHECK warning
  ...

12 years agotg3: Fix single-vector MSI-X code
Matt Carlson [Tue, 17 Jan 2012 15:27:23 +0000 (15:27 +0000)]
tg3: Fix single-vector MSI-X code

Kdump kernels leave MSI-X interrupts (as setup by the crashed kernel)
enabled.  However, kdump only enables one CPU in the new environment,
thus causing tg3 to abort MSI-X setup.  When the driver attempts to
enable INTA or MSI interrupt modes on a kdump kernel, interrupt
delivery fails.

This patch attempts to workaround the problem by forcing the driver
to enable a single MSI-X interrupt.  In such a configuration, the
device's multivector interrupt mode must be disabled.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoopenvswitch: Fix multipart datapath dumps.
Ben Pfaff [Tue, 17 Jan 2012 13:33:39 +0000 (13:33 +0000)]
openvswitch: Fix multipart datapath dumps.

The logic to split up the list of datapaths into multiple Netlink messages
was simply wrong, causing the list to be terminated after the first part.
Only about the first 50 datapaths would be dumped.  This fixes the
problem.

Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoipv6: fix per device IP snmp counters
Eric Dumazet [Tue, 17 Jan 2012 12:45:36 +0000 (12:45 +0000)]
ipv6: fix per device IP snmp counters

In commit 4ce3c183fca (snmp: 64bit ipstats_mib for all arches), I forgot
to change the /proc/net/dev_snmp6/xxx output for IP counters.

percpu array is 64bit per counter but the folding still used the 'long'
variant, and output garbage on 32bit arches.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Wed, 18 Jan 2012 02:55:56 +0000 (18:55 -0800)]
Merge tag 'arm-soc-fixes' of git://git./linux/kernel/git/arm/arm-soc

ARM: fixes for ARM platforms

Some fallout from the 3.3. merge window as well as a couple bug fixes
for older preexisting bugs that seem valid to include at this time:

* sched_clock changes broke picoxcell, fix included
* BSYM bugs causing issues with thumb2-built kernels on SMP
* Missing module.h include on msm.
* A collection of bugfixes for samsung platforms that didn't make it into
  the first pull requests.

* tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: make BSYM macro assembly only
  ARM: highbank: remove incorrect BSYM usage
  ARM: imx: remove incorrect BSYM usage
  ARM: exynos: remove incorrect BSYM usage
  ARM: ux500: add missing ENDPROC to headsmp.S
  ARM: msm: Add missing ENDPROC to headsmp.S
  ARM: versatile: Add missing ENDPROC to headsmp.S
  ARM: EXYNOS: Invert VCLK polarity for framebuffer on ORIGEN
  ARM: S3C64XX: Fix interrupt configuration for PCA935x on Cragganmore
  ARM: S3C64XX: Fix the memory mapped GPIOs on Cragganmore
  ARM: S3C64XX: Remove hsmmc1 from Cragganmore
  ARM: S3C64XX: Remove unconditional power domain disables
  ARM: SAMSUNG: Declare struct platform_device in plat/s3c64xx-spi.h
  ARM: SAMSUNG: dma-ops.h needs mach/dma.h
  ARM: SAMSUNG: Guard against multiple inclusion of plat/dma.h
  ARM: picoxcell: fix sched_clock() cleanup fallout
  ARM: msm: vreg is a module and so needs module.h

12 years agoMerge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Wed, 18 Jan 2012 02:40:24 +0000 (18:40 -0800)]
Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma

* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (53 commits)
  ARM: mach-shmobile: specify CHCLR registers on SH7372
  dma: shdma: fix runtime PM: clear channel buffers on reset
  dma/imx-sdma: save irq flags when use spin_lock in sdma_tx_submit
  dmaengine/ste_dma40: clear LNK on channel startup
  dmaengine: intel_mid_dma: remove legacy pm interface
  ASoC: mxs: correct 'direction' of device_prep_dma_cyclic
  dmaengine: intel_mid_dma: error path fix
  dmaengine: intel_mid_dma: locking and freeing fixes
  mtd: gpmi-nand: move to dma_transfer_direction
  mtd: fix compile error for gpmi-nand
  mmc: mxs-mmc: fix the dma_transfer_direction migration
  dmaengine: add DMA_TRANS_NONE to dma_transfer_direction
  dma: mxs-dma: Don't use CLKGATE bits in CTRL0 to disable DMA channels
  dma: mxs-dma: make mxs_dma_prep_slave_sg() multi user safe
  dma: mxs-dma: Always leave mxs_dma_init() with the clock disabled.
  dma: mxs-dma: fix a typo in comment
  DMA: PL330: Remove pm_runtime_xxx calls from pl330 probe/remove
  video i.MX IPU: Fix display connections
  i.MX IPU DMA: Fix wrong burstsize settings
  dmaengine/ste_dma40: allow fixed physical channel
  ...

Fix up conflicts in drivers/dma/{Kconfig,mxs-dma.c,pl330.c}

The conflicts looked pretty trivial, but I'll ask people to verify them.

12 years agoMerge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev
Linus Torvalds [Wed, 18 Jan 2012 02:11:38 +0000 (18:11 -0800)]
Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev

* 'upstream-linus' of git://github.com/jgarzik/libata-dev:
  [libata] ata_piix: Add Toshiba Satellite Pro A120 to the quirks list due to broken suspend functionality.
  [libata] add DVRTD08A and DVR-215 to NOSETXFER device quirk list
  [libata] pata_bf54x: Support sg list in bmdma transfer.
  [libata] sata_fsl: fix the controller operating mode
  [libata] enable ata port async suspend

12 years agox86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n
Al Viro [Wed, 18 Jan 2012 01:51:22 +0000 (01:51 +0000)]
x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n

JONGMAN HEO reports:

  With current linus git (commit a25a2b84), I got following build error,

  arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86':
  arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit'
  make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1

OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n)

It's due to commit d7e7528bcd45: "Audit: push audit success and retcode
into arch ptrace.h".

Reported-by: JONGMAN HEO <jongman.heo@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years ago[libata] ata_piix: Add Toshiba Satellite Pro A120 to the quirks list
Benjamin Larsson [Sat, 7 Jan 2012 23:39:10 +0000 (00:39 +0100)]
[libata] ata_piix: Add Toshiba Satellite Pro A120 to the quirks list
due to broken suspend functionality.

Signed-off-by: Benjamin Larsson <benjamin@southpole.se>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years ago[libata] add DVRTD08A and DVR-215 to NOSETXFER device quirk list
Vladimir LAVALLADE [Sun, 8 Jan 2012 12:50:13 +0000 (13:50 +0100)]
[libata] add DVRTD08A and DVR-215 to NOSETXFER device quirk list

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years ago[libata] pata_bf54x: Support sg list in bmdma transfer.
Sonic Zhang [Wed, 4 Jan 2012 06:06:51 +0000 (14:06 +0800)]
[libata] pata_bf54x: Support sg list in bmdma transfer.

BF54x on-chip ATAPI controller allows maximum 0x1fffe bytes to be transfered
in one ATAPI transfer. So, set the max sg_tablesize to 4.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years ago[libata] sata_fsl: fix the controller operating mode
Jerry Huang [Tue, 20 Dec 2011 06:50:27 +0000 (14:50 +0800)]
[libata] sata_fsl: fix the controller operating mode

Configure the FSL SATA controller to the preferred, enterprise mode.

Signed-off-by: Yutaka Ando <r46913@freescale.com>
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
CC: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years ago[libata] enable ata port async suspend
Lin Ming [Mon, 16 Jan 2012 05:23:23 +0000 (13:23 +0800)]
[libata] enable ata port async suspend

This saves devices suspend/resume time.

Tested system suspend/resume with SATA IDE/AHCI mode 3 times.
Below is the time took for devices suspend/resume.

SATA mode    vanilla-kernel           patched-kernel
---------    ---------------------    ---------------------
IDE          suspend: 0.744           suspend: 0.432
             (0.716, 0.768, 0.748)    (0.440, 0.428, 0.428)

             resume: 5.084            resume: 2.209
             (5.100, 5.064, 5.088)    (2.168, 2.232, 2.228)

AHCI:        suspend: 0.725           suspend: 0.449
             (0.740, 0.708, 0.728)    (0.456, 0.448, 0.444)

             resume: 2.556            resume: 1.896
             (2.604, 2.492, 2.572)    (1.932, 1.872, 1.884)

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Wed, 18 Jan 2012 00:43:39 +0000 (16:43 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity: digital signature config option name change
  lib: Removed MPILIB, MPILIB_EXTRA, and SIGNATURE prompts
  lib: MPILIB Kconfig description update
  lib: digital signature dependency fix
  lib: digital signature config option name change
  encrypted-keys: fix rcu and sparse messages
  keys: fix trusted/encrypted keys sparse rcu_assign_pointer messages
  KEYS: Add missing smp_rmb() primitives to the keyring search code
  TOMOYO: Accept \000 as a valid character.
  security: update MAINTAINERS file with new git repo

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit
Linus Torvalds [Wed, 18 Jan 2012 00:06:51 +0000 (16:06 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/audit

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
  audit: no leading space in audit_log_d_path prefix
  audit: treat s_id as an untrusted string
  audit: fix signedness bug in audit_log_execve_info()
  audit: comparison on interprocess fields
  audit: implement all object interfield comparisons
  audit: allow interfield comparison between gid and ogid
  audit: complex interfield comparison helper
  audit: allow interfield comparison in audit rules
  Kernel: Audit Support For The ARM Platform
  audit: do not call audit_getname on error
  audit: only allow tasks to set their loginuid if it is -1
  audit: remove task argument to audit_set_loginuid
  audit: allow audit matching on inode gid
  audit: allow matching on obj_uid
  audit: remove audit_finish_fork as it can't be called
  audit: reject entry,always rules
  audit: inline audit_free to simplify the look of generic code
  audit: drop audit_set_macxattr as it doesn't do anything
  audit: inline checks for not needing to collect aux records
  audit: drop some potentially inadvisable likely notations
  ...

Use evil merge to fix up grammar mistakes in Kconfig file.

Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.

12 years agoMerge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Tue, 17 Jan 2012 23:54:56 +0000 (15:54 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: cleanup xfs_file_aio_write
  xfs: always return with the iolock held from xfs_file_aio_write_checks
  xfs: remove the i_new_size field in struct xfs_inode
  xfs: remove the i_size field in struct xfs_inode
  xfs: replace i_pin_wait with a bit waitqueue
  xfs: replace i_flock with a sleeping bitlock
  xfs: make i_flags an unsigned long
  xfs: remove the if_ext_max field in struct xfs_ifork
  xfs: remove the unused dm_attrs structure
  xfs: cleanup xfs_iomap_eof_align_last_fsb
  xfs: remove xfs_itruncate_data

12 years agoMerge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 17 Jan 2012 23:52:51 +0000 (15:52 -0800)]
Merge branch 'btrfs' of git://git./linux/kernel/git/viro/vfs

* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: take allocation of ->tree_root into open_ctree()
  btrfs: let ->s_fs_info point to fs_info, not root...
  btrfs: consolidate failure exits in btrfs_mount() a bit
  btrfs: make free_fs_info() call ->kill_sb() unconditional
  btrfs: merge free_fs_info() calls on fill_super failures
  btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
  btrfs: make open_ctree() return int
  btrfs: sanitizing ->fs_info, part 5
  btrfs: sanitizing ->fs_info, part 4
  btrfs: sanitizing ->fs_info, part 3
  btrfs: sanitizing ->fs_info, part 2
  btrfs: sanitizing ->fs_info, part 1
  btrfs: fix a deadlock in btrfs_scan_one_device()
  btrfs: fix mount/umount race
  btrfs: get ->kill_sb() of its own
  btrfs: preparation to fixing mount/umount race

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux...
Linus Torvalds [Tue, 17 Jan 2012 23:49:54 +0000 (15:49 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
  Btrfs: use larger system chunks
  Btrfs: add a delalloc mutex to inodes for delalloc reservations
  Btrfs: space leak tracepoints
  Btrfs: protect orphan block rsv with spin_lock
  Btrfs: add allocator tracepoints
  Btrfs: don't call btrfs_throttle in file write
  Btrfs: release space on error in page_mkwrite
  Btrfs: fix btrfsck error 400 when truncating a compressed
  Btrfs: do not use btrfs_end_transaction_throttle everywhere
  Btrfs: add balance progress reporting
  Btrfs: allow for resuming restriper after it was paused
  Btrfs: allow for canceling restriper
  Btrfs: allow for pausing restriper
  Btrfs: add skip_balance mount option
  Btrfs: recover balance on mount
  Btrfs: save balance parameters to disk
  Btrfs: soft profile changing mode (aka soft convert)
  Btrfs: implement online profile changing
  Btrfs: do not reduce profile in do_chunk_alloc()
  Btrfs: virtual address space subset filter
  ...

Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.

12 years agointegrity: digital signature config option name change
Dmitry Kasatkin [Tue, 17 Jan 2012 15:12:07 +0000 (17:12 +0200)]
integrity: digital signature config option name change

Similar to SIGNATURE, rename INTEGRITY_DIGSIG to INTEGRITY_SIGNATURE.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agolib: Removed MPILIB, MPILIB_EXTRA, and SIGNATURE prompts
Dmitry Kasatkin [Tue, 17 Jan 2012 15:12:06 +0000 (17:12 +0200)]
lib: Removed MPILIB, MPILIB_EXTRA, and SIGNATURE prompts

As modules are expected to select MPILIB, MPILIB_EXTRA, and SIGNATURE,
removed Kconfig prompts.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agolib: MPILIB Kconfig description update
Dmitry Kasatkin [Tue, 17 Jan 2012 15:12:05 +0000 (17:12 +0200)]
lib: MPILIB Kconfig description update

It was reported that description of the MPILIB_EXTRA is confusing.
Indeed it was copy-paste typo. It is fixed here.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agolib: digital signature dependency fix
Dmitry Kasatkin [Tue, 17 Jan 2012 15:12:04 +0000 (17:12 +0200)]
lib: digital signature dependency fix

Randy Dunlap reported build break:

ERROR: "crypto_alloc_shash" [lib/digsig.ko] undefined!
ERROR: "crypto_shash_final" [lib/digsig.ko] undefined!
ERROR: "crypto_shash_update" [lib/digsig.ko] undefined!
ERROR: "crypto_destroy_tfm" [lib/digsig.ko] undefined!

Added CRYPTO dependency and selected SHA1 algorithm.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agolib: digital signature config option name change
Dmitry Kasatkin [Tue, 17 Jan 2012 15:12:03 +0000 (17:12 +0200)]
lib: digital signature config option name change

It was reported that DIGSIG is confusing name for digital signature
module. It was suggested to rename DIGSIG to SIGNATURE.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agoencrypted-keys: fix rcu and sparse messages
Mimi Zohar [Tue, 17 Jan 2012 20:40:02 +0000 (20:40 +0000)]
encrypted-keys: fix rcu and sparse messages

Enabling CONFIG_PROVE_RCU and CONFIG_SPARSE_RCU_POINTER resulted in
"suspicious rcu_dereference_check() usage!" and "incompatible types
in comparison expression (different address spaces)" messages.

Access the masterkey directly when holding the rwsem.

Changelog v1:
- Use either rcu_read_lock()/rcu_derefence_key()/rcu_read_unlock()
or remove the unnecessary rcu_derefence() - David Howells

Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agokeys: fix trusted/encrypted keys sparse rcu_assign_pointer messages
Mimi Zohar [Tue, 17 Jan 2012 20:39:51 +0000 (20:39 +0000)]
keys: fix trusted/encrypted keys sparse rcu_assign_pointer messages

Define rcu_assign_keypointer(), which uses the key payload.rcudata instead
of payload.data, to resolve the CONFIG_SPARSE_RCU_POINTER message:
"incompatible types in comparison expression (different address spaces)"

Replace the rcu_assign_pointer() calls in encrypted/trusted keys with
rcu_assign_keypointer().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agoKEYS: Add missing smp_rmb() primitives to the keyring search code
David Howells [Tue, 17 Jan 2012 20:39:40 +0000 (20:39 +0000)]
KEYS: Add missing smp_rmb() primitives to the keyring search code

Add missing smp_rmb() primitives to the keyring search code.

When keyring payloads are appended to without replacement (thus using up spare
slots in the key pointer array), an smp_wmb() is issued between the pointer
assignment and the increment of the key count (nkeys).

There should be corresponding read barriers between the read of nkeys and
dereferences of keys[n] when n is dependent on the value of nkeys.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
12 years agoTOMOYO: Accept \000 as a valid character.
Tetsuo Handa [Sun, 15 Jan 2012 02:05:59 +0000 (11:05 +0900)]
TOMOYO: Accept \000 as a valid character.

TOMOYO 2.5 in Linux 3.2 and later handles Unix domain socket's address.
Thus, tomoyo_correct_word2() needs to accept \000 as a valid character, or
TOMOYO 2.5 cannot handle Unix domain's abstract socket address.

Reported-by: Steven Allen <steven@stebalien.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: stable@vger.kernel.org [3.2+]
Signed-off-by: James Morris <jmorris@namei.org>
12 years agoFix compile breakage with kref.h
James Bottomley [Tue, 17 Jan 2012 21:14:05 +0000 (21:14 +0000)]
Fix compile breakage with kref.h

This set of build failures just started appearing on parisc:

  In file included from drivers/input/serio/serio_raw.c:12:
  include/linux/kref.h: In function 'kref_get':
  include/linux/kref.h:40: error: 'TAINT_WARN' undeclared (first use in this function)
  include/linux/kref.h:40: error: (Each undeclared identifier is reported only once
  include/linux/kref.h:40: error: for each function it appears in.)
  include/linux/kref.h: In function 'kref_sub':
  include/linux/kref.h:65: error: 'TAINT_WARN' undeclared (first use in this function)

It happens because TAINT_WARN is defined in kernel.h and this particular
compile doesn't seem to include it (no idea why it's just manifesting ..
probably some #include file untangling exposed it).

Fix by adding

  #include <linux/kernel.h>

to linux/kref.h

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agosecurity: update MAINTAINERS file with new git repo
James Morris [Tue, 17 Jan 2012 23:40:44 +0000 (10:40 +1100)]
security: update MAINTAINERS file with new git repo

Update MAINTAINERS file with new git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git

Signed-off-by: James Morris <jmorris@namei.org>
12 years agoproc: clean up and fix /proc/<pid>/mem handling
Linus Torvalds [Tue, 17 Jan 2012 23:21:19 +0000 (15:21 -0800)]
proc: clean up and fix /proc/<pid>/mem handling

Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.

Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoaudit: no leading space in audit_log_d_path prefix
Kees Cook [Fri, 6 Jan 2012 22:07:10 +0000 (14:07 -0800)]
audit: no leading space in audit_log_d_path prefix

audit_log_d_path() injects an additional space before the prefix,
which serves no purpose and doesn't mix well with other audit_log*()
functions that do not sneak extra characters into the log.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: treat s_id as an untrusted string
Kees Cook [Sat, 7 Jan 2012 18:41:04 +0000 (10:41 -0800)]
audit: treat s_id as an untrusted string

The use of s_id should go through the untrusted string path, just to be
extra careful.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: fix signedness bug in audit_log_execve_info()
Xi Wang [Tue, 20 Dec 2011 23:39:41 +0000 (18:39 -0500)]
audit: fix signedness bug in audit_log_execve_info()

In the loop, a size_t "len" is used to hold the return value of
audit_log_single_execve_arg(), which returns -1 on error.  In that
case the error handling (len <= 0) will be bypassed since "len" is
unsigned, and the loop continues with (p += len) being wrapped.
Change the type of "len" to signed int to fix the error handling.

size_t len;
...
for (...) {
len = audit_log_single_execve_arg(...);
if (len <= 0)
break;
p += len;
}

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: comparison on interprocess fields
Peter Moody [Wed, 4 Jan 2012 20:24:31 +0000 (15:24 -0500)]
audit: comparison on interprocess fields

This allows audit to specify rules in which we compare two fields of a
process.  Such as is the running process uid != to the running process
euid?

Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: implement all object interfield comparisons
Peter Moody [Wed, 14 Dec 2011 00:17:51 +0000 (16:17 -0800)]
audit: implement all object interfield comparisons

This completes the matrix of interfield comparisons between uid/gid
information for the current task and the uid/gid information for inodes.
aka I can audit based on differences between the euid of the process and
the uid of fs objects.

Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: allow interfield comparison between gid and ogid
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: allow interfield comparison between gid and ogid

Allow audit rules to compare the gid of the running task to the gid of the
inode in question.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: complex interfield comparison helper
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: complex interfield comparison helper

Rather than code the same loop over and over implement a helper function which
uses some pointer magic to make it generic enough to be used numerous places
as we implement more audit interfield comparisons

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: allow interfield comparison in audit rules
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: allow interfield comparison in audit rules

We wish to be able to audit when a uid=500 task accesses a file which is
uid=0.  Or vice versa.  This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared.  At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoKernel: Audit Support For The ARM Platform
Nathaniel Husted [Tue, 3 Jan 2012 19:23:09 +0000 (14:23 -0500)]
Kernel: Audit Support For The ARM Platform

This patch provides functionality to audit system call events on the
ARM platform. The implementation was based off the structure of the
MIPS platform and information in this
(http://lists.fedoraproject.org/pipermail/arm/2009-October/000382.html)
mailing list thread. The required audit_syscall_exit and
audit_syscall_entry checks were added to ptrace using the standard
registers for system call values (r0 through r3). A thread information
flag was added for auditing (TIF_SYSCALL_AUDIT) and a meta-flag was
added (_TIF_SYSCALL_WORK) to simplify modifications to the syscall
entry/exit. Now, if either the TRACE flag is set or the AUDIT flag is
set, the syscall_trace function will be executed. The prober changes
were made to Kconfig to allow CONFIG_AUDITSYSCALL to be enabled.

Due to platform availability limitations, this patch was only tested
on the Android platform running the modified "android-goldfish-2.6.29"
kernel. A test compile was performed using Code Sourcery's
cross-compilation toolset and the current linux-3.0 stable kernel. The
changes compile without error. I'm hoping, due to the simple modifications,
the patch is "obviously correct".

Signed-off-by: Nathaniel Husted <nhusted@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: do not call audit_getname on error
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: do not call audit_getname on error

Just a code cleanup really.  We don't need to make a function call just for
it to return on error.  This also makes the VFS function even easier to follow
and removes a conditional on a hot path.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: only allow tasks to set their loginuid if it is -1
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: only allow tasks to set their loginuid if it is -1

At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL.  In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset.  We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd.  Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.

Systemd has changed how userspace works and allowed us to make the kernel
work the way it should.  With systemd users (even admins) are not supposed
to restart services directly.  The system will restart the service for
them.  Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.

If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used!  Since we have old systems I make this a
Kconfig option.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: remove task argument to audit_set_loginuid
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000 (14:23 -0500)]
audit: remove task argument to audit_set_loginuid

The function always deals with current.  Don't expose an option
pretending one can use it for something.  You can't.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: allow audit matching on inode gid
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: allow audit matching on inode gid

Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: allow matching on obj_uid
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: allow matching on obj_uid

Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall.  aka:

auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: remove audit_finish_fork as it can't be called
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: remove audit_finish_fork as it can't be called

Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace.  The kernel refuses to load such rules.

Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything.  Thus drop the code.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: reject entry,always rules
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: reject entry,always rules

We deprecated entry,always rules a long time ago.  Reject those rules as
invalid.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: inline audit_free to simplify the look of generic code
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: inline audit_free to simplify the look of generic code

make the conditional a static inline instead of doing it in generic code.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: drop audit_set_macxattr as it doesn't do anything
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: drop audit_set_macxattr as it doesn't do anything

unused.  deleted.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: inline checks for not needing to collect aux records
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000 (14:23 -0500)]
audit: inline checks for not needing to collect aux records

A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected.  Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: drop some potentially inadvisable likely notations
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000 (14:23 -0500)]
audit: drop some potentially inadvisable likely notations

The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense.  Drop any that seem questionable and let the
computer do it's thing.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: remove AUDIT_SETUP_CONTEXT as it isn't used
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000 (14:23 -0500)]
audit: remove AUDIT_SETUP_CONTEXT as it isn't used

Audit contexts have 3 states.  Disabled, which doesn't collect anything,
build, which collects info but might not emit it, and record, which
collects and emits.  There is a 4th state, setup, which isn't used.  Get
rid of it.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: inline audit_syscall_entry to reduce burden on archs
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000 (14:23 -0500)]
audit: inline audit_syscall_entry to reduce burden on archs

Every arch calls:

if (unlikely(current->audit_context))
audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: ia32entry.S sign extend error codes when calling 64 bit code
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000 (14:23 -0500)]
audit: ia32entry.S sign extend error codes when calling 64 bit code

In the ia32entry syscall exit audit fastpath we have assembly code which calls
__audit_syscall_exit directly.  This code was, however, zeroes the upper 32
bits of the return code.  It then proceeded to call code which expects longs
to be 64bits long.  In order to handle code which expects longs to be 64bit we
sign extend the return code if that code is an error.  Thus the
__audit_syscall_exit function can correctly handle using the values in
snprintf("%ld").  This fixes the regression introduced in 5cbf1565f29eb57a86a.

Old record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=4294967283
New record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=-13

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
12 years agoAudit: push audit success and retcode into arch ptrace.h
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000 (14:23 -0500)]
Audit: push audit success and retcode into arch ptrace.h

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure.  This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall.  The fix is to fix the
layering foolishness.  We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure.  We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO.  This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros.  The reason is because the audit function must take a void*
for the regs.  (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs).  Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value.  But the ptrace_64.h code defined the macro
regs_return_value() as regs[3].  I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3].  regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
12 years agoseccomp: audit abnormal end to a process due to seccomp
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000 (14:23 -0500)]
seccomp: audit abnormal end to a process due to seccomp

The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: check current inode and containing object when filtering on major and minor
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000 (14:23 -0500)]
audit: check current inode and containing object when filtering on major and minor

The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: drop the meaningless and format breaking word 'user'
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000 (14:23 -0500)]
audit: drop the meaningless and format breaking word 'user'

userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'.  That's useless and doesn't follow the
key=value pair we are trying to enforce.  We already know it came from the
user based on the record type.  Kill that word.  Die.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: dynamically allocate audit_names when not enough space is in the names array
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000 (14:23 -0500)]
audit: dynamically allocate audit_names when not enough space is in the names array

This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoaudit: make filetype matching consistent with other filters
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000 (14:23 -0500)]
audit: make filetype matching consistent with other filters

Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list.  The filetype matching
however had a strange way of doing things.  It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel.  Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea.  As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field.  The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected.  It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there.  Noone used it.  Why keep around the extra code?

Signed-off-by: Eric Paris <eparis@redhat.com>
12 years agoxfs: cleanup xfs_file_aio_write
Christoph Hellwig [Sun, 18 Dec 2011 20:00:14 +0000 (20:00 +0000)]
xfs: cleanup xfs_file_aio_write

With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: always return with the iolock held from xfs_file_aio_write_checks
Christoph Hellwig [Sun, 18 Dec 2011 20:00:13 +0000 (20:00 +0000)]
xfs: always return with the iolock held from xfs_file_aio_write_checks

While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: remove the i_new_size field in struct xfs_inode
Christoph Hellwig [Sun, 18 Dec 2011 20:00:12 +0000 (20:00 +0000)]
xfs: remove the i_new_size field in struct xfs_inode

Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is always uptodate when
writeback could see a page.  Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk.  I hope to fix this properly in
the generic code.

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added.  I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: remove the i_size field in struct xfs_inode
Christoph Hellwig [Sun, 18 Dec 2011 20:00:11 +0000 (20:00 +0000)]
xfs: remove the i_size field in struct xfs_inode

There is no fundamental need to keep an in-memory inode size copy in the XFS
inode.  We already have the on-disk value in the dinode, and the separate
in-memory copy that we need for regular files only in the XFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files.  Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
we are limited to regular files direct access of the VFS inode i_size field.

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is no point in doing so because the VFS inode
is no longer in use at this point.  Just relax the assert in xfs_ifree to
only check the on-disk size instead.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: replace i_pin_wait with a bit waitqueue
Christoph Hellwig [Sun, 18 Dec 2011 20:00:10 +0000 (20:00 +0000)]
xfs: replace i_pin_wait with a bit waitqueue

Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.

Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: replace i_flock with a sleeping bitlock
Christoph Hellwig [Sun, 18 Dec 2011 20:00:09 +0000 (20:00 +0000)]
xfs: replace i_flock with a sleeping bitlock

We almost never block on i_flock, the exception is synchronous inode
flushing.  Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path.  This primarily is a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.

A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.

Note that for example the inode writeback locking is implemented in a
very similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: make i_flags an unsigned long
Christoph Hellwig [Sun, 18 Dec 2011 20:00:08 +0000 (20:00 +0000)]
xfs: make i_flags an unsigned long

To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems.  Because of the 1-byte i_update
field right after it this actually causes a fairly large size increase
on its own (4 or 8 bytes), but that increase will be more than offset
by the next two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
12 years agoxfs: remove the if_ext_max field in struct xfs_ifork
Christoph Hellwig [Sun, 18 Dec 2011 20:00:07 +0000 (20:00 +0000)]
xfs: remove the if_ext_max field in struct xfs_ifork

We spent a lot of effort to maintain this field, but it always equals to the
fork size divided by the constant size of an extent.  The prime use of it is
to assert that the two stay in sync.  Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it.  Also introduce a few helpers to consolidate the places
where we actually care about the value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>