review.tizen.org Git - platform/kernel/linux-3.10.git/log

nl80211/mac80211: support full station state in AP mode

Today, stations are added already associated. That is
inefficient if, for example, the driver has no room
for stations any more because then the station will
go through the entire auth/assoc handshake, only to
be kicked out afterwards.

To address this a bit better, at least with drivers
using the new station state callback, allow hostapd
to add stations in unauthenticated mode, just after
receiving the AUTH frame, before even replying. Thus
if there's no more space at that point, it can send
a negative auth frame back. It still needs to handle
later state transition errors though, of course.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: move some AP code to right file

Some AP code ended up in mlme.c as ap.c didn't
exist when it was written, move it now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: restrict assoc request VHT capabilities

In interoperability testing some APs showed bad behaviour
if some of the VHT capabilities of the station are better
than their own. Restrict the assoc request parameters
- beamformee capabable,
- RX STBC and
- RX MCS set
to the subset that the AP can support.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: move world roaming check for beacon hints

We should not add new beacon hints even if the wiphy
is not world roaming. Without this we were always adding
a beacon hint if not world roaming for every non world
roaming wiphy interface.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix locking]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: move reg_is_world_roaming()

This will be used later by other code. This has no
functional change.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: do not process beacon hints if one is already queued

Regulatory beacon hints are used to help with world roaming
and as it is right now we learn from a beacon hint processed
on one wiphy to all other wiphys. The processing of beacon
hints however is scheduled and if we have a lot of interfaces
we may hit the case that we'll queue a the same beacon hint
many times until its processed.

To avoid this do a lookup on the queued up beacon hints prior
to adding a new beacon hint. If the beacon hint is removed
from the pending reg beacon hint list then it would be processed
and we'd ensure all wiphys would have learned from it, if its
on the pending reg beacon list we'd now find it prior to it
being processed.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: assign bss_conf.bssid only once

Instead of checking every time bss_info_changed is called,
assign the pointer once depending on the interface type
and then leave it untouched until the interface type is
changed. This makes the ieee80211_bss_info_change_notify()
now a simple wrapper to call the driver only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: further simplify ieee80211_bss_info_change_notify

The special case in the function isn't really needed,
instead make the suspend code a bit better and also
easier to understand and move the warning into the
driver op wrapper inline.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: reconfig bss_info_changed only if beaconing

For AP/IBSS/mesh interfaces, call the driver to reconfigure
bss_info_changed only if the interface was beaconing before
suspend, otherwise we call the driver and it might interpret
the change as going from enabled to disabled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: track enable_beacon explicitly

Instead of calculating in ieee80211_bss_info_change_notify()
whether beaconing should be enabled or not, set it in the
correct places in the callers. This simplifies the logic in
this function at the expense of offchannel, but is also more
robust.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix channel context iteration

During suspend/resume channel contexts might be
iterated even if they haven't been re-added to
the driver, keep track of this and skip them in
iteration. Also use the new status for sanity
checks.

Also clarify the fact that during HW restart all
contexts are iterated over (thanks Eliad.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: clean up association better in suspend

When suspending, bss_info_changed() is called to
disable beacons, but managed mode interfaces are
simply removed (bss_info_changed() is called with
"no change" only). This can lead to problems.

To fix this and copy the BSS configuration, clear
it during suspend and restore it on resume.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: clean up ieee80211_quiesce

It's a bit odd that there's a return value that only
depends on the iftype, move that logic out of the
function into the only caller that needs it.

Also, since the quiescing could stop timers that
trigger the sdata work, move the sdata work cancel
into the function and after the actual quiesce.

Finally, there's no need to call it on interfaces
that are down, so don't.

Change-Id: I1632d46d21ba3558ea713d035184f1939905f2f1
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac82011: use frame control to differentiate probe resp/beacon

The probe response/beacon management frame RX code passes a
bool parameter to differentiate beacons and probe responses.
This is useless since we have the frame and can thus use its
frame control field. Moreover it is buggy since there is one
call to ieee80211_rx_bss_info with a beacon frame that is
indicated as a probe response, which is also fixed by using
the frame control field, so do that.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix ieee80211_ie_build_vht_cap indentation

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: adjacent 80+80 MHz channel segments are invalid

In that case, it's really a 160 MHz channel, so disallow
this configuration.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: optimise AP stop RCU handling

If there are VLANs, stopping an AP is inefficient as it
calls rcu_barrier() once for each interface (the VLANs
and the AP itself). Optimise this by moving rcu_barrier()
out of the station cleanups and calling it only once for
all interfaces combined.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: use IS_ERR macro family for freq_reg_info

Instead of returning an error and filling a pointer
return the pointer and an ERR_PTR value in error cases.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: use RCU to protect last_request

This will allow making freq_reg_info() lock-free.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: use RCU to protect global and wiphy regdomains

To simplify the locking and not require cfg80211_mutex
(which nl80211 uses to access the global regdomain) and
also to make it possible for drivers to access their
wiphy->regd safely, use RCU to protect these pointers.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: pass new regdomain to reset function

Instead of assigning after calling the function do
it inside the function. This will later avoid a
period of time where the pointer is NULL.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove handling of channel bandwidth

The channel bandwidth handling isn't really quite right,
it assumes that a 40 MHz channel is really two 20 MHz
channels, which isn't strictly true. This is the way the
regulatory database handling is defined right now though
so remove the logic to handle other channel widths.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: fix reg_is_valid_request handling

There's a bug with the world regulatory domain, it
can be updated any time which is different from all
other regdomains that can only be updated once after
a request for them. Fix this by adding a check for
"processed" to the reg_is_valid_request() function
and clear that when doing a request.

While looking at this I also found another locking
bug, last_request is protected by the reg_mutex not
the cfg80211_mutex so the code in nl80211 is racy.
Remove that code as it only tries to prevent an
allocation in an error case, which isn't necessary.
Then the function can also become static and locking
in nl80211 can have a smaller scope.

Also change __set_regdom() to do the checks earlier
and not different for world/other regdomains.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove locking from wiphy_apply_custom_regulatory

wiphy_apply_custom_regulatory() doesn't have to hold
the regulatory mutex as it only modifies the given
wiphy with the given regulatory domain, it doesn't
access any global regulatory data.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: clarify locking rules and assertions

Many places that currently check that cfg80211_mutex
is held don't actually use any data protected by it.
The functions that need to hold the cfg80211_mutex
are the ones using the cfg80211_regdomain variable,
so add the lock assertion to those and clarify this
in the comments.

The reason for this is that nl80211 uses the regdom
without being able to hold reg_mutex.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: simplify freq_reg_info_regd

The function itself has dual-purpose: it can
retrieve from a given regdomain or from the
globally installed one. Change it to have a
single purpose only: to look up from a given
regdomain. Pass the correct regdomain in the
freq_reg_info() function instead.

This also changes the locking rules for it,
no locking is required any more.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove useless warning

Even if it never happens and is hidden behind the
debug config option, it's completely useless: the
calltrace will only show module loading.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove redundant isalpha() check

toupper() only modifies lower-case letters, so
the isalpha() check is redundant; remove it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: simplify restore_regulatory_settings

Use list_splice_tail_init() and also simplify the locking.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove BUG_ON

This code is a bit too BUG_ON happy, remove all
instances and while doing so make some code a bit
smarter by passing the right pointer instead of
indices into arrays.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: remove wiphy_idx_valid

This is pretty much useless since get_wiphy_idx()
always returns true since it's always called with
a valid wiphy pointer.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: use proper enum for return values

Instead of treating special error codes specially,
like -EALREADY, introduce a real enum for all the
needed possibilities and use it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: remove useless locking on exit

It would be a major problem if anything were to run
concurrently while the module is being unloaded so
remove the locking that doesn't help anything.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: code cleanup

Clean up various things like indentation, extra
parentheses, too many/few line breaks, etc.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: simplify regulatory_hint_11d

There's no need to unlock before calling
queue_regulatory_request(), so simplify
the function.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: don't test list before iterating

There's no need to test whether a list is
empty or not before iterating.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: clean up reg_copy_regd()

Use ERR_PTR/IS_ERR to return the result or errors,
also do some code cleanups.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: clean up regdom_intersect

As the dummy_rule (also renamed from irule) is only
used for output by the reg_rules_intersect() function
there's no need to clear it at all, remove that.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: don't allocate too much memory

There's no need to allocate one reg rule more
than will be used, reduce the allocations. The
allocation in nl80211 already doesn't allocate
too much space.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

regulatory: don't write past array when intersecting rules

When intersecting rules, we count first to know how many
rules need to be allocated, and then do the intersection
into the allocated array. However, the code doing this
writes past the end of the array because it attempts to
do all intersections. Make it stop when the right number
of rules has been reached.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove a bit of dead mesh code

In a file that's only built when CONFIG_MAC80211_MESH
is defined, having an #ifdef on the same is entirely
pointless, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: optimise roaming time again

The last fixes re-added the RCU synchronize penalty
on roaming to fix the races. Split up sta_info_flush()
now to get rid of that again, and let managed mode
(and only it) delay the actual destruction.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: warn if unexpectedly removing stations

When an interface is brought down it must have been
disconnected (or similar) in all modes other than WDS,
so warn if any stations were removed in other modes.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove final sta_info_flush()

When all interfaces have been removed, there can't
be any stations left over, so there's no need to
flush again. Remove this, and all code associated
with it, which also simplifies the function.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wireless: more 'capability info' bits

define bits for 'capability info', as in recent spec edition
IEEE802.11-2012

Also, add mask for 2-bit field 'bss type', as it is in 802.11ad

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211_hwsim: allow testing paged RX

Paged RX, i.e. SKBs with (some of) the data in pages instead
of the SKB header data (skb->data) can behave differently in
the stack and cause other bugs. To make debugging easier add
an option to hwsim to test with such SKBs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: use short slot time in mesh for 5GHz

Use short slot time in 5GHz for mesh. The performance is
increased from 16.4Mbps to 23.4Mbps for two directly
connected mesh STAs operating in legacy rate using iperf
measurement. Almost similar to the results claimed in IBSS
mode.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[call ieee80211_get_sdata_band() only once]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Allow disabling SGI-20

This allows user-space (wpa_supplicant) to disable
short guard interval (SGI) for 20Mhz. The SGI-40
disable option is already handled.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge remote-tracking branch 'mac80211/master' into HEAD

mac80211: fix maximum MTU

The maximum MTU shouldn't take the headers into account,
the maximum MSDU size is exactly the maximum MTU.

Signed-off-by: T Krishna Chaitanya <chaitanyatk@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix dtim_period in hidden SSID AP association

When AP's SSID is hidden the BSS can appear several times in
cfg80211's BSS list: once with a zero-length SSID that comes
from the beacon, and once for each SSID from probe reponses.

Since the mac80211 stores its data in ieee80211_bss which
is embedded into cfg80211_bss, mac80211's data will be
duplicated too.

This becomes a problem when a driver needs the dtim_period
since this data exists only in the beacon's instance in
cfg80211 bss table which isn't the instance that is used
when associating.

Remove the DTIM period from the BSS table and track it
explicitly to avoid this problem.

Cc: stable@vger.kernel.org
Tested-by: Efi Tubul <efi.tubul@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: use del_timer_sync for final sta cleanup timer deletion

This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().

The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)

Cc: stable@vger.kernel.org
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix station destruction in AP/mesh modes

Unfortunately, commit b22cfcfcae5b, intended to speed up roaming
by avoiding the synchronize_rcu() broke AP/mesh modes as it moved
some code into that work item that will still call into the driver
at a time where it's no longer expected to handle this: after the
AP or mesh has been stopped.

To fix this problem remove the per-station work struct, maintain a
station cleanup list instead and flush this list when stations are
flushed. To keep this patch smaller for stable, do this when the
stations are flushed (sta_info_flush()). This unfortunately brings
back the original roaming delay; I'll fix that again in a separate
patch.

Also, Ben reported that the original commit could sometimes (with
many interfaces) cause long delays when an interface is set down,
due to blocking on flush_workqueue(). Since we now maintain the
cleanup list, this particular change of the original patch can be
reverted.

Cc: stable@vger.kernel.org [3.7]
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: RMC buckets are just list heads

The array of rmc_entrys is redundant since only the
list_head is used. Make this an array of list_heads
instead and save ~6k per vif at runtime :D

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: assign VLAN channel contexts

Make AP_VLAN type interfaces track the AP master channel
context so they have one assigned for the various lookups.
Don't give them their own refcount etc. since they're just
slaves to the AP master.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: flush AP_VLAN stations when tearing down the BSS AP

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[change to flush stations with AP flush in second loop]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix ibss scanning

Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this
can trigger Microcode errors on iwlwifi and iwlegacy drivers.

Also rename ieee80211_request_internal_scan() function since it is only
used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss().

This patch should address:
https://bugzilla.redhat.com/show_bug.cgi?id=883414
https://bugzilla.kernel.org/show_bug.cgi?id=49411

Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm>
Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Linux 3.8-rc1

Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:
"This includes some fixes and code improvements (like
  clk_prepare_enable and clk_disable_unprepare), conversion from the
  omap_wdt and twl4030_wdt drivers to the watchdog framework, addition
  of the SB8x0 chipset support and the DA9055 Watchdog driver and some
  OF support for the davinci_wdt driver."

* git://www.linux-watchdog.org/linux-watchdog: (22 commits)
  watchdog: mei: avoid oops in watchdog unregister code path
  watchdog: Orion: Fix possible null-deference in orion_wdt_probe
  watchdog: sp5100_tco: Add SB8x0 chipset support
  watchdog: davinci_wdt: add OF support
  watchdog: da9052: Fix invalid free of devm_ allocated data
  watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER
  watchdog: remove depends on CONFIG_EXPERIMENTAL
  watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  watchdog: DA9055 Watchdog driver
  watchdog: omap_wdt: eliminate goto
  watchdog: omap_wdt: delete redundant platform_set_drvdata() calls
  watchdog: omap_wdt: convert to devm_ functions
  watchdog: omap_wdt: convert to new watchdog core
  watchdog: WatchDog Timer Driver Core: fix comment
  watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare
  watchdog: imx2_wdt: Select the driver via ARCH_MXC
  watchdog: cpu5wdt.c: add missing del_timer call
  watchdog: hpwdt.c: Increase version string
  watchdog: Convert twl4030_wdt to watchdog core
  davinci_wdt: preparation for switch to common clock framework
  ...

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
"Misc small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: eliminate cifsERROR variable
  cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use
  cifs: fix double-free of "string" in cifs_parse_mount_options

Merge tag 'dm-3.8-fixes' of git://git./linux/kernel/git/agk/linux-dm

Pull dm update from Alasdair G Kergon:
"Miscellaneous device-mapper fixes, cleanups and performance
  improvements.

  Of particular note:
   - Disable broken WRITE SAME support in all targets except linear and
     striped.  Use it when kcopyd is zeroing blocks.
   - Remove several mempools from targets by moving the data into the
     bio's new front_pad area(which dm calls 'per_bio_data').
   - Fix a race in thin provisioning if discards are misused.
   - Prevent userspace from interfering with the ioctl parameters and
     use kmalloc for the data buffer if it's small instead of vmalloc.
   - Throttle some annoying error messages when I/O fails."

* tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits)
  dm stripe: add WRITE SAME support
  dm: remove map_info
  dm snapshot: do not use map_context
  dm thin: dont use map_context
  dm raid1: dont use map_context
  dm flakey: dont use map_context
  dm raid1: rename read_record to bio_record
  dm: move target request nr to dm_target_io
  dm snapshot: use per_bio_data
  dm verity: use per_bio_data
  dm raid1: use per_bio_data
  dm: introduce per_bio_data
  dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
  dm linear: add WRITE SAME support
  dm: add WRITE SAME support
  dm: prepare to support WRITE SAME
  dm ioctl: use kmalloc if possible
  dm ioctl: remove PF_MEMALLOC
  dm persistent data: improve improve space map block alloc failure message
  dm thin: use DMERR_LIMIT for errors
  ...

Revert "nfsd: warn on odd reply state in nfsd_vfs_read"

This reverts commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e.

This is obviously wrong, and I have no idea how I missed seeing the
warning in testing: I must just not have looked at the right logs. The
caller bumps rq_resused/rq_next_page, so it will always be hit on a
large enough read.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband

Pull more infiniband changes from Roland Dreier:
"Second batch of InfiniBand/RDMA changes for 3.8:
   - cxgb4 changes to fix lookup engine hash collisions
   - mlx4 changes to make flow steering usable
   - fix to IPoIB to avoid pinning dst reference for too long"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix bug for active and passive LE hash collision path
  RDMA/cxgb4: Fix LE hash collision bug for passive open connection
  RDMA/cxgb4: Fix LE hash collision bug for active open connection
  mlx4_core: Allow choosing flow steering mode
  mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
  mlx4_core: Fix error flow in the flow steering wrapper
  mlx4_core: Add QPN enforcement for flow steering rules set by VFs
  cxgb4: Add LE hash collision bug fix path in LLD driver
  cxgb4: Add T4 filter support
  IPoIB: Call skb_dst_drop() once skb is enqueued for sending

Merge tag 'asm-generic' of git://git./linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
"These are a few cleanups for asm-generic:

   - a set of patches from Lars-Peter Clausen to generalize asm/mmu.h
     and use it in the architectures that don't need any special
     handling.
   - A patch from Will Deacon to remove the {read,write}s{b,w,l} as
     discussed during the arm64 review
   - A patch from James Hogan that helps with the meta architecture
     series."

* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  xtensa: Use generic asm/mmu.h for nommu
  h8300: Use generic asm/mmu.h
  c6x: Use generic asm/mmu.h
  asm-generic/mmu.h: Add support for FDPIC
  asm-generic/mmu.h: Remove unused vmlist field from mm_context_t
  asm-generic: io: remove {read,write} string functions
  asm-generic/io.h: remove asm/cacheflush.h include

ARM: dts: fix duplicated build target and alphabetical sort out for exynos

Commit db5b0ae00712 ("Merge tag 'dt' of git://git.kernel.org/.../arm-soc")
causes a duplicated build target. This patch fixes it and sorts out the
build target alphabetically so that we can recognize something wrong
easily.

Cc: Olof Johansson <olof@lixom.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm stripe: add WRITE SAME support

Rename stripe_map_discard to stripe_map_range and reuse it for WRITE
SAME bio processing.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove map_info

This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: do not use map_context

Eliminate struct map_info from dm-snap.

map_info->ptr was used in dm-snap to indicate if the bio was tracked.
If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash.

This patch removes the use of map_info->ptr. We determine if the bio was
tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true,
the bio is not tracked, if it is false, the bio is tracked.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: dont use map_context

This patch removes endio_hook_pool from dm-thin and uses per-bio data instead.

This patch removes any use of map_info in preparation for the next patch
that removes map_info from bio-based device mapper.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: dont use map_context

Don't use map_info any more in dm-raid1.

map_info was used for writes to hold the region number. For this purpose
we add a new field dm_bio_details to dm_raid1_bio_record.

map_info was used for reads to hold a pointer to dm_raid1_bio_record (if
the pointer was non-NULL, bio details were saved; if the pointer was
NULL, bio details were not saved). We use
dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is
NULL, details were not saved, if bi_bdev is non-NULL, details were
saved.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm flakey: dont use map_context

Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: rename read_record to bio_record

Rename struct read_record to bio_record in dm-raid1.

In the following patch, the structure will be used for both read and
write bios, so rename it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: move target request nr to dm_target_io

This patch moves target_request_nr from map_info to dm_target_io and
makes it accessible with dm_bio_get_target_request_nr.

This patch is a preparation for the next patch that removes map_info.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: use per_bio_data

Replace tracked_chunk_pool with per_bio_data in dm-snap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm verity: use per_bio_data

Replace io_mempool with per_bio_data in dm-verity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: use per_bio_data

Replace read_record_pool with per_bio_data in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: introduce per_bio_data

Introduce a field per_bio_data_size in struct dm_target.

Targets can set this field in the constructor. If a target sets this
field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
are allocated for each bio submitted to the target. These data can be
used for any purpose by the target and help us improve performance by
removing some per-target mempools.

Per-bio data is accessed with dm_per_bio_data. The
argument data_size must be the same as the value per_bio_data_size in
dm_target.

If the target has a pointer to per_bio_data, it can get a pointer to
the bio with dm_bio_from_per_bio_data() function (data_size must be the
same as the value passed to dm_per_bio_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm kcopyd: add WRITE SAME support to dm_kcopyd_zero

Add WRITE SAME support to dm-io and make it accessible to
dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface
whereas the blkdev_issue_write_same() interface is synchronous.

WRITE SAME is a SCSI command that can be leveraged for more efficient
zeroing of a specified logical extent of a device which supports it.
Only a single zeroed logical block is transfered to the target for each
WRITE SAME and the target then writes that same block across the
specified extent.

The dm thin target uses this.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm linear: add WRITE SAME support

The linear target can already support WRITE SAME requests so signal
this by setting num_write_same_requests to 1.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: add WRITE SAME support

WRITE SAME bios have a payload that contain a single page. When
cloning WRITE SAME bios DM has no need to modify the bi_io_vec
attributes (and doing so would be detrimental). DM need only alter the
start and end of the WRITE SAME bio accordingly.

Rather than duplicate __clone_and_map_discard, factor out a common
function that is also used by __clone_and_map_write_same.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: prepare to support WRITE SAME

Allow targets to opt in to WRITE SAME support by setting
'num_write_same_requests' in the dm_target structure.

A dm device will only advertise WRITE SAME support if all its
targets and all its underlying devices support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm ioctl: use kmalloc if possible

If the parameter buffer is small enough, try to allocate it with kmalloc()
rather than vmalloc().

vmalloc is noticeably slower than kmalloc because it has to manipulate
page tables.

In my tests, on PA-RISC this patch speeds up activation 13 times.
On Opteron this patch speeds up activation by 5%.

This patch introduces a new function free_params() to free the
parameters and this uses new flags that record whether or not vmalloc()
was used and whether or not the input buffer must be wiped after use.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm ioctl: remove PF_MEMALLOC

When allocating memory for the userspace ioctl data, set some
appropriate GPF flags directly instead of using PF_MEMALLOC.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: improve improve space map block alloc failure message

Improve space map error message when unable to allocate a new
metadata block.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: use DMERR_LIMIT for errors

Throttle all errors logged from the IO path by dm thin.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: use DMERR_LIMIT for errors

Nearly all of persistent-data is in the IO path so throttle error
messages with DMERR_LIMIT to limit the amount logged when
something has gone wrong.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm block manager: reinstate message when validator fails

Reinstate a useful error message when the block manager buffer validator fails.
This was mistakenly eliminated when the block manager was converted to use
dm-bufio.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid: round region_size to power of two

If the user does not supply a bitmap region_size to the dm raid target,
a reasonable size is computed automatically. If this is not a power of 2,
the md code will report an error later.

This patch catches the problem early and rounds the region_size to the
next power of two.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: cleanup dead code

Remove unused @data_block parameter from cell_defer.
Change thin_bio_map to use many returns rather than setting a variable.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: rename cell_defer_except to cell_defer_no_holder

Rename cell_defer_except() to cell_defer_no_holder() which describes
its function more clearly.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: optimize track_chunk

track_chunk is always called with interrupts enabled. Consequently, we
do not need to save and restore interrupt state in "flags" variable.
This patch changes spin_lock_irqsave to spin_lock_irq and
spin_unlock_irqrestore to spin_unlock_irq.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid: use DM_ENDIO_INCOMPLETE

Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: remove impossible mempool_alloc error test

mempool_alloc can't fail if __GFP_WAIT is specified, so the condition
that tests if read_record is non-NULL is always true.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: emit ignore_discard in status when discards disabled

If "ignore_discard" is specified when creating the thin pool device then
discard support is disabled for that device. The pool device's status
should reflect this fact rather than stating "no_discard_passdown"
(which implies discards are enabled but passdown is disabled).

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: fix nested btree deletion

When deleting nested btrees, the code forgets to delete the innermost
btree. The thin-metadata code serendipitously compensates for this by
claiming there is one extra layer in the tree.

This patch corrects both problems.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: wake worker when discard is prepared

When discards are prepared it is best to directly wake the worker that
will process them. The worker will be woken anyway, via periodic
commit, but there is no reason to not wake_worker here.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: fix race between simultaneous io and discards to same block

There is a race when discard bios and non-discard bios are issued
simultaneously to the same block.

Discard support is expensive for all thin devices precisely because you
have to be careful to quiesce the area you're discarding.  DM thin must
handle this conflicting IO pattern (simultaneous non-discard vs discard)
even though a sane application shouldn't be issuing such IO.

The race manifests as follows:

1. A non-discard bio is mapped in thin_bio_map.
   This doesn't lock out parallel activity to the same block.

2. A discard bio is issued to the same block as the non-discard bio.

3. The discard bio is locked in a dm_bio_prison_cell in process_discard
   to lock out parallel activity against the same block.

4. The non-discard bio's mapping continues and its all_io_entry is
   incremented so the bio is accounted for in the thin pool's all_io_ds
   which is a dm_deferred_set used to track time locality of non-discard IO.

5. The non-discard bio is finally locked in a dm_bio_prison_cell in
   process_bio.

The race can result in deadlock, leaving the block layer hanging waiting
for completion of a discard bio that never completes, e.g.:

INFO: task ruby:15354 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
Call Trace:
[<ffffffff814e5a19>] schedule+0x29/0x70
[<ffffffff814e3d85>] schedule_timeout+0x195/0x220
[<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
[<ffffffff814e589e>] wait_for_common+0x11e/0x190
[<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
[<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
[<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
[<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
[<ffffffff8119a65c>] block_ioctl+0x3c/0x40
[<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
[<ffffffff8119a547>] ? block_llseek+0x67/0xb0
[<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
[<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
[<ffffffff814ef099>] system_call_fastpath+0x16/0x1b

The thinp-test-suite's test_discard_random_sectors reliably hits this
deadlock on fast SSD storage.

The fix for this race is that the all_io_entry for a bio must be
incremented whilst the dm_bio_prison_cell is held for the bio's
associated virtual and physical blocks.  That cell locking wasn't
occurring early enough in thin_bio_map.  This patch fixes this.

Care is taken to always call the new function inc_all_io_entry() with
the relevant cells locked, but they are generally unlocked before
calling issue() to try to avoid holding the cells locked across
generic_submit_request.

Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
longer the only thread that will do so.  Because of this we must be sure
to use cell_defer_except() to release all non-holder entries, that
were added by the other thread, because they must be deferred.

This patch depends on "dm thin: replace dm_cell_release_singleton with
cell_defer_except".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@vger.kernel.org

dm thin: replace dm_cell_release_singleton with cell_defer_except

Change existing users of the function dm_cell_release_singleton to share
cell_defer_except instead, and then remove the now-unused function.

Everywhere that calls dm_cell_release_singleton, the bio in question
is the holder of the cell.

If there are no non-holder entries in the cell then cell_defer_except
behaves exactly like dm_cell_release_singleton. Conversely, if there
*are* non-holder entries then dm_cell_release_singleton must not be used
because those entries would need to be deferred.

Consequently, it is safe to replace use of dm_cell_release_singleton
with cell_defer_except.

This patch is a pre-requisite for "dm thin: fix race between
simultaneous io and discards to same block".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: disable WRITE SAME

WRITE SAME bios are not yet handled correctly by device-mapper so
disable their use on device-mapper devices by setting
max_write_same_sectors to zero.

As an example, a ciphertext device is incompatible because the data
gets changed according to the location at which it written and so the
dm crypt target cannot support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm ioctl: prevent unsafe change to dm_ioctl data_size

Abort dm ioctl processing if userspace changes the data_size parameter
after we validated it but before we finished copying the data buffer
from userspace.

The dm ioctl parameters are processed in the following sequence:
1. ctl_ioctl() calls copy_params();
2. copy_params() makes a first copy of the fixed-sized portion of the
    userspace parameters into the local variable "tmp";
3. copy_params() then validates tmp.data_size and allocates a new
    structure big enough to hold the complete data and copies the whole
    userspace buffer there;
4. ctl_ioctl() reads userspace data the second time and copies the whole
    buffer into the pointer "param";
5. ctl_ioctl() reads param->data_size without any validation and stores it
    in the variable "input_param_size";
6. "input_param_size" is further used as the authoritative size of the
    kernel buffer.

The problem is that userspace code could change the contents of user
memory between steps 2 and 4.  In particular, the data_size parameter
can be changed to an invalid value after the kernel has validated it.
This lets userspace force the kernel to access invalid kernel memory.

The fix is to ensure that the size has not changed at step 4.

This patch shouldn't have a security impact because CAP_SYS_ADMIN is
required to run this code, but it should be fixed anyway.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org