review.tizen.org Git - platform/kernel/linux-rpi.git/log

ARM: dts: dra7-evm: Remove pinmux configurations for erratum i869

Pinmuxing for DRA7x/AM57x family of processors need to be done in IO
isolation as part of initial bootloader executed from SRAM. This is
done as part of iodelay configuration sequence and is required due
to the limitations introduced by erratum ID: i869[1] (IO Glitches
can occur when changing IO settings) and elaborated in the Technical
Reference Manual[2] 18.4.6.1.7 Isolation Requirements.

Only peripheral that is permitted for dynamic pin mux configuration
is MMC and DCAN. MMC is permitted to change to accommodate the
requirements for varied speeds (which require IO-delay support in
kernel as well). DCAN is a result of i893[1] (DCAN initialization
sequence).

DCAN pinmux is retained in this patch. MMC pinmux is missing from
the dra7-evm.dts file and the board is relying on configuration done
by bootloader. A subsequent patch will add MMC pinmux configuration.

A side-effect of this patch is that NAND support is removed. NAND
pins clash with VOUT3 on DRA7-EVM. U-Boot selects VOUT3 over NAND
as per TI EVM application needs.

[1] http://www.ti.com/lit/pdf/sprz429
[2] http://www.ti.com/lit/pdf/sprui30

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: am335x-chilisom: Wakeup from RTC-only state by power on event

On chiliSOM TPS65217 nWAKEUP pin is connected to AM335x internal RTC
EXT_WAKEUP input. In RTC-only state TPS65217 is notifying about power on
events (such as power buton presses) by setting nWAKEUP output
low. After that it waits 5s for proper device boot. Currently it doesn't
happen, as the processor doesn't listen for such events. Consequently
TPS65217 changes state from SLEEP (RTC-only state) to OFF.

Enable EXT_WAKEUP input of AM335x's RTC, so the processor can properly
detect power on events and recover immediately from RTC-only states,
without powering off RTC and losing time.

Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: n900: cleanup

Fix GPIO comment to be consistent with rest of file and add comment what
tpa6130 is.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Reviewed-By: Sebastian Reichel <sre@kernel.org>
Reviewed-By: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: Add am335x-bonegreen-wireless

SeeedStudio BeagleBone Green Wireless (BBGW) is an expansion of the
SeeedStudio Green (BBG) with the Ethernet replaced by a TI wl1835
wireless module.

This board can be indentified by the GW1x value after A335BNLT (BBB)
in the at24 eeprom:
GW1x [aa 55 33 ee 41 33 33 35 42 4e 4c 54 47 57 31 41 |.U3.A335BNLTGW1A|]

http://beagleboard.org/green-wireless
http://wiki.seeed.cc/BeagleBone_Green_Wireless/

firmware: https://github.com/beagleboard/beaglebone-black-wireless/tree/master/firmware
wl18xx mac address: Stored in at24 eeprom at address 5-16:
hexdump -e '8/1 "%c"' /sys/bus/i2c/devices/0-0050/eeprom | cut -b 5-16

Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
CC: Tony Lindgren <tony@atomide.com>
CC: Jason Kridner <jkridner@beagleboard.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: Move most of am335x-bonegreen.dts to am335x-bonegreen-common.dtsi

This is going to be shared with the SeeedStudio BeagleBone Green Wireless

Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
CC: Tony Lindgren <tony@atomide.com>
CC: Jason Kridner <jkridner@beagleboard.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: Add am335x-boneblack-wireless

BeagleBone Black Wireless is clone of the BeagleBone Black (BBB) with
the Ethernet replaced by a TI wl1835 wireless module.

This board can be indentified by the BWAx value after A335BNLT (BBB)
in the at24 eeprom:
BWAx [aa 55 33 ee 41 33 33 35 42 4e 4c 54 42 57 41 35 |.U3.A335BNLTBWA5|]

http://beagleboard.org/black-wireless
https://github.com/beagleboard/beaglebone-black-wireless

firmware: https://github.com/beagleboard/beaglebone-black-wireless/tree/master/firmware
wl18xx mac address: /proc/device-tree/ocp/ethernet@4a100000/slave@4a100200/mac-address

Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
CC: Tony Lindgren <tony@atomide.com>
Acked-by: Jason Kridner <jkridner@beagleboard.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARM: dts: Move most of am335x-boneblack.dts to am335x-boneblack-common.dtsi

This is going to be shared with the BeagleBone Black Wireless

Signed-off-by: Robert Nelson <robertcnelson@gmail.com>
CC: Tony Lindgren <tony@atomide.com>
CC: Jason Kridner <jkridner@beagleboard.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

Linux 4.10-rc1

powerpc: Fix build warning on 32-bit PPC

I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:

    AS      arch/powerpc/kernel/misc_32.o
  arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created.  That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").

The offending line does not make a lot of sense.  This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.

Thanks to Nicholas Piggin for suggesting this solution.

Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

avoid spurious "may be used uninitialized" warning

The timer type simplifications caused a new gcc warning:

  drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’:
  drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));

despite the actual use of "time_start" not having changed in any way.
It appears that simply changing the type of ktime_t from a union to a
plain scalar type made gcc check the use.

The variable wasn't actually used uninitialized, but gcc apparently
failed to notice that the conditional around the use was exactly the
same as the conditional around the initialization of that variable.

Add an unnecessary initialization just to shut up the compiler.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
  timers/timekeeping.

   - Get rid of cycles_t and use a plain u64. The type is not really
     helpful and caused more confusion than clarity

   - Get rid of the ktime union. The union has become useless as we use
     the scalar nanoseconds storage unconditionally now. The 32bit
     timespec alike storage got removed due to the Y2038 limitations
     some time ago.

     That leaves the odd union access around for no reason. Clean it up.

  Both changes have been done with coccinelle and a small amount of
  manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Get rid of ktime_equal()
  ktime: Cleanup ktime_set() usage
  ktime: Get rid of the union
  clocksource: Use a plain u64 instead of cycle_t

Merge branch 'smp-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull SMP hotplug notifier removal from Thomas Gleixner:
"This is the final cleanup of the hotplug notifier infrastructure. The
  series has been reintgrated in the last two days because there came a
  new driver using the old infrastructure via the SCSI tree.

  Summary:

   - convert the last leftover drivers utilizing notifiers

   - fixup for a completely broken hotplug user

   - prevent setup of already used states

   - removal of the notifiers

   - treewide cleanup of hotplug state names

   - consolidation of state space

  There is a sphinx based documentation pending, but that needs review
  from the documentation folks"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/armada-xp: Consolidate hotplug state space
  irqchip/gic: Consolidate hotplug state space
  coresight/etm3/4x: Consolidate hotplug state space
  cpu/hotplug: Cleanup state names
  cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
  staging/lustre/libcfs: Convert to hotplug state machine
  scsi/bnx2i: Convert to hotplug state machine
  scsi/bnx2fc: Convert to hotplug state machine
  cpu/hotplug: Prevent overwriting of callbacks
  x86/msr: Remove bogus cleanup from the error path
  bus: arm-ccn: Prevent hotplug callback leak
  perf/x86/intel/cstate: Prevent hotplug callback leak
  ARM/imx/mmcd: Fix broken cpu hotplug handling
  scsi: qedi: Convert to hotplug state machine

Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: remove obsolete -M, -m, -C, -c options
  tools/power turbostat: Make extensible via the --add parameter
  tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
  tools/power turbostat: line up headers when -M is used
  tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
  tools/power turbostat: Support Knights Mill (KNM)
  tools/power turbostat: Display HWP OOB status
  tools/power turbostat: fix Denverton BCLK
  tools/power turbostat: use intel-family.h model strings
  tools/power/turbostat: Add Denverton RAPL support
  tools/power/turbostat: Add Denverton support
  tools/power/turbostat: split core MSR support into status + limit
  tools/power turbostat: fix error case overflow read of slm_freq_table[]
  tools/power turbostat: Allocate correct amount of fd and irq entries
  tools/power turbostat: switch to tab delimited output
  tools/power turbostat: Gracefully handle ACPI S3
  tools/power turbostat: tidy up output on Joule counter overflow

mm: add PageWaiters indicating tasks are waiting for a page bit

Add a new page flag, PageWaiters, to indicate the page waitqueue has
tasks waiting. This can be tested rather than testing waitqueue_active
which requires another cacheline load.

This bit is always set when the page has tasks on page_waitqueue(page),
and is set and cleared under the waitqueue lock. It may be set when
there are no tasks on the waitqueue, which will cause a harmless extra
wakeup check that will clears the bit.

The generic bit-waitqueue infrastructure is no longer used for pages.
Instead, waitqueues are used directly with a custom key type. The
generic code was not flexible enough to have PageWaiters manipulation
under the waitqueue lock (which simplifies concurrency).

This improves the performance of page lock intensive microbenchmarks by
2-3%.

Putting two bits in the same word opens the opportunity to remove the
memory barrier between clearing the lock bit and testing the waiters
bit, after some work on the arch primitives (e.g., ensuring memory
operand widths match and cover both bits).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked

A page is not added to the swap cache without being swap backed,
so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ktime: Get rid of ktime_equal()

No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

ktime: Cleanup ktime_set() usage

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

ktime: Get rid of the union

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

clocksource: Use a plain u64 instead of cycle_t

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>

irqchip/armada-xp: Consolidate hotplug state space

The mpic is either the main interrupt controller or is cascaded behind a
GIC. The mpic is single instance and the modes are mutually exclusive, so
there is no reason to have seperate cpu hotplug states.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20161221192112.333161745@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

irqchip/gic: Consolidate hotplug state space

Even if both drivers are compiled in only one instance can run on a given
system depending on the available GIC version.

So having seperate hotplug states for them is pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.252416267@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

coresight/etm3/4x: Consolidate hotplug state space

Even if both drivers are compiled in only one instance can run on a given
system depending on the available tracer cell.

So having seperate hotplug states for them is pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: http://lkml.kernel.org/r/20161221192112.162765484@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

cpu/hotplug: Cleanup state names

When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.

Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions

hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
register_hotcpu_notifier(), register_cpu_notifier(),
__register_hotcpu_notifier(), __register_cpu_notifier(),
unregister_hotcpu_notifier(), unregister_cpu_notifier(),
__unregister_hotcpu_notifier(), __unregister_cpu_notifier()

are unused now. Remove them and all related code.

Remove also the now pointless cpu notifier error injection mechanism. The
states can be executed step by step and error rollback is the same as cpu
down, so any state transition can be tested w/o requiring the notifier
error injection.

Some CPU hotplug states are kept as they are (ab)used for hotplug state
tracking.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

staging/lustre/libcfs: Convert to hotplug state machine

Install the callbacks via the state machine. No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: devel@driverdev.osuosl.org
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: rt@linutronix.de
Cc: lustre-devel@lists.lustre.org
Link: http://lkml.kernel.org/r/20161202110027.htzzeervzkoc4muv@linutronix.de
Link: http://lkml.kernel.org/r/20161221192111.922872524@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

scsi/bnx2i: Convert to hotplug state machine

Install the callbacks via the state machine. No functional change.

This is the minimal fixup so we can remove the hotplug notifier mess
completely.

The real rework of this driver to use work queues is still stuck in
review/testing on the SCSI mailing list.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: QLogic-Storage-Upstream@qlogic.com
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20161221192111.836895753@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

scsi/bnx2fc: Convert to hotplug state machine

Install the callbacks via the state machine. No functional change.

This is the minimal fixup so we can remove the hotplug notifier mess
completely.

The real rework of this driver to use work queues is still stuck in
review/testing on the SCSI mailing list.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: QLogic-Storage-Upstream@qlogic.com
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20161221192111.757309869@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

cpu/hotplug: Prevent overwriting of callbacks

Developers manage to overwrite states blindly without thought. That's fatal
and hard to debug. Add sanity checks to make it fail.

This requries to restructure the code so that the dynamic state allocation
happens in the same lock protected section as the actual store. Otherwise
the previous assignment of 'Reserved' to the name field would trigger the
overwrite check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/msr: Remove bogus cleanup from the error path

The error cleanup which is invoked when the hotplug state setup failed
tries to remove the failed state, which is broken.

Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>

bus: arm-ccn: Prevent hotplug callback leak

In case the driver registration fails, the hotplug callback is leaked.

Not fatal, because it's never invoked as there are no instances registered,
but wrong nevertheless.

Fixes: fdc15a36d84e ("bus/arm-ccn: Convert to hotplug statemachine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>

perf/x86/intel/cstate: Prevent hotplug callback leak

If the pmu registration fails the registered hotplug callbacks are not
removed. Wrong in any case, but fatal in case of a modular driver.

Replace the nonsensical state names with proper ones while at it.

Fixes: 77c34ef1c319 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org

ARM/imx/mmcd: Fix broken cpu hotplug handling

The cpu hotplug support of this perf driver is broken in several ways:

1) It adds a instance before setting up the state.

2) The state for the instance is different from the state of the
   callback. It's just a randomly chosen state.

3) The instance registration is not error checked so nobody noticed that
   the call can never succeed.

4) The state for the multi install callbacks is chosen randomly and
   overwrites existing state. This is now prevented by the core code so the
   call is guaranteed to fail.

5) The error exit path in the init function leaves the instance registered
   and then frees the memory which contains the enqueued hlist node.

6) The remove function is removing the state and not the instance.

Fix it by:

- Setting up the state before adding instances. Use a dynamically allocated
  state for it.

- Installing instances after the state has been set up

- Removing the instance in the error path before freeing memory

- Removing the instance not the state in the driver remove callback

While at is use raw_cpu_processor_id(), because cpu_processor_id() cannot
be used in preemptible context, and set the driver data after successful
registration of the pmu.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Frank Li <frank.li@nxp.com>
Cc: Zhengyu Shen <zhengyu.shen@nxp.com>
Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

scsi: qedi: Convert to hotplug state machine

The CPU hotplug code is a trainwreck. It leaks a notifier in case of driver
registration error and the per cpu loop is racy against cpu hotplug. Aside
of that the driver should have been written and merged with the new state
machine interfaces in the first place.

Mop up the mess and Convert it to the hotplug state machine.

Signed-off-by: Thomas Grumpy Gleixner <tglx@linutronix.de>
Cc: Nilesh Javali <nilesh.javali@cavium.com>
Cc: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@cavium.com>
Cc: Saurav Kashyap <saurav.kashyap@cavium.com>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Manish Rangankar <manish.rangankar@cavium.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>

tools/power turbostat: remove obsolete -M, -m, -C, -c options

The new --add option has replaced the -M, -m, -C, -c options
Eg.

-M 0x10 is now --add msr0x10,raw
-m 0x10 is now --add msr0x10,raw,u32
-C 0x10 is now --add msr0x10,delta
-c 0x10 is now --add msr0x10,delta,u32

The --add option can be repeated to add any number of counters,
while the previous options were limited to adding one of each type.

In addition, the --add option can accept a column label,
and can also display a counter as a percentage of elapsed cycles.

Eg. --add msr0x3fe,core,percent,MY_CC3

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Make extensible via the --add parameter

Create the "--add" parameter. This can be used to teach an existing
turbostat binary about any number of any type of counter.

turbostat(8) details the syntax for --add.

Signed-off-by: Len Brown <len.brown@intel.com>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"This ncludes various cifs/smb3 bug fixes, mostly for stable as well.

  In the next week I expect that Germano will have some reconnection
  fixes, and also I expect to have the remaining pieces of the snapshot
  enablement and SMB3 ACLs, but wanted to get this set of bug fixes in"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs_get_root shouldn't use path with tree name
  Fix default behaviour for empty domains and add domainauto option
  cifs: use %16phN for formatting md5 sum
  cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack
  CIFS: Fix a possible double locking of mutex during reconnect
  CIFS: Fix a possible memory corruption during reconnect
  CIFS: Fix a possible memory corruption in push locks
  CIFS: Fix missing nls unload in smb2_reconnect()
  CIFS: Decrease verbosity of ioctl call
  SMB3: parsing for new snapshot timestamp mount parm

Merge tag 'watchdog-for-linus-v4.10' of git://git./linux/kernel/git/groeck/linux-staging

Pull watchdog updates from Wim Van Sebroeck and Guenter Roeck:

- new driver for Add Loongson1 SoC

- minor cleanup and fixes in various drivers

* tag 'watchdog-for-linus-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  watchdog: it87_wdt: add IT8620E ID
  watchdog: mpc8xxx: Remove unneeded linux/miscdevice.h include
  watchdog: octeon: Remove unneeded linux/miscdevice.h include
  watchdog: bcm2835_wdt: set WDOG_HW_RUNNING bit when appropriate
  watchdog: loongson1: Add Loongson1 SoC watchdog driver
  watchdog: cpwd: remove memory allocate failure message
  watchdog: da9062/61: watchdog driver
  intel-mid_wdt: Error code is just an integer
  intel-mid_wdt: make sure watchdog is not running at startup
  watchdog: mei_wdt: request stop on reboot to prevent false positive event
  watchdog: hpwdt: changed maintainer information
  watchdog: jz4740: Fix modular build
  watchdog: qcom: fix kernel panic due to external abort on non-linefetch
  watchdog: davinci: add support for deferred probing
  watchdog: meson: Remove unneeded platform MODULE_ALIAS
  watchdog: Standardize leading tabs and spaces in Kconfig file
  watchdog: max77620_wdt: fix module autoload
  watchdog: bcm7038_wdt: fix module autoload

Merge tag 'ntb-4.10' of git://github.com/jonmason/ntb

Pull NTB update from Jon Mason:

- NTB bug fixes for removing an unnecessary call to ntb_peer_spad_read,
   and correcting a free_irq inconsistency

- add Intel SKX support

- change the AMD NTB maintainer, and fix some bugs present there

* tag 'ntb-4.10' of git://github.com/jonmason/ntb:
  ntb_transport: Remove unnecessary call to ntb_peer_spad_read
  NTB: Fix 'request_irq()' and 'free_irq()' inconsistancy
  ntb: fix SKX NTB config space size register offsets
  NTB: correct ntb_peer_spad_read for case when callback is not supplied.
  MAINTAINERS: Change in maintainer for AMD NTB
  ntb_transport: Limit memory windows based on available, scratchpads
  NTB: Register and offset values fix for memory window
  NTB: add support for hotplug feature
  ntb: Adding Skylake Xeon NTB support

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"There's a number of fixes:

   - a round of fixes for CPUID-less legacy CPUs
   - a number of microcode loader fixes
   - i8042 detection robustization fixes
   - stack dump/unwinder fixes
   - x86 SoC platform driver fixes
   - a GCC 7 warning fix
   - virtualization related fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  Revert "x86/unwind: Detect bad stack return address"
  x86/paravirt: Mark unused patch_default label
  x86/microcode/AMD: Reload proper initrd start address
  x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
  x86/platform/intel-mid: Switch MPU3050 driver to IIO
  x86/alternatives: Do not use sync_core() to serialize I$
  x86/topology: Document cpu_llc_id
  x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
  x86/asm: Rewrite sync_core() to use IRET-to-self
  x86/microcode/intel: Replace sync_core() with native_cpuid()
  Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
  x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
  x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
  x86/tools: Fix gcc-7 warning in relocs.c
  x86/unwind: Dump stack data on warnings
  x86/unwind: Adjust last frame check for aligned function stacks
  x86/init: Fix a couple of comment typos
  x86/init: Remove i8042_detect() from platform ops
  Input: i8042 - Trust firmware a bit more when probing on X86
  x86/init: Add i8042 state to the platform data
  ...

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"ARM/MOXA SoC clocksource driver fixes"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/moxart: Plug memory and mapping leaks

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
  plus on the tooling side there's a number of fixes and some late
  updates"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf sched timehist: Fix invalid period calculation
  perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
  perf sched timehist: Enlarge default 'comm_width'
  perf sched timehist: Honour 'comm_width' when aligning the headers
  perf/x86: Fix overlap counter scheduling bug
  perf/x86/pebs: Fix handling of PEBS buffer overflows
  samples/bpf: Move open_raw_sock to separate header
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Switch over to libbpf
  perf diff: Do not overwrite valid build id
  perf annotate: Don't throw error for zero length symbols
  perf bench futex: Fix lock-pi help string
  perf trace: Check if MAP_32BIT is defined (again)
  samples/bpf: Make perf_event_read() static
  uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
  samples/bpf: Make samples more libbpf-centric
  tools lib bpf: Add flags to bpf_create_map()
  tools lib bpf: use __u32 from linux/types.h
  ...

Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
"A build warning fix with certain .config's"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/st: Mark st_irq_syscfg_resume() __maybe_unused

ntb_transport: Remove unnecessary call to ntb_peer_spad_read

The results were previously ignored, anyway.

Signed-off-by: Steve Wahl <Steve.Wahl@dell.com>
Fixes: e26a5843f7f5014ae4460030ca4de029a3ac35d3
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: Fix 'request_irq()' and 'free_irq()' inconsistancy

'request_irq()' and 'free_irq()' should have the same 'dev_id'.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: fix SKX NTB config space size register offsets

The offsets for the SZ registers are wrong. Updated.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reported-by: Sandeep Mann <sandeep@purestorage.com>
Tested-by: Zachary Ross <zacharyx.ross@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: correct ntb_peer_spad_read for case when callback is not supplied.

Correct ntb_peer_spad_read for case when callback is not supplied

Signed-off-by: Steve Wahl <Steve.Wahl@dell.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

MAINTAINERS: Change in maintainer for AMD NTB

I would like to take maintainership for AMD NTB

Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Acked-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb_transport: Limit memory windows based on available, scratchpads

When the underlying NTB H/W driver advertises more memory windows
than the number of scratchpads available to setup MW's, it is likely
that we may end up filling the remaining memory windows with garbage.
So to avoid that, lets limit the memory windows that transport driver
can setup based on the available scratchpads.

Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: Register and offset values fix for memory window

Due to incorrect limit and translation register values, NTB link was
going down when the memory window was setup. Made appropriate changes
as per spec.

Fix limit register values for BAR1, which was overlapping
with the BAR23 address.

Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: add support for hotplug feature

AMD NTB support hotplug under B2B mode. NTB will trigger link
up/down interrupt event when doing plug add/remove, this patch
implements the two interrupt event to support B2B hotplug function.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: Adding Skylake Xeon NTB support

The Skylake Xeon NTB hardware has made some changes to the register name,
offset, and the way doorbells work. Adding driver support for the new
hardware.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

Revert "x86/unwind: Detect bad stack return address"

Revert the following commit:

  b6959a362177 ("x86/unwind: Detect bad stack return address")

... because Andrey Konovalov reported an unwinder warning:

  WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467

The unwind was initiated from an interrupt which occurred while running in the
generated code for a kprobe.  The unwinder printed the warning because it
expected regs->ip to point to a valid text address, but instead it pointed to
the generated code.

Eventually we may want come up with a way to identify generated kprobe
code so the unwinder can know that it's a valid return address.  Until
then, just remove the warning.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-for-mingo-20161222' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

Fixes for 'perf sched timehist': (Namhyung Kim)

- Define a larger initial alignment value for the COMM column and
   make it be more consistently honoured, for instance in the header.

- Fix invalid period calculation when using the --time option to
   select a time slice, when events outside that slice were being
   considered for the per cpu idle stats summary.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) We have to be careful to not try and place a checksum after the end
    of a rawv6 packet, fix from Dave Jones with help from Hannes
    Frederic Sowa.

2) Missing memory barriers in tcp_tasklet_func() lead to crashes, from
    Eric Dumazet.

3) Several bug fixes for the new XDP support in virtio_net, from Jason
    Wang.

4) Increase headroom in RX skbs in be2net driver to accomodate
    encapsulations such as geneve. From Kalesh A P.

5) Fix SKB frag unmapping on TX in mvpp2, from Thomas Petazzoni.

6) Pre-pulling UDP headers created a regression in RECVORIGDSTADDR
    socket option support, from Willem de Bruijn.

7) UID based routing added a potential OOPS in ip_do_redirect() when we
    see an SKB without a socket attached. We just need it for the
    network namespace which we can get from skb->dev instead. Fix from
    Lorenzo Colitti.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
  sctp: fix recovering from 0 win with small data chunks
  sctp: do not loose window information if in rwnd_over
  virtio-net: XDP support for small buffers
  virtio-net: remove big packet XDP codes
  virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
  virtio-net: make rx buf size estimation works for XDP
  virtio-net: unbreak csumed packets for XDP_PASS
  virtio-net: correctly handle XDP_PASS for linearized packets
  virtio-net: fix page miscount during XDP linearizing
  virtio-net: correctly xmit linearized page on XDP_TX
  virtio-net: remove the warning before XDP linearizing
  mlxsw: spectrum_router: Correctly remove nexthop groups
  mlxsw: spectrum_router: Don't reflect dead neighs
  neigh: Send netevent after marking neigh as dead
  ipv6: handle -EFAULT from skb_copy_bits
  inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
  net/sched: cls_flower: Mandate mask when matching on flags
  net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
  stmmac: CSR clock configuration fix
  net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
  ...

sctp: fix recovering from 0 win with small data chunks

Currently if SCTP closes the receive window with window pressure, mostly
caused by excessive skb overhead on payload/overheads ratio, SCTP will
close the window abruptly while saving the delta on rwnd_press. It will
start recovering rwnd as the chunks are consumed by the application and
the rwnd_press will be only recovered after rwnd reach the same value as
of rwnd_press, mostly to prevent silly window syndrome.

Thing is, this is very inefficient with small data chunks, as with those
it will never reach back that value, and thus it will never recover from
such pressure. This means that we will not issue window updates when
recovering from 0 window and will rely on a sender retransmit to notice
it.

The fix here is to remove such threshold, as no value is good enough: it
depends on the (avg) chunk sizes being used.

Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
Rate limited to 845kbps, for visibility. Capture done at netserver side.

Previously:
01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
(window update missed)
04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g

After:
02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
(following data transmission triggered by window updates above)
03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: do not loose window information if in rwnd_over

It's possible that we receive a packet that is larger than current
window. If it's the first packet in this way, it will cause it to
increase rwnd_over. Then, if we receive another data chunk (specially as
SCTP allows you to have one data chunk in flight even during 0 window),
rwnd_over will be overwritten instead of added to.

In the long run, this could cause the window to grow bigger than its
initial size, as rwnd_over would be charged only for the last received
data chunk while the code will try open the window for all packets that
were received and had its value in rwnd_over overwritten. This, then,
can lead to the worsening of payload/buffer ratio and cause rwnd_press
to kick in more often.

The fix is to sum it too, same as is done for rwnd_press, so that if we
receive 3 chunks after closing the window, we still have to release that
same amount before re-opening it.

Log snippet from sctp_test exhibiting the issue:
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull final vfs updates from Al Viro:
"Assorted cleanups and fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sg_write()/bsg_write() is not fit to be called under KERNEL_DS
  ufs: fix function declaration for ufs_truncate_blocks
  fs: exec: apply CLOEXEC before changing dumpable task flags
  seq_file: reset iterator to first record for zero offset
  vfs: fix isize/pos/len checks for reflink & dedupe
  [iov_iter] fix iterate_all_kinds() on empty iterators
  move aio compat to fs/aio.c
  reorganize do_make_slave()
  clone_private_mount() doesn't need to touch namespace_sem
  remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()

Merge branch 'virtio-net-xdp-fixes'

Jason Wang says:

====================
several fixups for virtio-net XDP

Merry Xmas and a Happy New year to all:

This series tries to fixes several issues for virtio-net XDP which
could be categorized into several parts:

- fix several issues during XDP linearizing
- allow csumed packet to work for XDP_PASS
- make EWMA rxbuf size estimation works for XDP
- forbid XDP when GUEST_UFO is support
- remove big packet XDP support
- add XDP support or small buffer

Please see individual patches for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: XDP support for small buffers

Commit f600b6905015 ("virtio_net: Add XDP support") leaves the case of
small receive buffer untouched. This will confuse the user who want to
set XDP but use small buffers. Other than forbid XDP in small buffer
mode, let's make it work. XDP then can only work at skb->data since
virtio-net create skbs during refill, this is sub optimal which could
be optimized in the future.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: remove big packet XDP codes

Now we in fact don't allow XDP for big packets, remove its codes.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support

When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO
packet that exceeds a single page which could not be handled
correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
supported. While at it, forbid XDP for ECN (which comes only from GRO)
too to prevent user from misconfiguration.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: make rx buf size estimation works for XDP

We don't update ewma rx buf size in the case of XDP. This will lead
underestimation of rx buf size which causes host to produce more than
one buffers. This will greatly increase the possibility of XDP page
linearization.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: unbreak csumed packets for XDP_PASS

We drop csumed packet when do XDP for packets. This breaks
XDP_PASS when GUEST_CSUM is supported. Fix this by allowing csum flag
to be set. With this patch, simple TCP works for XDP_PASS.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: correctly handle XDP_PASS for linearized packets

When XDP_PASS were determined for linearized packets, we try to get
new buffers in the virtqueue and build skbs from them. This is wrong,
we should create skbs based on existed buffers instead. Fixing them by
creating skb based on xdp_page.

With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: fix page miscount during XDP linearizing

We don't put page during linearizing, the would cause leaking when
xmit through XDP_TX or the packet exceeds PAGE_SIZE. Fix them by
put page accordingly. Also decrease the number of buffers during
linearizing to make sure caller can free buffers correctly when packet
exceeds PAGE_SIZE. With this patch, we won't get OOM after linearize
huge number of packets.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: correctly xmit linearized page on XDP_TX

After we linearize page, we should xmit this page instead of the page
of first buffer which may lead unexpected result. With this patch, we
can see correct packet during XDP_TX.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: remove the warning before XDP linearizing

Since we use EWMA to estimate the size of rx buffer. When rx buffer
size is underestimated, it's usual to have a packet with more than one
buffers. Consider this is not a bug, remove the warning and correct
the comment before XDP linearizing.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs

Pull befs updates from Luis de Bethencourt:
"A series of small fixes and adding NFS export support"

* tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs:
  befs: add NFS export support
  befs: remove trailing whitespaces
  befs: remove signatures from comments
  befs: fix style issues in header files
  befs: fix style issues in linuxvfs.c
  befs: fix typos in linuxvfs.c
  befs: fix style issues in io.c
  befs: fix style issues in inode.c
  befs: fix style issues in debug.c

Merge tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Some fixes came in while I was out, mostly intel and amdgpu ones, with
  one ast fix"

Daniel Vetter says:
"This should also shut up the WARN_ON(!intel_dp->lane_count) noise"

* tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux: (35 commits)
  drm/amdgpu: update tile table for oland/hainan
  drm/amdgpu: update tile table for verde
  drm/amdgpu: update rev id for verde
  drm/amdgpu: update golden setting for verde
  drm/amdgpu: update rev id for oland
  drm/amdgpu: update golden setting for oland
  drm/amdgpu: update rev id for hainan
  drm/amdgpu: update golden setting for hainan
  drm/amdgpu: update rev id for pitcairn
  drm/amdgpu: update golden setting for pitcairn
  drm/amdgpu: update golden setting/tiling table of tahiti
  drm/i915: skip the first 4k of stolen memory on everything >= gen8
  drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
  drm/i915: Fix use after free in logical_render_ring_init
  drm/i915: disable PSR by default on HSW/BDW
  drm/i915: Fix setting of boost freq tunable
  drm/i915: tune down the fast link training vs boot fail
  drm/i915: Reorder phys backing storage release
  drm/i915/gen9: Fix PCODE polling during SAGV disabling
  drm/i915/gen9: Fix PCODE polling during CDCLK change notification
  ...

Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
"First round of -rc fixes for 4.10 kernel:

   - a series of qedr fixes
   - a series of rxe fixes
   - one i40iw fix
   - one cma fix
   - one cxgb4 fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/rxe: Don't check for null ptr in send()
  IB/rxe: Drop future atomic/read packets rather than retrying
  IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends
  qedr: Always notify the verb consumer of flushed CQEs
  qedr: clear the vendor error field in the work completion
  qedr: post_send/recv according to QP state
  qedr: ignore inline flag in read verbs
  qedr: modify QP state to error when destroying it
  qedr: return correct value on modify qp
  qedr: return error if destroy CQ failed
  qedr: configure the number of CQEs on CQ creation
  i40iw: Set 128B as the only supported RQ WQE size
  IB/cma: Fix a race condition in iboe_addr_get_sgid()
  IB/rxe: Fix a memory leak in rxe_qp_cleanup()
  iw_cxgb4: set correct FetchBurstMax for QPs

Merge tag 'scsi-for-linus' of git://git./linux/kernel/git/jejb/scsi

Pull late SCSI updates from James Bottomley:
"This is mostly stuff which missed the initial pull.

  There's a new driver: qedi, and some ufs, ibmvscsis and ncr5380
  updates plus some assorted driver fixes and also a fix for the bug
  where if a device goes into a blocked state between configuration and
  sysfs device add (which can be a long time under async probing) it
  would become permanently blocked"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
  scsi: avoid a permanent stop of the scsi device's request queue
  scsi: mpt3sas: Recognize and act on iopriority info
  scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
  scsi: qla2xxx: Add Block Multi Queue functionality.
  scsi: qla2xxx: Add multiple queue pair functionality.
  scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
  scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
  scsi: hpsa: remove memory allocate failure message
  scsi: Update 3ware driver email addresses
  scsi: zfcp: fix rport unblock race with LUN recovery
  scsi: zfcp: do not trace pure benign residual HBA responses at default level
  scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
  scsi: libcxgbi: return error if interface is not up
  scsi: cxgb4i: libcxgbi: add missing module_put()
  scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature
  scsi: cxgb4i: libcxgbi: add active open cmd for T6 adapters
  scsi: cxgb4i: use cxgb4_tp_smt_idx() to get smt_idx
  scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.
  scsi: aacraid: remove wildcard for series 9 controllers
  scsi: ibmvscsi: add write memory barrier to CRQ processing
  ...

Merge tag 'arc-4.10-rc1-part2' of git://git./linux/kernel/git/vgupta/arc

Pull more ARC updates from Vineet Gupta:

- Fix for aliasing VIPT dcache in old ARC700 cores

- micro-optimization in ARC700 ProtV handler

- Enable SG_CHAIN  [Vladimir]

- ARC HS38 core intc default to prio 1

* tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
  ARC: mm: No need to save cache version in @cpuinfo
  ARC: enable SG chaining
  ARCv2: intc: default all interrupts to priority 1
  ARCv2: entry: document intr disable in hard isr
  ARC: ARCompact entry: elide re-reading ECR in ProtV handler

Merge branch 'mlxsw-router-fixes'

Jiri Pirko says:

====================
mlxsw: Router fixes

Ido says:

First two patches ensure we remove from the device's table neighbours
that are considered to be dead by the neighbour core.

The last patch removes nexthop groups from the device when they are no
longer valid.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Correctly remove nexthop groups

At the end of the nexthop initialization process we determine whether
the nexthop should be offloaded or not based on the NUD state of the
neighbour representing it. After all the nexthops were initialized we
refresh the nexthop group and potentially offload it to the device, in
case some of the nexthops were resolved.

Make the destruction of a nexthop group symmetric with its creation by
marking all nexthops as invalid and then refresh the nexthop group to
make sure it was removed from the device's tables.

Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Don't reflect dead neighs

When a neighbour is considered to be dead, we should remove it from the
device's table regardless of its NUD state.

Without this patch, after setting a port to be administratively down we
get the following errors when we periodically try to update the kernel
about neighbours activity:

[ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
matching neighbour for IP=192.168.100.2

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Send netevent after marking neigh as dead

neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.

This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.

Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: handle -EFAULT from skb_copy_bits

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
[<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
[<ffffffff81772697>] inet_sendmsg+0x67/0xa0
[<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
[<ffffffff816d982f>] SYSC_sendto+0xef/0x170
[<ffffffff816da27e>] SyS_sendto+0xe/0x10
[<ffffffff81002910>] do_syscall_64+0x50/0xa0
[<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];

memset(buf, 0, LEN);

fd = socket(AF_INET6, SOCK_RAW, 7);

setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cls_flower-act_tunnel_key-fixes'

Or Gerlitz says:

====================
net/sched fixes for cls_flower and act_tunnel_key

This small series contain a fix to the matching flags support
in flower and to the tunnel key action MD prep for IPv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Mandate mask when matching on flags

When matching on flags, we should require the user to provide the
mask and avoid using an all-ones mask. Not doing so causes matching
on flags provided w.o mask to hit on the value being unset for all
flags, which may not what the user wanted to happen.

Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6

The UDP dst port was provided to the helper function which sets the
IPv6 IP tunnel meta-data under a wrong param order, fix that.

Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: CSR clock configuration fix

When testing stmmac with my QoS reference design I checked a problem in the
CSR clock configuration that was impossibilitating the phy discovery, since
every read operation returned 0x0000ffff. This patch fixes the issue.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'work.namespace' into for-linus

sg_write()/bsg_write() is not fit to be called under KERNEL_DS

Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those. Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: fix function declaration for ufs_truncate_blocks

sparse says:

fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static?

Note that the forward declaration in the file is already marked static.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: exec: apply CLOEXEC before changing dumpable task flags

If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):

[vfs]
-> proc_pid_link_inode_operations.get_link
   -> proc_pid_get_link
      -> proc_fd_access_allowed
         -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Cc: <stable@vger.kernel.org> # v3.2+
Reported-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

seq_file: reset iterator to first record for zero offset

If kernfs file is empty on a first read, successive read operations
using the same file descriptor will return no data, even when data is
available. Default kernfs 'seq_next' implementation advances iterator
position even when next object is not there. Kernfs 'seq_start' for
following requests will not return iterator as position is already on
the second object.

This defect doesn't allow to monitor badblocks sysfs files from MD raid.
They are initially empty but if data appears at some stage, userspace is
not able to read it.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: fix isize/pos/len checks for reflink & dedupe

Strengthen the checking of pos/len vs. i_size, clarify the return values
for the clone prep function, and remove pointless code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

[iov_iter] fix iterate_all_kinds() on empty iterators

Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
and followups, except that in this case we really want to do nothing when
asked for zero-length operation - unlike zero-length iterate_and_advance(),
zero-length iterate_all_kinds() has no side effects, and callers are simpler
that way.

That got exposed when copy_from_iter_full() had been used by tipc, which
builds an msghdr with zero payload and (now) feeds it to a primitive
based on iterate_all_kinds() instead of iterate_and_advance().

Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

move aio compat to fs/aio.c

... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails. Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'misc' into for-linus

perf sched timehist: Fix invalid period calculation

When --time option is given with a value outside recorded time, the last
sample time (tprev) was set to that value and run time calculation might
be incorrect.  This is a problem of the first samples for each cpus
since it would skip the runtime update when tprev is 0.  But with --time
option it had non-zero (which is invalid) value so the calculation is
also incorrect.

For example, let's see the followging:

  $ perf sched timehist
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
      3195.968367 [0003]  <idle>                              0.000      0.000      0.000
      3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
      3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
      3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
      3195.969217 [0000]  <idle>                              0.000      0.000      0.621
      3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033

The sample starts at 3195.968367 but when I gave a time interval from
3194 to 3196 (in sec) it will calculate the whole 2 second as runtime.
In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as
idle time.

Before:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
      CPU  0 idle for   1995.991  msec
      CPU  1 idle for     20.793  msec
      CPU  2 idle for     30.191  msec
      CPU  3 idle for   1999.852  msec

      Total number of unique tasks: 23
  Total number of context switches: 128
             Total run time (msec): 3724.940

After:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
      CPU  0 idle for     10.811  msec
      CPU  1 idle for     20.793  msec
      CPU  2 idle for     30.191  msec
      CPU  3 idle for     18.337  msec

      Total number of unique tasks: 23
  Total number of context switches: 128
             Total run time (msec): 18.139

Committer notes:

Further testing:

Before:

  Idle stats:
      CPU  0 idle for    229.785  msec
      CPU  1 idle for    937.944  msec
      CPU  2 idle for    188.931  msec
      CPU  3 idle for    986.185  msec

  After:

  # perf sched timehist --time 40602,40603 -s | tail

  Idle stats:
      CPU  0 idle for    229.785  msec
      CPU  1 idle for    175.407  msec
      CPU  2 idle for    188.931  msec
      CPU  3 idle for    223.657  msec

      Total number of unique tasks: 68
  Total number of context switches: 814
             Total run time (msec): 97.688

  # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
  CPU 0 idle for 229.721 msec (entries: 123)
  CPU 1 idle for 175.381 msec (entries: 65)
  CPU 2 idle for 188.903 msec (entries: 56)
  CPU 3 idle for 223.61 msec (entries: 102)

Difference due to the idle stats being accounted at nanoseconds precision while
the <idle> entries in 'perf sched timehist' are trucated at msec.usec.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest")
Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf sched timehist: Remove hardcoded 'comm_width' check at print_summary

Now that the default 'comm_width' value is 30, no need to check that at
print_summary,

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf sched timehist: Enlarge default 'comm_width'

Current default value is 20 but it's easily changed to a bigger value as
task has a long name and different tid and pid.  And it makes the output
not aligned.  So change it to have a large value as summary shows.

Committer notes:

Before:

  # perf sched record
  ^C
  # perf sched timehist
  <SNIP>
    40602.770537 [0001]  rcuos/2[29]               7.970      0.002      0.020
    40602.771512 [0003]  <idle>                    0.003      0.000      0.986
    40602.771586 [0001]  <idle>                    0.020      0.000      1.049
    40602.771606 [0001]  qemu-system-x86[3593/3510]      0.000      0.002      0.020
    40602.771629 [0003]  qemu-system-x86[3510]           0.000      0.003      0.116
    40602.771776 [0000]  <idle>                          0.001      0.000      1.892
  <SNIP>

After:

  # perf sched timehist
  <SNIP>
   40602.770537 [0001]  rcuos/2[29]                         7.970      0.002      0.020
   40602.771512 [0003]  <idle>                              0.003      0.000      0.986
   40602.771586 [0001]  <idle>                              0.020      0.000      1.049
   40602.771606 [0001]  qemu-system-x86[3593/3510]          0.000      0.002      0.020
   40602.771629 [0003]  qemu-system-x86[3510]               0.000      0.003      0.116
  <SNIP>

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf sched timehist: Honour 'comm_width' when aligning the headers

Current default value is 20, but that may change in the future, so make
places where we have 20 hardcoded use 'comm_width'.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes

First set of i915 fixes for code in next.

* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: skip the first 4k of stolen memory on everything >= gen8
  drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
  drm/i915: Fix use after free in logical_render_ring_init
  drm/i915: disable PSR by default on HSW/BDW
  drm/i915: Fix setting of boost freq tunable
  drm/i915: tune down the fast link training vs boot fail
  drm/i915: Reorder phys backing storage release
  drm/i915/gen9: Fix PCODE polling during SAGV disabling
  drm/i915/gen9: Fix PCODE polling during CDCLK change notification
  drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
  drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
  drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
  drm/i915: drop the struct_mutex when wedged or trying to reset

Merge tag 'drm-misc-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes

Here's the one lonely bugfix I talked about on irc.

* tag 'drm-misc-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-misc:
drivers/gpu/drm/ast: Fix infinite loop if read fails

Merge branch 'drm-next-4.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

- fix display regression on DCE6/8
- Powergating fixes for GFX8
- amdgpu SI fixes (golden settings, proper rev id setup, etc.)

* 'drm-next-4.10' of git://people.freedesktop.org/~agd5f/linux: (21 commits)
  drm/amdgpu: update tile table for oland/hainan
  drm/amdgpu: update tile table for verde
  drm/amdgpu: update rev id for verde
  drm/amdgpu: update golden setting for verde
  drm/amdgpu: update rev id for oland
  drm/amdgpu: update golden setting for oland
  drm/amdgpu: update rev id for hainan
  drm/amdgpu: update golden setting for hainan
  drm/amdgpu: update rev id for pitcairn
  drm/amdgpu: update golden setting for pitcairn
  drm/amdgpu: update golden setting/tiling table of tahiti
  drm/amdgpu: fix cursor setting of dce6/dce8
  drm/amdgpu: refine set clock gating for tonga/polaris
  drm/amdgpu: initialize cg flags for tonga/polaris10/polaris11.
  drm/amdgpu: add new gfx cg flags.
  drm/amdgpu: fix pg can't be disabled by PG mask.
  drm/amdgpu: always initialize gfx pg for gfx_v8.0.
  drm/amdgpu: enable AMD_PG_SUPPORT_CP in Carrizo/Stoney.
  drm/amdgpu: fix init save/restore list in gfx_v8.0
  drm/amdgpu: fix enable_cp_power_gating in gfx_v8.0.
  ...

Merge tag 'leds_for_4.10_email_update' of git://git./linux/kernel/git/j.anaszewski/linux-leds

Pull LED maintainer email update from Jacek Anaszewski:
"Update Jacek Anaszewski's email address"

* tag 'leds_for_4.10_email_update' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: Update Jacek Anaszewski's email address