platform/kernel/linux-rpi.git
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Linus Torvalds [Sun, 10 Sep 2017 16:57:23 +0000 (09:57 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-next

Pull sparc updates from David Miller:

 1) Use register window state adjustment instructions when available,
    from Anthony Yznaga.

 2) Add VCC console concentrator driver, from Jag Raman.

 3) Add 16GB hugepage support, from Nitin Gupta.

 4) Support cpu 'poke' hypercall, from Vijay Kumar.

 5) Add M7/M8 optimized memcpy/memset/copy_{to,from}_user, from Babu
    Moger.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (33 commits)
  sparc64: Handle additional cases of no fault loads
  sparc64: speed up etrap/rtrap on NG2 and later processors
  sparc64: vcc: make ktermios const
  sparc: leon: grpci1: constify of_device_id
  sparc: leon: grpci2: constify of_device_id
  sparc64: vcc: Check for IS_ERR() instead of NULL
  sparc64: Cleanup hugepage table walk functions
  sparc64: Add 16GB hugepage support
  sparc64: Support huge PUD case in get_user_pages
  sparc64: vcc: Add install & cleanup TTY operations
  sparc64: vcc: Add break_ctl TTY operation
  sparc64: vcc: Add chars_in_buffer TTY operation
  sparc64: vcc: Add write & write_room TTY operations
  sparc64: vcc: Add hangup TTY operation
  sparc64: vcc: Add open & close TTY operations
  sparc64: vcc: Enable LDC event processing engine
  sparc64: vcc: Add RX & TX timer for delayed LDC operation
  sparc64: vcc: Create sysfs attribute group
  sparc64: vcc: Enable VCC port probe and removal
  sparc64: vcc: TTY driver initialization and cleanup
  ...

7 years agom68k: Add braces to __pmd(x) initializer to kill compiler warning
Geert Uytterhoeven [Sun, 10 Sep 2017 13:20:03 +0000 (15:20 +0200)]
m68k: Add braces to __pmd(x) initializer to kill compiler warning

With gcc 4.1.2:

    include/linux/swapops.h: In function ‘swp_entry_to_pmd’:
    include/linux/swapops.h:294: warning: missing braces around initializer
    include/linux/swapops.h:294: warning: (near initialization for ‘(anonymous).pmd’)

Due to a GCC zero initializer bug (#53119), the standard "(pmd_t){ 0 }"
initializer is not accepted by all GCC versions.
In addition, on m68k pmd_t is an array instead of a single value, so we
need "(pmd_t){ { 0 }, }" instead of "(pmd_t){ 0 }".

Based on commit 9157259d16a8 ("mm: add pmd_t initializer __pmd() to
work around a GCC bug.") for sparc32.

Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agox86/mm/64: Fix an incorrect warning with CONFIG_DEBUG_VM=y, !PCID
Andy Lutomirski [Sun, 10 Sep 2017 15:52:58 +0000 (08:52 -0700)]
x86/mm/64: Fix an incorrect warning with CONFIG_DEBUG_VM=y, !PCID

I've been staring at the word PCID too long.

Fixes: f13c8e8c58ba ("x86/mm: Reinitialize TLB state on hotplug and resume")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosparc64: Handle additional cases of no fault loads
Rob Gardner [Fri, 8 Sep 2017 22:34:21 +0000 (16:34 -0600)]
sparc64: Handle additional cases of no fault loads

Load instructions using ASI_PNF or other no-fault ASIs should not
cause a SIGSEGV or SIGBUS.

A garden variety unmapped address follows the TSB miss path, and when
no valid mapping is found in the process page tables, the miss handler
checks to see if the access was via a no-fault ASI.  It then fixes up
the target register with a zero, and skips the no-fault load
instruction.

But different paths are taken for data access exceptions and alignment
traps, and these do not respect the no-fault ASI. We add checks in
these paths for the no-fault ASI, and fix up the target register and
TPC just like in the TSB miss case.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: speed up etrap/rtrap on NG2 and later processors
Anthony Yznaga [Fri, 18 Aug 2017 19:40:36 +0000 (12:40 -0700)]
sparc64: speed up etrap/rtrap on NG2 and later processors

For many sun4v processor types, reading or writing a privileged register
has a latency of 40 to 70 cycles.  Use a combination of the low-latency
allclean, otherw, normalw, and nop instructions in etrap and rtrap to
replace 2 rdpr and 5 wrpr instructions and improve etrap/rtrap
performance.  allclean, otherw, and normalw are available on NG2 and
later processors.

The average ticks to execute the flush windows trap ("ta 0x3") with and
without this patch on select platforms:

 CPU            Not patched     Patched    % Latency Reduction

 NG2            1762            1558            -11.58
 NG4            3619            3204            -11.47
 M7             3015            2624            -12.97
 SPARC64-X      829             770              -7.12

Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 9 Sep 2017 22:03:24 +0000 (15:03 -0700)]
Merge tag 'iommu-updates-v4.14' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "Slightly more changes than usual this time:

   - KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
     to preserve the mappings of the kernel so that master aborts for
     devices are avoided. Master aborts cause some devices to fail in
     the kdump kernel, so this code makes the dump more likely to
     succeed when AMD IOMMU is enabled.

   - common flush queue implementation for IOVA code users. The code is
     still optional, but AMD and Intel IOMMU drivers had their own
     implementation which is now unified.

   - finish support for iommu-groups. All drivers implement this feature
     now so that IOMMU core code can rely on it.

   - finish support for 'struct iommu_device' in iommu drivers. All
     drivers now use the interface.

   - new functions in the IOMMU-API for explicit IO/TLB flushing. This
     will help to reduce the number of IO/TLB flushes when IOMMU drivers
     support this interface.

   - support for mt2712 in the Mediatek IOMMU driver

   - new IOMMU driver for QCOM hardware

   - system PM support for ARM-SMMU

   - shutdown method for ARM-SMMU-v3

   - some constification patches

   - various other small improvements and fixes"

* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
  iommu/vt-d: Don't be too aggressive when clearing one context entry
  iommu: Introduce Interface for IOMMU TLB Flushing
  iommu/s390: Constify iommu_ops
  iommu/vt-d: Avoid calling virt_to_phys() on null pointer
  iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
  arm/tegra: Call bus_set_iommu() after iommu_device_register()
  iommu/exynos: Constify iommu_ops
  iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
  iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
  iommu/amd: Rename a few flush functions
  iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
  iommu/mediatek: Fix a build warning of BIT(32) in ARM
  iommu/mediatek: Fix a build fail of m4u_type
  iommu: qcom: annotate PM functions as __maybe_unused
  iommu/pamu: Fix PAMU boot crash
  memory: mtk-smi: Degrade SMI init to module_init
  iommu/mediatek: Enlarge the validate PA range for 4GB mode
  iommu/mediatek: Disable iommu clock when system suspend
  iommu/mediatek: Move pgtable allocation into domain_alloc
  iommu/mediatek: Merge 2 M4U HWs into one iommu domain
  ...

7 years agoMerge tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 9 Sep 2017 21:48:21 +0000 (14:48 -0700)]
Merge tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd

Pull MTD updates from Boris Brezillon:
 "General updates:
   - Constify pci_device_id in various drivers
   - Constify device_type
   - Remove pad control code from the Gemini driver
   - Use %pOF to print OF node full_name
   - Various fixes in the physmap_of driver
   - Remove unused vars in mtdswap
   - Check devm_kzalloc() return value in the spear_smi driver
   - Check clk_prepare_enable() return code in the st_spi_fsm driver
   - Create per MTD device debugfs enties

  NAND updates, from Boris Brezillon:
   - Fix memory leaks in the core
   - Remove unused NAND locking support
   - Rename nand.h into rawnand.h (preparing support for spi NANDs)
   - Use NAND_MAX_ID_LEN where appropriate
   - Fix support for 20nm Hynix chips
   - Fix support for Samsung and Hynix SLC NANDs
   - Various cleanup, improvements and fixes in the qcom driver
   - Fixes for bugs detected by various static code analysis tools
   - Fix mxc ooblayout definition
   - Add a new part_parsers to tmio and sharpsl platform data in order
     to define a custom list of partition parsers
   - Request the reset line in exclusive mode in the sunxi driver
   - Fix a build error in the orion-nand driver when compiled for ARMv4
   - Allow 64-bit mvebu platforms to select the PXA3XX driver

  SPI NOR updates, from Cyrille Pitchen and Marek Vasut:
   - add support to the JEDEC JESD216B specification (SFDP tables).
   - add support to the Intel Denverton SPI flash controller.
   - fix error recovery for Spansion/Cypress SPI NOR memories.
   - fix 4-byte address management for the Aspeed SPI controller.
   - add support to some Microchip SST26 memory parts
   - remove unneeded pinctrl header Write a message for tag:"

* tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd: (74 commits)
  mtd: nand: complain loudly when chip->bits_per_cell is not correctly initialized
  mtd: nand: make Samsung SLC NAND usable again
  mtd: nand: tmio: Register partitions using the parsers
  mfd: tmio: Add partition parsers platform data
  mtd: nand: sharpsl: Register partitions using the parsers
  mtd: nand: sharpsl: Add partition parsers platform data
  mtd: nand: qcom: Support for IPQ8074 QPIC NAND controller
  mtd: nand: qcom: support for IPQ4019 QPIC NAND controller
  dt-bindings: qcom_nandc: IPQ8074 QPIC NAND documentation
  dt-bindings: qcom_nandc: IPQ4019 QPIC NAND documentation
  dt-bindings: qcom_nandc: fix the ipq806x device tree example
  mtd: nand: qcom: support for different DEV_CMD register offsets
  mtd: nand: qcom: QPIC data descriptors handling
  mtd: nand: qcom: enable BAM or ADM mode
  mtd: nand: qcom: erased codeword detection configuration
  mtd: nand: qcom: support for read location registers
  mtd: nand: qcom: support for passing flags in DMA helper functions
  mtd: nand: qcom: add BAM DMA descriptor handling
  mtd: nand: qcom: allocate BAM transaction
  mtd: nand: qcom: DMA mapping support for register read buffer
  ...

7 years agoMerge tag 'for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Sat, 9 Sep 2017 21:44:39 +0000 (14:44 -0700)]
Merge tag 'for-v4.14' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply and reset changes from Sebastian Reichel:
 "New chip/feature support:
   - bq27xxx: support updating battery config from DT
   - bq24190: support loading battery charge info from DT
   - LTC2941: add LTC2942/LTC2944 support
   - max17042: add ACPI support
   - max1721x: new driver

  Misc:
   - Move bq27xxx w1 driver from w1 into power-supply subsystem
   - Introduce power_supply_set_input_current_limit_from_supplier
   - constify stuff
   - some minor fixes"

* tag 'for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (39 commits)
  power: supply: bq27xxx: enable writing capacity values for bq27421
  power: supply: bq24190_charger: Get input_current_limit from our supplier
  power: supply: bq24190_charger: Export 5V boost converter as regulator
  power: supply: bq24190_charger: Add power_supply_battery_info support
  power: supply: bq24190_charger: Add property system-minimum-microvolt
  power: supply: bq24190_charger: Enable devicetree config
  dt-bindings: power: supply: Add docs for TI BQ24190 battery charger
  power: supply: bq27xxx: Remove duplicate chip data arrays
  power: supply: bq27xxx: Enable data memory update for certain chips
  power: supply: bq27xxx: Add chip IDs for previously shadowed chips
  power: supply: bq27xxx: Create single chip data table
  power: supply: bq24190_charger: Add ti,bq24192i to devicetree table
  power: supply: bq24190_charger: Add input_current_limit property
  power: supply: Add power_supply_set_input_current_limit_from_supplier helper
  power: supply: max17042_battery: Fix compiler warning
  power: supply: core: Delete two error messages for a failed memory allocation in power_supply_check_supplies()
  power: supply: make device_attribute const
  power: supply: max17042_battery: Fix ACPI interrupt issues
  power: supply: max17042_battery: Add support for ACPI enumeration
  power: supply: lp8788: Make several arrays static const * const
  ...

7 years agoMerge tag 'rpmsg-v4.14' of git://github.com/andersson/remoteproc
Linus Torvalds [Sat, 9 Sep 2017 21:34:38 +0000 (14:34 -0700)]
Merge tag 'rpmsg-v4.14' of git://github.com/andersson/remoteproc

Pull rpmsg updates from Bjorn Andersson:
 "This extends the Qualcomm GLINK implementation to support the
  additional features used for communicating with modem and DSP
  coprocessors in modern Qualcomm platforms.

  In addition to this there's support for placing virtio RPMSG buffers
  in non-System RAM"

* tag 'rpmsg-v4.14' of git://github.com/andersson/remoteproc: (29 commits)
  rpmsg: glink: initialize ret to zero to ensure error status check is correct
  rpmsg: glink: fix null pointer dereference on a null intent
  dt-bindings: soc: qcom: Extend GLINK to cover SMEM
  remoteproc: qcom: adsp: Allow defining GLINK edge
  rpmsg: glink: Export symbols from common code
  rpmsg: glink: Release idr lock before returning on error
  rpmsg: glink: Handle remote rx done command
  rpmsg: glink: Request for intents when unavailable
  rpmsg: glink: Use the intents passed by remote
  rpmsg: glink: Receive and store the remote intent buffers
  rpmsg: glink: Add announce_create ops and preallocate intents
  rpmsg: glink: Add rx done command
  rpmsg: glink: Make RX FIFO peak accessor to take an offset
  rpmsg: glink: Use the local intents when receiving data
  rpmsg: glink: Add support for TX intents
  rpmsg: glink: Fix idr_lock from mutex to spinlock
  rpmsg: glink: Add support for transport version negotiation
  rpmsg: glink: Introduce glink smem based transport
  rpmsg: glink: Do a mbox_free_channel in remove
  rpmsg: glink: Return -EAGAIN when there is no FIFO space
  ...

7 years agoMerge tag 'rproc-v4.14' of git://github.com/andersson/remoteproc
Linus Torvalds [Sat, 9 Sep 2017 21:30:50 +0000 (14:30 -0700)]
Merge tag 'rproc-v4.14' of git://github.com/andersson/remoteproc

Pull remoteproc updates from Bjorn Andersson:
 "This adds and improves remoteproc support for TI DA8xx/OMAP-L13x DSP,
  TI Keystone 66AK2G DSP and iMX6SX/7D Cortex M4 coprocessors. It
  introduces the Qualcomm restart notifier and a few fixes"

* tag 'rproc-v4.14' of git://github.com/andersson/remoteproc:
  remoteproc: Introduce rproc handle accessor for children
  remoteproc: qcom: Make ssr_notifiers local
  remoteproc: Stop subdevices in reverse order
  remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver
  remoteproc: dt: Provide bindings for iMX6SX/7D Remote Processor Controller driver
  remoteproc: qcom: Use PTR_ERR_OR_ZERO
  remoteproc: st: explicitly request exclusive reset control
  remoteproc: qcom: explicitly request exclusive reset control
  remoteproc/keystone: explicitly request exclusive reset control
  remoteproc/keystone: Add support for Keystone 66AK2G SOCs
  remoteproc/davinci: Add device tree support for OMAP-L138 DSP
  dt-bindings: remoteproc: Add bindings for Davinci DSP processors
  remoteproc/davinci: Add support to parse internal memories
  remoteproc/davinci: Switch to platform_get_resource_byname()
  remoteproc: make device_type const
  soc: qcom: GLINK SSR notifier
  remoteproc: qcom: Add support for SSR notifications
  remoteproc: Merge __rproc_boot() with rproc_boot()

7 years agoMerge tag 'vfio-v4.14-rc1' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Sat, 9 Sep 2017 21:28:45 +0000 (14:28 -0700)]
Merge tag 'vfio-v4.14-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Base MSI remapping on either IOMMU domain or IRQ domain support
   (Robin Murphy)

 - Prioritize hardware MSI regions over software defined regions (Robin
   Murphy)

 - Fix no-iommu reference counting (Eric Auger)

 - Stall removing last device from group for container cleanup (Alex
   Williamson)

 - Constify amba_id (Arvind Yadav)

* tag 'vfio-v4.14-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: platform: constify amba_id
  vfio: Stall vfio_del_group_dev() for container group detach
  vfio: fix noiommu vfio_iommu_group_get reference count
  vfio/type1: Give hardware MSI regions precedence
  vfio/type1: Cope with hardware MSI reserved regions

7 years agoMerge branch 'i2c/for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 9 Sep 2017 21:18:40 +0000 (14:18 -0700)]
Merge branch 'i2c/for-4.14' of git://git./linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:

 - new drivers for Spreadtrum I2C, Intel Cherry Trail Whiskey Cove SMBUS

 - quite some driver updates

 - cleanups for the i2c-mux subsystem

 - some subsystem-wide constification

 - further cleanup of include/linux/i2c

* 'i2c/for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (50 commits)
  i2c: sprd: Fix undefined reference errors
  i2c: nomadik: constify amba_id
  i2c: versatile: Make i2c_algo_bit_data const
  i2c: busses: make i2c_adapter_quirks const
  i2c: busses: make i2c_adapter const
  i2c: busses: make i2c_algorithm const
  i2c: Add Spreadtrum I2C controller driver
  dt-bindings: i2c: Add Spreadtrum I2C controller documentation
  i2c-cht-wc: make cht_wc_i2c_adap_driver static
  MAINTAINERS: Add entry for drivers/i2c/busses/i2c-cht-wc.c
  i2c: aspeed: Retain delay/setup/hold values when configuring bus frequency
  dt-bindings: i2c: eeprom: Document vendor to be used and deprecated ones
  i2c: i801: Restore the presence state of P2SB PCI device after reading BAR
  MAINTAINERS: drop entry for Blackfin I2C and Sonic's email
  blackfin: merge the two TWI header files
  i2c: davinci: Preserve return value of devm_clk_get
  i2c: mediatek: Add i2c compatible for MediaTek MT7622
  dt-bindings: i2c: Add MediaTek MT7622 i2c binding
  dt-bindings: i2c: modify information formats
  i2c: mux: i2c-arb-gpio-challenge: allow compiling w/o OF support
  ...

7 years agoMerge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Sat, 9 Sep 2017 20:31:49 +0000 (13:31 -0700)]
Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "More RDMA work and some op-structure constification from Chuck Lever,
  and a small cleanup to our xdr encoding"

* tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Estimate Send Queue depth properly
  rdma core: Add rdma_rw_mr_payload()
  svcrdma: Limit RQ depth
  svcrdma: Populate tail iovec when receiving
  nfsd: Incoming xdr_bufs may have content in tail buffer
  svcrdma: Clean up svc_rdma_build_read_chunk()
  sunrpc: Const-ify struct sv_serv_ops
  nfsd: Const-ify NFSv4 encoding and decoding ops arrays
  sunrpc: Const-ify instances of struct svc_xprt_ops
  nfsd4: individual encoders no longer see error cases
  nfsd4: skip encoder in trivial error cases
  nfsd4: define ->op_release for compound ops
  nfsd4: opdesc will be useful outside nfs4proc.c
  nfsd4: move some nfsd4 op definitions to xdr4.h

7 years agoMerge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Sat, 9 Sep 2017 20:27:51 +0000 (13:27 -0700)]
Merge branch 'for-4.14' of git://git./linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "The changes range through all types: cleanups, core chagnes, sanity
  checks, fixes, other user visible changes, detailed list below:

   - deprecated: user transaction ioctl

   - mount option ssd does not change allocation alignments

   - degraded read-write mount is allowed if all the raid profile
     constraints are met, now based on more accurate check

   - defrag: do not reset compression afterwards; the NOCOMPRESS flag
     can be now overriden by defrag

   - prep work for better extent reference tracking (related to the
     qgroup slowness with balance)

   - prep work for compression heuristics

   - memory allocation reductions (may help latencies on a loaded
     system)

   - better accounting for io waiting states

   - error handling improvements (removed BUGs)

   - added more sanity checks for shared refs

   - fix readdir vs pagefault deadlock under some circumstances

   - fix for 'no-hole' mode, certain combination of compressed and
     inline extents

   - send: fix emission of invalid clone operations

   - fixup file mode if setting acls fail

   - more fixes from fuzzing

   - oher cleanups"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (104 commits)
  btrfs: submit superblock io with REQ_META and REQ_PRIO
  btrfs: remove unnecessary memory barrier in btrfs_direct_IO
  btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent
  btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent
  btrfs: pass fs_info to btrfs_del_root instead of tree_root
  Btrfs: add one more sanity check for shared ref type
  Btrfs: remove BUG_ON in __add_tree_block
  Btrfs: remove BUG() in add_data_reference
  Btrfs: remove BUG() in print_extent_item
  Btrfs: remove BUG() in btrfs_extent_inline_ref_size
  Btrfs: convert to use btrfs_get_extent_inline_ref_type
  Btrfs: add a helper to retrive extent inline ref type
  btrfs: scrub: simplify scrub worker initialization
  btrfs: scrub: clean up division in scrub_find_csum
  btrfs: scrub: clean up division in __scrub_mark_bitmap
  btrfs: scrub: use bool for flush_all_writes
  btrfs: preserve i_mode if __btrfs_set_acl() fails
  btrfs: Remove extraneous chunk_objectid variable
  btrfs: Remove chunk_objectid argument from btrfs_make_block_group
  btrfs: Remove extra parentheses from condition in copy_items()
  ...

7 years agoMerge branch 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 9 Sep 2017 19:49:01 +0000 (12:49 -0700)]
Merge branch 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block

Pull followup block layer updates from Jens Axboe:
 "I ended up splitting the main pull request for this series into two,
  mainly because of clashes between NVMe fixes that went into 4.13 after
  the for-4.14 branches were split off. This pull request is mostly
  NVMe, but not exclusively. In detail, it contains:

   - Two pull request for NVMe changes from Christoph. Nothing new on
     the feature front, basically just fixes all over the map for the
     core bits, transport, rdma, etc.

   - Series from Bart, cleaning up various bits in the BFQ scheduler.

   - Series of bcache fixes, which has been lingering for a release or
     two. Coly sent this in, but patches from various people in this
     area.

   - Set of patches for BFQ from Paolo himself, updating both
     documentation and fixing some corner cases in performance.

   - Series from Omar, attempting to now get the 4k loop support
     correct. Our confidence level is higher this time.

   - Series from Shaohua for loop as well, improving O_DIRECT
     performance and fixing a use-after-free"

* 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block: (74 commits)
  bcache: initialize dirty stripes in flash_dev_run()
  loop: set physical block size to logical block size
  bcache: fix bch_hprint crash and improve output
  bcache: Update continue_at() documentation
  bcache: silence static checker warning
  bcache: fix for gc and write-back race
  bcache: increase the number of open buckets
  bcache: Correct return value for sysfs attach errors
  bcache: correct cache_dirty_target in __update_writeback_rate()
  bcache: gc does not work when triggering by manual command
  bcache: Don't reinvent the wheel but use existing llist API
  bcache: do not subtract sectors_to_gc for bypassed IO
  bcache: fix sequential large write IO bypass
  bcache: Fix leak of bdev reference
  block/loop: remove unused field
  block/loop: fix use after free
  bfq: Use icq_to_bic() consistently
  bfq: Suppress compiler warnings about comparisons
  bfq: Check kstrtoul() return value
  bfq: Declare local functions static
  ...

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 9 Sep 2017 18:05:20 +0000 (11:05 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "The iwlwifi firmware compat fix is in here as well as some other
  stuff:

  1) Fix request socket leak introduced by BPF deadlock fix, from Eric
     Dumazet.

  2) Fix VLAN handling with TXQs in mac80211, from Johannes Berg.

  3) Missing __qdisc_drop conversions in prio and qfq schedulers, from
     Gao Feng.

  4) Use after free in netlink nlk groups handling, from Xin Long.

  5) Handle MTU update properly in ipv6 gre tunnels, from Xin Long.

  6) Fix leak of ipv6 fib tables on netns teardown, from Sabrina Dubroca
     with follow-on fix from Eric Dumazet.

  7) Need RCU and preemption disabled during generic XDP data patch,
     from John Fastabend"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
  bpf: make error reporting in bpf_warn_invalid_xdp_action more clear
  Revert "mdio_bus: Remove unneeded gpiod NULL check"
  bpf: devmap, use cond_resched instead of cpu_relax
  bpf: add support for sockmap detach programs
  net: rcu lock and preempt disable missing around generic xdp
  bpf: don't select potentially stale ri->map from buggy xdp progs
  net: tulip: Constify tulip_tbl
  net: ethernet: ti: netcp_core: no need in netif_napi_del
  davicom: Display proper debug level up to 6
  net: phy: sfp: rename dt properties to match the binding
  dt-binding: net: sfp binding documentation
  dt-bindings: add SFF vendor prefix
  dt-bindings: net: don't confuse with generic PHY property
  ip6_tunnel: fix setting hop_limit value for ipv6 tunnel
  ip_tunnel: fix setting ttl and tos value in collect_md mode
  ipv6: fix typo in fib6_net_exit()
  tcp: fix a request socket leak
  sctp: fix missing wake ups in some situations
  netfilter: xt_hashlimit: fix build error caused by 64bit division
  netfilter: xt_hashlimit: alloc hashtable with right size
  ...

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 9 Sep 2017 17:30:07 +0000 (10:30 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - most of the rest of MM

 - a small number of misc things

 - lib/ updates

 - checkpatch

 - autofs updates

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
  ipc: optimize semget/shmget/msgget for lots of keys
  ipc/sem: play nicer with large nsops allocations
  ipc/sem: drop sem_checkid helper
  ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
  ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
  ipc: convert ipc_namespace.count from atomic_t to refcount_t
  kcov: support compat processes
  sh: defconfig: cleanup from old Kconfig options
  mn10300: defconfig: cleanup from old Kconfig options
  m32r: defconfig: cleanup from old Kconfig options
  drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
  drivers/pps: aesthetic tweaks to PPS-related content
  cpumask: make cpumask_next() out-of-line
  kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
  kmod: split off umh headers into its own file
  MAINTAINERS: clarify kmod is just a kernel module loader
  kmod: split out umh code into its own file
  test_kmod: flip INT checks to be consistent
  test_kmod: remove paranoid UINT_MAX check on uint range processing
  vfat: deduplicate hex2bin()
  ...

7 years agoremove gperf left-overs from build system
Linus Torvalds [Sat, 9 Sep 2017 17:00:15 +0000 (10:00 -0700)]
remove gperf left-overs from build system

I removed all the gperf use, but not the Makefile rules.  Sam Ravnborg
says I get bonus points for cleaning this up.  I'll hold him to it.

Requested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobpf: make error reporting in bpf_warn_invalid_xdp_action more clear
Daniel Borkmann [Fri, 8 Sep 2017 23:40:35 +0000 (01:40 +0200)]
bpf: make error reporting in bpf_warn_invalid_xdp_action more clear

Differ between illegal XDP action code and just driver
unsupported one to provide better feedback when we throw
a one-time warning here. Reason is that with 814abfabef3c
("xdp: add bpf_redirect helper function") not all drivers
support the new XDP return code yet and thus they will
fall into their 'default' case when checking for return
codes after program return, which then triggers a
bpf_warn_invalid_xdp_action() stating that the return
code is illegal, but from XDP perspective it's not.

I decided not to place something like a XDP_ACT_MAX define
into uapi i) given we don't have this either for all other
program types, ii) future action codes could have further
encoding there, which would render such define unsuitable
and we wouldn't be able to rip it out again, and iii) we
rarely add new action codes.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "mdio_bus: Remove unneeded gpiod NULL check"
Florian Fainelli [Fri, 8 Sep 2017 22:38:07 +0000 (15:38 -0700)]
Revert "mdio_bus: Remove unneeded gpiod NULL check"

This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
checks for NULL descriptors, so it's safe to drop them, but it is not
when CONFIG_GPIOLIB is disabled in the kernel. If we do call
gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
coming from the inline stubs declared in include/linux/gpio/consumer.h.

Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
Reported-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'xdp-bpf-fixes'
David S. Miller [Sat, 9 Sep 2017 04:11:01 +0000 (21:11 -0700)]
Merge branch 'xdp-bpf-fixes'

John Fastabend says:

====================
net: Fixes for XDP/BPF

The following fixes, UAPI updates, and small improvement,

i. XDP needs to be called inside RCU with preempt disabled.

ii. Not strictly a bug fix but we have an attach command in the
sockmap UAPI already to avoid having a single kernel released with
only the attach and not the detach I'm pushing this into net branch.
Its early in the RC cycle so I think this is OK (not ideal but better
than supporting a UAPI with a missing detach forever).

iii. Final patch replace cpu_relax with cond_resched in devmap.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: devmap, use cond_resched instead of cpu_relax
John Fastabend [Fri, 8 Sep 2017 21:01:10 +0000 (14:01 -0700)]
bpf: devmap, use cond_resched instead of cpu_relax

Be a bit more friendly about waiting for flush bits to complete.
Replace the cpu_relax() with a cond_resched().

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: add support for sockmap detach programs
John Fastabend [Fri, 8 Sep 2017 21:00:49 +0000 (14:00 -0700)]
bpf: add support for sockmap detach programs

The bpf map sockmap supports adding programs via attach commands. This
patch adds the detach command to keep the API symmetric and allow
users to remove previously added programs. Otherwise the user would
have to delete the map and re-add it to get in this state.

This also adds a series of additional tests to capture detach operation
and also attaching/detaching invalid prog types.

API note: socks will run (or not run) programs depending on the state
of the map at the time the sock is added. We do not for example walk
the map and remove programs from previously attached socks.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: rcu lock and preempt disable missing around generic xdp
John Fastabend [Fri, 8 Sep 2017 21:00:30 +0000 (14:00 -0700)]
net: rcu lock and preempt disable missing around generic xdp

do_xdp_generic must be called inside rcu critical section with preempt
disabled to ensure BPF programs are valid and per-cpu variables used
for redirect operations are consistent. This patch ensures this is true
and fixes the splat below.

The netif_receive_skb_internal() code path is now broken into two rcu
critical sections. I decided it was better to limit the preempt_enable/disable
block to just the xdp static key portion and the fallout is more
rcu_read_lock/unlock calls. Seems like the best option to me.

[  607.596901] =============================
[  607.596906] WARNING: suspicious RCU usage
[  607.596912] 4.13.0-rc4+ #570 Not tainted
[  607.596917] -----------------------------
[  607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage!
[  607.596927]
[  607.596927] other info that might help us debug this:
[  607.596927]
[  607.596933]
[  607.596933] rcu_scheduler_active = 2, debug_locks = 1
[  607.596938] 2 locks held by pool/14624:
[  607.596943]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890
[  607.596973]  #1:  (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0
[  607.597000]
[  607.597000] stack backtrace:
[  607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570
[  607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017
[  607.597016] Call Trace:
[  607.597027]  dump_stack+0x67/0x92
[  607.597040]  lockdep_rcu_suspicious+0xdd/0x110
[  607.597054]  do_xdp_generic+0x313/0xa50
[  607.597068]  ? time_hardirqs_on+0x5b/0x150
[  607.597076]  ? mark_held_locks+0x6b/0xc0
[  607.597088]  ? netdev_pick_tx+0x150/0x150
[  607.597117]  netif_rx_internal+0x205/0x3f0
[  607.597127]  ? do_xdp_generic+0xa50/0xa50
[  607.597144]  ? lock_downgrade+0x2b0/0x2b0
[  607.597158]  ? __lock_is_held+0x93/0x100
[  607.597187]  netif_rx+0x119/0x190
[  607.597202]  loopback_xmit+0xfd/0x1b0
[  607.597214]  dev_hard_start_xmit+0x127/0x4e0

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Fixes: b5cdae3291f7 ("net: Generic XDP")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: don't select potentially stale ri->map from buggy xdp progs
Daniel Borkmann [Thu, 7 Sep 2017 22:14:51 +0000 (00:14 +0200)]
bpf: don't select potentially stale ri->map from buggy xdp progs

We can potentially run into a couple of issues with the XDP
bpf_redirect_map() helper. The ri->map in the per CPU storage
can become stale in several ways, mostly due to misuse, where
we can then trigger a use after free on the map:

i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT
and running on a driver not supporting XDP_REDIRECT yet. The
ri->map on that CPU becomes stale when the XDP program is unloaded
on the driver, and a prog B loaded on a different driver which
supports XDP_REDIRECT return code. prog B would have to omit
calling to bpf_redirect_map() and just return XDP_REDIRECT, which
would then access the freed map in xdp_do_redirect() since not
cleared for that CPU.

ii) prog A is calling bpf_redirect_map(), returning a code other
than XDP_REDIRECT. prog A is then detached, which triggers release
of the map. prog B is attached which, similarly as in i), would
just return XDP_REDIRECT without having called bpf_redirect_map()
and thus be accessing the freed map in xdp_do_redirect() since
not cleared for that CPU.

iii) prog A is attached to generic XDP, calling the bpf_redirect_map()
helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is
currently not handling ri->map (will be fixed by Jesper), so it's
not being reset. Later loading a e.g. native prog B which would,
say, call bpf_xdp_redirect() and then returns XDP_REDIRECT would
find in xdp_do_redirect() that a map was set and uses that causing
use after free on map access.

Fix thus needs to avoid accessing stale ri->map pointers, naive
way would be to call a BPF function from drivers that just resets
it to NULL for all XDP return codes but XDP_REDIRECT and including
XDP_REDIRECT for drivers not supporting it yet (and let ri->map
being handled in xdp_do_generic_redirect()). There is a less
intrusive way w/o letting drivers call a reset for each BPF run.

The verifier knows we're calling into bpf_xdp_redirect_map()
helper, so it can do a small insn rewrite transparent to the prog
itself in the sense that it fills R4 with a pointer to the own
bpf_prog. We have that pointer at verification time anyway and
R4 is allowed to be used as per calling convention we scratch
R0 to R5 anyway, so they become inaccessible and program cannot
read them prior to a write. Then, the helper would store the prog
pointer in the current CPUs struct redirect_info. Later in
xdp_do_*_redirect() we check whether the redirect_info's prog
pointer is the same as passed xdp_prog pointer, and if that's
the case then all good, since the prog holds a ref on the map
anyway, so it is always valid at that point in time and must
have a reference count of at least 1. If in the unlikely case
they are not equal, it means we got a stale pointer, so we clear
and bail out right there. Also do reset map and the owning prog
in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and
bpf_xdp_redirect() won't get mixed up, only the last call should
take precedence. A tc bpf_redirect() doesn't use map anywhere
yet, so no need to clear it there since never accessed in that
layer.

Note that in case the prog is released, and thus the map as
well we're still under RCU read critical section at that time
and have preemption disabled as well. Once we commit with the
__dev_map_insert_ctx() from xdp_do_redirect_map() and set the
map to ri->map_to_flush, we still wait for a xdp_do_flush_map()
to finish in devmap dismantle time once flush_needed bit is set,
so that is fine.

Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: tulip: Constify tulip_tbl
Kees Cook [Thu, 7 Sep 2017 19:35:14 +0000 (12:35 -0700)]
net: tulip: Constify tulip_tbl

It looks like all users of tulip_tbl are reads, so mark this table
as read-only.

$ git grep tulip_tbl  # edited to avoid line-wraps...
interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ...
interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs&~RxPollInt, ...
interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ...
interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs | TimerInt,
pnic.c:      iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
tulip.h:     extern struct tulip_chip_table tulip_tbl[];
tulip_core.c:struct tulip_chip_table tulip_tbl[] = {
tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer,
tulip_core.c:const char *chip_name = tulip_tbl[chip_idx].chip_name;
tulip_core.c:if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size)
tulip_core.c:ioaddr =  pci_iomap(..., tulip_tbl[chip_idx].io_size);
tulip_core.c:tp->flags = tulip_tbl[chip_idx].flags;
tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer,
tulip_core.c:INIT_WORK(&tp->media_work, tulip_tbl[tp->chip_id].media_task);

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: netdev@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: netcp_core: no need in netif_napi_del
Ivan Khoronzhuk [Thu, 7 Sep 2017 15:32:30 +0000 (18:32 +0300)]
net: ethernet: ti: netcp_core: no need in netif_napi_del

Don't remove rx_napi specifically just before free_netdev(),
it's supposed to be done in it and is confusing w/o tx_napi deletion.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodavicom: Display proper debug level up to 6
Mathieu Malaterre [Thu, 7 Sep 2017 11:24:20 +0000 (13:24 +0200)]
davicom: Display proper debug level up to 6

This will make it explicit some messages are of the form:
dm9000_dbg(db, 5, ...

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: sfp: rename dt properties to match the binding
Baruch Siach [Thu, 7 Sep 2017 09:25:50 +0000 (12:25 +0300)]
net: phy: sfp: rename dt properties to match the binding

Make the Rx rate select control gpio property name match the documented
binding. This would make the addition of 'rate-select1-gpios' for SFP+
support more natural.

Also, make the MOD-DEF0 gpio property name match the documentation.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-binding: net: sfp binding documentation
Baruch Siach [Thu, 7 Sep 2017 09:25:49 +0000 (12:25 +0300)]
dt-binding: net: sfp binding documentation

Add device-tree binding documentation SFP transceivers. Support for SFP
transceivers has been recently introduced (drivers/net/phy/sfp.c).

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: add SFF vendor prefix
Baruch Siach [Thu, 7 Sep 2017 09:25:48 +0000 (12:25 +0300)]
dt-bindings: add SFF vendor prefix

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: don't confuse with generic PHY property
Baruch Siach [Thu, 7 Sep 2017 08:09:59 +0000 (11:09 +0300)]
dt-bindings: net: don't confuse with generic PHY property

This complements commit 9a94b3a4bd (dt-binding: phy: don't confuse with
Ethernet phy properties).

The generic PHY 'phys' property sometime appears in the same node with
the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in
ethernet.txt to reduce confusion.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip6_tunnel: fix setting hop_limit value for ipv6 tunnel
Haishuang Yan [Thu, 7 Sep 2017 06:08:35 +0000 (14:08 +0800)]
ip6_tunnel: fix setting hop_limit value for ipv6 tunnel

Similar to vxlan/geneve tunnel, if hop_limit is zero, it should fall
back to ip6_dst_hoplimt().

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip_tunnel: fix setting ttl and tos value in collect_md mode
Haishuang Yan [Thu, 7 Sep 2017 06:08:34 +0000 (14:08 +0800)]
ip_tunnel: fix setting ttl and tos value in collect_md mode

ttl and tos variables are declared and assigned, but are not used in
iptunnel_xmit() function.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipc: optimize semget/shmget/msgget for lots of keys
Guillaume Knispel [Fri, 8 Sep 2017 23:17:55 +0000 (16:17 -0700)]
ipc: optimize semget/shmget/msgget for lots of keys

ipc_findkey() used to scan all objects to look for the wanted key.  This
is slow when using a high number of keys.  This change adds an rhashtable
of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

Other (more micro) benchmark results, by the author: On an i5 laptop, the
following loop executed right after a reboot took, without and with this
change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            3.5         4.9   Âµs            3.5         4.9
     10            7.6         8.6   Âµs            3.7         4.7
     32           16.2        15.9   Âµs            4.3         5.3
    100           72.9        41.8   Âµs            3.7         4.7
   1000        5,630.0       502.0   Âµs             *           *
  10000    1,340,000.0     7,240.0   Âµs             *           *
  31900   17,600,000.0    22,200.0   Âµs             *           *

 *: unreliable measure: high variance

The duration for a lookup-only usage was obtained by the same loop once
the keys are present:

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            2.1         2.5   Âµs            2.1         2.5
     10            4.5         4.8   Âµs            2.2         2.3
     32           13.0        10.8   Âµs            2.3         2.8
    100           82.9        25.1   Âµs             *          2.3
   1000        5,780.0       217.0   Âµs             *           *
  10000    1,470,000.0     2,520.0   Âµs             *           *
  31900   17,400,000.0     7,810.0   Âµs             *           *

Finally, executing each semget() in a new process gave, when still
summing only the durations of these syscalls:

creation:
                 total       total
   KEYS        without        with

      1            3.7         5.0   Âµs
     10           32.9        36.7   Âµs
     32          125.0       109.0   Âµs
    100          523.0       353.0   Âµs
   1000       20,300.0     3,280.0   Âµs
  10000    2,470,000.0    46,700.0   Âµs
  31900   27,800,000.0   219,000.0   Âµs

lookup-only:
                 total       total
   KEYS        without        with

      1            2.5         2.7   Âµs
     10           25.4        24.4   Âµs
     32          106.0        72.6   Âµs
    100          591.0       352.0   Âµs
   1000       22,400.0     2,250.0   Âµs
  10000    2,510,000.0    25,700.0   Âµs
  31900   28,200,000.0   115,000.0   Âµs

[1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Cc: Marc Pardo <marc.pardo@supersonicimagine.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoipc/sem: play nicer with large nsops allocations
Davidlohr Bueso [Fri, 8 Sep 2017 23:17:52 +0000 (16:17 -0700)]
ipc/sem: play nicer with large nsops allocations

Replacing semop()'s kmalloc for kvmalloc was originally proposed by
Manfred on the premise that it can be called for large (than order-1)
sizes.  For example, while Oracle recommends setting SEMOPM to a _minimum_
of 100, some distros[1] encourage the setting to be a factor of the amount
of db tasks (PROCESSES), which can get fishy for large systems (easily
going beyond 1000).

[1] An Example of Semaphore Settings
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html

So let's just convert this to kvmalloc, just like the rest of the
allocations we do in ipc.  While the fallback vmalloc obviously involves
more overhead, this by far the uncommon path, and it's better for the user
than just erroring out with kmalloc.

Link: http://lkml.kernel.org/r/20170803184136.13855-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoipc/sem: drop sem_checkid helper
Davidlohr Bueso [Fri, 8 Sep 2017 23:17:49 +0000 (16:17 -0700)]
ipc/sem: drop sem_checkid helper

... 'tis not used.

Link: http://lkml.kernel.org/r/20170803184136.13855-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:45 +0000 (16:17 -0700)]
ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/1499417992-3238-4-git-send-email-elena.reshetova@intel.com
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: <arozansk@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:42 +0000 (16:17 -0700)]
ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/1499417992-3238-3-git-send-email-elena.reshetova@intel.com
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: <arozansk@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoipc: convert ipc_namespace.count from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:38 +0000 (16:17 -0700)]
ipc: convert ipc_namespace.count from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/1499417992-3238-2-git-send-email-elena.reshetova@intel.com
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: <arozansk@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agokcov: support compat processes
Dmitry Vyukov [Fri, 8 Sep 2017 23:17:35 +0000 (16:17 -0700)]
kcov: support compat processes

Support compat processes in KCOV by providing compat_ioctl callback.
Compat mode uses the same ioctl callback: we have 2 commands that do not
use the argument and 1 that already checks that the arg does not overflow
INT_MAX.  This allows to use KCOV-guided fuzzing in compat processes.

Link: http://lkml.kernel.org/r/20170823100553.55812-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosh: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:31 +0000 (16:17 -0700)]
sh: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - INET_LRO: commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library");
 - MTD_CONCAT: commit f53fdebcc3e1 ("mtd: drop MTD_CONCAT from Kconfig
   entirely");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - NETDEV_1000 and NETDEV_10000: commit f860b0522f65 ("drivers/net:
   Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
   just ETHERNET but that is separate change;
 - HID_SUPPORT: commit 1f41a6a99476 ("HID: Fix the generic Kconfig
   options");
 - RCU_CPU_STALL_DETECTOR: commit a00e0d714fbd ("rcu: Remove conditional
   compilation for RCU CPU stall warnings");
 - SYSCTL_SYSCALL_CHECK: commit 7c60c48f58a7 ("sysctl: Improve the
   sysctl sanity checks");
 - VIDEO_OUTPUT_CONTROL: commit f167a64e9d67 ("video / output: Drop
   display output class support");
 - MISC_DEVICES: commit 7c5763b8453a ("drivers: misc: Remove
   MISC_DEVICES config option");
 - AUTOFS_FS: commit 561c5cf9236a ("staging: Remove autofs3");
 - IP_NF_QUEUE: commit 3dd6664fac7e ("netfilter: remove unused "config
   IP_NF_QUEUE"");
 - USB_DEVICE_CLASS: commit 007bab91324e ("USB: remove
   CONFIG_USB_DEVICE_CLASS");
 - USB_LIBUSUAL: commit f61870ee6f8c ("usb: remove libusual");
 - DISPLAY_SUPPORT: commit 5a6b5e02d673 ("fbdev: remove display
   subsystem");
 - IP_NF_TARGET_ULOG: commit d4da843e6fad ("netfilter: kill remnants of
   ulog targets");
 - IP6_NF_QUEUE: commit d16cf20e2f2f ("netfilter: remove ip_queue
   support");
 - IP6_NF_TARGET_LOG: commit 6939c33a757b ("netfilter: merge ipt_LOG and
   ip6_LOG into xt_LOG");

Link: http://lkml.kernel.org/r/1500526846-4072-1-git-send-email-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomn10300: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:28 +0000 (16:17 -0700)]
mn10300: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - INET_LRO: commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - NETDEV_1000 and NETDEV_10000: commit f860b0522f65 ("drivers/net:
   Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
   just ETHERNET but that is separate change;
 - HID_SUPPORT: commit 1f41a6a99476 ("HID: Fix the generic Kconfig
   options");
 - RCU_CPU_STALL_DETECTOR: commit a00e0d714fbd ("rcu: Remove conditional
   compilation for RCU CPU stall warnings");

Link: http://lkml.kernel.org/r/1500526786-3789-1-git-send-email-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agom32r: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:25 +0000 (16:17 -0700)]
m32r: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - IP_NF_QUEUE: commit 3dd6664fac7e ("netfilter: remove unused "config
   IP_NF_QUEUE"");
 - IP_NF_TARGET_ULOG: commit d4da843e6fad ("netfilter: kill remnants of
   ulog targets");
 - VIDEO_OUTPUT_CONTROL: commit f167a64e9d67 ("video / output: Drop
   display output class support");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - MTD_CONCAT: commit f53fdebcc3e1 ("mtd: drop MTD_CONCAT from Kconfig
   entirely");

Link: http://lkml.kernel.org/r/1500527065-4575-1-git-send-email-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodrivers/pps: use surrounding "if PPS" to remove numerous dependency checks
Robert P. J. Day [Fri, 8 Sep 2017 23:17:22 +0000 (16:17 -0700)]
drivers/pps: use surrounding "if PPS" to remove numerous dependency checks

Adding high-level "if PPS" makes lower-level dependency tests superfluous.

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261050500.8156@localhost.localdomain
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodrivers/pps: aesthetic tweaks to PPS-related content
Robert P. J. Day [Fri, 8 Sep 2017 23:17:19 +0000 (16:17 -0700)]
drivers/pps: aesthetic tweaks to PPS-related content

Collection of aesthetic adjustments to various PPS-related files,
directories and Documentation, some quite minor just for the sake of
consistency, including:

 * Updated example of pps device tree node (courtesy Rodolfo G.)
 * "PPS-API" -> "PPS API"
 * "pps_source_info_s" -> "pps_source_info"
 * "ktimer driver" -> "pps-ktimer driver"
 * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
 * Add missing PPS-related entries to MAINTAINERS file
 * Other trivialities

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agocpumask: make cpumask_next() out-of-line
Alexey Dobriyan [Fri, 8 Sep 2017 23:17:15 +0000 (16:17 -0700)]
cpumask: make cpumask_next() out-of-line

Every for_each_XXX_cpu() invocation calls cpumask_next() which is an
inline function:

static inline unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
        /* -1 is a legal arg here. */
        if (n != -1)
                cpumask_check(n);
        return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}

However!

find_next_bit() is regular out-of-line function which means "nr_cpu_ids"
load and increment happen at the caller resulting in a lot of bloat

x86_64 defconfig:
add/remove: 3/0 grow/shrink: 8/373 up/down: 155/-5668 (-5513)
x86_64 allyesconfig-ish:
add/remove: 3/1 grow/shrink: 57/634 up/down: 3515/-28177 (-24662) !!!

Some archs redefine find_next_bit() but it is OK:

m68k inline but SMP is not supported
arm out-of-line
unicore32 out-of-line

Function call will happen anyway, so move load and increment into callee.

Link: http://lkml.kernel.org/r/20170824230010.GA1593@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agokmod: move #ifdef CONFIG_MODULES wrapper to Makefile
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:12 +0000 (16:17 -0700)]
kmod: move #ifdef CONFIG_MODULES wrapper to Makefile

The entire file is now conditionally compiled only when CONFIG_MODULES is
enabled, and this this is a bool.  Just move this conditional to the
Makefile as its easier to read this way.

Link: http://lkml.kernel.org/r/20170810180618.22457-5-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agokmod: split off umh headers into its own file
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:08 +0000 (16:17 -0700)]
kmod: split off umh headers into its own file

In the future usermode helper users do not need to carry in all the of
kmod headers declarations.

Since kmod.h still includes umh.h this change has no functional changes,
each umh user can be cleaned up separately later and with time.

Link: http://lkml.kernel.org/r/20170810180618.22457-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMAINTAINERS: clarify kmod is just a kernel module loader
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:04 +0000 (16:17 -0700)]
MAINTAINERS: clarify kmod is just a kernel module loader

This should make it clearer what the kmod code is now that
the umh code is split out separately.

Link: http://lkml.kernel.org/r/20170810180618.22457-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agokmod: split out umh code into its own file
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:00 +0000 (16:17 -0700)]
kmod: split out umh code into its own file

Patch series "kmod: few code cleanups to split out umh code"

The usermode helper has a provenance from the old usb code which first
required a usermode helper.  Eventually this was shoved into kmod.c and
the kernel's modprobe calls was converted over eventually to share the
same code.  Over time the list of usermode helpers in the kernel has grown
-- so kmod is just but one user of the API.

This series is a simple logical cleanup which acknowledges the code
evolution of the usermode helper and shoves the UMH API into its own
dedicated file.  This way users of the API can later just include umh.h
instead of kmod.h.

Note despite the diff state the first patch really is just a code shove,
no functional changes are done there.  I did use git format-patch -M to
generate the patch, but in the end the split was not enough for git to
consider it a rename hence the large diffstat.

I've put this through 0-day and it gives me their machine compilation
blessings with all tests as OK.

This patch (of 4):

There's a slew of usermode helper users and kmod is just one of them.
Split out the usermode helper code into its own file to keep the logic and
focus split up.

This change provides no functional changes.

Link: http://lkml.kernel.org/r/20170810180618.22457-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: flip INT checks to be consistent
Dan Carpenter [Fri, 8 Sep 2017 23:16:57 +0000 (16:16 -0700)]
test_kmod: flip INT checks to be consistent

Most checks will check for min and then max, except the int check.  Flip
the checks to be consistent with the other code.

[mcgrof@kernel.org: massaged commit log]
Link: http://lkml.kernel.org/r/20170802211707.28020-3-mcgrof@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: remove paranoid UINT_MAX check on uint range processing
Dan Carpenter [Fri, 8 Sep 2017 23:16:53 +0000 (16:16 -0700)]
test_kmod: remove paranoid UINT_MAX check on uint range processing

The UINT_MAX comparison is not needed because "max" is already an unsigned
int, and we expect developer C code max value input to have a sensible 0 -
UINT_MAX range.  Note that if it so happens to be UINT_MAX + 1 it would
lead to an issue, but we expect the developer to know this.

[mcgrof@kernel.org: massaged commit log]
Link: http://lkml.kernel.org/r/20170802211707.28020-2-mcgrof@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agovfat: deduplicate hex2bin()
OGAWA Hirofumi [Fri, 8 Sep 2017 23:16:50 +0000 (16:16 -0700)]
vfat: deduplicate hex2bin()

We may use hex2bin() instead of custom approach.

Link: http://lkml.kernel.org/r/87zibktpil.fsf@devron
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: use unsigned int/long instead of uint/ulong for ioctl args
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:47 +0000 (16:16 -0700)]
autofs: use unsigned int/long instead of uint/ulong for ioctl args

The standard types unsigned int and unsigned long should be used for
.compat_ioctl.  autofs is the only fs using uing/ulong for this, and these
are even the only uint/ulong in the entire autofs code.

Drop unneeded long cast in return value of autofs_dev_ioctl_compat().
It's already long.

Link: http://lkml.kernel.org/r/150285069709.4670.3884827966280147529.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: drop wrong comment
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:43 +0000 (16:16 -0700)]
autofs: drop wrong comment

This comment was correct when it was added in 8d7b48e0 ("autofs4: add
miscellaneous device for ioctls") in 2008, but not after 4e44b685 "Get rid
of path_lookup in autofs4" in 2009 which introduced find_autofs_mount().

Link: http://lkml.kernel.org/r/150285069148.4670.17959501481201077445.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: use AUTOFS_DEV_IOCTL_SIZE
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:40 +0000 (16:16 -0700)]
autofs: use AUTOFS_DEV_IOCTL_SIZE

Use a macro which defines misc-dev ioctl parameter size (excluding a path
beyond &path[0]) since it's been used to initialize and copy this
structure ever since it first appeared in 8d7b48e0 in 2008.

(or simply get rid of this if this is just unnecessary abstraction when
all it needs is sizeof(struct autofs_dev_ioctl))

Edit: raven@themaw.net
That's a good point but I'd prefer to keep the macro define.
End edit: raven@themaw.net

Link: http://lkml.kernel.org/r/150285068577.4670.2599968823770600622.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: non functional header inclusion cleanup
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:37 +0000 (16:16 -0700)]
autofs: non functional header inclusion cleanup

Having header includes before any macro (without any dependency) simply
looks normal.  No reason to have these macros in between.

Link: http://lkml.kernel.org/r/150285068011.4670.10271483982093996996.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:34 +0000 (16:16 -0700)]
autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace in
around 2006-2008, which was technically just an alias of the existing
ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could be
removed unless these are staying here to be able to compile userspace code
of certain period of time from a decade ago.

Edit: raven@themaw.net
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: raven@themaw.net

Link: http://lkml.kernel.org/r/150285067347.4670.11494624644273072003.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: make dev ioctl version and ismountpoint user accessible
Ian Kent [Fri, 8 Sep 2017 23:16:30 +0000 (16:16 -0700)]
autofs: make dev ioctl version and ismountpoint user accessible

Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Link: http://lkml.kernel.org/r/150216642517.11652.2338933266137331637.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: make disc device user accessible
Ian Kent [Fri, 8 Sep 2017 23:16:27 +0000 (16:16 -0700)]
autofs: make disc device user accessible

The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in order
to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should be fine
with broad permissions.

Link: http://lkml.kernel.org/r/150216641928.11652.7388977863125547969.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoautofs: fix AT_NO_AUTOMOUNT not being honored
Ian Kent [Fri, 8 Sep 2017 23:16:24 +0000 (16:16 -0700)]
autofs: fix AT_NO_AUTOMOUNT not being honored

The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which
is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an
automount by the call.  But this flag is unconditionally cleared for all
stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the negative
dentry case in follow_automount() needs to be changed to return ENOENT
when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are
clear).

AFAICT this change doesn't have any noticable side effects and may, in
some use cases (although I didn't see it in testing) prevent unnecessary
callbacks to the automount daemon.

It's also possible that a stat family call has been made with a path that
is in the process of being mounted by some other process.  But stat family
calls should return the automount state of the path as it is "now" so it
shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already handled.

Link: http://lkml.kernel.org/r/150216641255.11652.4204561328197919771.stgit@pluto.themaw.net
Fixes: deccf497d804a4c5fca ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()")
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoinit/main.c: extract early boot entropy from the passed cmdline
Daniel Micay [Fri, 8 Sep 2017 23:16:20 +0000 (16:16 -0700)]
init/main.c: extract early boot entropy from the passed cmdline

Feed the boot command-line as to the /dev/random entropy pool

Existing Android bootloaders usually pass data which may not be known by
an external attacker on the kernel command-line.  It may also be the
case on other embedded systems.  Sample command-line from a Google Pixel
running CopperheadOS....

    console=ttyHSL0,115200,n8 androidboot.console=ttyHSL0
    androidboot.hardware=sailfish user_debug=31 ehci-hcd.park=3
    lpm_levels.sleep_disabled=1 cma=32M@0-0xffffffff buildvariant=user
    veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab
    androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=yellow
    androidboot.veritymode=enforcing androidboot.keymaster=1
    androidboot.serialno=FA6CE0305299 androidboot.baseband=msm
    mdss_mdp.panel=1:dsi:0:qcom,mdss_dsi_samsung_ea8064tg_1080p_cmd:1:none:cfg:single_dsi
    androidboot.slot_suffix=_b fpsimd.fpsimd_settings=0
    app_setting.use_app_setting=0 kernelflag=0x00000000 debugflag=0x00000000
    androidboot.hardware.revision=PVT radioflag=0x00000000
    radioflagex1=0x00000000 radioflagex2=0x00000000 cpumask=0x00000000
    androidboot.hardware.ddr=4096MB,Hynix,LPDDR4 androidboot.ddrinfo=00000006
    androidboot.ddrsize=4GB androidboot.hardware.color=GRA00
    androidboot.hardware.ufs=32GB,Samsung androidboot.msm.hw_ver_id=268824801
    androidboot.qf.st=2 androidboot.cid=11111111 androidboot.mid=G-2PW4100
    androidboot.bootloader=8996-012001-1704121145
    androidboot.oem_unlock_support=1 androidboot.fp_src=1
    androidboot.htc.hrdump=detected androidboot.ramdump.opt=mem@2g:2g,mem@4g:2g
    androidboot.bootreason=reboot androidboot.ramdump_enable=0 ro
    root=/dev/dm-0 dm="system none ro,0 1 android-verity /dev/sda34"
    rootwait skip_initramfs init=/init androidboot.wificountrycode=US
    androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136

Among other things, it contains a value unique to the device
(androidboot.serialno=FA6CE0305299), unique to the OS builds for the
device variant (veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab)
and timings from the bootloader stages in milliseconds
(androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136).

[tytso@mit.edu: changelog tweak]
[labbott@redhat.com: line-wrapped command line]
Link: http://lkml.kernel.org/r/20170816231458.2299-3-labbott@redhat.com
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Nick Kralevich <nnk@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoinit: move stack canary initialization after setup_arch
Laura Abbott [Fri, 8 Sep 2017 23:16:17 +0000 (16:16 -0700)]
init: move stack canary initialization after setup_arch

Patch series "Command line randomness", v3.

A series to add the kernel command line as a source of randomness.

This patch (of 2):

Stack canary intialization involves getting a random number.  Getting this
random number may involve accessing caches or other architectural specific
features which are not available until after the architecture is setup.
Move the stack canary initialization later to accommodate this.

Link: http://lkml.kernel.org/r/20170816231458.2299-2-labbott@redhat.com
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Nick Kralevich <nnk@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobinfmt_flat: delete two error messages for a failed memory allocation in decompress_e...
Markus Elfring [Fri, 8 Sep 2017 23:16:14 +0000 (16:16 -0700)]
binfmt_flat: delete two error messages for a failed memory allocation in decompress_exec()

Omit extra messages for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/f92aac79-b05e-321a-1a19-d38c7159ee9c@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agocheckpatch: add 6 missing types to --list-types
Jean Delvare [Fri, 8 Sep 2017 23:16:11 +0000 (16:16 -0700)]
checkpatch: add 6 missing types to --list-types

Unlike all other types, LONG_LINE, LONG_LINE_COMMENT and LONG_LINE_STRING
are passed to WARN() through a variable.  This causes the parser in
list_types() to miss them and consequently they are not present in the
output of --list-types.

Additionally, types TYPO_SPELLING, FSF_MAILING_ADDRESS and AVOID_BUG are
passed with a variable level, causing the parser to miss them too.

So modify the regex to also catch these special cases.

Link: http://lkml.kernel.org/r/20170902175610.7e4a7c9d@endymion
Fixes: 3beb42eced39 ("checkpatch: add --list-types to show message types to show or ignore")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agocheckpatch: rename variables to avoid confusion
Jean Delvare [Fri, 8 Sep 2017 23:16:07 +0000 (16:16 -0700)]
checkpatch: rename variables to avoid confusion

The variable name "$msg_type" is sometimes used to set the message type,
and sometimes used to set the message level.  This works but is kind of
confusing.  Use "$msg_level" in the latter case instead, to make the code
clearer.

Link: http://lkml.kernel.org/r/20170902175345.175db33a@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agocheckpatch: fix typo in comment
Jean Delvare [Fri, 8 Sep 2017 23:16:04 +0000 (16:16 -0700)]
checkpatch: fix typo in comment

Link: http://lkml.kernel.org/r/20170902175249.15bb77f2@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agocheckpatch: add --strict check for ifs with unnecessary parentheses
Joe Perches [Fri, 8 Sep 2017 23:16:01 +0000 (16:16 -0700)]
checkpatch: add --strict check for ifs with unnecessary parentheses

An if statement test like
if ((foo == bar) && (baz != qux))
can arguably be better written without the parentheses as
if (foo == bar && baz != qux)

Add a test to find these cases.

Link: http://lkml.kernel.org/r/dcd0561ddd0fa43c51a420d53b550d738bf42001.1502734458.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID...
Takashi Iwai [Fri, 8 Sep 2017 23:15:58 +0000 (16:15 -0700)]
lib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID string

The sprint_oid() utility function doesn't properly check the buffer size
that it causes that the warning in vsnprintf() be triggered.  For
example on v4.1 kernel:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0()
  ...

We can trigger this issue by injecting maliciously crafted x509 cert in
DER format.  Just using hex editor to change the length of OID to over
the length of the SEQUENCE container.  For example:

    0:d=0  hl=4 l= 980 cons: SEQUENCE
    4:d=1  hl=4 l= 700 cons:  SEQUENCE
    8:d=2  hl=2 l=   3 cons:   cont [ 0 ]
   10:d=3  hl=2 l=   1 prim:    INTEGER           :02
   13:d=2  hl=2 l=   9 prim:   INTEGER           :9B47FAF791E7D1E3
   24:d=2  hl=2 l=  13 cons:   SEQUENCE
   26:d=3  hl=2 l=   9 prim:    OBJECT            :sha256WithRSAEncryption
   37:d=3  hl=2 l=   0 prim:    NULL
   39:d=2  hl=2 l= 121 cons:   SEQUENCE
   41:d=3  hl=2 l=  22 cons:    SET
   43:d=4  hl=2 l=  20 cons:     SEQUENCE      <=== the SEQ length is 20
   45:d=5  hl=2 l=   3 prim:      OBJECT            :organizationName
<=== the original length is 3, change the length of OID to over the length of SEQUENCE

Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided
patch to fix it by checking the bufsize in sprint_oid().

Link: http://lkml.kernel.org/r/20170903021646.2080-1-jlee@suse.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Reported-by: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: must check __radix_tree_preload() return value
Eric Dumazet [Fri, 8 Sep 2017 23:15:54 +0000 (16:15 -0700)]
radix-tree: must check __radix_tree_preload() return value

__radix_tree_preload() only disables preemption if no error is returned.

So we really need to make sure callers always check the return value.

idr_preload() contract is to always disable preemption, so we need
to add a missing preempt_disable() if an error happened.

Similarly, ida_pre_get() only needs to call preempt_enable() in the
case no error happened.

Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Fixes: 7ad3d4d85c7a ("ida: Move ida_bitmap to a percpu variable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/cmdline.c: remove meaningless comment
Baoquan He [Fri, 8 Sep 2017 23:15:51 +0000 (16:15 -0700)]
lib/cmdline.c: remove meaningless comment

One line of code was commented out by c++ style comment for debugging, but
forgot removing it.

Clean it up.

Link: http://lkml.kernel.org/r/1503312113-11843-1-git-send-email-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/string.c: check for kmalloc() failure
Dan Carpenter [Fri, 8 Sep 2017 23:15:48 +0000 (16:15 -0700)]
lib/string.c: check for kmalloc() failure

This is mostly to keep the number of static checker warnings down so we
can spot new bugs instead of them being drowned in noise.  This function
doesn't return normal kernel error codes but instead the return value is
used to display exactly which memory failed.  I chose -1 as hopefully
that's a helpful thing to print.

Link: http://lkml.kernel.org/r/20170817115420.uikisjvfmtrqkzjn@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/rhashtable: fix comment on locks_mul default value
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:45 +0000 (16:15 -0700)]
lib/rhashtable: fix comment on locks_mul default value

As of commit 4cf0b354d92 ("rhashtable: avoid large lock-array
allocations"), the default value for the locks multiplier was reduced
from 128 to 32.

Update the header file to reflect this.

Link: http://lkml.kernel.org/r/20170815215401.30745-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobitmap: introduce BITMAP_FROM_U64()
Yury Norov [Fri, 8 Sep 2017 23:15:41 +0000 (16:15 -0700)]
bitmap: introduce BITMAP_FROM_U64()

The macro is the compile-time analogue of bitmap_from_u64() with the same
purpose: convert the 64-bit number to the properly ordered pair of 32-bit
parts, suitable for filling the bitmap in 32-bit BE environment.

Use it to make test_bitmap_parselist() correct for 32-bit BE ABIs.

Tested on BE mips/qemu.

[akpm@linux-foundation.org: tweak code comment]
Link: http://lkml.kernel.org/r/20170810172916.24144-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/test_bitmap.c: add test for bitmap_parselist()
Yury Norov [Fri, 8 Sep 2017 23:15:38 +0000 (16:15 -0700)]
lib/test_bitmap.c: add test for bitmap_parselist()

Do some basic checks for bitmap_parselist().

[akpm@linux-foundation.org: fix printk warning]
Link: http://lkml.kernel.org/r/20170807225438.16161-2-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/bitmap.c: make bitmap_parselist() thread-safe and much faster
Yury Norov [Fri, 8 Sep 2017 23:15:34 +0000 (16:15 -0700)]
lib/bitmap.c: make bitmap_parselist() thread-safe and much faster

Current implementation of bitmap_parselist() uses a static variable to
save local state while setting bits in the bitmap.  It is obviously wrong
if we assume execution in multiprocessor environment.  Fortunately, it's
possible to rewrite this portion of code to avoid using the static
variable.

It is also possible to set bits in the mask per-range with bitmap_set(),
not per-bit, as it is implemented now, with set_bit(); which is way
faster.

The important side effect of this change is that setting bits in this
function from now is not per-bit atomic and less memory-ordered.  This is
because set_bit() guarantees the order of memory accesses, while
bitmap_set() does not.  I think that it is the advantage of the new
approach, because the bitmap_parselist() is intended to initialise bit
arrays, and user should protect the whole bitmap during initialisation if
needed.  So protecting individual bits looks expensive and useless.  Also,
other range-oriented functions in lib/bitmap.c don't worry much about
atomicity.

With all that, setting 2k bits in map with the pattern like 0-2047:128/256
becomes ~50 times faster after applying the patch in my testing
environment (arm64 hosted on qemu).

The second patch of the series adds the test for bitmap_parselist().  It's
not intended to cover all tricky cases, just to make sure that I didn't
screw up during rework.

Link: http://lkml.kernel.org/r/20170807225438.16161-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib: add test module for CONFIG_DEBUG_VIRTUAL
Florian Fainelli [Fri, 8 Sep 2017 23:15:31 +0000 (16:15 -0700)]
lib: add test module for CONFIG_DEBUG_VIRTUAL

Add a test module that allows testing that CONFIG_DEBUG_VIRTUAL works
correctly, at least that it can catch invalid calls to virt_to_phys()
against the non-linear kernel virtual address map.

Link: http://lkml.kernel.org/r/20170808164035.26725-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/hexdump.c: return -EINVAL in case of error in hex2bin()
Andy Shevchenko [Fri, 8 Sep 2017 23:15:28 +0000 (16:15 -0700)]
lib/hexdump.c: return -EINVAL in case of error in hex2bin()

In some cases caller would like to use error code directly without
shadowing.

-EINVAL feels a rightful code to return in case of error in hex2bin().

Link: http://lkml.kernel.org/r/20170731135510.68023-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoblock/cfq: cache rightmost rb_node
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:25 +0000 (16:15 -0700)]
block/cfq: cache rightmost rb_node

For the same reasons we already cache the leftmost pointer, apply the same
optimization for rb_last() calls.  Users must explicitly do this as
rb_root_cached only deals with the smallest node.

[dave@stgolabs.net: brain fart #1]
Link: http://lkml.kernel.org/r/20170731155955.GD21328@linux-80c1.suse
Link: http://lkml.kernel.org/r/20170719014603.19029-18-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomem/memcg: cache rightmost node
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:21 +0000 (16:15 -0700)]
mem/memcg: cache rightmost node

Such that we can optimize __mem_cgroup_largest_soft_limit_node().  The
only overhead is the extra footprint for the cached pointer, but this
should not be an issue for mem_cgroup_tree_per_node.

[dave@stgolabs.net: brain fart #2]
Link: http://lkml.kernel.org/r/20170731160114.GE21328@linux-80c1.suse
Link: http://lkml.kernel.org/r/20170719014603.19029-17-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agofs/epoll: use faster rb_first_cached()
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:18 +0000 (16:15 -0700)]
fs/epoll: use faster rb_first_cached()

...  such that we can avoid the tree walks to get the node with the
smallest key.  Semantically the same, as the previously used rb_first(),
but O(1).  The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for epoll.

Link: http://lkml.kernel.org/r/20170719014603.19029-15-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoprocfs: use faster rb_first_cached()
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:15 +0000 (16:15 -0700)]
procfs: use faster rb_first_cached()

...  such that we can avoid the tree walks to get the node with the
smallest key.  Semantically the same, as the previously used rb_first(),
but O(1).  The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for procfs.

Link: http://lkml.kernel.org/r/20170719014603.19029-14-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/interval-tree: correct comment wrt generic flavor
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:12 +0000 (16:15 -0700)]
lib/interval-tree: correct comment wrt generic flavor

interval_tree.h _is_ the generic flavor.

Link: http://lkml.kernel.org/r/20170719014603.19029-13-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/interval_tree: fast overlap detection
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:08 +0000 (16:15 -0700)]
lib/interval_tree: fast overlap detection

Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoblock/cfq: replace cfq_rb_root leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:05 +0000 (16:15 -0700)]
block/cfq: replace cfq_rb_root leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/20170719014603.19029-11-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolocking/rtmutex: replace top-waiter and pi_waiters leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:01 +0000 (16:15 -0700)]
locking/rtmutex: replace top-waiter and pi_waiters leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/20170719014603.19029-10-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosched/deadline: replace earliest dl and rq leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:58 +0000 (16:14 -0700)]
sched/deadline: replace earliest dl and rq leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/20170719014603.19029-9-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosched/fair: replace cfs_rq->rb_leftmost
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:55 +0000 (16:14 -0700)]
sched/fair: replace cfs_rq->rb_leftmost

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/20170719014603.19029-8-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/rbtree_test.c: support rb_root_cached
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:52 +0000 (16:14 -0700)]
lib/rbtree_test.c: support rb_root_cached

We can work with a single rb_root_cached root to test both cached and
non-cached rbtrees.  In addition, also add a test to measure latencies
between rb_first and its fast counterpart.

Link: http://lkml.kernel.org/r/20170719014603.19029-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/rbtree_test.c: add (inorder) traversal test
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:49 +0000 (16:14 -0700)]
lib/rbtree_test.c: add (inorder) traversal test

This adds a second test for regular rb-tree testing in that there is no
need to repeat it for the augmented flavor.

Link: http://lkml.kernel.org/r/20170719014603.19029-6-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agolib/rbtree_test.c: make input module parameters
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:46 +0000 (16:14 -0700)]
lib/rbtree_test.c: make input module parameters

Allows for more flexible debugging.

Link: http://lkml.kernel.org/r/20170719014603.19029-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agorbtree: add some additional comments for rebalancing cases
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:42 +0000 (16:14 -0700)]
rbtree: add some additional comments for rebalancing cases

While overall the code is very nicely commented, it might not be
immediately obvious from the diagrams what is going on.  Add a very
brief summary of each case.  Opposite cases where the node is the left
child are left untouched.

Link: http://lkml.kernel.org/r/20170719014603.19029-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agorbtree: optimize root-check during rebalancing loop
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:39 +0000 (16:14 -0700)]
rbtree: optimize root-check during rebalancing loop

The only times the nil-parent (root node) condition is true is when the
node is the first in the tree, or after fixing rbtree rule #4 and the
case 1 rebalancing made the node the root.  Such conditions do not apply
most of the time:

(i) The common case in an rbtree is to have more than a single node,
    so this is only true for the first rb_insert().

(ii) While there is a chance only one first rotation is needed, cases
    where the node's uncle is black (cases 2,3) are more common as we can
    have the following scenarios during the rotation looping:

    case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.

This patch, therefore, adds an unlikely() optimization to this
conditional.  When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a
kernel build shows that the incorrect rate is less than 15%, and for
workloads that involve insert mostly trees overtime tend to have less
than 2% incorrect rate.

Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agorbtree: cache leftmost node internally
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:36 +0000 (16:14 -0700)]
rbtree: cache leftmost node internally

Patch series "rbtree: Cache leftmost node internally", v4.

A series to extending rbtrees to internally cache the leftmost node such
that we can have fast overlap check optimization for all interval tree
users[1].  The benefits of this series are that:

(i)   Unify users that do internal leftmost node caching.
(ii)  Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.

This patch (of 16):

Red-black tree semantics imply that nodes with smaller or greater (or
equal for duplicates) keys always be to the left and right,
respectively.  For the kernel this is extremely evident when considering
our rb_first() semantics.  Enabling lookups for the smallest node in the
tree in O(1) can save a good chunk of cycles in not having to walk down
the tree each time.  To this end there are a few core users that
explicitly do this, such as the scheduler and rtmutexes.  There is also
the desire for interval trees to have this optimization allowing faster
overlap checking.

This patch introduces a new 'struct rb_root_cached' which is just the
root with a cached pointer to the leftmost node.  The reason why the
regular rb_root was not extended instead of adding a new structure was
that this allows the user to have the choice between memory footprint
and actual tree performance.  The new wrappers on top of the regular
rb_root calls are:

 - rb_first_cached(cached_root) -- which is a fast replacement
     for rb_first.

 - rb_insert_color_cached(node, cached_root, new)

 - rb_erase_cached(node, cached_root)

In addition, augmented cached interfaces are also added for basic
insertion and deletion operations; which becomes important for the
interval tree changes.

With the exception of the inserts, which adds a bool for updating the
new leftmost, the interfaces are kept the same.  To this end, porting rb
users to the cached version becomes really trivial, and keeping current
rbtree semantics for users that don't care about the optimization
requires zero overhead.

Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobitops: avoid integer overflow in GENMASK(_ULL)
Matthias Kaehlcke [Fri, 8 Sep 2017 23:14:33 +0000 (16:14 -0700)]
bitops: avoid integer overflow in GENMASK(_ULL)

GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically
results in an integer overflow.  clang raises a warning if the overflow
occurs in a preprocessor expression.  Clear the low-order bits through a
substraction instead of the left-shift to avoid the overflow.

(akpm: no change in .text size in my testing)

Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoinclude: warn for inconsistent endian config definition
Babu Moger [Fri, 8 Sep 2017 23:14:29 +0000 (16:14 -0700)]
include: warn for inconsistent endian config definition

We have seen some generic code use config parameter CONFIG_CPU_BIG_ENDIAN
to decide the endianness.

Here are the few examples.
include/asm-generic/qrwlock.h
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c

Display warning if CPU_BIG_ENDIAN is not defined on big endian
architecture and also warn if it defined on little endian architectures.

Here is our original discussion
https://lkml.org/lkml/2017/5/24/620

Link: http://lkml.kernel.org/r/1499358861-179979-4-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch/microblaze: add choice for endianness and update Makefile
Babu Moger [Fri, 8 Sep 2017 23:14:25 +0000 (16:14 -0700)]
arch/microblaze: add choice for endianness and update Makefile

microblaze architectures can be configured for either little or big endian
formats.  Add a choice option for the user to select the correct endian
format(default to big endian).

Also update the Makefile so toolchain can compile for the format it is
configured for.

Link: http://lkml.kernel.org/r/1499358861-179979-3-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Simek <monstr@monstr.eu>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarch: define CPU_BIG_ENDIAN for all fixed big endian archs
Babu Moger [Fri, 8 Sep 2017 23:14:22 +0000 (16:14 -0700)]
arch: define CPU_BIG_ENDIAN for all fixed big endian archs

Patch series "Define CPU_BIG_ENDIAN or warn for inconsistencies", v3.

While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN to
clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
 {
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
 }

Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.

Define CPU_BIG_ENDIAN for all the fixed big endian architecture to fix it.

Also found few more references of this config parameter in
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c
Be aware that this may cause regressions if someone has worked-around
problems in the above code already. Remove the work-around.

Here is our original discussion
https://lkml.org/lkml/2017/5/24/620

Link: http://lkml.kernel.org/r/1499358861-179979-2-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotreewide: make "nr_cpu_ids" unsigned
Alexey Dobriyan [Fri, 8 Sep 2017 23:14:18 +0000 (16:14 -0700)]
treewide: make "nr_cpu_ids" unsigned

First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function                                     old     new   delta
coretemp_cpu_online                          450     512     +62
rcu_init_one                                1234    1272     +38
pci_device_probe                             374     399     +25

...

pgdat_reclaimable_pages                      628     556     -72
select_fallback_rq                           446     369     -77
task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>