review.tizen.org Git - platform/kernel/linux-rpi.git/log

projects / platform / kernel / linux-rpi.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

NeilBrown [Wed, 15 Jul 2015 07:36:21 +0000 (17:36 +1000)]

md/raid5: strengthen check on reshape_position at run.

When reshaping, we work in units of the largest chunk size.
If changing from a larger to a smaller chunk size, that means we
reshape more than one stripe at a time. So the required alignment
of reshape_position needs to take into account both the old
and new chunk size.

This means that both 'here_new' and 'here_old' are calculated with
respect to the same (maximum) chunk size, so testing if they are the
same when delta_disks is zero becomes pointless.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Wed, 15 Jul 2015 07:24:17 +0000 (17:24 +1000)]

md/raid5: switch to use conf->chunk_sectors in place of mddev->chunk_sectors where possible

The chunk_sectors and new_chunk_sectors fields of mddev can be changed
any time (via sysfs) that the reconfig mutex can be taken. So raid5
keeps internal copies in 'conf' which are stable except for a short
locked moment when reshape stops/starts.

So any access that does not hold reconfig_mutex should use the 'conf'
values, not the 'mddev' values.
Several don't.

This could result in corruption if new values were written at awkward
times.

Also use min() or max() rather than open-coding.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Fri, 17 Jul 2015 02:17:50 +0000 (12:17 +1000)]

md/raid5: always set conf->prev_chunk_sectors and ->prev_algo

These aren't really needed when no reshape is happening,
but it is safer to have them always set to a meaningful value.
The next patch will use ->prev_chunk_sectors without checking
if a reshape is happening (because that makes the code simpler),
and this patch makes that safe.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 6 Jul 2015 06:33:47 +0000 (16:33 +1000)]

md/raid10: fix a few typos in comments

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 6 Jul 2015 02:28:45 +0000 (12:28 +1000)]

md/raid5: consider updating reshape_position at start of reshape.

md/raid5 only updates ->reshape_position (which is stored in
metadata and is authoritative) occasionally, but particularly
when getting closed to ->resync_max as it must be correct
when ->resync_max is reached.

When mdadm tries to stop an array which is reshaping it will:
- freeze the reshape,
- set resync_max to where the reshape has reached.
- unfreeze the reshape.
When this happens, the reshape is aborted and then restarted.

The restart doesn't check that resync_max is close, and so doesn't
update ->reshape_position like it should.
This results in the reshape stopping, but ->reshape_position being
incorrect.

So on that first call to reshape_request, make sure ->reshape_position
is updated if needed.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 6 Jul 2015 02:26:57 +0000 (12:26 +1000)]

md: close some races between setting and checking sync_action.

When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc doesn't always start immediately (a separate
thread is scheduled to do it), it is best if 'action_show'
checks if MD_RECOVER_NEEDED is set (which it does) and in that
case reports what is likely to start soon (which it only sometimes
does).

So:
- report 'reshape' if reshape_position suggests one might start.
- set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very
likely to happen next.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Thu, 2 Jul 2015 07:12:58 +0000 (17:12 +1000)]

md: Keep /proc/mdstat reporting recovery until fully DONE.

Currently when a recovery completes, mdstat shows that it has finished
before the new device is marked as a full member. Because of this it
can appear to a script that the recovery finished but the array isn't
in sync.

So while MD_RECOVERY_DONE is still set, keep mdstat reporting "recovery".
Once md_reap_sync_thread() completes, the spare will be active and then
MD_RECOVERY_DONE will be cleared.

To ensure this is race-free, set MD_RECOVERY_DONE before clearning
curr_resync.

Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

Ard Biesheuvel [Wed, 1 Jul 2015 02:19:56 +0000 (12:19 +1000)]

md/raid6: delta syndrome for ARM NEON

This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.

Relative performance on a Cortex-A57 based system:

  raid6: int64x1  gen()   905 MB/s
  raid6: int64x1  xor()   881 MB/s
  raid6: int64x2  gen()  1343 MB/s
  raid6: int64x2  xor()  1286 MB/s
  raid6: int64x4  gen()  1896 MB/s
  raid6: int64x4  xor()  1321 MB/s
  raid6: int64x8  gen()  1773 MB/s
  raid6: int64x8  xor()  1165 MB/s
  raid6: neonx1   gen()  1834 MB/s
  raid6: neonx1   xor()  1278 MB/s
  raid6: neonx2   gen()  2528 MB/s
  raid6: neonx2   xor()  1942 MB/s
  raid6: neonx4   gen()  2888 MB/s
  raid6: neonx4   xor()  2334 MB/s
  raid6: neonx8   gen()  2957 MB/s
  raid6: neonx8   xor()  2232 MB/s
  raid6: using algorithm neonx8 gen() 2957 MB/s
  raid6: .... xor() 2232 MB/s, rmw enabled

Cc: Markus Stockhausen <stockhausen@collogia.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 3 Aug 2015 03:11:47 +0000 (13:11 +1000)]

md/raid0: update queue parameter in a safer location.

When a (e.g.) RAID5 array is reshaped to RAID0, the updating
of queue parameters (e.g. max number of sectors per bio) is
done in the wrong place.
It should be part of ->run, but it is actually part of ->takeover.
This means it happens before level_store() calls:

blk_set_stacking_limits(&mddev->queue->limits);

and so it ineffective. This can lead to errors from underlying
devices.

So move all the relevant settings out of create_stripe_zones()
and into raid0_run().

As this can lead to a bug-on it is suitable for any -stable
kernel which supports reshape to RAID0. So 2.6.35 or later.
As the bug has been present for five years there is no urgency,
so no need to rush into -stable.

Fixes: 9af204cf720c ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
Cc: stable@vger.kernel.org (v2.6.35+ - please delay until after -final release).
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

Benjamin Randazzo [Sat, 25 Jul 2015 14:36:50 +0000 (16:36 +0200)]

md: simplify get_bitmap_file now that "file" is zeroed.

There is no point assigning '\0' to file->pathname[0] as
file is now zeroed out, so remove that branch and
simplify the code.

[Original patch combined this with the change to use
kzalloc. I split the two so that the change to kzalloc
is easier to backport. - neilb]

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 3 Aug 2015 07:09:57 +0000 (17:09 +1000)]

md/raid5: don't let shrink_slab shrink too far.

I have a report of drop_one_stripe() called from
raid5_cache_scan() apparently finding ->max_nr_stripes == 0.

This should not be allowed.

So add a test to keep max_nr_stripes above min_nr_stripes.

Also use a 'mask' rather than a 'mod' in drop_one_stripe
to ensure 'hash' is valid even if max_nr_stripes does reach zero.

Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b665)
Reported-by: Tomas Papan <tomas.papan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

Benjamin Randazzo [Sat, 25 Jul 2015 14:36:50 +0000 (16:36 +0200)]

md: use kzalloc() when bitmap is disabled

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

5769         file = kmalloc(sizeof(*file), GFP_NOIO);
5770         if (!file)
5771                 return -ENOMEM;

This structure is copied to user space at the end of the function.

5786         if (err == 0 &&
5787             copy_to_user(arg, file, sizeof(*file)))
5788                 err = -EFAULT

But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

5775         /* bitmap disabled, zero the first byte and copy out */
5776         if (!mddev->bitmap_info.file)
5777                 file->pathname[0] = '\0';

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

NeilBrown [Mon, 27 Jul 2015 01:48:52 +0000 (11:48 +1000)]

md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
  Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
  last working device'.")
as it addresses the same issue.  It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

commit | commitdiff | tree

Linus Torvalds [Mon, 3 Aug 2015 01:34:55 +0000 (18:34 -0700)]

Linux 4.2-rc5

commit | commitdiff | tree

Linus Torvalds [Mon, 3 Aug 2015 01:07:36 +0000 (18:07 -0700)]

Merge tag 'powerpc-4.2-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
- TCE table memory calculation fix from Alexey
- Build fix for ans-lcd from Luis
- Unbalanced IRQ warning fix from Alistair

* tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/eeh-powernv: Fix unbalanced IRQ warning
  macintosh/ans-lcd: fix build failure after module_init/exit relocation
  powerpc/powernv/ioda2: Fix calculation for memory allocated for TCE table

commit | commitdiff | tree

Linus Torvalds [Thu, 30 Jul 2015 05:18:16 +0000 (22:18 -0700)]

i915: temporary fix for DP MST docking station NULL pointer dereference

Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if
attached to the docking station. This is a regression that he was able
to bisect to commit 8c7b5ccb7298: "drm/i915: Use atomic helpers for
computing changed flags:"

The reason seems to be the new call to drm_atomic_helper_check_modeset()
added to intel_modeset_compute_config(), which in turn calls
update_connector_routing(), and somehow ends up picking a NULL crtc for
the connector state, causing the subsequent drm_crtc_index() to OOPS.

Daniel Vetter says that the fundamental issue seems to be confusion in
the encoder selection, and this isn't the right fix, but while he chases
down the proper fix, this at least avoids the NULL pointer dereference
and makes Ted's docking station work again.

Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Aug 2015 16:36:21 +0000 (09:36 -0700)]

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"A set of three fixes for the ipr driver and one fairly major one for
  memory leaks in the mq path of SCSI"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: fix memory leak with scsi-mq
  ipr: Fix invalid array indexing for HRRQ
  ipr: Fix incorrect trace indexing
  ipr: Fix locking for unit attention handling

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Aug 2015 16:12:46 +0000 (09:12 -0700)]

Merge tag 'armsoc-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Things are calming down nicely here w.r.t. fixes.  This batch
  includes two week's worth since I missed to send before -rc4.

  Nothing particularly scary to point out, smaller fixes here and there.
  Shortlog describes it pretty well"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: keystone: fix dt bindings to use post div register for mainpll
  ARM: nomadik: disable UART0 on Nomadik boards
  ARM: dts: i.MX35: Fix can support.
  ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc
  ARM: dts: add CPU OPP and regulator supply property for exynos4210
  ARM: dts: Update video-phy node with syscon phandle for exynos3250
  ARM: DRA7: hwmod: fix gpmc hwmod

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Aug 2015 00:42:14 +0000 (17:42 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull VFS fix from Al Viro:
"Spurious ENOTDIR fix"

This should fix the problems reported by Dominique Martinet and Hugh
Dickins.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
link_path_walk(): be careful when failing with ENOTDIR

commit | commitdiff | tree

Al Viro [Sat, 1 Aug 2015 23:59:28 +0000 (19:59 -0400)]

link_path_walk(): be careful when failing with ENOTDIR

In RCU mode we might end up with dentry evicted just we check
that it's a directory.  In such case we should return ECHILD
rather than ENOTDIR, so that pathwalk would be retries in non-RCU
mode.

Breakage had been introduced in commit b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid.  That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory.  The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.

Note that all branches since 3.12 have that problem...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 19:47:04 +0000 (12:47 -0700)]

Merge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"We had a regression due to reuse of descriptor so we have reverted
  that.

  The rest are driver fixes:

   - at_hdmac and at_xdmac for residue, trannfer width, and channel config
   - pl330 final fix for dma fails and overflow issue
   - xgene resouce map fix
   - mv_xor big endian op fix"

* tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
  dmaengine: mv_xor: fix big endian operation in register mode
  dmaengine: xgene-dma: Fix the resource map to handle overlapping
  dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
  dmaengine: at_hdmac: fix residue computation
  dmaengine: at_xdmac: fix bug about channel configuration
  dmaengine: pl330: Really fix choppy sound because of wrong residue calculation
  dmaengine: pl330: Fix overflow when reporting residue in memcpy

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 16:47:11 +0000 (09:47 -0700)]

Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixlets from Thomas Gleixner:
"Just two updates to the maintainers file"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers
MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 16:16:33 +0000 (09:16 -0700)]

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Fallout from the recent NMI fixes: make x86 LDT handling more robust.

  Also some EFI fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Make modify_ldt synchronous
  x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
  x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
  efi: Check for NULL efi kernel parameters
  x86/efi: Use all 64 bit of efi_memmap in setup_e820()

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 00:10:56 +0000 (17:10 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Must teardown SR-IOV before unregistering netdev in igb driver, from
    Alex Williamson.

2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.

3) Default route selection in ipv4 should take the prefix length, table
    ID, and TOS into account, from Julian Anastasov.

4) sch_plug must have a reset method in order to purge all buffered
    packets when the qdisc is reset, likewise for sch_choke, from WANG
    Cong.

5) Fix deadlock and races in slave_changelink/br_setport in bridging.
    From Nikolay Aleksandrov.

6) mlx4 bug fixes (wrong index in port even propagation to VFs,
    overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
    Morgenstein, and Or Gerlitz.

7) Turn off klog message about SCTP userspace interface compat that
    makes no sense at all, from Daniel Borkmann.

8) Fix unbounded restarts of inet frag eviction process, causing NMI
    watchdog soft lockup messages, from Florian Westphal.

9) Suspend/resume fixes for r8152 from Hayes Wang.

10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
    Sabrina Dubroca.

11) Fix performance regression when removing a lot of routes from the
    ipv4 routing tables, from Alexander Duyck.

12) Fix device leak in AF_PACKET, from Lars Westerhoff.

13) AF_PACKET also has a header length comparison bug due to signedness,
    from Alexander Drozdov.

14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.

15) Memory leaks, TSO stats, watchdog timeout and other fixes to
    thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.

16) act_bpf can leak memory when replacing programs, from Daniel
    Borkmann.

17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  stmmac: fix missing MODULE_LICENSE in stmmac_platform
  gianfar: Enable device wakeup when appropriate
  gianfar: Fix suspend/resume for wol magic packet
  gianfar: Fix warning when CONFIG_PM off
  act_pedit: check binding before calling tcf_hash_release()
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: sched: fix refcount imbalance in actions
  r8152: reset device when tx timeout
  r8152: add pre_reset and post_reset
  qlcnic: Fix corruption while copying
  act_bpf: fix memory leaks when replacing bpf programs
  net: thunderx: Fix for crash while BGX teardown
  net: thunderx: Add PCI driver shutdown routine
  net: thunderx: Fix crash when changing rss with mutliple traffic flows
  net: thunderx: Set watchdog timeout value
  net: thunderx: Wakeup TXQ only if CQE_TX are processed
  net: thunderx: Suppress alloc_pages() failure warnings
  net: thunderx: Fix TSO packet statistic
  net: thunderx: Fix memory leak when changing queue count
  net: thunderx: Fix RQ_DROP miscalculation
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 00:05:37 +0000 (17:05 -0700)]

Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"Filipe fixed up a hard to trigger ENOSPC regression from our merge
  window pull, and we have a few other smaller fixes"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix quick exhaustion of the system array in the superblock
  btrfs: its btrfs_err() instead of btrfs_error()
  btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
  btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Aug 2015 00:00:25 +0000 (17:00 -0700)]

Merge tag 'sound-4.2-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This became a relative big update as it includes the collected ASoC
  fixes.  There are a few fixes in ASoC core side, mostly for DAPM and
  the new topology API.  The rest are various ASoC driver-specific
  fixes, as well as the usual HD-audio and USB-audio quirks"

* tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
  ALSA: hda - Fix MacBook Pro 5,2 quirk
  ALSA: hda - Fix race between PM ops and HDA init/probe
  ALSA: usb-audio: add dB range mapping for some devices
  ALSA: hda - Apply a fixup to Dell Vostro 5480
  ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
  ALSA: hda - Apply fixup for another Toshiba Satellite S50D
  ALSA: fireworks: add support for AudioFire2 quirk
  ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
  ALSA: hda - fix cs4210_spdif_automute()
  ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
  ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
  ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
  ASoC: dapm: Don't add prefix to widget stream name
  ASoC: rt5645: Check if codec is initialized in workqueue handler
  ASoC: Intel: Get correct usage_count value to load firmware
  ASoC: topology: Fix to add dapm mixer info
  ASoC: zx: spdif: Fix devm_ioremap_resource return value check
  ASoC: zx: i2s: Fix devm_ioremap_resource return value check
  ASoC: mediatek: Use platform_of_node for machine drivers
  ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
  ...

commit | commitdiff | tree

Joachim Eastwood [Fri, 31 Jul 2015 17:13:22 +0000 (19:13 +0200)]

stmmac: fix missing MODULE_LICENSE in stmmac_platform

Commit 50649ab14982 ("stmmac: drop driver from stmmac platform code")
was a bit overzealous in removing code and dropped the MODULE_*
macro's that are still needed since stmmac_platform can be a module.
Fix this by putting the macro's remvoed in 50649ab14982 back.

This fixes the following errors when used as a module:
  stmmac_platform: module license 'unspecified' taints kernel.
  Disabling lock debugging due to kernel taint
  stmmac_platform: Unknown symbol devm_kmalloc (err 0)
  stmmac_platform: Unknown symbol stmmac_suspend (err 0)
  stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
  stmmac_platform: Unknown symbol platform_get_resource (err 0)
  stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
  stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
  stmmac_platform: Unknown symbol of_alias_get_id (err 0)
  stmmac_platform: Unknown symbol stmmac_resume (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)

Fixes: 50649ab14982 ("stmmac: drop driver from stmmac platform code")
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Fri, 31 Jul 2015 22:41:50 +0000 (15:41 -0700)]

Merge branch 'gianfar-wol-fixes'

Claudiu Manoil says:

====================
gianfar: wol magic packet fixes

These changes were already validated as part of FSL SDK.
Patch 2 fixes occasional wake-on magic packet failures during
traffic, probably due to incorrect traffic stop/ device halt
sequence and incorrect usage of txlock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Claudiu Manoil [Fri, 31 Jul 2015 15:38:33 +0000 (18:38 +0300)]

gianfar: Enable device wakeup when appropriate

The wol_en flag is 0 by default anyway, and we have the
following inconsistency: a MAGIC packet wol capable eth
interface is registered as a wake-up source but unable
to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
Calling set_wakeup_enable() at netdev open is just redundant
because wol_en is 0 by default.
Let only ethtool call set_wakeup_enable() for now.

The bflock is obviously obsoleted, its utility has been corroded
over time. The bitfield flags used today in gianfar are accessed
only on the init/ config path, with no real possibility of
concurrency - nothing that would justify smth. like bflock.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Claudiu Manoil [Fri, 31 Jul 2015 15:38:32 +0000 (18:38 +0300)]

gianfar: Fix suspend/resume for wol magic packet

If we disable NAPI in the first place we can mask the device's
interrupts (and halt it) without fearing that imask may be
concurrently accessed from interrupt context, so there's
no need to do local_irq_save() around gfar_halt_nodisable().
lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
buggy routines.  The txlock is currently used in the driver only
to manage TX congestion, it has nothing to do with halting the
device.  With these changes, the TX processing is stopped before
gfar_halt().

Compact gfar_halt() is used instead of gfar_halt_nodisable(),
as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
gfar_start() re-enables all these blocks on resume.  Enabling
the magic-packet mode remains the same, note that the RX block
is re-enabled just before entering sleep mode.

Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
that the interrupt line must remain active during sleep in order
to wake the system by magic packet (MAG) reception interrupt.
(On some systems the MAG interrupt did trigger w/o this flag
as well, but on others it didn't.)

Without these fixes, when suspended during fair Tx traffic the
interface occasionally failed to be woken up by magic packet.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Claudiu Manoil [Fri, 31 Jul 2015 15:38:31 +0000 (18:38 +0300)]

gianfar: Fix warning when CONFIG_PM off

CC      drivers/net/ethernet/freescale/gianfar.o
drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
defined but not used [-Wunused-function]
static void lock_tx_qs(struct gfar_private *priv)
             ^
drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
defined but not used [-Wunused-function]
static void unlock_tx_qs(struct gfar_private *priv)
             ^

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

WANG Cong [Fri, 31 Jul 2015 00:12:21 +0000 (17:12 -0700)]

act_pedit: check binding before calling tcf_hash_release()

When we share an action within a filter, the bind refcnt
should increase, therefore we should not call tcf_hash_release().

Fixes: 1a29321ed045 ("net_sched: act: Dont increment refcnt on replace")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Murali Karicheri [Fri, 29 May 2015 16:04:13 +0000 (12:04 -0400)]

ARM: dts: keystone: fix dt bindings to use post div register for mainpll

All of the keystone devices have a separate register to hold post
divider value for main pll clock. Currently the fixed-postdiv
value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to
use a value of 2 for this. Now that we have fixed this in the pll
clock driver change the dt bindings for the same.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 31 Jul 2015 19:34:10 +0000 (12:34 -0700)]

Merge tag 'iommu-fixes-v4.2-rc4' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"These fixes are all for the AMD IOMMU driver:

   - A regression with HSA caused by the conversion of the driver to
     default domains.  The fixes make sure that an HSA device can still
     be attached to an IOMMUv2 domain and that these domains also allow
     non-IOMMUv2 capable devices.

   - Fix iommu=pt mode which did not work because the dma_ops where set
     to nommu_ops, which breaks devices that can only do 32bit DMA.

   - Fix an issue with non-PCI devices not working, because there are no
     dma_ops for them.  This issue was discovered recently as new AMD
     x86 platforms have non-PCI devices too"

* tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Allow non-ATS devices in IOMMUv2 domains
  iommu/amd: Set global dma_ops if swiotlb is disabled
  iommu/amd: Use swiotlb in passthrough mode
  iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
  iommu/amd: Use iommu core for passthrough mode
  iommu/amd: Use iommu_attach_group()

commit | commitdiff | tree

Linus Torvalds [Fri, 31 Jul 2015 19:11:01 +0000 (12:11 -0700)]

Merge tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel

Pull drm intel fixes from Daniel Vetter:
"I delayed my -fixes pull a bit hoping that I could include a fix for
  the dp mst stuff but looks a bit more nasty than that.  So just 3
  other regression fixes, one 4.2 other two cc: stable"

* tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Declare the swizzling unknown for L-shaped configurations
  drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
  drm/i915: Replace WARN inside I915_READ64_2x32 with retry loop

commit | commitdiff | tree

Linus Torvalds [Fri, 31 Jul 2015 19:05:02 +0000 (12:05 -0700)]

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This has a bunch of nouveau fixes, as Ben has been hibernating and has
  lots of small fixes for lots of bugs across nouveau.

  Radeon has one major fix for hdmi/dp audio regression that is larger
  than Alex would like, but seems to fix up a fair few bugs, along with
  some misc fixes.

  And a few msm fixes, one of which is also a bit large.

  But nothing in here seems insane or crazy for this stage, just more
  than I'd like"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
  drm/msm/mdp5: release SMB (shared memory blocks) in various cases
  drm/msm: change to uninterruptible wait in atomic commit
  drm/msm: mdp4: Fix drm_framebuffer dereference crash
  drm/msm: fix msm_gem_prime_get_sg_table()
  drm/amdgpu: add new parameter to seperate map and unmap
  drm/amdgpu: hdp_flush is not needed for inside IB
  drm/amdgpu: different emit_ib for gfx and compute
  drm/amdgpu: information leak in amdgpu_info_ioctl()
  drm/amdgpu: clean up init sequence for failures
  drm/radeon/combios: add some validation of lvds values
  drm/radeon: rework audio modeset to handle non-audio hdmi features
  drm/radeon: rework audio detect (v4)
  drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
  drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
  drm/nouveau/nouveau/ttm: fix tiled system memory with Maxwell
  drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
  drm/nouveau/fbcon/g80: reduce PUSH_SPACE alloc, fire ring on accel init
  drm/nouveau/fbcon/gf100-: reduce RING_SPACE allocation
  drm/nouveau/fbcon/nv11-: correctly account for ring space usage
  drm/nouveau/bios: add proper support for opcode 0x59
  ...

commit | commitdiff | tree

Jun Nie [Fri, 10 Jul 2015 12:02:49 +0000 (20:02 +0800)]

Revert "dmaengine: virt-dma: don't always free descriptor upon completion"

This reverts commit b9855f03d560d351e95301b9de0bc3cad3b31fe9.
The patch break existing DMA usage case. For example, audio SOC
dmaengine never release channel and cause virt-dma to cache too
much memory in descriptor to exhaust system memory.

Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Thomas Petazzoni [Wed, 8 Jul 2015 14:28:14 +0000 (16:28 +0200)]

dmaengine: mv_xor: fix big endian operation in register mode

Commit 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command
in descriptor mode") introduced the support for a feature that
appeared in Armada 38x: specifying the operation to be performed in a
per-descriptor basis rather than globally per channel.

However, when doing so, it changed the function mv_chan_set_mode() to
use:

  if (IS_ENABLED(__BIG_ENDIAN))

instead of:

  #if defined(__BIG_ENDIAN)

While IS_ENABLED() is perfectly fine for CONFIG_* symbols, it is not
for other symbols such as __BIG_ENDIAN that is provided directly by
the compiler. Consequently, the commit broke support for big-endian,
as the XOR_DESCRIPTOR_SWAP flag was not set in the XOR channel
configuration register.

The primarily visible effect was some nasty warnings and failures
appearing during the self-test of the XOR unit:

[    1.197368] mv_xor d0060900.xor: error on chan 0. intr cause 0x00000082
[    1.197393] mv_xor d0060900.xor: config       0x00008440
[    1.197410] mv_xor d0060900.xor: activation   0x00000000
[    1.197427] mv_xor d0060900.xor: intr cause   0x00000082
[    1.197443] mv_xor d0060900.xor: intr mask    0x000003f7
[    1.197460] mv_xor d0060900.xor: error cause  0x00000000
[    1.197477] mv_xor d0060900.xor: error addr   0x00000000
[    1.197491] ------------[ cut here ]------------
[    1.197513] WARNING: CPU: 0 PID: 1 at ../drivers/dma/mv_xor.c:664 mv_xor_interrupt_handler+0x14c/0x170()

See also:

  http://storage.kernelci.org/next/next-20150617/arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y/lab-khilman/boot-armada-xp-openblocks-ax3-4.txt

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command in descriptor mode")
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Rameshwar Prasad Sahu [Tue, 7 Jul 2015 10:04:25 +0000 (15:34 +0530)]

dmaengine: xgene-dma: Fix the resource map to handle overlapping

There is an overlap in dma ring cmd csr region due to sharing of ethernet
ring cmd csr region. This patch fix the resource overlapping by mapping
the entire dma ring cmd csr region.

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Cyrille Pitchen [Tue, 30 Jun 2015 12:36:57 +0000 (14:36 +0200)]

dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()

This patch adds the missing update of the transfer data width in
at_xdmac_prep_slave_sg().

Indeed, for each item in the scatter-gather list, we check whether the
transfer length is aligned with the data width provided by
dmaengine_slave_config(). If so, we directly use this data width for the
current part of the transfer we are preparing. Otherwise, the data width
is reduced to 8 bits (1 byte). Of course, the actual number of register
accesses must also be updated to match the new data width.

So one chunk was missing in the original patch (see Fixes tag below): the
number of register accesses was correctly set to (len >> fixed_dwidth) in
mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg
may change for each part of the scatter-gather transfer this also explains
why the original patch used the Descriptor View 2 instead of the
Descriptor View 1.

Let's take the example of a DMA transfer to write 8bit data into an Atmel
USART with FIFOs. When FIFOs are enabled in the USART, its Transmit
Holding Register (THR) works in multidata mode, that is to say that up to
4 8bit data can be written into the THR in a single 32bit access and it is
still possible to write only one data with a 8bit access. To take
advantage of this new feature, the DMA driver was modified to allow
multiple dwidths when doing slave transfers.
For instance, when the total length is 22 bytes, the USART driver splits
the transfer into 2 parts:

First part: 20 bytes transferred through 5 32bit writes into THR
Second part: 2 bytes transferred though 2 8bit writes into THR

For the second part, the data width was first set to 4_BYTES by the USART
driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg()
reduces this data width to 1_BYTE because the 2 byte length is not aligned
with the original 4_BYTES data width. Since the data width is modified,
the actual number of writes into THR must be set accordingly.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes: 6d3a7d9e3ada ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers")
Cc: stable@vger.kernel.org #4.0 and later
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Cyrille Pitchen [Thu, 18 Jun 2015 11:25:41 +0000 (13:25 +0200)]

dmaengine: at_hdmac: fix residue computation

As claimed by the programmer datasheet and confirmed by the IP designer,
the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
transfers.

Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
register to compute the DMA residue. So the 'tx_width' field is useless
and can be removed from the struct at_desc.

Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
correctly initialized according to the SRC_WIDTH but 'tx_width' was always
set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
bad DMA residue when 'tx_width' != SRC_WIDTH.

Also the 'tx_width' field was mostly set only in the first and last
descriptors. Depending on the kind of DMA transfer, this field remained
uninitialized for intermediate descriptors. The accurate DMA residue was
computed only when the currently processed descriptor was the first or the
last of the chain. This algorithm was a little bit odd. An accurate DMA
residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
in the CTRLAx register.

Finally, the test to check whether the currently processed descriptor is
the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
is NOT equal to zero, since set_desc_eol() is never called, but logically
equal to first_desc->txd.phys. This bug has a side effect on the
drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
to receive data. Since the DMA residue was wrong each time the DMA
transfer reaches the second (and last) period of the transfer, no more
data were received by the USART driver till the cyclic DMA transfer loops
back to the first period.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Torsten Fleischer <torfl6749@gmail.com>
Tested-by: Jirí Prchal <jiri.prchal@aksignal.cz>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Ludovic Desroches [Wed, 17 Jun 2015 14:22:26 +0000 (16:22 +0200)]

dmaengine: at_xdmac: fix bug about channel configuration

When using descriptor view 2 or higher, we don't write the configuration
into AT_XDMAC_CC register because this configuration will be fetch from
the descriptor. Unfortunately, the PROT bit is not updated with this
method, we have to do it manually before enabling the channel.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

commit | commitdiff | tree

Joerg Roedel [Thu, 30 Jul 2015 09:24:45 +0000 (11:24 +0200)]

iommu/amd: Allow non-ATS devices in IOMMUv2 domains

With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.

Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Andy Lutomirski [Thu, 30 Jul 2015 21:31:32 +0000 (14:31 -0700)]

x86/ldt: Make modify_ldt synchronous

modify_ldt() has questionable locking and does not synchronize
threads. Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Andy Lutomirski [Thu, 30 Jul 2015 21:31:31 +0000 (14:31 -0700)]

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Ingo Molnar [Fri, 31 Jul 2015 07:55:26 +0000 (09:55 +0200)]

Merge tag 'efi-urgent' of git://git./linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

* Fix an EFI boot issue preventing a Parallels virtual machine from
   booting because the upper 32-bits of the EFI memmap pointer were
   being discarded in setup_e820(). (Dmitry Skorodumov)

* Validate that the "efi" kernel parameter gets used with an argument,
   otherwise we will oops. (Ricardo Neri)

Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Fri, 31 Jul 2015 03:36:49 +0000 (20:36 -0700)]

Merge tag 'xfs-for-linus-4.2-rc4' of git://git./linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
"There are a couple of recently found, long standing remote attribute
  corruption fixes caused by log recovery getting confused after a
  crash, and the new DAX code in XFS (merged in 4.2-rc1) needs to
  actually use the DAX fault path on read faults.

  Summary:

   - remote attribute log recovery corruption fixes

   - DAX page faults need to use direct mappings, not a page cache
     mapping"

* tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: remote attributes need to be considered data
  xfs: remote attribute headers contain an invalid LSN
  xfs: call dax_fault on read page faults for DAX

commit | commitdiff | tree

Sowmini Varadhan [Thu, 30 Jul 2015 13:50:36 +0000 (15:50 +0200)]

net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket

The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
netns of kernel sockets.")
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Wed, 29 Jul 2015 21:35:25 +0000 (23:35 +0200)]

net: sched: fix refcount imbalance in actions

Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.

Test case 1:

  FOO="1,6 0 0 4294967295,"
  BAR="1,6 0 0 4294967294,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
     action bpf bytecode "$FOO"
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
  tc actions replace action bpf bytecode "$BAR" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
  tc actions replace action bpf bytecode "$FOO" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

Test case 2:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
     index 1 ref 1 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 2 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:

  1) We do a late binding of an action. In that case, we leave the
     tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
     handler. This is correctly handeled.

  2) We replace the given action, or we try to add one without replacing
     and find out that the action at a specific index already exists
     (thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.

Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 30 Jul 2015 21:03:46 +0000 (14:03 -0700)]

Merge branch 'r8152-fixes'

Hayes Wang says:

====================
r8152: device reset

v3:
For patch #2, remove cancel_delayed_work().

v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch #2, replace the original method with usb_queue_reset_device() to reset
the device.

v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

hayeswang [Wed, 29 Jul 2015 12:39:09 +0000 (20:39 +0800)]

r8152: reset device when tx timeout

The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

hayeswang [Wed, 29 Jul 2015 12:39:08 +0000 (20:39 +0800)]

r8152: add pre_reset and post_reset

Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Takashi Iwai [Thu, 30 Jul 2015 20:30:29 +0000 (22:30 +0200)]

ALSA: hda - Fix MacBook Pro 5,2 quirk

MacBook Pro 5,2 with ALC889 codec had already a fixup entry, but this
seems not working correctly, a fix for pin NID 0x15 is needed in
addition. It's equivalent with the fixup for MacBook Air 1,1, so use
this instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102131
Reported-and-tested-by: Jeffery Miller <jefferym@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Thomas Gleixner [Thu, 30 Jul 2015 10:40:55 +0000 (12:40 +0200)]

MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers

Ben was pretty surprised that he is still listed as the maintainer and
he has no objections against transferring the duty to those who
rumaged in and revamped that code in the recent past.

Add kernel/irq/msi.c to the affected files as it's part of the shiny
new hierarchical irqdomain machinery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Grant Likely <grant.likely@linaro.org>

commit | commitdiff | tree

Thomas Gleixner [Thu, 30 Jul 2015 10:38:06 +0000 (12:38 +0200)]

MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Jiang Liu [Thu, 30 Jul 2015 07:51:32 +0000 (15:51 +0800)]

x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()

Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression which causes
malfunction of interrupt lines.

The reason is that the conversion of mp_check_pin_attr() missed to
update the polarity selection of the interrupt pin with the caller
provided setting and instead uses a stale attribute value. That in
turn results in chosing the wrong interrupt flow handler.

Use the caller supplied setting to configure the pin correctly which
also choses the correct interrupt flow handler.

This restores the original behaviour and on the affected
machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ
configuration are identical to v4.1.

Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit | commitdiff | tree

Linus Torvalds [Thu, 30 Jul 2015 18:03:04 +0000 (11:03 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"The main change is support for keyboards and touchpads found in 2015
  editions of Macbooks"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Revert "Input: zforce - don't overwrite the stack"
  Input: bcm5974 - add support for the 2015 Macbook Pro
  HID: apple: Add support for the 2015 Macbook Pro
  Input: bcm5974 - prepare for a new trackpad generation
  Input: synaptics - dump ext10 capabilities as well

commit | commitdiff | tree

Tony Battersby [Thu, 16 Jul 2015 15:40:41 +0000 (11:40 -0400)]

scsi: fix memory leak with scsi-mq

Fix a memory leak with scsi-mq triggered by commands with large data
transfer length.

__sg_alloc_table() sets both table->nents and table->orig_nents to the
same value. When the scatterlist is DMA-mapped, table->nents is
overwritten with the (possibly smaller) size of the DMA-mapped
scatterlist, while table->orig_nents retains the original size of the
allocated scatterlist. scsi_free_sgtable() should therefore check
orig_nents instead of nents, and all code that initializes sdb->table
without calling __sg_alloc_table() should set both nents and orig_nents.

Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>

commit | commitdiff | tree

Brian King [Tue, 14 Jul 2015 16:41:33 +0000 (11:41 -0500)]

ipr: Fix invalid array indexing for HRRQ

Fixes another signed / unsigned array indexing bug in the ipr driver.
Currently, when hrrq_index wraps, it becomes a negative number. We
do the modulo, but still have a negative number, so we end up indexing
backwards in the array. Given where the hrrq array is located in memory,
we probably won't actually reference memory we don't own, but nonetheless
ipr is still looking at data within struct ipr_ioa_cfg and interpreting it as
struct ipr_hrr_queue data, so bad things could certainly happen.

Each ipr adapter has anywhere from 1 to 16 HRRQs. By default, we use 2 on new
adapters. Let's take an example:

Assume ioa_cfg->hrrq_index=0x7fffffffe and ioa_cfg->hrrq_num=4:

The atomic_add_return will then return -1. We mod this with 3 and get -2, add
one and get -1 for an array index.

On adapters which support more than a single HRRQ, we dedicate HRRQ to adapter
initialization and error interrupts so that we can optimize the other queues
for fast path I/O. So all normal I/O uses HRRQ 1-15. So we want to spread the
I/O requests across those HRRQs.

With the default module parameter settings, this bug won't hit, only when
someone sets the ipr.number_of_msix parameter to a value larger than 3 is when
bad things start to happen.

Cc: <stable@vger.kernel.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>

commit | commitdiff | tree

Brian King [Tue, 14 Jul 2015 16:41:31 +0000 (11:41 -0500)]

ipr: Fix incorrect trace indexing

When ipr's internal driver trace was changed to an atomic, a signed/unsigned
bug slipped in which results in us indexing backwards in our memory buffer
writing on memory that does not belong to us. This patch fixes this by removing
the modulo and instead just mask off the low bits.

Cc: <stable@vger.kernel.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>

commit | commitdiff | tree

Brian King [Tue, 14 Jul 2015 16:41:29 +0000 (11:41 -0500)]

ipr: Fix locking for unit attention handling

Make sure we have the host lock held when calling scsi_report_bus_reset. Fixes
a crash seen as the __devices list in the scsi host was changing as we were
iterating through it.

Cc: <stable@vger.kernel.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>

commit | commitdiff | tree

Ricardo Neri [Thu, 16 Jul 2015 02:36:03 +0000 (19:36 -0700)]

efi: Check for NULL efi kernel parameters

Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:

PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000]  ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000]  0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000]  0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000]  [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000]  [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000]  [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000]  [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000]  [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000]  [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000]  [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000]  [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000]  [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000]  [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000]  [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000]  [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc

This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.

Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

commit | commitdiff | tree

Dmitry Skorodumov [Tue, 28 Jul 2015 14:38:32 +0000 (18:38 +0400)]

x86/efi: Use all 64 bit of efi_memmap in setup_e820()

The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.

While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.

It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.

The issue is triggered on Parallles virtual machine and
fixed with this patch.

Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 30 Jul 2015 15:04:19 +0000 (08:04 -0700)]

Merge tag 'hwmon-for-linus-v4.2-rc5' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Two patches headed for -stable.

  nct7802: Fix integer overflow seen when writing voltage limits
  nct7904: Rename pwm attributes to match hwmon ABI"

* tag 'hwmon-for-linus-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct7802) Fix integer overflow seen when writing voltage limits
  hwmon: (nct7904) Rename pwm attributes to match hwmon ABI

commit | commitdiff | tree

Chris Wilson [Sun, 28 Jun 2015 08:19:26 +0000 (09:19 +0100)]

drm/i915: Declare the swizzling unknown for L-shaped configurations

The old style of memory interleaving swizzled upto the end of the
first even bank of memory, and then used the remainder as unswizzled on
the unpaired bank - i.e. swizzling is not constant for all memory. This
causes problems when we try to migrate memory and so the kernel prevents
migration at all when we detect L-shaped inconsistent swizzling.
However, this issue also extends to userspace who try to manually detile
into memory as the swizzling for an individual page is unknown (it
depends on its physical address only known to the kernel), userspace
cannot correctly swizzle.

Note that this is a new attempt for the previously merged one,
reverted in

commit d82c0ba6e306f079407f07003e53c262d683397b
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jul 14 12:29:27 2015 +0200

Revert "drm/i915: Declare the swizzling unknown for L-shaped configurations"

This is cc: stable since we need it to fix up troubles with wc cpu
mmaps that userspace recently started to use widely.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91105
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[danvet: Add note about previous (failed attempt).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

commit | commitdiff | tree

Chris Wilson [Wed, 29 Jul 2015 19:02:48 +0000 (20:02 +0100)]

drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt

If the device does not support the aliasing ppgtt, we must translate
user bind requests (PIN_USER) from LOCAL_BIND to a GLOBAL_BIND. However,
since this is device specific we cannot do this conveniently in the
upper layers and so must manage the vma->bound flags in the backend.

Partial revert of commit 75d04a3773ecee617847de963ae4195d6aa74c28 [4.2-rc1]
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 28 17:56:17 2015 +0300

drm/i915/gtt: Allocate va range only if vma is not bound

Note this was spotted by Daniel originally, but we dropped the ball in
getting the fix in before the bug going wild. Sorry all.

Reported-by: Vincent Legoll vincent.legoll@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91133
References: https://bugs.freedesktop.org/show_bug.cgi?id=90224
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

commit | commitdiff | tree

Alistair Popple [Thu, 30 Jul 2015 06:53:54 +0000 (16:53 +1000)]

powerpc/eeh-powernv: Fix unbalanced IRQ warning

pnv_eeh_next_error() re-enables the eeh opal event interrupt but it
gets called from a loop if there are more outstanding events to
process, resulting in a warning due to enabling an already enabled
interrupt. Instead the interrupt should only be re-enabled once the
last outstanding event has been processed.

Tested-by: Daniel Axtens <dja@axtens.net>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit | commitdiff | tree

Joerg Roedel [Tue, 28 Jul 2015 14:58:51 +0000 (16:58 +0200)]

iommu/amd: Set global dma_ops if swiotlb is disabled

Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Joerg Roedel [Tue, 28 Jul 2015 14:58:50 +0000 (16:58 +0200)]

iommu/amd: Use swiotlb in passthrough mode

In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Joerg Roedel [Tue, 28 Jul 2015 14:58:49 +0000 (16:58 +0200)]

iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains

Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Joerg Roedel [Tue, 28 Jul 2015 14:58:48 +0000 (16:58 +0200)]

iommu/amd: Use iommu core for passthrough mode

Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Joerg Roedel [Tue, 28 Jul 2015 14:58:47 +0000 (16:58 +0200)]

iommu/amd: Use iommu_attach_group()

Since the conversion to default domains the
iommu_attach_device function only works for devices with
their own group. But this isn't always true for current
IOMMUv2 capable devices, so use iommu_attach_group instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Shahed Shaikh [Wed, 29 Jul 2015 11:55:35 +0000 (07:55 -0400)]

qlcnic: Fix corruption while copying

Use proper typecasting while performing byte-by-byte copy

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Wed, 29 Jul 2015 16:40:56 +0000 (18:40 +0200)]

act_bpf: fix memory leaks when replacing bpf programs

We currently trigger multiple memory leaks when replacing bpf
actions, besides others:

  comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
  hex dump (first 32 bytes):
    01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
    18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00  ...m............
  backtrace:
    [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0
    [<ffffffff8120a37a>] __vmalloc+0x4a/0x50
    [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0
    [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0
    [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
    [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0
    [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0
    [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0
    [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
    [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf]
    [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910
    [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240
    [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0
    [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40
    [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0

Issue is that the old content from tcf_bpf is allocated and needs
to be released when we replace it. We seem to do that since the
beginning of act_bpf on the filter and insns, later on the name as
well.

Example test case, after patch:

  # FOO="1,6 0 0 4294967295,"
  # BAR="1,6 0 0 4294967294,"
  # tc actions add action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$BAR" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions del action bpf index 2
  [...]
  # echo "scan" > /sys/kernel/debug/kmemleak
  # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
  0

Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 30 Jul 2015 06:52:32 +0000 (23:52 -0700)]

Merge branch 'thunderx-fixes'

Aleksey Makarov says:

====================
net: thunderx: Misc fixes

Miscellaneous fixes for the ThunderX VNIC driver

All the patches can be applied individually.
It's ok to drop some if the maintainer feels uncomfortable
with applying for 4.2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thanneeru Srinivasulu [Wed, 29 Jul 2015 13:49:46 +0000 (16:49 +0300)]

net: thunderx: Fix for crash while BGX teardown

Cortina phy does not have kernel driver and we don't attach
device with phy layer for intefaces like XFI, XLAUI etc,
Hence check for interface type before calling disconnect.

Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:45 +0000 (16:49 +0300)]

net: thunderx: Add PCI driver shutdown routine

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:44 +0000 (16:49 +0300)]

net: thunderx: Fix crash when changing rss with mutliple traffic flows

This fixes a crash when changing rss with multiple traffic flows.

While interface teardown, disable tx queues after all NAPI threads
are done. If done otherwise tx queues might be woken up inside NAPI
if any CQE_TX are processed.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:43 +0000 (16:49 +0300)]

net: thunderx: Set watchdog timeout value

If a txq (SQ) remains in stopped state after this timeout its
considered as stuck and interface is reinited.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:42 +0000 (16:49 +0300)]

net: thunderx: Wakeup TXQ only if CQE_TX are processed

Previously TXQ is wakedup whenever napi is executed
and irrespective of if any CQE_TX are processed or not.
Added 'txq_stop' and 'txq_wake' counters to aid in debugging
if there are any future issues.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:41 +0000 (16:49 +0300)]

net: thunderx: Suppress alloc_pages() failure warnings

Suppressing standard alloc_pages() warnings. Some kernel configs limit
alloc size and the network driver may fail. Do not drop a kernel
warning in this case, instead just drop a oneliner that the network
driver could not be loaded since the buffer could not be allocated.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:40 +0000 (16:49 +0300)]

net: thunderx: Fix TSO packet statistic

Fixing TSO packages not being counted.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:39 +0000 (16:49 +0300)]

net: thunderx: Fix memory leak when changing queue count

Fix for memory leak when changing queue/channel count via ethtool

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:38 +0000 (16:49 +0300)]

net: thunderx: Fix RQ_DROP miscalculation

With earlier configured value sufficient number of CQEs are not
being reserved for transmitted packets. Hence under heavy incoming
traffic load, receive notifications will take away most of the CQ
thus transmit notifications will be lost resulting in tx skbs not
being freed.

Finally SQ will be full and it will be stopped, watchdog timer
will kick in. After this fix receive notifications will not take
morethan half of CQ reserving the rest for transmit notifications.

Also changed CQ & SQ sizes from 16k to 4k.
This is also due to the receive notifications taking first half of
CQ under heavy load and time taken by NAPI to clear transmit notifications
will increase with higher queue sizes. Again results in SQ being stopped.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:37 +0000 (16:49 +0300)]

net: thunderx: Fix memory leak while tearing down interface

Fixed 'tso_hdrs' memory not being freed properly.
Also fixed SQ skbuff maintenance issues.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Wed, 29 Jul 2015 13:49:36 +0000 (16:49 +0300)]

net: thunderx: Fix data integrity issues with LDWB

Switching back to LDD transactions from LDWB.

While transmitting packets out with LDWB transactions
data integrity issues are seen very frequently.
hence switching back to LDD.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 29 Jul 2015 10:01:41 +0000 (12:01 +0200)]

ipv6: flush nd cache on IFF_NOARP change

This patch is the IPv6 equivalent of commit
6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change")

Without it, we keep buggy neighbours in the cache, with destination
MAC address equal to our own MAC address.

Tested:
tcpdump -i eth0 -s 0 ip6 -n -e &
ip link set dev eth0 arp off
ping6 remote // sends buggy frames
ip link set dev eth0 arp on
ping6 remote // should work once kernel is patched

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mario Fanelli <mariofanelli@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Guenter Roeck [Sat, 4 Jul 2015 20:23:42 +0000 (13:23 -0700)]

hwmon: (nct7802) Fix integer overflow seen when writing voltage limits

Writing a large value into a voltage limit attribute can result
in an overflow due to an auto-conversion from unsigned long to
unsigned int.

Cc: Constantine Shulyupin <const@MakeLinux.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

commit | commitdiff | tree

Guenter Roeck [Mon, 27 Jul 2015 17:21:46 +0000 (10:21 -0700)]

hwmon: (nct7904) Rename pwm attributes to match hwmon ABI

pwm attributes have well defined names, which should be used.

Cc: Vadim V. Vlasov <vvlasov@dev.rtsoft.ru>
Cc: stable@vger.kernel.org #v4.1+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

commit | commitdiff | tree

Dave Airlie [Thu, 30 Jul 2015 02:41:44 +0000 (12:41 +1000)]

Merge branch 'msm-fixes-4.2' of git://people.freedesktop.org/~robclark/linux into drm-fixes

Fix for nasty crash on mdp4 in disable path, fix for dma-buf export,
smb leak on mdp5 which could result in intermittent modeset fails, and
don't let interrupted system call disturb atomic commit once we are
past the point of no return.

* 'msm-fixes-4.2' of git://people.freedesktop.org/~robclark/linux:
  drm/msm/mdp5: release SMB (shared memory blocks) in various cases
  drm/msm: change to uninterruptible wait in atomic commit
  drm/msm: mdp4: Fix drm_framebuffer dereference crash
  drm/msm: fix msm_gem_prime_get_sg_table()

commit | commitdiff | tree

Dave Airlie [Thu, 30 Jul 2015 02:40:27 +0000 (12:40 +1000)]

Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Radeon and amdgpu fixes for 4.2.  The audio fix ended up being more
invasive than I would have liked, but this should finally fix up the
last of the regressions since DP audio support was added.

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: add new parameter to seperate map and unmap
  drm/amdgpu: hdp_flush is not needed for inside IB
  drm/amdgpu: different emit_ib for gfx and compute
  drm/amdgpu: information leak in amdgpu_info_ioctl()
  drm/amdgpu: clean up init sequence for failures
  drm/radeon/combios: add some validation of lvds values
  drm/radeon: rework audio modeset to handle non-audio hdmi features
  drm/radeon: rework audio detect (v4)
  drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
  drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h

commit | commitdiff | tree

David S. Miller [Thu, 30 Jul 2015 01:37:41 +0000 (18:37 -0700)]

Merge branch 'netcp-fixes'

Murali Karicheri says:

====================
net: netcp: bug fixes for dynamic module support

This series fixes few bugs to allow keystone netcp modules to be
dynamically loaded and removed. Currently it allows following
sequence multiple times

insmod cpsw_ale.ko
insmod davinci_mdio.ko
insmod keystone_netcp.ko
insmod keystone_netcp_ethss.ko
ifup eth0
ifup eth1
ping <hosts on eth0>
ping <hosts on eth1>
ifdown eth1
ifdown eth0
rmmod keystone_netcp_ethss.ko
rmmod keystone_netcp.ko
rmmod davinci_mdio.ko
rmmod cpsw_ale.ko
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karicheri, Muralidharan [Tue, 28 Jul 2015 22:20:14 +0000 (18:20 -0400)]

net: netcp: ethss: cleanup gbe_probe() and gbe_remove() functions

This patch clean up error handle code to use goto label properly. In some
cases, the code unnecessarily use goto instead of just returning the error
code. Code also make explicit calls to devm_* APIs on error which is
not necessary. In the gbe_remove() also it makes similar calls which is
also unnecessary.

Also fix few checkpatch warnings

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karicheri, Muralidharan [Tue, 28 Jul 2015 22:20:13 +0000 (18:20 -0400)]

net: netcp: ethss: fix up incorrect use of list api

The code seems to assume a null is returned when the list is empty
from first_sec_slave() to break the loop which is incorrect. Fix the
code by using list_empty().

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karicheri, Muralidharan [Tue, 28 Jul 2015 22:20:12 +0000 (18:20 -0400)]

net: netcp: fix cleanup interface list in netcp_remove()

Currently if user do rmmod keystone_netcp.ko following warning is
seen :-

[ 59.035891] ------------[ cut here ]------------
[ 59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
netcp_core.c:2127 netcp_remove)

This is because the interface list is not cleaned up in netcp_remove.
This patch fixes this. Also fix some checkpatch related warnings.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Thu, 30 Jul 2015 01:14:48 +0000 (18:14 -0700)]

Merge tag 'pm+acpi-4.2-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
"These fix three regressions, two recent ones (cpufreq core and ACPI
  device power management) and one introduced during the 4.1 cycle
  (intel_pstate).

  Specifics:

   - Fix a recently introduced issue in the cpufreq core causing it to
     attempt to create duplicate symbolic links to the policy directory
     in sysfs for CPUs that are offline when the cpufreq driver is being
     registered (Rafael J Wysocki)

   - Fix a recently introduced problem in the ACPI device power
     management core code causing it to store an incorrect value in the
     device object's power.state field in some cases which in turn leads
     to attempts to turn power resources off while they should still be
     on going forward (Mika Westerberg)

   - Fix an intel_pstate driver issue introduced during the 4.1 cycle
     which leads to kernel panics on boot on Knights Landing chips due
     to incomplete support for them in that driver (Lukasz Anaczkowski)"

* tag 'pm+acpi-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Avoid attempts to create duplicate symbolic links
  ACPI / PM: Use target_state to set the device power state
  intel_pstate: Add get_scaling cpu_defaults param to Knights Landing

commit | commitdiff | tree

Linus Torvalds [Thu, 30 Jul 2015 01:08:48 +0000 (18:08 -0700)]

Merge tag 'dm-4.2-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- fix DM thinp to consistently return -ENOSPC when out of data space

- fix a logic bug in the DM cache smq policy's creation error path

- revert a DM cache 4.2-rc3 change that reduced writeback efficiency

- fix a hang on DM cache device destruction due to improper
   prealloc_used accounting introduced in 4.2-rc3

- update URL for dm-crypt wiki page

* tag 'dm-4.2-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix device destroy hang due to improper prealloc_used accounting
  Revert "dm cache: do not wake_worker() in free_migration()"
  dm crypt: update wiki page URL
  dm cache policy smq: fix alloc_bitset check that always evaluates as false
  dm thin: return -ENOSPC when erroring retry list due to out of data space

commit | commitdiff | tree

Daniel Borkmann [Tue, 28 Jul 2015 13:26:36 +0000 (15:26 +0200)]

ebpf, x86: fix general protection fault when tail call is invoked

With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):

  [  927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  [...]
  [  927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
  [  927.102096] RIP: 0010:[<ffffffffa002440d>]  [<ffffffffa002440d>] 0xffffffffa002440d
  [  927.103390] RSP: 0018:ffff880016a67a68  EFLAGS: 00010006
  [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
  [  927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
  [  927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
  [  927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
  [  927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
  [  927.110787] FS:  00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
  [  927.112021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
  [  927.114500] Stack:
  [  927.115737]  ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
  [  927.117005]  0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
  [  927.118276]  00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
  [  927.119543] Call Trace:
  [  927.120797]  [<ffffffff8106c548>] ? lookup_address+0x28/0x30
  [  927.122058]  [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
  [  927.123314]  [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
  [  927.124562]  [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
  [  927.125806]  [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
  [  927.127033]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.128254]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.129461]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.130654]  [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
  [  927.131837]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.133015]  [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
  [  927.134195]  [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
  [  927.135367]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.136523]  [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
  [  927.137666]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.138802]  [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
  [  927.139934]  [<ffffffffa022b0d5>] 0xffffffffa022b0d5
  [  927.141066]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.142199]  [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
  [  927.143323]  [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
  [  927.144450]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.145572]  [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
  [  927.146666]  [<ffffffff817f9a9f>] tracesys+0xd/0x44
  [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
                       c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
                       0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
                       ff 4c
  [  927.150046] RIP  [<ffffffffa002440d>] 0xffffffffa002440d

The code section with the instructions that traps points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).

Using bpf_jit_disasm -o on the eBPF root program image:

  [...]
  4e:   mov    -0x204(%rbp),%eax
        8b 85 fc fd ff ff
  54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
        83 f8 20
  57:   ja     0x000000000000007a
        77 21
  59:   add    $0x1,%eax                <--- tail_call_cnt++
        83 c0 01
  5c:   mov    %eax,-0x204(%rbp)
        89 85 fc fd ff ff
  62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 44 d6 80
  67:   mov    (%rax),%rax
        48 8b 00
  6a:   cmp    $0x0,%rax                <--- check for NULL
        48 83 f8 00
  6e:   je     0x000000000000007a
        74 0a
  70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
        48 8b 40 20                              [ matches <48> 8b 40 20 ... from above ]
  74:   add    $0x33,%rax               <--- prologue skip of new prog
        48 83 c0 33
  78:   jmpq   *%rax                    <--- jump to new prog insns
        ff e0
  [...]

The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:

lea instruction has a wrong offset, i.e. it should be ...

  lea    0x80(%rsi,%rdx,8),%rax

... but it actually seems to be ...

  lea   -0x80(%rsi,%rdx,8),%rax

... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the interpreter, we btw
similarly do:

  [...]
  c88:  lea     0x80(%rax,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 84 d0 80 00 00 00
  c90:  add     $0x1,%r13d
        41 83 c5 01
  c94:  mov     (%rax),%rax
        48 8b 00
  [...]

Now the other interesting fact is that this panic triggers only when things
like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
members).

Changing the emitter to always use the 4 byte displacement in the lea
instruction fixes the panic on my side. It increases the tail call instruction
emission by 3 more byte, but it should cover us from various combinations
(and perhaps other future increases on related structures).

After patch, disassembly:

  [...]
  9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
        48 8d 84 d6 80 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

  [...]
  9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
        48 8d 84 d6 50 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nikolay Aleksandrov [Tue, 28 Jul 2015 11:10:44 +0000 (13:10 +0200)]

bridge: mdb: fix delmdb state in the notification

Since mdb states were introduced when deleting an entry the state was
left as it was set in the delete request from the user which leads to
the following output when doing a monitor (for example):
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^^
Note the "temp" state in the delete notification which is wrong since
the entry was permanent, the state in a delete is always reported as
"temp" regardless of the real state of the entry.

After this patch:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

There's one important note to make here that the state is actually not
matched when doing a delete, so one can delete a permanent entry by
stating "temp" in the end of the command, I've chosen this fix in order
not to break user-space tools which rely on this (incorrect) behaviour.

So to give an example after this patch and using the wrong state:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

Note the state of the entry that got deleted is correct in the
notification.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Satish Ashok [Tue, 28 Jul 2015 10:28:27 +0000 (03:28 -0700)]

bridge: mcast: give fast leave precedence over multicast router and querier

When fast leave is configured on a bridge port and an IGMP leave is
received for a group, the group is not deleted immediately if there is
a router detected or if multicast querier is configured.
Ideally the group should be deleted immediately when fast leave is
configured.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Domain: System / Kernel;