Chris Wilson [Thu, 23 Nov 2017 15:26:30 +0000 (15:26 +0000)]
drm/i915: Unwind incomplete legacy context switches
The legacy context switch for ringbuffer submission is multistaged,
where each of those stages may fail. However, we were updating global
state after some stages, and so we had to force the incomplete request
to be submitted because we could not unwind. Save the global state
before performing the switches, and so enable us to unwind back to the
previous global state should any phase fail. We then must cancel the
request instead of submitting it should the construction fail.
v2: s/saved_ctx/from_ctx/; s/ctx/to_ctx/ etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-1-chris@chris-wilson.co.uk
Matthew Auld [Thu, 23 Nov 2017 13:54:21 +0000 (13:54 +0000)]
drm/i915/selftests: test descending addresses
For igt_write_huge make sure the higher gtt offsets don't feel left out,
which is especially true when dealing with the 48b PPGTT, where we
timeout long before we are able exhaust the address space.
v2: just use IGT_TIMEOUT
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123135421.17967-2-matthew.auld@intel.com
Matthew Auld [Thu, 23 Nov 2017 13:54:20 +0000 (13:54 +0000)]
drm/i915/selftests: rein in igt_write_huge
Rather than repeat the test for each engine, which takes a long time,
let's try alternating between the engines in some randomized
order.
v2: fix gen2 blunder
fix !order blunder
more cunning permutation construction!
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123135421.17967-1-matthew.auld@intel.com
Daniel Vetter [Tue, 21 Nov 2017 09:42:41 +0000 (10:42 +0100)]
drm/i915: remove stale comment from sanitize_encoder
This goes back to pre-atomic, where due to intermediate dpms states
connectors and encoder states might indeed not have matched.
With atomic that's all smashed together (and hopefully no bios ever
enables a vga output in dpms standby/suspedn state or we're toast).
In
commit
873ffe69a9097fb241fff2967ea6f0bf2c179195
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Wed Aug 5 12:37:07 2015 +0200
drm/i915: Remove connectors_active from sanitization, v2.
sanitize_encoders was changed to disable the encoder in all cases,
which made the comment obsolete.
Remove the misleading comment.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121094241.9129-1-daniel.vetter@ffwll.ch
Daniel Vetter [Mon, 13 Nov 2017 16:01:40 +0000 (17:01 +0100)]
drm/i915: sync dp link status checks against atomic commmits
Two bits:
- check actual atomic state, the legacy stuff can only be looked at
from within the atomic_commit_tail function, since it's only
protected by ordering and not by any locks.
- Make sure we don't wreak the work an ongoing nonblocking commit is
doing.
v2: We need the crtc lock too, because a plane update might change it
without having to acquire the connection_mutex (Maarten). Use
Maarten's changes for this locking, while keeping the logic that uses
the connection->commit->hw_done signal for syncing with nonblocking
commits.
v3: The initial state objects from the hw state readout do not have a
commit object. Check for that (spotted by CI).
v4: Fix deadlock from jumping to put_power with locks still held.
(mlankhorst)
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=103336
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99272
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113160140.22679-1-maarten.lankhorst@linux.intel.com
Tvrtko Ursulin [Thu, 23 Nov 2017 10:26:54 +0000 (10:26 +0000)]
drm/i915/pmu: Clear the previous sample value when parking
When turning off the engines, and the pmu sampling, clear the previous
value as the current measurement should be 0.
v2: Use a for-loop
v3:
* Move clearing to timer self-dis-arm to avoid race with parking.
* Clear frequency samples as well.
v4:
* Init frequency to idle_freq. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123102654.29296-1-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Thu, 23 Nov 2017 10:07:01 +0000 (10:07 +0000)]
drm/i915/pmu: Drop I915_ENGINE_SAMPLE_MAX from uapi headers
We have agreed during the engine classes discussion that fields marked as
non-ABI are better left out altogether from uapi headers.
v2: Use a local define for maintanability. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123100701.18430-1-tvrtko.ursulin@linux.intel.com
Anusha Srivatsa [Thu, 9 Nov 2017 18:51:43 +0000 (10:51 -0800)]
drm/i915/dmc: DMC 1.04 for Kabylake
There is a new version of DMC available for KBL.
The release notes mentions:
1. Fix for the issue where DC_STATE was getting enabled even
when disabled by driver causing data corruption.
v2: Remove pull request from commit message (Rodrigo).
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1510253503-12634-1-git-send-email-anusha.srivatsa@intel.com
Chris Wilson [Wed, 22 Nov 2017 22:25:10 +0000 (22:25 +0000)]
drm/i915: Save/restore irq state for vlv_residency_raw()
Since commit
6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics"),
vlv_residency_raw() may be called from an irq-disabled context (via perf
event sampling on remote cpu). As such, we can no longer assume that we
are called from process context and must save/restore the irq state for
the spinlock.
Fixes:
6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics")
Testcase: igt/perf_pmu/other-init-3
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122222510.22627-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Wed, 22 Nov 2017 17:26:21 +0000 (17:26 +0000)]
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
We don't need struct_mutex to initialise userptr (it just allocates a
workqueue for itself etc), but we do need struct_mutex later on in
i915_gem_init() in order to feed requests onto the HW.
This should break the chain
[ 385.697902] ======================================================
[ 385.697907] WARNING: possible circular locking dependency detected
[ 385.697913] 4.14.0-CI-Patchwork_7234+ #1 Tainted: G U
[ 385.697917] ------------------------------------------------------
[ 385.697922] perf_pmu/2631 is trying to acquire lock:
[ 385.697927] (&mm->mmap_sem){++++}, at: [<
ffffffff811bfe1e>] __might_fault+0x3e/0x90
[ 385.697941]
but task is already holding lock:
[ 385.697946] (&cpuctx_mutex){+.+.}, at: [<
ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.697957]
which lock already depends on the new lock.
[ 385.697963]
the existing dependency chain (in reverse order) is:
[ 385.697970]
-> #4 (&cpuctx_mutex){+.+.}:
[ 385.697980] __mutex_lock+0x86/0x9b0
[ 385.697985] perf_event_init_cpu+0x5a/0x90
[ 385.697991] perf_event_init+0x178/0x1a4
[ 385.697997] start_kernel+0x27f/0x3f1
[ 385.698003] verify_cpu+0x0/0xfb
[ 385.698006]
-> #3 (pmus_lock){+.+.}:
[ 385.698015] __mutex_lock+0x86/0x9b0
[ 385.698020] perf_event_init_cpu+0x21/0x90
[ 385.698025] cpuhp_invoke_callback+0xca/0xc00
[ 385.698030] _cpu_up+0xa7/0x170
[ 385.698035] do_cpu_up+0x57/0x70
[ 385.698039] smp_init+0x62/0xa6
[ 385.698044] kernel_init_freeable+0x97/0x193
[ 385.698050] kernel_init+0xa/0x100
[ 385.698055] ret_from_fork+0x27/0x40
[ 385.698058]
-> #2 (cpu_hotplug_lock.rw_sem){++++}:
[ 385.698068] cpus_read_lock+0x39/0xa0
[ 385.698073] apply_workqueue_attrs+0x12/0x50
[ 385.698078] __alloc_workqueue_key+0x1d8/0x4d8
[ 385.698134] i915_gem_init_userptr+0x5f/0x80 [i915]
[ 385.698176] i915_gem_init+0x7c/0x390 [i915]
[ 385.698213] i915_driver_load+0x99e/0x15c0 [i915]
[ 385.698250] i915_pci_probe+0x33/0x90 [i915]
[ 385.698256] pci_device_probe+0xa1/0x130
[ 385.698262] driver_probe_device+0x293/0x440
[ 385.698267] __driver_attach+0xde/0xe0
[ 385.698272] bus_for_each_dev+0x5c/0x90
[ 385.698277] bus_add_driver+0x16d/0x260
[ 385.698282] driver_register+0x57/0xc0
[ 385.698287] do_one_initcall+0x3e/0x160
[ 385.698292] do_init_module+0x5b/0x1fa
[ 385.698297] load_module+0x2374/0x2dc0
[ 385.698302] SyS_finit_module+0xaa/0xe0
[ 385.698307] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698311]
-> #1 (&dev->struct_mutex){+.+.}:
[ 385.698320] __mutex_lock+0x86/0x9b0
[ 385.698361] i915_mutex_lock_interruptible+0x4c/0x130 [i915]
[ 385.698403] i915_gem_fault+0x206/0x760 [i915]
[ 385.698409] __do_fault+0x1a/0x70
[ 385.698413] __handle_mm_fault+0x7c4/0xdb0
[ 385.698417] handle_mm_fault+0x154/0x300
[ 385.698440] __do_page_fault+0x2d6/0x570
[ 385.698445] page_fault+0x22/0x30
[ 385.698449]
-> #0 (&mm->mmap_sem){++++}:
[ 385.698459] lock_acquire+0xaf/0x200
[ 385.698464] __might_fault+0x68/0x90
[ 385.698470] _copy_to_user+0x1e/0x70
[ 385.698475] perf_read+0x1aa/0x290
[ 385.698480] __vfs_read+0x23/0x120
[ 385.698484] vfs_read+0xa3/0x150
[ 385.698488] SyS_read+0x45/0xb0
[ 385.698493] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698497]
other info that might help us debug this:
[ 385.698505] Chain exists of:
&mm->mmap_sem --> pmus_lock --> &cpuctx_mutex
[ 385.698517] Possible unsafe locking scenario:
[ 385.698522] CPU0 CPU1
[ 385.698526] ---- ----
[ 385.698529] lock(&cpuctx_mutex);
[ 385.698553] lock(pmus_lock);
[ 385.698558] lock(&cpuctx_mutex);
[ 385.698564] lock(&mm->mmap_sem);
[ 385.698568]
*** DEADLOCK ***
[ 385.698574] 1 lock held by perf_pmu/2631:
[ 385.698578] #0: (&cpuctx_mutex){+.+.}, at: [<
ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.698589]
stack backtrace:
[ 385.698595] CPU: 3 PID: 2631 Comm: perf_pmu Tainted: G U 4.14.0-CI-Patchwork_7234+ #1
[ 385.698602] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
[ 385.698609] Call Trace:
[ 385.698615] dump_stack+0x5f/0x86
[ 385.698621] print_circular_bug.isra.18+0x1d0/0x2c0
[ 385.698627] __lock_acquire+0x19c3/0x1b60
[ 385.698634] ? generic_exec_single+0x77/0xe0
[ 385.698640] ? lock_acquire+0xaf/0x200
[ 385.698644] lock_acquire+0xaf/0x200
[ 385.698650] ? __might_fault+0x3e/0x90
[ 385.698655] __might_fault+0x68/0x90
[ 385.698660] ? __might_fault+0x3e/0x90
[ 385.698665] _copy_to_user+0x1e/0x70
[ 385.698670] perf_read+0x1aa/0x290
[ 385.698675] __vfs_read+0x23/0x120
[ 385.698682] ? __fget+0x101/0x1f0
[ 385.698686] vfs_read+0xa3/0x150
[ 385.698691] SyS_read+0x45/0xb0
[ 385.698696] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698701] RIP: 0033:0x7ff1c46876ed
[ 385.698705] RSP: 002b:
00007fff13552f90 EFLAGS:
00000293 ORIG_RAX:
0000000000000000
[ 385.698712] RAX:
ffffffffffffffda RBX:
ffffc90000647ff0 RCX:
00007ff1c46876ed
[ 385.698718] RDX:
0000000000000010 RSI:
00007fff13552fa0 RDI:
0000000000000005
[ 385.698723] RBP:
000056063d300580 R08:
0000000000000000 R09:
0000000000000060
[ 385.698729] R10:
0000000000000000 R11:
0000000000000293 R12:
0000000000000046
[ 385.698734] R13:
00007fff13552c6f R14:
00007ff1c6279d00 R15:
00007ff1c6279a40
Testcase: igt/perf_pmu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122172621.16158-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Wed, 22 Nov 2017 14:56:46 +0000 (14:56 +0000)]
drm/i915: Remove success dmesg noise for intel_rotate_pages()
During selftesting intel_rotate_pages() is very, very verbose without
giving us any information. Suppress the noise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122145646.1859-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Wed, 22 Nov 2017 12:06:00 +0000 (12:06 +0000)]
drm/i915/selftests: Use NOWARN for large allocations
We may try to do a large kmalloc for the permutation array, falling back
to a smaller array/test if the first allocation fails. Since we are
intentionally trying a large allocation which may fail, pass __GFP_NOWARN.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103842
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122120600.27025-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:52 +0000 (18:18 +0000)]
drm/i915/pmu: Add RC6 residency metrics
For clients like intel-gpu-overlay it is easier to read the
counters via the perf API than having to parse sysfs.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-9-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:51 +0000 (18:18 +0000)]
drm/i915: Convert intel_rc6_residency_us to ns
Will be used for exposing the PMU counters.
v2:
* Move intel_runtime_pm_get/put to the callers. (Chris Wilson)
* Restore full unit conversion precision.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-8-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:50 +0000 (18:18 +0000)]
drm/i915/pmu: Add interrupt count metric
For clients like intel-gpu-overlay it is easier to read the
count via the perf API than having to parse /proc.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-7-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:49 +0000 (18:18 +0000)]
drm/i915/pmu: Wire up engine busy stats to PMU
We can use engine busy stats instead of the sampling timer for
better accuracy.
By doing this we replace the stohastic sampling with busyness
metric derived directly from engine activity. This is context
switch interrupt driven, so as accurate as we can get from
software tracking.
As a secondary benefit, we can also not run the sampling timer
in cases only busyness metric is enabled.
v2: Rebase.
v3:
* Rebase, comments.
* Leave engine busyness controls out of workers.
v4: Checkpatch cleanup.
v5: Added comment to pmu_needs_timer change.
v6:
* Rebase.
* Fix style of some comments. (Chris Wilson)
v7: Rebase and commit message update. (Chris Wilson)
v8: Add delayed stats disabling to improve accuracy in face of
CPU hotplug events.
v9: Rebase.
v10: Rebase - i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-6-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:48 +0000 (18:18 +0000)]
drm/i915: Engine busy time tracking
Track total time requests have been executing on the hardware.
We add new kernel API to allow software tracking of time GPU
engines are spending executing requests.
Both per-engine and global API is added with the latter also
being exported for use by external users.
v2:
* Squashed with the internal API.
* Dropped static key.
* Made per-engine.
* Store time in monotonic ktime.
v3: Moved stats clearing to disable.
v4:
* Comments.
* Don't export the API just yet.
v5: Whitespace cleanup.
v6:
* Rename ref to active.
* Drop engine aggregate stats for now.
* Account initial busy period after enabling stats.
v7:
* Rebase.
v8:
* Move context in notification after the notifier. (Chris Wilson)
v9:
In cases where stats tracking is getting disabled while there is
an active context on an engine, add up the current value to the
total. This also implies we don't clear the total when tracking
is disabled any longer. There is no real need to do so because
we define the stats as relative while enabled, meaning
comparison between two samples while tracking is enabled is the
valid usage. However, when busy stats will later be plugged into
the perf PMU API, it is beneficial to not reset the total, since
the PMU core likes to do some counter disable/enable cycles on
startup, and while doing so during a single long context
executing on an engine we would lose some accuracy and so make
unit testing more difficult than needs to be.
v10:
* Fix accounting for preemption.
v11:
* Rebase for i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:47 +0000 (18:18 +0000)]
drm/i915: Wrap context schedule notification
No functional change just something which will be handy in the
following patch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-4-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:46 +0000 (18:18 +0000)]
drm/i915/pmu: Suspend sampling when GPU is idle
If only a subset of events is enabled we can afford to suspend
the sampling timer when the GPU is idle and so save some cycles
and power.
v2: Rebase and limit timer even more.
v3: Rebase.
v4: Rebase.
v5: Skip action if perf PMU failed to register.
v6: Checkpatch cleanup.
v7:
* Add a common helper to start the timer if needed. (Chris Wilson)
* Add comment explaining bitwise logic in pmu_needs_timer.
v8: Fix some comments styles. (Chris Wilson)
v9: Rebase.
v10: Move function declarations to i915_pmu.h.
v11: Rename functions to i915_pmu_gt_(un)parked. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-3-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:45 +0000 (18:18 +0000)]
drm/i915/pmu: Expose a PMU interface for perf queries
From: Chris Wilson <chris@chris-wilson.co.uk>
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
The first goal is to be able to measure GPU (and invidual ring) busyness
without having to poll registers from userspace. (Which not only incurs
holding the forcewake lock indefinitely, perturbing the system, but also
runs the risk of hanging the machine.) As an alternative we can use the
perf event counter interface to sample the ring registers periodically
and send those results to userspace.
Functionality we are exporting to userspace is via the existing perf PMU
API and can be exercised via the existing tools. For example:
perf stat -a -e i915/rcs0-busy/ -I 1000
Will print the render engine busynnes once per second. All the performance
counters can be enumerated (perf list) and have their unit of measure
correctly reported in sysfs.
v1-v2 (Chris Wilson):
v2: Use a common timer for the ring sampling.
v3: (Tvrtko Ursulin)
* Decouple uAPI from i915 engine ids.
* Complete uAPI defines.
* Refactor some code to helpers for clarity.
* Skip sampling disabled engines.
* Expose counters in sysfs.
* Pass in fake regs to avoid null ptr deref in perf core.
* Convert to class/instance uAPI.
* Use shared driver code for rc6 residency, power and frequency.
v4: (Dmitry Rogozhkin)
* Register PMU with .task_ctx_nr=perf_invalid_context
* Expose cpumask for the PMU with the single CPU in the mask
* Properly support pmu->stop(): it should call pmu->read()
* Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE)
* Introduce refcounting of event subscriptions.
* Make pmu.busy_stats a refcounter to avoid busy stats going away
with some deleted event.
* Expose cpumask for i915 PMU to avoid multiple events creation of
the same type followed by counter aggregation by perf-stat.
* Track CPUs getting online/offline to migrate perf context. If (likely)
cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be
needed to see effect of CPU status tracking.
* End result is that only global events are supported and perf stat
works correctly.
* Deny perf driver level sampling - it is prohibited for uncore PMU.
v5: (Tvrtko Ursulin)
* Don't hardcode number of engine samplers.
* Rewrite event ref-counting for correctness and simplicity.
* Store initial counter value when starting already enabled events
to correctly report values to all listeners.
* Fix RC6 residency readout.
* Comments, GPL header.
v6:
* Add missing entry to v4 changelog.
* Fix accounting in CPU hotplug case by copying the approach from
arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin)
v7:
* Log failure message only on failure.
* Remove CPU hotplug notification state on unregister.
v8:
* Fix error unwind on failed registration.
* Checkpatch cleanup.
v9:
* Drop the energy metric, it is available via intel_rapl_perf.
(Ville Syrjälä)
* Use HAS_RC6(p). (Chris Wilson)
* Handle unsupported non-engine events. (Dmitry Rogozhkin)
* Rebase for intel_rc6_residency_ns needing caller managed
runtime pm.
* Drop HAS_RC6 checks from the read callback since creating those
events will be rejected at init time already.
* Add counter units to sysfs so perf stat output is nicer.
* Cleanup the attribute tables for brevity and readability.
v10:
* Fixed queued accounting.
v11:
* Move intel_engine_lookup_user to intel_engine_cs.c
* Commit update. (Joonas Lahtinen)
v12:
* More accurate sampling. (Chris Wilson)
* Store and report frequency in MHz for better usability from
perf stat.
* Removed metrics: queued, interrupts, rc6 counters.
* Sample engine busyness based on seqno difference only
for less MMIO (and forcewake) on all platforms. (Chris Wilson)
v13:
* Comment spelling, use mul_u32_u32 to work around potential GCC
issue and somne code alignment changes. (Chris Wilson)
v14:
* Rebase.
v15:
* Rebase for RPS refactoring.
v16:
* Use the dynamic slot in the CPU hotplug state machine so that we are
free to setup our state as multi-instance. Previously we were re-using
the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as
multi-instance, nor owned by our driver to start with.
* Register the CPU hotplug handlers after the PMU, otherwise the callback
will get called before the PMU is initialized which can end up in
perf_pmu_migrate_context with an un-initialized base.
* Added workaround for a probable bug in cpuhp core.
v17:
* Remove workaround for the cpuhp bug.
v18:
* Rebase for drm_i915_gem_engine_class getting upstream before us.
v19:
* Rebase. (trivial)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
Tvrtko Ursulin [Tue, 21 Nov 2017 18:18:44 +0000 (18:18 +0000)]
drm/i915: Extract intel_get_cagf
Code to be shared between debugfs and the PMU implementation.
v2: Checkpatch cleanup.
v3: Also consolidate i915_sysfs.c/gt_act_freq_mhz_show.
v4: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-1-tvrtko.ursulin@linux.intel.com
Chris Wilson [Tue, 21 Nov 2017 11:06:52 +0000 (11:06 +0000)]
drm/i915/selftests: Avoid drm_gem_handle_create under struct_mutex
Despite us reloading the module around every selftest, the lockclasses
persist and the chains used in selftesting may then dictate how we are
allowed to nest locks during runtime testing. As such we have to be just
as careful, and in particular it turns out we are not allowed to nest
dev->object_name_lock (drm_gem_handle_create) inside dev->struct_mutex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103830
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121110652.1107-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:17 +0000 (21:19 +0200)]
drm/i915: Add rudimentary plane state verification
Check that the planes are in the state we expect them to be. For
now we can only check whether each plane is correctly enabled or
disabled. In the future we may want to expand the plane state
readout to support a more thorough verification.
v2: Verify all planes part of the state as long as at least
one crtc is doing a modeset (Daniel)
v3: Fix typoes (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-11-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:16 +0000 (21:19 +0200)]
drm/i915: Use plane->get_hw_state() for initial plane fb readout
Since we now have a ->get_hw_state() method for planes, let's use
that during the initial plane fb readout.
v2: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-10-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:15 +0000 (21:19 +0200)]
drm/i915: Nuke crtc->plane
Eliminate crtc->plane since it's pretty much a layering violation.
We can always get the plane via crtc->primary if we actually need it.
The only ugly thing left is plane_to_crtc_mapping[], but that's
still needed by the pre-g4x watermark code.
v2: Removed a misplaced comment change (Daniel)
v3: Rebase due to fbc crtc->y usage removal
v4: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-9-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:14 +0000 (21:19 +0200)]
drm/i915: Switch fbc over to for_each_new_intel_plane_in_state()
Stop using the old for_each_intel_plane_in_state() type iteration
macro and replace it with for_each_new_intel_plane_in_state().
And similarly replace drm_atomic_get_existing_crtc_state() with
intel_atomic_get_new_crtc_state(). Switch over to intel_ types
as well to make the code less cluttered.
v2: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-8-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:13 +0000 (21:19 +0200)]
drm/i915: Nuke ironlake_get_initial_plane_config()
The only relevant difference between i9xx_get_initial_plane_config() and
ironlake_get_initial_plane_config() is the HSW/BDW TILEOFF handling.
Add that to i9xx_get_initial_plane_config() and nuke
ironlake_get_initial_plane_config().
v2: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-7-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:12 +0000 (21:19 +0200)]
drm/i915: Cleanup enum pipe/enum plane_id/enum i9xx_plane_id in initial fb readout
Use enum pipe, enum plane_id, and enum i9xx_plane_id consistently in the
initial framebuffe readout.
v2: Use old_plane_id in the ilk code
v3: s/old_plane_id/i9xx_plane_id/ (Daniel)
v4: Rebase due to GLK/CNL PLANE_COLOR_CTL alpha stuff
v5: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-6-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:11 +0000 (21:19 +0200)]
drm/i915: Use enum i9xx_plane_id for the .get_fifo_size() hooks
Replace the 0 and 1 with PLANE_A and PLANE_B in the pre-g4x wm code.
v2: s/old_plane_id/i9xx_plane_id/ (Daniel)
v3: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-5-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:10 +0000 (21:19 +0200)]
drm/i915: s/enum plane/enum i9xx_plane_id/
Rename enum plane to enum i9xx_plane_id to make it clear that it only
applies to pre-SKL platforms.
enum i9xx_plane_id is a global identifier, whereas enum plane_id is
per-pipe. We need the old global identifier to index the primary plane
(and the pre-g4x sprite C if we ever expose it) registers on pre-SKL
platforms.
v2: Reorder patches
v3: s/old_plane_id/i9xx_plane_id/ (Daniel)
Pimp the commit message a bit
Note that i9xx_plane_id doesn't apply to SKL+
v4: Rebase due to power domain handling in plane readout
v5: Rebase due to crtc->dspaddr_offset removal
v6: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-4-ville.syrjala@linux.intel.com
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:09 +0000 (21:19 +0200)]
drm/i915: Redo plane sanitation during readout
Unify the plane disabling during state readout by pulling the code into
a new helper intel_plane_disable_noatomic(). We'll also read out the
state of all planes, so that we know which planes really need to be
diabled.
Additonally we change the plane<->pipe mapping sanitation to work by
simply disabling the offending planes instead of entire pipes. And
we do it before we otherwise sanitize the crtcs, which means we don't
have to worry about misassigned planes during crtc sanitation anymore.
v2: Reoder patches to not depend on enum old_plane_id
v3: s/for_each_pipe/for_each_intel_crtc/
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Alex Villacís Lasso <alexvillacislasso@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103223
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-3-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 17 Nov 2017 19:19:08 +0000 (21:19 +0200)]
drm/i915: Add .get_hw_state() method for planes
Add a .get_hw_state() method for planes, returning true or false
depending on whether the plane is enabled. Use it to rewrite the
plane enabled/disabled asserts in platform agnostic fashion.
We do lose the pre-gen4 plane<->pipe mapping checks, but since we're
supposed sanitize that anyway it doesn't really matter.
v2: Reoder patches to not depend on enum old_plane_id
Just call assert_plane_disabled() from assert_planes_disabled()
v3: Deal with disabled power wells in .get_hw_state()
v4: Rebase due skl primary plane code removal
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Alex Villacís Lasso <alexvillacislasso@hotmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v2
Tested-by: Thierry Reding <thierry.reding@gmail.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-2-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
David Weinehall [Fri, 17 Nov 2017 08:01:46 +0000 (10:01 +0200)]
drm/i915: Don't use GEN6_RC_VIDEO_FREQ on gen10+
GEN6_RC_VIDEO_FREQ is deprecated for >= gen10;
don't try to program it.
v2: Use IS_GEN9() instead of INTEL_GEN() and remove comment (Rodrigo)
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117080146.20150-1-david.weinehall@linux.intel.com
Chris Wilson [Mon, 20 Nov 2017 21:19:07 +0000 (21:19 +0000)]
drm/i915/selftests: Declare we allocated the guc clients
Silence smatch over
drivers/gpu/drm/i915/selftests/intel_guc.c:135 igt_guc_init_doorbell_hw() error: we previously assumed 'guc->execbuf_client' could be null (see line 123)
drivers/gpu/drm/i915/selftests/intel_guc.c:142 igt_guc_init_doorbell_hw() error: we previously assumed 'guc->preempt_client' could be null (see line 123)
by asserting that we did succeed in creating the pair of clients for
testing.
References:
55bd6bd75717 ("drm/i915/selftests: Add a GuC doorbells selftest")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120211907.1649-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Chris Wilson [Mon, 20 Nov 2017 20:55:04 +0000 (20:55 +0000)]
drm/i915: Remove i915.semaphores modparam
Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
Chris Wilson [Mon, 20 Nov 2017 20:55:03 +0000 (20:55 +0000)]
drm/i915: Move debugfs/i915_semaphore_status to i915_engine_info
As the semaphores is just part of the engine, include it with the
general pretty printer universally used for debugging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-4-chris@chris-wilson.co.uk
Chris Wilson [Mon, 20 Nov 2017 20:55:02 +0000 (20:55 +0000)]
drm/i915: Disable semaphores on Sandybridge
I should have admitted defeat long ago as there has been a rare but
persistent error on Sandybridge where semaphore signaling did not
propagate to the waiter, leading to a GPU hang.
With the work on fence signaling for v4.9, the impact of using CPU driven
signaling was greatly reduced wrt to the latency of GPU semaphores,
though without logical rings support, the benefit of reordering work to
avoid bubbles is not realised (i.e. as it stands fence signaling is just
a slower, more costly version of HW semaphores; but works more
consistently). As a rough indicator of the difference,
with semaphores:
Sequential (3 engines, 1 processes): average 5.470us per cycle [expected 4.988us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 15.771us per cycle [expected 4.923us]
In comparison, v3.4:
with semaphores:
Sequential (3 engines, 1 processes): average 16.066us per cycle [expected 11.842us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 23.460us per cycle [expected 11.839us]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54226 #and 100+ dupes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-3-chris@chris-wilson.co.uk
Chris Wilson [Mon, 20 Nov 2017 20:55:01 +0000 (20:55 +0000)]
drm/i915: Remove obsolete ringbuffer emission for gen8+
Since removing the module parameter to force selection of ringbuffer
emission for gen8, the code is defunct. Remove it.
To put the difference into perspective, a couple of microbenchmarks
(bdw i7-5557u,
20170324):
ring execlists
exec continuous nops on all rings: 1.491us 2.223us
exec sequential nops on each ring: 12.508us 53.682us
single nop + sync: 9.272us 30.291us
vblank_mode=0 glxgears: ~11000fps ~9000fps
Since the earlier submission, gen8 ringbuffer submission has fallen
further and further behind in features. So while ringbuffer may hold the
throughput crown, in terms of interactive latency, execlists is much
better. Alas, we have no convenient metrics for such, other than
demonstrating things we can do with execlists but can not using
legacy ringbuffer submission.
We have made a few improvements to lowlevel execlists throughput,
and ringbuffer currently panics on boot! (bdw i7-5557u,
20171026):
ring execlists
exec continuous nops on all rings: n/a 1.921us
exec sequential nops on each ring: n/a 44.621us
single nop + sync: n/a 21.953us
vblank_mode=0 glxgears: n/a ~18500fps
References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk
Chris Wilson [Mon, 20 Nov 2017 20:55:00 +0000 (20:55 +0000)]
drm/i915: Remove i915.enable_execlists module parameter
Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit) and obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk
Michel Thierry [Mon, 20 Nov 2017 12:34:58 +0000 (12:34 +0000)]
drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write
The hardware needs some time to process the information received in the
ExecList Submission Port, and expects us to not write anything more until
it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or
PREEMPTED CSB event.
If we do not follow this, the driver could write new data into the ELSP
before HW had finishing fetching the previous one, putting us in
'undefined behaviour' space.
This seems to be the problem causing the spurious PREEMPTED & COMPLETE
events after a COMPLETE like the one below:
[] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3.
[] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007
[] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000
[] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE
[] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE
[] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006
[] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006
The ELSP writes that lead to this CSB sequence show that the HW hadn't
started executing the previous execlist (the one with only ctx 0x6) by the
time the new one was submitted; this is a bit more clear in the data
show in the EXECLIST_STATUS register at the time of the ELSP write.
[] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302
[] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
[] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
[] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
Note that having to wait for this ack does not disable lite-restores,
although it may reduce their numbers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Mon, 20 Nov 2017 13:26:06 +0000 (13:26 +0000)]
drm/i915/selftest: Make guc clients static
Make the private array used for stashing test clients static, to silence
sparse.
References:
55bd6bd75717 ("drm/i915/selftests: Add a GuC doorbells selftest")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120132606.4254-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Lionel Landwerlin [Fri, 27 Oct 2017 14:59:31 +0000 (15:59 +0100)]
drm/i915/perf: reuse timestamp frequency from device info
Now that we have this stored in the device info, we can drop it from perf
part of the driver.
Note that this requires to init perf after we've computed the frequency,
hence why we move i915_perf_init() from i915_driver_init_early() to after
intel_device_info_runtime_init().
v2: Use div_u64 (Chris)
v3: Drop u64 divs by switching to kHz (Chris/Ville)
Move i915_perf_fini to i915_driver_cleanup_hw (Matthew)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113181902.12411-2-lionel.g.landwerlin@intel.com
Chris Wilson [Mon, 20 Nov 2017 10:20:02 +0000 (10:20 +0000)]
drm/i915: Automatic i915_switch_context for legacy
During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.
v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk
Chris Wilson [Mon, 20 Nov 2017 10:20:01 +0000 (10:20 +0000)]
drm/i915: Pull the unconditional GPU cache invalidation into request construction
As the request will, in the following patch, implicitly invoke a
context-switch on construction, we should precede that with a GPU TLB
invalidation. Also, even before using GGTT, we always want to invalidate
the TLBs for any updates (as well as the ppgtt invalidates that are
unconditionally applied by execbuf). Since we almost always require the
TLB invalidate, do it unconditionally on request allocation and so we can
remove it from all other paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Lionel Landwerlin [Mon, 13 Nov 2017 23:34:52 +0000 (23:34 +0000)]
drm/i915/perf: replace .reg accesses with i915_mmio_reg_offset
This replaces accesses to the reg field of the i915_reg_t structure
with the i915_mmio_reg_offset() inline function.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ewelina Musial <ewelina.musial@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113233455.12085-2-lionel.g.landwerlin@intel.com
Chris Wilson [Mon, 20 Nov 2017 12:34:57 +0000 (12:34 +0000)]
drm/i915/execlists: Assert that we don't get mixed IDLE_ACTIVE | COMPLETE events
If IDLE_ACTIVE is set, then all other bits are invalid. For us, we can
assert that if we see a COMPLETE | PREEMPTED event, then it should be
impossible for it to also contain an IDLE_ACTIVE flag.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Mon, 20 Nov 2017 12:34:56 +0000 (12:34 +0000)]
drm/i915/execlists: Reduce completed event mask to COMPLETE | PREEMPTED
Since we get a COMPLETE event when the context switch occurs on
RING_HEAD == RING_TAIL and a PREEMPTED event when a switch occurs
before that point, COMPLETE | PREEMPTED should cover all possible context
switch completion events. We can move the ELEMENT_SWITCH info message
from the COMPLETED_MASK into an assertion for when we are performing a
switch to port[1].
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Mon, 20 Nov 2017 12:34:55 +0000 (12:34 +0000)]
drm/i915/execlists: Listen to COMPLETE context event not ACTIVE_IDLE
Since commit
e1fee72c2ea2e9c0c6e6743d32a6832f21337d6c
Author: Oscar Mateo <oscar.mateo@intel.com>
Date: Thu Jul 24 17:04:40 2014 +0100
drm/i915/bdw: Avoid non-lite-restore preemptions
execlists has listened to (ACTIVE_IDLE | ELEMENT_SWITCH) for detecting
when one context completed and it either continued onto the next (in port
1) or idled. We would always see COMPLETE | ACTIVE_IDLE on the final
context-switch event, but on recent gen it appears that we now get
separate ACTIVE_IDLE and COMPLETE events. In particular, the ACTIVE_IDLE
events may not be coupled to a context (since it is a general state rather
than a specific context completion event).
v2: Update the history, execlists did originally start out by listening
to the COMPLETE event not ACTIVE_IDLE.
v3: Update preempt completion test to also use COMPLETE not ACTIVE_IDLE.
References: bspec/12255
References: https://bugs.freedesktop.org/show_bug.cgi?id=103800
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-1-chris@chris-wilson.co.uk
Ville Syrjälä [Thu, 16 Nov 2017 16:02:15 +0000 (18:02 +0200)]
drm/i915: Fix init_clock_gating for resume
Moving the init_clock_gating() call from intel_modeset_init_hw() to
intel_modeset_gem_init() had an unintended effect of not applying
some workarounds on resume. This, for example, cause some kind of
corruption to appear at the top of my IVB Thinkpad X1 Carbon LVDS
screen after hibernation. Fix the problem by explicitly calling
init_clock_gating() from the resume path.
I really hope this doesn't break something else again. At least
the problems reported at https://bugs.freedesktop.org/show_bug.cgi?id=103549
didn't make a comeback, even after a hibernate cycle.
v2: Reorder the init_clock_gating vs. modeset_init_hw to match
the display reset path (Rodrigo)
Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fixes:
6ac43272768c ("drm/i915: Move init_clock_gating() back to where it was")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116160215.25715-1-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Rodrigo Vivi [Fri, 17 Nov 2017 22:47:02 +0000 (14:47 -0800)]
drm/i915: Update DRIVER_DATE to
20171117
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Fri, 17 Nov 2017 10:26:35 +0000 (10:26 +0000)]
drm/i915: Add a policy note for removing workarounds
Rodrigo gave a persuasive argument for keeping workarounds: that they
serve as a good guide for the bring up of the next generation. Not only
do workarounds persist into the early revisions, they show where the
workarounds were previously added to the code flow and sometimes the old
workarounds have an explanation that give insight into their wider
implications.
Based on his suggestion, document the policy that we want to keep the
workarounds from the current generation to guide the next. Older
preproduction workarounds we still want to remove to keep the code
clean.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117102635.8689-1-chris@chris-wilson.co.uk
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Fri, 17 Nov 2017 16:29:45 +0000 (16:29 +0000)]
drm/i915/selftests: Report ENOMEM clearly for an allocation failure
If we can not run the drunk_hole test because we couldn't allocate the
memory for the permutation array (even after we tried trimming the
size), report a clear ENOMEM. Similary, if we are asked to operate on a
hole too small for ourselves, make it skip quietly.
v2: Avoid malloc(0) since that returns ZERO_SIZE_PTR not NULL.
v3: Fixup similar construction for lowlevel_hole
v4: Use u64 >> 1 to avoid 64b div.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117101732.4335-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117162945.16390-1-chris@chris-wilson.co.uk
Radhakrishna Sripada [Fri, 17 Nov 2017 01:08:25 +0000 (17:08 -0800)]
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
This reverts commit
8f067837c4b713ce2e69be95af7b2a5eb3bd7de8.
HSD says "WA withdrawn. It was causing corruption with some images.
WA is not strictly necessary since this bug just causes loss of FBC
compression with some sizes and images, but doesn't break anything."
Fixes:
8f067837c4b7 ("drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117010825.23118-1-radhakrishna.sripada@intel.com
Maarten Lankhorst [Wed, 15 Nov 2017 16:31:57 +0000 (17:31 +0100)]
drm/i915: Calculate g4x intermediate watermarks correctly
The watermarks it should calculate against are the old optimal watermarks.
The currently active crtc watermarks are pure fiction, and are invalid in
case of a nonblocking modeset, page flip enabling/disabling planes or any
other reason.
When the crtc is disabled or during a modeset the intermediate watermarks
don't need to be programmed separately, and could be directly assigned
to the optimal watermarks.
CXSR must always be disabled in the intermediate case for modesets,
else we get a WARN for vblank wait timeout.
Also rename crtc_state to new_crtc_state, to distinguish it from the old
state.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115163157.14372-2-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Maarten Lankhorst [Wed, 15 Nov 2017 16:31:56 +0000 (17:31 +0100)]
drm/i915: Calculate vlv/chv intermediate watermarks correctly, v3.
The watermarks it should calculate against are the old optimal watermarks.
The currently active crtc watermarks are pure fiction, and are invalid in
case of a nonblocking modeset, page flip enabling/disabling planes or any
other reason.
When the crtc is disabled or during a modeset the intermediate watermarks
don't need to be programmed separately, and could be directly assigned
to the optimal watermarks.
CXSR must always be disabled in the intermediate case for modesets, else
we get a WARN for vblank wait timeout.
Also rename crtc_state to new_crtc_state, to distinguish it from the old state.
Changes since v1:
- Use intel_atomic_get_old_crtc_state. (ville)
Changes since v2:
- Always unset cxsr during modeset.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115163157.14372-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Maarten Lankhorst [Fri, 10 Nov 2017 11:35:00 +0000 (12:35 +0100)]
drm/i915: Pass crtc_state to ips toggle functions, v2
Changes since v1:
- Only pass crtc_state, not crtc.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110113503.16253-8-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Maarten Lankhorst [Fri, 10 Nov 2017 11:34:59 +0000 (12:34 +0100)]
drm/i915: Pass idle crtc_state to intel_dp_sink_crc
IPS can only be enabled if the primary plane is visible, so
first make sure sw state matches hw state by waiting for hw_done.
After this pass crtc_state to intel_dp_sink_crc() so that can be used,
instead of using legacy pointers.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110113503.16253-7-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Maarten Lankhorst [Mon, 13 Nov 2017 14:40:43 +0000 (15:40 +0100)]
drm/i915: Enable FIFO underrun reporting after initial fastset, v4.
The firmware may have set up the pipe correctly, but the FIFO
underrun and CRC interrupts are likely not enabled.
This resulted in debugfs_test.read_all_entries failing on haswell,
because of a timeout when reading the crc debugfs entry.
Solve this by enabling FIFO underrun reporting after the initial
fastset, which lets interrupts be generated as expected.
Changes since v1:
- Always enable CPU FIFO underrun reporting for >GEN2,
and handle GEN2 correctly.
Changes since v2:
- Remove unneeded HAS_DDI, simplify GEN2 case.
Changes since v3:
- Use intel_crtc_pch_transcoder to determine pch transcoder for underruns. (Ville)
- Remove crtc->config dereference in intel_crtc_pch_transcoder. (Ville)
Testcase: debugfs_test.read_all_entries
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113144043.58658-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Chris Wilson [Tue, 14 Nov 2017 17:35:20 +0000 (17:35 +0000)]
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
Commit
21cc6431e0c2 ("drm/i915: Mark the userptr invalidate workqueue
as WQ_MEM_RECLAIM") tried to fixup the check_flush_dependency warning
for hitting i915_gem_userptr_mn_invalidate_range_start from within the
shrinker, but I failed to notice userptr has 2 similarly named
workqueues. I marked up i915-userptr-acquire as WQ_MEM_RECLAIM whereas
we only wait upon i915-userptr-release from inside the reclaim paths.
[62530.869510] workqueue: PF_MEMALLOC task 7983(gem_shrink) is flushing !WQ_MEM_RECLAIM i915-userptr-release: (null)
[62530.869515] ------------[ cut here ]------------
[62530.869519] WARNING: CPU: 1 PID: 7983 at kernel/workqueue.c:2434 check_flush_dependency+0x7f/0x110
[62530.869519] Modules linked in: pegasus mii ip6table_filter ip6_tables bnep iptable_filter snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hda_core snd_hwdep kvm snd_pcm irqbypass snd_seq_midi snd_seq_midi_event snd_rawmidi crct10dif_pclmul crc32_pclmul 8250_dw ghash_clmulni_intel snd_seq pcbc snd_seq_device snd_timer btusb aesni_intel btrtl btbcm aes_x86_64 iwlwifi btintel crypto_simd glue_helper cryptd bluetooth snd intel_cstate input_leds idma64 intel_rapl_perf ecdh_generic serio_raw soundcore cfg80211 wmi_bmof virt_dma intel_lpss_pci intel_lpss acpi_als kfifo_buf industrialio winbond_cir soc_button_array rc_core spidev tpm_crb intel_hid acpi_pad mac_hid sparse_keymap
[62530.869546] parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid i915 i2c_algo_bit prime_numbers drm_kms_helper syscopyarea e1000e sysfillrect sysimgblt fb_sys_fops ahci ptp pps_core libahci drm wmi video i2c_hid hid
[62530.869557] CPU: 1 PID: 7983 Comm: gem_shrink Tainted: G U W L 4.14.0-rc8-drm-tip-ww45-commit-1342299+ #1
[62530.869558] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake H DDR4 RVP, BIOS CNLSFWR1.R00.X098.A00.
1707301945 07/30/2017
[62530.869559] task:
ffffa1049dbeec80 task.stack:
ffffae7d05c44000
[62530.869560] RIP: 0010:check_flush_dependency+0x7f/0x110
[62530.869561] RSP: 0018:
ffffae7d05c473a0 EFLAGS:
00010286
[62530.869562] RAX:
000000000000006e RBX:
ffffa1049540f400 RCX:
ffffffffa3e55788
[62530.869562] RDX:
0000000000000000 RSI:
0000000000000092 RDI:
0000000000000202
[62530.869563] RBP:
ffffae7d05c473c0 R08:
000000000000006e R09:
000000000038bb0e
[62530.869563] R10:
0000000000000000 R11:
000000000000006e R12:
ffffa1049dbeec80
[62530.869564] R13:
0000000000000000 R14:
0000000000000000 R15:
ffffae7d05c473e0
[62530.869565] FS:
00007f621b129880(0000) GS:
ffffa1050b240000(0000) knlGS:
0000000000000000
[62530.869566] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[62530.869566] CR2:
00007f6214400000 CR3:
0000000353a17003 CR4:
00000000003606e0
[62530.869567] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[62530.869567] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[62530.869568] Call Trace:
[62530.869570] flush_workqueue+0x115/0x3d0
[62530.869573] ? wake_up_process+0x15/0x20
[62530.869596] i915_gem_userptr_mn_invalidate_range_start+0x12f/0x160 [i915]
[62530.869614] ? i915_gem_userptr_mn_invalidate_range_start+0x12f/0x160 [i915]
[62530.869616] __mmu_notifier_invalidate_range_start+0x55/0x80
[62530.869618] try_to_unmap_one+0x791/0x8b0
[62530.869620] ? call_rwsem_down_read_failed+0x18/0x30
[62530.869622] rmap_walk_anon+0x10b/0x260
[62530.869624] rmap_walk+0x48/0x60
[62530.869625] try_to_unmap+0x93/0xf0
[62530.869626] ? page_remove_rmap+0x2a0/0x2a0
[62530.869627] ? page_not_mapped+0x20/0x20
[62530.869629] ? page_get_anon_vma+0x90/0x90
[62530.869630] ? invalid_mkclean_vma+0x20/0x20
[62530.869631] migrate_pages+0x946/0xaa0
[62530.869633] ? __ClearPageMovable+0x10/0x10
[62530.869635] ? isolate_freepages_block+0x3c0/0x3c0
[62530.869636] compact_zone+0x22f/0x970
[62530.869638] compact_zone_order+0xa3/0xd0
[62530.869640] try_to_compact_pages+0x1a5/0x2a0
[62530.869641] ? try_to_compact_pages+0x1a5/0x2a0
[62530.869643] __alloc_pages_direct_compact+0x50/0x110
[62530.869644] __alloc_pages_slowpath+0x4da/0xf30
[62530.869646] __alloc_pages_nodemask+0x262/0x280
[62530.869648] alloc_pages_vma+0x165/0x1e0
[62530.869649] shmem_alloc_hugepage+0xd0/0x130
[62530.869651] ? __radix_tree_insert+0x45/0x230
[62530.869652] ? __vm_enough_memory+0x29/0x130
[62530.869654] shmem_alloc_and_acct_page+0x10d/0x1e0
[62530.869655] shmem_getpage_gfp+0x426/0xc00
[62530.869657] shmem_fault+0xa0/0x1e0
[62530.869659] ? file_update_time+0x60/0x110
[62530.869660] __do_fault+0x1e/0xc0
[62530.869661] __handle_mm_fault+0xa35/0x1170
[62530.869662] handle_mm_fault+0xcc/0x1c0
[62530.869664] __do_page_fault+0x262/0x4f0
[62530.869666] do_page_fault+0x2e/0xe0
[62530.869667] page_fault+0x22/0x30
[62530.869668] RIP: 0033:0x404335
[62530.869669] RSP: 002b:
00007fff7829e420 EFLAGS:
00010216
[62530.869670] RAX:
00007f6210400000 RBX:
0000000000000004 RCX:
0000000000b80000
[62530.869670] RDX:
0000000000002e01 RSI:
0000000000008000 RDI:
0000000000000004
[62530.869671] RBP:
0000000000000019 R08:
0000000000000002 R09:
0000000000000000
[62530.869671] R10:
0000000000000559 R11:
0000000000000246 R12:
0000000008000000
[62530.869672] R13:
00000000004042f0 R14:
0000000000000004 R15:
000000000000007e
[62530.869673] Code: 00 8b b0 18 05 00 00 48 8d 8b b0 00 00 00 48 8d 90 c0 06 00 00 4d 89 f0 48 c7 c7 40 c0 c8 a3 c6 05 68 c5 e8 00 01 e8 c2 68 04 00 <0f> ff 4d 85 ed 74 18 49 8b 45 20 48 8b 70 08 8b 86 00 01 00 00
[62530.869691] ---[ end trace
01e01ad0ff5781f8 ]---
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103739
Fixes:
21cc6431e0c2 ("drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114173520.8829-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Chris Wilson [Tue, 14 Nov 2017 21:56:55 +0000 (21:56 +0000)]
drm/i915: Add might_sleep() check to wait_for()
We should long past the time of trying to use wait_for() from inside
atomic contexts, so add a might_sleep() check to prevent misuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114215655.4849-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Michel Thierry [Thu, 16 Nov 2017 22:06:31 +0000 (14:06 -0800)]
drm/i915/selftests: Add a GuC doorbells selftest
The first test aims to check guc_init_doorbell_hw, changing the existing
guc clients and doorbells state before calling it.
The second test tries to create as many clients as it is currently possible
(currently limited to max number of doorbells) and exercise the doorbell
alloc/dealloc code.
Since our usage mode require very few clients/doorbells, this code has
been exercised very lightly and it's good to have a simple test for it.
As reference, this test already helped identify the bug fixed by
commit
7f1ea2ac3017 ("drm/i915/guc: Fix doorbell id selection").
v2: Extend number of clients; check for client allocation failure when
number of doorbells is exceeded; validate client properties; reuse
guc_init_doorbell_hw (Chris).
v3: guc_init_doorbell_hw test added per Chris suggestion.
v4: Try to explain why guc_init_doorbell_hw exist and comment some
details in the subtest.
v5: Remove redundant pr_info at the beginning of each subtest (Chris);
rebase (s/i915_guc_client/intel_guc_client/).
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116220632.1909-1-michel.thierry@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rodrigo Vivi [Thu, 16 Nov 2017 20:12:28 +0000 (12:12 -0800)]
Merge tag 'gvt-next-2017-11-16' of https://github.com/intel/gvt-linux into drm-intel-next-queued
gvt-next-2017-11-16
- CSB HWSP update support (Weinan)
- GVT debug helpers, dyndbg and debugfs (Chuanxiao, Shuo)
- full virtualized opregion (Xiaolin)
- VM health check for sane fallback (Fred)
- workload submission code refactor for future enabling (Zhi)
- Updated repo URL in MAINTAINERS (Zhenyu)
- other many misc fixes
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116092007.ww5bvfx7rf36bjmn@zhen-hp.sh.intel.com
Rodrigo Vivi [Wed, 15 Nov 2017 18:42:05 +0000 (10:42 -0800)]
drm/i915/cnl: Extend HDMI 2.0 support to CNL.
Starting on GLK we support HDMI 2.0. So this patch only
extend the work Shashank has made to GLK to CNL.
v2: The version that compiles :/
v3: Invert order to newer || older platforms check. (Ville).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115184205.8104-1-rodrigo.vivi@intel.com
Rodrigo Vivi [Wed, 15 Nov 2017 18:42:57 +0000 (10:42 -0800)]
drm/i915/cnl: Simplify dco_fraction calculation.
I confess I never fully understood that previous calculation,
so this is not a "fix". But let's simplify this math
so poor brains like mine can read and make some sense of
it in the future.
v2: Don't follow the spec since that gives invalid
values and it is also confusing. This Ville's
version is much simpler.
v3: Use u64 cast instead of declaring a u64 dco. (Ville).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115184257.8633-1-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 14 Nov 2017 19:47:57 +0000 (11:47 -0800)]
drm/i915/cnl: Don't blindly replace qdiv.
Accordingly to spec "If Kdiv != 2, then Qdiv must be 1."
but we already handle qdiv values properly and this case here
should be spurious. But instead of blindly replacing let's
warn loudly instead. Because it means something was really
wrong on initial setup.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114194759.24541-6-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 14 Nov 2017 23:42:23 +0000 (15:42 -0800)]
drm/i915/cnl: Fix wrpll math for higher freqs.
Spec describe all values in MHz. We handle our
clocks in KHz. This includes the best_dco_centrality that was
forgot in the same unity as spec. Consequently we couldn't
get a good divider for high frequenies. Hence HDMI 2.0 wasn't
working.
Spec tells 999999 for initial best_dco_centrality meaning the
max value in MHz.
Since we convert dco from MHz to KHz we also need to convert
this initial best_doc_centrality to
999999000 or
999999999
or even better, to the max that its variable allow.
This patch also replaces the use of "* KHz(1)" with the values
directly on KHz to avoid future confusion.
v2: Use U32_MAX instead of random 99999 as spec tells. (Ville).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114234223.10600-1-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 14 Nov 2017 19:47:55 +0000 (11:47 -0800)]
drm/i915/cnl: Fix, simplify and unify wrpll variable sizes.
- 64 bits is not needed for afe_clock now we don't convert
that to Hz.
- 16 bits is not enough for all dco stuff.
- unsigned is not relevant/needed for all divisors values.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20171114194759.24541-4-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 14 Nov 2017 19:47:54 +0000 (11:47 -0800)]
drm/i915/cnl: Remove useless conversion.
No functional change. Just starting the wrpll fixes
with a clean-up to make units a bit more clear.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114194759.24541-3-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 14 Nov 2017 19:47:53 +0000 (11:47 -0800)]
drm/i915/cnl: Remove spurious central_freq.
"Display software must leave this field at the default value.
It no longer needs to be configured as part of PLL programming."
We respect this already and we are setting up the default
one line below: "DPLL_CFGCR1_CENTRAL_FREQ".
Also we don't touch anywhere else this central_freq for cnl.
So let's remove from the final write.
No functional change. Only a clean-up patch.
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114194759.24541-2-rodrigo.vivi@intel.com
Chris Wilson [Wed, 15 Nov 2017 15:25:58 +0000 (15:25 +0000)]
drm/i915/selftests: exercise_ggtt may have nothing to do
When operating on the live_ggtt we have to find a usuable hole for our
test. It is possible for there to be no hole we can use, so initialise
the err to 0 for the early exit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115152558.31252-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Ville Syrjälä [Wed, 15 Nov 2017 20:04:42 +0000 (22:04 +0200)]
drm/i915: Don't sanitize frame start delay if the pipe is off
Avoid touching PIPECONF in intel_sanitize_crtc() unless the pipe is
actually on. Should cure some unclaimed register accesses during reset,
as we are rather cavalier in our approach to powerdomain management.
We don't have to sanitize this if the pipe is off since we will
overwrite the frame start delay anyway when turning the pipe on.
v2: Amended commit message to implicate the reset path (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102249
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115200442.15051-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Sagar Arun Kamble [Thu, 16 Nov 2017 13:32:41 +0000 (19:02 +0530)]
drm/i915/guc: Rename i915_guc_submission.c|h to intel_guc_submission.c|h
With all component structures and functions named appropriately, change
the names of GuC submission source files. There were bunch of style issues
in guc_submission.c that are highlighted now by checkpatch. Fix those.
Update name in Documentation/gpu. (Joonas)
v2: Rebase.
v3: Rebase.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-6-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sagar Arun Kamble [Thu, 16 Nov 2017 13:32:40 +0000 (19:02 +0530)]
drm/i915/guc: Rename i915_guc_client struct to intel_guc_client
GuC submission clients are currently being used in kernel only hence
update the structure name to intel_guc_client.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-5-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sagar Arun Kamble [Thu, 16 Nov 2017 13:32:39 +0000 (19:02 +0530)]
drm/i915/guc: Update name and prototype of GuC submission interface functions
i915 GuC submission is hardware interface and GuC APIs that are not user
facing should be named intel_guc* hence we change GuC submission related
functions name prefix to intel_guc. Also changed the parameter to these
functions to intel_guc struct.
v2: Using local guc variable in intel_uc_fini_hw. (Michal Wajdeczko)
Rebase.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-4-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sagar Arun Kamble [Thu, 16 Nov 2017 13:32:38 +0000 (19:02 +0530)]
drm/i915/guc: Update names of submission related static functions
i915_guc_submit, i915_guc_dequeue, i915_guc_submission_park and
i915_guc_submission_upark are functions internal to GuC submission
hence remove "i915_" prefix.
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-3-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sagar Arun Kamble [Thu, 16 Nov 2017 13:32:37 +0000 (19:02 +0530)]
drm/i915: Update execlists tasklet naming
intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related
tasklet functions. Name them with "submission_tasklet" suffix and
remove intel/i915 prefix as they are static. Also rename irq_tasklet
as just tasklet for clarity.
v2: s/_bh/_tasklet (Chris)
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Thu, 16 Nov 2017 10:50:59 +0000 (10:50 +0000)]
drm/i915: Prevent overflow of execbuf.buffer_count and num_cliprects
We check whether the multiplies will overflow prior to calling
kmalloc_array so that we can respond with -EINVAL for the invalid user
arguments rather than treating it as an -ENOMEM that would otherwise
occur. However, as Dan Carpenter pointed out, we did an addition on the
unsigned int prior to passing to kmalloc_array where it would be
promoted to size_t for the calculation, thereby allowing it to overflow
and underallocate.
v2: buffer_count is currently limited to INT_MAX because we treat it as
signaled variable for LUT_HANDLE in eb_lookup_vma
v3: Move common checks for eb1/eb2 into the same function
v4: Put the check back for nfence*sizeof(user_fence) overflow
v5: access_ok uses ULONG_MAX but kvmalloc_array uses SIZE_MAX
v6: size_t and unsigned long are not type-equivalent on 32b
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116105059.25142-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Wed, 15 Nov 2017 12:14:58 +0000 (12:14 +0000)]
drm/i915: Clear breadcrumb node when cancelling signaling
When we call intel_engine_cancel_signaling() to stop reporting when
a request is completed via an asynchronous signal, we remove that request
from the breadcrumb wait queue. However, we may be concurrently
processing that request in the signaler itself, the actual operations on
the request's node itself are serialised but we do not actually clear the
waiter after removing it from the tree allowing both parties to attempt
to do so and corrupting the rbtree. (Previously removing from the
breadcrumb wait queue could only be done on behalf of i915_wait_request,
so this race could not happen).
Reported-by: "He, Bo" <bo.he@intel.com>
Fixes:
9eb143bbec7d ("drm/i915: Allow a request to be cancelled")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "He, Bo" <bo.he@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115121458.24655-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Mika Kuoppala [Thu, 16 Nov 2017 08:39:54 +0000 (10:39 +0200)]
drm/i915: Print the condition causing GEM_BUG_ON
It is easier to categorize and debug bugs if the failed condition
is in plain sight in the actual dmesg output. Make it so.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116083954.3357-1-mika.kuoppala@linux.intel.com
fred gao [Tue, 14 Nov 2017 09:09:35 +0000 (17:09 +0800)]
drm/i915/gvt: Move request alloc to dispatch_workload path only
Previously the performance is improved through the workload auditing
and shadowing ahead of vGPU scheduling, however, there is the case that
more requests are allocated in submit_context before the previous request
is added, the timeline will hold its seqno which is later.
This patch is to move the request alloc to dispatch_workload function,
where is the same place as request is added.
It will fix the issue of kernel BUG for (timeline->seqno != request->fence.seqno)
check when add_request.
Fixes:
89ea20b930cb ("drm/i915/gvt: Factor out scan and shadow from workload dispatch")
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Xiong Zhang [Tue, 7 Nov 2017 16:45:21 +0000 (00:45 +0800)]
drm/i915/gvt: Let each vgpu has separate opregion memory
Currently every vgpu share a common gvt opregion memory, but
it is freed at vgpu destroy, then the later vgpu doesn't have
opregion memory once the first vgpu is destroyed. This cause
guest function failure like reboot, second or later boot.
This patch allocate and init virt opregion memory for each
vgpu, so this memory could be freed at vgpu destroy.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Xiong Zhang [Mon, 6 Nov 2017 21:23:02 +0000 (05:23 +0800)]
drm/i915/gvt: Limit read hw reg to active vgpu
mmio_read_from_hw() let vgpu could read hw reg, if vgpu's workload
is running on hw, things is good. Otherwise vgpu will get other
vgpu's reg val, it is unsafe.
This patch limit such hw access to active vgpu. If vgpu isn't
running on hw, the reg read of this vgpu will get the last active
val which saved at schedule_out.
v2: ring timestamp is walking continuously even if the ring is idle.
so read hw directly. (Zhenyu)
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Zhenyu Wang [Thu, 2 Nov 2017 09:44:52 +0000 (17:44 +0800)]
Revert "drm/i915/gvt: Refine broken PPGTT scratch"
This reverts commit
b20d09886fd1b74cd2255d846029a049e524db14.
This caused windows driver boot errors for invalid page address.
Revert for now.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Thu, 2 Nov 2017 05:33:42 +0000 (13:33 +0800)]
drm/i915/gvt: Emulate PCI expansion ROM base address register
Our vGPU doesn't have a device ROM, we need follow the PCI spec to
report this info to drivers. Otherwise, we would see below errors.
Inspecting possible rom at 0xfe049000 (vd=8086:1912 bdf=00:10.0)
qemu-system-x86_64: vfio-pci: Cannot read device rom at
00000000-0000-0000-0000-
000000000001
Device option ROM contents are probably invalid (check dmesg).
Skip option ROM probe with rombar=0, or load from file with romfile=No option rom signature (got 4860)
I will also send a improvement patch to PCI subsystem related to PCI ROM.
But no idea to omit below error, since no pattern to detect vbios shadow
without touch its content.
0000:00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Thu, 2 Nov 2017 05:33:32 +0000 (13:33 +0800)]
drm/i915/gvt: Make gvt_vgpu_err use pr_err
gvt_vgpu_err means something goes wrong. We need the error propagates to
kernel message by default.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Thu, 2 Nov 2017 05:33:23 +0000 (13:33 +0800)]
drm/i915/gvt: Don't dump partial state in cmd parser
I have seen the cmd parser dump partial odd info. Stop that and only dump
the full verbose info when debug enabled.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Mon, 30 Oct 2017 06:19:15 +0000 (14:19 +0800)]
drm/i915/gvt: Reduce rcs mocs switch latency
Use I915_WRITE_FW instead of I915_WRITE to reduce overhead.
The overall mmio switch latency lowers from ~600us to ~180us.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Zhenyu Wang [Thu, 26 Oct 2017 07:29:13 +0000 (15:29 +0800)]
MAINTAINERS: Update gvt-linux.git new repo place
gvt-linux.git repo is moved for its new place under
https://github.com/intel/gvt-linux.git
Old https://github.com/01org/gvt-linux.git is set
only for redirect now.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Mon, 23 Oct 2017 03:46:44 +0000 (11:46 +0800)]
drm/i915/gvt: Add new debugfs tool mmio_diff
This new debugfs entry is used to figure out which registers of vGPU
is different to host. It is a useful tool for new platform enabling
and debugging. When read this entry, all the diff mmio are recognized
and sorted by mmio offset. Besides, the bit positions of different
value are listed in 'Diff' column. Here is a show:
$ sudo cat ./mmio_diff
Offset HW vGPU Diff
00002030 000025f8 00000000 3-8,10,13
00002034 012025f8 00000000 3-8,10,13,21,24
00002038 027fb000 00000000 12-13,15-22,25
0000203c 00003000 00000000 12-13
00002054 0000000a 00000040 1,3,6
00002074 012025f8 00000000 3-8,10,13,21,24
00002080 fffe6000 00000000 13-14,17-31
000020a8 fffffeff ffffffff 8
000020d4 00000004 00000000 2
....
00145974 eb42718c 010c11b0 2-5,13-14,17-19,22,25,27,29-31
00145978 0000002f 0000002a 0,2
0014597c 0000002f 0000002a 0,2
00145980 0000002b 00000028 0-1
00145984 a5a87c9e b27d20c0 1-4,6,10-12,14,16,18,20,22-26,28
001459c0 88390000 883c0000 16,18
00146200 88350000 883a0000 16-19
Total: 72432, Diff: 901
v3: fix a typo.
v2: add mmio_hw_access_pre/post().
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Mon, 23 Oct 2017 03:46:43 +0000 (11:46 +0800)]
drm/i915/gvt: Add mmio iterator intel_gvt_for_each_tracked_mmio()
This patch add a function intel_gvt_for_each_tracked_mmio() to
iterate each tracked mmio. The caller don't be aware of how the
tracked mmios are presented internally.
v2: remove snapshot_hw_mmio_registers().
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Xiaolin Zhang [Fri, 20 Oct 2017 13:52:29 +0000 (21:52 +0800)]
drm/i915/gvt: opregion virtualization for win guest
this is an enhanced opregion emulation for win guest support
by initializing more data members including opregion header
size, version and child device propertity for display port.
for simplicity, redefined child_device_config structure.
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Weinan Li [Fri, 20 Oct 2017 07:16:46 +0000 (15:16 +0800)]
drm/i915/gvt: update CSB and CSB write pointer in virtual HWSP
The engine provides a mirror of the CSB and CSB write pointer in the HWSP.
Read these status from virtual HWSP in VM can reduce CPU utilization while
applications have much more short GPU workloads. Here we update the
corresponding data in virtual HWSP as it in virtual MMIO.
Before read these status from HWSP in GVT-g VM, please ensure the host
support it by checking the BIT(3) of caps in PVINFO.
Virtual HWSP only support GEN8+ platform, since the HWSP MMIO may change
follow the platform update, please add the corresponding MMIO emulation
when enable new platforms in GVT-g.
v3 : Add address audit in HWSP address update.
v4 :
Separate this patch with enalbe virtual HWSP in VM.
Use intel_gvt_render_mmio_to_ring_id() to determine ring_id by offset.
v5 : Remove unnessary check about Gen8, GVT-g only support Gen8+.
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Zhi Wang [Mon, 16 Oct 2017 18:00:27 +0000 (02:00 +0800)]
drm/i915/gvt: Refine broken PPGTT scratch
Refine previously broken PPGTT scratch. Scratch PTE was no correctly
handled and also the handling of scratch entries in page table walk was
not well organized, which brings gaps of introducing lazy shadow.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Tue, 10 Oct 2017 09:24:26 +0000 (17:24 +0800)]
drm/i915/gvt: Introduce ops->set_present()
We need ops->set_present() during generating a new scratch page table
entry.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Tue, 10 Oct 2017 09:19:30 +0000 (17:19 +0800)]
drm/i915/gvt: Introduce page table type of current level in GTT type enumerations
Need to figure out page table type of current level by GTT entry type
during getting a scratch page table entry.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Tue, 10 Oct 2017 09:14:13 +0000 (17:14 +0800)]
drm/i915/gvt: Fix a bug of unexpectedly clear scratch page table
During a vGPU reset, the scratch page table shouldn't be cleared, what
needs to be cleared should be the scratch page.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Tue, 10 Oct 2017 06:34:11 +0000 (14:34 +0800)]
drm/i915/gvt: Let the caller choose if a shadow page should be put into hash table
As we want to re-use intel_vgpu_shadow_page in buidling scrach page table
and we don't want to put scrach page table page into hash table, a new
param is introduced to give the caller a choice to decide if a shadow page
should be put into hash table.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Tue, 10 Oct 2017 05:51:32 +0000 (13:51 +0800)]
drm/i915/gvt: Use I915_GTT_PAGE_SIZE
As there is already an I915_GTT_PAGE_SIZE marco in i915, let GVT-g use it
as well. Also this patch re-names some GTT marcos with additional prefix.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Sat, 30 Sep 2017 09:42:20 +0000 (17:42 +0800)]
drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
Since many emulation logic needs to convert the offset of ring registers
into ring id, we export it for other caller which might need it.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Zhi Wang [Thu, 28 Sep 2017 18:47:55 +0000 (02:47 +0800)]
drm/i915/gvt: Factor intel_vgpu_page_track
As the data structure of "intel_vgpu_guest_page" will become much heavier
in future, it's better to factor out the guest memory page track mechnisim
as early as possible.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>