Ilia Mirkin [Thu, 4 Jul 2019 18:41:00 +0000 (14:41 -0400)]
gallium: remove boolean from state tracker APIs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Ilia Mirkin [Thu, 4 Jul 2019 15:41:41 +0000 (11:41 -0400)]
gallium: switch boolean -> bool at the interface definitions
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Dave Airlie [Tue, 23 Jul 2019 00:40:05 +0000 (10:40 +1000)]
st/nir: fix arb fragment stage conversion
The comment even justifies the wrongness wrongly.
We should be translating to pipe values properly here or else
fragment maps to tess ctrl.
Fixes:
3d7611e9a6c ("st/nir: use NIR for asm programs")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Mon, 22 Jul 2019 19:59:49 +0000 (15:59 -0400)]
radeonsi: fix warning: ‘ret’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Mon, 22 Jul 2019 19:59:22 +0000 (15:59 -0400)]
tgsi: fix warning: ‘interp’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Mon, 22 Jul 2019 19:58:58 +0000 (15:58 -0400)]
gallivm: fix warning: ‘op’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Mon, 22 Jul 2019 23:53:32 +0000 (16:53 -0700)]
iris: Support storage images that have matching typed formats for reads
Even if we don't directly support typed reads on a format, we can often
translate them to a reasonable matching format. Advertise those too.
Kenneth Graunke [Mon, 22 Jul 2019 23:18:40 +0000 (16:18 -0700)]
iris: Stop advertising MSAA storage images by mistake
st_extensions.c sets const->MaxImageSamples (GL_MAX_IMAGE_SAMPLES) by
looping over [16, 15, .. 1x] MSAA modes, and RGBA/BGRA/ARGB/ABGR 8888
color formats, calling pipe->is_format_supported() for each, with
the usage set to PIPE_BIND_SHADER_IMAGE. If any are supported, it
selects that number of samples.
We were checking if sample_count <= 1, which meant that we were getting
a value of 1x MSAA, rather than the expected 0x (feature doesn't exist).
But, only on Icelake because Gen11 adds support for typed read messages
for R8G8B8A8_UNORM. The lack of typed read messages for these formats
was tricking the check on Gen9 to say no correctly. This caused some
Icelake conformance failures, because we don't implement this feature.
Just check for sample_count == 0 instead.
Kenneth Graunke [Thu, 18 Jul 2019 00:03:17 +0000 (17:03 -0700)]
egl: Only expose 565 pbuffer configs if X can export them as DRI3 images
Glamor in xorg-server 1.20 cannot expose 16bpp pixmaps when running in
the usual 24bpp mode. This meant our 565 pbuffer configs would
ultimately fail to create a backing pixmap, leading to crashes.
To hack around this, make a 16bpp pixmap and try and export it.
If it works, expose the configs. Otherwise, just skip them.
This also disables them on DRI2. These configs were only added to pass
conformance requirements, and I doubt anybody cares about testing out
565 pbuffer visuals on DRI2-only drivers.
v2: Don't leak the fds (caught by Eric Anholt)
v3: Don't free(fds), it's not malloc'd
Fixes:
dacb11a585f ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 23 Mar 2019 01:00:55 +0000 (18:00 -0700)]
egl: Make the 565 pbuffer-only config single buffered.
In commit
dacb11a585face5ca179c34cfc588a71a425c1e0, Eric found the first
matching 565 pbuffer config, and stopped. Our double-buffered configs
come first in the list, so we added that, making a pbuffer-only config
that claimed to be double buffered. This doesn't make sense, since
pixmaps/pbuffers are fundamentally not double buffered.
When using that config, every call to eglCreatePbufferSurface would fail
with EGL_BAD_MATCH. The call chain looks like this:
- eglCreatePbufferSurface
- dri3_create_pbuffer_surface
- dri3_create_surface
- dri2_get_dri_config
which eventually does:
const bool double_buffer = surface_type == EGL_WINDOW_BIT;
and then fails to find a matching config, because it ends up looking
for a single-buffered config - and there aren't any.
To fix this, make the 565 pbuffer config single-buffered. This fixes
at least 51 dEQP-EGL.* tests.
Fixes:
dacb11a585f ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 28 Jun 2019 17:11:01 +0000 (10:11 -0700)]
egl: Quiet warning about front buffer rendering for pixmaps/pbuffers
pbuffer configs cause a million of these warnings to trigger, but
when using pixmaps or buffers, there is only one surface, so this
warning doesn't make much sense. Retain it for window surfaces for now.
Fixes:
dacb11a585f ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 28 Jun 2019 19:56:38 +0000 (12:56 -0700)]
mesa: Fix ReadBuffers with pbuffers
pbuffers are internally single-buffered. Marek fixed DrawBuffers to
handle this case, but we need to fix ReadBuffers too. Otherwise,
pretty much every conformance test fails because glReadPixels breaks.
v2: Refactor the switch into a helper (suggested by Eric Anholt)
Fixes:
35294f2eca8 ("mesa: fix pbuffers because internally they are front buffers")
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Marek Olšák [Mon, 22 Jul 2019 19:28:42 +0000 (15:28 -0400)]
mesa: fix assertion failure in TexImage
Check the assertion after error checking.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111194
Fixes:
9dd1f7cec01 ("mesa: pass gl_texture_object as arg to not depend on state")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Mon, 22 Jul 2019 05:51:24 +0000 (00:51 -0500)]
nir: Remove a bunch of large stack arrays
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 22 Jul 2019 05:28:27 +0000 (00:28 -0500)]
intel/fs: Stop stack allocating large arrays
Normally, we haven't worried too much about stack sizes as Linux tends
to be fairly friendly towards large stacks. However, when running DXVK
apps under wine, we're suddenly subject to Windows' more stringent stack
limitations and can run out of space more easily. In particular, some
of the shaders in Elite Dangerous: Horizons have quite a few registers
and the arrays in split_virtual_grfs are large enough to blow a 1 MiB
stack leading to crashes during shader compilation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Nataraj Deshpande [Fri, 19 Jul 2019 15:44:13 +0000 (08:44 -0700)]
egl/android: Update color_buffers querying for buffer age
color_buffers[] is currently hard coded to 3 for android which fails
in droid_window_dequeue_buffer when ANativeWindow creates color_buffers
>3 while querying buffer age during dEQP partial_update tests on chromeOS.
The patch removes static color_buffers[], queries for MIN_UNDEQUEUED_BUFFERS,
sets native window buffer count and allocates the correct number of
color_buffers as per android.
Fixes dEQP-EGL.functional.partial_update* tests on chromebooks with
enabling EGL_KHR_partial_update.
v2: update comment instead of removing (Eric Engestrom)
v3: change static array to dynamic allocated color_buffers
querying MIN_UNDEQUEUED_BUFFERS (Chia-I Wu olv@chromium.org)
Fixes:
2acc69da8ce "EGL/Android: Add EGL_EXT_buffer_age extension"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Caio Marcelo de Oliveira Filho [Fri, 19 Jul 2019 17:34:53 +0000 (10:34 -0700)]
intel/compiler: Use nir_opt_conditional_discard
anv vkpipeline-db results for SKL:
total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
instructions in affected programs: 396452 -> 385272 (-2.82%)
helped: 2062
HURT: 1
total cycles in shared programs:
1458144669 ->
1458105320 (<.01%)
cycles in affected programs: 4171830 -> 4132481 (-0.94%)
helped: 1874
HURT: 180
total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 8745 -> 8748 (0.03%)
spills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1
total fills in shared programs: 23392 -> 23395 (0.01%)
fills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1
LOST: 0
GAINED: 1
No changes to shader-db on i965 or iris. The glsl compiler already
does a similar optimization.
Improvement suggested by Daniel Schürmann.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alyssa Rosenzweig [Mon, 22 Jul 2019 15:34:26 +0000 (08:34 -0700)]
pan/decode: Disable magic divisor debugging
Memory corruption (for both legitimate and illegitimate reasons) causes
this to hang pantrace.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 22 Jul 2019 13:32:48 +0000 (06:32 -0700)]
pan/midgard: Report spills:fills to shader-db
Route this info through so we can track how we're doing on register
spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 23:27:39 +0000 (16:27 -0700)]
panfrost/midgard: Reenable pipeline register creation
This was disabled to permit regression-free RA work. Now that the spill
code is in place, we can reenable, with some caveats about efficacy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 23:23:52 +0000 (16:23 -0700)]
panfrost/midgard: Report tls_size
Pipe through the number of bytes of spilled memory used from the
compiler into the main driver, where it will be used to allocate the
Thread Local Storage buffer.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 23:11:27 +0000 (16:11 -0700)]
panfrost: Set `initialized` in more cases
Indirect linear writes were not being marked as initialized, causing the
back blit to be dropped, breaking the listed tests.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 22:53:25 +0000 (15:53 -0700)]
panfrost/ci: Update expectations
We've fixed some shader tests.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 22:51:51 +0000 (15:51 -0700)]
panfrost/midgard: Promote to *move*, not rewrite for non-SSA
Fixes promoted uniform loads to registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 22:38:49 +0000 (15:38 -0700)]
panfrost/midgard: Dump MIR of RA failure
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 21:21:35 +0000 (14:21 -0700)]
pan/midgard; Dump successor graph when printing MIR
We just use the pointers of the midgard_block*, which is crude, but it
gets the point across and will help debug successor related issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 21:20:43 +0000 (14:20 -0700)]
pan/midgard: Remove debug statement
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 20:21:11 +0000 (13:21 -0700)]
panfrost/midgard: Implement register spilling
Now that we run RA in a loop, before each iteration after a failed
allocation we choose a spill node and spill it to Thread Local Storage
using st_int4/ld_int4 instructions (for spills and fills respectively).
This allows us to compile complex shaders that normally would not fit
within the 16 work register limits, although it comes at a fairly steep
performance penalty.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 19:14:43 +0000 (12:14 -0700)]
panfrost/midgard: Add mir_has_arg helper
Helps scan the MIR for uses of an index.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 19:11:09 +0000 (12:11 -0700)]
panfrost/midgard: Check write-before-read in liveness analysis
If we write to an index before reading it, the old copy we're checking
liveness for isn't live in this block, even if it does get read later.
Fixes abnormally high register pressure in shaders with loops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 18:02:56 +0000 (11:02 -0700)]
panfrost/midgard/disasm: Check for certain tag errors
Midgard bundles contain a tag, as well as a copy of the tag of the next
bundle to facilitate prefetch. Do some simple static analysis to detect
certain tag errors (particularly on shaders without branching).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 17:51:08 +0000 (10:51 -0700)]
pan/midgard: Add OP_IS_CSEL helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 17:50:34 +0000 (10:50 -0700)]
pan/midgard: Add mir_rewrite_index_src_single helper
Rather than rewriting an index away across the whole block, we expose
finer (per-instruction) granularity for rewrites.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 16:11:56 +0000 (09:11 -0700)]
pan/midgard: Ignore inline_constant in liveness
It doesn't make any sense to look at it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 19 Jul 2019 14:50:48 +0000 (07:50 -0700)]
panfrost/midgard: Implement load/store scratch opcodes
These are used to load/store from Thread Local Storage, which is memory
allocated per-thread (corresponding to ctx->scratchpad in the command
stream) and used for register spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 17 Jul 2019 20:11:26 +0000 (13:11 -0700)]
pan/midg/disasm: Check for int varying ops
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 16 Jul 2019 21:40:32 +0000 (14:40 -0700)]
pan/midgard: Remove "aliasing"
It was a crazy idea that didn't pan out. We're better served by a good
copyprop pass. It's also unused now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 16 Jul 2019 21:10:08 +0000 (14:10 -0700)]
panfrost: Promote uniform registers late
Rather than creating either a load or a uniform register read with a
fixed beginning offset, we always create a load and then promote to a
uniform register later. This will allow us to promote in a register
pressure aware manner.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 16 Jul 2019 16:45:11 +0000 (09:45 -0700)]
pan/midgard: Call scheduler/RA in a loop
This will allow us to insert instructions as a result of register
allocation, permitting spilling to be implemented. As a side effect,
with the assert commented out this would fix a bunch of glamor crashes
(due to RA failures) so MATE becomes useable.
Ideally we'll have scheduling or RA actually sorted out before the
branch point but if not this gives us a one-line out to get X working...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 16 Jul 2019 16:16:39 +0000 (09:16 -0700)]
pan/midgard: Remove custom register selection callback
What we have is equivalent to the default callback; let's use that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Samuel Pitoiset [Mon, 22 Jul 2019 08:12:48 +0000 (10:12 +0200)]
radv: fix crash in vkCmdClearAttachments with unused attachment
depth_stencil_attachment and/or ds_resolve attachment can be NULL.
This fixes crashes with
dEQP-VK.renderpass.suballocation.unused_clear_attachments.*
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Sergii Romantsov [Wed, 17 Jul 2019 15:59:28 +0000 (18:59 +0300)]
i965: free object labels when deleting
Some leaks detected with GL_KHR_debug on i965.
CC: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:33 +0000 (15:51 +0200)]
radv/gfx10: update descriptors for inline uniform blocks
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:32 +0000 (15:51 +0200)]
radv/gfx10: emit the GS NGG prologue before the nested barrier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:31 +0000 (15:51 +0200)]
radv/gfx10: do not allocate space for the ZPASS_DONE bug
GFX10 isn't affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:30 +0000 (15:51 +0200)]
radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors
This field doesn't exist.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:29 +0000 (15:51 +0200)]
radv: clean up fill_geom_tess_rings()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:28 +0000 (15:51 +0200)]
radv: change a bunch of >= GFX9 to == GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 18 Jul 2019 13:51:27 +0000 (15:51 +0200)]
ac/nir: do not clamp shadow reference on GFX10
RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures
on GFX10 and RADV doesn't promotes Z16 or Z24.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Sat, 20 Jul 2019 17:21:14 +0000 (19:21 +0200)]
radv: move nir_opt_conditional_discard out of optimization loop
This late optimization pass is only affected by nir_opt_if() and handles all cases
in a single pass. It's enough to call it once after the optimization loop.
No changes on vkpipeline-db.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Iago Toral Quiroga [Fri, 19 Jul 2019 07:54:54 +0000 (09:54 +0200)]
v3d: fill logicop_func in the fragment shader key when precompiling shaders
Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng
of logic ops on precompiled shaders, which we don't want to do. Also, this
had the side effect of making shader-db crash, as during this lowering we
would try to read the color format swizzle information from the fragment shader
key that we don't populate in precompiled shaders because right now we only
need it when logic operations are enabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jose Maria Casanova Crespo [Tue, 9 Jul 2019 17:23:25 +0000 (19:23 +0200)]
v3d: Avoid scheduling an instruction that stalls waiting for SFU retval
If we detect that a scheduling candidate will stall because having a
register source that is the written by the SFU unit in the previous
instruction we reduce its priority so any non stalling operation would
be chosen.
The latency of SFU operations is defined as 2. So they would be scheduled
earlier if other candidates have the same priority.
Finally we won't merge instructions that stall to a previously chosen one.
As the result of the previous one would be waiting for an extra cycle.
Although shader-db result show that instruction are hurt with an increase
of 0.35% the sum of instructions + stalls is reduced a 0.52%. And
the total of sfu-stalls is reduced a 63.51%. It implies also a small
increase in the max-temps metric because of scheduling earlier SFU
operations.
total instructions in shared programs: 9102719 -> 9117851 (0.17%)
instructions in affected programs: 4324628 -> 4339760 (0.35%)
helped: 4162
HURT: 12128
helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1
HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
95% mean confidence interval for instructions value: 0.90 0.96
95% mean confidence interval for instructions %-change: 0.47% 0.50%
Instructions are HURT.
total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
max-temps in affected programs: 4730 -> 4814 (1.78%)
helped: 61
HURT: 134
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
95% mean confidence interval for max-temps value: 0.28 0.58
95% mean confidence interval for max-temps %-change: 1.80% 3.52%
Max-temps are HURT.
total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
helped: 25882
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
95% mean confidence interval for sfu-stalls value: -2.47 -2.42
95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
helped: 22728
HURT: 855
helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
Inst-and-stalls are helped.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Jose Maria Casanova Crespo [Tue, 2 Jul 2019 16:31:09 +0000 (18:31 +0200)]
v3d: add shader-db stat to count SFU stalls
SFU operations have a latency of 2 cicles, so if their results
are used in the following cycle to a SFU instruction, the GPU
stalls for an extra cycle until the result is available.
This adds the number of stalls to the shader-db debug mode and
sum of instruction + stalls to evaluate optimizations to schedule
instructions that avoid generating sfu-stalls.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Mon, 8 Jul 2019 16:14:28 +0000 (17:14 +0100)]
radv: replace memset()+strcpy() with snprintf()
Just like the next line :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Mon, 8 Jul 2019 16:12:39 +0000 (17:12 +0100)]
radv: drop unnecessary memset() before snprintf()
snprintf() always terminates the string.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Thu, 18 Jul 2019 22:00:03 +0000 (00:00 +0200)]
radv: Fix uninitialized warning.
For es_vgpr_comp_cnt.
Fixes:
795adbbadd4 "radv/gfx10: Add pipeline state support for tess."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Chia-I Wu [Tue, 16 Jul 2019 23:48:03 +0000 (16:48 -0700)]
virgl: fix a sync issue in virgl_buffer_transfer_extend
In virgl_buffer_transfer_extend, when no flush is needed, it tries
to extend a previously queued transfer instead if it can find one.
Comparing to virgl_resource_transfer_prepare, it fails to check if
the resource is busy.
The existence of a previously queued transfer normally implies that
the resource is not busy, maybe except for when the transfer is
PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a
lengthy comment, and potential concerns over breaking it as the
transfer code evolves, this commit makes the valid_buffer_range
check the only condition to take the fast path.
In real world, we hit the fast path almost only because of the
valid_buffer_range check. In micro benchmarks, the condition should
always be true, otherwise the benchmarks are not very representative
of meaningful workloads. I think this fix is justified.
The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the
fast path. This commit re-enables it as well.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Chia-I Wu [Wed, 17 Jul 2019 00:11:55 +0000 (17:11 -0700)]
virgl: rework virgl_transfer_queue_extend
Do not take a transfer and do the memcpy. Add a _buffer suffix to
the function name to make it clear that it is only for buffers.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Chia-I Wu [Wed, 10 Jul 2019 07:33:29 +0000 (00:33 -0700)]
virgl: fix virgl_buffer_transfer_extend
Without setting hw_res, virgl_transfer_queue_extend never finds a
match and always returns NULL.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Marek Olšák [Wed, 17 Jul 2019 03:17:38 +0000 (23:17 -0400)]
radeonsi: initialize scissor registers etc. without clear state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Wed, 17 Jul 2019 01:30:57 +0000 (21:30 -0400)]
radeonsi: return success from vi_dcc_clear_level to simplify callers
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 23:04:48 +0000 (19:04 -0400)]
radeonsi: fix compute-based culling regression in
1ce52c1e373
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 17:23:17 +0000 (13:23 -0400)]
radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 19:55:33 +0000 (15:55 -0400)]
radeonsi/gfx10: enable Wave32 for vertex, geometry, and tessellation shaders
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 19:45:33 +0000 (15:45 -0400)]
radeonsi/gfx10: add debug options to enable/disable Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 23:49:30 +0000 (19:49 -0400)]
radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:37:29 +0000 (17:37 -0400)]
radeonsi/gfx10: implement Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 04:55:46 +0000 (00:55 -0400)]
radeonsi/gfx10: use 32-bit wavemasks for Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:35:39 +0000 (17:35 -0400)]
ac: create the LLVM builder in ac_llvm_context_init
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:32:18 +0000 (17:32 -0400)]
ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:20:36 +0000 (17:20 -0400)]
ac/rtld: add support for Wave32
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:14:11 +0000 (17:14 -0400)]
ac: add Wave32 LLVM target machine
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:12:17 +0000 (17:12 -0400)]
ac: initial Wave32 support in LLVM build helpers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 02:00:05 +0000 (22:00 -0400)]
radeonsi: assume that selector != NULL for compute shaders
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 01:55:43 +0000 (21:55 -0400)]
radeonsi: remove what appears to be legacy compute code
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 01:49:30 +0000 (21:49 -0400)]
radeonsi: remove si_program::use_code_object_v2
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 01:39:22 +0000 (21:39 -0400)]
radeonsi: add si_shader_selector into si_compute
Now we can assume that shader->selector is always set.
This will simplify some code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:22:30 +0000 (17:22 -0400)]
radeonsi: set threadgroup size to 0 for threadgroups with only 1 wave
This has no effect on Wave64.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 21:26:24 +0000 (17:26 -0400)]
radeonsi/gfx10: set as_ngg for GS prolog
as_ngg is required by Wave32.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 19:31:14 +0000 (15:31 -0400)]
radeonsi/gfx10: remove the disable_ngg option
because legacy VS hangs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 04:12:26 +0000 (00:12 -0400)]
radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 04:11:36 +0000 (00:11 -0400)]
radeonsi/gfx10: deduplicate code for esvert_lds_size
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 03:32:36 +0000 (23:32 -0400)]
radeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogue
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 03:22:33 +0000 (23:22 -0400)]
radeonsi/gfx10: don't use MALLOC for outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 02:19:47 +0000 (22:19 -0400)]
radeonsi/gfx10: clean up ESGS ring size computation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 02:12:36 +0000 (22:12 -0400)]
radeonsi/gfx10: fix unnecessary LDS overallocation for NGG GS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 01:19:41 +0000 (21:19 -0400)]
radeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 6 Jul 2019 01:06:04 +0000 (21:06 -0400)]
radeonsi/gfx10: set GE_CTNL.PACKET_TO_ONE_PA for NGG
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 5 Jul 2019 21:53:47 +0000 (17:53 -0400)]
radeonsi/gfx10: update a tunable max_es_verts_base for NGG
We have to fix the computation so as not to break quads.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 5 Jul 2019 21:30:08 +0000 (17:30 -0400)]
radeonsi/gfx10: implement ARB_post_depth_coverage
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 04:08:27 +0000 (00:08 -0400)]
radeonsi: fix leaked compute shader NIR
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 12 Jul 2019 19:42:44 +0000 (15:42 -0400)]
radeonsi: save the enable_nir option in the shader cache correctly
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Sat, 13 Jul 2019 00:21:01 +0000 (20:21 -0400)]
radeonsi/gfx10: enable SDMA
no changes since gfx9 for buffers
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 05:07:49 +0000 (01:07 -0400)]
ac: use llvm.amdgcn.writelane
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 16 Jul 2019 03:42:35 +0000 (23:42 -0400)]
ac: fix shader clock on LLVM 9
Probably relevant commit:
commit
dd32dc3f72ec99b1794d62c74d2beb3b60468d50
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date: Tue Jul 9 03:10:18 2019 +0000
[AMDGPU] Always use s_memtime for readcyclecounter
Differential Revision: https://reviews.llvm.org/D64369
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365431
91177308-0d34-0410-b5e6-
96231b3b80d8
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Boyuan Zhang [Wed, 15 May 2019 19:05:21 +0000 (15:05 -0400)]
radeon/vcn: adding engine type for new fw interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Mon, 8 Jul 2019 18:57:42 +0000 (14:57 -0400)]
radeonsi: use the correct buffer size in si_vid_clear_buffer
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Pierre-Eric Pelloux-Prayer [Fri, 26 Apr 2019 14:50:57 +0000 (16:50 +0200)]
mesa: add EXT_dsa glEnabledIndexedEXT
The implementation uses _mesa_ActiveTexture to change the active texture unit and
then reset it.
It causes an unnecessary _NEW_TEXTURE_STATE but:
- adding an index argument to _mesa_set_enable causes a lot of changes (~140 callers)
- enable_texture (called by _mesa_set_enable) might cause a _NEW_TEXTURE_STATE
anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Mon, 20 May 2019 12:12:54 +0000 (14:12 +0200)]
mesa: add EXT_dsa glGetTextureLevelParameter*vEXT functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Fri, 26 Apr 2019 14:50:31 +0000 (16:50 +0200)]
mesa: add EXT_dsa gl(Copy)Texture(Sub)Image1D/2D/3DEXT functions
Added functions:
- glTextureImage1DEXT
- glTextureImage2DEXT
- glTextureImage3DEXT
- glTextureSubImage1DEXT
- glTextureSubImage3DEXT
- glCopyTextureImage1DEXT
- glCopyTextureImage2DEXT
- glCopyTextureSubImage1DEXT
- glCopyTextureSubImage2DEXT
- glCopyTextureSubImage3DEXT
- glGetTextureImageEXT
All but the last one can be compiled in a display list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>