Alyssa Rosenzweig [Wed, 15 May 2019 01:19:22 +0000 (01:19 +0000)]
panfrost/midgard: Remove imov workaround
The previous commit fixes the issue this patched around.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 15 May 2019 01:16:51 +0000 (01:16 +0000)]
panfrost/midgard: Set int outmod for ops writing integers
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).
Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".
This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Tue, 14 May 2019 23:18:18 +0000 (23:18 +0000)]
panfrost: Set custom stride for textures when necessary
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.
Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with bufer reloading.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Tue, 14 May 2019 22:42:47 +0000 (22:42 +0000)]
panfrost/decode: Stride decoding
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Tue, 14 May 2019 22:21:39 +0000 (22:21 +0000)]
panfrost/decode: Futureproof texture dumping
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.
Also, extra fields might be snuck in the bitmaps array (it's
variable-lengthed at the end), and we want to guard against that
possibility, so we dump a little more.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Marek Olšák [Wed, 15 May 2019 02:16:20 +0000 (22:16 -0400)]
ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>
We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.
It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
Marek Olšák [Wed, 15 May 2019 01:50:59 +0000 (21:50 -0400)]
ac: add comments to chip enums
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (except GFX2 changes)
Reviewed-by: Dave Airlie <airlied@redhat.com> (except <= GFX5 changes)
Anuj Phogat [Fri, 10 May 2019 20:22:31 +0000 (13:22 -0700)]
compiler: Add lowering support for 64-bit saturate operations to software
Fixes 7 Khronos GL CTS tests:
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4}
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4}
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 15 May 2019 21:49:14 +0000 (14:49 -0700)]
st/dri: Minor style fixes
Trivial.
Chia-I Wu [Sat, 11 May 2019 04:23:12 +0000 (21:23 -0700)]
virgl: handle DONT_BLOCK and MAP_DIRECTLY
Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY.
Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., instruct the callers to map
differently).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 9 May 2019 20:27:34 +0000 (13:27 -0700)]
virgl: add virgl_resource_transfer_prepare
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource. It does flush, readback, and wait as needed.
virgl_res_needs_flush and virgl_res_needs_readback become internal
helpers to the new function.
There should be no externally visible change.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Tue, 14 May 2019 17:14:03 +0000 (10:14 -0700)]
virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Fri, 10 May 2019 03:40:41 +0000 (20:40 -0700)]
virgl: clean up virgl_res_needs_readback
Add comments and follow the coding style of virgl_res_needs_flush.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Lionel Landwerlin [Wed, 15 May 2019 18:09:36 +0000 (19:09 +0100)]
nir: fix lower_non_uniform_access pass
Obviously missing the instruction insertion into the SSA list.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes:
3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alex Villacís Lasso [Mon, 13 May 2019 01:34:28 +0000 (20:34 -0500)]
gbm: gbm_bo_get_handle_for_plane fallback to nonplanar handle
Commit
f9567ab435217a72cbae628336ead84dc0b2a803 (gbm: Export a getter for per
plane handles) contains an API version check that fails on i915 (API version 7
vs. check for minimum API version 13). Any client that migrates to the planar
API will start failing on i915 (see https://gitlab.gnome.org/GNOME/mutter/issues/127
for mutter, and https://bugs.freedesktop.org/show_bug.cgi?id=108487 for weston).
This commit adds a fallback for plane 0 when the API check fails and returns the
non-planar handle in this scenario, making the call equivalent to
gbm_bo_get_handle(). This is enough for weston 6.0.0 to start working again on an
i915 system.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108487
Signed-off-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Alyssa Rosenzweig [Wed, 15 May 2019 04:32:51 +0000 (04:32 +0000)]
gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK
Fixes:
c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Andrii Kryvytskyi [Mon, 13 May 2019 12:19:03 +0000 (15:19 +0300)]
iris: Check if resource has stencil before returning it
Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Fri, 10 May 2019 03:55:36 +0000 (20:55 -0700)]
i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer
v2:
* Add build error for gen > 6 if MOCS is not set. (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Sat, 11 May 2019 01:34:25 +0000 (18:34 -0700)]
iris: Enable fragment shader interlock on Gen9+.
There's some debate about whether we should support this on older
hardware as well. Currently i965 turns it off on Gen8- though, so
we follow suit. If this changes, we can update this as well.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kenneth Graunke [Sat, 11 May 2019 01:32:46 +0000 (18:32 -0700)]
gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK.
Corresponding to GL_ARB_fragment_shader_interlock and
GL_NV_fragment_shader_interlock. Currently, only the NIR paths
support this functionality, but someone could conceivably add it
to TGSI too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Fri, 3 May 2019 00:17:54 +0000 (10:17 +1000)]
intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2)
In the future I want to expand this to 128-bits, for vec16 support, so
lets just put the code in place to use bitset ranges now.
v2: just declare the bitset to be the max of what we should ever see
and change assert to reflect it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Dave Airlie [Fri, 3 May 2019 00:15:07 +0000 (10:15 +1000)]
intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass.
Just use a variable already.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 3 May 2019 21:57:54 +0000 (14:57 -0700)]
intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.
- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
channel corresponding to a different patch vertex. PATCHLIST_N will
launch (N / 8) threads. If N is less than 8, some channels will be
disabled, leaving some untapped hardware capabilities. Conditionals
based on gl_InvocationID are non-uniform, which means that they'll
often have to execute both paths. However, if there are fewer than
8 vertices, all invocations will happen within a single thread, so
barriers can become no-ops, which is nice. We also burn a maximum
of 4 registers for ICP handles, so we can compile without regard for
the value of N. It also works in all cases.
- DUAL_PATCH mode processes up to two patches at a time, where the first
four channels come from patch 1, and the second group of four come
from patch 2. This tries to provide better EU utilization for small
patches (N <= 4). It cannot be used in all cases.
- 8_PATCH mode processes 8 patches at a time, with a thread launched per
vertex in the patch. Each channel corresponds to the same vertex, but
in each of the 8 patches. This utilizes all channels even for small
patches. It also makes conditions on gl_InvocationID uniform, leading
to proper jumps. Barriers, unfortunately, become real. Worse, for
PATCHLIST_N, the thread payload burns N registers for ICP handles.
This can burn up to 32 registers, or 1/4 of our register file, for
URB handles. For Vulkan (and DX), we know the number of vertices at
compile time, so we can limit the amount of waste. In GL, the patch
dimension is dynamic state, so we either would have to waste all 32
(not reasonable) or guess (badly) and recompile. This is unfortunate.
Because we can only spawn 16 thread instances, we can only use this
mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH.
This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes. We may
want to consider using 8_PATCH mode in Vulkan in some cases.
The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases. Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 3 May 2019 21:37:11 +0000 (14:37 -0700)]
intel/compiler: Move ICP handle fetching into a helper function.
This will be significantly different in 8_PATCH mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 3 May 2019 21:28:51 +0000 (14:28 -0700)]
intel/compiler: Don't repeat dispatch max fixing condition
Having a single flag will keep both places in sync if the condition
gets more complicated.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 3 May 2019 21:24:49 +0000 (14:24 -0700)]
intel/compiler: Rename invocation_id_mask to instance_id_mask
The payload field is actually "instance" (thread number), which is used
to calculate the invocation ID.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 3 May 2019 21:20:00 +0000 (14:20 -0700)]
intel/compiler: Refactor TCS invocation ID setup into a helper
When we add 8_PATCH mode, this will get a bit more complex, so we may
as well start by putting it in a helper function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 7 May 2019 01:24:08 +0000 (18:24 -0700)]
i965: Pass compiler to default key populators
This lets us get devinfo and other misc. compiler settings.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Fri, 10 May 2019 00:58:21 +0000 (20:58 -0400)]
ac: use 1D GEPs for descriptors and constants
just a cleanup
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Mon, 13 May 2019 21:03:58 +0000 (17:03 -0400)]
mesa: fix _mesa_max_texture_levels for GL_TEXTURE_EXTERNAL_OES
This helps fix:
piglit/bin/ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
Fixes:
d88f3392fff7c6342f3840c4bd8195a1296c2372
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Mon, 13 May 2019 22:31:17 +0000 (15:31 -0700)]
freedreno: Restore msm_drm.h to a pristine "make headers_install" copy.
This diverged back in
f1374805a86d ("drm-uapi: use local files, not system
libdrm") to point at drm-uapi's copy, which we don't need now that we're
actually in drm-uapi.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Eric Anholt [Mon, 13 May 2019 22:21:06 +0000 (15:21 -0700)]
freedreno: Move msm_drm.h to the same spot as other DRM uapi.
The new location matches other drivers, and has a README about the rules
for updating it.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Ian Romanick [Wed, 28 Mar 2018 05:57:07 +0000 (22:57 -0700)]
nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions
The goal is to avoid having an extra MOV instruction to perform the
saturate. Doing the subtraction first allows the saturate to be applied
to the ADD instruction making the MOV unnecessary. Values generated in
different block and values from non-ALU instructions (e.g., texture
instructions) almost always need the extra MOV.
Multiply instructions are restricted because doing this rearrangement
can interfere with the generation of flrp and ffma instructions.
v2: Now that the final method has been selected, squash three commits
into one.
All Intel platforms has similar results. (Ice Lake shown)
total instructions in shared programs:
17223214 ->
17219386 (-0.02%)
instructions in affected programs: 1524376 -> 1520548 (-0.25%)
helped: 2686
HURT: 26
helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37%
HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2
HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35%
95% mean confidence interval for instructions value: -1.46 -1.36
95% mean confidence interval for instructions %-change: -0.56% -0.50%
Instructions are helped.
total cycles in shared programs:
360811571 ->
360791896 (<.01%)
cycles in affected programs:
103650214 ->
103630539 (-0.02%)
helped: 1557
HURT: 675
helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16
helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64%
HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14
HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49%
95% mean confidence interval for cycles value: -14.82 -2.81
95% mean confidence interval for cycles %-change: -0.50% -0.20%
Cycles are helped.
LOST: 2
GAINED: 0
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Tue, 5 Mar 2019 22:54:35 +0000 (14:54 -0800)]
nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)
v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases.
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17224337 ->
17224269 (<.01%)
instructions in affected programs: 13578 -> 13510 (-0.50%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.05% -0.63%
Instructions are helped.
total cycles in shared programs:
360826090 ->
360825137 (<.01%)
cycles in affected programs: 94867 -> 93914 (-1.00%)
helped: 58
HURT: 1
helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18
helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22%
HURT stats (abs) min: 76 max: 76 x̄: 76.00 x̃: 76
HURT stats (rel) min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86%
95% mean confidence interval for cycles value: -19.53 -12.78
95% mean confidence interval for cycles %-change: -1.56% -1.08%
Cycles are helped.
No changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 6 Mar 2019 22:07:18 +0000 (14:07 -0800)]
nir/algebraic: Strip double negatives from comparison sources
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17224623 ->
17224337 (<.01%)
instructions in affected programs: 32648 -> 32362 (-0.88%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2
helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08%
95% mean confidence interval for instructions value: -1.97 -1.89
95% mean confidence interval for instructions %-change: -1.15% -1.00%
Instructions are helped.
total cycles in shared programs:
360828714 ->
360826090 (<.01%)
cycles in affected programs: 347416 -> 344792 (-0.76%)
helped: 148
HURT: 26
helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18
helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41%
HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6
HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27%
95% mean confidence interval for cycles value: -23.78 -6.38
95% mean confidence interval for cycles %-change: -1.59% -0.79%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Tue, 5 Mar 2019 20:08:29 +0000 (12:08 -0800)]
intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop. Future patches
will benefit from this change.
Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.
v2: Fix typo in comment. Noticed by Matt.
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17224634 ->
17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.
total cycles in shared programs:
360828542 ->
360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs:
13529544 ->
13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0
total cycles in shared programs:
357290311 ->
357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.
total cycles in shared programs:
188609448 ->
188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 13 Mar 2019 22:49:43 +0000 (15:49 -0700)]
Revert "nir: add late opt to turn inot/b2f combos back to bcsel"
This reverts commit
7acc8652268205a266068ea4d059eccce43e1f78.
With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.
I believe this optimization we made basically irrelevant by
7725d609387
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".
All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17225303 ->
17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.
total cycles in shared programs:
360842595 ->
360828542 (<.01%)
cycles in affected programs:
110443594 ->
110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1
total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1
Ivy Bridge
total instructions in shared programs:
12082019 ->
12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.
total cycles in shared programs:
179849270 ->
179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs:
10882750 ->
10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0
Iron Lake
total cycles in shared programs:
188609440 ->
188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2
GM45
total cycles in shared programs:
129016868 ->
129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 25 Oct 2018 03:13:45 +0000 (20:13 -0700)]
nir/algebraic: Eliminate a tautological compare
The value-range tracking pass that is coming is not clever enough to
know that the result of the ffma must be non-negative. Making it that
smart will require quite a bit of work. It might be possible to add a
special case that detects that a whole tree of fadd(fmul(fsat(a),
fneg(fsat(a))), 1.0) cannot be negative.
For cases when the comparison is used in the domain guard for a
square-root (see nir/algebraic: Simplify fsqrt domain guard), the
compare may be converted to a fmax. This patch also handles that case.
All of the affected cases are in DiRT: Showdown.
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17225365 ->
17225303 (<.01%)
instructions in affected programs: 40051 -> 39989 (-0.15%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.31% -0.22%
Instructions are helped.
total cycles in shared programs:
360842788 ->
360842595 (<.01%)
cycles in affected programs: 1818081 -> 1817888 (-0.01%)
helped: 29
HURT: 22
helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14
helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42%
HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7
HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19%
95% mean confidence interval for cycles value: -14.48 6.91
95% mean confidence interval for cycles %-change: -0.71% 0.21%
Inconclusive result (value mean confidence interval includes 0).
No changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Thu, 25 Oct 2018 02:48:49 +0000 (19:48 -0700)]
nir/algebraic: Simplify fsqrt domain guard
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17228376 ->
17225365 (-0.02%)
instructions in affected programs: 280732 -> 277721 (-1.07%)
helped: 1072
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2
helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07%
95% mean confidence interval for instructions value: -2.92 -2.70
95% mean confidence interval for instructions %-change: -1.48% -1.37%
Instructions are helped.
total cycles in shared programs:
360935690 ->
360842788 (-0.03%)
cycles in affected programs: 7838017 -> 7745115 (-1.19%)
helped: 1569
HURT: 69
helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20
helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12%
HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47
HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31%
95% mean confidence interval for cycles value: -63.55 -49.89
95% mean confidence interval for cycles %-change: -3.33% -2.96%
Cycles are helped.
No changes on any other platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 20 Mar 2019 20:42:46 +0000 (13:42 -0700)]
nir/search: Don't compare 8-bit or 1-bit constants with floats
Without this, adding an algebraic rule like
(('bcsel', ('flt', a, 0.0), 0.0, ...), ...),
will cause assertion failures inside nir_src_comp_as_float in
GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from
the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from
shader-db.
All of these cases have some code that ends up like
('bcsel', ('flt', a, 0.0), 'b@1', ...)
When the 'b@1' is tested, nir_src_comp_as_float fails because there's
no such thing as a 1-bit float.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Tue, 5 Mar 2019 20:46:11 +0000 (12:46 -0800)]
nir/algebraic: Recognize open-coded fsat with modifiers
This change also enables a later change (nir/algebraic: Replace
1-fsat(a) with fsat(1-a)) to affect more shaders.
Almost all of the affected shaders are in Bioshock Infinite, and all of
those shaders all require GLSL 4.10.
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17228584 ->
17228376 (<.01%)
instructions in affected programs: 31438 -> 31230 (-0.66%)
helped: 105
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1
helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70%
95% mean confidence interval for instructions value: -2.20 -1.76
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.
total cycles in shared programs:
360936431 ->
360935690 (<.01%)
cycles in affected programs: 420100 -> 419359 (-0.18%)
helped: 71
HURT: 21
helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10
helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48%
HURT stats (abs) min: 1 max: 198 x̄: 29.90 x̃: 10
HURT stats (rel) min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90%
95% mean confidence interval for cycles value: -16.77 0.66
95% mean confidence interval for cycles %-change: -0.85% -0.06%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Tue, 23 Oct 2018 21:30:41 +0000 (14:30 -0700)]
nir/algebraic: Push unary operations into source operands of fsat source
Pushing a unary operation, like fneg, into the operation that generates
its operand allows the fsat to be applied to the inner instruction
instead of on a separate instruction that performs the unary operation.
This changes
fmul ssa_100, ssa_99, ssa_98
fmov.sat ssa_101, -ssa_100
into
fmul.sat ssa_100, -ssa_99, ssa_98
Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs:
17228658 ->
17228584 (<.01%)
instructions in affected programs: 3163 -> 3089 (-2.34%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2
helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51%
95% mean confidence interval for instructions value: -1.66 -1.37
95% mean confidence interval for instructions %-change: -4.37% -3.00%
Instructions are helped.
total cycles in shared programs:
360937144 ->
360936431 (<.01%)
cycles in affected programs: 24029 -> 23316 (-2.97%)
helped: 47
HURT: 2
helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16
helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50%
95% mean confidence interval for cycles value: -16.05 -13.05
95% mean confidence interval for cycles %-change: -4.07% -3.15%
Cycles are helped.
All Gen7 and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs:
13536059 ->
13535884 (<.01%)
instructions in affected programs: 8797 -> 8622 (-1.99%)
helped: 150
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96%
95% mean confidence interval for instructions value: -1.23 -1.11
95% mean confidence interval for instructions %-change: -3.97% -3.05%
Instructions are helped.
total cycles in shared programs:
357696119 ->
357694193 (<.01%)
cycles in affected programs: 50216 -> 48290 (-3.84%)
helped: 109
HURT: 14
helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16
helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37%
HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5
HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92%
95% mean confidence interval for cycles value: -19.27 -12.05
95% mean confidence interval for cycles %-change: -7.34% -5.31%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 13 Mar 2019 00:43:38 +0000 (17:43 -0700)]
nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))
All Gen6+ GPUs had similar results. (Skylake shown)
total instructions in shared programs:
15336712 ->
15336622 (<.01%)
instructions in affected programs: 3952 -> 3862 (-2.28%)
helped: 24
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4
helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46%
95% mean confidence interval for instructions value: -4.06 -3.44
95% mean confidence interval for instructions %-change: -2.47% -2.22%
Instructions are helped.
total cycles in shared programs:
355722052 ->
355721235 (<.01%)
cycles in affected programs: 27326 -> 26509 (-2.99%)
helped: 20
HURT: 4
helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14
helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23%
HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6
HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55%
95% mean confidence interval for cycles value: -61.61 -6.47
95% mean confidence interval for cycles %-change: -5.59% -0.39%
Cycles are helped.
No changes on Ice Lake, Iron Lake, or GM45.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Mon, 11 Mar 2019 22:49:26 +0000 (15:49 -0700)]
intel/fs: Allow cmod propagation to instructions with saturate modifier
v2: Add unit tests. Suggested by Matt.
All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs:
17229441 ->
17228658 (<.01%)
instructions in affected programs: 159574 -> 158791 (-0.49%)
helped: 489
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1
helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.72 -1.48
95% mean confidence interval for instructions %-change: -0.64% -0.58%
Instructions are helped.
total cycles in shared programs:
360944149 ->
360937144 (<.01%)
cycles in affected programs: 1072195 -> 1065190 (-0.65%)
helped: 254
HURT: 27
helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9
helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24%
HURT stats (abs) min: 2 max: 83 x̄: 27.56 x̃: 24
HURT stats (rel) min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16%
95% mean confidence interval for cycles value: -30.11 -19.75
95% mean confidence interval for cycles %-change: -0.70% -0.41%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Ian Romanick [Fri, 10 May 2019 17:20:02 +0000 (10:20 -0700)]
nir/algebraic: Add missing ffma(-1, a, b) pattern
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs:
17229439 ->
17229377 (<.01%)
instructions in affected programs: 9859 -> 9797 (-0.63%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67%
95% mean confidence interval for instructions value: -1.88 -1.14
95% mean confidence interval for instructions %-change: -2.48% -0.81%
Instructions are helped.
total cycles in shared programs:
360944145 ->
360942989 (<.01%)
cycles in affected programs: 178167 -> 177011 (-0.65%)
helped: 36
HURT: 19
helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5
helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45%
HURT stats (abs) min: 1 max: 34 x̄: 11.21 x̃: 6
HURT stats (rel) min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50%
95% mean confidence interval for cycles value: -36.01 -6.02
95% mean confidence interval for cycles %-change: -4.18% -0.57%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Tue, 14 Aug 2018 01:08:23 +0000 (18:08 -0700)]
nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns. Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.
No shader-db changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Thu, 9 May 2019 22:33:11 +0000 (15:33 -0700)]
nir: Add support for 2src_commutative ops that have 3 sources
v2: Instead of handling 3 sources as a special case, generalize with
loops to N sources. Suggested by Jason.
v3: Further generalize by only checking that number of sources is >= 2.
Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Thu, 9 May 2019 22:27:14 +0000 (15:27 -0700)]
nir: Rename commutative to 2src_commutative
The meaning of the new name is that the first two sources are
commutative. Since this is only currently applied to two-source
operations, there is no change.
A future change will mark ffma as 2src_commutative.
It is also possible that future work will add 3src_commutative for
opcodes like fmin3.
v2: s/commutative_2src/2src_commutative/g. I had originally considered
this, but I discarded it because I did't want to deal with identifiers
that (should) start with 2. Jason suggested it in review, so we decided
that _2src_commutative would be used in nir_opcodes.py. Also add some
comments documenting what 2src_commutative means. Also suggested by
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Wed, 8 May 2019 18:34:04 +0000 (13:34 -0500)]
intel/fs/ra: Spill without destroying the interference graph
Instead of re-building the interference graph every time we spill, we
modify it in place so we can avoid recalculating liveness and the whole
O(n^2) interference graph building process. We make a simplifying
assumption in order to do so which is that all spill/fill temporary
registers live for the entire duration of the instruction around which
we're spilling. This isn't quite true because a spill into the source
of an instruction doesn't need to interfere with its destination, for
instance. Not re-calculating liveness also means that we aren't
adjusting spill costs based on the new liveness. The combination of
these things results in a bit of churn in spilling. It takes a large
cut out of the run-time of shader-db on my laptop.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311224 ->
15311360 (<.01%)
instructions in affected programs: 77027 -> 77163 (0.18%)
helped: 11
HURT: 18
total cycles in shared programs:
355544739 ->
355830749 (0.08%)
cycles in affected programs:
203273745 ->
203559755 (0.14%)
helped: 234
HURT: 190
total spills in shared programs: 12049 -> 12042 (-0.06%)
spills in affected programs: 2465 -> 2458 (-0.28%)
helped: 9
HURT: 16
total fills in shared programs: 25112 -> 25165 (0.21%)
fills in affected programs: 6819 -> 6872 (0.78%)
helped: 11
HURT: 16
Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 18:09:27 +0000 (13:09 -0500)]
intel/fs/ra: Put the VGRFs at the end of the nodes
This is slightly less convenient in some places but it will make it much
easier when we want to start adding nodes dynamically.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 04:54:17 +0000 (23:54 -0500)]
intel/fs/ra: Re-arrange interference setup
The old code was arranged by the type of interference being added. It
would set up payload registers and then add payload interference for all
VGRFs. It would set up MRFs and add MRF interference for all VGRFs.
This commit re-arranges things to be organized differently. It first
creates and sets up all RA nodes and then groups interference into two
new categories: live range and instruction interference. Once all the
RA nodes have been set up, it walks the list of VGRFs and sets up their
live range interference and then walks the list of instructions and sets
up instruction interference. This new arrangement will be advantageous
for a future patch but, at the moment, it cuts 2% off the run-time of
shader-db on my laptop.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311224 ->
15311224 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
355544739 ->
355544739 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 7 May 2019 23:14:46 +0000 (18:14 -0500)]
intel/fs/ra: Do the spill loop inside RA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 7 May 2019 22:38:22 +0000 (17:38 -0500)]
intel/fs/ra: Only add MRF hack interference if we're spilling
The only use of the MRF hack these days is for spilling and there we
don't need the precise MRF usage information. If we're spilling then we
know pretty well how many MRFs are going to be used. It is possible if
the only things that are spilled have fewer SIMD channels than the
dispatch width of the shader that this may be more MRFs than needed.
That's a risk we're willing to takd.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311100 ->
15311224 (<.01%)
instructions in affected programs: 16664 -> 16788 (0.74%)
helped: 1
HURT: 5
total cycles in shared programs:
355543197 ->
355544739 (<.01%)
cycles in affected programs: 731864 -> 733406 (0.21%)
helped: 3
HURT: 6
The hurt shaders are all SIMD32 compute shaders where we reserve enough
space for a 32-wide spill/fill but don't need it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 01:09:08 +0000 (20:09 -0500)]
intel/fs/ra: Pull the guts of RA into its own class
This accomplishes two things. First, it makes interfaces which are
really private to RA private to RA. Second, it gives us a place to
store some common stuff as we go through the algorithm.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 7 May 2019 21:48:27 +0000 (16:48 -0500)]
intel/fs/ra: Move assign_regs further down in the file
It's the main function from which all the other functions are called.
It belongs at the bottom.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 7 May 2019 21:43:56 +0000 (16:43 -0500)]
intel/fs/ra: Split building the interference graph into a helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 01:11:08 +0000 (20:11 -0500)]
intel/fs/ra: Initialize grf_used with first_non_payload_grf
There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up. The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 17:41:31 +0000 (12:41 -0500)]
intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference. No idea how we didn't catch this before.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311100 ->
15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
355468050 ->
355543197 (0.02%)
cycles in affected programs: 2472492 -> 2547639 (3.04%)
helped: 17
HURT: 20
Fixes:
014edff0d20d "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 14 May 2019 15:20:35 +0000 (10:20 -0500)]
util/ra: Assert nodes are in-bounds in add_node_interference
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 14 May 2019 15:36:09 +0000 (10:36 -0500)]
intel/fs/ra: Only add dest interference to sources that exist
Fixes:
83dedb6354d "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Wed, 8 May 2019 21:41:41 +0000 (16:41 -0500)]
util/ra: Don't destroy the graph in ra_allocate()
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch. It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Tue, 7 May 2019 22:16:58 +0000 (17:16 -0500)]
util/ra: Add a helper for resetting a node's interference
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 23:53:04 +0000 (18:53 -0500)]
util/ra: Add helpers for adding nodes to an interference graph
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 23:52:44 +0000 (18:52 -0500)]
util/ralloc: Add helpers for growing zero-initialized memory
Unfortunately, we can't quite follow the standard C conventions for
these because ralloc doesn't know the sizes of pointers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 19:44:16 +0000 (14:44 -0500)]
intel/fs: Stop doing extra RA calls
In the last phase of the schedule and RA loop, the RA call is redundant
if we spill. Immediately afterwards, we're going to see that we
couldn't allocate without spilling and call back into RA and tell it to
go ahead and spill. We've known about it for a while but we've always
brushed over it on the theory that, if you're going to spill, you'll be
calling RA a bunch anyway and what does one extra RA hurt? As it turns
out, it hurts more than you'd expect. Because the RA interference graph
gets sparser with each spill and the RA algorithm is more efficient on
sparser graphs, the RA call that we're duplicating is actually the most
expensive call in the RA-and-spill loop.
There's another extra RA call we do that's a bit harder to see which
this also removes. If we try to compile a shader that isn't the minimum
dispatch width and it fails to allocate without spilling we call fail()
to set an error but then go ahead and do the first spilling RA pass and
only after that's complete do we detect the fail and bail out. By
making minimum dispatch widths part of the spill condition, we side-step
this problem.
Getting rid of these extra spills takes the compile time of a nasty
Aztec Ruins shader from about 28 seconds to about 26 seconds on my
laptop. It also makes shader-db 1.5% faster
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311100 ->
15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
355468050 ->
355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 9 May 2019 15:01:06 +0000 (10:01 -0500)]
util/ra: Improve the performance of ra_simplify
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation. This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast. Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.
In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa. In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15311100 ->
15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
355468050 ->
355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 22:05:31 +0000 (17:05 -0500)]
util/ra: Only update q_total if the reg is not assigned
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned. This has no known perf benefit
but it will reduce churn in a future commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 18:51:03 +0000 (13:51 -0500)]
util/ra: Only update best_optimistic_node if !progress
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 14:48:36 +0000 (09:48 -0500)]
util/ra: Make in_stack a bitset in the graph
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 9 May 2019 14:46:41 +0000 (09:46 -0500)]
util/ra: Get rid of tabs
Reviewed-by: Eric Anholt <eric@anholt.net>
Chia-I Wu [Fri, 10 May 2019 03:40:28 +0000 (20:40 -0700)]
virgl: clean up virgl_res_needs_flush
Add comments and some minor cleanups.
v2: document the function
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Fri, 10 May 2019 18:06:49 +0000 (11:06 -0700)]
virgl: comment on a sync issue in transfers
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Chia-I Wu [Tue, 7 May 2019 20:22:51 +0000 (13:22 -0700)]
virgl: PIPE_TRANSFER_READ does not imply flush
virgl_res_needs_flush should suffice.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Chia-I Wu [Tue, 7 May 2019 17:56:40 +0000 (10:56 -0700)]
virgl: do not skip readback because of explicit flush
Both apps and we (see virgl_buffer_transfer_flush_region) might
flush regions that are unmodified. We have to read back for those
flushes.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Chia-I Wu [Tue, 7 May 2019 17:01:31 +0000 (10:01 -0700)]
virgl: remove unused virgl_transfer_inline_write
It currently has no user and is probably incorrect (resource_wait is
required in some more cases). Remove it so that we can focus on
transfers first.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Nanley Chery [Wed, 1 May 2019 21:57:23 +0000 (14:57 -0700)]
iris/resource: Drop redundant checks for aux support
Drop some checks that are already done by ISL.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Wed, 1 May 2019 21:42:58 +0000 (14:42 -0700)]
iris/resource: Fall back to no aux if creation fails
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update iris.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Thu, 2 May 2019 22:41:50 +0000 (15:41 -0700)]
i965/miptree: Refactor intel_miptree_supports_ccs_e()
Update and rename this function to format_supports_ccs_e() to better
match its behavior.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Mon, 29 Apr 2019 20:00:25 +0000 (13:00 -0700)]
i965/miptree: Drop intel_*_supports_hiz()
intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate
much the work done by isl_surf_get_hiz_surf(). Replace them with simple
expressions.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Mon, 29 Apr 2019 19:59:35 +0000 (12:59 -0700)]
isl: Add restrictions to isl_surf_get_hiz_surf()
Import some restrictions from intel_tiling_supports_hiz() and
intel_miptree_supports_hiz().
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Sat, 27 Apr 2019 01:18:44 +0000 (18:18 -0700)]
i965/miptree: Drop intel_*_supports_ccs()
intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate
much the work done by isl_surf_get_ccs_surf(). Drop them both and index
a boolean array to choose CCS_D in intel_miptree_choose_aux_usage().
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Sat, 27 Apr 2019 01:17:19 +0000 (18:17 -0700)]
isl: Add restriction and comments to isl_surf_get_ccs_surf()
Import some restrictions and comments from intel_miptree_supports_ccs().
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Sat, 27 Apr 2019 00:00:34 +0000 (17:00 -0700)]
i965/miptree: Drop intel_miptree_supports_mcs()
This function duplicates much the work done by isl_surf_get_mcs_surf().
Replace it with a simple expression.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Fri, 26 Apr 2019 23:52:48 +0000 (16:52 -0700)]
isl: Modify restrictions in isl_surf_get_mcs_surf()
Import some restrictions from intel_miptree_supports_mcs() and don't
assume that the caller knows which device generations are supported.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Nanley Chery [Wed, 24 Apr 2019 20:34:15 +0000 (13:34 -0700)]
i965/miptree: Fall back to no aux if creation fails
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update i965.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa: Set _NEW_VARYING_VP_INPUTS iff varying_vp_inputs are set.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa: Avoid setting _NEW_VARYING_VP_INPUTS in non fixed function mode.
Instead of checking the API variant on entry of set_varying_vp_inputs
to check if we can ever be interrested in fixed function processing
or not, we can check if we are actually fixed function processing.
To check this we can use the immediately updated
gl_context::VertexProgram._VPMode value that tells us if we have a
user provided shader program or if we are in fixed function processing
either through an internal TNL shader of directly through hardware.
When doing so, we also need to recheck the varying_vp_inputs variable
at the time gl_context::VertexProgram._VPMode is set to VP_MODE_FF.
Put asserts at the consumers of gl_context::varying_vp_inputs to make
sure gl_context::VertexProgram._VPMode is set to VP_MODE_FF. By that
gl_context::varying_vp_inputs should be up to date then.
By not looking at the opengl api for this decision we should actually
catch more cases where we can avoid setting a state change flag, including
the ones where we cannot get into VP_MODE_FF by the choice of the api.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa: Fix test for setting the _NEW_VARYING_VP_INPUTS flag.
The precondition stated in the comment is not true. The values mentioned are
only set from _mesa_update_state which in turn may not yet be called.
For now set the _NEW_VARYING_VP_INPUTS flag a bit more often, we will narrow
that down to a minimum again in a later patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa: Make _mesa_set_varying_vp_inputs static in state.c.
Is no longer used outside that file.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa: Fix old outdated variable name in a comment.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 12 May 2019 08:35:52 +0000 (10:35 +0200)]
mesa/vbo: Update Comment to what is actually happening.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Jonas Ådahl [Mon, 6 May 2019 07:54:27 +0000 (09:54 +0200)]
wayland/egl: Ensure correct buffer size when allocating
Whenever a buffer is allocated, e.g. by the first draw call or EGL call after a
buffer swap, make sure the size is up to date. Prior to this commit, we
failed to do so when querying the buffer age, or swapping buffers
without any prior EGL call or draw call.
Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Paulo Zanoni [Wed, 1 May 2019 23:26:47 +0000 (16:26 -0700)]
egl: check if a window/pixmap is already used on surface creation
The spec says we can't create another surface if we already created a
surface with the given window or pixmap. Implement this check.
This behavior is exercised by piglit/egl-create-surface.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Paulo Zanoni [Wed, 1 May 2019 22:42:26 +0000 (15:42 -0700)]
egl: store the native surface pointer in struct _egl_surface
Each platform stores this in a different place:
- platform_drm uses dri2_surf->gbm_surf->base
- platform_android uses dri2_surf->window
- platform_wayland uses dri2_surf->wl_win
- platform_x11 uses dri2_surf->drawable
- platform_x11_dri3 uses dri3_surf->loader_drawable.drawable
- haiku doesn't even store it!
We need access to the native surface since the specification asks us
to refuse creating a new surface if there's already an EGLSurface
associated with native_surface.
An alternative to this patch would be to create a new
API.GetNativeWindow callback that each platform would have to
implement. While that's something we can definitely do, I prefer
this approach.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Samuel Pitoiset [Mon, 13 May 2019 16:43:55 +0000 (18:43 +0200)]
radv: add support for VK_KHR_uniform_buffer_standard_layout
Nothing to do.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Gert Wollny [Mon, 13 May 2019 12:02:24 +0000 (14:02 +0200)]
softpipe/buffer: load only as many components as the the buffer resource type provides
Otherwise we risk to read past the end of the buffer.
In addition, change the loop counters to unsigned to be consistent
with the types.
Fixes:
afa8707ba93a7d226a76319acda2a8dd89524db7
softpipe: add SSBO/shader atomics support.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tomeu Vizoso [Mon, 13 May 2019 05:28:24 +0000 (07:28 +0200)]
panfrost: ci: Reduce batch size to 3000
As with the previous value of 5000 we seemed to be reaching OOM in some
circumstances.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tomeu Vizoso [Mon, 13 May 2019 06:05:01 +0000 (08:05 +0200)]
panfrost: ci: Update expectations
Since last Friday, these two tests have been fixed:
dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment
dEQP-GLES2.functional.shaders.linkage.varying_7
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Eric Anholt [Mon, 13 May 2019 18:04:46 +0000 (11:04 -0700)]
freedreno: Fix warning on printing a uint64_t using %llx.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Eric Anholt [Fri, 10 May 2019 00:32:25 +0000 (17:32 -0700)]
freedreno: Silence compiler warnings about "*" in boolean context.
It sure looks like we just want both of them to be nonzero, and && is
probably going to be cheaper than * anyway.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>