Lucas Stach [Thu, 18 May 2017 14:30:02 +0000 (16:30 +0200)]
etnaviv: don't read back resource if transfer discards contents
Reduces bandwidth usage of transfers which discard the buffer contents,
and skips unnecessary command stream flushes and CPU/GPU
synchronization.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
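A minimal sketch of the decision described in this commit, assuming hypothetical XFER_* usage bits in place of the real Gallium transfer flags. The helper name is illustrative and is not taken from the etnaviv source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical usage bits; stand-ins for the Gallium transfer flags. */
enum xfer_usage {
   XFER_READ    = 1 << 0,
   XFER_WRITE   = 1 << 1,
   XFER_DISCARD = 1 << 2, /* caller does not care about the old contents */
};

/* A readback (and the flush and sync it implies) is only worth doing when
 * the old contents can still be observed after the transfer. */
static bool
transfer_needs_readback(unsigned usage)
{
   assert(usage & (XFER_READ | XFER_WRITE));
   return !(usage & XFER_DISCARD);
}
```

With these assumptions, a plain write still reads the resource back, while a write that discards the contents does not.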
Lucas Stach [Thu, 18 May 2017 16:20:12 +0000 (18:20 +0200)]
etnaviv: honor PIPE_TRANSFER_UNSYNCHRONIZED flag
This gets rid of quite a bit of CPU/GPU sync on frequent vertex buffer
uploads and I haven't seen any of the issues mentioned in the comment,
so this one seems stale.
Ignore the flag if there exists a temporary resource, as those ones are
never busy.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Lucas Stach [Thu, 18 May 2017 13:37:02 +0000 (15:37 +0200)]
etnaviv: slim down resource waiting
cpu_prep() already does all the required waiting, so the only thing that
needs to be done is flushing the commandstream, if a GPU write is pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Rob Herring [Thu, 1 Jun 2017 00:56:56 +0000 (19:56 -0500)]
glsl: Fix gl_shader_stage enum unsigned comparison
Replace -1 with MESA_SHADER_NONE enum value to fix sign related warning:
external/mesa3d/src/compiler/glsl/link_varyings.cpp:1415:25: warning: comparison of constant -1 with expression of type 'gl_shader_stage' is always true [-Wtautological-constant-out-of-range-compare]
(consumer_stage != -1 && consumer_stage != MESA_SHADER_FRAGMENT))) {
~~~~~~~~~~~~~~ ^ ~~
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Thu, 8 Jun 2017 01:56:07 +0000 (20:56 -0500)]
Android: vulkan: fix build error due to extra )
Commit 621b3410f5f8 ("util/vulkan: Move Vulkan utilities to
src/vulkan/util") broke the Android build with the following error:
build/core/binary.mk:1427: error: external/mesa3d/src/vulkan/Android.mk: libmesa_vulkan_util: Unused source files: util/vk_util.h).
Fixes: 621b3410f5f8 ("util/vulkan: Move Vulkan utilities to src/vulkan/util")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Iago Toral Quiroga [Thu, 8 Jun 2017 06:38:29 +0000 (08:38 +0200)]
Fix glcpp test expectations
With commit f7741985be0234 we changed some preprocessor error
messages and warnings. Adapt the related glcpp test expectations
accordingly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101336
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Vlad Golovkin [Mon, 29 May 2017 23:51:32 +0000 (02:51 +0300)]
util: make set's deleted_key_value declaration consistent with hash table one
This also silences the following clang warnings:
no previous extern declaration for non-static variable 'deleted_key' [-Werror,-Wmissing-variable-declarations]
const void *deleted_key = &deleted_key_value;
^
no previous extern declaration for non-static variable 'deleted_key_value' [-Werror,-Wmissing-variable-declarations]
uint32_t deleted_key_value;
^
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jason Ekstrand [Fri, 26 May 2017 19:18:49 +0000 (12:18 -0700)]
i965: Delete intel_resolve_map
Now that we've moved over to the new array mechanism, it's no longer
needed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Fri, 26 May 2017 19:12:06 +0000 (12:12 -0700)]
i965: Use the new tracking mechanism for HiZ
This is similar to the previous commit, only for HiZ. For HiZ, apart
from everything looking different, there is really only one functional
change: We now track the ISL_AUX_STATE_COMPRESSED_NO_CLEAR state.
Previously, if you rendered to a resolved slice of the miptree and then
did a fast-clear with a different clear color, that slice would get
resolved even though it hadn't been fast-cleared. Now that we can track
COMPRESSED_NO_CLEAR, we know that it doesn't have any blocks in the
"clear" state so we can skip the resolve.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Wed, 31 May 2017 18:50:24 +0000 (11:50 -0700)]
i965/miptree: Make level_has_hiz take a const miptree
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Fri, 26 May 2017 00:18:30 +0000 (17:18 -0700)]
i965: Wholesale replace the color resolve tracking code
This commit reworks the resolve tracking for CCS and MCS to use the new
isl_aux_state enum. This should provide much more accurate and easy to
reason about tracking. In order to understand, for instance, the
intel_miptree_prepare_ccs_access function, one only has to go look at
the giant comment for the isl_aux_state enum and follow the arrows.
Unfortunately, there's no good way to split this up without making a
real mess so there are a bunch of changes in here:
1) We now do partial resolves. I really have no idea how this ever
worked before. So far as I can tell, the only time the old code
ever did a partial resolve was when it was using CCS_D where a
partial resolve and a full resolve are the same thing.
2) We are now tracking 4 states instead of 3 for CCS_E. In particular,
we distinguish between compressed with clear and compressed without
clear. The end result is that you will never get two partial
resolves in a row.
3) The texture view rules are now more correct. Previously, we would
only bail if compression was not supported by the destination
format. However, this is not actually correct. Not all format
pairs are supported for texture views with CCS even if both support
CCS individually. Fortunately, ISL has a helper for this.
4) We are no longer using intel_resolve_map for tracking aux state but
are instead using a simple array of enum isl_aux_state indexed by
level and layer. This is because, now that we're tracking 4
different states, it's no longer clear which should be the "default"
and array lookups are faster than linked list searches.
5) The new code is very assert-happy. Incorrect transitions will now
get caught by assertions rather than by rendering corruption.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
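Item 4 above can be sketched as follows. This is a minimal illustration and not the i965 code: the enum is a hypothetical stand-in for isl_aux_state (which four states CCS_E uses is an assumption here), and the flat level-major layout is one possible way to index by level and layer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the four states tracked per slice. */
enum aux_state {
   AUX_CLEAR,
   AUX_COMPRESSED_CLEAR,
   AUX_COMPRESSED_NO_CLEAR,
   AUX_RESOLVED,
};

struct aux_tracker {
   unsigned levels, layers;
   enum aux_state *state; /* levels * layers entries, level-major */
};

static int
aux_tracker_init(struct aux_tracker *t, unsigned levels, unsigned layers,
                 enum aux_state initial)
{
   size_t count = (size_t)levels * layers;

   t->levels = levels;
   t->layers = layers;
   t->state = malloc(count * sizeof(*t->state));
   if (!t->state)
      return -1;
   /* Every slice stores its state explicitly; nothing is implied. */
   for (size_t i = 0; i < count; i++)
      t->state[i] = initial;
   return 0;
}

static enum aux_state *
aux_slot(struct aux_tracker *t, unsigned level, unsigned layer)
{
   /* Out-of-range lookups are programming errors, so assert on them. */
   assert(level < t->levels && layer < t->layers);
   return &t->state[(size_t)level * t->layers + layer];
}

static void
aux_tracker_finish(struct aux_tracker *t)
{
   free(t->state);
   t->state = NULL;
}
```

A lookup is a single index computation, so there is no list to walk and no state that has to be treated as the implicit default.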
Jason Ekstrand [Thu, 25 May 2017 22:02:23 +0000 (15:02 -0700)]
i965: Delete most of the old resolve interface
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 23:09:04 +0000 (16:09 -0700)]
i965: Use the new get/set_aux_state functions for color clears
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 21:50:26 +0000 (14:50 -0700)]
i965: Move blorp to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 21:15:44 +0000 (14:15 -0700)]
i965: Move depth to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 19:30:50 +0000 (12:30 -0700)]
i965: Move images to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 19:26:00 +0000 (12:26 -0700)]
i965: Move framebuffer fetch to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 19:14:52 +0000 (12:14 -0700)]
i965: Remove an unneeded render_cache_set_check_flush
This is only needed to fix rendering corruptions caused by not flushing
after doing a resolve operation. The resolve now does all the needed
flushing so this is unnecessary.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 18:58:40 +0000 (11:58 -0700)]
i965: Move color rendering to the new resolve functions
This also removes an unneeded brw_render_cache_set_check_flush() call.
We were calling it in the case where the surface got resolved to satisfy
the flushing requirements around resolves. However, blorp now does this
itself, so the extra call is just redundant.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 18:29:17 +0000 (11:29 -0700)]
i965: Move texturing to the new resolve functions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 05:09:51 +0000 (22:09 -0700)]
i965: Use the new resolve function for several simple cases
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 05:09:30 +0000 (22:09 -0700)]
i965/miptree: Add new entrypoints for resolve management
This commit adds a new unified interface for doing resolves. The basic
format is that, prior to any surface access such as texturing or
rendering, you call intel_miptree_prepare_access. If the surface was
written, you call intel_miptree_finish_write. These two functions take
parameters which tell them whether or not auxiliary compression and fast
clears are supported on the surface. Later commits will add wrappers
around these two functions for texturing, rendering, etc.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
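The prepare/finish pattern described in this commit can be modelled on a single slice. This is a toy sketch under stated assumptions: the enum and both function signatures are hypothetical (the real entrypoints take a miptree and level/layer ranges), and the transitions shown are a simplified guess at CCS_E-like behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-slice states. */
enum aux_state {
   AUX_CLEAR,
   AUX_COMPRESSED_CLEAR,
   AUX_COMPRESSED_NO_CLEAR,
   AUX_RESOLVED,
};

/* Returns the state after any resolve needed before the access. */
static enum aux_state
prepare_access(enum aux_state s, bool aux_supported, bool fast_clear_supported)
{
   if (s == AUX_RESOLVED)
      return s;
   if (!aux_supported)
      return AUX_RESOLVED;            /* full resolve */
   if (!fast_clear_supported && s != AUX_COMPRESSED_NO_CLEAR)
      return AUX_COMPRESSED_NO_CLEAR; /* partial resolve */
   return s;
}

/* Returns the state after a write that did or did not use compression. */
static enum aux_state
finish_write(enum aux_state s, bool written_with_aux)
{
   if (!written_with_aux) {
      /* An uncompressed write is only valid on a resolved slice. */
      assert(s == AUX_RESOLVED);
      return s;
   }
   switch (s) {
   case AUX_CLEAR:
      return AUX_COMPRESSED_CLEAR;
   case AUX_RESOLVED:
      return AUX_COMPRESSED_NO_CLEAR;
   default:
      return s;
   }
}
```

In this model a slice that has already been partially resolved stays in the compressed-without-clear state on the next prepare, so no second partial resolve is issued.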
Jason Ekstrand [Thu, 25 May 2017 03:01:12 +0000 (20:01 -0700)]
intel/isl: Add an enum for describing auxiliary compression state
This enum describes all of the states that an auxiliary-compressed
surface can have. All of the states, as well as normative language for
referring to each of the compression operations, are provided in the
truly colossal comment for the new isl_aux_state enum. There is also
a diagram showing how surfaces move between the different states.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 18:47:13 +0000 (11:47 -0700)]
i965: Combine render target resolve code
We have two different bits of resolve code for render targets: one in
brw_draw where it's always been and one in brw_context to deal with sRGB
on gen9. Let's pull them together.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Wed, 31 May 2017 17:06:31 +0000 (10:06 -0700)]
i965: Be a bit more conservative about certain resolves
There are several places where we were resolving the entire miptree
when we really only needed to resolve a single slice. Let's avoid the
unneeded resolving.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Fri, 26 May 2017 02:13:47 +0000 (19:13 -0700)]
i965/blorp: Move MCS allocation earlier for clears
This way it happens before we call get_aux_state.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Tue, 23 May 2017 04:27:50 +0000 (21:27 -0700)]
i965/blorp: Refactor do_single_blorp_clear
Previously, we had two checks for can_fast_clear and a tiny bit of
shared code in between. This commit pulls all of the fast clear code
together and duplicates the tiny bit that declares some surface structs
and calls blorp_surf_for_miptree. The duplication is no real loss and
we're about to change the two in slightly different ways.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 23:59:12 +0000 (16:59 -0700)]
i965/blorp: Take an explicit fast clear op in resolve_color
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Thu, 25 May 2017 05:06:29 +0000 (22:06 -0700)]
i965/miptree: Move color resolve on map to intel_miptree_map
None of the other methods such as blit work with CCS either so we need
to do the resolve for all maps. This change also makes us only resolve
the one slice we're mapping and not the entire image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Thu, 25 May 2017 18:04:38 +0000 (11:04 -0700)]
i965: Inline renderbuffer_att_set_needs_depth_resolve
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Thu, 25 May 2017 04:55:59 +0000 (21:55 -0700)]
i965: Get rid of intel_renderbuffer_resolve_*
There is exactly one caller so it's a bit pointless to have all of this
plumbing. Just inline it at the one place it's used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Mon, 22 May 2017 16:23:45 +0000 (09:23 -0700)]
i965/miptree: Refactor intel_miptree_resolve_color
The new version now takes a range of levels as well as a range of
layers. It should also be a tiny bit faster because it only walks the
resolve_map list once instead of once per layer.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Mon, 22 May 2017 15:59:49 +0000 (08:59 -0700)]
i965/miptree: Clean up the depth resolve helpers a little
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Fri, 26 May 2017 16:03:42 +0000 (09:03 -0700)]
i965/surface_state: Images can't handle CCS at all
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Fri, 26 May 2017 16:33:55 +0000 (09:33 -0700)]
i965: Mark depth surfaces as needing a HiZ resolve after blitting
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Dave Airlie [Thu, 8 Jun 2017 03:27:46 +0000 (13:27 +1000)]
st_glsl_to_tgsi: cleanup variable storage search.
I forgot to put the cleanup in earlier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rob Herring [Tue, 30 May 2017 20:30:38 +0000 (15:30 -0500)]
mesa/main: fix gl_buffer_index enum comparison
With clang, enums are unsigned by default, which gives the following warning:
external/mesa3d/src/mesa/main/buffers.c:764:21: warning: comparison of constant -1 with expression of type 'gl_buffer_index' is always false [-Wtautological-constant-out-of-range-compare]
if (srcBuffer == -1) {
~~~~~~~~~ ^ ~~
Replace -1 with an enum value to fix this.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
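The shape of this fix can be sketched with a hypothetical enum. The names below are illustrative and are not the gl_buffer_index enumerators; the point is only that the "no buffer" case gets a named value instead of a bare -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum; BUF_NONE names the "no buffer" case that a bare -1
 * used to express. */
enum buf_index {
   BUF_NONE = -1,
   BUF_FRONT_LEFT,
   BUF_BACK_LEFT,
   BUF_COUNT
};

static bool
buffer_is_set(enum buf_index b)
{
   assert(b == BUF_NONE || b < BUF_COUNT);
   /* Comparing against an enumerator keeps both operands within the
    * enum's own range, so the tautological-compare warning has nothing
    * to flag. */
   return b != BUF_NONE;
}
```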
Rob Herring [Thu, 1 Jun 2017 00:07:40 +0000 (19:07 -0500)]
glsl: fix bounds check in blob_overwrite_bytes
clang gives a warning in blob_overwrite_bytes because offset type is
size_t which is unsigned:
src/compiler/glsl/blob.c:110:15: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
if (offset < 0 || blob->size - offset < to_write)
~~~~~~ ^ ~
Remove the less than 0 check to fix this.
Additionally, if offset is greater than blob->size, the 2nd check would
be false due to unsigned math. Rewrite the check to avoid subtraction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
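A sketch of a wrap-safe bounds check for size_t values. Note the swap: the commit avoids subtraction altogether, whereas the variant below keeps a subtraction but guards it with an offset test first. The helper is hypothetical and is not the Mesa function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `to_write` bytes into `data` at `offset`,
 * refusing writes that do not fit in `size` bytes. */
static bool
overwrite_bytes(unsigned char *data, size_t size, size_t offset,
                const void *bytes, size_t to_write)
{
   assert(data != NULL && bytes != NULL);

   /* Test offset first: once offset <= size holds, size - offset cannot
    * wrap around, so the second comparison is meaningful. */
   if (offset > size || to_write > size - offset)
      return false;

   memcpy(data + offset, bytes, to_write);
   return true;
}
```

With an unguarded `size - offset`, an offset past the end would wrap to a huge value and the write would be accepted; here it is rejected.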
Dave Airlie [Tue, 30 May 2017 05:52:14 +0000 (15:52 +1000)]
st_glsl_to_tgsi: replace variables tracking list with a hash table
This removes the linear search, which performs poorly once the number of
variables goes up to 30000 or so.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 30 May 2017 05:52:13 +0000 (15:52 +1000)]
st_glsl_to_tgsi: rewrite rename registers to use array fully.
Instead of having to search the whole array, just use the whole
thing and store a valid bit in there with the rename.
This removes register renaming from the profile on some of the fp64 tests.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
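The "whole array plus valid bit" idea can be sketched like this. The struct and function names are hypothetical; the assumption is one table slot per old register index, so a lookup never searches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: one slot per old register index. */
struct rename_entry {
   bool valid;
   int new_reg;
};

static int
renamed_index(const struct rename_entry *table, size_t count, int old_reg)
{
   assert(old_reg >= 0 && (size_t)old_reg < count);
   /* Direct indexing replaces a search over a list of (old, new) pairs;
    * slots without a rename leave the register unchanged. */
   return table[old_reg].valid ? table[old_reg].new_reg : old_reg;
}
```

The cost is memory proportional to the number of incoming registers, in exchange for constant-time lookups.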
Dave Airlie [Tue, 30 May 2017 05:52:11 +0000 (15:52 +1000)]
st_glsl_to_tgsi: bump index back up to 32-bit
With some of the fp64 emulation, we are seeing shaders coming in with
> 32K temps. They go out with 40 or so used, but while doing register
renumbering we need to store a lot of them.
So bump these fields back up to 32-bit.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 7 Jun 2017 21:13:35 +0000 (23:13 +0200)]
util/u_queue: fix a use-before-initialization race for queue->threads
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Grazvydas Ignotas [Wed, 7 Jun 2017 21:00:02 +0000 (00:00 +0300)]
ac/nir: remove another unused variable
Declared by each loop already.
Trivial.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Grazvydas Ignotas [Tue, 6 Jun 2017 23:00:36 +0000 (02:00 +0300)]
radv/meta: remove an unused variable
Trivial.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Grazvydas Ignotas [Tue, 6 Jun 2017 22:55:26 +0000 (01:55 +0300)]
ac/nir: convert several ifs to a switch
Also solve "outinfo may be used uninitialized" warning by putting in an
unreachable().
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Grazvydas Ignotas [Tue, 6 Jun 2017 22:52:30 +0000 (01:52 +0300)]
ac/nir: mark some arguments const
Most functions are only inspecting nir, so nir related arguments can be
marked const. Some more can be done if/when some nir changes are
accepted.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Li [Tue, 6 Jun 2017 22:21:08 +0000 (18:21 -0400)]
radeonsi: Use libdrm to get chipset name
v2: Add a func pointer to radeon_winsys to support radeon later.
Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Thomas Helland [Sat, 3 Jun 2017 17:59:07 +0000 (19:59 +0200)]
util: Add extern c to u_dynarray.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Wed, 7 Jun 2017 18:46:05 +0000 (20:46 +0200)]
nir: Delete nir_array.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Wed, 7 Jun 2017 18:45:41 +0000 (20:45 +0200)]
nir: Port to u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Wed, 7 Jun 2017 18:45:14 +0000 (20:45 +0200)]
nir: Remove unused include
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Thu, 1 Jun 2017 21:15:43 +0000 (23:15 +0200)]
util: Port nir_array functionality to u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Thu, 1 Jun 2017 20:38:59 +0000 (22:38 +0200)]
util: Remove unused includes and convert to lower-case memory ops
Also, prepare for the next commit by fixing some coding style
issues. These should all be non-functional changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Thu, 1 Jun 2017 20:21:19 +0000 (22:21 +0200)]
util: Move u_dynarray to src/util
This will be used as the basis for unification.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Thomas Helland [Thu, 1 Jun 2017 20:52:02 +0000 (22:52 +0200)]
gallium: Add missing includes
These need to be in place to avoid regressions when
removing these includes from u_dynarray.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Marek Olšák [Mon, 5 Jun 2017 10:55:14 +0000 (12:55 +0200)]
radeonsi: update clip_regs on shader state changes only when it's needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 5 Jun 2017 10:54:18 +0000 (12:54 +0200)]
radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 27 May 2017 17:17:27 +0000 (19:17 +0200)]
radeonsi: add a new helper si_get_vs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Samuel Pitoiset [Tue, 30 May 2017 20:36:28 +0000 (22:36 +0200)]
radeonsi: isolate real framebuffer changes from the decompression passes (v3)
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.
v2: Marek - set the flags outside the loop
- also clear and set framebuffer.do_update_surf_dirtiness there
- do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 3 Jun 2017 11:59:03 +0000 (13:59 +0200)]
radeonsi: do EarlyCSEMemSSA LLVM pass
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.
This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.
EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE.
SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 1 Jun 2017 20:37:25 +0000 (22:37 +0200)]
radeonsi: remove 8 bytes from si_shader_key
We can use a union in si_shader_key::mono.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
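How a union can shrink a key is shown below with entirely hypothetical fields. The assumption is that the per-stage fields are never needed at the same time; the real si_shader_key layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with one field per stage, stored side by side. */
struct key_separate {
   uint64_t vs_bits;
   uint64_t ps_bits;
};

/* Same fields overlapped in a union: only one is live for a given key. */
struct key_shared {
   union {
      uint64_t vs_bits;
      uint64_t ps_bits;
   } u;
};

static size_t
bytes_saved(void)
{
   assert(sizeof(struct key_separate) >= sizeof(struct key_shared));
   return sizeof(struct key_separate) - sizeof(struct key_shared);
}
```

A smaller key also means less data to hash and compare on each lookup.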
Marek Olšák [Sun, 28 May 2017 22:40:39 +0000 (00:40 +0200)]
radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.
Before:
heaven_x64: ls=1f1 tcs=1f1, lds=32K
heaven_x64: ls=31 tcs=31, lds=24K
heaven_x64: ls=71 tcs=71, lds=28K
After:
heaven_x64: ls=3f tcs=3f, lds=24K
heaven_x64: ls=7 tcs=7, lds=13K
heaven_x64: ls=f tcs=f, lds=17K
All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)
It's unknown whether this improves performance.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
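One way to read the masks above, assuming storage is sized by the highest slot index in use rather than by the number of bits set. That sizing rule, and pairing each "before" mask with the "after" mask that has the same number of set bits, are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of slots from bit 0 up to and including the highest set bit. */
static unsigned
slots_spanned(uint64_t outputs_written)
{
   unsigned slots = 0;

   while (outputs_written) {
      slots++;
      outputs_written >>= 1;
   }
   return slots;
}

/* The Heaven masks quoted above: each "after" mask spans fewer slots. */
static void
check_heaven_masks(void)
{
   assert(slots_spanned(0x1f1) == 9 && slots_spanned(0x3f) == 6);
   assert(slots_spanned(0x31) == 6 && slots_spanned(0x7) == 3);
   assert(slots_spanned(0x71) == 7 && slots_spanned(0xf) == 4);
}
```

Under this reading, 0x31 and 0x3f both span six slots, which would match both being listed at 24K.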
Thomas Hellstrom [Tue, 30 May 2017 13:54:38 +0000 (15:54 +0200)]
svga: Always set the alpha value to 1 when sampling using an XRGB view
If the XRGB view is sampling from an ARGB svga format, change
PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels.
Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which
could be both insufficient and incorrect.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
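A sketch of the swizzle rewrite described in this commit, using a hypothetical enum in place of the PIPE_SWIZZLE_* values. It shows why overriding only the alpha channel is not the same as replacing every W source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PIPE_SWIZZLE_* values. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_0, SWZ_1 };

/* Any channel that reads the source alpha (W) reads constant 1 instead. */
static void
force_opaque_alpha(enum swz swizzle[4])
{
   assert(swizzle != NULL);
   for (int i = 0; i < 4; i++) {
      if (swizzle[i] == SWZ_W)
         swizzle[i] = SWZ_1;
   }
}
```

For a view swizzled as (W, W, W, X), only the first three channels read alpha, so forcing the fourth channel to 1 would change the wrong one and leave the others untouched.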
Thomas Hellstrom [Tue, 30 May 2017 13:51:06 +0000 (15:51 +0200)]
svga: Fix imported surface view creation
When deciding to create a view with or without an alpha channel we need to
look at the SVGA3D format and not the PIPE format.
This fixes the glx-tfp piglit test for dri3/xa.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Wed, 26 Apr 2017 13:00:38 +0000 (06:00 -0700)]
svga: Set alpha to 1 for non-alpha views
Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those
and similar cases we need to set the alpha value to 1 when sampling.
Fixes piglit glx::glx-tfp
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Tue, 30 May 2017 13:02:19 +0000 (15:02 +0200)]
svga: Allow format differences in 16-bit RGBA surface sharing
For the purpose of surface sharing, treat SVGA3D_R5G6B5 and
SVGA3D_B5G6R5_UNORM as identical formats.
This fixes the following piglit tests with dri3/xa:
glx@glx-visuals-depth -pixmap
glx@glx-visuals-stencil -pixmap
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Thomas Hellstrom [Tue, 16 May 2017 14:25:12 +0000 (07:25 -0700)]
dri/vmwgfx: Disable a couple of glx extensions also for Ubuntu unity / compiz
It appears that the GLX_EXT_buffer_age extension also prevents Compiz /
Ubuntu Unity from performing partial buffer swaps when it otherwise
feels like doing so. So try to get them back again. We also disable
GLX_OML_sync_control since disabling it appears to have had a favourable
impact on gnome-shell.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Thomas Hellstrom [Fri, 5 May 2017 13:26:03 +0000 (06:26 -0700)]
dri: Turn off a couple of glx extensions for gnome-shell on vmwgfx.
Increases performance on vmwgfx since we're avoiding full buffer damage and
since we can't sync to vertical retrace anyway.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Thomas Hellstrom [Fri, 5 May 2017 13:06:07 +0000 (06:06 -0700)]
st/dri: Allow gallium drivers to turn off two GLX extensions
Allow gallium drivers to turn off GLX_EXT_buffer_age and
GLX_OML_sync_control if needed, using driconf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Thomas Hellstrom [Fri, 5 May 2017 12:57:27 +0000 (05:57 -0700)]
dri: Optionally turn off a couple of GLX extensions based on driconf options
With GLX_EXT_buffer_age turned on, gnome-shell will use full-screen damage
with GLX, which severely hurts performance with architectures that emulate
page-flips with copies, such as vmware. We would like to be able to turn off that
extension. Similarly, typically the GLX_OML_sync_control doesn't make much
sense on a virtual architecture since we don't really sync to the host's
vertical retrace. We'd like to be able to turn it off as well.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Thomas Hellstrom [Fri, 5 May 2017 12:49:50 +0000 (05:49 -0700)]
st/dri: Allow dri users to query also driver options
There will be situations where we want to control, for example, the
GLX behaviour based on applications and drivers. So allow DRI users access
to the driver options.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 6 Jun 2017 14:28:59 +0000 (16:28 +0200)]
radeonsi: clean up decompress blend state names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 6 Jun 2017 14:26:49 +0000 (16:26 +0200)]
gallium/radeon: clean up a misleading statement from the old days
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 5 Jun 2017 17:59:06 +0000 (19:59 +0200)]
radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILE
It's always good to have fewer decompress blits.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 5 Jun 2017 17:51:38 +0000 (19:51 +0200)]
radeonsi: enable TC-compatible stencil compression on VI
Most things are in place. Ideally we won't see decompress blits for stencil
anymore.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 4 Jun 2017 23:22:45 +0000 (01:22 +0200)]
st/mesa: don't keep framebuffer state in st_context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 4 Jun 2017 23:08:41 +0000 (01:08 +0200)]
st/mesa: cache pipe_surface for GL_FRAMEBUFFER_SRGB changes
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 4 Jun 2017 21:09:44 +0000 (23:09 +0200)]
st/mesa: use gl_driver_flags::NewFramebufferSRGB
also call st_init_driver_flags when st_context is initialized.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 4 Jun 2017 21:08:07 +0000 (23:08 +0200)]
mesa: add gl_driver_flags::NewFramebufferSRGB
_NEW_BUFFERS updates too much stuff.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 1 Jun 2017 17:02:16 +0000 (19:02 +0200)]
radeonsi/gfx9: prevent a race when the previous shader's main part is missing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 1 Jun 2017 16:57:37 +0000 (18:57 +0200)]
radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 27 May 2017 16:49:11 +0000 (18:49 +0200)]
radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9
LS is merged into TCS. If there is no TCS, LS is merged into fixed-func
TCS. The problem is the fixed-func TCS was ignored by scratch update
functions, so LS didn't have the scratch buffer set up.
Note that Mesa 17.1 doesn't have merged shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 27 May 2017 15:52:31 +0000 (17:52 +0200)]
radeonsi: move streamout state update out of si_update_shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 31 May 2017 21:09:33 +0000 (23:09 +0200)]
radeonsi: remove dead code in declare_input_fs
Colors are interpolated in the PS prolog. This was never used.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 26 May 2017 23:22:25 +0000 (01:22 +0200)]
radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 31 May 2017 11:18:53 +0000 (13:18 +0200)]
radeonsi: use a compiler queue with a low priority for optimized shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 31 May 2017 20:04:29 +0000 (22:04 +0200)]
util/u_queue: add an option to set the minimum thread priority
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 27 May 2017 10:13:34 +0000 (12:13 +0200)]
radeonsi: decrease the number of compiler threads to num CPUs - 1
Reserve one core for other things (like draw calls).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
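The thread-count rule can be sketched as a pure function. Clamping to at least one thread on a single-CPU machine is an assumption of this sketch, and the function name is hypothetical.

```c
#include <assert.h>

/* Leave one CPU free for the application, but never drop to zero threads. */
static unsigned
compiler_thread_count(unsigned num_cpus)
{
   assert(num_cpus > 0);
   return num_cpus > 1 ? num_cpus - 1 : 1;
}
```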
Marek Olšák [Wed, 31 May 2017 16:02:54 +0000 (18:02 +0200)]
radeonsi: drop unfinished shader compilations when destroying shaders
If we enqueue too many jobs and destroy the GL context, it may take
several seconds before the jobs finish. Just drop them instead.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 31 May 2017 14:44:12 +0000 (16:44 +0200)]
util/u_queue: add a way to remove a job when we just want to destroy it
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Rob Clark [Tue, 6 Jun 2017 17:15:02 +0000 (13:15 -0400)]
freedreno/a5xx: set SP_BLEND_CONTROL properly
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 3 Jun 2017 17:36:25 +0000 (13:36 -0400)]
freedreno/a5xx: LRZ support
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 3 Jun 2017 16:53:15 +0000 (12:53 -0400)]
freedreno: drop timestamp field
unused.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 3 Jun 2017 16:42:35 +0000 (12:42 -0400)]
freedreno/a5xx: refactor out helper for LRZ flush
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 3 Jun 2017 16:34:28 +0000 (12:34 -0400)]
freedreno: reshuffle FD_MESA_DEBUG bitmask
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 3 Jun 2017 16:30:36 +0000 (12:30 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
Marek Olšák [Tue, 30 May 2017 21:52:07 +0000 (23:52 +0200)]
gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible
so that we can use TXF.
The cubemap blit pixel shader code size: 148 -> 92 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 30 May 2017 20:18:40 +0000 (22:18 +0200)]
gallium/u_blitter: use TXF if possible
This fixes piglit:
arb_texture_view-rendering-r32ui
TEX (image_sample) flushes denorms to 0 with FP32 textures on GCN, but such
a texture can contain integer data written using an integer render view.
If we do a transfer blit with TEX, denorms are flushed to 0. Luckily,
TXF (image_load) doesn't do that.
TXF also doesn't need to load the sampler state, so blit shaders don't have
to do s_load_dwordx4.
TXF doesn't do CLAMP_TO_EDGE, so it can only be used if the src box is
in bounds, or if we clamp manually (this commit doesn't).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
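The in-bounds condition for using TXF can be sketched as below. The struct and function are hypothetical, and the sketch assumes a 2D box with non-negative width and height; it is not the u_blitter check itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2D source box in texel coordinates. */
struct box2d {
   int x, y, width, height;
};

/* A texel fetch does no edge clamping, so it is only safe when every
 * texel of the source box lies inside the texture. */
static bool
can_use_texel_fetch(const struct box2d *src, int tex_width, int tex_height)
{
   assert(src->width >= 0 && src->height >= 0);
   assert(tex_width >= 0 && tex_height >= 0);
   return src->x >= 0 && src->y >= 0 &&
          src->width <= tex_width - src->x &&
          src->height <= tex_height - src->y;
}
```

A box that fails this test would need either manual clamping or the regular sampling path.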
Marek Olšák [Tue, 30 May 2017 15:07:47 +0000 (17:07 +0200)]
gallium/u_blitter: use TEX_LZ if it's supported
The sampler views always have first_level == last_level.
Now radeonsi doesn't have to use the WQM. (a few SALU removed)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 30 May 2017 17:24:17 +0000 (19:24 +0200)]
gallium/util: add _LZ and TXF options to simple shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>