Bas Nieuwenhuizen [Wed, 12 Jun 2019 22:52:18 +0000 (00:52 +0200)]
radv: Skip transitions coming from external queue.
Transitions to external queue should do the transition & make sure
it works on all queues.
Fixes:
8ebc7dcb59a "radv: Allow fast clears with concurrent queue mask for some layouts."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Mateusz Krzak [Fri, 7 Jun 2019 22:33:30 +0000 (00:33 +0200)]
lima/ppir: change offset type to int
Offset doesn't need to be 64-bit. This fixes compilation error
with 64-bit off_t.
Fixes:
af0de6b9 lima/ppir: implement discard and discard_if
Suggested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Chia-I Wu [Wed, 15 May 2019 22:38:49 +0000 (15:38 -0700)]
virgl: virgl_transfer should own its virgl_resource
We should avoid having potentially dangling pointers to
pipe_resources in general.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Wed, 15 May 2019 22:46:40 +0000 (15:46 -0700)]
virgl: pass virgl_context to transfer create/destroy
A pipe_transfer is a context object. It is fine for the
constructor/destructor to have access to the context.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Wed, 15 May 2019 22:52:34 +0000 (15:52 -0700)]
virgl: init transfer queue from virgl_context
A pipe_transfer is a context object. It is fine for
virgl_transfer_queue to have access to the context.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Wed, 15 May 2019 23:01:02 +0000 (16:01 -0700)]
virgl: clean up virgl_transfer_queue.h
Add header guard and forward declare structs. Move virgl_resource.h
inclusion to the C file.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Nicolai Hähnle [Mon, 29 Apr 2019 14:02:29 +0000 (16:02 +0200)]
radeonsi: add radeonsi_debug_disassembly option
This dumps disassembly to the pipe_debug_callback together with shader
stats.
Can be used together with shader-db to get full disassembly of all shaders
in the database.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 29 Apr 2019 13:59:22 +0000 (15:59 +0200)]
radeonsi: fix line splitting in si_shader_dump_assembly
Compute the count since the start of the current line instead of the
count since the start of the the disassembly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 4 May 2019 10:37:36 +0000 (12:37 +0200)]
radeonsi: raise the alignment of LDS memory for compute shaders
This implies that the memory will always be at address 0, which allows
LLVM to generate slightly better code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 4 May 2019 10:34:52 +0000 (12:34 +0200)]
radeonsi: use an explicit symbol for the LSHS LDS memory
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 4 May 2019 10:18:07 +0000 (12:18 +0200)]
radeonsi: rename lds_{load,store} to lshs_lds_{load,store}
These functions are now only used in LS/HS shaders (both separate and
merged).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 4 May 2019 10:11:08 +0000 (12:11 +0200)]
radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9
This will make it easier to use LDS for other purposes in geometry
shaders in the future.
The lifetime of the esgs_ring variable is as follows:
- declared as [0 x i32] while compiling shader parts or monolithic shaders
- just before uploading, gfx9_get_gs_info computes (among other things)
the final ESGS ring size (this depends on both the ES and the GS shader)
- during upload, the "esgs_ring" symbol is given to ac_rtld as a shared
LDS symbol, which will lead to correctly laying out the LDS including
other LDS objects that may be defined in the future
- si_shader_gs uses shader->config.lds_size as the LDS size
This change depends on the LLVM changes for emitting LDS symbols into
the ELF file.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 3 May 2019 19:18:51 +0000 (21:18 +0200)]
amd/rtld: layout and relocate LDS symbols
Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.
Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 30 Nov 2018 10:50:07 +0000 (11:50 +0100)]
radeonsi: cleanup some #includes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 4 Apr 2019 09:49:52 +0000 (11:49 +0200)]
amd/common: use ARRAY_SIZE for the LLVM command line options
This is more convenient for changing it around during debug.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 3 May 2019 17:15:52 +0000 (19:15 +0200)]
radeonsi: inline si_shader_binary_read_config into its only caller
Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 22 May 2018 14:14:16 +0000 (16:14 +0200)]
radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 6 Dec 2018 11:43:44 +0000 (12:43 +0100)]
radeonsi: don't declare pointers to static strings
The compiler should be able to optimize them away, but still. There's
no point in declaring those as pointers, and if the compiler *doesn't*
optimize them away, they add unnecessary load-time relocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 28 Nov 2018 11:46:45 +0000 (12:46 +0100)]
amd/common: add ac_compile_module_to_elf
A new variant of ac_compile_module_to_binary that allows us to
keep the entire ELF around.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 6 Dec 2018 11:35:36 +0000 (12:35 +0100)]
radeonsi: dump shader binary buffer contents
Help identify bugs related to corruption of shaders in memory,
or errors in shader upload / rtld.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 17 May 2018 16:17:07 +0000 (18:17 +0200)]
radeonsi: return bool from si_shader_binary_upload
We didn't really use error codes anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 28 Nov 2018 10:32:01 +0000 (11:32 +0100)]
radeonsi: let si_shader_create return a boolean
We didn't really use error codes anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 9 May 2018 14:38:33 +0000 (16:38 +0200)]
radeonsi: use ac_shader_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 4 May 2018 14:00:35 +0000 (16:00 +0200)]
amd/common: add a more powerful runtime linker
Using an explicit linker instead of just concatenating .text
sections will allow us to start using .rodata sections and
explicit descriptions of data on LDS that is shared between
stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 21:23:34 +0000 (14:23 -0700)]
i965: Fix INTEL_DEBUG=bat
Use hash_table_u64 instead of hash_table directly, since the former
will also handle the special keys (deleted and freed) and allow use
the whole u64 space.
Fixes crash in INTEL_DEBUG=bat when using a key with value 0 -- the
current value for a freed key.
Fixes:
b38dab101ca "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 19:10:54 +0000 (12:10 -0700)]
util/hash_table: Properly handle the NULL key in hash_table_u64
The hash_table_u64 should support any uint64_t as input. It does
special handling for the "deleted" key, storing the data in the table
itself; do the same for the "freed" key.
Fixes:
b38dab101ca "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nicolai Hähnle [Wed, 23 May 2018 19:52:26 +0000 (21:52 +0200)]
amd/common: clarify ac_shader_binary::lds_size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 22 May 2018 11:29:27 +0000 (13:29 +0200)]
amd/common: extract ac_parse_shader_binary_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 13 May 2019 14:58:08 +0000 (16:58 +0200)]
u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.
The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.
With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size
v3:
- fix iris, lima, panfrost
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 13 May 2019 14:58:07 +0000 (16:58 +0200)]
u_dynarray: return 0 on realloc failure and ensure no-op
We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it gracefully
and without memory leaks.
This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:
node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
if (unlikely(!node))
return 0ull;
Previously, allocation failure for util_dynarray_grow wouldn't actually
return NULL when the dynarray was previously non-empty.
v2:
- make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute
- simplify the new capacity calculation: aside from avoiding a useless loop
when newcap is very large, this also avoids an infinite loop when newcap
is larger than 1 << 31
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 3 May 2019 16:11:27 +0000 (18:11 +0200)]
freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0)
This is more expressive and simplifies a subsequent change.
v2:
- fix one more call-site after rebase
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 17:24:57 +0000 (10:24 -0700)]
panfrost/midgard: Differentiate vertex/fragment texture tags
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 16:55:18 +0000 (09:55 -0700)]
panfrost/midgard: Assert on unknown texture source
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 16:54:22 +0000 (09:54 -0700)]
panfrost/midgard: Set minimal swizzle on texture input
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 16:51:29 +0000 (09:51 -0700)]
panfrost/midgard: Lower texture projectors
We do have native support for perspective division on the load/store
unit, but this is for the future, something ideally we would select
generally, not just for textures. Meanwhile, flipping on projector
lowering works now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 16:43:08 +0000 (09:43 -0700)]
panfrost/midgard: Implement txl
This follows the txb implementation, but requires an adjustment to how
the cont/last flags are set.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 16:23:05 +0000 (09:23 -0700)]
panfrost/midgard: Implement txb op
We refactor the main tex handling to fit a bias argument in as well.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 4 Jun 2019 23:48:17 +0000 (23:48 +0000)]
panfrost: Unify bind_vs/fs_state
This replaces bind_vs/fs_state calls to a unified bind_shader_state
call, removing a great deal of duplicated logic related to variant
selection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 4 Jun 2019 23:47:35 +0000 (23:47 +0000)]
panfrost: Add panfrost_job_type_for_pipe helper
This logic is repeated in a bunch of places and will only grow worse as
we support more job types; collect it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 4 Jun 2019 23:26:09 +0000 (23:26 +0000)]
panfrost/midgard: Extract emit_varying_read
Paralleling emit_uniform_read, this allows varying reads to be emitted
independent of an honest-to-goodness load vary instruction in the NIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 21:56:30 +0000 (14:56 -0700)]
panfrost: Remove "vertex/tiler render target" silliness
I don't think these are actual structures, just figments over
cargoculting dumped memory without making any sense of it. Nothing seems
to break if the region is zeroed out, anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 20:47:37 +0000 (13:47 -0700)]
panfrost/decode: Print line number of bad memory access
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 19:25:35 +0000 (12:25 -0700)]
panfrost: Replace pantrace with direct decoding
History lesson! In the early days of a Panfrost, we had a library
independent of the driver called `panwrap` which would be LD_PRELOAD'ed
into a driver to decode its cmdstream in real-time. When upstreaming
Panfrost, we realized that we would much rather have this decode
functionality maintained in-tree to avoid divergence, but that we could
not upstream panwrap because of its use with the legacy API. So we
instead dumped GPU memory to the filesystem with an out-of-tree panwrap,
and decoded that with the in-tree pandecode module. When we migrated to
the new kernel, we just added support for doing this memory dump
directly from the driver (via a module "pantrace").
This works, but dumping memory every frame is sloooooooooooooow and
error-prone. I figured if we have pandecode in-tree, we might as well
link to it directly in the driver, allowing us to decode Panfrost's
command streams without dumping memory to the filesystem first. This
cleans up the code *substantially* and improves dumping performance by a
HUGE margin. I'm talking "several seconds per frame" to "dumping in
real-time" kind of jump.
Note to users: this removes the environmental option "PANTRACE_BASE".
Instead, for equivalent functionality set "PAN_MESA_DEBUG=trace" and
redirect stdout to the file of your choosing.
This should be debugging Panfrost much more pleasant.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Kevin Strasser [Thu, 30 May 2019 20:31:20 +0000 (13:31 -0700)]
st/mesa: Add rgbx handling for fp formats
Add missing cases for fp32 and fp16 formats.
Fixes:
c68334ffc0a9 "st/mesa: add floating point formats in st_new_renderbuffer_fb()"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Kevin Strasser [Thu, 30 May 2019 19:37:07 +0000 (12:37 -0700)]
gallium/winsys/kms: Fix dumb buffer bpp
The bpp in the dumb buffer creation request is hardcoded to 32, which is an
incorrect assumption as the caller is free to pick any pipe format. Use the
bpp supplied to us through util_format_get_blocksizebits().
Fixes:
3b176c441b "gallium: Add a dumb drm/kms winsys backed swrast provider"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Eric Engestrom [Wed, 12 Jun 2019 16:23:27 +0000 (17:23 +0100)]
util/futex: fix dangling pointer use
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901
Fixes:
7dc2f4788288ec9c7ab6 "util: emulate futex on FreeBSD using umtx"
Cc: Greg V <greg@unrelenting.technology>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Samuel Pitoiset [Wed, 12 Jun 2019 07:44:29 +0000 (09:44 +0200)]
radv: fix VK_EXT_memory_budget if one heap isn't available
When the visible VRAM size is equal to the VRAM size only two
heaps are exposed.
This fixes dEQP-VK.api.info.device.memory_budget.
Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 11 Jun 2019 14:46:32 +0000 (16:46 +0200)]
radv: fix occlusion queries on VegaM
The number of render backends is 16 but the enabled mask is 0xaaaa.
As noticed by Bas, allowing disabled render backends might break
the OCCLUSION_QUERY packet. We don't use it yet but keep this in
mind.
This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*.
Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Lionel Landwerlin [Wed, 12 Jun 2019 09:41:36 +0000 (12:41 +0300)]
anv: do not parse genxml data without INTEL_DEBUG=bat
This significantly slows down the CTS runs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes:
32ffd90002b04b ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Lionel Landwerlin [Sat, 27 Apr 2019 01:36:23 +0000 (09:36 +0800)]
intel/dump: fix segfault when the app hasn't accessed the device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Wed, 5 Jun 2019 05:29:13 +0000 (22:29 -0700)]
iris: Only upload surface state for grid info when needed
Special care is needed to ensure that when we have two consecutive
calls with the same grid size, we only bail in the second one if it
either don't need the surface state or the surface state was already
uploaded.
v2: Instead of having a new bool in ice->state to know whether we had
a surface, check whether we have state->ref. (Ken)
Clean up the logic a little bit by adding 'grid_updated' local. (Ken)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Tue, 4 Jun 2019 20:38:36 +0000 (13:38 -0700)]
iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rui Salvaterra [Fri, 7 Jun 2019 11:19:23 +0000 (12:19 +0100)]
r300g: implement GLSL disk shader caching
This implements GLSL disk shader caching for the R300-R500 series of AMD GPUs.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Richard Thier [Sat, 8 Jun 2019 06:35:36 +0000 (08:35 +0200)]
r300g: restore performance after RADEON_FLAG_NO_INTERPROCESS_SHARING was added
v1: Fix skipped slab allocators and the buffer cache.
v2: Use only 1 domain for texture allocation
v3: Added flag for the create_fence call too
Based on Marek v1 and v2 proposed fixes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=1107812.patch
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 21 May 2019 22:39:43 +0000 (18:39 -0400)]
radeonsi: don't test SDMA perf if SDMA is disabled/unsupported
Marek Olšák [Wed, 29 May 2019 00:12:53 +0000 (20:12 -0400)]
radeonsi: always interpolate PrimID as flat
Marek Olšák [Mon, 3 Jun 2019 18:51:08 +0000 (14:51 -0400)]
radeonsi: move color clamping to si_llvm_export_vs to unify the code
Marek Olšák [Mon, 3 Jun 2019 23:43:44 +0000 (19:43 -0400)]
radeonsi: use the ac helper for index buffer stores in the culling shader
Marek Olšák [Mon, 3 Jun 2019 20:35:37 +0000 (16:35 -0400)]
radeonsi: use the ac helper for image stores
Marek Olšák [Fri, 24 May 2019 22:56:05 +0000 (18:56 -0400)]
radeonsi: use the ac helper for SSBO stores
Marek Olšák [Mon, 3 Jun 2019 22:11:27 +0000 (18:11 -0400)]
radeonsi: fixes for vec3 buffer stores in LLVM 9
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 03:58:08 +0000 (20:58 -0700)]
iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
This avoids lowering of CS system values by GLSL (configured by state
tracker). In i965 we don't use that lowering, and we also shouldn't
need that in Iris.
Using it cause some unnecessary round trip between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again. Copy propagation can make some of
those go away, but not all as seen below.
Intel SKL shader-db results:
total instructions in shared programs:
15595189 ->
15594556 (<.01%)
instructions in affected programs: 74880 -> 74247 (-0.85%)
helped: 81
HURT: 4
helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
HURT stats (abs) min: 1 max: 2 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
95% mean confidence interval for instructions value: -11.56 -3.34
95% mean confidence interval for instructions %-change: -1.91% -1.28%
Instructions are helped.
total loops in shared programs: 4831 -> 4831 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
372136618 ->
372145628 (<.01%)
cycles in affected programs: 9218230 -> 9227240 (0.10%)
helped: 131
HURT: 86
helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
HURT stats (abs) min: 2 max: 2442 x̄: 165.38 x̃: 6
HURT stats (rel) min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
95% mean confidence interval for cycles value: -2.07 85.11
95% mean confidence interval for cycles %-change: -0.22% 0.30%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 11956 -> 11950 (-0.05%)
spills in affected programs: 77 -> 71 (-7.79%)
helped: 3
HURT: 0
total fills in shared programs: 25619 -> 25549 (-0.27%)
fills in affected programs: 593 -> 523 (-11.80%)
helped: 4
HURT: 0
LOST: 0
GAINED: 0
Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Mon, 10 Jun 2019 03:56:09 +0000 (20:56 -0700)]
gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Tells whether or not the driver can handle gl_LocalInvocationIndex and
gl_GlobalInvocationID. If not supported (the default), state tracker
will lower those on behalf of the driver.
v2: Add case to u_screen.c. (Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Sun, 9 Jun 2019 17:25:07 +0000 (10:25 -0700)]
st/glsl: Perform some var optimizations
Perform those before some derefs are gone when we lower the buffers
after the st_nir_opts() call.
Intel SKL shader-db results:
total instructions in shared programs:
15593685 ->
15590708 (-0.02%)
instructions in affected programs: 378078 -> 375101 (-0.79%)
helped: 777
HURT: 44
helped stats (abs) min: 1 max: 68 x̄: 4.07 x̃: 4
helped stats (rel) min: 0.04% max: 31.58% x̄: 2.88% x̃: 1.37%
HURT stats (abs) min: 1 max: 24 x̄: 4.20 x̃: 2
HURT stats (rel) min: 0.17% max: 8.00% x̄: 1.60% x̃: 1.27%
95% mean confidence interval for instructions value: -4.02 -3.23
95% mean confidence interval for instructions %-change: -2.93% -2.35%
Instructions are helped.
total loops in shared programs: 4815 -> 4815 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs:
371965528 ->
371788566 (-0.05%)
cycles in affected programs:
184190307 ->
184013345 (-0.10%)
helped: 3650
HURT: 2855
helped stats (abs) min: 1 max: 59400 x̄: 99.45 x̃: 15
helped stats (rel) min: <.01% max: 43.18% x̄: 2.60% x̃: 1.02%
HURT stats (abs) min: 1 max: 16362 x̄: 65.16 x̃: 10
HURT stats (rel) min: <.01% max: 66.22% x̄: 2.78% x̃: 0.81%
95% mean confidence interval for cycles value: -53.73 -0.68
95% mean confidence interval for cycles %-change: -0.39% -0.08%
Cycles are helped.
total spills in shared programs: 11936 -> 11956 (0.17%)
spills in affected programs: 443 -> 463 (4.51%)
helped: 0
HURT: 8
total fills in shared programs: 25644 -> 25619 (-0.10%)
fills in affected programs: 2306 -> 2281 (-1.08%)
helped: 24
HURT: 2
LOST: 7
GAINED: 16
Total CPU time (seconds): 1679.04 -> 1695.69 (0.99%)
shader-db results radeonsi (VEGA64):
Totals from affected shaders:
SGPRS: 180160 -> 179552 (-0.34 %)
VGPRS: 115368 -> 114544 (-0.71 %)
Spilled SGPRs: 5627 -> 5603 (-0.43 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 7808364 -> 7803268 (-0.07 %) bytes
LDS: 192 -> 192 (0.00 %) blocks
Max Waves: 19202 -> 19340 (0.72 %)
Wait states: 0 -> 0 (0.00 %)
Radeonsi results provided by Timothy.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ville Syrjälä [Mon, 10 Jun 2019 11:21:05 +0000 (14:21 +0300)]
anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7
Modern DXVK requires event support [1], but looks like it only
uses vkCmdSetEvent() + vkGetEventStatus(). So we can just
borrow the relevant code from gen8, leaving CmdWaitEvents still
unimplemented.
[1] https://github.com/doitsujin/dxvk/commit/
8c3900c533d83d12c970b905183d17a1d3e8df1f
v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Fri, 7 Jun 2019 22:45:03 +0000 (15:45 -0700)]
intel/fs: Mark source 0 of bcsel as needing Boolean resolve
The other sources of the bcsel behave like the sources of an and or
other logical operation. However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.
No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Rob Clark [Thu, 6 Jun 2019 14:43:19 +0000 (07:43 -0700)]
freedreno/a5xx: enable a540
Tested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Mon, 10 Jun 2019 23:12:12 +0000 (16:12 -0700)]
freedreno/a6xx: enable UBWC by default
Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the
obsolete comment (and bonus, drop unused softpin debug flag)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Mon, 10 Jun 2019 22:57:43 +0000 (15:57 -0700)]
freedreno/a6xx: disallow UBWC for z24s8
This is slightly annoying because it *mostly* works.. but we have some
issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before
we can enable UBWC by default. For now it is a step forward to at least
enable it for non-z/s while we figure out how to blit z24s8+UBWC.
(The basic issue is that pretending z24s8 is an equivalently sized rgba
format for the purpose of blitting falls apart when UBWC is in the
picture.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Mon, 10 Jun 2019 13:43:52 +0000 (06:43 -0700)]
freedreno/a6xx: use correct UBWC reg builders
No functional change, the registers have the same layout as MRT flags
pitch reg.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Mon, 10 Jun 2019 13:36:31 +0000 (06:36 -0700)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 19:14:10 +0000 (12:14 -0700)]
freedreno/a6xx: disable UBWC for some formats
An older blob claims to support UBWC w/ r32ui an r32i, but not r32f.
Results from deqp indicate that it doesn't work with r32ui and r32i.
This *could* also just mean that use as "IBO" (image) is more limited
than as texture, although blob also doesn't seem to bother to try to use
UBWC with images at all, so hard to know for sure.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 18:00:56 +0000 (11:00 -0700)]
freedreno/a6xx: handle non-UWC-compatible image views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 17:31:59 +0000 (10:31 -0700)]
freedreno/a6xx: handle non-UBWC-compatible texture views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 17:14:12 +0000 (10:14 -0700)]
freedreno: add helper to uncompress UBWC resource
We'll need this for a few edge cases, like image/sampler view that uses
a format that UBWC does not support with a resource originally created
in a format that UBWC does support.
NOTE we *could* in some cases do an in-place uncompress. But that has
a couple potential sharp edges:
1) the uncompressed buffer could have different layout, ie. a5xx
with meta and pixel data of layers/levels interleaved.
2) if it comes mid-batch, it would force flush, or somehow fixing
up cmdstream for draws already emitted. But with the resource
shadowing approach we can rely on batch re-ordering to avoid
splitting things.. older draws see the older compressed version,
newer draws see the new uncompressed version of the rsc.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 18:20:11 +0000 (11:20 -0700)]
freedreno: handle images in rebind_resource()
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 16:39:30 +0000 (09:39 -0700)]
freedreno: allow null discard box in shadow path
When uncompressing a UBWC buffer, we don't want to discard anything.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 16:29:53 +0000 (09:29 -0700)]
freedreno: swap UBWC state in shadow path
It doesn't come up yet, as so far we only hit this path with linear
buffers. But it will when we start re-using the shadow path for
uncompressing UBWC buffers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 16:23:16 +0000 (09:23 -0700)]
freedreno: add modifier param to fd_try_shadow_resource()
To uncompress UBWC, I want to re-use the shadow path, but we'll need a
way to request that the new buffer is not compressed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Fri, 7 Jun 2019 16:12:52 +0000 (09:12 -0700)]
freedreno: correct modifier for UBWC buffers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Chia-I Wu [Thu, 6 Jun 2019 17:55:59 +0000 (10:55 -0700)]
virgl: consider newly created resources idle
A newly created resource can be regarded as idle. We don't care if
the RESOURCE_CREATE command has been retired, unless it is used for
fencing.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Fri, 10 May 2019 18:56:46 +0000 (11:56 -0700)]
virgl: make resource_wait/resource_is_busy cheaper
The round trip to the kernel is expensive. Add a local cache to
avoid it when possible.
There is a race condition when two contexts access the same resource
at the same time (e.g., ctx1 submits a cmdbuf that accesses a
resource while ctx2 maps the resource). But that is probably an app
bug in the first place.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Mon, 10 Jun 2019 23:05:48 +0000 (16:05 -0700)]
virgl: add virgl_drm_{alloc,free,clear}_res_list
Helpers to work with resource list. virgl_drm_release_all_res is
removed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 6 Jun 2019 21:58:39 +0000 (14:58 -0700)]
virgl: do not cache external resources
We should not reuse a resource for other purposes when it can still
be accessed by another process or device.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 15:04:10 +0000 (08:04 -0700)]
panfrost: Enable AFBC on depth/stencil
This seems to be a performance win, but more rigorous testing is
necessary to figure out the exact circumstances when this is good/bad.
Incidentally, this fixes non-aligned ZS.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 14:43:41 +0000 (07:43 -0700)]
panfrost: Linear depth/stencil should be aligned
We might render to it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 11 Jun 2019 14:41:09 +0000 (07:41 -0700)]
panfrost/midgard: Decode LOD/bias registers
For constant LODs/biases, we can use an immediate embedded in the
texture (already decoded); for non-constant, we have to use a register
squeezed into the usual immediate field, which is decoded here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 21:56:54 +0000 (14:56 -0700)]
panfrost/midgard: Decode texture offset register swizzle
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 21:56:32 +0000 (14:56 -0700)]
panfrost/midgard/disasm: include textureGather()
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:27:10 +0000 (13:27 -0700)]
panfrost/midgard: Support negative immediate offsets
It's not at all clear why this work for texelFetch but not texture.
Maybe the top bits are dual-purpose on other texturing ops...?
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:19:15 +0000 (13:19 -0700)]
panfrost/midgard: Fix redunant mask redundancy
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:13:51 +0000 (13:13 -0700)]
panfrost/midgard/disasm: Print LOD for texelFetch
Its encoding differs slightly from the LOD used in normal texture calls.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 20:09:39 +0000 (13:09 -0700)]
panfrost/midgard: Identify the in_reg_full field
This is clear for texelFetch, hence the confusion with Bifrost's filter
field, but it's much more general in reality.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 19:12:49 +0000 (12:12 -0700)]
panfrost/midgard/disasm: Correctly dump bias/LOD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 19:12:27 +0000 (12:12 -0700)]
panfrost/midgard/disasm: Cleanup texture op code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:52:32 +0000 (11:52 -0700)]
panfrost/midgard/disasm: Add missing space
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:51:54 +0000 (11:51 -0700)]
panfrost/midgard/disasm: LOD immediate/register select
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:51:16 +0000 (11:51 -0700)]
panfrost/midgard/disasm: Use texture op name bare
This allows us to show a call to textureLod in a reasonable way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:18:41 +0000 (11:18 -0700)]
panfrost/midgard/disasm: Varying perspective divides
With an extra flag, we're able to do a perspective division "for free"
while loading a varying.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 10 Jun 2019 18:05:40 +0000 (11:05 -0700)]
panfrost/midgard: Add perspective division opcodes
...on the load/store unit, not the ALUs. Looks goofy but hey.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>