Timothy Arceri [Tue, 14 Mar 2017 00:22:44 +0000 (11:22 +1100)]
util/disk_cache: don't fall back to an empty cache dir on evict
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Mon, 13 Mar 2017 00:07:30 +0000 (11:07 +1100)]
util/disk_cache: use a thread queue to write to shader cache
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call disk_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
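The V2/V3 notes of the thread-queue commit above mention polling for the cache file with a 100ms delay. A minimal sketch of such a wait helper follows, assuming a POSIX system; the function name and the max_polls parameter are hypothetical and are not taken from the Mesa test code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: check for `path` every 100ms, at most `max_polls`
 * times, and report whether the file exists by the end of the wait. */
bool
wait_until_file_written(const char *path, int max_polls)
{
   /* 100ms expressed in nanoseconds. */
   const struct timespec interval = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

   assert(path != NULL);

   for (int i = 0; i < max_polls; i++) {
      if (access(path, F_OK) == 0)
         return true;
      nanosleep(&interval, NULL);
   }

   /* One last check after the final sleep. */
   return access(path, F_OK) == 0;
}
```

Polling returns as soon as the writer thread has produced the file, so a test only pays the full timeout when the write never happens.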
Timothy Arceri [Sun, 12 Mar 2017 23:14:35 +0000 (10:14 +1100)]
util/disk_cache: add helpers for creating/destroying disk cache put jobs
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
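The V2 note above says the job keeps its own copy of the data so the caller may free the original before compression and writing finish. A rough sketch of that idea, assuming a hypothetical put_job layout with a flexible array member (the real structure in Mesa may look different):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record; the payload is stored inline after the header. */
struct put_job {
   size_t size;
   unsigned char data[];
};

/* Allocate a job that owns a private copy of `data`. */
struct put_job *
create_put_job(const void *data, size_t size)
{
   assert(data != NULL);

   struct put_job *job = malloc(sizeof(*job) + size);
   if (job == NULL)
      return NULL;

   job->size = size;
   memcpy(job->data, data, size);
   return job;
}

/* Release the job together with its copied payload. */
void
destroy_put_job(struct put_job *job)
{
   free(job);
}
```

Because header and payload share one allocation, destroying the job is a single free() and the worker thread never touches caller-owned memory.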
Timothy Arceri [Wed, 8 Mar 2017 23:51:01 +0000 (10:51 +1100)]
util/disk_cache: add thread queue to disk cache
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Dave Airlie [Tue, 14 Mar 2017 21:15:50 +0000 (07:15 +1000)]
radv/ac: workaround regression in llvm 4.0 release
LLVM 4.0 was released with a pretty messy regression that will hopefully
get fixed in the future.
This workaround was proposed by Tom, and it fixes the CTS regressions
here at least. I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 27 Feb 2017 01:30:41 +0000 (11:30 +1000)]
radv/ac: gather4 cube workaround integer
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead it forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch; llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:57:55 +0000 (22:57 +0100)]
radv: Set driver version to mesa version
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
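For reference, the VK_MAKE_VERSION layout discussed in the driver-version commit above packs major, minor and patch into 10, 10 and 12 bits. The sketch below re-implements that packing under hypothetical function names so it builds without the Vulkan headers; it is an illustration, not the radv code.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a version the way VK_MAKE_VERSION does: major in bits 22..31,
 * minor in bits 12..21, patch in bits 0..11. */
uint32_t
pack_version(uint32_t major, uint32_t minor, uint32_t patch)
{
   assert(major < (1u << 10) && minor < (1u << 10) && patch < (1u << 12));
   return (major << 22) | (minor << 12) | patch;
}

/* Decode the three fields again. */
uint32_t version_major(uint32_t v) { return v >> 22; }
uint32_t version_minor(uint32_t v) { return (v >> 12) & 0x3ffu; }
uint32_t version_patch(uint32_t v) { return v & 0xfffu; }
```

A tool that decodes with this layout would split a plain decimal such as 17001000 into unrelated fields, which is presumably why the raw-number alternative does not display well.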
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:37:03 +0000 (22:37 +0100)]
radv: Increase api version to 1.0.42.
I've skimmed the changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant of course, but this should not
regress stuff.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 14 Mar 2017 02:26:06 +0000 (19:26 -0700)]
util/vk: Add helpers for finding an extension struct
Reviewed-by: Dave Airlie <airlied@redhat.com>
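Finding an extension struct usually means walking the pNext chain until a matching sType turns up. The sketch below shows that walk with a hypothetical stand-in header type and an int sType; the helpers actually added by the commit above may be shaped differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the common header that chained structs begin
 * with: a type tag followed by a pointer to the next struct. */
struct base_struct {
   int sType;
   const struct base_struct *pNext;
};

/* Return the first struct in the chain whose sType matches, or NULL. */
const void *
find_struct_in_chain(const void *start, int sType)
{
   for (const struct base_struct *s = start; s != NULL; s = s->pNext) {
      if (s->sType == sType)
         return s;
   }
   return NULL;
}
```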
Alex Smith [Tue, 14 Mar 2017 15:26:32 +0000 (15:26 +0000)]
radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 20:46:54 +0000 (21:46 +0100)]
radv: Emit cache flushes before CP DMA.
The flushes could be due to TRANSFER barriers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jan Beich [Sun, 12 Mar 2017 03:19:14 +0000 (03:19 +0000)]
Convert sed(1) syntax to be compatible with FreeBSD and OpenBSD
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Jason Ekstrand [Tue, 14 Mar 2017 02:30:26 +0000 (19:30 -0700)]
anv: Properly enumerate physical devices when none are present
Jason Ekstrand [Thu, 9 Mar 2017 04:23:05 +0000 (20:23 -0800)]
nir/constant_expressions: Refactor helper functions
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 03:54:37 +0000 (19:54 -0800)]
nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
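The point above about i2i versus u2u can be illustrated on raw bit patterns: the two differ only in whether the upper bits are filled with the sign bit when up-converting. This is a sketch with hypothetical function names, not NIR's constant-folding code.

```c
#include <assert.h>
#include <stdint.h>

/* i2i64 on a 32-bit source: replicate the sign bit into the upper half. */
uint64_t
sketch_i2i64(uint32_t bits)
{
   uint64_t upper = (bits & 0x80000000u) ? 0xffffffff00000000ull : 0;
   return upper | bits;
}

/* u2u64 on a 32-bit source: the upper half is always zero. */
uint64_t
sketch_u2u64(uint32_t bits)
{
   return (uint64_t)bits;
}

/* Down-converting just drops the upper bits, so signedness is irrelevant. */
uint32_t
sketch_x2x32(uint64_t bits)
{
   return (uint32_t)bits;
}
```

For sources with the top bit clear the two up-conversions agree, which is why separate i2u and u2i opcodes add nothing.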
Jason Ekstrand [Wed, 8 Mar 2017 03:32:50 +0000 (19:32 -0800)]
i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 02:32:17 +0000 (18:32 -0800)]
i965/vec4: Get rid of the type parameter from to/from_double
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:44 +0000 (16:46 -0800)]
glsl/nir: Use nir_type_conversion_op
Using the helper is way better than hand-coding the universe.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 13 Mar 2017 20:07:24 +0000 (13:07 -0700)]
nir: Rewrite nir_type_conversion_op
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:17 +0000 (16:46 -0800)]
nir: Add a get_nir_type_for_glsl_base_type helper
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 18:32:40 +0000 (10:32 -0800)]
nir/validate: Rework ALU bit-size rule validation
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Fri, 3 Mar 2017 00:25:59 +0000 (16:25 -0800)]
nir/validate: Validate that bit sizes and components always match
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with fewer components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:42:06 +0000 (21:42 -0800)]
nir: Make image_size a variable-width intrinsic
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:39:58 +0000 (21:39 -0800)]
i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:27:57 +0000 (19:27 -0800)]
nir/lower_tex: Use tex_instr_dest_size for txs destinations
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:03:01 +0000 (19:03 -0800)]
nir/spirv: Restrict the number of channels in texture coordinates
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
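Masking off the unused channels, as the nir/spirv commit above describes, comes down to a write mask with one bit per coordinate component. A small sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Build a channel mask that keeps the first `coord_components` channels
 * of a vector source (bit 0 = x, bit 1 = y, and so on). */
unsigned
coord_channel_mask(unsigned coord_components)
{
   assert(coord_components >= 1 && coord_components <= 4);
   return (1u << coord_components) - 1u;
}
```

For a 2D coordinate packed into a vec4 this yields 0x3, so only x and y reach the texture instruction.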
Jason Ekstrand [Fri, 3 Mar 2017 01:10:24 +0000 (17:10 -0800)]
nir/copy_prop: Respect the source's number of components
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
that are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 01:39:11 +0000 (17:39 -0800)]
nir/intrinsics: Make load_barycentric_input take a 2-component coor
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 3 Mar 2017 07:03:03 +0000 (23:03 -0800)]
anv/blorp: Only set a clear color for resolves if fast-cleared
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 10 Mar 2017 00:37:23 +0000 (16:37 -0800)]
anv/blorp: Turn off AUX after doing a CCS_D resolve
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Tapani Pälli [Mon, 13 Mar 2017 12:08:38 +0000 (14:08 +0200)]
android: add '/vulkan' to libmesa_anv_entrypoints path
otherwise generated entrypoint headers are not found during build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tapani Pälli [Mon, 13 Mar 2017 12:08:37 +0000 (14:08 +0200)]
android: add src/intel/compiler to libmesa_intel_compiler includes
fixes build error when brw_nir.h is not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gwan-gyeong Mun [Tue, 29 Nov 2016 21:59:15 +0000 (06:59 +0900)]
anv: Add missing error-checking to anv_CreateDevice (v3)
This patch adds missing error-checking and fixes a resource leak in
the allocation failure path of anv_CreateDevice().
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
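The failure-path fixes listed above follow the usual C pattern of unwinding in reverse order of construction. Below is a reduced sketch of that pattern; the device fields, the fail_step parameter and alloc_or_fail() are all hypothetical and exist only to make the failure paths testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device with three sub-allocations. */
struct device {
   void *state_pool;
   void *batch_pool;
   void *queue;
};

/* Stand-in allocator that can be told to fail. */
void *
alloc_or_fail(bool fail)
{
   return fail ? NULL : malloc(16);
}

/* fail_step selects which allocation fails (0 means none). */
struct device *
create_device(int fail_step)
{
   struct device *dev = calloc(1, sizeof(*dev));
   if (dev == NULL)
      return NULL;

   dev->state_pool = alloc_or_fail(fail_step == 1);
   if (dev->state_pool == NULL)
      goto fail_device;

   dev->batch_pool = alloc_or_fail(fail_step == 2);
   if (dev->batch_pool == NULL)
      goto fail_state_pool;

   dev->queue = alloc_or_fail(fail_step == 3);
   if (dev->queue == NULL)
      goto fail_batch_pool;

   return dev;

   /* Each label frees everything that was set up before the failure. */
fail_batch_pool:
   free(dev->batch_pool);
fail_state_pool:
   free(dev->state_pool);
fail_device:
   free(dev);
   return NULL;
}

void
destroy_device(struct device *dev)
{
   free(dev->queue);
   free(dev->batch_pool);
   free(dev->state_pool);
   free(dev);
}
```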
Dave Airlie [Mon, 13 Mar 2017 20:50:59 +0000 (06:50 +1000)]
radv: setup llvm target data layout
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Smith [Mon, 13 Mar 2017 13:28:19 +0000 (13:28 +0000)]
radv: Reinitialise loaderMagic when allocating a cached command buffer
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes: 682248db451f ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 10 Mar 2017 11:18:07 +0000 (12:18 +0100)]
gallium/radeon: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Mon, 6 Mar 2017 00:47:52 +0000 (01:47 +0100)]
radeonsi: mark all bound shader buffer ranges as initialized
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 10 Mar 2017 11:19:50 +0000 (12:19 +0100)]
st/mesa: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Chad Versace [Sun, 5 Mar 2017 21:15:06 +0000 (13:15 -0800)]
anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties
No intended change in behavior. Just a refactor.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Chad Versace [Sun, 5 Mar 2017 21:07:13 +0000 (13:07 -0800)]
anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Chad Versace [Sat, 25 Feb 2017 04:58:59 +0000 (20:58 -0800)]
util/vulkan: Add vk_outarray (v2)
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
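The output-array convention that vk_outarray wraps is the familiar two-call idiom: a NULL array asks for the count, a non-NULL array receives at most *count elements. The sketch below spells that convention out by hand with hypothetical names; it is not the vk_outarray API itself, and the status enum merely mimics VK_SUCCESS and VK_INCOMPLETE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_result { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 };

/* Report `available` items through the (count, out) pair. */
enum sketch_result
enumerate_items(const int *items, uint32_t available,
                uint32_t *count, int *out)
{
   assert(count != NULL);

   /* First call: the caller only wants to know how many items exist. */
   if (out == NULL) {
      *count = available;
      return SKETCH_SUCCESS;
   }

   /* Second call: copy as many items as fit and report how many. */
   uint32_t n = *count < available ? *count : available;
   for (uint32_t i = 0; i < n; i++)
      out[i] = items[i];
   *count = n;

   return n < available ? SKETCH_INCOMPLETE : SKETCH_SUCCESS;
}
```

Wrapping this bookkeeping in one helper keeps every entry point from re-implementing the count clamping and the incomplete status.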
Lionel Landwerlin [Sun, 12 Mar 2017 16:53:29 +0000 (16:53 +0000)]
intel: genxml: prevent missing ; with address fields dwords
Before this change, the generator could print this kind of thing:
const uint32_t v0 =
__gen_uint(values->ValidBit, 0, 0) |
__gen_uint(values->FaultType, 1, 2) |
__gen_uint(values->SRCIDofFault, 3, 10) |
__gen_uint(values->GTTSEL, 11, 1) |
dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);
This change fixes the trailing '|'.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Julien Isorce [Fri, 10 Mar 2017 17:16:07 +0000 (17:16 +0000)]
gallium/hud: check NULL return from u_upload_alloc
Fixes the following segmentation fault:
signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
167
168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
169
-> 170 vertices[num++] = (float) x1;
171 vertices[num++] = (float) y1;
172
173 vertices[num++] = (float) x1;
(lldb) bt
* frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Julien Isorce [Fri, 10 Mar 2017 17:20:56 +0000 (17:20 +0000)]
winsys/radeon: check null return from radeon_cs_create_fence in cs_flush
Follow-up of patch:
"radeon_cs_create_fence: check null return from radeon_winsys_bo_create"
radeon_drm_cs_flush
radeon_cs_create_fence
radeon_winsys_bo_create
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Julien Isorce [Fri, 10 Mar 2017 17:16:05 +0000 (17:16 +0000)]
winsys/radeon: check null in radeon_cs_create_fence
Fixes the following segmentation fault:
radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
-> if (!bo->handle)
(gdb) bt
0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Juan A. Suarez Romero [Mon, 13 Mar 2017 15:04:20 +0000 (16:04 +0100)]
vulkan/wsi: include builddir for generated headers
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Sat, 4 Mar 2017 17:23:26 +0000 (09:23 -0800)]
anv: Use on-the-fly surface states for dynamic buffer descriptors
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this is because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Sat, 11 Mar 2017 07:00:49 +0000 (23:00 -0800)]
anv: Stall before fast-clear operations
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware; however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issue with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Sat, 4 Mar 2017 18:52:43 +0000 (10:52 -0800)]
anv: Accurately advertise dynamic descriptor limits
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Sat, 4 Mar 2017 18:07:56 +0000 (10:07 -0800)]
anv: Add a helper for working with VK_WHOLE_SIZE for buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
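A helper for VK_WHOLE_SIZE typically resolves the sentinel to "everything from the offset to the end of the buffer". A sketch under that assumption, with a hypothetical function name and a local constant mirroring VK_WHOLE_SIZE (all bits set) so the example builds without the Vulkan headers:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of VK_WHOLE_SIZE. */
#define SKETCH_WHOLE_SIZE (~0ULL)

/* Resolve a (possibly WHOLE_SIZE) range against a buffer of buffer_size
 * bytes, starting at offset. */
uint64_t
buffer_get_range(uint64_t buffer_size, uint64_t offset, uint64_t range)
{
   assert(offset <= buffer_size);

   if (range == SKETCH_WHOLE_SIZE)
      return buffer_size - offset;

   assert(range <= buffer_size - offset);
   return range;
}
```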
Rob Clark [Tue, 31 Jan 2017 13:31:37 +0000 (08:31 -0500)]
freedreno/ir3: fragz cannot be half precision
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 30 Jan 2017 22:27:35 +0000 (17:27 -0500)]
freedreno/ir3: optimize less in glsl
Rely on nir for optimization, to reduce compile times. Very minimal impact
on shader-db:
total instructions in shared programs: 104170 -> 104199 (0.03%)
total dwords in shared programs: 209664 -> 209728 (0.03%)
total full registers used in shared programs: 7156 -> 7161 (0.07%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 24222 -> 24224 (0.01%)
        half   full   const   instr   dwords
helped    12    107     103     112       98
hurt      11    104     105     115      102
But shader db runtime dropped from ~29.3s user to ~20.4s user.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Lionel Landwerlin [Fri, 10 Mar 2017 16:14:43 +0000 (16:14 +0000)]
aubinator/genxml: use gzipped files to store embedded genxml
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
We can now drop the checks on xxd in configure.
v2: Fix incorrect makefile dependency (Lionel)
v3: use $(PYTHON2) (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Lionel Landwerlin [Fri, 10 Mar 2017 16:12:51 +0000 (16:12 +0000)]
intel: genxml: add script to generate gzipped genxml
v2 (from Dylan):
Add main function
Add missing Copyright
Use print_function
v3: Add actual license (Dylan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jose Fonseca [Mon, 13 Mar 2017 12:23:11 +0000 (12:23 +0000)]
util/u_thread.h: Include stdint.h for int64_t definition.
Fixes MinGW build. Trivial.
Iago Toral Quiroga [Mon, 13 Mar 2017 11:58:44 +0000 (12:58 +0100)]
intel: fix compiler build
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
Fixes: d0d4a5f43b4 ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Christian König [Mon, 13 Mar 2017 11:43:18 +0000 (12:43 +0100)]
svga: handle P016 format as well
Fixes: 62cff793785 ("gallium: add P016 format")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 2 Mar 2017 19:02:44 +0000 (19:02 +0000)]
configure.ac: require pthread-stubs only where available
The project is a thing only for BSD platforms. Or in other words - for
any other platforms building/installing pthread-stubs results only in a
pthread-stub.pc file.
And even where it provides a DSO, there's a fundamental design issue
with it - see the pthread-stubs mailing list for the specifics.
v2: Update comment above the switch statement (Jon Turney).
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Acked-by: Gary Wong <gtw@gnu.org>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Randy Fishel <randy.fishel@oracle.com>
Cc: Niveditha Rau <niveditha.rau@oracle.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 8 Mar 2017 23:46:06 +0000 (23:46 +0000)]
configure.ac: do not require the i965 driver for ANV
As of the last few commits we have the two split, thus we no longer require
the i965 in order to have the ANV driver.
Even though ANV does not link against libdrm nor libdrm_intel, we still
require those as dependencies due to the headers they provide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 2 Mar 2017 05:11:51 +0000 (21:11 -0800)]
intel/vulkan: Get rid of recursive make
v2 [Emil Velikov]
- Various fixes and initial stab at the Android build.
- Keep the generation rules/EXTRA_DIST outside the conditional
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Wed, 1 Mar 2017 21:26:40 +0000 (13:26 -0800)]
intel/tools: Use a makefile included from intel/Makefile.am
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 15:07:49 +0000 (15:07 +0000)]
intel/compiler: whitespace cleanups
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 16:13:42 +0000 (16:13 +0000)]
intel/compiler: link all tests against gtest, even test_eu_compact
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.
Add a nice comment in the makefile for posterity.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 15:57:27 +0000 (15:57 +0000)]
i965: remove i965_symbols_test reference from .gitignore
The test/binary was removed back in 2012. With that one gone, we can
drop the .gitignore file altogether.
Cc: Eric Anholt <eric@anholt.net>
Fixes: c8850394423 ("i965: Drop the missing symbols link test.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Tue, 28 Feb 2017 17:10:43 +0000 (09:10 -0800)]
i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticeable parts:
- With the earlier header cleanups, nothing in src/intel depends on
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes throughout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 9 Mar 2017 00:44:29 +0000 (00:44 +0000)]
i965: split EU defines to brw_eu_defines.h
Split out the EU defines from the 'generic' ones, as the former are more
compiler oriented.
With a later commit we'll move brw_eu_defines.h alongside the compiler
infra to src/intel/. Pulling all the defines in there seems overzealous.
Some defines are used by both i965 and the i965 compiler. Those are
moved to brw_eu_defines.h, and annotated accordingly. The i965 users
were updated to have the extra include to indicate that.
With future work we might provide a better split, but for now this seems
reasonable.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 9 Mar 2017 17:45:19 +0000 (17:45 +0000)]
util/bitscan: use correct signature for ffs/ffsll
Otherwise we'll get errors such as
error: conflicting types for ‘ffs’
error: conflicting types for ‘ffsll’
We might want to improve the heuristics and provide a definition only
when a native one is missing. We can address that at a later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
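The conflicting-types errors above arise when a fallback is declared with a signature that differs from the platform's `int ffs(int)` and `int ffsll(long long)`. A portable fallback with matching signatures could look like the sketch below; the sketch_ prefix is hypothetical and only avoids clashing with the libc symbols.

```c
#include <assert.h>

/* Return the 1-based index of the least significant set bit, or 0 if
 * no bit is set. Same signature as POSIX ffs(). */
int
sketch_ffs(int i)
{
   unsigned v = (unsigned)i;
   int pos = 1;

   if (v == 0)
      return 0;

   while ((v & 1u) == 0) {
      v >>= 1;
      pos++;
   }
   return pos;
}

/* 64-bit variant, same signature as the GNU ffsll(). */
int
sketch_ffsll(long long i)
{
   unsigned long long v = (unsigned long long)i;
   int pos = 1;

   if (v == 0)
      return 0;

   while ((v & 1ull) == 0) {
      v >>= 1;
      pos++;
   }
   return pos;
}
```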
Emil Velikov [Thu, 9 Mar 2017 02:05:07 +0000 (02:05 +0000)]
i965: add missing brw_defines.h include in brw_program.c
File is using MI_LOAD_REGISTER_IMM, GEN7_CACHE_MODE_1 and others as
defined in the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 9 Mar 2017 01:58:30 +0000 (01:58 +0000)]
i965: add missing brw_defines.h include in brw_program.c
File is using the PIPE_CONTROL_* macros as defined in the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 9 Mar 2017 01:13:52 +0000 (01:13 +0000)]
i965: add missing #include <assert.h> in brw_inst.h
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 9 Mar 2017 00:38:21 +0000 (00:38 +0000)]
i965: move brw_defines.h ifndef guard to the top
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Wed, 8 Mar 2017 23:38:07 +0000 (23:38 +0000)]
i965: remove unused macros from brw_defines.h
The following three groups are used by neither the DRI module nor the
compiler.
BRW_POLYGON_*_FACING
BRW_POLYGON_FACING_*
BRW_STATELESS_BUFFER_*
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 18:38:34 +0000 (18:38 +0000)]
i965: remove unused brw_program.h include
Neither of the changed files requires the brw_program.h include. Since
we're about to move them [to src/intel/compiler] with the next commit
there's no point in having the include.
Let alone the very confusing compiler include directive
[-I${top_srcdir}/src/mesa/drivers/dri/i965/] that one would have to use.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 15:35:55 +0000 (15:35 +0000)]
i965: remove duplicate declaration of brw_mark_surface_used
Function was made static and moved to another header with an earlier
commit.
Fixes:
760c8a1d950 ("i965: Make mark_surface_used a static inline in brw_compiler.h")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 15:30:45 +0000 (15:30 +0000)]
i965: remove dead brw_new_shader() declaration
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes:
194537ebe44 ("mesa/glsl/i965: remove Driver.NewShader()")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 15:29:06 +0000 (15:29 +0000)]
i965: remove unused brw_cs.h include
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 2 Mar 2017 05:14:56 +0000 (21:14 -0800)]
anv: Stop including brw_context.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 2 Mar 2017 04:55:17 +0000 (20:55 -0800)]
intel/isl: Stop linking libi965_compiler.la into tests
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 2 Mar 2017 03:03:17 +0000 (19:03 -0800)]
vulkan/wsi: Generate wayland protocol headers separately from EGL
Previously, we were depending on EGL for generating the headers and
providing the protocol symbols. However, since neither Vulkan driver
actually wants to link against EGL, this is kind of pointless. It also
creates a weird build dependency.
v2 [Jason]
- Add missing wsi/ prefix, MKDIR_GEN
v3 [Emil Velikov]
- include BUILT_SOURCES/generation rules outside of conditional
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Tue, 7 Mar 2017 14:54:36 +0000 (14:54 +0000)]
radv/wsi: Don't include wayland headers
Unused, and we'll rework the way wayland-drm-client-protocol.h is
generated with a later commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Thu, 2 Mar 2017 05:15:55 +0000 (21:15 -0800)]
anv/wsi: Don't include wayland headers
Unused, and we'll rework the way wayland-drm-client-protocol.h is
generated with a later commit.
v2 [Emil]
- Also remove wayland-client.h
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 2 Mar 2017 19:14:24 +0000 (19:14 +0000)]
configure.ac: provide a fall-back define for WAYLAND_SCANNER
In some cases, we can end up calling WAYLAND_SCANNER even when
there's no binary. Follow the approach set by AX_PROG_FLEX/BISON
and set the variable to ":" (the shell no-op).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Emil Velikov [Thu, 2 Mar 2017 19:11:06 +0000 (19:11 +0000)]
wayland: move .gitignore where applicable
Strictly speaking things work as-is, but let's move the file alongside
the artefacts it references. Analogous to all other places in mesa.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Christian König [Mon, 6 Mar 2017 16:36:29 +0000 (17:36 +0100)]
st/va: add config support for 10bit decoding v2
Advertise 10bpp support if the driver supports decoding to a P016 surface.
v2: Advertise 10bpp for the decoder as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Christian König [Tue, 7 Mar 2017 14:23:39 +0000 (15:23 +0100)]
st/va: add support for allocating 10bpp surfaces
We support P010 and P016 as targets for 10bpp video decoding.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Christian König [Mon, 6 Mar 2017 16:53:04 +0000 (17:53 +0100)]
st/va: add support for P010 and P016 formats v3
No hardware I know of can actually support P010 natively. But we can easily
support P016, and as long as nobody decodes anything into the lower 6 bits it
makes no difference to P010.
v2: allow P016 for post processing as well
v3: fix post processing once more
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
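A minimal sketch of why P016 storage can stand in for P010, assuming the common convention that P010 keeps its 10 significant bits in the most significant bits of each 16-bit word with the lower 6 bits zero. The helper names are hypothetical and do not come from the state tracker.

```c
#include <stdint.h>

/* Store a 10-bit sample in the upper bits of a 16-bit word (P010-style). */
static uint16_t pack_10bit_msb(uint16_t sample10)
{
    return (uint16_t)((sample10 & 0x3ffu) << 6);
}

/* Recover the 10-bit sample from a 16-bit word. */
static uint16_t unpack_10bit_msb(uint16_t word)
{
    return (uint16_t)(word >> 6);
}

/* A 16-bit word is also a valid P010 word if its lower 6 bits are zero. */
static int is_valid_p010_word(uint16_t word)
{
    return (word & 0x3fu) == 0;
}
```

Under that assumption, a P016 surface whose lower 6 bits are never written holds exactly the same bytes as a P010 surface would.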
Christian König [Tue, 7 Mar 2017 14:20:08 +0000 (15:20 +0100)]
st/va: clear the video surface on allocation
This makes debugging of decoding problems quite a bit easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Christian König [Wed, 25 Jan 2017 13:46:53 +0000 (14:46 +0100)]
st/va: cleanup error handling in vlVaCreateSurfaces2
No need to have that twice.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Christian König [Mon, 16 Jan 2017 14:04:47 +0000 (15:04 +0100)]
radeon/uvd: enable 10bit HEVC decode v2
Just use whatever the state tracker allocated.
v2: fix msb mode
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Christian König [Wed, 8 Mar 2017 11:51:13 +0000 (12:51 +0100)]
radeon/UVD: fix the decoding target pitch calculation
The firmware expects the value in pixels, not bytes. It didn't make a
difference so far because we only used 8bpp surfaces.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
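A minimal sketch of the unit conversion described above, assuming the pitch is derived from a byte stride and a per-pixel size. The function and parameter names are hypothetical; the actual UVD code may compute the value differently.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a row stride in bytes into a pitch in pixels. */
static uint32_t pitch_in_pixels(uint32_t stride_bytes, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel != 0);
    return stride_bytes / bytes_per_pixel;
}
```

With 1 byte per pixel the two units coincide (a 1920 byte stride is a 1920 pixel pitch), which is why the mix-up went unnoticed. With 2 bytes per pixel a 3840 byte stride is still a 1920 pixel pitch.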
Christian König [Sat, 14 Jan 2017 12:57:02 +0000 (13:57 +0100)]
vl/video_buffer: add support for P016
Just simply the description of the planes.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Christian König [Fri, 13 Jan 2017 18:11:43 +0000 (19:11 +0100)]
gallium: add P016 format
Same layout as NV12, but 16bit per channel instead of 8.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
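A minimal sketch of the "same layout as NV12" statement, assuming a full-resolution luma plane followed by one interleaved CbCr plane subsampled by two in each direction, with even dimensions and no alignment padding. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Byte sizes of the two planes of an NV12-like format. */
struct plane_sizes {
    size_t luma_bytes;   /* Y plane */
    size_t chroma_bytes; /* interleaved CbCr plane */
};

/* bytes_per_channel is 1 for NV12 and 2 for P016 in this sketch. */
static struct plane_sizes nv12_like_sizes(size_t width, size_t height,
                                          size_t bytes_per_channel)
{
    struct plane_sizes s;

    s.luma_bytes = width * height * bytes_per_channel;
    /* Half width and half height, two channels (Cb and Cr) per sample. */
    s.chroma_bytes = (width / 2) * (height / 2) * 2 * bytes_per_channel;
    return s;
}
```

For a 1920x1080 frame this gives 2073600 + 1036800 bytes with 1 byte per channel and exactly twice that with 2 bytes per channel; the plane structure itself is unchanged.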
Kenneth Graunke [Mon, 13 Mar 2017 05:39:51 +0000 (22:39 -0700)]
i965: Delete unused last_ring local.
Dead since 071d80bde2a78f464a7f54c3e6c6e42845ef52e4, and causing
warnings.
Bas Nieuwenhuizen [Sun, 12 Mar 2017 13:12:19 +0000 (14:12 +0100)]
radv: Store shaders in VRAM.
Less IFETCH latency on misses. Shader code is write once read many,
so GTT doesn't make much sense anyway.
If it turns out to fragment the CPU visible VRAM too much, we can upload with SDMA.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 5 Mar 2017 23:58:45 +0000 (09:58 +1000)]
radv/ac: move to new image intrinsics.
This hooks up radv to the new image intrinsic builders.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 7 Mar 2017 07:00:39 +0000 (17:00 +1000)]
radv: disabled scaled formats for transfers.
These really are only supported for vertex buffers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Fri, 10 Mar 2017 02:28:53 +0000 (13:28 +1100)]
util/u_queue: make u_queue accessible to cpp
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Fri, 10 Mar 2017 00:30:01 +0000 (11:30 +1100)]
glsl: don't use ralloc for blob creation
There is no need to use ralloc here.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 8 Mar 2017 23:07:43 +0000 (10:07 +1100)]
gallium/util: replace pipe_thread_setname() with u_thread_setname()
They do the same thing; we just moved the function to be
accessible to all of Mesa.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 8 Mar 2017 23:03:00 +0000 (10:03 +1100)]
gallium/util: replace pipe_thread_get_time_nano() with u_thread_get_time_nano()
They do the same thing; we just moved the function to be
accessible to all of Mesa.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>