Paul Berry [Wed, 14 Nov 2012 22:39:21 +0000 (14:39 -0800)]
dri: Update dri_util to keep track of __DRI_BACKGROUND_CALLABLE
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 19:13:02 +0000 (11:13 -0800)]
dri_interface: Add new marshalling interfaces to dri_interface.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Roland Scheidegger [Thu, 16 Mar 2017 03:01:41 +0000 (04:01 +0100)]
gallivm: (trivial) remove duplicated line
pointed out by clang (stored value never read)
Roland Scheidegger [Thu, 16 Mar 2017 02:59:52 +0000 (03:59 +0100)]
draw: (trivial) remove a unnecessary lp_build_alloca()
pointed out by clang (stored value never read)
Ilia Mirkin [Sun, 5 Mar 2017 23:24:44 +0000 (18:24 -0500)]
swr: support layer output in geometry shaders
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 17:49:29 +0000 (18:49 +0100)]
Revert "radv: Emit cache flushes before CP DMA."
This reverts commit
cce43f6d8c40222099badaf52344d6a0eed993f3.
Redundant, as the flush already happens at si_cp_dma_prepare.
Acked-by: Dave Airlie <airlied@redhat.com>
Francisco Jerez [Tue, 14 Mar 2017 00:31:39 +0000 (17:31 -0700)]
gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations.
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction. See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (
e9ffd12827ac11a2d2002a42fa8eb1)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.
Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit. Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.
Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Grazvydas Ignotas [Wed, 15 Mar 2017 18:53:56 +0000 (20:53 +0200)]
util/disk_cache: do eviction before creating .tmp
cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects non-empty directory with
the oldest access time, but that may easily be the one with just the
new .tmp file, especially on Linux where atime is updated lazily
(with "relatime" mount option, which is the default). So when cache is
small, if random doesn't hit another dir LRU keeps selecting the same
dir with just the .tmp and not deleting anything. To fix this (and the
tests), do eviction earlier.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tim Rowley [Wed, 15 Mar 2017 16:42:43 +0000 (11:42 -0500)]
swr: validate backend state numAttributes
General protection and prevents us from smashing the stack
on the first clear state validation (
a7b8d50bcb). Fixes crash
using icc.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ben Widawsky [Fri, 21 Oct 2016 01:21:24 +0000 (18:21 -0700)]
gbm: Export a get modifiers
This patch originally had i965 specific code and was named:
commit
61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Oct 20 18:21:24 2016 -0700
gbm: Get modifiers from DRI
To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER
The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.
NOTE: The extension version is still set to 13, so none of this will
actually be called.
v2: Error handling of queryImage (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Ben Widawsky [Tue, 14 Mar 2017 01:20:02 +0000 (18:20 -0700)]
i965: introduce modifier selection.
Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from
gbm: Introduce modifiers into surface/bo creation by request of Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Tue, 14 Mar 2017 01:19:00 +0000 (18:19 -0700)]
egl/drm: Use modifiers for backbuffer creation
Split into a separate patch from the previous patch as requested by
Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Thu, 3 Nov 2016 23:14:44 +0000 (16:14 -0700)]
gbm: Introduce modifiers into surface/bo creation
The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.
A getter() will be added later so that the user GBM will be able to
query what modifier should be used.
Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (will be, in future patches be chosen at creation time.
v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)
v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.
v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.
v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)
v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)
Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Ben Widawsky [Mon, 13 Mar 2017 21:53:43 +0000 (14:53 -0700)]
i965: Implement basic modifier image creation
This is just a stub for now and will be filled in later.
This was split out of an earlier patch
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Fri, 4 Nov 2016 18:31:15 +0000 (11:31 -0700)]
dri: Add an image creation with modifiers
Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers
This requires bumping the DRIimage version.
As of this patch, the modifiers aren't plumbed all the way down, this
patch simply makes sure the interface level stuff is correct.
v2: Don't allow usage + modifiers
v3: Make NAND actually NAND. Bug introduced in v2. (Jason)
v4:
- s/obtains/obtained (Jason)
- Pull out i965 imlemnentation into a later patch (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Marek Olšák [Tue, 7 Mar 2017 01:19:47 +0000 (02:19 +0100)]
radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.
DiRT Showdown - Spilled VGPRs: -26 (-81%)
This surprisingly doesn't have any useful effect on performance (+ 0.05%).
Marek Olšák [Tue, 7 Mar 2017 01:26:47 +0000 (02:26 +0100)]
glsl_to_tgsi: use TEX_LZ and TXF_LZ when available
Marek Olšák [Tue, 7 Mar 2017 01:01:08 +0000 (02:01 +0100)]
glsl_to_tgsi: remove a redundant statement
it's the same as the last "else".
Marek Olšák [Tue, 7 Mar 2017 01:15:14 +0000 (02:15 +0100)]
gallium: add TGSI opcodes TEX_LZ and TXF_LZ
for better code generation in radeonsi
Marek Olšák [Tue, 7 Mar 2017 01:09:03 +0000 (02:09 +0100)]
gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ
Samuel Pitoiset [Tue, 14 Mar 2017 23:59:13 +0000 (00:59 +0100)]
radeonsi: disable sinking common instructions down to the end block
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.
Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.
This also fixes a rendering issue with the speedometer in Dirt Rally.
More information can be found here https://reviews.llvm.org/D26348.
Thanks to Dave Airlie for the patch.
v2: - add a FIXME comment
- use if (HAVE_LLVM >= 0x0400) instead
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 11:40:13 +0000 (12:40 +0100)]
tgsi: add missing compute shader entry in tgsi_get_processor_name()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 12:00:02 +0000 (13:00 +0100)]
radeonsi: clean up tex_fetch_ptrs()
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Thu, 2 Mar 2017 19:02:45 +0000 (19:02 +0000)]
configure.ac: bump pthread-stubs requirement
On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design] any version earlier than it,
is bound to cause issues. For the specifics see the pthread-stubs README
Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Tue, 27 Sep 2016 12:39:36 +0000 (13:39 +0100)]
glx: don't expose systemTimeExtension for DRI2/DRI3/DRISW
Used/applicable to only dri1 drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Thu, 1 Dec 2016 21:21:10 +0000 (21:21 +0000)]
anv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 20:58:20 +0000 (20:58 +0000)]
radv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:53:11 +0000 (19:53 +0000)]
radv/winsys: use drmGetDevice2 API
Analogous to previous commit
v2: Add explicit require_libdrm check.
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:54:39 +0000 (19:54 +0000)]
winsys/amdgpu: use drmGetDevice2 API
Analogous to previous commit
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:51:03 +0000 (19:51 +0000)]
loader: use drmGetDevice[s]2 API
By this allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.
Note: kernel patch to resolve that should be in 4.10.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:48:43 +0000 (19:48 +0000)]
autoconf/scons: bump libdrm to 2.4.75
We'll be using the drmGetDevice[s]2 API in src/loader with next patch.
v2: Rebase.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Tue, 24 Jan 2017 21:21:10 +0000 (21:21 +0000)]
util/sha1: drop _mesa_sha1_{update, format} return type
Unused/unchecked by any of the callers.
v2: Fix the glsl cases that have crept in since v1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:09 +0000 (21:21 +0000)]
util/sha1: rework _mesa_sha1_{init,final}
Rather than having an extra memory allocation [that we currently do not
and act accordingly] just make the API take an pointer to a stack
allocated instance.
This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:08 +0000 (21:21 +0000)]
util/sha1: add non-typedef name for the SHA1_CTX struct
Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.
Add a struct name which we will use with follow-up commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 07:54:04 +0000 (08:54 +0100)]
radv: Remove unused descriptor set field.
Trivial.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Dave Airlie [Thu, 31 Mar 2016 05:33:16 +0000 (15:33 +1000)]
r600: refactor binding code for attach buffer to CB.
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:27:42 +0000 (15:27 +1000)]
r600: refactor out CB setup.
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:24:47 +0000 (15:24 +1000)]
r600: refactor texture resource words setup code.
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:20:42 +0000 (15:20 +1000)]
r600: factor out the code to initialise a buffer resource.
This takes the code required to initialise a buffer resource
out of the texture buffer code, into it's own function.
This is going to be used for the image support later.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 26 Jan 2016 03:35:08 +0000 (13:35 +1000)]
r600g: make framebuffer atom rely on dual src blend state.
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Mon, 13 Mar 2017 21:23:34 +0000 (14:23 -0700)]
intel/debug: Add a common INTEL_DEBUG=nohiz option
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable. Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Mon, 13 Mar 2017 15:10:38 +0000 (08:10 -0700)]
anv/image: Move handling of INTEL_VK_HIZ
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Timothy Arceri [Tue, 14 Mar 2017 04:50:34 +0000 (15:50 +1100)]
radv: trivial tidy ups
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Alan Swanson [Mon, 6 Mar 2017 16:17:32 +0000 (16:17 +0000)]
util/disk_cache: scale cache according to filesystem size
Select higher of current 1G default or 10% of filesystem where
cache is located.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Mon, 6 Mar 2017 16:17:31 +0000 (16:17 +0000)]
util/disk_cache: actually enforce cache size
Currently only a one in one out eviction so if at max_size and
cache files were to constantly increase in size then so would the
cache. Restrict to limit of 8 evictions per new cache entry.
V2: (Timothy Arceri) fix make check tests
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Fri, 10 Mar 2017 16:22:51 +0000 (16:22 +0000)]
util/disk_cache: use LRU eviction rather than random eviction
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
v2: Factor out double strlen call
v3: C99 declaration of variables where used
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Timothy Arceri [Tue, 14 Mar 2017 00:22:44 +0000 (11:22 +1100)]
util/disk_cache: don't fallback to an empty cache dir on evict
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Mon, 13 Mar 2017 00:07:30 +0000 (11:07 +1100)]
util/disk_cache: use a thread queue to write to shader cache
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call dick_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Sun, 12 Mar 2017 23:14:35 +0000 (10:14 +1100)]
util/disk_cache: add helpers for creating/destroying disk cache put jobs
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Wed, 8 Mar 2017 23:51:01 +0000 (10:51 +1100)]
util/disk_cache: add thread queue to disk cache
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Dave Airlie [Tue, 14 Mar 2017 21:15:50 +0000 (07:15 +1000)]
radv/ac: workaround regression in llvm 4.0 release
LLVM 4.0 released with a pretty messy regression, that hopefully
get fixed in the future.
This work around was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 27 Feb 2017 01:30:41 +0000 (11:30 +1000)]
radv/ac: gather4 cube workaround integer
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead if forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:57:55 +0000 (22:57 +0100)]
radv: Set driver version to mesa version;
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like
17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:37:03 +0000 (22:37 +0100)]
radv: Increase api version to 1.0.42.
I've skimmed to changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant ofcourse, but this should not
regress stuff,
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 14 Mar 2017 02:26:06 +0000 (19:26 -0700)]
util/vk: Add helpers for finding an extension struct
Reviewed-by: Dave Airlie <airlied@redhat.com>
Alex Smith [Tue, 14 Mar 2017 15:26:32 +0000 (15:26 +0000)]
radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 20:46:54 +0000 (21:46 +0100)]
radv: Emit cache flushes before CP DMA.
The flushes could be due to TRANSFER barriers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jan Beich [Sun, 12 Mar 2017 03:19:14 +0000 (03:19 +0000)]
Convert sed(1) syntax to be compatible with FreeBSD and OpenBSD
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Jason Ekstrand [Tue, 14 Mar 2017 02:30:26 +0000 (19:30 -0700)]
anv: Properly enumerate physical devices when none are present
Jason Ekstrand [Thu, 9 Mar 2017 04:23:05 +0000 (20:23 -0800)]
nir/constant_expressions: Refactor helper functions
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 03:54:37 +0000 (19:54 -0800)]
nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 03:32:50 +0000 (19:32 -0800)]
i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 02:32:17 +0000 (18:32 -0800)]
i965/vec4: Get rid of the type parameter from to/from_double
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:44 +0000 (16:46 -0800)]
glsl/nir: Use nir_type_conversion_op
Using the helper is way better than hand-coding the universe.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 13 Mar 2017 20:07:24 +0000 (13:07 -0700)]
nir: Rewrite nir_type_conversion_op
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:17 +0000 (16:46 -0800)]
nir: Add a get_nir_type_for_glsl_base_type helper
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 18:32:40 +0000 (10:32 -0800)]
nir/validate: Rework ALU bit-size rule validation
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Fri, 3 Mar 2017 00:25:59 +0000 (16:25 -0800)]
nir/validate: Validate that bit sizes and components always match
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with less components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:42:06 +0000 (21:42 -0800)]
nir: Make image_size a variable-width intrinsic
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:39:58 +0000 (21:39 -0800)]
i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:27:57 +0000 (19:27 -0800)]
nir/lower_tex: Use tex_instr_dest_size for txs destinations
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:03:01 +0000 (19:03 -0800)]
nir/spirv: Restrict the number of channels in texture coordinates
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 01:10:24 +0000 (17:10 -0800)]
nir/copy_prop: Respect the source's number of components
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs:
13318947 ->
13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 01:39:11 +0000 (17:39 -0800)]
nir/intrinsics: Make load_barycentric_input take a 2-component coor
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 3 Mar 2017 07:03:03 +0000 (23:03 -0800)]
anv/blorp: Only set a clear color for resolves if fast-cleared
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 10 Mar 2017 00:37:23 +0000 (16:37 -0800)]
anv/blorp: Turn off AUX after doing a CCS_D resolve
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we are no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Tapani Pälli [Mon, 13 Mar 2017 12:08:38 +0000 (14:08 +0200)]
android: add '/vulkan' to libmesa_anv_entrypoints path
otherwise generated entrypoint headers are not found during build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tapani Pälli [Mon, 13 Mar 2017 12:08:37 +0000 (14:08 +0200)]
android: add src/intel/compiler to libmesa_intel_compiler includes
fixes build error when brw_nir.h not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gwan-gyeong Mun [Tue, 29 Nov 2016 21:59:15 +0000 (06:59 +0900)]
anv: Add missing error-checking to anv_CreateDevice (v3)
This patch adds missing error-checking and fixes resource leak in
allocation failure path on anv_CreateDevice()
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Dave Airlie [Mon, 13 Mar 2017 20:50:59 +0000 (06:50 +1000)]
radv: setup llvm target data layout
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Smith [Mon, 13 Mar 2017 13:28:19 +0000 (13:28 +0000)]
radv: Reinitialise loaderMagic when allocating a cached command buffer
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes:
682248db451f ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 10 Mar 2017 11:18:07 +0000 (12:18 +0100)]
gallium/radeon: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Mon, 6 Mar 2017 00:47:52 +0000 (01:47 +0100)]
radeonsi: mark all bound shader buffer ranges as initialized
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 10 Mar 2017 11:19:50 +0000 (12:19 +0100)]
st/mesa: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Chad Versace [Sun, 5 Mar 2017 21:15:06 +0000 (13:15 -0800)]
anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties
No intended change in behavior. Just a refactor.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Chad Versace [Sun, 5 Mar 2017 21:07:13 +0000 (13:07 -0800)]
anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Chad Versace [Sat, 25 Feb 2017 04:58:59 +0000 (20:58 -0800)]
util/vulkan: Add vk_outarray (v2)
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Sun, 12 Mar 2017 16:53:29 +0000 (16:53 +0000)]
intel: genxml: prevent missing ; with address fields dwords
Before this change, the generator could print this kind of things :
const uint32_t v0 =
__gen_uint(values->ValidBit, 0, 0) |
__gen_uint(values->FaultType, 1, 2) |
__gen_uint(values->SRCIDofFault, 3, 10) |
__gen_uint(values->GTTSEL, 11, 1) |
dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);
This change fix the trailing '|'.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Julien Isorce [Fri, 10 Mar 2017 17:16:07 +0000 (17:16 +0000)]
gallium/hud: check NULL return from u_upload_alloc
Fixes the following segmentation fault:
signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
167
168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
169
-> 170 vertices[num++] = (float) x1;
171 vertices[num++] = (float) y1;
172
173 vertices[num++] = (float) x1;
(lldb) bt
* frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Julien Isorce [Fri, 10 Mar 2017 17:20:56 +0000 (17:20 +0000)]
winsys/radeon: check null return from radeon_cs_create_fence in cs_flush
Follow-up of patch:
"radeon_cs_create_fence: check null return from radeon_winsys_bo_create"
radeon_drm_cs_flush
radeon_cs_create_fence
radeon_winsys_bo_create
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Julien Isorce [Fri, 10 Mar 2017 17:16:05 +0000 (17:16 +0000)]
winsys/radeon: check null in radeon_cs_create_fence
Fixes the following segmentation fault:
radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
-> if (!bo->handle)
(gdb) bt
0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Juan A. Suarez Romero [Mon, 13 Mar 2017 15:04:20 +0000 (16:04 +0100)]
vulkan/wsi: include builddir for generated headers
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Sat, 4 Mar 2017 17:23:26 +0000 (09:23 -0800)]
anv: Use on-the-fly surface states for dynamic buffer descriptors
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few of reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Sat, 11 Mar 2017 07:00:49 +0000 (23:00 -0800)]
anv: Stall before fast-clear operations
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Sat, 4 Mar 2017 18:52:43 +0000 (10:52 -0800)]
anv: Accurately advertise dynamic descriptor limits
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Sat, 4 Mar 2017 18:07:56 +0000 (10:07 -0800)]
anv: Add a helper for working with VK_WHOLE_SIZE for buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Rob Clark [Tue, 31 Jan 2017 13:31:37 +0000 (08:31 -0500)]
freedreno/ir3: fragz cannot be half precision
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 30 Jan 2017 22:27:35 +0000 (17:27 -0500)]
freedreno/ir3: optimize less in glsl
Rely on nir for optimization, to reduce compile times. Very minimal impact
on shader-db:
total instructions in shared programs: 104170 -> 104199 (0.03%)
total dwords in shared programs: 209664 -> 209728 (0.03%)
total full registers used in shared programs: 7156 -> 7161 (0.07%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 24222 -> 24224 (0.01%)
half full const instr dwords
helped 12 107 103 112 98
hurt 11 104 105 115 102
But shader db runtime dropped from ~29.3s user to ~20.4s user.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Lionel Landwerlin [Fri, 10 Mar 2017 16:14:43 +0000 (16:14 +0000)]
aubinator/genxml: use gzipped files to store embedded genxml
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
With can now drop the checks on xxd in configure.
v2: Fix incorrect makefile dependency (Lionel)
v3: use $(PYTHON2) (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Lionel Landwerlin [Fri, 10 Mar 2017 16:12:51 +0000 (16:12 +0000)]
intel: genxml: add script to generate gzipped genxml
v2 (from Dylan):
Add main function
Add missing Copyright
Use print_function
v3: Add actually license (Dylan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>