Scott D Phillips [Mon, 24 Sep 2018 08:39:33 +0000 (11:39 +0300)]
i965/miptree: Use cpu tiling/detiling when mapping
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit
b499b85b0f2cc0c82b7c9af91502c2814fdc8e67.
v6: refactor to changes done for sse41 separation (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Scott D Phillips [Mon, 24 Sep 2018 05:33:06 +0000 (08:33 +0300)]
i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy'
separate sse41 to own static library (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tapani Pälli [Wed, 19 Sep 2018 07:16:58 +0000 (10:16 +0300)]
i965: expose type of memcpy instead of memcpy function itself
There is currently no use of returned memcpy functions outside
intel_tiled_memcpy. Patch changes intel_get_memcpy to return memcpy
type instead of actual function. This makes it easier later to separate
streaming load copy in to own static library.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Engestrom [Tue, 16 Oct 2018 08:43:07 +0000 (09:43 +0100)]
util: use *unsigned* ints for bit operations
Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every
time this macro is used.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Thu, 18 Oct 2018 14:51:47 +0000 (15:51 +0100)]
radv: s/abs/fabsf/ for floats
Fixes:
a4c4efad89eceb26cf82 "radv: Rework guard band calculation"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Thu, 11 Oct 2018 15:38:24 +0000 (16:38 +0100)]
meson: drop option description relic
`platforms` is no longer a comma-separated string, and some of our
option descriptions are way too long already. Just drop the incorrect
bit.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Jason Ekstrand [Tue, 2 Oct 2018 03:16:59 +0000 (22:16 -0500)]
st/mesa: Record shader access qualifiers for images
They're not required to be the same as the access flag on the image
unit. For hardware that does shader image lowering based on the
qualifier (Intel), it may be required for state setup.
v2: (by Kenneth Graunke, incorporating feedback from Marek Olšák)
- Reduce both access and shader_access to uint16_t to avoid making
the pipe_image_view structure larger.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Fri, 19 Oct 2018 19:33:36 +0000 (14:33 -0500)]
nir/algebraic: Provide descriptive asserts for bit size checks
This will hopefully make debugging opt_algebraic bit-size compile
failures easier.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 19 Oct 2018 19:31:19 +0000 (14:31 -0500)]
nir/algebraic: Loosen a restriction on variables
Previously, we would fail if a variable had an assigned but unknown bit
size X and we tried to assign it an actual bit size. However, this is
ok because, at the time we do the search, the variable does have an
actual bit size and it will match X because of the NIR rules.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 19 Oct 2018 19:03:24 +0000 (14:03 -0500)]
nir/algebraic: A bit of validation refactoring'
We rename some local variables in validate() to be more readable and
plumb the var through to get/set_var_bit_class instead of the var index.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 19 Oct 2018 19:01:31 +0000 (14:01 -0500)]
nir/algebraic: Make internal classes str-able
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 19 Oct 2018 17:43:43 +0000 (12:43 -0500)]
nir/algebraic: Generalize an optimization
There's nothing boolean about (a | ~a) ~> -1
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 19 Oct 2018 03:31:08 +0000 (22:31 -0500)]
nir/algebraic: Use bool internally instead of bool32
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Kenneth Graunke [Mon, 22 Oct 2018 04:41:39 +0000 (21:41 -0700)]
intel: Fix decoding for partial STATE_BASE_ADDRESS updates.
STATE_BASE_ADDRESS only modifies various bases if the "modify" bit is
set. Otherwise, we want to keep the existing base address.
Iris uses this for updating Surface State Base Address while leaving the
others as-is.
v2: Also update aubinator_viewer_decoder (caught by Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Sat, 20 Oct 2018 14:10:02 +0000 (09:10 -0500)]
nir: Use nir_src_is_const and nir_src_as_* in core code
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 20 Oct 2018 17:07:41 +0000 (12:07 -0500)]
nir/search_helpers: Use nir_src_is_const and friends
This not only makes them safe for more bit sizes but it also fixes a bug
in is_zero_to_one where it would return true for constant NaN.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 20 Oct 2018 17:17:30 +0000 (12:17 -0500)]
nir/search: Use nir_src_is_const and friends
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 20 Oct 2018 13:36:21 +0000 (08:36 -0500)]
nir: Add some new helpers for working with const sources
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Alyssa Rosenzweig [Sun, 21 Oct 2018 18:29:37 +0000 (11:29 -0700)]
mesa/st: Only call nir_lower_io_to_scalar_early on scalar ISAs
On scalar ISAs, nir_lower_io_to_scalar_early enables significant
optimizations. However, on vector ISAs, it is counterproductive and
impedes optimal codegen. This patch only calls
nir_lower_io_to_scalar_early for scalar ISAs. It appears that at present
there are no upstreamed drivers using Gallium, NIR, and a vector ISA, so
for existing code, this should be a no-op. However, this patch is
necessary for the upcoming Panfrost (Midgard) and Lima (Utgard)
compilers, which are vector.
With this patch, Panfrost is able to consume NIR directly, rather than
TGSI with the TGSI->NIR conversion.
For how this affects Lima, see
https://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg189216.html
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Dylan Baker [Mon, 22 Oct 2018 14:26:44 +0000 (07:26 -0700)]
meson: don't require libelf for r600 without LLVM
r600 doesn't have a hard requirement on LLVM, and therefore doesn't have
a hard requirement on libelf. Currently the logic doesn't allow that
however.
Distro-bug: https://bugs.gentoo.org/669058
Fixes:
5060c51b6f4dfb0d5358bde6523285163d3faaad
("meson: build r600 driver")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 13 Oct 2018 13:46:20 +0000 (08:46 -0500)]
anv,radv: Trivially expose two new VK_GOOGLE extensions
This patch exposes support for the following two extensions:
* VK_GOOGLE_decorate_string
* VK_GOOGLE_hlsl_functionality1
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107971
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 13 Oct 2018 13:41:36 +0000 (08:41 -0500)]
spirv: Add no-op support for VK_GOOGLE_hlsl_functionality1
This extension adds two new decorations which carry meaning only for
HLSL shaders. They are expected to be handled by higher level layers
and can be ignored by implementations. However, it does save the client
a bit of work if the implementation safely ignores them instead of the
client having to strip them out of the SPIR-V in order for it to be
valid.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jason Ekstrand [Sat, 13 Oct 2018 13:33:22 +0000 (08:33 -0500)]
spirv: Add support for SPV_GOOGLE_decorate_string
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Rob Herring [Tue, 24 Jul 2018 09:09:39 +0000 (11:09 +0200)]
android: Build kms_swrast for the Android platform
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Connor Abbott [Thu, 18 Oct 2018 13:39:13 +0000 (15:39 +0200)]
ac: Fix loading a dvec3 from an SSBO
The comment was wrong, since the loop above casts to a type with the
correct bitsize already.
Fixes:
7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Thu, 18 Oct 2018 13:30:11 +0000 (15:30 +0200)]
ac: Introduce ac_build_expand()
And implement ac_bulid_expand_to_vec4() on top of it.
Fixes:
7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eduardo Lima Mitev [Sun, 21 Oct 2018 18:48:41 +0000 (20:48 +0200)]
ir3/nir: Set up image_dims consts for image_deref_size intrinsic too
`nir_intrinsic_image_deref_size` is not being considered during scan for
driver constants, so image constants are not emitted if a shader
only ever query the size of an image (no load, store, atomic op, etc).
This is unlikely, but possible.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Karol Herbst [Fri, 19 Oct 2018 17:26:39 +0000 (19:26 +0200)]
nv50/ir: fix ConstantFolding::createMul for 64 bit muls
Fixes:
2f52925f5c60c72c9389bfdc122c3d5f8e15b25f
"nv50/ir: move a * b -> a << log2(b) code into createMul()"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Sonny Jiang [Fri, 19 Oct 2018 20:16:41 +0000 (16:16 -0400)]
radeonsi: Disable clear_state with radeon kernel driver
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Kenneth Graunke [Tue, 30 Jan 2018 09:32:07 +0000 (01:32 -0800)]
meson: Add -Werror=return-type when supported.
This warning detects non-void functions with a missing return statement,
return statements with a value in void functions, and functions with an
bogus return type that ends up defaulting to int. It's already enabled
by default with -Wall. Generally, these are fairly serious bugs in the
code, which developers would like to notice and fix immediately. This
patch promotes it from a warning to an error, to help developers catch
such mistakes early.
I would not expect this warning to change much based on the compiler
version, so hopefully it won't become a problem for packagers/builders.
See the GCC documentation or 'man gcc' for more details:
https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Jason Ekstrand [Mon, 15 Oct 2018 03:20:17 +0000 (22:20 -0500)]
anv: Define trampolines as the weak functions
Instead of having weak references to the anv functions and separate
trampoline functions with their own dispatch table, just make the
trampoline functions weak. This gets rid of a dispatch table and
potentially lets the compiler delete the unused weak function. The
end result is a reduction in the .text section of 5.7K and a reduction
in the .data section of 1.4K.
Before:
text data bss dec hex filename
3190329 282232 8960 3481521 351fb1 _install/lib64/libvulkan_intel.so
After:
text data bss dec hex filename
3184548 280792 8960 3474300 35037c _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Juan A. Suarez Romero [Fri, 19 Oct 2018 16:47:45 +0000 (18:47 +0200)]
docs: fix typo in 18.2.3 release notes link
Fixes:
86b4bd52dc ("docs: update calendar, add news item and link
release notes for 18.2.3")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Fri, 19 Oct 2018 16:45:41 +0000 (18:45 +0200)]
docs: update calendar, add news item and link release notes for 18.2.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Fri, 19 Oct 2018 16:43:26 +0000 (18:43 +0200)]
docs: add sha256 checksums for 18.2.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
27fd12857b53ec22c0e918eee6c4c009643fccbc)
Juan A. Suarez Romero [Fri, 19 Oct 2018 16:02:51 +0000 (18:02 +0200)]
docs: add release notes for 18.2.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
d219361b4226944835959676d1721b2a9d29da72)
Jose Fonseca [Thu, 18 Oct 2018 14:04:49 +0000 (15:04 +0100)]
scons: Remove gles option.
It's broken, and WGL state tracker is always built with GLES support
noawadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bas Nieuwenhuizen [Fri, 19 Oct 2018 09:51:47 +0000 (11:51 +0200)]
radv: Fix WSI & PCI bus info initialization order.
Trying to access the bus info before it is initialized is not going
to work.
Fixes:
baa38c144f6 "vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108491
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Marek Olšák [Thu, 18 Oct 2018 22:01:00 +0000 (18:01 -0400)]
radeonsi: fix a typo in a comment in emit_guardband
Marek Olšák [Thu, 18 Oct 2018 21:54:24 +0000 (17:54 -0400)]
radeonsi: fix gnome-shell crash
I wasn't expecting to get viewports with the center having
negative coordinates.
Broken by:
6cc79e4411f
Jason Ekstrand [Mon, 15 Oct 2018 02:56:47 +0000 (21:56 -0500)]
Revert "anv: Stop generating weak references for instance entrypoints"
This reverts commit
00bb42105d6edf6e432c0e3712ffb9d3eb0aece4. It was
not as well thought out as I had intended and broke the build when
VK_KHR_display is disabled in the build.
Marek Olšák [Wed, 17 Oct 2018 16:26:54 +0000 (12:26 -0400)]
radeonsi: clamp point size to the limit
This fixes dEQP-GLES2.functional.rasterization.limits.points.
Broken by:
ea039f789d9b54e1bd1d644b6a29863ca3500314
Tested-by: Jakob Bornecrantz <jakob@collabora.com>
Marek Olšák [Tue, 16 Oct 2018 19:10:01 +0000 (15:10 -0400)]
radeonsi: fix a VGT hang with primitive restart on Polaris10 and later
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Jakob Bornecrantz <jakob@collabora.com>
Marek Olšák [Wed, 17 Oct 2018 16:41:38 +0000 (12:41 -0400)]
radeonsi: fix a deadlock due to partially-initialized context on CI
Jan Vesely [Thu, 18 Oct 2018 19:15:06 +0000 (15:15 -0400)]
radeonsi: Bump number of allowed global buffers to 32
Fixes assertion failure/crash when running luxmark/luxball on clover.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108272
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Andres Rodriguez [Thu, 18 Oct 2018 19:32:31 +0000 (15:32 -0400)]
radv: fix check for perftest options size
It was using the debug options array size.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Thu, 18 Oct 2018 18:42:42 +0000 (14:42 -0400)]
radeonsi: fix incorrect hw screen offset and guardband computation
It resulted in assertion failures or incorrect rendering.
Broken by:
9e182b8313c5ab952498a76495f57e8420f9e5ad
Jason Ekstrand [Thu, 18 Oct 2018 15:08:32 +0000 (10:08 -0500)]
vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Michel Dänzer [Mon, 1 Oct 2018 16:43:46 +0000 (18:43 +0200)]
loader/dri3: Also wait for front buffer fence if we triggered it
In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.
Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.
Bugzilla: https://bugs.freedesktop.org/108097
Fixes:
aefac10fecc9 "loader/dri3: Only wait for back buffer fences in
dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Jason Ekstrand [Mon, 15 Oct 2018 02:56:47 +0000 (21:56 -0500)]
anv: Stop generating weak references for instance entrypoints
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back. This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Mon, 15 Oct 2018 02:56:34 +0000 (21:56 -0500)]
vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 17 Oct 2018 19:35:16 +0000 (14:35 -0500)]
vulkan/wsi: Store the instance allocator in wsi_device
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries. This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Michał Janiszewski [Tue, 16 Oct 2018 21:44:22 +0000 (23:44 +0200)]
st/xlib: Use more appropriate include guard
Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com
Michał Janiszewski [Tue, 16 Oct 2018 21:44:21 +0000 (23:44 +0200)]
gallium: Fix mismatched ifdef-guards
Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 16 Oct 2018 08:07:49 +0000 (10:07 +0200)]
softpipe: dynamically allocate space for immediate constants
The number of immediate constants was fixed and the size check was
only done by means of an assertion. Given this a shader that emits
more immediate constants would result in a memory corruption when
mesa is build in release mode.
Instead of using this fixed limit allocate the space dynamically, let it
grow as needed, and also remove the unused ImmArray.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Timothy Arceri [Wed, 17 Oct 2018 23:23:42 +0000 (10:23 +1100)]
radv: use nir_shrink_vec_array_vars()
Totals from affected shaders:
SGPRS: 1096 -> 1096 (0.00 %)
VGPRS: 1192 -> 1056 (-11.41 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 100940 -> 94384 (-6.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 100 -> 112 (12.00 %)
Wait states: 0 -> 0 (0.00 %)
All affected shaders are from Batman Arkham City.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Wed, 17 Oct 2018 23:19:16 +0000 (10:19 +1100)]
radv: use nir_split_array_vars()
We call in the opt loop in case another pass results in an
array with indirect access being turned into direct access.
Totals from affected shaders:
SGPRS: 512 -> 496 (-3.12 %)
VGPRS: 456 -> 452 (-0.88 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 40040 -> 39664 (-0.94 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 41 -> 43 (4.88 %)
Wait states: 0 -> 0 (0.00 %)
All affected shaders are from Batman Arkham City.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Wed, 17 Oct 2018 22:42:17 +0000 (09:42 +1100)]
radv: use nir_opt_find_array_copies()
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)
All affected shaders are from "Batman: Arkham City" over DXVK.
The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Wed, 17 Oct 2018 21:55:46 +0000 (08:55 +1100)]
radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
Totals from affected shaders:
SGPRS: 2856 -> 2856 (0.00 %)
VGPRS: 3236 -> 3248 (0.37 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 236560 -> 233548 (-1.27 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 277 -> 283 (2.17 %)
Wait states: 0 -> 0 (0.00 %)
Even in the cases were we have increased VGPR use it appears
the NIR is improved significantly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Keith Packard [Thu, 11 Oct 2018 23:05:18 +0000 (16:05 -0700)]
vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.
v2:
Ensure deviation is at least as big as the GPU time interval.
v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v4:
Add anv_gem_reg_read to anv_gem_stubs.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5:
Adjust maxDeviation computation to max(sampled_clock_period) +
sample_interval.
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Topi Pohjolainen [Tue, 16 Oct 2018 11:56:51 +0000 (07:56 -0400)]
intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17
Identifier bits in the dispatch header have changed. See Bspec:
SINGLE_PATCH Payload:
3D Pipeline Stages - 3D Pipeline Geometry -
Hull Shader (HS) Stage IVB+ - Payloads IVB+
Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Neil Roberts [Wed, 17 Oct 2018 15:27:07 +0000 (17:27 +0200)]
Fix setting indent-tabs-mode in the Emacs .dir-locals.el files
Some of the .dir-locals.el had the wrong name for the truthy value so
it wasn’t setting indent-tabs-mode.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Rob Clark [Mon, 15 Oct 2018 13:56:35 +0000 (09:56 -0400)]
freedreno/a6xx: don't allocate binning rb
Now that a single cmdstream is used for both binning and draw passes, we
can skip allocation of cmdstream buffer for binning.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 17:56:05 +0000 (13:56 -0400)]
freedreno/a6xx: single cmdstream for draw+binning
Now that state which is different for draw vs binning pass is split out
into different state-groups with appropriate enable_mask (so the
appropriate one is chosen for draw vs binning), switch over to using a
single cmdstream for both passes.
This should significantly lower draw overhead for CPU bound benchmarks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 17:54:32 +0000 (13:54 -0400)]
freedreno/a6xx: split binning vs draw program stateobj's
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 17:51:33 +0000 (13:51 -0400)]
freedreno/a6xx: split VBO state into binning/draw variants
Blob seems to manage to use same input registers for BS (binning pass)
vs VS (draw pass) shaders, so it can use the same VBO state for both.
We can't quite do that yet, so split them.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 17:24:24 +0000 (13:24 -0400)]
freedreno/a6xx: move VBO state to stateobj
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 17:04:16 +0000 (13:04 -0400)]
freedreno/a6xx: move ZSA state to stateobj
Step towards single cmdstream, where we need different state-group-id's
for binning vs draw ZSA state.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 20:29:22 +0000 (16:29 -0400)]
freedreno/a6xx: remove vismode param
We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work
for using single cmdstream for both draw and binning passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 20:01:22 +0000 (16:01 -0400)]
freedreno/ir3: move binning-pass fixup for a6xx+
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders. This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 19:10:46 +0000 (15:10 -0400)]
freedreno/a6xx: a bit more state emit cleanup
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 18:47:13 +0000 (14:47 -0400)]
freedreno/a6xx: move framebuffer state emit to emit_mrt()
No point in checking this per-draw, since framebuffer change means new
batch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 18:41:14 +0000 (14:41 -0400)]
freedreno/a6xx: small emit_mrt() cleanup
On a6xx, this is only used for pfb->cbufs so we can just directly pass
the pfb state.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 10 Oct 2018 19:58:57 +0000 (15:58 -0400)]
freedreno/a6xx: use program cache
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 24 Jul 2018 18:12:24 +0000 (14:12 -0400)]
freedreno/ir3: shader variant cache
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.
This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 2 Oct 2018 16:38:09 +0000 (12:38 -0400)]
freedreno/ir3: move binning_pass out of shader variant key
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 2 Oct 2018 20:04:39 +0000 (16:04 -0400)]
freedreno/ir3: track # of samplers used by shader
This is useful for a6xx to avoid program state from depending on bound
tex/samp state.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 10 Oct 2018 19:59:29 +0000 (15:59 -0400)]
freedreno/a6xx: texture state obj
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler. So we have to use a hash
table to map the collection of texture/samplers to hw state object.
We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 13:24:21 +0000 (09:24 -0400)]
freedreno: add resource seqno
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables. Prep work for texture state objects.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 7 Oct 2018 17:59:27 +0000 (13:59 -0400)]
freedreno/a6xx: move const emit to state group
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.
For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.
The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 7 Oct 2018 17:58:30 +0000 (13:58 -0400)]
freedreno/a6xx: add infrastructure for CP_DRAW_STATE
Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 1 Oct 2018 18:13:06 +0000 (14:13 -0400)]
freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <robdclark@gmail.com>
Neil Roberts [Wed, 17 Oct 2018 15:38:03 +0000 (17:38 +0200)]
freedreno: Remove the Emacs mode lines
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Neil Roberts [Wed, 17 Oct 2018 15:38:02 +0000 (17:38 +0200)]
freedreno: Fix the Emacs indentation configuration file
The .dir-locals.el had the wrong name for the truthy value so it
wasn’t setting indent-tabs-mode.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Hyunjun Ko [Wed, 17 Oct 2018 12:57:28 +0000 (21:57 +0900)]
freedreno: allocate batches from the cache in launch_grid
Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.
In addition this fixes assertion on debug build since the commit
1a40faa8 landed.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Hyunjun Ko [Wed, 17 Oct 2018 12:57:27 +0000 (21:57 +0900)]
freedreno: adds nondraw param to fd_bc_alloc_batch
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Fri, 12 Oct 2018 20:27:59 +0000 (16:27 -0400)]
freedreno/a6xx: remove fd6_emit_render_cntl()
It was dead code carried over from a5xx
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 15 Sep 2018 18:41:07 +0000 (14:41 -0400)]
freedreno/ir3: fix broken texcoord inputs
TODO not sure if this is best solution, but current logic is broken for
texcoord inputs. It is definitely the simplest solution.
Fixes:
1a24f519663 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 13 Oct 2018 16:34:09 +0000 (12:34 -0400)]
freedreno: fix off-by-one error in BEGIN_RING()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Marek Olšák [Sun, 23 Sep 2018 00:03:27 +0000 (20:03 -0400)]
util: document a limitation of util_fast_udiv32
trivial
Matt Turner [Thu, 6 Sep 2018 18:15:55 +0000 (11:15 -0700)]
i965/fs: Add 64-bit int immediate support to dump_instructions()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Marek Olšák [Sun, 7 Oct 2018 02:44:36 +0000 (22:44 -0400)]
radeonsi: track context rolls better for the Vega scissor bug workaround
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
Marek Olšák [Sun, 7 Oct 2018 02:53:33 +0000 (22:53 -0400)]
radeonsi: emit sample locations for 1xAA only when the hw bug is present
Marek Olšák [Fri, 24 Aug 2018 04:29:04 +0000 (00:29 -0400)]
radeonsi: use compute shaders for clear_buffer & copy_buffer
Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.
Marek Olšák [Sun, 7 Oct 2018 00:56:32 +0000 (20:56 -0400)]
radeonsi: use copy_buffer in buffer_do_flush_region directly
Marek Olšák [Sun, 23 Sep 2018 02:02:32 +0000 (22:02 -0400)]
radeonsi: use faster integer division for instance divisors
We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.
This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords intead of 1.
This probably won't affect any apps.
Marek Olšák [Sun, 23 Sep 2018 01:17:52 +0000 (21:17 -0400)]
ac: add helpers for fast integer division by a constant
Marek Olšák [Sat, 29 Sep 2018 01:43:49 +0000 (21:43 -0400)]
radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports
Marek Olšák [Sat, 29 Sep 2018 00:57:07 +0000 (20:57 -0400)]
radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband
We'll modify the quant mode there, which also affects the guarband
computation.
Marek Olšák [Sat, 29 Sep 2018 00:38:26 +0000 (20:38 -0400)]
radeonsi: don't re-upload the sample position constant buffer repeatedly
Marek Olšák [Sat, 29 Sep 2018 00:16:13 +0000 (20:16 -0400)]
radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally