Emil Velikov [Wed, 1 Feb 2017 10:10:38 +0000 (10:10 +0000)]
docs: add release notes for 13.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3255d10da4c2703bfdfcefd8f59b0d8f21dbb43f)
Michel Dänzer [Tue, 31 Jan 2017 06:33:19 +0000 (15:33 +0900)]
winsys/radeon: Allow visible VRAM size > 256MB with kernel driver >= 2.49
The kernel driver reports correct values now.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tapani Pälli [Mon, 30 Jan 2017 11:37:47 +0000 (13:37 +0200)]
android: add vulkan build for intel
fixes to issues spotted by Emil Velikov:
- set ANV_TIMESTAMP correctly
- fix typo with VULKAN_GEM_FILES
v2: update to use Makefile.sources under vulkan
instead of having its own
v3: update to changes to generate from vk.xml
(commit c7fc310)
v4: remove 'hw' relative path
cleanups, remove unnecessary cruft
review from Emil Velikov:
- move to vulkan folder
- remove timestamp gen, no longer necessary
- more cleanups
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Ilia Mirkin [Tue, 24 Jan 2017 05:26:29 +0000 (00:26 -0500)]
mesa: use same is_color_attachment trick to discern error cases
All the other calls to retrieve the attachment have been covered except
this one - return the proper error for attachment points that are valid
enums but out of bounds for the driver.
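For illustration, the error selection can be sketched in C, assuming the GL 4.5 core error rules for framebuffer attachments. `attachment_error()` and its `max_color_attachments` parameter are hypothetical, not Mesa's helpers, and only the depth attachment is modeled among the non-color attachment points; the token values are the standard GL ones. Any `GL_COLOR_ATTACHMENTm` token is a valid enum, so an index beyond the driver's limit yields `GL_INVALID_OPERATION`, while a token that is not an attachment point at all yields `GL_INVALID_ENUM`.

```c
#include <stdbool.h>

/* Standard OpenGL token values. */
#define GL_NO_ERROR            0x0000
#define GL_INVALID_ENUM        0x0500
#define GL_INVALID_OPERATION   0x0502
#define GL_COLOR_ATTACHMENT0   0x8CE0
#define GL_COLOR_ATTACHMENT31  0x8CFF
#define GL_DEPTH_ATTACHMENT    0x8D00

/* True for any GL_COLOR_ATTACHMENTm token, even when m exceeds what the
 * driver supports. */
static bool is_color_attachment(unsigned attachment)
{
    return attachment >= GL_COLOR_ATTACHMENT0 &&
           attachment <= GL_COLOR_ATTACHMENT31;
}

/* Hypothetical helper: which error (if any) an attachment point raises. */
static unsigned attachment_error(unsigned attachment,
                                 unsigned max_color_attachments)
{
    if (is_color_attachment(attachment)) {
        unsigned index = attachment - GL_COLOR_ATTACHMENT0;
        /* A valid enum, but possibly beyond the driver's limit. */
        return index < max_color_attachments ? GL_NO_ERROR
                                             : GL_INVALID_OPERATION;
    }
    if (attachment == GL_DEPTH_ATTACHMENT)
        return GL_NO_ERROR;
    /* Not an attachment point at all. */
    return GL_INVALID_ENUM;
}
```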
Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jason Ekstrand [Tue, 31 Jan 2017 03:53:17 +0000 (19:53 -0800)]
anv: Improve flushing around STATE_BASE_ADDRESS
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+,
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS,
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
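The resulting sequence can be summarized as two flag sets, one emitted before STATE_BASE_ADDRESS and one after. The flag names and bit values below are hypothetical stand-ins for PIPE_CONTROL bits, not the real hardware encoding or anv's identifiers; the comments map each bit to the reasoning above.

```c
#include <stdint.h>

/* Hypothetical PIPE_CONTROL-style bits (illustrative values only). */
enum pipe_flags {
    PIPE_CS_STALL                 = 1u << 0,
    PIPE_RENDER_TARGET_FLUSH      = 1u << 1,
    PIPE_DATA_CACHE_FLUSH         = 1u << 2,
    PIPE_TEXTURE_CACHE_INVALIDATE = 1u << 3,
    PIPE_STATE_CACHE_INVALIDATE   = 1u << 4,
    PIPE_CONST_CACHE_INVALIDATE   = 1u << 5,
};

/* Flushes to emit before STATE_BASE_ADDRESS. */
static uint32_t flushes_before_sba(void)
{
    return PIPE_RENDER_TARGET_FLUSH  /* known from experimentation */
         | PIPE_CS_STALL             /* (1) unclear how pipelined SBA is */
         | PIPE_DATA_CACHE_FLUSH;    /* (2) a compute shader may precede it */
}

/* Invalidations to emit after STATE_BASE_ADDRESS. */
static uint32_t invalidates_after_sba(void)
{
    return PIPE_TEXTURE_CACHE_INVALIDATE  /* known from experimentation */
         | PIPE_STATE_CACHE_INVALIDATE    /* (3) state may be cached here */
         | PIPE_CONST_CACHE_INVALIDATE;   /* (3) ...or here */
}
```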
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Tue, 31 Jan 2017 23:06:56 +0000 (15:06 -0800)]
anv: Flush render cache before STATE_BASE_ADDRESS on gen7
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying to update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 27 Jan 2017 20:31:40 +0000 (12:31 -0800)]
isl/formats: Only advertise sampling for A4B4G4R4 on Broadwell
This causes hangs on Broadwell if you try to render to it. I have no
idea how we managed to not hit this earlier.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Fri, 27 Jan 2017 20:32:05 +0000 (12:32 -0800)]
intel/blorp: Handle clearing of A4B4G4R4 on all platforms
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Tom Stellard [Wed, 1 Feb 2017 00:18:01 +0000 (00:18 +0000)]
radeonsi: Fix build on LLVM < 3.9 v2
This was broken by:
e0cc0a614c96011958bc3a1b84da9168e0e1ccbb
v2:
- Use preprocessor macro
Tested-by: Mark Janes <mark.a.janes@intel.com>
Bas Nieuwenhuizen [Sun, 29 Jan 2017 22:07:10 +0000 (23:07 +0100)]
radv: Enable Float64 support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 8 Jan 2017 18:38:28 +0000 (19:38 +0100)]
radv/ac: Implement Float64 SSBO loads.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 8 Jan 2017 00:36:30 +0000 (01:36 +0100)]
radv/ac: Implement Float64 UBO loads.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 8 Jan 2017 00:31:07 +0000 (01:31 +0100)]
radv/ac: Implement Float64 load/store var.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Thu, 5 Jan 2017 00:36:26 +0000 (01:36 +0100)]
radv/ac: Implement Float64 SSBO stores.
No f16 support as I'm not quite sure about alignment yet.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Thu, 5 Jan 2017 00:09:12 +0000 (01:09 +0100)]
radv/ac: Add core Float64 support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Rob Herring [Mon, 30 Jan 2017 22:54:53 +0000 (16:54 -0600)]
vc4: Enable Neon on arm android builds
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Herring [Mon, 30 Jan 2017 22:54:52 +0000 (16:54 -0600)]
vc4: fix arm64 build with Neon
The addition of Neon assembly breaks on arm64 builds because the assembly
syntax is different. For now, restrict Neon to ARMv7 builds.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Herring [Mon, 30 Jan 2017 22:54:51 +0000 (16:54 -0600)]
vc4: Make Neon inline assembly clang compatible
clang throws an error on "%r2" and similar. I couldn't find any
documentation on what "%r?" is supposed to mean and I've never seen any
use like that as far as I remember. The parameter is supposed to be
cpu_stride and just %2/%3 should be sufficient.
There's no need for trailing ";" either, so remove those, too.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tom Stellard [Thu, 15 Dec 2016 15:25:49 +0000 (15:25 +0000)]
radeonsi: Set datalayout on the llvm module
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.
This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Francisco Jerez [Tue, 24 Jan 2017 07:36:46 +0000 (23:36 -0800)]
nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Francisco Jerez [Tue, 24 Jan 2017 21:43:07 +0000 (13:43 -0800)]
glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Francisco Jerez [Fri, 20 Jan 2017 23:24:30 +0000 (15:24 -0800)]
nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.
Fixes the following Vulkan CTS tests on Intel hardware:
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4
Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet. The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value. The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Francisco Jerez [Sat, 21 Jan 2017 21:41:08 +0000 (13:41 -0800)]
glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.
This addresses several issues of the current atan2 implementation:
- Negative zero (and negative denorms which end up getting flushed to
zero) isn't handled correctly by the current implementation. The
reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
on which side of the branch cut the argument is, which causes us to
return incorrect results (off by up to 2π) for very small negative
values.
- There is a serious precision problem for x values of large enough
magnitude introduced by the floating point division operation being
implemented as a mul+rcp sequence. This can lead to the quotient
getting flushed to zero in some cases introducing an error of over
8e6 ULP in the result -- Or in the most catastrophic case will
cause us to return NaN instead of the correct value ±π/2 for y=±∞
and x very large. We can fix this easily by scaling down both
arguments when the absolute value of the denominator goes above
a certain threshold. The error of this atan2 implementation remains
below 25 ULP in most of its domain except for a neighborhood of y=0
where it reaches a maximum error of about 180 ULP.
- It emits a bunch of instructions including no less than three
if-else branches per scalar component that don't seem to get
optimized out later on. This implementation uses about 13% fewer
instructions on Intel SKL hardware and doesn't emit any control
flow instructions.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Francisco Jerez [Tue, 24 Jan 2017 20:26:54 +0000 (12:26 -0800)]
i965/fs: Fix nir_op_fsign of absolute value.
This does point at the front-end emitting silly code that could have
been optimized out, but the current fsign implementation would emit
bogus IR if abs was set for the argument (because it would apply the
abs modifier on an unsigned integer type), and we shouldn't rely on
the upper layer's optimization passes for correctness.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Francisco Jerez [Tue, 24 Jan 2017 07:59:45 +0000 (23:59 -0800)]
glsl/ir_builder: Add rcp builder.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Francisco Jerez [Tue, 24 Jan 2017 19:41:46 +0000 (11:41 -0800)]
glsl: Fix constant evaluation of the rcp op.
Will avoid a regression in a future commit that introduces some
additional rcp operations. According to the GLSL 4.10 specification:
"Dividing by 0 results in the appropriately signed IEEE Inf."
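A constant folder following that rule might look like the sketch below. `fold_rcp()` is a hypothetical helper, not Mesa's constant-expression code. On an IEEE 754 host, 1.0f / x already yields a signed infinity for ±0; the explicit branch just makes the GLSL rule visible.

```c
#include <math.h>

/* Constant-folds rcp(x) = 1/x. Per the GLSL 4.10 rule quoted above,
 * a zero operand yields an infinity carrying the sign of the zero. */
static float fold_rcp(float x)
{
    if (x == 0.0f)
        return copysignf(INFINITY, x);  /* +0 -> +Inf, -0 -> -Inf */
    return 1.0f / x;
}
```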
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Francisco Jerez [Tue, 24 Jan 2017 07:53:03 +0000 (23:53 -0800)]
mesa/program: Translate csel operation from GLSL IR.
This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through MESA IR for
back-ends that rely on this translation pass.
v2: Add comment.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Wladimir J. van der Laan [Fri, 25 Nov 2016 06:42:43 +0000 (06:42 +0000)]
etnaviv: Set SE.CLIP registers, add margins for scissor/clip registers
This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.
- It looks like SE_CLIP registers were not set at all.
I'm amazed that rendering worked without them. Emit them to
avoid issues on gc3000.
- Define constants
ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)
These demarcate the margin (fixp16) between the computed sizes and the
value sent to the chip. I have set these to the numbers used by the
Vivante driver for gc2000. I am not sure whether any old hardware was
relying on the old numbers, or whether those were just a guess. But if
so, these need to be moved to the _specs structure.
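Assuming fixp16 means 16.16 fixed point, 0x1119 is 4377/65536, about 0.067 of a pixel, and 0xffff is just under one pixel. The sketch below further assumes the register value is the integer edge coordinate shifted into the upper 16 bits plus the margin; `se_reg_value()` and `fixp16_frac()` are hypothetical helpers, not the etnaviv functions.

```c
#include <stdint.h>

#define ETNA_SE_SCISSOR_MARGIN_RIGHT  0x1119
#define ETNA_SE_SCISSOR_MARGIN_BOTTOM 0x1111
#define ETNA_SE_CLIP_MARGIN_RIGHT     0xffff
#define ETNA_SE_CLIP_MARGIN_BOTTOM    0xffff

/* Hypothetical: integer pixel edge -> 16.16 value plus fractional margin. */
static uint32_t se_reg_value(uint32_t edge_px, uint32_t margin_fixp16)
{
    return (edge_px << 16) + margin_fixp16;
}

/* Fractional part of a 16.16 value, in pixels. */
static double fixp16_frac(uint32_t v)
{
    return (double)(v & 0xffff) / 65536.0;
}
```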
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Wladimir J. van der Laan [Tue, 31 Jan 2017 08:23:51 +0000 (09:23 +0100)]
etnaviv: Generate new sin/cos instructions on GC3000
Shaders using sin/cos instructions were not working on GC3000.
The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):
- Need their input scaled by 1/pi instead of 2/pi.
- Output an x and y component, which need to be multiplied to
get the result.
- tex_amode needs to be set to 1.
Add a new bit to the compiler specs and generate these instructions
as necessary.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Nanley Chery [Mon, 30 Jan 2017 20:27:15 +0000 (12:27 -0800)]
anv/cmd_buffer: Use the proper depth input attachment surface state
Commit 2852efcda40274acf3272611c6a3b7731523a72d moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Bartosz Tomczyk [Tue, 31 Jan 2017 11:02:20 +0000 (12:02 +0100)]
glsl: fix heap-buffer-overflow
The `end+1` skips the ']', whereas the `strlen+1` includes the final
'\0' in the move to terminate the string.
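The pattern is removing a bracketed span from a string in place. A minimal sketch, assuming a hypothetical `strip_first_subscript()` helper (not the GLSL code itself), with both `+1`s called out:

```c
#include <string.h>

/* Removes the first "[...]" from name in place, e.g. "a[2].b" -> "a.b".
 * Hypothetical helper illustrating the corrected memmove() bounds. */
static void strip_first_subscript(char *name)
{
    char *start = strchr(name, '[');
    if (!start)
        return;
    char *end = strchr(start, ']');
    if (!end)
        return;
    /* end + 1 skips the ']'; strlen(end + 1) + 1 also moves the '\0'. */
    memmove(start, end + 1, strlen(end + 1) + 1);
}
```

Moving `strlen(end) + 1` bytes from `end + 1` instead would read one byte past the terminator, which is the kind of overflow being fixed.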
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Wladimir J. van der Laan [Wed, 7 Dec 2016 12:59:54 +0000 (12:59 +0000)]
etnaviv: Cannot render to rb-swapped formats
Exposing rb-swapped (or other swizzled) formats for rendering would
involve swizzling in the pixel shader. This is not done at the
moment, so reject requests for creating such surfaces.
(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of.)
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Tue, 31 Jan 2017 08:10:27 +0000 (09:10 +0100)]
etnaviv: Avoid infinite loop in find_frame()
Use of unsigned loop control variable with '>= 0' would lead
to infinite loop.
Reported by clang:
etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
for (unsigned sp = c->frame_sp; sp >= 0; sp--)
~~ ^ ~
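A minimal sketch of the corrected loop shape, assuming (as the v2 note implies) that `frame_sp` is a signed `int`; the struct layout and the search criterion are hypothetical, only the loop mirrors the warning above.

```c
/* Hypothetical stand-in for the compiler's frame stack. */
struct frame_stack {
    int frame_sp;      /* index of the top frame, -1 when empty */
    int frames[16];
};

/* Searches from the top of the stack down to index 0. With a signed
 * index, 'sp >= 0' becomes false after sp == 0; with 'unsigned sp',
 * sp-- would wrap to UINT_MAX and the loop would never end. */
static int find_frame(const struct frame_stack *c, int value)
{
    for (int sp = c->frame_sp; sp >= 0; sp--) {
        if (c->frames[sp] == value)
            return sp;
    }
    return -1;
}
```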
v2: Simply use the same datatype as c->frame_sp is using.
CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Dave Airlie [Tue, 31 Jan 2017 00:09:11 +0000 (10:09 +1000)]
radv/ac: apply slice rounding to 1d arrays as well.
Fixes:
dEQP-VK.glsl.texture_functions.texture.*1darray*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 31 Jan 2017 00:37:25 +0000 (10:37 +1000)]
radv/geom: check if esgs and gsvs ring exists before filling geom rings
There are some corner cases where you end up with an esgs ring but no
gsvs ring; test for both before dereferencing.
Fixes:
dEQP-VK.geometry.emit.points_emit_0_end_0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 02:42:26 +0000 (12:42 +1000)]
radv: enable geometryShader and multiViewport capabilities.
This enables geometry shader support on radv.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 19:56:49 +0000 (05:56 +1000)]
radv: handle layer export from vs->fs properly
Fixes:
dEQP-VK.geometry.layered.1d_array.fragment_layer
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 02:41:19 +0000 (12:41 +1000)]
radv: emit esgs itemsize register.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 02:40:13 +0000 (12:40 +1000)]
radv: handle prim id inputs to fragment shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 02:33:45 +0000 (12:33 +1000)]
radv: emit geometry shaders to hardware
This emits the compiled geometry shader and other state registers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 01:06:52 +0000 (11:06 +1000)]
radv: emit geometry ring size and pointers via preamble (v2)
This uses the scratch infrastructure to handle the esgs
and gsvs rings.
(this replaces the old code that did this with patching).
v2: fix correct ring sizes, reset sizes (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 20 Jan 2017 00:21:19 +0000 (10:21 +1000)]
radv: add gs ring size calculations to pipeline.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 23:55:37 +0000 (09:55 +1000)]
radv: add pipeline creation support for geometry shaders (v2.1)
This adds gs copy shader support to the pipeline cache, and a few
geometry-related changes.
v2: rebase for spill changes.
v2.1: fix incorrect pipeline destruction.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 05:23:02 +0000 (15:23 +1000)]
radv/ac: handle primitive id
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 05:14:31 +0000 (15:14 +1000)]
radv/ac: handle emitting vertex outputs to esgs ring.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 05:09:19 +0000 (15:09 +1000)]
radv/ac: handle gs inputs
This handles geometry shader inputs written by the vertex (es) shader
to the esgs ring.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 05:05:37 +0000 (15:05 +1000)]
radv/ac: add geom input support to get deref offset.
This just adds the API and fixes up the callers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 04:54:18 +0000 (14:54 +1000)]
radv/ac: handle invocation and primitive id intrinsics
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 04:52:07 +0000 (14:52 +1000)]
radv/ac: handle geometry emit vertex and end prim intrinsics.
This handles emitting things to the gsvs ring, and sending the
correct GS msgs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 04:47:50 +0000 (14:47 +1000)]
radv/ac: handle emitting gs epilogue
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:55:19 +0000 (13:55 +1000)]
radv/ac: add copy shader creation
This creates the gs copy shader and compiles it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:48:26 +0000 (13:48 +1000)]
radv/ac: setup function parameters for vs as es and copy shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:43:26 +0000 (13:43 +1000)]
radv: pass some necessary gs info back to state handling.
We need this info to program some registers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:26:01 +0000 (13:26 +1000)]
radv: emit vertex shader to correct hw block.
This emits the shader to the ES block in the correct case.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:23:55 +0000 (13:23 +1000)]
radv/ac: propagate as_es flag into shader info from key.
This just places the flag into the shader info so we can use it from
the driver after we create the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 02:58:00 +0000 (12:58 +1000)]
radv: extend shader stage code to cover geometry shaders.
This enables the paths for setting up user ptrs to vs/es and gs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 06:13:20 +0000 (16:13 +1000)]
radv/ac: start setting up the geom shader rings (v2)
This sets up the rings and adds the variables
needed to make them work.
v2: rework for sharing ring and scratch
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:22:44 +0000 (15:22 +1000)]
radv/ac: handle geom shader sgpr/vgpr inputs
This just sets up the gpr inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:17:35 +0000 (15:17 +1000)]
radv/ac: add geom shader sendmsg defines.
This just adds some defines needed for geom shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:11:52 +0000 (15:11 +1000)]
radv/ac: add some geom shader info from nir->ac shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 04:48:09 +0000 (14:48 +1000)]
radv: move hw vertex shader emit to separate function
This is to later allow ES shaders to be emitted.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:55:05 +0000 (13:55 +1000)]
radv: fixup ia multi vgt param code to handle geom shaders.
This fixes up a few of the commented out blocks.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:54:17 +0000 (13:54 +1000)]
radv: add code to set gs_table_depth.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:50:16 +0000 (13:50 +1000)]
radv: add small helper to denote when a geom shader is in the pipeline.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Robert Foss [Mon, 30 Jan 2017 21:26:58 +0000 (16:26 -0500)]
radv: Prevent Coverity warning
Prevent Coverity from seeing potential errors when src is
not initialized in the switch case.
Coverity-Id: 1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Fri, 8 Jul 2016 02:44:44 +0000 (12:44 +1000)]
mesa: add new MESA_GLSL flag for printing shader cache debug info
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Thu, 14 Apr 2016 01:04:23 +0000 (11:04 +1000)]
glsl: add cache to ctx and add sha1 string fields
We also add a flag for detecting shaders written to shader cache.
v2: don't leak cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Thu, 14 Apr 2016 00:48:19 +0000 (10:48 +1000)]
glsl: add new uniform fields to be used to restore state from cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Mon, 16 Mar 2015 18:46:20 +0000 (11:46 -0700)]
glsl: Switch to disable-by-default for the GLSL shader cache
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:
MESA_GLSL_CACHE_ENABLE=1
In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.
The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).
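The resulting opt-in policy can be sketched as follows. Only the two environment variable names come from the text; the helper names are hypothetical, and the sketch assumes only the literal value `1` counts as set (the real parser may accept other spellings).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Treats only "1" as set (an assumption for this sketch). */
static bool env_is_one(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Opt-in: the cache runs only if explicitly enabled, and the disable
 * switch always wins. */
static bool cache_enabled_from(const char *enable, const char *disable)
{
    return env_is_one(enable) && !env_is_one(disable);
}

bool shader_cache_enabled(void)
{
    return cache_enabled_from(getenv("MESA_GLSL_CACHE_ENABLE"),
                              getenv("MESA_GLSL_CACHE_DISABLE"));
}
```

Reverting to enabled-by-default would then amount to dropping the `env_is_one(enable)` term while keeping the disable check.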
Reviewed-by: Eric Anholt <eric@anholt.net>
Dave Airlie [Mon, 30 Jan 2017 19:19:56 +0000 (05:19 +1000)]
radv/ac: implement txs for buffer textures.
This fixes a bunch of buffer-related
dEQP-VK.memory.pipeline_barrier.*
tests that were crashing in LLVM because this was missing.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 18:50:30 +0000 (04:50 +1000)]
radv/ac: handle nir irem opcode.
This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opsrem.*
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 06:13:30 +0000 (16:13 +1000)]
radv/ac: fix multisample subpass image.
We weren't adding the fragment position properly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 03:17:05 +0000 (13:17 +1000)]
radv: handle transfer_write as a dst flag.
It appears we can get image barriers like:
srcStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dstStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dependencyFlags: VkDependencyFlags = 0
memoryBarrierCount: uint32_t = 0
pMemoryBarriers: const VkMemoryBarrier* = NULL
bufferMemoryBarrierCount: uint32_t = 0
pBufferMemoryBarriers: const VkBufferMemoryBarrier* = NULL
imageMemoryBarrierCount: uint32_t = 1
pImageMemoryBarriers: const VkImageMemoryBarrier* = 0x7ffc882367b0
pImageMemoryBarriers[0]: const VkImageMemoryBarrier = 0x7ffc882367b0:
sType: VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
pNext: const void* = NULL
srcAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
dstAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
oldLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
newLayout: VkImageLayout = VK_IMAGE_LAYOUT_GENERAL (1)
srcQueueFamilyIndex: uint32_t = 4294967295
dstQueueFamilyIndex: uint32_t = 4294967295
image: VkImage = 0x2df55e0
subresourceRange: VkImageSubresourceRange = 0x7ffc882367e0:
aspectMask: VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
baseMipLevel: uint32_t = 0
levelCount: uint32_t = 1
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
This fixes all the CTS dEQP-VK.memory.pipeline_barrier.transfer_dst tests here,
though I'm not sure whether this is too large a hammer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Mon, 30 Jan 2017 12:55:53 +0000 (13:55 +0100)]
r600: fix a compilation warning in r600_screen_create()
Should be r600_common_screen instead of r600_screen.
Fixes:
80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 22:37:56 +0000 (23:37 +0100)]
gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter
to simplify things in draw_vbo a little
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 27 Jan 2017 11:11:33 +0000 (12:11 +0100)]
winsys/radeon: clamp vram_vis_size to 256MB
the value from the kernel is wrong
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 29 Jan 2017 21:28:04 +0000 (22:28 +0100)]
radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 29 Jan 2017 22:59:59 +0000 (23:59 +0100)]
radeonsi: don't invoke DCC decompression in update_all_texture_descriptors
This fixes a bug uncovered by the 17-part patch series, specifically:
"gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"
If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 02:02:23 +0000 (03:02 +0100)]
radeonsi: fold info->indirect conditionals into the last one in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:56:15 +0000 (02:56 +0100)]
radeonsi: atomize the scratch buffer state
The update frequency is very low.
Difference: Only account for the size when allocating a new one and when
starting a new IB, and check for NULL. (v3)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bartosz Tomczyk [Mon, 30 Jan 2017 13:07:45 +0000 (14:07 +0100)]
r600: Fix stack overflow
Commit 7b5878ee0491e7a93914389a8369cd6752b9757d increased the number of
outputs to 64 but left the size of the output array unchanged. This caused
a stack overflow when the number of outputs is greater than 32. Found by ASAN.
Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
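The bug class here is a fixed-size stack array whose dimension is not tied to the limit that bounds its index. A minimal C sketch of the safer pattern follows; MAX_OUTPUTS, struct output_info and sum_output_gprs() are hypothetical names chosen for illustration, not the actual r600 code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limit, mirroring the 64 outputs mentioned in the commit. */
#define MAX_OUTPUTS 64

/* Hypothetical per-output record. */
struct output_info {
   unsigned semantic;
   unsigned gpr;
};

/* Copies `count` outputs into a stack array and sums their gpr fields.
 * The array is declared with the same constant that bounds `count`, so
 * raising the limit cannot leave the array too small. */
static unsigned sum_output_gprs(const struct output_info *src, unsigned count)
{
   struct output_info outputs[MAX_OUTPUTS];
   unsigned sum = 0;

   assert(count <= MAX_OUTPUTS);
   memcpy(outputs, src, count * sizeof(outputs[0]));

   for (unsigned i = 0; i < count; i++)
      sum += outputs[i].gpr;
   return sum;
}
```

With a hard-coded `outputs[32]`, any count between 33 and 64 would write past the end of the array, which is the kind of stack-buffer-overflow ASAN reports.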
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:56 +0000 (12:52 +0100)]
gallium/radeon: add new HUD queries for monitoring the CP
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:24 +0000 (12:52 +0100)]
gallium/radeon: add new GPU-sdma-busy HUD query
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Thu, 26 Jan 2017 19:54:45 +0000 (20:54 +0100)]
gallium/radeon: rename grbm to mmio in the gpu load path
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Thu, 26 Jan 2017 16:29:32 +0000 (17:29 +0100)]
winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
The time spent in the function dropped by 37% for torcs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
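The commit does not describe how the fast exit works. One plausible shape, sketched here under that assumption, is a one-entry cache of the most recently added buffer so that repeated adds of the same buffer skip the search. All names (struct buffer_list, last_added, add_buffer) are hypothetical, and the real winsys code may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFFERS 64  /* hypothetical fixed capacity */

struct buffer { int id; };

struct buffer_list {
   const struct buffer *entries[MAX_BUFFERS];
   unsigned num;
   const struct buffer *last_added; /* hypothetical one-entry cache */
   int last_index;
   unsigned slow_lookups;           /* counts full searches, for illustration */
};

/* Returns the index of `buf` in the list, adding it if needed. */
static int add_buffer(struct buffer_list *list, const struct buffer *buf)
{
   /* Fast exit: the same buffer is often added several times in a row. */
   if (buf == list->last_added)
      return list->last_index;

   list->slow_lookups++;
   int index = -1;
   for (unsigned i = 0; i < list->num; i++) {
      if (list->entries[i] == buf) {
         index = (int)i;
         break;
      }
   }
   if (index < 0) {
      assert(list->num < MAX_BUFFERS);
      index = (int)list->num;
      list->entries[list->num++] = buf;
   }
   list->last_added = buf;
   list->last_index = index;
   return index;
}
```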
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:23 +0000 (14:35 +0100)]
winsys/amdgpu: do not iterate twice when adding fence dependencies
The perf difference is very small: 3.25% -> 2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:22 +0000 (14:35 +0100)]
winsys/amdgpu: add one likely() call in amdgpu_cs_flush()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
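likely() is a branch-prediction hint. Mesa provides likely()/unlikely() macros built on GCC's __builtin_expect where the compiler supports it. The stand-alone definition below is an equivalent sketch, and flush_if_needed() is a hypothetical function: the commit does not say which condition in amdgpu_cs_flush() received the hint.

```c
#include <assert.h>

/* Tells GCC/Clang the condition is usually true, so the compiler can lay
 * out the expected path as the fall-through. No-op on other compilers. */
#if defined(__GNUC__)
#define likely(x) __builtin_expect(!!(x), 1)
#else
#define likely(x) (x)
#endif

/* Hypothetical flush helper: the common case has work to submit. */
static int flush_if_needed(unsigned num_commands)
{
   if (likely(num_commands > 0))
      return 1; /* the command stream would be submitted here */
   return 0;    /* nothing to flush */
}
```

The hint does not change behavior. It only affects code layout, which is consistent with the small gains reported for these amdgpu_cs_flush() changes.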
Samuel Pitoiset [Mon, 30 Jan 2017 10:19:14 +0000 (11:19 +0100)]
hud: fix compilation warnings in hud_nic_graph_install()
v2: use PRId64 instead of PRIx64
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
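The portable way to print a 64-bit integer is the <inttypes.h> format macros. A minimal sketch, assuming the value being printed is a signed 64-bit integer (the commit does not show the types involved in hud_nic_graph_install()); format_counter() is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Formats a signed 64-bit value in decimal. PRId64 expands to the correct
 * printf conversion for int64_t on each platform, whereas a hard-coded
 * "%ld" or "%lld" triggers -Wformat warnings on some targets. PRIx64 would
 * print the same value in hexadecimal instead. */
static int format_counter(char *buf, size_t size, const char *name, int64_t value)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size, "%s: %" PRId64, name, value);
}
```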
Samuel Pitoiset [Fri, 27 Jan 2017 13:34:52 +0000 (14:34 +0100)]
st/mesa: make st_texture_get_sampler_view() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:40:34 +0000 (02:40 +0100)]
gallium/radeon: remove r600_common_context::max_db
This cleanup is based on the Vulkan driver, which seems to do the same thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:16:18 +0000 (02:16 +0100)]
winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables
This would be a fix if the value were used anywhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 00:33:23 +0000 (01:33 +0100)]
gallium/radeon: clean up r600_query_init_backend_mask
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 25 Jan 2017 01:47:15 +0000 (02:47 +0100)]
radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
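The precomputation pattern is to evaluate the slow function once for every combination of its inputs and then index a table per draw. A minimal C sketch follows. The key bits, the bit layout of the value, and all names (compute_param_slow, param_table, build_param_table, get_param) are hypothetical and do not model the real IA_MULTI_VGT_PARAM fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs that influence the register value. */
enum {
   KEY_PRIM_RESTART = 1u << 0,
   KEY_STREAMOUT    = 1u << 1,
   KEY_INSTANCED    = 1u << 2,
   KEY_COUNT        = 1u << 3, /* number of distinct keys */
};

/* Stand-in for the original per-draw computation (made-up bit layout). */
static uint32_t compute_param_slow(unsigned key)
{
   uint32_t value = 0x3f; /* hypothetical base value */
   if (key & KEY_PRIM_RESTART)
      value |= 1u << 16;
   if (key & KEY_STREAMOUT)
      value |= 1u << 17;
   if ((key & KEY_INSTANCED) && !(key & KEY_STREAMOUT))
      value |= 1u << 18;
   return value;
}

static uint32_t param_table[KEY_COUNT];

/* Done once, e.g. at context creation. */
static void build_param_table(void)
{
   for (unsigned key = 0; key < KEY_COUNT; key++)
      param_table[key] = compute_param_slow(key);
}

/* Per-draw path: pack the flags into a key and index the table. */
static uint32_t get_param(bool prim_restart, bool streamout, bool instanced)
{
   unsigned key = (prim_restart ? KEY_PRIM_RESTART : 0) |
                  (streamout ? KEY_STREAMOUT : 0) |
                  (instanced ? KEY_INSTANCED : 0);
   assert(key < KEY_COUNT);
   return param_table[key];
}
```

The table grows as 2^n with the number of key bits, so this only pays off when the inputs are few booleans or small enums.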
Marek Olšák [Wed, 25 Jan 2017 02:27:34 +0000 (03:27 +0100)]
radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 23:15:35 +0000 (00:15 +0100)]
radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 23:09:24 +0000 (00:09 +0100)]
radeonsi: use a bitmask for looping over dirty PM4 states
Also move it to draw_vbo, because the mask should be 0 in most cases.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
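Looping over a dirty bitmask visits only the set bits instead of testing every state slot, and costs almost nothing when the mask is 0. Mesa code commonly writes this loop with its u_bit_scan() helper. The sketch below uses the GCC/Clang __builtin_ctz builtin directly to stay self-contained; NUM_STATES, emit_state() and emit_dirty_states() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number of state slots (fits in a 32-bit mask). */
#define NUM_STATES 24

static unsigned emitted[NUM_STATES]; /* stand-in for emitting a state */

static void emit_state(unsigned index)
{
   assert(index < NUM_STATES);
   emitted[index]++;
}

/* Emits each dirty state once, lowest index first, then clears the mask.
 * Clearing the lowest set bit with mask &= mask - 1 ends the loop after
 * popcount(mask) iterations; an empty mask skips the loop entirely. */
static void emit_dirty_states(uint32_t *dirty_mask)
{
   uint32_t mask = *dirty_mask;

   while (mask) {
      unsigned index = (unsigned)__builtin_ctz(mask); /* GCC/Clang builtin */
      mask &= mask - 1;
      emit_state(index);
   }
   *dirty_mask = 0;
}
```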
Marek Olšák [Tue, 24 Jan 2017 22:28:32 +0000 (23:28 +0100)]
radeonsi: atomize L2 prefetches
This moves the big conditional statement out of draw_vbo.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 21:54:06 +0000 (22:54 +0100)]
radeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 02:41:05 +0000 (03:41 +0100)]
radeonsi: also prefetch compute shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 02:25:40 +0000 (03:25 +0100)]
radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>