Samuel Pitoiset [Wed, 11 Nov 2015 23:59:00 +0000 (00:59 +0100)]
nv50: add support for performance metrics on G84+
Currently only one metric is exposed but more will be added later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Tue, 10 Nov 2015 00:27:15 +0000 (01:27 +0100)]
nv50: add compute-related MP perf counters on G84+
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.
As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.
Only G84+ is supported because G80 is an old and weird card.
Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Wed, 14 Oct 2015 19:42:41 +0000 (21:42 +0200)]
nv50: implement a basic compute support
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.
This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute
I made some improvements to the original code, like fixing the use of both 3D
and COMPUTE simultaneously, improving global buffers binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.
Speaking of which, it seems that compute programs overlap fragment
programs when both are in use. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.
Note that textures, samplers and surfaces still need to be implemented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Sat, 14 Nov 2015 21:57:59 +0000 (22:57 +0100)]
nv50: free interpolation parameters in nv50_program_destroy()
As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Samuel Pitoiset [Sat, 14 Nov 2015 16:20:09 +0000 (17:20 +0100)]
nvc0: reduce the number of GPR used when reading MP perf counters
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Ilia Mirkin [Sat, 14 Nov 2015 15:28:55 +0000 (10:28 -0500)]
nouveau: don't expose HEVC decoding support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Vinson Lee [Mon, 2 Nov 2015 09:23:59 +0000 (01:23 -0800)]
nir: Silence GCC maybe-uninitialized warnings.
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_after = after;
^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_before = before;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Kenneth Graunke [Sat, 7 Nov 2015 09:37:33 +0000 (01:37 -0800)]
i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.
We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index. We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Kenneth Graunke [Thu, 12 Nov 2015 21:02:05 +0000 (13:02 -0800)]
glsl: Allow implicit int -> uint conversions for the % operator.
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them. (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)
This allows expressions such as:
int foo;
uint bar;
uint mod = foo % bar;
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
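As a rough illustration of the int -> uint rule in the entry above, here is a C++ analogue of `uint mod = foo % bar;`. This is only a sketch: mod_int_uint is a hypothetical helper name, not a Mesa or GLSL function, and it assumes the conversion keeps the bit pattern of a negative value, as a C++ cast to a 32-bit unsigned type does.

```cpp
#include <cassert>
#include <cstdint>

// C++ analogue of the GLSL expression `uint mod = foo % bar;`.
// The signed operand is converted to unsigned first, then the
// modulus is computed entirely in unsigned arithmetic.
// mod_int_uint is a hypothetical name used only for this sketch.
inline std::uint32_t mod_int_uint(std::int32_t foo, std::uint32_t bar)
{
    assert(bar != 0u); // a zero divisor has no defined result
    return static_cast<std::uint32_t>(foo) % bar;
}
```

Note that a negative foo is not treated as a signed remainder: it wraps to a large unsigned value before the modulus is taken.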
Kenneth Graunke [Tue, 10 Nov 2015 08:48:33 +0000 (00:48 -0800)]
i965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems. I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.
It's not that much spam, so I figure we may as well just print it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 12 Nov 2015 06:37:53 +0000 (22:37 -0800)]
i965: Make convert_attr_sources_to_hw_regs handle stride == 0.
This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.
Nobody uses this today, but I plan to.
v2: Rebase on Matt's changes; simplify.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
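The stride == 0 case in the entry above can be sketched as follows. This is a minimal model, not the driver code: Region and region_for_attr are hypothetical names, and the formula (width of 1 for a zero stride, otherwise the execution size capped at 8) is an assumption chosen to reproduce the <0,1,0> and <0,8,0> regions mentioned in the message.

```cpp
#include <algorithm>
#include <cassert>

// <vstride,width,hstride> register region, all in units of elements.
struct Region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

// Hypothetical helper: derive a region from an IR-level stride.
// A stride of 0 means "scalar", so the width must collapse to 1;
// keeping a width of 8 would yield the invalid <0,8,0> region.
inline Region region_for_attr(unsigned stride, unsigned exec_size)
{
    assert(exec_size > 0);
    const unsigned width = (stride == 0) ? 1u : std::min(exec_size, 8u);
    return Region{width * stride, width, stride};
}
```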
Kenneth Graunke [Sun, 8 Nov 2015 06:35:33 +0000 (22:35 -0800)]
nir: Add helpers for getting input/output intrinsic sources.
With the many variants of IO intrinsics, particular sources are often in
different locations. It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
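The idea behind the helpers in the entry above might look like the sketch below. Everything here is hypothetical: IoOp, IoIntrinsic, offset_src_index and get_offset_src are illustrative names, and the source layouts in the comments are assumptions made for the example rather than the layouts NIR actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical set of IO intrinsic variants.
enum class IoOp { LoadInput, LoadPerVertexInput, StoreOutput, StorePerVertexOutput };

struct IoIntrinsic {
    IoOp op;
    std::vector<int> srcs; // stand-ins for the intrinsic's sources
};

// Position of the offset source for each variant (assumed layouts).
inline std::size_t offset_src_index(IoOp op)
{
    switch (op) {
    case IoOp::LoadInput:            return 0; // (offset)
    case IoOp::LoadPerVertexInput:   return 1; // (vertex, offset)
    case IoOp::StoreOutput:          return 1; // (value, offset)
    case IoOp::StorePerVertexOutput: return 2; // (value, vertex, offset)
    }
    return 0; // not reached for valid enumerators
}

// "Give me the offset" without the caller knowing the variant.
inline int get_offset_src(const IoIntrinsic &instr)
{
    const std::size_t i = offset_src_index(instr.op);
    assert(i < instr.srcs.size());
    return instr.srcs[i];
}
```

Callers then ask for the offset by meaning rather than by position, which is the convenience the commit message describes.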
Kenneth Graunke [Mon, 19 Oct 2015 18:28:15 +0000 (11:28 -0700)]
nir: Don't lower TCS outputs to temporaries.
We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs. For now, disable it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Mon, 19 Oct 2015 18:44:28 +0000 (11:44 -0700)]
nir: Allow output reads and add the relevant intrinsics.
Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program. However, this is infeasible for tessellation
control shader outputs.
Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs. These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.
Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.
Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.
v2: Rebase to accommodate Jason's review feedback.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Fri, 2 Oct 2015 07:11:01 +0000 (00:11 -0700)]
nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.
Similar to nir_load_per_vertex_input, but for outputs. This is not
useful in geometry shaders, but will be useful in tessellation shaders.
v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
taking a nir_variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Thu, 1 Oct 2015 00:17:35 +0000 (17:17 -0700)]
nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.
Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs. There aren't per-patch TCS inputs.
Tessellation evaluation shaders have both per-vertex and per-patch
inputs. Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.
v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
which takes a variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ian Romanick [Mon, 2 Nov 2015 22:29:42 +0000 (14:29 -0800)]
i965: Silence unused parameter warnings in get_buffer_rect
brw_meta_fast_clear.c: In function 'get_buffer_rect':
brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter]
get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb,
^
brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter]
struct intel_renderbuffer *irb, struct rect *rect)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ian Romanick [Tue, 10 Nov 2015 20:36:58 +0000 (12:36 -0800)]
meta/generate_mipmap: Don't leak the sampler object
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Matt Turner [Fri, 13 Nov 2015 20:16:48 +0000 (12:16 -0800)]
i965: Remove unneeded #includes.
Some of these are no longer needed since all the backends switched to
NIR.
Matt Turner [Fri, 13 Nov 2015 20:13:14 +0000 (12:13 -0800)]
i965: Silence warning.
intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
ann->error = ralloc_strdup(annotation->mem_ctx, error);
^
I initially tried changing the type of ann_count to unsigned (it is
currently int), since that in addition to the check that it's non-zero
at the beginning of the function seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.
Juha-Pekka Heikkila [Fri, 13 Nov 2015 11:36:43 +0000 (13:36 +0200)]
i965: Don't write beyond allocated memory.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Matt Turner [Mon, 2 Nov 2015 18:23:12 +0000 (10:23 -0800)]
i965: Use BRW_MRF_COMPR4 macro in more places.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 27 Oct 2015 01:41:27 +0000 (18:41 -0700)]
i965: Combine register file field.
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
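A combined register file enum along the lines of the entry above could be sketched like this. The enumerator names follow the commit messages in this log, but the exact numeric values after ARF = 0 and the helper names is_hw_file and hw_file_bits are assumptions for illustration.

```cpp
#include <cassert>

// The first four values fit the 2-bit hardware field; the rest exist
// only in the IR. Values other than ARF = 0 are assumed here.
enum RegFile {
    ARF = 0,
    FIXED_GRF = 1,
    MRF = 2,
    IMM = 3,
    VGRF,     // IR only
    ATTR,     // IR only
    UNIFORM,  // IR only
    BAD_FILE  // IR only
};

// True when the value can be written to the 2-bit hardware field as-is.
inline bool is_hw_file(RegFile f)
{
    return static_cast<unsigned>(f) <= 3u;
}

// Hardware encoding of a file; only valid for hardware files.
inline unsigned hw_file_bits(RegFile f)
{
    assert(is_hw_file(f));
    return static_cast<unsigned>(f) & 0x3u;
}
```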
Matt Turner [Tue, 27 Oct 2015 00:52:57 +0000 (17:52 -0700)]
i965: Replace HW_REG with ARF/FIXED_GRF.
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.
Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.
After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 2 Nov 2015 00:25:04 +0000 (00:25 +0000)]
i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).
The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1. This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).
The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).
In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 2 Nov 2015 00:22:29 +0000 (00:22 +0000)]
i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 27 Oct 2015 00:09:25 +0000 (17:09 -0700)]
i965: Rename GRF to VGRF.
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 30 Oct 2015 05:04:22 +0000 (22:04 -0700)]
i965: Move BAD_FILE from the beginning of enum register_file.
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 30 Oct 2015 20:53:38 +0000 (13:53 -0700)]
i965: Initialize registers.
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero. The next commit will move
BAD_FILE in the enum so that it's no longer zero.
In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting it to zero.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
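The hazard described in the entry above can be shown with a small sketch. The enums and helpers below are hypothetical stand-ins, not the driver's types: they only show why a zero-filled register stops looking "unset" once BAD_FILE is no longer zero.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical "before" and "after" layouts of the register file enum.
enum OldFile { OLD_BAD_FILE = 0, OLD_GRF, OLD_MRF, OLD_IMM };
enum NewFile { NEW_ARF = 0, NEW_GRF, NEW_MRF, NEW_IMM, NEW_BAD_FILE };

// Plain storage for a register whose constructor never ran.
struct RawReg {
    int file;
};

inline RawReg zero_filled()
{
    RawReg r;
    std::memset(&r, 0, sizeof r);
    return r;
}

inline RawReg initialized_new()
{
    RawReg r;
    r.file = NEW_BAD_FILE; // what an explicit initialization would store
    return r;
}

inline bool is_unset_old(const RawReg &r) { return r.file == OLD_BAD_FILE; }
inline bool is_unset_new(const RawReg &r) { return r.file == NEW_BAD_FILE; }

inline void example_zero_fill()
{
    assert(is_unset_old(zero_filled()));     // worked only because BAD_FILE was 0
    assert(!is_unset_new(zero_filled()));    // zero now reads as ARF instead
    assert(is_unset_new(initialized_new())); // explicit initialization still works
}
```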
Matt Turner [Mon, 26 Oct 2015 11:35:14 +0000 (04:35 -0700)]
i965: Use brw_reg's nr field to store register number.
In addition to combining another field, we get to replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 11:04:16 +0000 (04:04 -0700)]
i965: Unwrap some lines.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 04:14:56 +0000 (21:14 -0700)]
i965/vec4: Remove swizzle/writemask fields from src/dst_reg.
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 22:29:03 +0000 (15:29 -0700)]
i965: Remove fixed_hw_reg field from backend_reg.
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 21:55:57 +0000 (14:55 -0700)]
i965: Use immediate storage in inherited brw_reg.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 23 Oct 2015 20:11:44 +0000 (13:11 -0700)]
i965: Add and use enum brw_reg_file.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 23 Oct 2015 19:17:03 +0000 (12:17 -0700)]
i965: Reorganize brw_reg fields.
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
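One way to picture the layout described in the entry above is the sketch below. The field names type, file, nr and subnr come from the message; the bit widths, the RegionBits contents and the helper names are assumptions, and the real brw_reg layout may differ.

```cpp
#include <cassert>
#include <cstdint>

enum File : unsigned { ARF = 0, GRF = 1, MRF = 2, IMM = 3 };

// Fields that mean nothing for an immediate (assumed selection).
struct RegionBits {
    std::uint8_t vstride;
    std::uint8_t width;
    std::uint8_t hstride;
};

struct Reg {
    // First dword: fields that apply to every register, immediates included.
    unsigned type : 4;
    unsigned file : 3;
    unsigned subnr : 5;
    unsigned nr : 16;

    // Shared storage: region fields for real registers, or the immediate value.
    union {
        RegionBits region;
        float f;
        std::int32_t d;
        std::uint32_t ud;
    };
};

inline Reg make_imm_f(float value)
{
    Reg r = Reg(); // value-initialize so every field starts at zero
    r.file = IMM;
    r.f = value;
    return r;
}

inline Reg make_grf(unsigned nr)
{
    assert(nr < (1u << 16)); // must fit the 16-bit nr field
    Reg r = Reg();
    r.file = GRF;
    r.nr = nr;
    r.region = RegionBits{8, 8, 1};
    return r;
}
```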
Matt Turner [Fri, 23 Oct 2015 02:41:30 +0000 (19:41 -0700)]
i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.
Generated by
sed -i -e 's/\.bits\././g' *.c *.h *.cpp
sed -i -e 's/dw1\.//g' *.c *.h *.cpp
and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.
There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.
This is C11 (and gcc has apparently supported it for some time for
"compatibility with other compilers").
See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 22:04:23 +0000 (15:04 -0700)]
i965: Delete type field from backend_reg.
Switching from an implicitly-sized type field to a field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
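The safety argument in the entry above can be made explicit with a compile-time check. This is a sketch with a made-up, shortened type list: RegType, TYPE_COUNT, TypedReg and with_type are hypothetical names, and only the 4-bit width comes from the message.

```cpp
#include <cassert>

// Shortened, hypothetical list of register types.
enum RegType { TYPE_UD, TYPE_D, TYPE_UW, TYPE_W, TYPE_F, TYPE_COUNT };

// The 4-bit field is only safe while every type value fits in it.
static_assert(TYPE_COUNT <= (1 << 4), "register types must fit in 4 bits");

struct TypedReg {
    unsigned type : 4; // explicit bit width instead of a full enum
};

inline TypedReg with_type(RegType t)
{
    assert(t < TYPE_COUNT);
    TypedReg r;
    r.type = static_cast<unsigned>(t);
    return r;
}
```

The static_assert turns the "fewer than 2^4 types" assumption into something the compiler checks whenever the list grows.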
Matt Turner [Sat, 24 Oct 2015 21:35:33 +0000 (14:35 -0700)]
i965: Delete abs/negate fields from backend_reg.
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 21:32:03 +0000 (14:32 -0700)]
i965: Make backend_reg inherit from brw_reg.
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 13 Nov 2015 00:02:22 +0000 (16:02 -0800)]
i965/fs: Replace nested ternary with if ladder.
Since the types of the expression were
bool ? src_reg : (bool ? brw_reg : brw_reg)
the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,
bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)
In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,
src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)
Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
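The slicing problem described in the entry above can be reproduced in a few lines. HwReg and SrcReg below are hypothetical stand-ins for brw_reg and src_reg, and the converting constructor is marked explicit here to keep the conditional expression unambiguous; this is a sketch of the language behaviour, not the driver code.

```cpp
#include <cassert>

struct HwReg {
    int nr;
    explicit HwReg(int n) : nr(n) {}
};

// Derived register with one extra field that the base class lacks.
struct SrcReg : HwReg {
    int reladdr_slot;
    SrcReg(int n, int slot) : HwReg(n), reladdr_slot(slot) {}
    explicit SrcReg(const HwReg &r) : HwReg(r), reladdr_slot(0) {}
};

// The conditional expression has type HwReg, so the derived part of
// `src` is thrown away before a SrcReg is rebuilt from the result.
inline SrcReg pick_with_ternary(bool use_src, const SrcReg &src, const HwReg &hw)
{
    return SrcReg(use_src ? src : hw);
}

// The if ladder returns `src` unchanged and converts only `hw`.
inline SrcReg pick_with_if(bool use_src, const SrcReg &src, const HwReg &hw)
{
    if (use_src)
        return src;
    return SrcReg(hw);
}

inline void example_slicing()
{
    const SrcReg src(3, 7);
    const HwReg hw(5);
    assert(pick_with_ternary(true, src, hw).reladdr_slot == 0); // field lost
    assert(pick_with_if(true, src, hw).reladdr_slot == 7);      // field kept
}
```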
Marek Olšák [Thu, 15 Oct 2015 21:41:35 +0000 (23:41 +0200)]
radeonsi: remove dead code after ES-GS linkage change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Thu, 15 Oct 2015 21:29:00 +0000 (23:29 +0200)]
radeonsi: link ES-GS just like LS-HS
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 8 Nov 2015 12:34:44 +0000 (13:34 +0100)]
radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 8 Nov 2015 11:15:54 +0000 (12:15 +0100)]
radeonsi: rename si_update_gs_rings
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 8 Nov 2015 11:12:46 +0000 (12:12 +0100)]
radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 8 Nov 2015 11:05:39 +0000 (12:05 +0100)]
radeonsi: move maximum gs stream calculation into create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 8 Nov 2015 10:49:33 +0000 (11:49 +0100)]
radeonsi: clean up small duplication in si_shader_gs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 15:30:01 +0000 (16:30 +0100)]
gallium/radeon: shorten render_cond variable names
and ..._cond -> ..._invert
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 15:24:47 +0000 (16:24 +0100)]
gallium/radeon: remove predicate_drawing flag
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 14:39:39 +0000 (15:39 +0100)]
gallium/radeon: atomize render condition (SET_PREDICATION)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 14:00:55 +0000 (15:00 +0100)]
gallium/radeon: simplify restoring render condition after flush
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 13:55:23 +0000 (14:55 +0100)]
gallium/radeon: don't use PREDICATION_OP_CLEAR
Not setting the predication bit is sufficient.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 13:45:58 +0000 (14:45 +0100)]
gallium/radeon: simplify disabling render condition for u_blitter
just disable it by not setting the predication bit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 13:36:38 +0000 (14:36 +0100)]
r600g: don't set predication on non-draw packets
This has no effect.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 13:00:30 +0000 (14:00 +0100)]
gallium/radeon: inline the r600_rings structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 11:22:56 +0000 (12:22 +0100)]
radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 12:43:18 +0000 (13:43 +0100)]
gallium/radeon: remove the IB flushing flag
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 12:31:03 +0000 (13:31 +0100)]
gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_space
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 6 Nov 2015 20:11:16 +0000 (21:11 +0100)]
radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2
You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.
BOTH_ICACHE_KCACHE was an unused definition.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 11:07:31 +0000 (12:07 +0100)]
radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as well
I missed this in commit c3e527f93d4281ad6e2ca165eaf6ff588e4faefa
("radeonsi: only enable write confirmation on the last CP DMA packet").
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 5 Nov 2015 22:56:38 +0000 (23:56 +0100)]
radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney
otherwise the SX or CB blocks can go bananas
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Marek Olšák [Tue, 3 Nov 2015 18:35:46 +0000 (19:35 +0100)]
radeonsi: add glClearBufferSubData acceleration
8-bit and 16-bit clears which are not aligned to dwords are done in software.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 6 Nov 2015 22:16:11 +0000 (23:16 +0100)]
radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag
Buffer clears via transform feedback won't set this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 6 Nov 2015 22:41:15 +0000 (23:41 +0100)]
gallium/u_blitter: add support for multi-dword clear values in clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 6 Nov 2015 22:42:49 +0000 (23:42 +0100)]
radeonsi: fix a future crash in emit_cb_target_mask
This can't crash currently, but it would crash if clear_buffer
from u_blitter were used with a clean context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 6 Nov 2015 22:06:47 +0000 (23:06 +0100)]
radeonsi: fix unaligned clear_buffer fallback
This is unreachable currently, but it will be used by unaligned 8-bit and
16-bit fills.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 5 Nov 2015 11:24:20 +0000 (12:24 +0100)]
r600g: fix clear_buffer fallback with offset != 0
Discovered by luck. This code path hasn't been exercised since transform
feedback was implemented.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sat, 7 Nov 2015 18:31:55 +0000 (19:31 +0100)]
gallium/radeon: fix PIPE_QUERY_GPU_FINISHED
Broken by the addition of r600_multi_fence in
3b37155a68acc351cba86a1fa142bd0de2192d4c
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89014
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Brian Paul [Fri, 13 Nov 2015 15:02:05 +0000 (08:02 -0700)]
mesa: minor comment fix in blend.c
Brian Paul [Fri, 13 Nov 2015 15:01:29 +0000 (08:01 -0700)]
docs: add link to Coverity on developer utilities page
Signed-off-by: Brian Paul <brianp@vmware.com>
Brian Paul [Fri, 13 Nov 2015 14:59:42 +0000 (07:59 -0700)]
docs: update VMware driver instructions
Use a LIBDIR variable, set per-platform.
Update the Mesa configuration flags.
Run update-initramfs or dracut, update /etc/modules
Signed-off-by: Brian Paul <brianp@vmware.com>
Daniel Stone [Sat, 7 Nov 2015 18:25:31 +0000 (18:25 +0000)]
egl/wayland: Ignore rects from SwapBuffersWithDamage
eglSwapBuffersWithDamage accepts damage-region rectangles to hint the
compositor that it only needs to redraw certain areas, which was passed
through the wl_surface_damage request, as designed.
Wayland also offers a buffer transformation interface, e.g. to allow
users to render pre-rotated buffers. Unfortunately, there is no way to
query buffer transforms, and the damage region was provided in surface,
rather than buffer, co-ordinate space.
Users could in theory account for this themselves, but EGL also requires
co-ordinates to be passed in GL/mathematical co-ordinate space, with an
inversion to Wayland's natural/scanout co-ordinate space, so
transformations other than a 180-degree rotation will fail as EGL
attempts to subtract the region from (its view of the) surface height.
Pending creation and acceptance of a wl_surface.buffer_damage request,
which will accept co-ordinates in buffer co-ordinate space, pessimise to
always sending full-surface damage.
bce64c6c provides the explanation for why we send maximum-range damage,
rather than the full size of the surface: in the presence of buffer
transformations, full-surface damage may not actually cover the entire
surface.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
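The coordinate inversion and the pessimised damage described in the entry above can be sketched without touching any Wayland API. Rect, egl_to_surface_rect and full_range_damage are hypothetical names; the sketch assumes EGL rectangles use a bottom-left origin and the surface a top-left one, and it ignores buffer transforms, which is exactly the case the commit says cannot be handled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct Rect {
    std::int32_t x;
    std::int32_t y;
    std::int32_t width;
    std::int32_t height;
};

// Flip a rectangle from a bottom-left origin to a top-left origin.
// Only correct when the buffer is not rotated relative to the surface.
inline Rect egl_to_surface_rect(const Rect &r, std::int32_t surface_height)
{
    assert(r.y >= 0 && r.height >= 0 && r.y + r.height <= surface_height);
    return Rect{r.x, surface_height - r.y - r.height, r.width, r.height};
}

// Pessimised damage: a maximum-range rectangle rather than the surface
// size, so that it covers the surface under any buffer transform.
inline Rect full_range_damage()
{
    const std::int32_t max = std::numeric_limits<std::int32_t>::max();
    return Rect{0, 0, max, max};
}
```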
Iago Toral Quiroga [Fri, 13 Nov 2015 07:51:06 +0000 (08:51 +0100)]
Revert "nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers"
The change proposed in the review leads to piglit regressions because
is_move() is used in other places and relies on the checks for source
modifiers to be there.
Revert this until we agree on a better solution.
Samuel Iglesias Gonsálvez [Thu, 12 Nov 2015 15:14:07 +0000 (16:14 +0100)]
glsl: fix 'shared' layout qualifier related regressions
Commit 8b28b35 added 'shared' as a keyword for compute shaders
but it broke the existing 'shared' layout qualifier support for
uniform and shader storage blocks.
This patch fixes 578 dEQP-GLES31.functional.ssbo.* tests.
v2:
- Move SHARED to interface_block_layout_qualifier (Timothy)
- Don't remove "shared" case insensitive check (Timothy)
- Remove the clearing of shared_storage flag (Timothy)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Iago Toral Quiroga [Fri, 6 Nov 2015 11:08:49 +0000 (12:08 +0100)]
nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.
v2: remove the check for source modifiers from is_move() (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason Ekstrand [Fri, 13 Nov 2015 05:52:37 +0000 (21:52 -0800)]
nir/vars_to_ssa: Delete dead output set code
This was a remnant of an early attempt to handle output reads in
vars_to_ssa. That attempt was abandoned a long time ago but these few lines
were apparently left in the pass and managed to evade review.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 13 Nov 2015 02:10:22 +0000 (18:10 -0800)]
nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets. This commit changes this to only remove it from the other
node's copy set (not the one we're lowering). At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies. This has two advantages:
1) It's more efficient because we're doing potentially half as many set
search operations.
2) It now properly handles copies from a node to itself. Previously, it
would delete the copy from the set when processing the destination and
then assert-fail when we couldn't find it for the source.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
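The reworked copy set handling in the entry above can be modelled with standard containers. This is a sketch only: Node, Copy, add_copy and lower_copies are hypothetical names, std::set stands in for Mesa's set type, and the actual lowering of each copy is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Node;

struct Copy {
    Node *src;
    Node *dst;
};

struct Node {
    std::set<Copy *> copies; // every copy that reads or writes this node
};

// Register a copy with both of its endpoints (once for a self-copy).
inline void add_copy(Copy &c)
{
    c.src->copies.insert(&c);
    c.dst->copies.insert(&c);
}

// Lower every copy touching `node`. Each copy is removed only from the
// other endpoint's set; the node's own set is discarded in one step at
// the end, so a copy from a node to itself needs no special lookup.
inline int lower_copies(Node &node)
{
    int lowered = 0;
    for (Copy *c : node.copies) {
        // ... the copy would be turned into load/store here ...
        ++lowered;
        Node *other = (c->src == &node) ? c->dst : c->src;
        if (other != &node) {
            const std::size_t erased = other->copies.erase(c);
            assert(erased == 1);
            (void)erased;
        }
    }
    node.copies.clear();
    return lowered;
}
```

Because the node's own set is never searched during the loop, only one set lookup per copy remains, which matches the "half as many set search operations" point in the message.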
Jason Ekstrand [Tue, 10 Nov 2015 22:13:47 +0000 (14:13 -0800)]
nir/validate: Allow subroutine types for the tails of derefs
The shader-subroutine code creates uniforms of type SUBROUTINE for
subroutines that are then read as integers in the backends. If we ever
want to do any optimizations on these, we'll need to come up with a better
plan where they are actual scalars or something, but this works for now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92859
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Nanley Chery [Fri, 16 Oct 2015 17:14:39 +0000 (10:14 -0700)]
mesa: Replace gl_extensions::EXT_texture3D with ::dummy_true
Mesa unconditionally sets this driver flag to true in
_mesa_init_extensions(). There is therefore no need for
the driver to communicate support for this extension.
Replace the driver capability flag with ::dummy_true.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Brian Paul [Thu, 12 Nov 2015 22:59:21 +0000 (15:59 -0700)]
mesa: fix MSVC build break in extensions.h
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Ilia Mirkin [Mon, 14 Sep 2015 20:23:29 +0000 (16:23 -0400)]
nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATION
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 14 Sep 2015 20:23:04 +0000 (16:23 -0400)]
gallium: add support for gl_HelperInvocation semantic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Ilia Mirkin [Mon, 14 Sep 2015 20:13:43 +0000 (16:13 -0400)]
glsl: add gl_HelperInvocation system value
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jordan Justen [Thu, 12 Nov 2015 06:02:06 +0000 (22:02 -0800)]
glsl: Correctly handle vector extract on function parameter
The commit below accidentally used '==' where '=' was intended:
commit 96b22fb080894ba1840af2372f28a46cc0f40c76
Author: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Date: Wed Nov 4 14:58:54 2015 -0800
glsl: Use array deref for access to vector components
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Nanley Chery [Thu, 15 Oct 2015 19:34:43 +0000 (12:34 -0700)]
mesa: In helpers, only check driver capability for meta
Make API context and version checks done by the helper functions pass
unconditionally while meta is in progress. This transparently makes
extension checks solely dependent on struct gl_extensions while in meta.
v2: Use an 8-bit data type instead of a GLuint
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Mon, 26 Oct 2015 22:22:24 +0000 (15:22 -0700)]
mesa/extensions: Prefix global struct and extension type
Rename the following types and variables:
* struct extension -> struct mesa_extension,
like the mesa_format type.
* extension_table -> _mesa_extension_table,
like the _mesa_extension_override_{enables,disables} structs.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Thu, 17 Sep 2015 22:49:40 +0000 (15:49 -0700)]
mesa: Generate a helper function for each extension
Generate functions which determine if an extension is supported in the
current context. Initially, enums were going to be explicitly used with
_mesa_extension_supported(). The idea to embed the function and enums
into generated helper functions was suggested by Kristian Høgsberg.
For performance, the function body no longer uses
_mesa_extension_supported() and, as suggested by Chad Versace, the
functions are also declared static inline.
v2: Place function qualifiers on separate line (Chad)
v3: Move function curly brace to new line (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
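As a rough illustration of the generated helpers, the sketch below expands
one list of entries three times: into an enum, into a table of minimum
versions, and into one static inline helper per extension. Everything here is
an assumption made for the example: struct ext_flags, struct context,
EXTENSION_LIST, ext_min_version, the has_<name>() naming, the ARB_example
entry and the version numbers are hypothetical and much simpler than the
real Mesa definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the driver flags and the context. */
struct ext_flags {
   bool dummy_true;
   bool ARB_example;
};

struct context {
   struct ext_flags Extensions;
   uint8_t Version;              /* e.g. 30 for version 3.0 */
};

/* One EXT() entry per extension: name, driver flag, minimum version. */
#define EXTENSION_LIST(EXT)              \
   EXT(EXT_texture3D, dummy_true,   0)   \
   EXT(ARB_example,   ARB_example, 30)

/* First expansion: one enum index per extension. */
#define EXT(name, driver_cap, min_version) EXT_INDEX_##name,
enum ext_index { EXTENSION_LIST(EXT) EXT_INDEX_COUNT };
#undef EXT

/* Second expansion: the minimum context version of each extension. */
#define EXT(name, driver_cap, min_version) [EXT_INDEX_##name] = (min_version),
static const uint8_t ext_min_version[EXT_INDEX_COUNT] = { EXTENSION_LIST(EXT) };
#undef EXT

/* Third expansion: one static inline has_<name>() helper per extension. */
#define EXT(name, driver_cap, min_version)                     \
static inline bool                                             \
has_##name(const struct context *ctx)                          \
{                                                              \
   assert(ctx != NULL);                                        \
   return ctx->Extensions.driver_cap &&                        \
          ctx->Version >= ext_min_version[EXT_INDEX_##name];   \
}
EXTENSION_LIST(EXT)
#undef EXT
```

A caller would then write has_ARB_example(ctx) instead of passing an enum to
a generic lookup function, which lets the compiler inline the check.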
Nanley Chery [Mon, 21 Sep 2015 18:23:33 +0000 (11:23 -0700)]
mesa/extensions: Replace extension::api_set with ::version
The api_set field has no users outside of _mesa_extension_supported().
Remove it and allow the version field to take its place.
The bulk of the transformation was performed with the following vim commands:
s/\(GL [^,]\+\),\s*\d*,\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1, GLL, GLC\2\3/g
s/\(GLL [^,]\+\)\,\s*\d*/\1, GLL/g
s/\(GLC [^,]\+\)\(,\s*\d*\),\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1\2, GLC\3\4/g
s/\( ES1[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4, ES1/g
s/\( ES2[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4\6, ES2/g
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Tue, 8 Sep 2015 19:41:18 +0000 (12:41 -0700)]
mesa/extensions: Use _mesa_extension_supported()
Replace open-coded checks for extension support with
_mesa_extension_supported().
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Wed, 2 Sep 2015 18:53:16 +0000 (11:53 -0700)]
mesa/extensions: Create _mesa_extension_supported()
Create a function which determines if an extension is supported in the
current context.
v2: Use common variable names (Emil)
Insert new line between variables and return statement (Chad)
Rename api_set variable to api_bit (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
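A minimal sketch of such a check, assuming each table entry stores the byte
offset of a driver flag and a bitmask of the APIs that may expose the
extension. The names enum api, struct ext_flags, struct context,
extension_table and extension_supported() are hypothetical, as is the
ARB_example entry; the real function and structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API enumeration; one bit per API in api_set below. */
enum api { API_COMPAT, API_CORE, API_ES1, API_ES2 };

struct ext_flags {
   bool dummy_true;
   bool ARB_example;
};

struct context {
   enum api API;
   struct ext_flags Extensions;
};

struct extension {
   const char *name;
   size_t offset;      /* byte offset of the flag in struct ext_flags */
   uint8_t api_set;    /* bitmask built from (1 << enum api) */
};

static const struct extension extension_table[] = {
   { "GL_EXT_texture3D", offsetof(struct ext_flags, dummy_true),
     (1u << API_COMPAT) | (1u << API_CORE) },
   { "GL_ARB_example", offsetof(struct ext_flags, ARB_example),
     1u << API_CORE },
};

/* True if the driver enables the extension and the current API exposes it. */
static bool
extension_supported(const struct context *ctx, size_t index)
{
   assert(index < sizeof(extension_table) / sizeof(extension_table[0]));

   const struct extension *ext = &extension_table[index];
   const char *base = (const char *)&ctx->Extensions;
   const unsigned api_bit = 1u << ctx->API;

   return (ext->api_set & api_bit) != 0 &&
          *(const bool *)(base + ext->offset);
}
```

Keeping the test in one function means call sites no longer repeat the API
mask and flag lookup by hand.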
Nanley Chery [Tue, 8 Sep 2015 19:25:56 +0000 (12:25 -0700)]
mesa/extensions: Add extension::version
Enable limiting advertised extension support by context version with
finer granularity. This new field is currently unused and is set to
0 everywhere. When it is used, a value of 0 will indicate that the
extension is supported for any version of a context.
v2: Use uint*t type for version and note the expected values (Emil)
Use an 8-bit data type
Reformat macro for better readability (Chad)
v3: Note preparatory nature of commit (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Wed, 16 Sep 2015 18:27:38 +0000 (11:27 -0700)]
mesa/extensions: Move entries to separate file
With this infrastructure set in place, we can now reuse the entries to
generate useful code.
v2: Add the new file into Makefile.sources (Emil)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Nanley Chery [Wed, 2 Sep 2015 18:26:57 +0000 (11:26 -0700)]
mesa/extensions: Wrap array entries in macros
Now that we're using macros, remove the redundant text from each entry.
Remove comments between the entries to make editing easier and separate
the sections with blank lines. Structure the EXT macros in a way that
helps reviewers verify that no meaning has been altered.
v2: Indent the entries (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
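To illustrate how a macro can remove the redundant text from each entry, the
sketch below builds the "GL_" prefix and the offsetof() expression inside an
EXT() macro. It is an assumption-laden example: struct ext_flags,
struct extension, the o() shorthand, find_extension(), the ARB_example entry
and the year values are hypothetical and do not reproduce the real table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct ext_flags {
   bool dummy_true;
   bool ARB_example;
};

struct extension {
   const char *name;
   size_t offset;   /* byte offset of the driver flag */
   int year;
};

/* The macro supplies the repeated "GL_" prefix and offsetof() boilerplate. */
#define o(x) offsetof(struct ext_flags, x)
#define EXT(name_str, driver_cap, yyyy) { "GL_" #name_str, o(driver_cap), yyyy },

static const struct extension extension_table[] = {
   EXT(ARB_example,   ARB_example, 2012)
   EXT(EXT_texture3D, dummy_true,  1996)
};

#undef EXT
#undef o

/* Return the table index of the named extension, or -1 if it is unknown. */
static int
find_extension(const char *name)
{
   assert(name != NULL);

   for (size_t i = 0; i < ARRAY_SIZE(extension_table); i++) {
      if (strcmp(extension_table[i].name, name) == 0)
         return (int)i;
   }
   return -1;
}
```

Each entry now lists only the parts that differ between extensions, so a
reviewer can compare columns instead of reading full initializers.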
Nanley Chery [Fri, 11 Sep 2015 16:59:32 +0000 (09:59 -0700)]
mesa/extensions: Remove array sentinel
Simplify future updates to the extension struct array by removing
the sentinel.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
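For comparison, a small sketch of iterating an array with and without a
sentinel. The arrays and both counting functions are hypothetical; the
point is only that a length computed by the compiler removes the need for a
terminating entry that every later edit has to keep in last position.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Old style: a NULL entry marks the end and must always stay last. */
static const char *const with_sentinel[] = {
   "GL_ARB_example",
   "GL_EXT_texture3D",
   NULL,
};

/* New style: no terminator; the compiler knows the array length. */
static const char *const without_sentinel[] = {
   "GL_ARB_example",
   "GL_EXT_texture3D",
};

static size_t
count_with_sentinel(void)
{
   size_t n = 0;

   while (with_sentinel[n] != NULL)
      n++;
   return n;
}

static size_t
count_without_sentinel(void)
{
   size_t n = 0;

   for (size_t i = 0; i < ARRAY_SIZE(without_sentinel); i++) {
      assert(without_sentinel[i] != NULL);
      n++;
   }
   return n;
}
```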
Matt Turner [Mon, 29 Jun 2015 22:59:37 +0000 (15:59 -0700)]
i965: Check instructions appear only on supported hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 29 Jun 2015 21:08:51 +0000 (14:08 -0700)]
i965: Add initial assembly validation pass.
Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
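The kind of check described above might look like the sketch below, which
walks a list of instructions and counts those that use a null source. The
enum reg_file, struct inst and validate() are hypothetical simplifications
and do not mirror the i965 instruction encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register files; FILE_NULL models a null source. */
enum reg_file { FILE_NULL, FILE_GRF, FILE_IMM };

struct inst {
   const char *mnemonic;
   unsigned num_sources;     /* 0..2 sources are actually used */
   enum reg_file src[2];
};

/* Return the number of instructions that have a null source operand. */
static size_t
validate(const struct inst *insts, size_t count)
{
   size_t errors = 0;

   assert(insts != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      bool bad = false;

      assert(insts[i].num_sources <= 2);
      for (unsigned s = 0; s < insts[i].num_sources; s++) {
         if (insts[i].src[s] == FILE_NULL)
            bad = true;
      }
      if (bad)
         errors++;
   }
   return errors;
}
```

Only the sources an instruction actually uses are checked, so an instruction
with no sources passes even though its unused slots are null.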
Matt Turner [Wed, 21 Oct 2015 22:23:10 +0000 (15:23 -0700)]
i965: Add annotation_insert_error() and support for printing errors.
Will allow annotations to contain error messages (indicating an
instruction violates a rule for instance) that are printed after the
disassembly of the block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 8 Oct 2015 04:04:48 +0000 (21:04 -0700)]
i965: Combine assembly annotations if possible.
Often annotations are identical between sets of consecutive
instructions. We can perhaps avoid some memory allocations by reusing
the previous annotation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
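A minimal sketch of the idea, assuming an annotation applies from its offset
until the next annotation starts. The struct layout, MAX_ANNOTATIONS and
annotate() are hypothetical, the storage is a fixed array rather than
allocated memory, and the text comparison is an assumption about what
"identical" means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ANNOTATIONS 16

/* Hypothetical annotation: covers instructions from offset onwards. */
struct annotation {
   int offset;
   const char *text;
};

struct annotation_info {
   struct annotation ann[MAX_ANNOTATIONS];
   size_t count;
};

/* Record an annotation, reusing the previous one when the text matches. */
static void
annotate(struct annotation_info *info, int offset, const char *text)
{
   assert(text != NULL);

   /* The previous annotation already covers this instruction. */
   if (info->count > 0 &&
       strcmp(info->ann[info->count - 1].text, text) == 0)
      return;

   assert(info->count < MAX_ANNOTATIONS);
   info->ann[info->count].offset = offset;
   info->ann[info->count].text = text;
   info->count++;
}
```

Runs of consecutive instructions with the same annotation collapse into a
single entry, so a new entry is only created when the annotation changes.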
Matt Turner [Mon, 29 Jun 2015 21:05:27 +0000 (14:05 -0700)]
i965: Set annotation_info's mem_ctx.
It was being memset to 0 previously.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>