platform/upstream/mesa.git
8 years ago i965: Prevent fast clears for MSRTs on SKL
Neil Roberts [Mon, 16 Nov 2015 13:03:11 +0000 (14:03 +0100)]
i965: Prevent fast clears for MSRTs on SKL

There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
prevents fast clears for MSRTs on SKL altogether so that when fast
clears are eventually enabled it will only be for single-sampled
surfaces. The assumption is that clears are probably more likely to be
used in single-sampled applications anyway so we can at least get them
working and we can enable MSRTs later once we understand the problem
better.

This patch should have no functional effect other than perhaps
receiving fewer perf_debug messages on SKL+.

v2: Improve the commit message to avoid saying the patch disables fast
    clears because it will be merged before fast clears are enabled
    for any surfaces so it doesn't actually disable anything.
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
8 years ago vc4: Don't bother lowering uniforms when the same value is used twice.
Eric Anholt [Thu, 12 Nov 2015 01:09:40 +0000 (17:09 -0800)]
vc4: Don't bother lowering uniforms when the same value is used twice.

DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered.  The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up with register
allocation failing due to this on 5 of the tests (More tests still fail in
RA, which look like we'll need to reduce lowered uniform lifetimes to
fix).
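
The decision described above can be sketched in C. This is only an illustration, not the vc4 code: the `operand` struct and both helper names are hypothetical, and it assumes lowering is needed only when one instruction reads more than one distinct uniform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand description, for illustration only. */
struct operand {
    bool is_uniform;   /* true if this source reads a uniform */
    unsigned index;    /* which uniform it reads */
};

/* Count how many different uniforms an instruction's sources read. */
unsigned count_distinct_uniforms(const struct operand *srcs, unsigned n)
{
    unsigned distinct = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!srcs[i].is_uniform)
            continue;
        bool seen = false;
        for (unsigned j = 0; j < i; j++) {
            if (srcs[j].is_uniform && srcs[j].index == srcs[i].index) {
                seen = true;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Assumption: only more than one distinct uniform forces lowering. */
bool needs_uniform_lowering(const struct operand *srcs, unsigned n)
{
    return count_distinct_uniforms(srcs, n) > 1;
}
```

Under that assumption, an instruction shaped like "fmaxabs dst, uni, uni" reads one distinct uniform and is left alone.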

No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).

8 years ago vc4: Fix uniform reordering to support reading the same uniform twice.
Eric Anholt [Tue, 17 Nov 2015 04:45:46 +0000 (20:45 -0800)]
vc4: Fix uniform reordering to support reading the same uniform twice.

This does actually happen in the wild (particularly fabs of a uniform), so
we'd like to support it.

8 years ago vc4: Fix documentation on vc4_qir_lower_uniforms.c.
Eric Anholt [Thu, 12 Nov 2015 00:50:29 +0000 (16:50 -0800)]
vc4: Fix documentation on vc4_qir_lower_uniforms.c.

8 years ago vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
Eric Anholt [Tue, 10 Nov 2015 23:37:47 +0000 (15:37 -0800)]
vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.

It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).
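
The arithmetic behind deriving an unsigned >= from a subtraction can be sketched in plain C. This is not the vc4 implementation: the helper name is made up, and since the polarity of the QPU carry flag is not stated here, the sketch models "no borrow" directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes a >= b (unsigned) from the borrow of a - b.
 * The subtraction is done in 64 bits so a borrow shows up in bit 32. */
bool uge_via_borrow(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a - (uint64_t)b; /* wraps modulo 2^64 when a < b */
    bool borrow = ((wide >> 32) & 1u) != 0;    /* set only when a < b */
    return !borrow;
}
```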

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
8 years ago i965: Fix PIPE_CONTOL typo.
Kenneth Graunke [Wed, 18 Nov 2015 00:31:14 +0000 (16:31 -0800)]
i965: Fix PIPE_CONTOL typo.

PIPE_CONTOL!!!

8 years ago i965: Add assertion for src_stencil payload size
Ben Widawsky [Tue, 17 Nov 2015 01:23:01 +0000 (17:23 -0800)]
i965: Add assertion for src_stencil payload size

This helps address a coverity warning and prevents future questions about this
code.

Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago i965: Implement ARB_pipeline_statistics_query tessellation counters.
Kenneth Graunke [Wed, 28 Oct 2015 23:26:15 +0000 (16:26 -0700)]
i965: Implement ARB_pipeline_statistics_query tessellation counters.

We basically just need to uncomment Ben's code.

v2: Fix obvious bugs caught by Ben.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years ago glsl: rename location layout helper
Timothy Arceri [Fri, 13 Nov 2015 04:43:13 +0000 (15:43 +1100)]
glsl: rename location layout helper

Change name from validate -> apply to more accurately describe what
the function does.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
8 years ago glsl: don't validate binding when its not needed
Timothy Arceri [Fri, 13 Nov 2015 00:41:52 +0000 (11:41 +1100)]
glsl: don't validate binding when its not needed

Checking that the flag has been set is all the validation that's
needed here.

Also not calling the binding validation function will make things
much simpler when adding compile time constant support as we
won't need to resolve the binding value.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: remove temp variable to make code easier to read
Timothy Arceri [Fri, 13 Nov 2015 00:28:20 +0000 (11:28 +1100)]
glsl: remove temp variable to make code easier to read

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: cleanup and fix validate matrix function for arrays
Timothy Arceri [Fri, 13 Nov 2015 00:21:42 +0000 (11:21 +1100)]
glsl: cleanup and fix validate matrix function for arrays

Previously if the member was an array of matrices then a
warning message would be incorrectly given.

Also the struct case could never be met so it has been removed.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: use better location in struct and block error messages
Timothy Arceri [Thu, 12 Nov 2015 23:49:48 +0000 (10:49 +1100)]
glsl: use better location in struct and block error messages

Previously we only gave the location for some members and never
gave the variable location. In those cases we were just giving
the location of the struct/block.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: only do type and qualifier validation once per declaration
Timothy Arceri [Thu, 12 Nov 2015 23:27:00 +0000 (10:27 +1100)]
glsl: only do type and qualifier validation once per declaration

For struct and block members previously we were doing it for
every variable declaration.

So for example

struct S {
  atomic_uint x, y, z;
};

Would previously generate three error messages when one is sufficient.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: rename function that processes struct and iface members
Timothy Arceri [Thu, 12 Nov 2015 22:49:31 +0000 (09:49 +1100)]
glsl: rename function that processes struct and iface members

As of the previous commit this function handles only struct/iface
members.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: move block validation outside function that validates members
Timothy Arceri [Thu, 12 Nov 2015 22:45:36 +0000 (09:45 +1100)]
glsl: move block validation outside function that validates members

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago glsl: move ast layout qualifier handling code into its own function
Timothy Arceri [Thu, 12 Nov 2015 06:43:52 +0000 (17:43 +1100)]
glsl: move ast layout qualifier handling code into its own function

We now also only apply these rules to variables rather than also
trying to apply them to function params.

V2: move code for handling stream layout qualifier

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago i965: Add INTEL_DEBUG=shader_time support for tessellation shaders.
Kenneth Graunke [Tue, 10 Nov 2015 09:53:33 +0000 (01:53 -0800)]
i965: Add INTEL_DEBUG=shader_time support for tessellation shaders.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago i965: Add INTEL_DEBUG=tcs,tes and hs,ds flags for tessellation shaders.
Kenneth Graunke [Sun, 26 Jul 2015 02:28:59 +0000 (19:28 -0700)]
i965: Add INTEL_DEBUG=tcs,tes and hs,ds flags for tessellation shaders.

Even though both tessellation shader stages must be used together, I
still think it makes sense to add separate debug flags for each stage.
It makes it possible to read the TCS/HS, rule out problems, then read
the TES/DS separately, without sifting through as much printed text.

I decided to add both the GL names (tcs/tes) and hardware names (hs/ds)
so they can be used interchangeably.
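
One way to make the GL and hardware names interchangeable is to map both spellings to the same bit. The sketch below is an assumption about the approach, not the driver's parser: the bit values, the table and `parse_debug_flags()` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values; the real driver defines its own. */
#define DEBUG_TCS (1u << 0)
#define DEBUG_TES (1u << 1)

struct debug_name {
    const char *name;
    uint32_t flag;
};

/* GL names (tcs/tes) and hardware names (hs/ds) share the same bits. */
static const struct debug_name debug_names[] = {
    { "tcs", DEBUG_TCS }, { "hs", DEBUG_TCS },
    { "tes", DEBUG_TES }, { "ds", DEBUG_TES },
};

/* Parse a comma-separated list such as "tcs,ds" into a flag mask.
 * Unknown names are ignored. */
uint32_t parse_debug_flags(const char *str)
{
    uint32_t mask = 0;
    while (*str) {
        size_t len = strcspn(str, ",");
        for (size_t i = 0; i < sizeof(debug_names) / sizeof(debug_names[0]); i++) {
            if (strlen(debug_names[i].name) == len &&
                strncmp(str, debug_names[i].name, len) == 0)
                mask |= debug_names[i].flag;
        }
        str += len;
        if (*str == ',')
            str++;
    }
    return mask;
}
```

With this table, "hs" and "tcs" produce the same mask, so either spelling selects the same stage's output.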

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago i965: Add more MAX_*_URB_ENTRY_SIZE_BYTES #defines.
Kenneth Graunke [Wed, 11 Nov 2015 02:06:07 +0000 (18:06 -0800)]
i965: Add more MAX_*_URB_ENTRY_SIZE_BYTES #defines.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
8 years ago i965: Add missing stdio.h include to brw_compiler.h.
Kenneth Graunke [Tue, 17 Nov 2015 09:37:27 +0000 (01:37 -0800)]
i965: Add missing stdio.h include to brw_compiler.h.

This is needed for the FILE * type in brw_print_vue_map().

Apparently, all files that include brw_compiler.h already pick this up
via some include chain, so this isn't actually a build fix.  However,
I have patches which introduce new consumers of brw_compiler.h that
fail to build because of the missing #include.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years ago egl: make it clear which platform x11 backend is being used (dri2 or 3)
Martin Peres [Fri, 30 Oct 2015 15:16:35 +0000 (17:16 +0200)]
egl: make it clear which platform x11 backend is being used (dri2 or 3)

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago egl/x11_dri3: Implement EGL_KHR_image_pixmap
Boyan Ding [Tue, 21 Jul 2015 15:44:02 +0000 (23:44 +0800)]
egl/x11_dri3: Implement EGL_KHR_image_pixmap

v2: from Martin Peres
 - Replace a tab with spaces

v3: from Martin Peres
 - disable EGL_KHR_image_pixmap when is_different_gpu is set (Axel Davy)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago loader/dri3: Expose function to create __DRIimage from pixmap
Boyan Ding [Tue, 21 Jul 2015 15:44:01 +0000 (23:44 +0800)]
loader/dri3: Expose function to create __DRIimage from pixmap

Used to support EGL_KHR_image_pixmap.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago egl/x11: Implement dri3 support with loader's dri3 helper
Boyan Ding [Tue, 21 Jul 2015 15:44:00 +0000 (23:44 +0800)]
egl/x11: Implement dri3 support with loader's dri3 helper

v2: From Martin Peres
 - Report that we are compiling the dri3 backend in configure.ac
 - Update the Makefile.am
 - get rid of the LIBDRM_HAS_RENDERNODE_SUPPORT macro
 - fix some warnings related to EGLuint64KHR to int64_t conversions
 - use dri2_get_dri_config to get the __DRIconfig instead of open-coding it
 - replace the occasional tabs with spaces

v3: From Martin Peres
 - fix an indent problem (Matt Turner)
 - drop the authenticate function, use NULL in the vtable instead (Emil)
 - drop some useless includes (Emil Velikov)
 - mandate libdrm (Emil Velikov)
 - link to xcb-dri3 (Kristian Høgsberg)
 - convert to the new loader interface for drawable (Kristian)
 - remove some dead code after the dropping of some vfuncs (Kristian)
 - add a comment on the topic of rendering to the frontbuffer

v4: From Martin Peres
 - do not expose the preserved swap behavior (Acked by Eric Anholt)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago egl_dri2: Add a function to let platform code return dri drawable from _EGLSurface
Boyan Ding [Tue, 21 Jul 2015 15:43:59 +0000 (23:43 +0800)]
egl_dri2: Add a function to let platform code return dri drawable from _EGLSurface

dri3 for EGL will use a struct other than dri2_egl_surface for
an EGL surface. The common code only uses __DRIdrawable from that
struct, so instead of converting _EGLSurface to dri2_egl_surface, let
the platform code return the __DRIdrawable on its own (although the
current platforms use the same function).

v2: From Martin Peres
 - convert to the new drawable interface (Kristian)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago glx/dri3: Convert to use dri3 helper in loader library
Boyan Ding [Tue, 21 Jul 2015 15:43:55 +0000 (23:43 +0800)]
glx/dri3: Convert to use dri3 helper in loader library

v2: From Martin Peres
 - convert to the new drawable interface
 - delete dead code after the dropping of some vfuncs
 - delete the width and height attributes since they are found in the helper

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago loader: Add dri3 helper
Boyan Ding [Tue, 21 Jul 2015 15:43:54 +0000 (23:43 +0800)]
loader: Add dri3 helper

v2: From Martin Peres
 - Try to fit in the 80-col limit as much as possible

v3: From Martin Peres
 - introduce loader_dri3_helper.la to avoid dragging the xcb dep everywhere (Kristian & Emil)
 - get rid of the width, height, dri_screen and is_different_gpu vfuncs (Kristian)
 - replace the create/destroy functions with init/fini for dri3 drawables
 - prefix static functions with dri3_ and exported ones with loader_dri3 (Emil)
 - keep the function definition consistent (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
8 years ago i965: Return the correct value type from brw_compile_gs()
Eduardo Lima Mitev [Tue, 17 Nov 2015 08:49:43 +0000 (09:49 +0100)]
i965: Return the correct value type from brw_compile_gs()

brw_compile_gs() should return a pointer to unsigned, but it is returning the
bool 'false' at some point, hence annoying us with a compiler warning:

In function 'const unsigned int* brw::brw_compile_gs(const brw_compiler*,
   void*, void*, const brw_gs_prog_key*, brw_gs_prog_data*, const nir_shader*,
   gl_shader_program*, int, unsigned int*, char**)':

brw_vec4_gs_visitor.cpp:776:14: warning: converting 'false' to pointer type
                                'const unsigned int*' [-Wconversion-null]
                                return false;
                                       ^
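
The shape of the fix can be sketched as follows. This is a hypothetical stand-in, not brw_compile_gs() itself: `compile_program()` and its parameters are invented for illustration, and the point is only that a function returning a pointer reports failure with NULL rather than with the bool 'false'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compile entry point that returns generated
 * code, or NULL on failure. */
const unsigned *compile_program(const unsigned *code, unsigned count,
                                unsigned *final_size)
{
    if (code == NULL || count == 0) {
        *final_size = 0;
        return NULL;   /* a null pointer, not the bool 'false' */
    }
    *final_size = count * (unsigned)sizeof(unsigned);
    return code;
}
```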
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years ago glsl: copy each field's precision information in glsl_types's structure constructor
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 11:35:13 +0000 (12:35 +0100)]
glsl: copy each field's precision information in glsl_types's structure constructor

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago glsl: copy each field's precision information from the old gl_PerVertex interface...
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 11:02:41 +0000 (12:02 +0100)]
glsl: copy each field's precision information from the old gl_PerVertex interface block

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago glsl: copy each field's precision information when generating varying variables
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 11:01:37 +0000 (12:01 +0100)]
glsl: copy each field's precision information when generating varying variables

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago glsl: initialize data.precision value in ir_variable constructor
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 10:59:18 +0000 (11:59 +0100)]
glsl: initialize data.precision value in ir_variable constructor

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago glsl/nir: initialize precision field in glsl_struct_field constructor
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 10:43:20 +0000 (11:43 +0100)]
glsl/nir: initialize precision field in glsl_struct_field constructor

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago nir: reduce memory footprint of glsl_struct_field's precision
Samuel Iglesias Gonsálvez [Mon, 16 Nov 2015 09:23:42 +0000 (10:23 +0100)]
nir: reduce memory footprint of glsl_struct_field's precision

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years ago mesa: do runtime validation of precision varyings only on ES
Tapani Pälli [Mon, 16 Nov 2015 06:43:12 +0000 (08:43 +0200)]
mesa: do runtime validation of precision varyings only on ES

Precision qualifier should be ignored on desktop OpenGL.

v2: include spec quote (Samuel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
8 years ago glsl: initialize precision when adding per vertex record fields
Tapani Pälli [Mon, 16 Nov 2015 06:44:18 +0000 (08:44 +0200)]
glsl: initialize precision when adding per vertex record fields

Fixes issues with tessellation builtin variables since precision was
introduced to IR with commit f84bc57d7dc02fceb805803131426c791eadeff9.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Set MaxCombinedUniformBlocks properly.
Kenneth Graunke [Fri, 13 Nov 2015 22:55:50 +0000 (14:55 -0800)]
i965: Set MaxCombinedUniformBlocks properly.

Up until now, we've been letting core Mesa initialize it to 36 for us
(which is presumably BRW_MAX_UBO (12) * (VS+GS+FS stages -> 3)).

With compute and tessellation, we need to increase this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years ago i965: Clean up context constant initialization code.
Kenneth Graunke [Thu, 12 Nov 2015 21:46:16 +0000 (13:46 -0800)]
i965: Clean up context constant initialization code.

This was getting pretty out of hand, and with compute partially in place
and tessellation on the way, it was only going to get worse.

This patch makes a "stage exists?" predicate and a "number of stages"
count and uses them to clean up a lot of calculations.  We can just
loop over shader stages and set things for the ones that exist.  For
combined counts, we can just multiply by the number of stages.
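
A minimal sketch of that pattern, assuming a per-stage limit of 12 (the BRW_MAX_UBO figure quoted in the previous commit message). The enum, struct and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage list and limits struct, for illustration only. */
enum stage { VERTEX, TESS_CTRL, TESS_EVAL, GEOMETRY, FRAGMENT, COMPUTE, NUM_STAGES };

#define EXAMPLE_MAX_UBO 12   /* assumed per-stage limit */

struct limits {
    unsigned max_uniform_blocks[NUM_STAGES];
    unsigned max_combined_uniform_blocks;
};

/* Set per-stage limits for the stages that exist, then derive the combined
 * limit by multiplying the per-stage limit by the number of stages. */
void init_limits(struct limits *l, const bool stage_exists[NUM_STAGES])
{
    unsigned num_stages = 0;
    for (int s = 0; s < NUM_STAGES; s++) {
        l->max_uniform_blocks[s] = stage_exists[s] ? EXAMPLE_MAX_UBO : 0;
        if (stage_exists[s])
            num_stages++;
    }
    l->max_combined_uniform_blocks = EXAMPLE_MAX_UBO * num_stages;
}
```

With only VS, GS and FS present this gives 12 * 3 = 36; with all six stages of the sketch it gives 72. The values the driver actually ends up with are not stated here.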

It also tries to organize a little bit.

We should probably use _mesa_has_geometry_shaders/tessellation/compute
here, but we can't because ctx->Version isn't initialized yet.  Perhaps
that could be fixed in the future.

No change in "glxinfo -l" on Broadwell.

v2: Drop stray compute shader hunk.  Mark stage_exists as const.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years ago i965: Convert scalar_* flags to a scalar_stage array.
Kenneth Graunke [Thu, 12 Nov 2015 21:32:13 +0000 (13:32 -0800)]
i965: Convert scalar_* flags to a scalar_stage array.

I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array.  Simpler.
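
The conversion can be pictured roughly like this. It is a sketch under assumptions: the enum, the struct and `is_scalar()` are hypothetical, and only the `scalar_stage` array name comes from the commit title.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage enum, for illustration only. */
enum shader_stage {
    STAGE_VERTEX, STAGE_TESS_CTRL, STAGE_TESS_EVAL,
    STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COMPUTE, STAGE_COUNT
};

/* One array indexed by stage replaces a separate bool per stage, so adding
 * a stage needs a new enum value rather than a new struct member. */
struct compiler {
    bool scalar_stage[STAGE_COUNT];
};

bool is_scalar(const struct compiler *c, enum shader_stage s)
{
    assert((unsigned)s < (unsigned)STAGE_COUNT);
    return c->scalar_stage[s];
}
```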

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
8 years ago r200: fix bgrx8/xrgb8 blits
Roland Scheidegger [Tue, 17 Nov 2015 00:04:05 +0000 (01:04 +0100)]
r200: fix bgrx8/xrgb8 blits

Since 779cabfc7d022de8b7b9bc7fdac0caffa8646c51 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never choose
them, but we get that format from the dri buffers (at least I assume we got
it from there).
This is untested but essentially addressing the same bug as for radeon.
(I don't think that the second entry per le/be table is actually necessary,
but shouldn't hurt...)

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
8 years ago radeon: fix bgrx8/xrgb8 blits
Roland Scheidegger [Thu, 12 Nov 2015 18:33:14 +0000 (19:33 +0100)]
radeon: fix bgrx8/xrgb8 blits

Since d21320f6258b2e1780a15c1ca718963d8a15ca18 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never choose
them, but we get that format from the dri buffers (at least I assume we got
it from there). This caused lots of piglit regressions (and probably lots of
trouble outside piglit too).
This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
8 years ago meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required
Ian Romanick [Fri, 13 Nov 2015 19:58:41 +0000 (11:58 -0800)]
meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required

Previously GL_FRAMEBUFFER was used.  However, if GL_EXT_framebuffer_blit
is supported (note: it is supported by every Mesa driver), this is
*sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes*
an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER
(setters).  As a result, the code saved one binding but modified both.
If the bindings were different, the GL_READ_FRAMEBUFFER would be
incorrect on exit.
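
The asymmetry can be modelled without a GL context. The sketch below is a toy model, not Mesa code: every type and function in it is hypothetical, and it only mimics the getter/setter aliasing described above.

```c
#include <assert.h>

/* Toy model of the three binding targets. */
enum target { TARGET_FRAMEBUFFER, TARGET_DRAW_FRAMEBUFFER, TARGET_READ_FRAMEBUFFER };

struct fb_state {
    unsigned draw_fb;
    unsigned read_fb;
};

/* Setter: the combined target updates both bindings. */
void bind_framebuffer(struct fb_state *st, enum target t, unsigned fb)
{
    if (t == TARGET_FRAMEBUFFER || t == TARGET_DRAW_FRAMEBUFFER)
        st->draw_fb = fb;
    if (t == TARGET_FRAMEBUFFER || t == TARGET_READ_FRAMEBUFFER)
        st->read_fb = fb;
}

/* Getter: the combined target reports only the draw binding. */
unsigned get_binding(const struct fb_state *st, enum target t)
{
    return t == TARGET_READ_FRAMEBUFFER ? st->read_fb : st->draw_fb;
}

/* Save the binding, bind a temporary framebuffer, then restore,
 * all through the same target. */
void with_temp_fbo(struct fb_state *st, enum target t, unsigned temp_fb)
{
    unsigned saved = get_binding(st, t);
    bind_framebuffer(st, t, temp_fb);
    /* ... temporary rendering would happen here ... */
    bind_framebuffer(st, t, saved);
}
```

Starting from draw = 1 and read = 2, going through the combined target leaves read = 1 on exit, while going through the draw target leaves the read binding untouched.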

Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.

Ideally this function would use DSA functions and not modify the binding
at all.  However, that would be a much more intrusive change because
_mesa_meta_bind_fbo_image would also need to be modified.
_mesa_meta_bind_fbo_image has a lot of callers.  Much of this code is
about to get a major rework due to bug #92363, so I don't think it
matters too much.  In fact, I discovered this bug while working on the
other bug.  Le bon temps!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
8 years ago nir/glsl: Fix copy-n-paste mistakes from commit 213f864.
Matt Turner [Sun, 15 Nov 2015 01:47:33 +0000 (17:47 -0800)]
nir/glsl: Fix copy-n-paste mistakes from commit 213f864.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago radeonsi: enable optimal raster config setting for fiji (v2)
Alex Deucher [Fri, 13 Nov 2015 18:00:30 +0000 (13:00 -0500)]
radeonsi: enable optimal raster config setting for fiji (v2)

Requires proper kernel tiling configuration so check the tiling
config registers.

v2: send the right version of the patch

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
8 years ago radeonsi: use proper GRBM_GFX_INDEX offset for CI+
Alex Deucher [Fri, 13 Nov 2015 21:21:09 +0000 (16:21 -0500)]
radeonsi: use proper GRBM_GFX_INDEX offset for CI+

The offset is different on CI and newer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8 years ago docs: Add 16x MSAA on i965 to the release notes
Neil Roberts [Mon, 16 Nov 2015 13:35:46 +0000 (14:35 +0100)]
docs: Add 16x MSAA on i965 to the release notes

Signed-off-by: Neil Roberts <neil@linux.intel.com>
8 years ago nv50: add missing header into the sources list
Emil Velikov [Mon, 16 Nov 2015 10:49:14 +0000 (10:49 +0000)]
nv50: add missing header into the sources list

Otherwise it won't end up in the tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
8 years ago nir/glsl_to_nir: use _mesa_fls() to compute num_textures
Juan A. Suarez Romero [Fri, 6 Nov 2015 12:23:17 +0000 (12:23 +0000)]
nir/glsl_to_nir: use _mesa_fls() to compute num_textures

Replace the current loop by a direct call to _mesa_fls() function.

It also fixes an implicit bug in the current code where num_textures
seems to be one value less than it should be when sh->Program->SamplersUsed > 0.

For instance, num_textures is 0 instead of 1 when
sh->Program->SamplersUsed is 1.
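
For reference, a "find last set" operation returns the 1-based position of the highest set bit, or 0 when no bit is set. The portable loop below is a stand-in written for illustration (the name `example_fls` is made up); it is not the Mesa implementation.

```c
#include <assert.h>

/* 1-based index of the most significant set bit; 0 if mask is 0. */
unsigned example_fls(unsigned mask)
{
    unsigned pos = 0;
    while (mask) {
        pos++;
        mask >>= 1;
    }
    return pos;
}
```

A samplers-used mask of 1 (only sampler 0 in use) therefore yields 1 texture, and a mask of 0x5 (samplers 0 and 2) yields 3.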

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
Iago Toral Quiroga [Fri, 13 Nov 2015 08:03:55 +0000 (09:03 +0100)]
nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers

If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.

v2: remove the check for source modifiers from is_move() (Jason)

v3: Put the check for source modifiers back into is_move() since
    this function is called from copy_prop_alu_src(). Add source
    modifiers checks to is_vec() instead.
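
Why the modifiers matter: if "b = MOV -a" were treated as a plain copy, users of b would be rewritten to read a and the negation would be lost. A minimal sketch of such a check follows; the structs and `is_plain_move()` are hypothetical simplifications, not NIR's data structures.

```c
#include <assert.h>
#include <stdbool.h>

enum op { OP_MOV, OP_FADD };

/* Hypothetical, simplified source operand with its modifiers. */
struct src {
    int ssa_index;
    bool negate;
    bool abs;
};

struct alu_instr {
    enum op op;
    struct src src0;
};

/* A MOV is safe to propagate through only if its source is unmodified. */
bool is_plain_move(const struct alu_instr *instr)
{
    return instr->op == OP_MOV && !instr->src0.negate && !instr->src0.abs;
}
```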

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years ago nv50,nvc0: disable render condition around clear_* functions
Ilia Mirkin [Sun, 15 Nov 2015 01:14:07 +0000 (20:14 -0500)]
nv50,nvc0: disable render condition around clear_* functions

Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago i965: Introduce a MOV_INDIRECT opcode.
Kenneth Graunke [Sun, 8 Nov 2015 02:58:34 +0000 (18:58 -0800)]
i965: Introduce a MOV_INDIRECT opcode.

The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex).  The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read.  This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)
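
In scalar terms, the per-channel selection behaves like a gather. The following is a CPU-side model for illustration only; `mov_indirect()` and its parameters are hypothetical and say nothing about how the opcode is encoded.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 8

/* Each of the 8 channels picks its own source element, so different
 * channels may read different payload entries (for example, URB handles). */
void mov_indirect(uint32_t dst[SIMD_WIDTH], const uint32_t *src,
                  unsigned src_count, const unsigned index[SIMD_WIDTH])
{
    for (int ch = 0; ch < SIMD_WIDTH; ch++) {
        assert(index[ch] < src_count);
        dst[ch] = src[index[ch]];
    }
}
```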

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
    a bit more generic.  Use regs_read() instead of hacking up the
    register allocator.  (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
    Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
8 years ago nv50: add support for performance metrics on G84+
Samuel Pitoiset [Wed, 11 Nov 2015 23:59:00 +0000 (00:59 +0100)]
nv50: add support for performance metrics on G84+

Currently only one metric is exposed but more will be added later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago nv50: add compute-related MP perf counters on G84+
Samuel Pitoiset [Tue, 10 Nov 2015 00:27:15 +0000 (01:27 +0100)]
nv50: add compute-related MP perf counters on G84+

These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.

As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.

Only G84+ is supported because G80 is an old and weird card.

Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago nv50: implement a basic compute support
Samuel Pitoiset [Wed, 14 Oct 2015 19:42:41 +0000 (21:42 +0200)]
nv50: implement a basic compute support

This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.

This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute

I made some improvements to the original code, like fixing simultaneous use
of 3D and COMPUTE, improving global buffer binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.

On that note, it seems that compute programs overlap fragment
programs when both are used. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.

Note that textures, samplers and surfaces still need to be implemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago nv50: free interpolation parameters in nv50_program_destroy()
Samuel Pitoiset [Sat, 14 Nov 2015 21:57:59 +0000 (22:57 +0100)]
nv50: free interpolation parameters in nv50_program_destroy()

As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years ago nvc0: reduce the number of GPR used when reading MP perf counters
Samuel Pitoiset [Sat, 14 Nov 2015 16:20:09 +0000 (17:20 +0100)]
nvc0: reduce the number of GPR used when reading MP perf counters

No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years ago nouveau: don't expose HEVC decoding support
Ilia Mirkin [Sat, 14 Nov 2015 15:28:55 +0000 (10:28 -0500)]
nouveau: don't expose HEVC decoding support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
8 years ago nir: Silence GCC maybe-uninitialized warnings.
Vinson Lee [Mon, 2 Nov 2015 09:23:59 +0000 (01:23 -0800)]
nir: Silence GCC maybe-uninitialized warnings.

nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_after = after;
               ^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_before = before;
                ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.
Kenneth Graunke [Sat, 7 Nov 2015 09:37:33 +0000 (01:37 -0800)]
i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.

We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index.  We want to use
them for any non-constant index (even if uniform), as they live in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
8 years agoglsl: Allow implicit int -> uint conversions for the % operator.
Kenneth Graunke [Thu, 12 Nov 2015 21:02:05 +0000 (13:02 -0800)]
glsl: Allow implicit int -> uint conversions for the % operator.

GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them.  (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)

This allows expressions such as:

   int foo;
   uint bar;
   uint mod = foo % bar;
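
As a rough illustration only, the same expression can be written in C++
with the int -> uint conversion spelled out. This is an analogue, not
GLSL: the helper name mod_int_uint is hypothetical, and the sketch
sticks to non-negative inputs so it makes no claim about how negative
values convert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring `uint mod = foo % bar;`.
// The signed operand is converted to unsigned first, then the
// modulus is computed entirely in unsigned arithmetic.
std::uint32_t mod_int_uint(std::int32_t foo, std::uint32_t bar)
{
    return static_cast<std::uint32_t>(foo) % bar;
}

// Small self-check using non-negative inputs only.
void demo_mod()
{
    assert(mod_int_uint(10, 3u) == 1u);
    assert(mod_int_uint(7, 7u) == 0u);
}
```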

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.
Kenneth Graunke [Tue, 10 Nov 2015 08:48:33 +0000 (00:48 -0800)]
i965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.

I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems.  I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.

It's not that much spam, so I figure we may as well just print it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Make convert_attr_sources_to_hw_regs handle stride == 0.
Kenneth Graunke [Thu, 12 Nov 2015 06:37:53 +0000 (22:37 -0800)]
i965: Make convert_attr_sources_to_hw_regs handle stride == 0.

This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.

Nobody uses this today, but I plan to.

v2: Rebase on Matt's changes; simplify.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
8 years agonir: Add helpers for getting input/output intrinsic sources.
Kenneth Graunke [Sun, 8 Nov 2015 06:35:33 +0000 (22:35 -0800)]
nir: Add helpers for getting input/output intrinsic sources.

With the many variants of IO intrinsics, particular sources are often in
different locations.  It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.
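
A minimal sketch of the idea, using simplified stand-in types rather
than the real NIR API. The io_op enum, the io_intrinsic struct, both
helper names and the source layouts in the comments are assumptions
made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified IO intrinsic kinds. The comment on each
// entry gives the source layout assumed by this sketch.
enum io_op {
    LOAD_INPUT_INDIRECT,              // srcs: [indirect]
    LOAD_PER_VERTEX_INPUT_INDIRECT,   // srcs: [vertex, indirect]
    STORE_OUTPUT_INDIRECT,            // srcs: [value, indirect]
    STORE_PER_VERTEX_OUTPUT_INDIRECT  // srcs: [value, vertex, indirect]
};

struct io_intrinsic {
    io_op op;
    int src[3];
};

// "Give me the indirect offset", whatever kind of intrinsic this is.
const int *get_indirect_src(const io_intrinsic &instr)
{
    switch (instr.op) {
    case LOAD_INPUT_INDIRECT:
        return &instr.src[0];
    case LOAD_PER_VERTEX_INPUT_INDIRECT:
    case STORE_OUTPUT_INDIRECT:
        return &instr.src[1];
    case STORE_PER_VERTEX_OUTPUT_INDIRECT:
        return &instr.src[2];
    }
    return nullptr;
}

// "Give me the vertex index", or nullptr if this kind has none.
const int *get_vertex_index_src(const io_intrinsic &instr)
{
    switch (instr.op) {
    case LOAD_PER_VERTEX_INPUT_INDIRECT:
        return &instr.src[0];
    case STORE_PER_VERTEX_OUTPUT_INDIRECT:
        return &instr.src[1];
    default:
        return nullptr;
    }
}
```

Callers ask for a source by role and never hard-code its position, so
the per-kind layout knowledge lives in one place.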

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years agonir: Don't lower TCS outputs to temporaries.
Kenneth Graunke [Mon, 19 Oct 2015 18:28:15 +0000 (11:28 -0700)]
nir: Don't lower TCS outputs to temporaries.

We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs.  For now, disable it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years agonir: Allow output reads and add the relevant intrinsics.
Kenneth Graunke [Mon, 19 Oct 2015 18:44:28 +0000 (11:44 -0700)]
nir: Allow output reads and add the relevant intrinsics.

Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program.  However, this is infeasible for tessellation
control shader outputs.

Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs.  These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.

Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.

Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.

v2: Rebase to accommodate Jason's review feedback.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years agonir/lower_io: Introduce nir_store_per_vertex_output intrinsics.
Kenneth Graunke [Fri, 2 Oct 2015 07:11:01 +0000 (00:11 -0700)]
nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.

Similar to nir_load_per_vertex_input, but for outputs.  This is not
useful in geometry shaders, but will be useful in tessellation shaders.

v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
    taking a nir_variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years agonir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.
Kenneth Graunke [Thu, 1 Oct 2015 00:17:35 +0000 (17:17 -0700)]
nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.

Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs.  There aren't per-patch TCS inputs.

Tessellation evaluation shaders have both per-vertex and per-patch
inputs.  Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.

v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
    which takes a variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
8 years agoi965: Silence unused parameter warnings in get_buffer_rect
Ian Romanick [Mon, 2 Nov 2015 22:29:42 +0000 (14:29 -0800)]
i965: Silence unused parameter warnings in get_buffer_rect

brw_meta_fast_clear.c: In function 'get_buffer_rect':
brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter]
 get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb,
                                     ^
brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter]
                 struct intel_renderbuffer *irb, struct rect *rect)
                                            ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agometa/generate_mipmap: Don't leak the sampler object
Ian Romanick [Tue, 10 Nov 2015 20:36:58 +0000 (12:36 -0800)]
meta/generate_mipmap: Don't leak the sampler object

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agoi965: Remove unneeded #includes.
Matt Turner [Fri, 13 Nov 2015 20:16:48 +0000 (12:16 -0800)]
i965: Remove unneeded #includes.

Some of these are no longer needed since all the backends switched to
NIR.

8 years agoi965: Silence warning.
Matt Turner [Fri, 13 Nov 2015 20:13:14 +0000 (12:13 -0800)]
i965: Silence warning.

intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
       ann->error = ralloc_strdup(annotation->mem_ctx, error);
                         ^

I initially tried changing the type of ann_count to unsigned (it is
currently int), since that, in addition to the check that it's non-zero
at the beginning of the function, seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.

8 years agoi965: Don't write beyond allocated memory.
Juha-Pekka Heikkila [Fri, 13 Nov 2015 11:36:43 +0000 (13:36 +0200)]
i965: Don't write beyond allocated memory.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
8 years agoi965: Use BRW_MRF_COMPR4 macro in more places.
Matt Turner [Mon, 2 Nov 2015 18:23:12 +0000 (10:23 -0800)]
i965: Use BRW_MRF_COMPR4 macro in more places.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Combine register file field.
Matt Turner [Tue, 27 Oct 2015 01:41:27 +0000 (18:41 -0700)]
i965: Combine register file field.

The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
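
A sketch of what such a combined enum could look like. The entry names
follow the commit messages in this series, but the enum name reg_file,
the helper is_hw_file and the values of the IR-only entries are
assumptions, not the actual header.

```cpp
#include <cassert>

// The first four entries are the hardware encodings and fit in
// 2 bits. The remaining entries exist only in the IR.
enum reg_file {
    ARF = 0,
    FIXED_GRF = 1,
    MRF = 2,
    IMM = 3,
    VGRF,      // IR-only from here on (values assumed)
    ATTR,
    UNIFORM,
    BAD_FILE
};

// True when the value can be emitted directly as the 2-bit
// hardware register file field.
bool is_hw_file(reg_file f)
{
    return static_cast<unsigned>(f) <= 3u;
}
```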

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Replace HW_REG with ARF/FIXED_GRF.
Matt Turner [Tue, 27 Oct 2015 00:52:57 +0000 (17:52 -0700)]
i965: Replace HW_REG with ARF/FIXED_GRF.

HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Set stride correctly for immediates in fs_reg(brw_reg).
Matt Turner [Mon, 2 Nov 2015 00:25:04 +0000 (00:25 +0000)]
i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).

The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1.  This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).

The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).

In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Handle type-V immediates in brw_reg_from_fs_reg().
Matt Turner [Mon, 2 Nov 2015 00:22:29 +0000 (00:22 +0000)]
i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().

We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Rename GRF to VGRF.
Matt Turner [Tue, 27 Oct 2015 00:09:25 +0000 (17:09 -0700)]
i965: Rename GRF to VGRF.

The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Move BAD_FILE from the beginning of enum register_file.
Matt Turner [Fri, 30 Oct 2015 05:04:22 +0000 (22:04 -0700)]
i965: Move BAD_FILE from the beginning of enum register_file.

I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Initialize registers.
Matt Turner [Fri, 30 Oct 2015 20:53:38 +0000 (13:53 -0700)]
i965: Initialize registers.

The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero.  The next commit will move
BAD_FILE in the enum so that it's no longer zero.

In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting it to zero.
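
To illustrate why explicit initialization matters once BAD_FILE is no
longer zero, here is a small sketch. The reg struct, the enum ordering
and the helper names are hypothetical and are not the driver's types.

```cpp
#include <cassert>
#include <cstring>

// Assumed ordering for this sketch: BAD_FILE is no longer 0.
enum reg_file { ARF = 0, GRF, MRF, IMM, BAD_FILE };

struct reg {
    reg_file file;
};

// Zero-filled storage, as if no constructor had run.
reg zero_filled()
{
    reg r;
    std::memset(&r, 0, sizeof(r));
    return r;
}

// Explicitly initialized register.
reg initialized()
{
    reg r;
    r.file = BAD_FILE;
    return r;
}

bool is_bad_file(const reg &r)
{
    return r.file == BAD_FILE;
}
```

With this ordering a zero-filled register reads as ARF rather than
BAD_FILE, so the (file == BAD_FILE) test only works on registers that
were actually initialized.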

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Use brw_reg's nr field to store register number.
Matt Turner [Mon, 26 Oct 2015 11:35:14 +0000 (04:35 -0700)]
i965: Use brw_reg's nr field to store register number.

In addition to combining another field, we get to replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Unwrap some lines.
Matt Turner [Mon, 26 Oct 2015 11:04:16 +0000 (04:04 -0700)]
i965: Unwrap some lines.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/vec4: Remove swizzle/writemask fields from src/dst_reg.
Matt Turner [Mon, 26 Oct 2015 04:14:56 +0000 (21:14 -0700)]
i965/vec4: Remove swizzle/writemask fields from src/dst_reg.

Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Remove fixed_hw_reg field from backend_reg.
Matt Turner [Sat, 24 Oct 2015 22:29:03 +0000 (15:29 -0700)]
i965: Remove fixed_hw_reg field from backend_reg.

Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Use immediate storage in inherited brw_reg.
Matt Turner [Sat, 24 Oct 2015 21:55:57 +0000 (14:55 -0700)]
i965: Use immediate storage in inherited brw_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Add and use enum brw_reg_file.
Matt Turner [Fri, 23 Oct 2015 20:11:44 +0000 (13:11 -0700)]
i965: Add and use enum brw_reg_file.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Reorganize brw_reg fields.
Matt Turner [Fri, 23 Oct 2015 19:17:03 +0000 (12:17 -0700)]
i965: Reorganize brw_reg fields.

Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Make 'dw1' and 'bits' unnamed structures in brw_reg.
Matt Turner [Fri, 23 Oct 2015 02:41:30 +0000 (19:41 -0700)]
i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.

Generated by

   sed -i -e 's/\.bits\././g' *.c *.h *.cpp
   sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for some time "for
compatibility with other compilers").

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Delete type field from backend_reg.
Matt Turner [Sat, 24 Oct 2015 22:04:23 +0000 (15:04 -0700)]
i965: Delete type field from backend_reg.

Switching from an implicitly-sized type field to a field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
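
A minimal sketch of the width argument, assuming a hypothetical
packed_reg struct. The 3-bit file field and both helper names are
illustrative only.

```cpp
#include <cassert>

// Hypothetical register layout with an explicit 4-bit type field.
struct packed_reg {
    unsigned type : 4;  // 2^4 = 16 distinct values: 0..15
    unsigned file : 3;  // illustrative width, not taken from the driver
};

// True when `value` is representable in `bits` bits.
bool fits_in_bits(unsigned value, unsigned bits)
{
    return value < (1u << bits);
}

// Stores an in-range type value and reads it back.
unsigned round_trip_type(unsigned value)
{
    packed_reg r = {};
    r.type = value;
    return r.type;
}
```

Every value below 16 survives the round trip, which is why having fewer
than 2^4 types makes the narrower field safe.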

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Delete abs/negate fields from backend_reg.
Matt Turner [Sat, 24 Oct 2015 21:35:33 +0000 (14:35 -0700)]
i965: Delete abs/negate fields from backend_reg.

Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Make backend_reg inherit from brw_reg.
Matt Turner [Sat, 24 Oct 2015 21:32:03 +0000 (14:32 -0700)]
i965: Make backend_reg inherit from brw_reg.

Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Replace nested ternary with if ladder.
Matt Turner [Fri, 13 Nov 2015 00:02:22 +0000 (16:02 -0800)]
i965/fs: Replace nested ternary with if ladder.

Since the types of the expression were

   bool ? src_reg : (bool ? brw_reg : brw_reg)

the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,

   bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)

In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,

   src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)

Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.
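
The hazard can be reproduced in a few lines of standalone C++. The
types base_reg and derived_reg below are hypothetical stand-ins for
brw_reg and src_reg, and this is only a sketch of the language rule,
not the driver code.

```cpp
#include <cassert>

struct base_reg {
    int nr;
    explicit base_reg(int nr_) : nr(nr_) {}
};

struct derived_reg : base_reg {
    int extra;  // field that only exists in the derived class
    derived_reg(int nr_, int extra_) : base_reg(nr_), extra(extra_) {}
    // Converting constructor, analogous to src_reg(struct brw_reg).
    derived_reg(const base_reg &b) : base_reg(b), extra(0) {}
};

// The conditional expression has the base type, so `d` is reduced to
// its base part and then converted back, losing `extra`.
derived_reg pick_ternary(bool use_derived, const derived_reg &d,
                         const base_reg &b)
{
    return use_derived ? d : b;
}

// The if ladder returns each operand with its own type.
derived_reg pick_ladder(bool use_derived, const derived_reg &d,
                        const base_reg &b)
{
    if (use_derived)
        return d;
    else
        return derived_reg(b);
}
```

When the derived operand is selected, pick_ternary returns extra == 0
while pick_ladder preserves the original value.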

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agoradeonsi: remove dead code after ES-GS linkage change
Marek Olšák [Thu, 15 Oct 2015 21:41:35 +0000 (23:41 +0200)]
radeonsi: remove dead code after ES-GS linkage change

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agoradeonsi: link ES-GS just like LS-HS
Marek Olšák [Thu, 15 Oct 2015 21:29:00 +0000 (23:29 +0200)]
radeonsi: link ES-GS just like LS-HS

This reduces the shader key for ES.

Use a fixed attrib location based on (semantic name, index).

The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agoradeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
Marek Olšák [Sun, 8 Nov 2015 12:34:44 +0000 (13:34 +0100)]
radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga

I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: rename si_update_gs_rings
Marek Olšák [Sun, 8 Nov 2015 11:15:54 +0000 (12:15 +0100)]
radeonsi: rename si_update_gs_rings

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: calculate ESGS_RING_ITEMSIZE in create_shader
Marek Olšák [Sun, 8 Nov 2015 11:12:46 +0000 (12:12 +0100)]
radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: move maximum gs stream calculation into create_shader
Marek Olšák [Sun, 8 Nov 2015 11:05:39 +0000 (12:05 +0100)]
radeonsi: move maximum gs stream calculation into create_shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: clean up small duplication in si_shader_gs
Marek Olšák [Sun, 8 Nov 2015 10:49:33 +0000 (11:49 +0100)]
radeonsi: clean up small duplication in si_shader_gs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>