platform/upstream/mesa.git
8 years agonir/inline: Don't use foreach_instr_safe unless we need to
Jason Ekstrand [Fri, 27 May 2016 16:25:51 +0000 (09:25 -0700)]
nir/inline: Don't use foreach_instr_safe unless we need to

Suggested-by: Connor Abbott <cwabbott0@gmail.com>
8 years agogallivm: eliminate a unnecessary AND with unorm lerps
Roland Scheidegger [Thu, 12 May 2016 23:44:39 +0000 (01:44 +0200)]
gallivm: eliminate a unnecessary AND with unorm lerps

Instead of doing an add and then masking out the upper bits, we can
simply do an add with a half-wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.
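
A scalar sketch of the idea, for illustration only: the real code builds LLVM vector IR, and the function names below are made up for this example. It compares adding in a 16-bit lane and masking against adding in an 8-bit type, where wraparound leaves the upper bits of the wide lane at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Old approach: add in the wide (16-bit) lane, then AND away the upper bits. */
static uint16_t add_then_mask(uint16_t a, uint16_t b)
{
   return (uint16_t)((a + b) & 0xff);
}

/* New approach: do the add in the half-wide (8-bit) type. The sum wraps
 * modulo 256, so widening it back yields zero upper bits without an AND. */
static uint16_t add_half_wide(uint16_t a, uint16_t b)
{
   uint8_t sum = (uint8_t)((uint8_t)a + (uint8_t)b);
   return sum;
}

/* Check that both variants agree over a range of lane values. */
static void check_variants_agree(void)
{
   for (uint32_t a = 0; a < 1024; a++)
      for (uint32_t b = 0; b < 1024; b++)
         assert(add_then_mask((uint16_t)a, (uint16_t)b) ==
                add_half_wide((uint16_t)a, (uint16_t)b));
}
```

Both variants depend only on the low 8 bits of each operand, which is why the explicit mask is redundant once the add itself is narrow.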

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
8 years agogallium/util: use enum pipe_prim_type instead of unsigned some more
Roland Scheidegger [Fri, 27 May 2016 16:49:44 +0000 (18:49 +0200)]
gallium/util: use enum pipe_prim_type instead of unsigned some more

There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: remove unneeded casts in get_query_result_vgpu9() calls
Brian Paul [Fri, 27 May 2016 00:58:16 +0000 (18:58 -0600)]
svga: remove unneeded casts in get_query_result_vgpu9() calls

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: use MAYBE_UNUSED to silence release-build warnings
Brian Paul [Fri, 27 May 2016 00:57:51 +0000 (18:57 -0600)]
svga: use MAYBE_UNUSED to silence release-build warnings

Signed-off-by: Brian Paul <brianp@vmware.com>
8 years agoisl: Fix some tautological-compare warnings
Ben Widawsky [Fri, 27 May 2016 04:59:17 +0000 (21:59 -0700)]
isl: Fix some tautological-compare warnings

Fixes:
isl.c:62:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
    assert(ISL_DEV_GEN(dev) == dev->info->gen);
                      ^~
isl.c:63:33: warning: self-comparison always evaluates to true [-Wtautological-compare]
    assert(ISL_DEV_USE_SEPARATE_STENCIL(dev) == dev->use_separate_stencil);

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agomesa: add support for GLSL ES 3.20 version string
Ilia Mirkin [Thu, 26 May 2016 17:58:42 +0000 (13:58 -0400)]
mesa: add support for GLSL ES 3.20 version string

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agomapi: expose new functions in GL ES 3.2
Ilia Mirkin [Thu, 26 May 2016 17:58:41 +0000 (13:58 -0400)]
mapi: expose new functions in GL ES 3.2

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agonvc0/ir: handle a load's reg result not being used for locked variants
Ilia Mirkin [Thu, 26 May 2016 02:41:06 +0000 (22:41 -0400)]
nvc0/ir: handle a load's reg result not being used for locked variants

For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agonvc0/ir: avoid generating illegal instructions for compute constbuf loads
Ilia Mirkin [Thu, 26 May 2016 01:54:39 +0000 (21:54 -0400)]
nvc0/ir: avoid generating illegal instructions for compute constbuf loads

For user-supplied constbufs, fileIndex is 0. In that case, when we
subtract 1, we'll end up loading from constbuf offset -16. This is
illegal, and there are asserts to avoid it. Normally we'd just DCE it,
but no point in generating the instructions if they're not going to be
used.
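
The arithmetic can be sketched as follows. This is a minimal illustration, not the driver code: the helper names are hypothetical, and it assumes 16 bytes per slot with slot N-1 describing constbuf N, as the "subtract 1" and the offset of -16 suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: byte offset of the driver-side slot for a constbuf,
 * assuming 16 bytes per slot and that fileIndex 0 has no slot. */
static int32_t slot_offset(int32_t file_index)
{
   return (file_index - 1) * 16;
}

/* Hypothetical guard: only emit the slot load when the offset is legal,
 * i.e. skip it entirely for user-supplied constbufs (fileIndex 0). */
static bool needs_slot_load(int32_t file_index)
{
   return file_index > 0;
}
```

With `file_index == 0` the computed offset is -16, which is the illegal load the message describes; the guard avoids generating it in the first place rather than relying on DCE.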

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agogallium/util: fix build break
Rob Clark [Fri, 27 May 2016 00:59:08 +0000 (20:59 -0400)]
gallium/util: fix build break

Missing #include caused build breaks after 21a3fb9cd.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agonir/spirv: Allow pointless variable decorations on inputs
Jason Ekstrand [Fri, 27 May 2016 00:06:17 +0000 (17:06 -0700)]
nir/spirv: Allow pointless variable decorations on inputs

SPIR-V specifies that a bunch of stuff gets applied to types.  This means
that a local variable could get, for instance, an array stride.  Just
because it's pointless doesn't mean you'll never see it.

8 years agogallium/util: use enum pipe_prim_type in u_prim.h functions
Brian Paul [Thu, 26 May 2016 20:50:13 +0000 (14:50 -0600)]
gallium/util: use enum pipe_prim_type in u_prim.h functions

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices: move duplicated assignments out of switch cases
Brian Paul [Thu, 26 May 2016 15:50:24 +0000 (09:50 -0600)]
util/indices: move duplicated assignments out of switch cases

Spotted by Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agogallium: change pipe_draw_info::mode to be pipe_prim_type
Brian Paul [Thu, 26 May 2016 13:17:50 +0000 (07:17 -0600)]
gallium: change pipe_draw_info::mode to be pipe_prim_type

Makes debugging with gdb a little nicer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices,svga: s/unsigned/enum pipe_prim_type/
Brian Paul [Thu, 26 May 2016 13:12:59 +0000 (07:12 -0600)]
util/indices,svga: s/unsigned/enum pipe_prim_type/

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil: s/unsigned/enum pipe_resource_usage/ for buffer usage variables
Brian Paul [Wed, 25 May 2016 23:13:56 +0000 (17:13 -0600)]
util: s/unsigned/enum pipe_resource_usage/ for buffer usage variables

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agosvga: s/unsigned/enum pipe_resource_usage/ for buffer usage variables
Brian Paul [Wed, 25 May 2016 23:13:23 +0000 (17:13 -0600)]
svga: s/unsigned/enum pipe_resource_usage/ for buffer usage variables

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agosvga: s/unsigned/enum pipe_prim_type/ for primitive type variables
Brian Paul [Wed, 25 May 2016 22:52:34 +0000 (16:52 -0600)]
svga: s/unsigned/enum pipe_prim_type/ for primitive type variables

Proper enum types were only added recently.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agosvga: fix test for unfilled triangles fallback
Brian Paul [Wed, 25 May 2016 18:42:55 +0000 (12:42 -0600)]
svga: fix test for unfilled triangles fallback

VGPU10 actually supports line-mode triangles.  We failed to make use of
that before.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: clean up and improve comments in svga_draw_private.h
Brian Paul [Wed, 25 May 2016 15:46:17 +0000 (09:46 -0600)]
svga: clean up and improve comments in svga_draw_private.h

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agoutil/indices: implement unfilled (tri->line) conversion for adjacency prims
Brian Paul [Wed, 25 May 2016 17:58:29 +0000 (11:58 -0600)]
util/indices: implement unfilled (tri->line) conversion for adjacency prims

Tested with new piglit gl-3.2-adj-prims test.

v2: re-order trisadj and tristripadj code, per Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices: implement provoking vertex conversion for adjacency primitives
Brian Paul [Wed, 25 May 2016 21:53:25 +0000 (15:53 -0600)]
util/indices: implement provoking vertex conversion for adjacency primitives

Tested with new piglit gl-3.2-adj-prims test.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices: assert that the incoming primitive is a triangle type
Brian Paul [Fri, 13 May 2016 22:49:22 +0000 (16:49 -0600)]
util/indices: assert that the incoming primitive is a triangle type

The unfilled index translator/generator functions should only be
called when the primitive mode is one of the triangle types.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices: formatting, whitespace fixes in u_unfilled_indices.c
Brian Paul [Fri, 13 May 2016 22:46:26 +0000 (16:46 -0600)]
util/indices: formatting, whitespace fixes in u_unfilled_indices.c

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agoutil/indices: improve comments in u_indices.h
Brian Paul [Fri, 13 May 2016 22:45:25 +0000 (16:45 -0600)]
util/indices: improve comments in u_indices.h

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agosvga: fix primitive mode (point/line/tri) test for unfilled primitives
Brian Paul [Mon, 9 May 2016 19:42:58 +0000 (13:42 -0600)]
svga: fix primitive mode (point/line/tri) test for unfilled primitives

The original mode test was valid before we had GS support.

Regression tested with full piglit run.  Though, I don't think we have
any piglit tests that exercise drawing unfilled adjacency primitives.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agoi965: Enable GL_OES_shader_io_blocks
Ian Romanick [Wed, 11 May 2016 20:11:00 +0000 (13:11 -0700)]
i965: Enable GL_OES_shader_io_blocks

Only one dEQP io_blocks test fails.  This test fails for the same reason
as the match_different_member_struct_names test in a previous commit.

dEQP-GLES31.functional.separate_shader.validation.io_blocks.match_different_member_struct_names

v2: Add to release notes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agoglsl: Allow shader interface blocks in GLSL ES
Ian Romanick [Thu, 12 May 2016 01:24:32 +0000 (18:24 -0700)]
glsl: Allow shader interface blocks in GLSL ES

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agoglsl: Add a has_shader_io_blocks helper
Ian Romanick [Wed, 11 May 2016 21:03:40 +0000 (14:03 -0700)]
glsl: Add a has_shader_io_blocks helper

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agomesa: Add extension tracking for GL_OES_shader_io_blocks
Ian Romanick [Wed, 11 May 2016 20:05:22 +0000 (13:05 -0700)]
mesa: Add extension tracking for GL_OES_shader_io_blocks

v2: Also support GL_EXT_shader_io_blocks.  It's pretty much identical to
the OES extension.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agomesa: Only validate SSO shader IO in OpenGL ES or debug context
Ian Romanick [Thu, 19 May 2016 17:27:12 +0000 (10:27 -0700)]
mesa: Only validate SSO shader IO in OpenGL ES or debug context

v2: Move later in series to avoid issues with Gallium drivers and debug
contexts.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
8 years agomesa: Remove old validate_io function
Ian Romanick [Fri, 20 May 2016 01:09:00 +0000 (18:09 -0700)]
mesa: Remove old validate_io function

The new validate_io catches all of the cases (and many more) that the
old function caught.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agomesa: Additional SSO validation using program_interface_query data
Ian Romanick [Thu, 19 May 2016 17:28:25 +0000 (10:28 -0700)]
mesa: Additional SSO validation using program_interface_query data

Fixes the following dEQP tests on SKL:

dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_smooth_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_implicit_explicit_location_1
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_element_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_none
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_order
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_centroid_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_length
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_precision
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_centroid
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_smooth
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_name

It regresses one test:

dEQP-GLES31.functional.separate_shader.validation.varying.match_different_struct_names

However, this test is based on language in the OpenGL ES 3.1 spec that I
believe is incorrect.  I have already submitted a spec bug:

https://www.khronos.org/bugzilla/show_bug.cgi?id=1500

v2: Move spec quote about built-in variables to the first place where
it's relevant.  Suggested by Alejandro.

v3: Move patch earlier in series, fix rebase issues.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v2]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> [v2]
8 years agomesa: Track the additional data in gl_shader_variable
Ian Romanick [Thu, 19 May 2016 17:25:47 +0000 (10:25 -0700)]
mesa: Track the additional data in gl_shader_variable

The interface type, interpolation mode, precision, the type of the
outermost structure, and whether or not the variable has an explicit
location will be used for SSO validation on OpenGL ES.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
8 years agonir: Make nir_const_value a union
Jason Ekstrand [Thu, 26 May 2016 22:38:45 +0000 (15:38 -0700)]
nir: Make nir_const_value a union

There's no good reason for it to be a struct of an anonymous union.
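
A simplified sketch of the two shapes, assuming C11 (for the anonymous union and static_assert). The type names and the reduced member list are placeholders for illustration, not the actual NIR definition.

```c
#include <assert.h>
#include <stdint.h>

/* Before: a struct whose only member is an anonymous union. */
typedef struct {
   union {
      float f32[4];
      int32_t i32[4];
      uint32_t u32[4];
   };
} const_value_struct;

/* After: the union itself, with the same members. */
typedef union {
   float f32[4];
   int32_t i32[4];
   uint32_t u32[4];
} const_value_union;

/* The wrapper struct adds no storage, only an extra layer of syntax. */
static_assert(sizeof(const_value_struct) == sizeof(const_value_union),
              "wrapper struct should not change the size");
```

Member access reads the same in both cases (`v.u32[0]`), so in this sketch the struct layer contributes nothing.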

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Use the buffer object size for VERTEX_BUFFER_STATE's size field.
Kenneth Graunke [Wed, 25 May 2016 21:38:32 +0000 (14:38 -0700)]
i965: Use the buffer object size for VERTEX_BUFFER_STATE's size field.

commit 7c8dfa78b98a12c1c5 (i965/draw: Use the real size for vertex
buffers) changed how we programmed the VERTEX_BUFFER_STATE size field.

Previously, we programmed it to the size of the actual underlying BO,
which is page-aligned, and potentially much larger than the GL buffer
object.  This violated the ARB_robust_buffer_access spec.

With that change, we started programming it based on the range of data
we expect the draw call to actually access - which is based on the
min_index and max_index information provided to glDrawRangeElements().

Unfortunately, applications often provide inaccurate range information
to glDrawRangeElements().  For example, all the Unreal demos appear to
draw using a range of [0, 3] when the index buffer's actual index range
is [0, 5].  Such results are undefined, and we are absolutely allowed
to restrict access to the range they specified.  However, the failure
mode is usually that nothing draws, or misrendering with wild geometry,
which is kind of bad for a common mistake.  And people tend to assume
the range information isn't that important when data is in VBOs.

There's no real advantage, either.  ARB_robust_buffer_access only
requires us to restrict access to the GL buffer object size, not
the range of data we think they should access.  Doing that allows
buggy applications to still function.  (Note that we still use this
information for busy-tracking, so if they try to overwrite the data
with glBufferSubData, they'll still hit a bug.)  This seems to be
safer.
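
The difference between the two size choices can be sketched like this. The helper names, the 16-byte stride and the 6-vertex buffer are assumptions for the example; it does not model the hardware's actual bounds-check behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Strict size: only the bytes needed for vertices 0..max_index, as
 * derived from the range passed to glDrawRangeElements(). */
static uint32_t size_from_range(uint32_t stride, uint32_t max_index)
{
   return (max_index + 1) * stride;
}

/* A vertex fetch is in bounds if the whole vertex fits inside the
 * size programmed into the vertex buffer state. */
static bool vertex_in_bounds(uint32_t buffer_size, uint32_t stride,
                             uint32_t index)
{
   return (uint64_t)index * stride + stride <= buffer_size;
}
```

For a 96-byte GL buffer object holding six 16-byte vertices, an application claiming the range [0, 3] yields a strict size of 64 bytes, so index 5 is rejected. Programming the buffer object size (96) keeps index 5 in bounds while still rejecting index 6.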

We may want to provide the more strict range as a debug option,
or scan the VBO and warn against bogus glDrawRangeElements in
debug contexts.  That can be done as a later patch, though.

Makes Unreal demos draw again.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonvc0: invalidate textures/samplers between 3D and CP on Fermi
Samuel Pitoiset [Thu, 26 May 2016 21:01:37 +0000 (23:01 +0200)]
nvc0: invalidate textures/samplers between 3D and CP on Fermi

Like constant buffers, samplers and textures are aliased on Fermi and
we need to invalidate the state when switching from 3D to CP and vice
versa.

This fixes rendering issues in the UE4 demos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agoanv: Stop linking against libmesa.la and libdri_test_stubs.la
Jason Ekstrand [Thu, 26 May 2016 01:20:40 +0000 (18:20 -0700)]
anv: Stop linking against libmesa.la and libdri_test_stubs.la

This brings the final size of an optimized non-debug build of the Vulkan
driver down to 2.9 MB as opposed to 8.7 MB for the dri driver.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965: Don't link libmesa or libdri_test_stubs into tests
Jason Ekstrand [Thu, 26 May 2016 00:51:59 +0000 (17:51 -0700)]
i965: Don't link libmesa or libdri_test_stubs into tests

Now that the compiler has been completely separated from libmesa, we no
longer need these.  We can make the tests much smaller by not linking them
in.  This also ensures that anyone who runs make check won't accidentally
put in any dependencies from the compiler to the rest of mesa core.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965: Move compiler debug functions to intel_screen.c
Jason Ekstrand [Thu, 26 May 2016 01:19:50 +0000 (18:19 -0700)]
i965: Move compiler debug functions to intel_screen.c

They reference the compiler so they shouldn't go in libi965_compiler.la.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965/test: Remove the fragment/vertex_program field from test visitors
Jason Ekstrand [Thu, 26 May 2016 00:41:59 +0000 (17:41 -0700)]
i965/test: Remove the fragment/vertex_program field from test visitors

None of them are actually using it.  It's a relic of an older compiler
interface that required a gl_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965: Move brw_new_shader to brw_link.cpp
Jason Ekstrand [Thu, 26 May 2016 00:46:07 +0000 (17:46 -0700)]
i965: Move brw_new_shader to brw_link.cpp

That's where brw_link_shader lives and they seem to go together.  Also,
this gets it out of libi965_compiler.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965: Move brw_nir_lower_uniforms.cpp to i965_FILES
Jason Ekstrand [Thu, 26 May 2016 00:29:38 +0000 (17:29 -0700)]
i965: Move brw_nir_lower_uniforms.cpp to i965_FILES

This gets it out of i965_compiler.la

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965: Move brw_create_nir to brw_program.c
Jason Ekstrand [Thu, 26 May 2016 00:27:23 +0000 (17:27 -0700)]
i965: Move brw_create_nir to brw_program.c

This way it's no longer part of libi965_compiler.la since it depends on
GLSL and ARB program stuff.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965/nir: Move the type_size_*_bytes functions to brw_nir.h
Jason Ekstrand [Thu, 26 May 2016 00:26:42 +0000 (17:26 -0700)]
i965/nir: Move the type_size_*_bytes functions to brw_nir.h

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoptn: Include nir.h
Jason Ekstrand [Thu, 26 May 2016 00:27:57 +0000 (17:27 -0700)]
ptn: Include nir.h

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agocompiler: Move glsl_to_nir to libglsl.la
Jason Ekstrand [Wed, 25 May 2016 23:00:38 +0000 (16:00 -0700)]
compiler: Move glsl_to_nir to libglsl.la

Right now libglsl.la depends on libnir.la so putting it in libnir.la
adds a dependency on libglsl.la that goes the wrong direction.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoi965/sklgt4: Implement depth/timestamp write w/a
Ben Widawsky [Thu, 26 May 2016 18:04:07 +0000 (11:04 -0700)]
i965/sklgt4: Implement depth/timestamp write w/a

The stated bug describes a scenario in which a post sync write operation for
depth or timestamp can be ignored. There are two workarounds suggested, the
first and easier is to simply do a cs stall when we do these types of writes.
The second option is to do a PIPE_CONTROL flush after the post sync but before
the data is required.

Generally, I believe the data written out is consumed by the application on the
CPU side and so doing the easier of the two is ideal. Furthermore, these queries
aren't tremendously common in the perf sensitive apps I have looked at. However,
there could be cases where a shader stage might directly consume the data, and
as a result option 2 may be desirable.

This patch goes with the easier solution for now.

gen9lp bug_de_id=2137196

By itself, this does *not* fix any of the GT4 hangs we're currently
experiencing.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agoi965/bxt: Add 2x6 variant
Ben Widawsky [Thu, 26 May 2016 15:08:29 +0000 (08:08 -0700)]
i965/bxt: Add 2x6 variant

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoradeonsi: Allow TES distribution between shader engines.
Bas Nieuwenhuizen [Tue, 12 Apr 2016 18:28:46 +0000 (20:28 +0200)]
radeonsi: Allow TES distribution between shader engines.

The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.

v2: Add distribution mode field enums.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Process multiple patches per threadgroup.
Bas Nieuwenhuizen [Mon, 2 May 2016 13:00:21 +0000 (15:00 +0200)]
radeonsi: Process multiple patches per threadgroup.

Using more than 1 wave per threadgroup does increase performance
generally.  Not using too many patches per threadgroup also
increases performance. Both catalyst and amdgpu-pro seem to
use 40 patches as their maximum, but I haven't really seen
any performance increase from limiting the number of patches
to 40 instead of 64.

Note that the trick where we overlap the input and output LDS
does not work anymore as the insertion of the tess factors
changes the patch stride.

v2: - Add comment about LDS assumptions.
    - Add constant for buffer size.
    - Fix code style.

v3: - Correct limits for not splitting patches between waves.
    - Set max num_patches to 40 as in the proprietary driver.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add barrier before writing the tess factors.
Bas Nieuwenhuizen [Thu, 26 May 2016 12:09:43 +0000 (14:09 +0200)]
radeonsi: Add barrier before writing the tess factors.

The factors may be stored to LDS by another invocation than
the invocation for vertex 0.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Enable dynamic HS.
Bas Nieuwenhuizen [Mon, 2 May 2016 12:59:43 +0000 (14:59 +0200)]
radeonsi: Enable dynamic HS.

This allows running the TES on different CU's than the
TCS which results in performance improvements.

v2: Only write the control word from one invocation.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Remove LDS layout user SGPR's from TES.
Bas Nieuwenhuizen [Mon, 9 May 2016 23:05:32 +0000 (01:05 +0200)]
radeonsi: Remove LDS layout user SGPR's from TES.

They are unused.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Use buffer loads and stores for passing data from TCS to TES.
Bas Nieuwenhuizen [Mon, 2 May 2016 12:55:52 +0000 (14:55 +0200)]
radeonsi: Use buffer loads and stores for passing data from TCS to TES.

We always try to use 4-component loads, as LLVM does not combine loads
and they bypass the L1 cache.

We can't use a similar strategy for stores and this is especially
notable with the tess factors, as they are often set with separate
MOV's per component in the TGSI.

We keep storing to LDS and the LDS space, so we can load the outputs
later, either due to the shader, or for writing the tess factors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Store inputs to memory when not using a TCS.
Bas Nieuwenhuizen [Tue, 3 May 2016 19:31:00 +0000 (21:31 +0200)]
radeonsi: Store inputs to memory when not using a TCS.

We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed function TCS over to monolithic, as
otherwise many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.

v2: Use u_bit_scan64.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add offchip buffer address calculation.
Bas Nieuwenhuizen [Mon, 9 May 2016 22:49:39 +0000 (00:49 +0200)]
radeonsi: Add offchip buffer address calculation.

Instead of creating a memory area per patch and per vertex, we put
the same attribute of every vertex & patch together. Most loads
and stores access the same attribute across all lanes, only for
different patches and vertices.

For the TCS this results in tightly packed data for 4-component
stores.

For the TES this is not the case as within a patch the loads
often also access the same vertex. However if there are < 4
vertices/patch, this still results in a reduction of the number
of cache lines. In the LDS situation we only do better than worst
case if the data per patch < 64 bytes, which due to the
tessellation factors is pretty much never.

We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both
VGPR offset and an index, so we have a problem storing
indirectly indexed outputs. This can be solved by temporarily
storing arrays in LDS and then copying them, but I don't think
that is worth the effort. The difference in VALU cycles
hardware swizzling gives is about 0.2% of total busy cycles.
That is without handling the array case.

I chose attributes instead of components as they are often
accessed together, and the software swizzling takes VALU cycles
for calculating offsets.
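
A minimal sketch of such an attribute-major layout for per-vertex data, assuming one 16-byte vec4 per attribute. The function and parameter names are hypothetical; the actual address calculation lives in the shader code generated by the driver and also has to handle per-patch data.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one attribute of one vertex in the offchip buffer.
 * All vertices of all patches are stored contiguously per attribute,
 * so lanes touching the same attribute hit neighbouring addresses. */
static uint32_t per_vertex_attr_offset(uint32_t attr, uint32_t rel_patch,
                                       uint32_t vertex,
                                       uint32_t vertices_per_patch,
                                       uint32_t num_patches)
{
   assert(rel_patch < num_patches);
   assert(vertex < vertices_per_patch);

   uint32_t total_vertices = vertices_per_patch * num_patches;
   uint32_t slot = attr * total_vertices +
                   rel_patch * vertices_per_patch + vertex;

   /* Multiply by 16 as late as possible (one vec4 is 16 bytes). */
   return slot * 16;
}
```

With 3 vertices per patch and 4 patches, consecutive vertices of attribute 0 land 16 bytes apart, and attribute 1 starts at 12 * 16 = 192 bytes.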

v2: - Rename functions to get_tcs_tes_buffer_address.
    - multiply by 16 as late as possible.
    - Use  tgsi_full_src_register_from_dst.
    - Remove some bad comments.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add user SGPR for the layout of the offchip buffer.
Bas Nieuwenhuizen [Mon, 9 May 2016 22:48:55 +0000 (00:48 +0200)]
radeonsi: Add user SGPR for the layout of the offchip buffer.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Use correct parameter index for LS_OUT_LAYOUT.
Bas Nieuwenhuizen [Sun, 1 May 2016 18:35:40 +0000 (20:35 +0200)]
radeonsi: Use correct parameter index for LS_OUT_LAYOUT.

This happens to be in the right position, but that changes
when TCS/TES get new parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add buffer load functions.
Bas Nieuwenhuizen [Mon, 2 May 2016 12:39:56 +0000 (14:39 +0200)]
radeonsi: Add buffer load functions.

v2: - Use llvm.amdgcn.buffer.load intrinsics for new LLVM.
    - Code style fixes.

v3: - Code style fix.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Define build_tbuffer_store_dwords earlier to support new users.
Bas Nieuwenhuizen [Mon, 2 May 2016 12:20:19 +0000 (14:20 +0200)]
radeonsi: Define build_tbuffer_store_dwords earlier to support new users.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add offchip tessellation parameters.
Bas Nieuwenhuizen [Mon, 2 May 2016 11:20:43 +0000 (13:20 +0200)]
radeonsi: Add offchip tessellation parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add buffer for offchip storage between TCS and TES.
Bas Nieuwenhuizen [Mon, 2 May 2016 07:54:11 +0000 (09:54 +0200)]
radeonsi: Add buffer for offchip storage between TCS and TES.

The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agotgsi: fix coverity out-of-bounds warning
Rob Clark [Thu, 26 May 2016 15:11:32 +0000 (11:11 -0400)]
tgsi: fix coverity out-of-bounds warning

CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agotgsi: fix out of bounds access
Rob Clark [Thu, 26 May 2016 14:22:33 +0000 (10:22 -0400)]
tgsi: fix out of bounds access

Not sure why coverity calls this an out-of-bounds read vs out-of-bounds
write.

CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN)9. overrun-local:
Overrunning array r of 3 16-byte elements at element index 3 (byte
offset 48) using index chan (which evaluates to 3).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoi965: Don't use fast copy blit in case of logical operations other than GL_COPY
Anuj Phogat [Wed, 25 May 2016 18:33:51 +0000 (11:33 -0700)]
i965: Don't use fast copy blit in case of logical operations other than GL_COPY

XY_FAST_COPY_BLT command doesn't have a field for raster operation. So, fall
back to using XY_SRC_COPY_BLT to handle those cases.

Fixes piglit test gl-1.1-xor-copypixels when fast copy blit is enabled
for all tiling formats.
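
The fallback described above can be sketched in C roughly as follows. This is
only an illustration: the enum and function names are hypothetical stand-ins
and are not the identifiers used in the i965 driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two blitter commands named above. */
enum sketch_blit_cmd {
   SKETCH_XY_SRC_COPY_BLT,
   SKETCH_XY_FAST_COPY_BLT,
};

/* Hypothetical subset of GL logic ops; only "copy" matters for the choice. */
enum sketch_logic_op {
   SKETCH_LOGIC_COPY,
   SKETCH_LOGIC_XOR,
   SKETCH_LOGIC_AND,
};

/* The fast copy command has no raster-operation field, so it can only be
 * used when the requested logic op is a plain copy.  Everything else takes
 * the XY_SRC_COPY_BLT route. */
static enum sketch_blit_cmd
sketch_choose_blit_cmd(bool fast_copy_allowed, enum sketch_logic_op op)
{
   if (fast_copy_allowed && op == SKETCH_LOGIC_COPY)
      return SKETCH_XY_FAST_COPY_BLT;

   return SKETCH_XY_SRC_COPY_BLT;
}
```

With this shape, an XOR CopyPixels request always lands on the command that
can express the raster operation, whatever the tiling format.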

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965/gen9: Remove the halign/valign field setup code in fast copy blit
Anuj Phogat [Sat, 12 Dec 2015 03:14:24 +0000 (19:14 -0800)]
i965/gen9: Remove the halign/valign field setup code in fast copy blit

Experimentation with different values of src/dst horizontal/vertical
alignment showed that these fields are not used on gen9 hardware.

A recent update in graphics specs has removed these fields from
XY_FAST_COPY_BLT command.

Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agonvc0: allow to monitor MP perf counters with compute shaders
Samuel Pitoiset [Wed, 25 May 2016 21:36:48 +0000 (23:36 +0200)]
nvc0: allow to monitor MP perf counters with compute shaders

To read out MP perf counters we use a compute shader and need to upload
input data like a 64-bit address used to store the values and a sequence
ID for synchronization. Currently, this input data is uploaded as user
uniforms, which means that it is stuck in c0[]; if a compute shader
from a real application is used, monitoring those performance counters
will just overwrite some data and crash miserably.

Instead, putting the 64-bit address and the sequence ID into the driver
constant buffer seems much better and will allow monitoring counters
with GL 4.3 apps.

Tested on GF119 and GK110, but should not hurt anything on GK104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agomesa: Move robustness code to main/robustness.c
Kristian Høgsberg Kristensen [Wed, 25 May 2016 22:29:41 +0000 (15:29 -0700)]
mesa: Move robustness code to main/robustness.c

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agodocs: Mark GL_KHR_robustness done for GLES3.2 as well
Kristian Høgsberg Kristensen [Wed, 25 May 2016 22:22:52 +0000 (15:22 -0700)]
docs: Mark GL_KHR_robustness done for GLES3.2 as well

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoegl: Additional attribute validation for eglCreatePbufferSurface
Plamena Manolova [Wed, 25 May 2016 16:29:55 +0000 (17:29 +0100)]
egl: Additional attribute validation for eglCreatePbufferSurface

eglCreatePbufferSurface should generate an EGL_BAD_MATCH error if:
1: The EGL_TEXTURE_FORMAT attribute is EGL_NO_TEXTURE and EGL_TEXTURE_TARGET
is something other than EGL_NO_TEXTURE
2: EGL_TEXTURE_FORMAT is something other than EGL_NO_TEXTURE and
EGL_TEXTURE_TARGET is EGL_NO_TEXTURE.

This fixes the dEQP-EGL.functional.negative_api.create_pbuffer_surface test.
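
The two conditions above boil down to "either both attributes are
EGL_NO_TEXTURE or neither is". A minimal C sketch of that rule follows. The
enum values are local stand-ins (real code would take the constants from
EGL/egl.h), and the function name is hypothetical.

```c
#include <stdbool.h>

/* Stand-in values for illustration only; they are not the EGL constants. */
enum sketch_tex_attr {
   SKETCH_NO_TEXTURE,
   SKETCH_TEXTURE_RGB,
   SKETCH_TEXTURE_RGBA,
   SKETCH_TEXTURE_2D,
};

/* Returns true when the format/target pair is consistent, false when the
 * caller should report EGL_BAD_MATCH. */
static bool
sketch_pbuffer_texture_attribs_ok(enum sketch_tex_attr format,
                                  enum sketch_tex_attr target)
{
   bool no_format = (format == SKETCH_NO_TEXTURE);
   bool no_target = (target == SKETCH_NO_TEXTURE);

   /* Both unset or both set is fine; a mix of the two is the error case. */
   return no_format == no_target;
}
```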

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agogallium/radeon: add the kernel version into the renderer string
Marek Olšák [Tue, 24 May 2016 23:00:53 +0000 (01:00 +0200)]
gallium/radeon: add the kernel version into the renderer string

Example:
Gallium 0.4 on AMD TONGA (DRM 3.2.0 / 4.5.0, LLVM 3.9.0)

My kernel version is pretty long already (4.5.0-amd-01025-g32791c1)
and adding "kernel" into the string would make it too long for glxinfo
to display.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agowinsys/amdgpu: add back multithreaded command submission
Marek Olšák [Tue, 8 Mar 2016 00:19:31 +0000 (01:19 +0100)]
winsys/amdgpu: add back multithreaded command submission

Ported from the initial amdgpu winsys from the private AMD branch.

The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.

3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.

v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
    add another amdgpu_cs_sync_flush call into amdgpu_bo_map

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/tgsi: use _mesa_roundevenf in micro_rnd
Lars Hamre [Thu, 19 May 2016 21:34:00 +0000 (15:34 -0600)]
gallium/tgsi: use _mesa_roundevenf in micro_rnd

Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-roundeven-float
fs-roundeven-vec2
fs-roundeven-vec3
fs-roundeven-vec4
vs-roundeven-float
vs-roundeven-vec2
vs-roundeven-vec3
vs-roundeven-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-roundeven-float
gs-roundeven-vec2
gs-roundeven-vec3
gs-roundeven-vec4
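
For reference, the roundEven tests listed above expect ties to go to the
nearest even integer (2.5 becomes 2, 3.5 becomes 4), which a round-half-up
formulation does not provide. The following standalone C sketch illustrates
those semantics. It is not the implementation of _mesa_roundevenf, and the
function name is hypothetical.

```c
/* Round to the nearest integer, with ties going to the even neighbour.
 * Standalone illustration only. */
static float
sketch_round_half_to_even(float x)
{
   /* Floats with magnitude >= 2^23 have no fractional bits, so they are
    * already integers.  Infinities and NaN also take this early return
    * because both comparisons fail for them. */
   if (!(x > -8388608.0f && x < 8388608.0f))
      return x;

   int i = (int)x;              /* truncates toward zero */
   float frac = x - (float)i;   /* exact for the range handled here */

   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      i += 1;                   /* above the tie, or tie with odd i */
   else if (frac < -0.5f || (frac == -0.5f && (i & 1)))
      i -= 1;                   /* same logic for negative inputs */

   return (float)i;
}
```

Note that this sketch does not preserve the sign of a negative zero result,
which is one of the details a production helper has to handle.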

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years ago.mailmap: use Jakob Bornecrantz's personal email
Emil Velikov [Thu, 26 May 2016 12:57:32 +0000 (13:57 +0100)]
.mailmap: use Jakob Bornecrantz's personal email

The VMware one is bouncing.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agonvc0: add note about where the viewport mask would go
Ilia Mirkin [Thu, 26 May 2016 04:02:57 +0000 (00:02 -0400)]
nvc0: add note about where the viewport mask would go

Not piping this all the way through yet, but no better place to note
this down. This can be used with NV_viewport_array2.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: enable 32 textures on kepler+
Ilia Mirkin [Sat, 21 May 2016 23:09:32 +0000 (19:09 -0400)]
nvc0: enable 32 textures on kepler+

For fermi, this likely will require use of linked tsc mode. However on
bindless architectures, we can have as many as we want. As it stands,
the AUX_TEX_INFO has 32 texture handles reserved.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agoglsl: add unit tests data vertex/expected outcome for uninitialized warning
Alejandro Piñeiro [Wed, 20 Apr 2016 08:02:45 +0000 (10:02 +0200)]
glsl: add unit tests data vertex/expected outcome for uninitialized warning

v2: fix 025 test. Add three more tests (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoglsl: add warning-test
Alejandro Piñeiro [Tue, 19 Apr 2016 19:03:07 +0000 (21:03 +0200)]
glsl: add warning-test

It executes compiler-glsl on all the available shaders, and it checks
that the outcome is as expected.

Bash code based on the already existing optimization-test

v2: rebasing: use --version option

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoglsl: add just-log option for the standalone compiler.
Alejandro Piñeiro [Tue, 19 Apr 2016 18:26:32 +0000 (20:26 +0200)]
glsl: add just-log option for the standalone compiler.

Add an option in order to ask to just print the InfoLog, without any
header or separator. Useful if we want to use the standalone compiler
to track only the warning/error messages.

v2: each printf goes on its own line (Ian Romanick)
v3: rebasing: move just_log to standalone.h/cpp

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoglsl: do not raise uninitialized warning with out function parameters
Alejandro Piñeiro [Tue, 19 Apr 2016 09:17:27 +0000 (11:17 +0200)]
glsl: do not raise uninitialized warning with out function parameters

It silences warnings for function parameters by default, as the
parameters need to be processed in order to have the actual and the
formal parameter, and the function signature. Then it raises the
warning if needed at verify_parameter_modes, where other in/out/inout mode
checks are done.

v2: fix comment style, multi-line condition style, simplify check,
    remove extra blank (Ian Romanick)
v3: inout function parameters can raise the warning too (Ian
    Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoglsl: add a empty set_is_lhs on ast_node
Alejandro Piñeiro [Tue, 19 Apr 2016 09:15:54 +0000 (11:15 +0200)]
glsl: add a empty set_is_lhs on ast_node

Just to allow calling set_is_lhs on any ast_node without a cast. Useful
when processing an ast_node list that we know contains ast_expression nodes.

v2: comment out new_value to avoid unused parameter warning (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoglsl: handle implicit sized arrays in ssbo
Dave Airlie [Wed, 25 May 2016 03:31:41 +0000 (13:31 +1000)]
glsl: handle implicit sized arrays in ssbo

The current code disallows unsized arrays except at the end of
an SSBO but it is a bit overzealous in doing so.

struct a {
int b[];
int f[4];
};

is valid as long as b is implicitly sized within the shader,
i.e. it is accessed only by integer indices.

I've submitted some piglit tests to test for this.

This also has no regressions on piglit on my Haswell.
This fixes:
GL45-CTS.shader_storage_buffer_object.basic-syntax
GL45-CTS.shader_storage_buffer_object.basic-syntaxSSO

This patch moves a chunk of the linker code down, so
that we don't link the uniform blocks until after we've
merged all the variables. The logic went something like:

Removing the checks for last ssbo member unsized from
the compiler and into the linker, meant doing the check
in the link_uniform_blocks code. However to do that the
array sizing had to happen first, so we knew that the
only unsized arrays were in the last block. But array
sizing required the variable to be merged, otherwise
you'd get two different array sizes in different
version of two variables, and one would get lost
when merged. So the solution was to move array sizing
up, after variable merging, but before uniform block
visiting.
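
The ordering argument above can be illustrated with a small C sketch of the
link-time check. Once implicit array sizing has run, any array that is still
unsized must be the last member. The struct and function names are
hypothetical and do not correspond to the linker's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one SSBO member after array sizing. */
struct sketch_member {
   bool is_array;
   unsigned length;   /* 0 means "still unsized after implicit sizing" */
};

/* Returns true if every array that is still unsized sits in the last
 * member slot.  Intended to run after implicit sizing, as described in
 * the message above. */
static bool
sketch_ssbo_layout_valid(const struct sketch_member *members, size_t count)
{
   assert(members != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      bool unsized = members[i].is_array && members[i].length == 0;

      if (unsized && i + 1 != count)
         return false;
   }

   return true;
}
```

For the struct shown earlier, `b` fails this check while it is still unsized
and passes once the shader's constant indices have given it a length.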

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl: fix error message on uniform block mismatch
Dave Airlie [Wed, 25 May 2016 21:42:16 +0000 (07:42 +1000)]
glsl: fix error message on uniform block mismatch

This looks like a cut-paste from above.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoglsl/ast: assign explicit_xfb_buffer from correct place
Dave Airlie [Wed, 25 May 2016 23:23:54 +0000 (09:23 +1000)]
glsl/ast: assign explicit_xfb_buffer from correct place

This fixes:
GL44-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through

The OUT_TC interface structures weren't matching because
one of them had explicit_xfb_buffer set when it shouldn't.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoswr: [rasterizer] Correctly select optimized primitive assembly.
Bruce Cherniak [Tue, 24 May 2016 20:00:17 +0000 (15:00 -0500)]
swr: [rasterizer] Correctly select optimized primitive assembly.

Indexed primitives were always using cut-aware primitive assembly,
whether primitive_restart was enabled or not.  Correctly pass down
primitive_restart and select optimized PA when possible.
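
A minimal C sketch of the selection described above follows. The names are
hypothetical, and the real rasterizer may take further conditions into
account when deciding whether the optimized path is possible.

```c
#include <stdbool.h>

/* Hypothetical labels for the two primitive assembly (PA) flavours. */
enum sketch_pa_kind {
   SKETCH_PA_OPTIMIZED,
   SKETCH_PA_CUT_AWARE,
};

/* Cut-aware assembly is only needed when an indexed draw can actually
 * contain restart indices, i.e. when primitive_restart is enabled. */
static enum sketch_pa_kind
sketch_select_pa(bool indexed, bool primitive_restart)
{
   if (indexed && primitive_restart)
      return SKETCH_PA_CUT_AWARE;

   return SKETCH_PA_OPTIMIZED;
}
```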

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agodocs: Mention i965/gen8+ supports GL 4.2 in release notes.
Kenneth Graunke [Wed, 25 May 2016 21:22:56 +0000 (14:22 -0700)]
docs: Mention i965/gen8+ supports GL 4.2 in release notes.

8 years agodocs: Update GL_OES_copy_image status.
Kenneth Graunke [Wed, 25 May 2016 21:22:30 +0000 (14:22 -0700)]
docs: Update GL_OES_copy_image status.

8 years agoi965: Enable OES_copy_image (and EXT) on Gen8+ and Baytrail.
Kenneth Graunke [Fri, 20 May 2016 04:44:59 +0000 (21:44 -0700)]
i965: Enable OES_copy_image (and EXT) on Gen8+ and Baytrail.

For now, only enable it on platforms that actually support ETC2.

At this point, Broadwell is only failing 5 (out of 8358) dEQP tests:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.
   srgb8_alpha8_r11f_g11f_b10f.renderbuffer_to_texture3d
   srgb8_alpha8_rgb10_a2ui.renderbuffer_to_cubemap
   srgb8_alpha8_rgb10_a2ui.renderbuffer_to_renderbuffer
   srgb8_alpha8_rgb10_a2.renderbuffer_to_texture2d
   srgb8_alpha8_rgb9_e5.renderbuffer_to_texture3d

These fail with all methods (meta, blorp, blitter, memcpy).

All are blacklisted from the Android mustpass list, which makes me
wonder whether there's an issue with the tests.  The formats in
question work with other targets, and the targets in question work
with other formats...

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Implement a BLORP path for CopyImage and prefer it over Meta.
Kenneth Graunke [Fri, 20 May 2016 04:13:29 +0000 (21:13 -0700)]
i965: Implement a BLORP path for CopyImage and prefer it over Meta.

We're dropping Meta in favor of BLORP everywhere we can.

This also fixes bugs when copying cubemaps to 2D, which is currently
broken in the meta pass.  BLORP just works.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94198
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Make the CopyImage BLT path bail for stencil images.
Kenneth Graunke [Fri, 20 May 2016 04:10:14 +0000 (21:10 -0700)]
i965: Make the CopyImage BLT path bail for stencil images.

The BLT can't handle S8 because it's W-tiled (at least without
additional funny business, and I'm not sure we care).  Disallow
it so it falls back to the CPU path, which works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Also copy stencil miptree data.
Kenneth Graunke [Fri, 20 May 2016 03:50:06 +0000 (20:50 -0700)]
i965: Also copy stencil miptree data.

The Meta path handles this, but the CPU/BLT fallbacks did not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Make a helper function for CopyImage of a miptree.
Kenneth Graunke [Fri, 20 May 2016 03:46:22 +0000 (20:46 -0700)]
i965: Make a helper function for CopyImage of a miptree.

Currently, it only contains the BLT/CPU fallbacks, so the name is a bit
too generic.  But eventually this will use BLORP as well, at which point
the name will make more sense.

The next patch will introduce a second call.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Combine src/dest tex vs. rb checks in intel_copy_image_sub_data.
Kenneth Graunke [Fri, 20 May 2016 03:29:04 +0000 (20:29 -0700)]
i965: Combine src/dest tex vs. rb checks in intel_copy_image_sub_data.

This simplifies things a little - now we only have one (tex or rb?)
if-ladder for src, and a second for dst, rather than four.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agoi965: Account for MinLayer in CopyImageSubData's blitter/CPU paths.
Kenneth Graunke [Fri, 20 May 2016 02:20:12 +0000 (19:20 -0700)]
i965: Account for MinLayer in CopyImageSubData's blitter/CPU paths.

Fixes Piglit's arb_copy_image-texview test with the Meta path disabled
(so we hit the blitter/CPU fallback paths).

v2: Add MinLayer even for cube maps (suggested by Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
8 years agofreedreno/ir3: cmdline compiler for glsl
Rob Clark [Sat, 14 May 2016 17:38:13 +0000 (13:38 -0400)]
freedreno/ir3: cmdline compiler for glsl

Use glsl/libstandalone.la to add support for taking glsl src files (in
addition to .tgsi) as input.  Then glsl->nir and feed the result into
the ir3 backend as normal.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoglsl: split out libstandalone
Rob Clark [Sat, 14 May 2016 15:59:26 +0000 (11:59 -0400)]
glsl: split out libstandalone

Split standalone glsl_compiler into a libstandalone.la and a thin
main.cpp.  This way drivers can re-use the glsl standalone frontend in
their own standalone compilers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoandroid: drop build of standalone glsl_compiler
Rob Clark [Wed, 25 May 2016 13:59:02 +0000 (09:59 -0400)]
android: drop build of standalone glsl_compiler

It's only a tool for debugging the glsl compiler, and should not be
installed.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agoi965: Mark fallthrough in switch statement.
Matt Turner [Tue, 24 May 2016 19:23:00 +0000 (12:23 -0700)]
i965: Mark fallthrough in switch statement.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>