Rob Clark [Fri, 3 Oct 2014 14:08:59 +0000 (10:08 -0400)]
freedreno/a3xx: handle VS only outputting BCOLOR
Possibly we should map the front color to black (zeroes). But not sure
there is a way to do that without generating a shader variant.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 3 Oct 2014 14:02:31 +0000 (10:02 -0400)]
freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappiness. They have an IN[], but once this is compiled the
useless TEX instruction goes away, leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
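A minimal sketch of the signed vs unsigned problem described above. The function names are hypothetical and not taken from the driver; only MAX2() and the max_reg=-1 case come from the commit text.

```c
/* MAX2 written out as a macro, as the commit text names it. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Mixed form. C converts the int operand to unsigned before comparing;
 * the cast below spells out that implicit conversion so the example
 * stays warning-free. -1 becomes UINT_MAX and always wins. */
unsigned max_reg_mixed(int vs_max_reg, unsigned fs_max_reg)
{
    return MAX2((unsigned)vs_max_reg, fs_max_reg);
}

/* Signed form. Both operands stay int, so -1 ("no registers used")
 * loses against any real register number. */
int max_reg_signed(int vs_max_reg, int fs_max_reg)
{
    return MAX2(vs_max_reg, fs_max_reg);
}
```

With a vertex shader max_reg of -1 and a fragment shader max_reg of 3, the mixed form yields UINT_MAX while the signed form yields 3.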
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Matt Turner [Fri, 3 Oct 2014 17:01:54 +0000 (10:01 -0700)]
i965/compaction: Disable compaction on SNB temporarily.
Will investigate after XDC.
Matt Turner [Fri, 3 Oct 2014 16:58:41 +0000 (09:58 -0700)]
Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."
This reverts commit 54e30dbf4db437748509d1319c3f6e4185f76c69.
Will investigate after XDC.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84557
Matt Turner [Wed, 1 Oct 2014 06:18:34 +0000 (23:18 -0700)]
i965/fs: Remove dead generate_rep_fb_write prototype.
Added in commit f9dc7aab.
Brian Paul [Thu, 2 Oct 2014 15:36:54 +0000 (09:36 -0600)]
mesa: fix spurious wglGetProcAddress / GL_INVALID_OPERATION error
On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run w/out any command line
arguments. Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.
The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.
Plus, improve the comments for _mesa_alloc_dispatch_table().
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Ilia Mirkin [Wed, 24 Sep 2014 21:42:03 +0000 (17:42 -0400)]
freedreno/ir3: add TXF support
Still failing a bunch of the fairly picky texelFetch tests, but the
1D(Array) ones are full passes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 27 Sep 2014 14:50:40 +0000 (10:50 -0400)]
freedreno/ir3: add TXD support and expose ARB_shader_texture_lod
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sat, 27 Sep 2014 06:52:42 +0000 (02:52 -0400)]
freedreno/ir3: add texture offset support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 1 Oct 2014 00:02:37 +0000 (20:02 -0400)]
freedreno/ir3: shadow comes before array
Experimentally, this makes *ArrayShadow tex-miplevel-selection tests
pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 28 Sep 2014 23:37:27 +0000 (19:37 -0400)]
freedreno/ir3: make TXQ return integers, not floats
We're still doing something wrong for array textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 1 Oct 2014 05:13:38 +0000 (01:13 -0400)]
freedreno/ir3: add UMAD support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 29 Sep 2014 02:00:34 +0000 (22:00 -0400)]
freedreno/ir3: add ISSG support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 1 Oct 2014 05:03:31 +0000 (01:03 -0400)]
freedreno/ir3: add MOD support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 29 Sep 2014 01:05:05 +0000 (21:05 -0400)]
freedreno/ir3: add UMOD support, based on UDIV
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 12 Sep 2014 03:15:11 +0000 (23:15 -0400)]
freedreno/ir3: add IDIV/UDIV support
Logic shamelessly copied from nv50 lowering pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Michel Dänzer [Thu, 2 Oct 2014 07:00:26 +0000 (16:00 +0900)]
radeonsi: Clear sampler view flags when binding a buffer
Fixes assertion failure while running the Unreal Engine 4 Elemental demo:
.../si_blit.c:322:si_decompress_color_textures: Assertion `tex->cmask.size || tex->fmask.size' failed.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 2 Oct 2014 21:14:48 +0000 (14:14 -0700)]
vc4: Add support for framebuffer sRGB encoding.
Eric Anholt [Thu, 2 Oct 2014 21:01:29 +0000 (14:01 -0700)]
vc4: Add support for sampling from sRGB.
This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want. I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
Ilia Mirkin [Wed, 1 Oct 2014 03:27:25 +0000 (23:27 -0400)]
freedreno/ir3: avoid fan-in sources referring to same instruction
Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 1 Oct 2014 15:28:17 +0000 (11:28 -0400)]
freedreno/a3xx: emit all immediates in one shot
Makes the command stream a bit tighter when there are lots of
immediates.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ilia Mirkin [Thu, 2 Oct 2014 07:39:05 +0000 (03:39 -0400)]
freedreno: instanced drawing/compute not yet supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Dave Airlie [Tue, 30 Sep 2014 23:22:13 +0000 (09:22 +1000)]
mesa: fix GetTexImage for 1D array depth textures
While running piglit in virgl, I hit an assert in the intel driver:
"qemu-system-x86_64: intel_tex.c:219: intel_map_texture_image: Assertion `tex_image->TexObject->Target != 0x8C18 || h == 1' failed."
Thanks to Eric and Ken for pointing me in the right direction.
Fix get_tex_depth to do the same fixup as get_tex_rgba does
for 1D array textures.
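A rough sketch of what such a fixup could look like, assuming (this is not stated in the commit text) that a 1D array texture keeps its layer count in the height field and that the fixup moves it to depth so each layer is read as a one-row slice. The struct and function are hypothetical, not the Mesa code.

```c
/* Hypothetical dimensions struct; not the Mesa data structure. */
struct tex_dims {
    int width;
    int height;
    int depth;
};

/* For a 1D array texture, treat the layer count (assumed to be stored
 * in height) as depth and force height to 1, which is the condition
 * the quoted assertion checks (h == 1). Other targets pass through. */
struct tex_dims fixup_1d_array(struct tex_dims d, int is_1d_array)
{
    if (is_1d_array) {
        d.depth = d.height;
        d.height = 1;
    }
    return d;
}
```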
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tomasz Figa [Sat, 27 Sep 2014 14:20:01 +0000 (16:20 +0200)]
st/mesa: Fix paths used in Android builds
With current makefiles the build fails because source and build paths
are generated incorrectly. With Android build system the top_srcdir and
top_builddir variables are undefined and all paths are relative to where
Android.mk is located. This ends up with paths like
external/mesa/src/mesa/src/mesa/ for both source and build paths, which
are obviously wrong.
This patch fixes this by overriding resulting SRCDIR and BUILDDIR
variables with empty string, so that paths end up being relative to
Android.mk file again. Appending correct build path to generated files
is already done in Android.gen.mk.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Tomasz Figa [Sat, 27 Sep 2014 14:20:00 +0000 (16:20 +0200)]
st/mesa: Generate format_info.c in Android builds
Current Android makefiles lack generation of format_info.c, which is
a dependency of main/format.c. This patch adds necessary code to
Android.gen.mk.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Tomasz Figa [Sat, 27 Sep 2014 14:19:59 +0000 (16:19 +0200)]
util: Include in Android builds
This patch fixes Android build failures by including src/util directory
in compilation. Files inside of this directory are compiled into
libmesa_util static library and linked with resulting libGLES_mesa.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Jason Ekstrand [Thu, 2 Oct 2014 23:04:57 +0000 (16:04 -0700)]
i965/fs: Use the correct base_mrf for spilling pairs in SIMD8
Before, we were hard-coding the base_mrf based on dispatch width, not on the
number of registers spilled at a time. This caused us to emit instructions
with a base_mrf of 14 and an mlen of 3, so we used the magical non-existent
m16 register. This fixes the problem.
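The arithmetic can be sketched as follows. This assumes message registers run from m0 to m15 and that a spill message is one header register plus the registers being spilled, which is consistent with the base_mrf 14 / mlen 3 / m16 numbers above; the helper names are hypothetical and not the i965 code.

```c
#include <assert.h>

/* Assumption for this sketch: message registers are m0..m15. */
enum { NUM_MRF = 16 };

/* Assumed message layout: one header register plus the spilled registers. */
int spill_mlen(int regs_spilled)
{
    return 1 + regs_spilled;
}

/* Pick the base so the last register of the message is the last MRF,
 * based on how many registers are spilled rather than dispatch width. */
int spill_base_mrf(int regs_spilled)
{
    assert(regs_spilled >= 1 && regs_spilled < NUM_MRF);
    return NUM_MRF - spill_mlen(regs_spilled);
}

/* A message fits if it does not run past the last MRF. */
int message_fits(int base_mrf, int mlen)
{
    return base_mrf + mlen <= NUM_MRF;
}
```

Spilling a pair gives mlen 3, so a base of 14 would need m14, m15 and m16, while a base of 13 stays within m13..m15.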
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 1 Oct 2014 17:54:59 +0000 (10:54 -0700)]
i965/fs: Add a MAX_GRF_SIZE define and use it in various places
Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 1 Oct 2014 17:46:48 +0000 (10:46 -0700)]
i965/fs: Use the actual register width in brw_reg_from_fs_reg
This fixes a bug where 1-wide operations don't properly translate down to
1-wide instructions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 1 Oct 2014 17:27:24 +0000 (10:27 -0700)]
i965/fs_fp: Use null_reg from fs_visitor instead of rolling our own
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84529
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rob Clark [Wed, 1 Oct 2014 19:26:26 +0000 (15:26 -0400)]
freedreno/a3xx: handle large shader program sizes
Above a certain limit use CACHE mode instead of BUFFER mode. This
should solve gpu hangs with large shader programs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 1 Oct 2014 18:57:34 +0000 (14:57 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ilia Mirkin [Wed, 1 Oct 2014 04:26:03 +0000 (00:26 -0400)]
freedreno: dual-source render targets are not supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ilia Mirkin [Wed, 1 Oct 2014 23:43:38 +0000 (19:43 -0400)]
gallium/hud: use u_sampler_view_default_template helper
The existing code was not setting several fields, most importantly the
target, which is required on nv50/nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Iago Toral Quiroga [Wed, 1 Oct 2014 10:12:38 +0000 (12:12 +0200)]
glsl: Fix memory leak in builtin_builder::_image_prototype.
in_var calls the ir_variable constructor, which dups the variable name.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tapani Pälli [Tue, 30 Sep 2014 07:28:26 +0000 (10:28 +0300)]
mesa: relax draw api validation on ES2
Patch fixes failing test in WebGL conformance test
'point-no-attributes' when running Chrome on OpenGL ES.
(Shader program may draw points using constant data in shader.)
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia Mirkin [Tue, 30 Sep 2014 04:12:40 +0000 (00:12 -0400)]
glsl: make consistent use of DECLARE_RALLOC_CXX_OPERATORS
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 1 Oct 2014 18:58:22 +0000 (11:58 -0700)]
vc4: Fix the mapping of the minification filter to HW values.
They're actually as documented in the HW specs and the GL mipmapping enums
order. Fixes fbo-generatemipmap-filtering, and some other tests where we
were off by a few bits due to unexpected linear filtering.
Eric Anholt [Wed, 1 Oct 2014 17:58:02 +0000 (10:58 -0700)]
vc4: Make the last static array in vc4_program.c dynamically sized.
Eric Anholt [Tue, 30 Sep 2014 23:10:09 +0000 (16:10 -0700)]
vc4: Fix some broken indentation.
Eric Anholt [Tue, 30 Sep 2014 23:08:23 +0000 (16:08 -0700)]
vc4: Add support for the FACE semantic.
Fixes glsl-fs-frontfacing.
Eric Anholt [Tue, 30 Sep 2014 21:19:25 +0000 (14:19 -0700)]
vc4: Add support for TGSI_OPCODE_CLAMP.
This will be used by the shared LIT lowering code.
Eric Anholt [Tue, 30 Sep 2014 23:26:51 +0000 (16:26 -0700)]
vc4: Fix compiler warning
Anuj Phogat [Wed, 1 Oct 2014 22:24:27 +0000 (15:24 -0700)]
meta: Fix make check failures in setup_glsl_msaa_blit_scaled_shader()
introduced by commit 68ee950.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Brian Paul [Wed, 1 Oct 2014 15:03:13 +0000 (09:03 -0600)]
mesa: fix _mesa_alloc_dispatch_table() declaration
Insert 'void' parameter to match declaration in api_exec.h. Trivial.
Roland Scheidegger [Wed, 1 Oct 2014 21:14:46 +0000 (23:14 +0200)]
meta: (trivial) remove accidental double semicolon
Anuj Phogat [Thu, 4 Sep 2014 20:49:04 +0000 (13:49 -0700)]
i965: Enable EXT_framebuffer_multisample_blit_scaled for gen8
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Anuj Phogat [Fri, 5 Sep 2014 19:19:22 +0000 (12:19 -0700)]
meta: Implement ext_framebuffer_multisample_blit_scaled extension
This extension enables doing a multisample buffer resolve and buffer
scaling using a single glBlitFramebuffer() call. Currently, we
have this extension implemented in BLORP, which is only used by
SNB and IVB. This patch implements the extension in the meta path,
which makes it available to Broadwell.
Implementation features:
- Supports scaled resolves of 2X, 4X and 8X multisample buffers.
- Avoids unnecessary shader compilations by storing the precompiled
shaders for each supported sample count.
- Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and
GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed
behavior in the extension's spec.
- I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT
filter. It made the edges in the image look a little smoother, but
the image gets blurred, causing no overall quality improvement.
For now I have dropped the idea of doing different filtering for
the nicest filter.
V2:
- Minor changes to simplify the fragment shader.
- Refactor the code to move i965 specific sample_map computation out
of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized
by the driver.
- Use a simple msaa resolve shader for scaled resolves with scaling
factor = 1.0.
V3:
- Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x
variables and use it in fragment shader.
V4:
- Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x
variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Anuj Phogat [Tue, 23 Sep 2014 18:58:02 +0000 (11:58 -0700)]
i965: Initialize the SampleMap{2,4,8}x variables
with values specific to Intel hardware.
V2: Define and use gen6_get_sample_map() function to initialize
the variables.
V3: Change the function name to gen6_set_sample_maps() and use
memcpy() to fill in the data.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Anuj Phogat [Tue, 23 Sep 2014 18:56:54 +0000 (11:56 -0700)]
mesa: Add new variables in gl_context to store sample layout
SampleMap{2,4,8}x variables are used in later patches to implement
EXT_framebuffer_multisample_blit_scaled extension.
V2: Use integer array instead of a string.
Bump up the comment.
V3: Use uint8_t type array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Leo Liu [Thu, 18 Sep 2014 16:21:58 +0000 (12:21 -0400)]
st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image
This patch implements the functions for image support,
which basically copy data between video
surfaces and user buffers. This in turn supports
SW decode and other video output.
v2: fix buffer size for odd-sized image case
expose I420 format as well
v3: fix YUV 4:2:2 format data buffer size
cleanup I420 format exposure
Signed-off-by: Leo Liu <leo.liu@amd.com>
Christian König [Thu, 18 Sep 2014 15:57:46 +0000 (11:57 -0400)]
st/va: implement Picture functions for mpeg2 h264 and vc1
This patch implements the codecs for mpeg2, h264 and vc1:
it populates the codec parameters and passes them to the HW driver.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Christian König [Fri, 4 Jul 2014 16:44:36 +0000 (12:44 -0400)]
st/va: implement Context Surface and Buffer
This patch implements context management and relates it to the HW driver,
along with functions for video surface management and functions for
application data memory buffer management.
implemented functions:
vlVa(Create|Destroy)Context
vlVa(Create|Destroy|Put)Surfaces
vlVa(Create|Destroy)Buffer
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Christian König [Tue, 28 May 2013 16:02:58 +0000 (18:02 +0200)]
st/va: implement vlVa(Create|Destroy|Query|Get)Config
This patch lets applications query the configuration,
such as profiles, entrypoints, and attributes.
v2: fix missing profile with query
Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Christian König [Wed, 2 Jul 2014 20:35:28 +0000 (16:35 -0400)]
st/va: skeleton VAAPI state tracker
This patch adds a skeleton VA-API state tracker,
which is filled with life in the subsequent patches.
v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
detect libva version, and correctly set driver entrypoint name.
rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
add back targets/va/va.sym
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Mon, 22 Sep 2014 18:07:13 +0000 (14:07 -0400)]
st/vdpau: move common functions to util
Break out these functions so that they can be shared with other
state trackers. They will be used in subsequent patches for the new
VA-API state tracker.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Rob Clark [Wed, 1 Oct 2014 11:26:39 +0000 (07:26 -0400)]
freedreno: max-texture-lod-bias should be 15.0f
Fixes piglit lodbias test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kenneth Graunke [Fri, 26 Sep 2014 22:13:30 +0000 (15:13 -0700)]
mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
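The idea can be sketched as an early return when the incoming viewport matches the stored one. The struct layout, the flag value and the function name below are hypothetical stand-ins, not the core Mesa code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the _NEW_VIEWPORT dirty bit. */
#define NEW_VIEWPORT 0x1u

struct viewport {
    float x, y, width, height;
};

struct context {
    struct viewport vp;
    unsigned new_state; /* dirty bits consumed by the driver */
};

/* Store the viewport and flag it dirty only when something changed.
 * Returns true if the state was actually updated. */
bool set_viewport(struct context *ctx, struct viewport vp)
{
    if (ctx->vp.x == vp.x && ctx->vp.y == vp.y &&
        ctx->vp.width == vp.width && ctx->vp.height == vp.height)
        return false; /* redundant update: leave dirty bits alone */

    ctx->vp = vp;
    ctx->new_state |= NEW_VIEWPORT;
    return true;
}
```

A driver that re-uploads viewport state whenever the bit is set then skips the upload for every redundant call.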
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 26 Sep 2014 22:13:29 +0000 (15:13 -0700)]
i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 26 Sep 2014 18:12:34 +0000 (11:12 -0700)]
i965: Drop brwBindProgram driver hook.
When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.
However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 17:13:02 +0000 (10:13 -0700)]
i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.
I had to dig a bit to figure out why this was necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 08:44:51 +0000 (01:44 -0700)]
i965: Use "1ull" instead of "1" in BRW_NEW_* defines.
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.
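A small sketch of why the suffix matters once the bitfield is 64 bits wide. The helper name is hypothetical; the real defines are plain macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the dirty-flag bit for a given index.
 * 1ull keeps the shift in 64-bit arithmetic. With a plain int 1, a
 * shift by 32 or more is undefined behavior when int is 32 bits wide. */
uint64_t dirty_bit(unsigned index)
{
    assert(index < 64);
    return 1ull << index;
}
```

Indices up to 31 give the same values either way, which is why a plain 1 works while there are only 32 entries.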
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 22:50:14 +0000 (15:50 -0700)]
i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.
~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64.
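A sketch of the difference, written with an unsigned 32-bit all-ones constant (~0u) so that the widening is zero-extending. That choice of constant and both function names are assumptions made for illustration only.

```c
#include <stdint.h>

/* A 32-bit all-ones value widens to 0x00000000FFFFFFFF, so the upper
 * 32 dirty bits stay clear. */
uint64_t flag_all_32(uint64_t dirty)
{
    return dirty | ~0u;
}

/* ~0ull is all ones across the full 64 bits. */
uint64_t flag_all_64(uint64_t dirty)
{
    return dirty | ~0ull;
}
```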
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 17:29:25 +0000 (10:29 -0700)]
i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 08:27:11 +0000 (01:27 -0700)]
i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.
Unused since krh rewrote fast clears to use meta.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Tue, 23 Sep 2014 09:27:24 +0000 (21:27 +1200)]
i965: Fix typo in comment
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Sun, 28 Sep 2014 04:07:37 +0000 (17:07 +1300)]
i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHM
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Vinson Lee [Wed, 1 Oct 2014 04:52:13 +0000 (21:52 -0700)]
llvmpipe: Add missing LLVMGetGlobalContext() arg in lp_test_format.c.
Fix build error introduced with commit eedbce9c63a3f385908bdc8a69e8be98dd3522ff.
lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
gallivm = gallivm_create("test_module_unorm8");
^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
gallivm_create(const char *name, LLVMContextRef context);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Keith Packard [Wed, 1 Oct 2014 03:03:29 +0000 (20:03 -0700)]
glx/dri3: Provide error diagnostics when DRI3 allocation fails
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
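The pattern is roughly the one below: check the allocation result and report errno instead of carrying on with a bad handle. The allocator type and the two stub allocators are hypothetical; only the message text is taken from the log above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allocator type: returns a file descriptor, or -1 with
 * errno set on failure. */
typedef int (*alloc_fd_fn)(void);

/* Stub that fails the way a sandboxed process might. */
int alloc_denied(void)
{
    errno = EPERM;
    return -1;
}

/* Stub that succeeds with an arbitrary descriptor number. */
int alloc_ok(void)
{
    return 3;
}

/* Returns the fd, or -1 after printing a diagnostic, so the caller can
 * fail cleanly rather than dereference an unallocated buffer later. */
int alloc_fence_checked(alloc_fd_fn alloc)
{
    int fd = alloc();
    if (fd < 0) {
        fprintf(stderr, "DRI3 Fence object allocation failure %s\n",
                strerror(errno));
        return -1;
    }
    return fd;
}
```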
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Keith Packard [Wed, 2 Jul 2014 20:26:22 +0000 (13:26 -0700)]
glx/dri3: Use four buffers until X driver supports async flips
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.
When vblank_mode=0, if there are only three buffers we do:
    current scanout buffer = 0 at MSC 0
    Render frame 1 to buffer 1
    PresentPixmap for buffer 1 at MSC 1
        This is sitting down in the kernel waiting for vblank to
        become the next scanout buffer
    Render frame 2 to buffer 2
    PresentPixmap for buffer 2 at MSC 1
        This cannot be displayed at MSC 1 because the
        kernel doesn't have any way to replace buffer 1 as the pending
        scanout buffer. So, best case this will get displayed at MSC 2.
Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.
With four buffers, we get:
    current scanout buffer = 0 at MSC 0
    Render frame 1 to buffer 1
    PresentPixmap for buffer 1 at MSC 1
        This is sitting down in the kernel waiting for vblank to
        become the next scanout buffer
    Render frame 2 to buffer 2
    PresentPixmap for buffer 2 at MSC 1
        This cannot be displayed at MSC 1 because the
        kernel doesn't have any way to replace buffer 1 as the pending
        scanout buffer. So, best case this will get displayed at MSC
        2. The X server will queue this swap until buffer 1 becomes
        the scanout buffer.
    Render frame 3 to buffer 3
    PresentPixmap for buffer 3 at MSC 1
        As soon as the X server sees this, it will replace the pending
        buffer 2 swap with this swap and release buffer 2 back to the
        application
    Render frame 4 to buffer 2
    PresentPixmap for buffer 2 at MSC 1
        Now we're in a steady state, flipping between buffer 2 and 3
        waiting for one of them to be queued to the kernel.
    ...
    current scanout buffer = 1 at MSC 1
        Now buffer 0 is free and (e.g.) buffer 2 is queued in
        the kernel to be the scanout buffer at MSC 2
    Render frames, flipping between buffer 0 and 3
When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:
    current scanout buffer = 0 at MSC 0
    Render frame 1 to buffer 1
    PresentPixmap for buffer 1 at MSC 1
        This is sitting waiting for vblank to become the next scanout
        buffer
    Render frame 2 to buffer 2
    PresentPixmap for buffer 2 at MSC 1
        Queue this for display at MSC 1. There are three possible results:
        1) We're still before MSC 1. Buffer 1 is released,
           buffer 2 is queued waiting for MSC 1.
        2) We're now after MSC 1. Buffer 0 was released at MSC 1.
           Buffer 1 is the current scanout buffer.
           a) If the user asked for a tearing update, we swap
              scanout from buffer 1 to buffer 2 and release buffer
              1.
           b) If the user asked for non-tearing update, we
              queue buffer 2 for the MSC 2.
        In all three cases, we have a buffer released (call it 'n'),
        ready to receive the next frame.
    Render frame 3 to buffer n
    PresentPixmap for buffer n
        If we're still before MSC 1, then we'll ask to present at MSC
        1. Otherwise, we'll ask to present at MSC 2.
Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.
I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.
That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
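The buffer counts in the walkthroughs can be summed up in a toy model. This is only a sketch of the reasoning in this message; the function names are hypothetical and the real code simply keys the buffer count on async flip support.

```c
#include <stdbool.h>

/* Buffers the application cannot render into at the worst point. */
int pinned_buffers(bool async_flips)
{
    int scanout = 1;           /* currently displayed */
    int pending_in_kernel = 1; /* flip queued, waiting for vblank */

    /* Without async flips the queued flip cannot be replaced, so the
     * most recently presented frame is held as well, waiting behind it. */
    int held_behind_pending = async_flips ? 0 : 1;

    return scanout + pending_in_kernel + held_behind_pending;
}

/* One more buffer is needed so rendering never has to block. */
int buffers_needed(bool async_flips)
{
    return pinned_buffers(async_flips) + 1;
}
```

This gives four buffers without async flips and three with them, matching the two cases described above.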
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Jason Ekstrand [Wed, 1 Oct 2014 00:27:33 +0000 (17:27 -0700)]
i965/fs: Fix the build
Jason Ekstrand [Tue, 30 Sep 2014 22:41:24 +0000 (15:41 -0700)]
i965/fs: Fix an uninitialized value warning
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Roland Scheidegger [Tue, 30 Sep 2014 17:25:09 +0000 (19:25 +0200)]
galahad: fix indirect draw
Need to unwrap the indirect resource otherwise bad things will happen.
Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 30 Sep 2014 17:23:37 +0000 (19:23 +0200)]
galahad: (trivial) handle cubemap arrays
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Matt Turner [Tue, 30 Sep 2014 22:06:23 +0000 (15:06 -0700)]
i965/fs: Emit compressed BFI2 instructions on Gen > 7.
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
Matt Turner [Sat, 30 Aug 2014 03:26:11 +0000 (20:26 -0700)]
i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.
These checks were intended for Gen 7 only. None of these restrictions
apply to Gen 8.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sun, 28 Sep 2014 00:34:51 +0000 (17:34 -0700)]
i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:56 +0000 (10:34 -0700)]
i965/fs: Optimize sqrt+inv into rsq.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.
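A toy version of the transform, on a hypothetical three-field instruction struct rather than the i965 IR. It assumes the pass only looks at the immediately preceding instruction, which is an assumption of this sketch and not something the message states.

```c
#include <stddef.h>

enum opcode { OP_SQRT, OP_RCP, OP_RSQ };

/* Hypothetical instruction: one destination and one source register. */
struct inst {
    enum opcode op;
    int dst;
    int src;
};

/* Rewrite "sqrt a, b ; rcp c, a" into "sqrt a, b ; rsq c, b".
 * The sqrt is left in place since its result may still be used.
 * Returns the number of RCP instructions rewritten. */
int opt_rcp_of_sqrt(struct inst *prog, size_t n)
{
    int progress = 0;

    for (size_t i = 1; i < n; i++) {
        struct inst *prev = &prog[i - 1];
        struct inst *cur = &prog[i];

        /* Skip "sqrt a, a": the original source is overwritten there. */
        if (cur->op == OP_RCP && prev->op == OP_SQRT &&
            cur->src == prev->dst && prev->dst != prev->src) {
            cur->op = OP_RSQ;
            cur->src = prev->src; /* rsq no longer depends on the sqrt */
            progress++;
        }
    }
    return progress;
}
```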
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:27 +0000 (10:34 -0700)]
i965/vec4: Optimize sqrt+inv into rsq.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.
Occasionally the sqrt's result is no longer used, leading to:
instructions in affected programs: 5005 -> 4949 (-1.12%)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:07 +0000 (10:34 -0700)]
i965/vec4: Call opt_algebraic after opt_cse.
The next patch adds an algebraic optimization for the pattern
   sqrt a, b
   rcp c, a
and turns it into
   sqrt a, b
   rsq c, b
but many vertex shaders do
   a = sqrt(b);
   var1 /= a;
   var2 /= a;
which generates
   sqrt a, b
   rcp c, a
   rcp d, a
If we apply the algebraic optimization before CSE, we'll end up with
   sqrt a, b
   rsq c, b
   rcp d, a
Running CSE first combines the two RCP instructions into one, so the
leftover RCP never occurs.
No shader-db changes.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
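The effect of pass ordering in "i965/vec4: Call opt_algebraic after opt_cse." can be reproduced with a toy model. Everything below is a sketch under stated assumptions: Inst, opt_cse, opt_algebraic and count_op are hypothetical stand-ins, each register is assumed to be written once, the toy CSE replaces a repeated computation with a mov, and the algebraic pass is assumed to look only at the immediately preceding instruction.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Inst {
    std::string op, dst, src;   // hypothetical "op dst, src"
};

// Toy CSE: a repeated (op, src) pair becomes a copy of the first result.
// Assumes every register is written once, so nothing needs invalidating.
void opt_cse(std::vector<Inst> &prog)
{
    std::map<std::pair<std::string, std::string>, std::string> seen;
    for (Inst &inst : prog) {
        if (inst.op == "mov")
            continue;
        const auto key = std::make_pair(inst.op, inst.src);
        const auto it = seen.find(key);
        if (it != seen.end()) {
            inst.op = "mov";
            inst.src = it->second;
        } else {
            seen.emplace(key, inst.dst);
        }
    }
}

// Toy algebraic pass: "sqrt a, b; rcp c, a" becomes "sqrt a, b; rsq c, b".
// Only the instruction directly after the sqrt is considered (assumption).
void opt_algebraic(std::vector<Inst> &prog)
{
    for (std::size_t i = 1; i < prog.size(); ++i) {
        const Inst &prev = prog[i - 1];
        Inst &inst = prog[i];
        if (inst.op == "rcp" && prev.op == "sqrt" &&
            inst.src == prev.dst && prev.dst != prev.src) {
            inst.op = "rsq";
            inst.src = prev.src;
        }
    }
}

// Counts how many instructions use the given opcode.
int count_op(const std::vector<Inst> &prog, const std::string &op)
{
    int n = 0;
    for (const Inst &inst : prog)
        n += (inst.op == op);
    return n;
}
```

Feeding "sqrt a, b; rcp c, a; rcp d, a" through opt_algebraic and then opt_cse leaves one rcp behind in this model, while opt_cse followed by opt_algebraic leaves none.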
Matt Turner [Thu, 4 Sep 2014 20:25:15 +0000 (13:25 -0700)]
i965/fs: Extend predicated break pass to predicate WHILE.
Helps a handful of programs in Serious Sam 3 that use do-while loops.
instructions in affected programs: 16114 -> 16075 (-0.24%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mathias Fröhlich [Tue, 30 Sep 2014 20:11:30 +0000 (22:11 +0200)]
gallivm: Fix build for LLVM 3.2
Do not rely on LLVMMCJITMemoryManagerRef being available.
The C binding for the memory manager objects first appeared
in LLVM 3.4.
The change is based on an initial patch by Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Rob Clark [Tue, 30 Sep 2014 17:47:58 +0000 (13:47 -0400)]
freedreno: destroy transfer pool after blitter
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 30 Sep 2014 20:53:24 +0000 (16:53 -0400)]
freedreno/lowering: fix token calculation for lowering
Indirect registers consume an additional token. Try to clean up the
token calculation math a bit, and fix it at the same time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
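As a rough illustration of the accounting in "freedreno/lowering: fix token calculation for lowering", the sketch below counts tokens for one instruction. It is not the freedreno code: RegDesc, reg_tokens and inst_tokens are hypothetical names, and the layout (one header token per instruction, one token per register operand, plus one more for an indirect register) is an assumption that ignores other optional tokens.

```cpp
#include <vector>

// Hypothetical operand description: only the property that matters here.
struct RegDesc {
    bool indirect;
};

// One token for the register itself, one more if it is indirectly addressed.
unsigned reg_tokens(const RegDesc &reg)
{
    return reg.indirect ? 2u : 1u;
}

// Assumed layout: one instruction header token followed by the operands.
unsigned inst_tokens(const std::vector<RegDesc> &dsts,
                     const std::vector<RegDesc> &srcs)
{
    unsigned n = 1;
    for (const RegDesc &d : dsts)
        n += reg_tokens(d);
    for (const RegDesc &s : srcs)
        n += reg_tokens(s);
    return n;
}
```

Under these assumptions, an instruction with one direct destination, one direct source and one indirect source needs five tokens, one more than it would with two direct sources.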
Ian Romanick [Sat, 24 May 2014 03:03:31 +0000 (20:03 -0700)]
i965/fs: Don't make a name for a vector splitting temporary
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 02:04:52 +0000 (19:04 -0700)]
glsl: Don't make a name for the function return variable
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 02:03:52 +0000 (19:03 -0700)]
glsl: Don't allocate a name for ir_var_temporary variables
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 74  40,578,719,715   67,762,208      62,263,404      5,498,804          0
After (32-bit):  52  40,565,579,466   66,359,800      61,187,818      5,171,982          0
Before (64-bit): 74  37,129,541,061   95,195,160      87,369,671      7,825,489          0
After (64-bit):  76  37,134,691,404   93,271,352      85,900,223      7,371,129          0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 23:57:03 +0000 (16:57 -0700)]
glsl: Use ir_var_temporary for compiler generated temporaries
These few places were using ir_var_auto for seemingly no reason. The
names were not added to the symbol table.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 01:55:27 +0000 (18:55 -0700)]
glsl: Add context-level controls for whether temporaries have real names
No change in Valgrind massif results for a trimmed apitrace of dota2.
v2: Minor rebase on _mesa_init_constants changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 01:53:09 +0000 (18:53 -0700)]
glsl: Never put ir_var_temporary variables in the symbol table
Later patches will give every ir_var_temporary the same name in release
builds. Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Tue, 8 Jul 2014 23:57:33 +0000 (16:57 -0700)]
glsl: Add the possibility for ir_variable to have a non-ralloced name
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
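The idea in "glsl: Add the possibility for ir_variable to have a non-ralloced name" can be sketched as follows. This is a simplified stand-in rather than the actual ir_variable: the Variable class and its members are hypothetical, and new[]/delete[] take the place of ralloc. A NULL name makes the object point at one shared string in static storage, so temporaries never allocate a name.

```cpp
#include <cstddef>
#include <cstring>

class Variable {
public:
    // A null name selects the shared static string; anything else is copied.
    explicit Variable(const char *name)
    {
        if (name == nullptr) {
            name_ = tmp_name;
        } else {
            const std::size_t len = std::strlen(name) + 1;
            char *copy = new char[len];
            std::memcpy(copy, name, len);
            name_ = copy;
        }
    }

    ~Variable()
    {
        // Only heap-allocated names are released.
        if (owns_name())
            delete[] name_;
    }

    Variable(const Variable &) = delete;
    Variable &operator=(const Variable &) = delete;

    const char *name() const { return name_; }
    bool owns_name() const { return name_ != tmp_name; }

private:
    static const char tmp_name[];
    const char *name_;
};

// Shared by every unnamed temporary; lives in static storage.
const char Variable::tmp_name[] = "compiler_temp";
```

Comparing the pointer against the static string is enough to tell the two cases apart, so no extra flag is needed.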
Ian Romanick [Wed, 28 May 2014 01:45:40 +0000 (18:45 -0700)]
glsl: Store ir_variable_data::_num_state_slots and ::binding in 16 bits each
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 44  40,577,049,140   68,118,608      62,441,063      5,677,545          0
After (32-bit):  71  40,583,408,411   67,761,528      62,263,519      5,498,009          0
Before (64-bit): 63  37,122,829,194   95,153,008      87,333,600      7,819,408          0
After (64-bit):  67  37,123,303,706   95,150,544      87,333,600      7,816,944          0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Wed, 28 May 2014 01:34:24 +0000 (18:34 -0700)]
glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 63  40,574,239,515   68,117,280      62,618,607      5,498,673          0
After (32-bit):  44  40,577,049,140   68,118,608      62,441,063      5,677,545          0
Before (64-bit): 53  37,126,451,468   95,150,256      87,711,304      7,438,952          0
After (64-bit):  63  37,122,829,194   95,153,008      87,333,600      7,819,408          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
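The union trick from "glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together" might look roughly like the sketch below. The Variable class, the StateSlot type and the is_interface_instance_ flag are hypothetical; in particular, the single boolean stands in for whatever "other fields" the real code consults to decide which pointer is live.

```cpp
#include <cassert>

struct StateSlot {
    int tokens[5];   // hypothetical payload
};

class Variable {
public:
    explicit Variable(bool is_interface_instance)
        : is_interface_instance_(is_interface_instance)
    {
        // Initialize whichever member the discriminator says is live.
        if (is_interface_instance_)
            u.max_ifc_array_access = nullptr;
        else
            u.state_slots = nullptr;
    }

    void set_max_ifc_array_access(unsigned *p)
    {
        assert(is_interface_instance_);
        u.max_ifc_array_access = p;
    }

    void set_state_slots(StateSlot *p)
    {
        assert(!is_interface_instance_);
        u.state_slots = p;
    }

    // Each getter reports NULL when the other member is the live one.
    unsigned *max_ifc_array_access() const
    {
        return is_interface_instance_ ? u.max_ifc_array_access : nullptr;
    }

    StateSlot *state_slots() const
    {
        return is_interface_instance_ ? nullptr : u.state_slots;
    }

private:
    bool is_interface_instance_;
    // Both pointers share one slot because at most one is ever non-NULL.
    union {
        unsigned *max_ifc_array_access;
        StateSlot *state_slots;
    } u;
};
```

Keeping the union private and going through accessors means callers still see two independent pointers, one of which is always NULL.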
Ian Romanick [Thu, 15 May 2014 02:47:28 +0000 (19:47 -0700)]
glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Sat, 24 May 2014 01:57:36 +0000 (18:57 -0700)]
glsl: Make ir_variable::max_ifc_array_access private
The payoff for this will come in a few more patches.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Thu, 15 May 2014 01:36:57 +0000 (18:36 -0700)]
glsl: Store ir_variable::depth_layout using 3 bits
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 73  40,580,476,304   68,488,400      62,796,151      5,692,249          0
After (32-bit):  73  40,575,751,558   68,116,528      62,618,607      5,497,921          0
Before (64-bit): 71  37,124,890,613   95,889,584      88,089,008      7,800,576          0
After (64-bit):  62  37,123,578,526   95,150,784      87,711,304      7,439,480          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
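A minimal sketch of the packing in "glsl: Store ir_variable::depth_layout using 3 bits", using the enum type directly on the bit-field as the v2 note describes. The enum, its enumerator names and both structs are hypothetical, and the size comparison assumes a GCC- or Clang-style bit-field layout.

```cpp
// Five values, so three bits are enough to hold every enumerator.
enum DepthLayout {
    DEPTH_LAYOUT_NONE,
    DEPTH_LAYOUT_ANY,
    DEPTH_LAYOUT_GREATER,
    DEPTH_LAYOUT_LESS,
    DEPTH_LAYOUT_UNCHANGED
};

// Before: the enum occupies a full word next to the other bit-fields.
struct DataBefore {
    unsigned mode : 4;
    DepthLayout depth_layout;
};

// After: the enum is itself a bit-field, so no casts are needed when
// reading or writing it, and it shares storage with its neighbours.
struct DataAfter {
    unsigned mode : 4;
    DepthLayout depth_layout : 3;
};
```

Because the field keeps its enum type, assignments and comparisons work without casting to and from unsigned.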
Ian Romanick [Wed, 14 May 2014 20:25:14 +0000 (13:25 -0700)]
glsl: Replace ir_variable::warn_extension pointer with an 8-bit index
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 82  40,580,040,531   68,488,992      62,973,695      5,515,297          0
After (32-bit):  73  40,580,476,304   68,488,400      62,796,151      5,692,249          0
Before (64-bit): 65  37,124,013,542   95,892,768      88,466,712      7,426,056          0
After (64-bit):  71  37,124,890,613   95,889,584      88,089,008      7,800,576          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
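The pointer-to-index change in "glsl: Replace ir_variable::warn_extension pointer with an 8-bit index" can be sketched like this. The table contents, the Variable class and the accessor names are illustrative assumptions, not the actual Mesa definitions. The per-variable cost drops from one pointer to one byte because the strings live in a single shared table.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Shared table of extension names; index 0 means "no warning".
// The entries are examples only.
static const char *const warn_extension_table[] = {
    "",
    "GL_ARB_shader_stencil_export",
    "GL_AMD_shader_stencil_export",
};

static_assert(sizeof(warn_extension_table) / sizeof(warn_extension_table[0]) <= 256,
              "table must be addressable with an 8-bit index");

class Variable {
public:
    // Looks the name up in the table; returns false if it is not listed.
    bool enable_extension_warning(const char *extension)
    {
        const std::size_t n =
            sizeof(warn_extension_table) / sizeof(warn_extension_table[0]);
        for (std::size_t i = 1; i < n; ++i) {
            if (std::strcmp(warn_extension_table[i], extension) == 0) {
                warn_extension_index_ = static_cast<std::uint8_t>(i);
                return true;
            }
        }
        return false;
    }

    // Returns NULL when no warning has been enabled.
    const char *get_extension_warning() const
    {
        return warn_extension_index_ == 0
                   ? nullptr
                   : warn_extension_table[warn_extension_index_];
    }

private:
    std::uint8_t warn_extension_index_ = 0;
};
```

Hiding the field behind accessors is what lets the storage change without touching callers.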
Ian Romanick [Tue, 13 May 2014 18:59:01 +0000 (11:59 -0700)]
glsl: Use accessors for ir_variable::warn_extension
The payoff for this will come in the next patch.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Thu, 29 May 2014 00:09:45 +0000 (17:09 -0700)]
glsl: Eliminate unused built-in variables after compilation
After compilation (and before linking) we can eliminate quite a few
built-in variables. Basically, any uniform or constant (e.g.,
gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can
be eliminated. System values, vertex shader inputs (with one
exception), and fragment shader outputs that are not used and not
re-declared in the shader text can also be removed.
gl_ModelViewProjectionMatrix and gl_Vertex are used by the built-in
function ftransform. There are some complications with eliminating
these variables (see the comment in the patch), so they are not
eliminated.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 46  40,661,487,174   75,116,800      68,854,065      6,262,735          0
After (32-bit):  50  40,564,927,443   69,185,408      63,683,871      5,501,537          0
Before (64-bit): 64  37,200,329,700  104,872,672      96,514,546      8,358,126          0
After (64-bit):  59  36,822,048,449   96,526,888      89,113,000      7,413,888          0
A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.
v2: Don't remove any built-in with Transpose in the name.
v3: Fix comment typo noticed by Anuj.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
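The filtering rules listed in "glsl: Eliminate unused built-in variables after compilation" can be approximated by a short predicate. This is a simplified sketch: BuiltinVar and both functions are hypothetical, the per-stage distinctions (uniforms, system values, vertex inputs, fragment outputs) are collapsed into two flags, and the names kept for ftransform are spelled as in the GLSL specification.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical summary of what the pass needs to know about a built-in.
struct BuiltinVar {
    std::string name;
    bool used;         // referenced somewhere in the shader
    bool redeclared;   // re-declared in the shader text
};

bool can_eliminate(const BuiltinVar &v)
{
    if (v.used || v.redeclared)
        return false;
    // Needed by the built-in function ftransform.
    if (v.name == "gl_ModelViewProjectionMatrix" || v.name == "gl_Vertex")
        return false;
    // v2 of the patch: keep anything with Transpose in the name.
    if (v.name.find("Transpose") != std::string::npos)
        return false;
    return true;
}

// Drops every built-in that the predicate allows removing.
void eliminate_unused_builtins(std::vector<BuiltinVar> &vars)
{
    vars.erase(std::remove_if(vars.begin(), vars.end(), can_eliminate),
               vars.end());
}
```

An unused gl_MaxVertexTextureImageUnits would be dropped by this predicate, while an unused gl_Vertex would survive.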
Ian Romanick [Thu, 29 May 2014 00:05:14 +0000 (17:05 -0700)]
glsl: Validate that built-in uniforms have backing state
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
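The check added by "glsl: Validate that built-in uniforms have backing state" amounts to a simple invariant, sketched here. UniformVar and both functions are hypothetical, and treating every uniform whose name starts with "gl_" as a built-in is an assumption made for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical view of a uniform: its name and how many state slots back it.
struct UniformVar {
    std::string name;
    unsigned num_state_slots;
};

// Assumption for this sketch: built-ins are recognized by the gl_ prefix.
bool is_builtin(const UniformVar &v)
{
    return v.name.compare(0, 3, "gl_") == 0;
}

// Returns the names of built-in uniforms that have no backing state.
std::vector<std::string>
find_unbacked_builtins(const std::vector<UniformVar> &uniforms)
{
    std::vector<std::string> bad;
    for (const UniformVar &v : uniforms) {
        if (is_builtin(v) && v.num_state_slots == 0)
            bad.push_back(v.name);
    }
    return bad;
}
```

A validator built this way reports the offending names, which is the kind of output that helps track down a bug in a later patch.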