Christian König [Tue, 28 May 2013 16:02:58 +0000 (18:02 +0200)]
st/va: implement vlVa(Create|Destroy|Query|Get)Config
This patch is for application to query configuration,
such as profiles, entrypoints, and attributes
v2: fix missing profile with query
Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Christian König [Wed, 2 Jul 2014 20:35:28 +0000 (16:35 -0400)]
st/va: skeleton VAAPI state tracker
This patch adds a skeleton VA-API state tracker,
which is filled with live in the subsequent patches.
v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
detect libva version, and correctly set driver entrypoint name.
rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
add back targets/va/va.sym
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Mon, 22 Sep 2014 18:07:13 +0000 (14:07 -0400)]
st/vdpau: move common functions to util
Break out these functions so that they can be shared with a other
state trackers. They will be used in subsequent patches for the new
VA-API state tracker.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Rob Clark [Wed, 1 Oct 2014 11:26:39 +0000 (07:26 -0400)]
freedreno: max-texture-lod-bias should be 15.0f
Fixes piglit lodbias test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kenneth Graunke [Fri, 26 Sep 2014 22:13:30 +0000 (15:13 -0700)]
mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.
Cuts the number of i965 color calculator viewport uploads by 100x
(
11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 26 Sep 2014 22:13:29 +0000 (15:13 -0700)]
i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 26 Sep 2014 18:12:34 +0000 (11:12 -0700)]
i965: Drop brwBindProgram driver hook.
This function flagged BRW_NEW_*_PROGRAM
When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.
However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 17:13:02 +0000 (10:13 -0700)]
i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.
I had to dig a bit to figure out why this was necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 08:44:51 +0000 (01:44 -0700)]
i965: Use "1ull" instead of "1" in BRW_NEW_* defines.
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 22:50:14 +0000 (15:50 -0700)]
i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.
~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 17:29:25 +0000 (10:29 -0700)]
i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 26 Sep 2014 08:27:11 +0000 (01:27 -0700)]
i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.
Unused since krh rewrote fast clears to use meta.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Tue, 23 Sep 2014 09:27:24 +0000 (21:27 +1200)]
i965: Fix typo in comment
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Chris Forbes [Sun, 28 Sep 2014 04:07:37 +0000 (17:07 +1300)]
i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHM
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Vinson Lee [Wed, 1 Oct 2014 04:52:13 +0000 (21:52 -0700)]
llvmpipe: Add missing LLVMGetGlobalContext() arg in lp_test_format.c.
Fix build error introduced with commit
eedbce9c63a3f385908bdc8a69e8be98dd3522ff.
lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
gallivm = gallivm_create("test_module_unorm8");
^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
gallivm_create(const char *name, LLVMContextRef context);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Keith Packard [Wed, 1 Oct 2014 03:03:29 +0000 (20:03 -0700)]
glx/dri3: Provide error diagnostics when DRI3 allocation fails
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Keith Packard [Wed, 2 Jul 2014 20:26:22 +0000 (13:26 -0700)]
glx/dri3: Use four buffers until X driver supports async flips
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.
When vblank_mode=0, if there are only three buffers we do:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC 2.
Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.
With four buffers, we get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC
2. The X server will queue this swap until buffer 1 becomes
the scanout buffer.
Render frame 3 to buffer 3
PresentPixmap for buffer 3 at MSC 1
As soon as the X server sees this, it will replace the pending
buffer 2 swap with this swap and release buffer 2 back to the
application
Render frame 4 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Now we're in a steady state, flipping between buffer 2 and 3
waiting for one of them to be queued to the kernel.
...
current scanout buffer = 1 at MSC 1
Now buffer 0 is free and (e.g.) buffer 2 is queued in
the kernel to be the scanout buffer at MSC 2
Render frames, flipping between buffer 0 and 3
When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting waiting for vblank to become the next scanout
buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Queue this for display at MSC 1
1. There are three possible results:
1) We're still before MSC 1. Buffer 1 is released,
buffer 2 is queued waiting for MSC 1.
2) We're now after MSC 1. Buffer 0 was released at MSC 1.
Buffer 1 is the current scanout buffer.
a) If the user asked for a tearing update, we swap
scanout from buffer 1 to buffer 2 and release buffer
1.
b) If the user asked for non-tearing update, we
queue buffer 2 for the MSC 2.
In all three cases, we have a buffer released (call it 'n'),
ready to receive the next frame.
Render frame 3 to buffer n
PresentPixmap for buffer n
If we're still before MSC 1, then we'll ask to present at MSC
1. Otherwise, we'll ask to present at MSC 2.
Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.
I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.
That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Jason Ekstrand [Wed, 1 Oct 2014 00:27:33 +0000 (17:27 -0700)]
i965/fs: Fix the build
Jason Ekstrand [Tue, 30 Sep 2014 22:41:24 +0000 (15:41 -0700)]
i965/fs: Fix an uninitialized value warnings
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Roland Scheidegger [Tue, 30 Sep 2014 17:25:09 +0000 (19:25 +0200)]
galahad: fix indirect draw
Need to unwrap the indirect resource otherwise bad things will happen.
Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 30 Sep 2014 17:23:37 +0000 (19:23 +0200)]
galahad: (trivial) handle cubemap arrays
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Matt Turner [Tue, 30 Sep 2014 22:06:23 +0000 (15:06 -0700)]
i965/fs: Emit compressed BFI2 instructions on Gen > 7.
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
Matt Turner [Sat, 30 Aug 2014 03:26:11 +0000 (20:26 -0700)]
i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.
These checks were intended for Gen 7 only. None of these restrictions
apply to Gen 8.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sun, 28 Sep 2014 00:34:51 +0000 (17:34 -0700)]
i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:56 +0000 (10:34 -0700)]
i965/fs: Optimize sqrt+inv into rsq.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:27 +0000 (10:34 -0700)]
i965/vec4: Optimize sqrt+inv into rsq.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.
Occasionally the sqrt's result is no longer used, leading to:
instructions in affected programs: 5005 -> 4949 (-1.12%)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Sat, 27 Sep 2014 17:34:07 +0000 (10:34 -0700)]
i965/vec4: Call opt_algebraic after opt_cse.
The next patch adds an algebraic optimization for the pattern
sqrt a, b
rcp c, a
and turns it into
sqrt a, b
rsq c, b
but many vertex shaders do
a = sqrt(b);
var1 /= a;
var2 /= a;
which generates
sqrt a, b
rcp c, a
rcp d, a
If we apply the algebraic optimization before CSE, we'll end up with
sqrt a, b
rsq c, b
rcp d, a
Applying CSE combines the RCP instructions, preventing this from
happening.
No shader-db changes.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Thu, 4 Sep 2014 20:25:15 +0000 (13:25 -0700)]
i965/fs: Extend predicated break pass to predicate WHILE.
Helps a handful of programs in Serious Sam 3 that use do-while loops.
instructions in affected programs: 16114 -> 16075 (-0.24%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mathias Fröhlich [Tue, 30 Sep 2014 20:11:30 +0000 (22:11 +0200)]
gallivm: Fix build for LLVM 3.2
Do not rely on LLVMMCJITMemoryManagerRef being available.
The c binding to the memory manager objects only appeared
on llvm-3.4.
The change is based on an initial patch of Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Rob Clark [Tue, 30 Sep 2014 17:47:58 +0000 (13:47 -0400)]
freedreno: destroy transfer pool after blitter
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 30 Sep 2014 20:53:24 +0000 (16:53 -0400)]
freedreno/lowering: fix token calculation for lowering
Indirect registers consume an additional token. Try to clean up the
token calculation math a bit, and fix it at the same time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ian Romanick [Sat, 24 May 2014 03:03:31 +0000 (20:03 -0700)]
i965/fs: Don't make a name for a vector splitting temporary
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 02:04:52 +0000 (19:04 -0700)]
glsl: Don't make a name for the function return variable
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 02:03:52 +0000 (19:03 -0700)]
glsl: Don't allocate a name for ir_var_temporary variables
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0
After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0
Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0
After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 23:57:03 +0000 (16:57 -0700)]
glsl: Use ir_var_temporary for compiler generated temporaries
These few places were using ir_var_auto for seemingly no reason. The
names were not added to the symbol table.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 01:55:27 +0000 (18:55 -0700)]
glsl: Add context-level controls for whether temporaries have real names
No change Valgrind massif results for a trimmed apitrace of dota2.
v2: Minor rebase on _mesa_init_constants changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 9 Jul 2014 01:53:09 +0000 (18:53 -0700)]
glsl: Never put ir_var_temporary variables in the symbol table
Later patches will give every ir_var_temporary the same name in release
builds. Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Tue, 8 Jul 2014 23:57:33 +0000 (16:57 -0700)]
glsl: Add the possibility for ir_variable to have a non-ralloced name
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 28 May 2014 01:45:40 +0000 (18:45 -0700)]
glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0
Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Wed, 28 May 2014 01:34:24 +0000 (18:34 -0700)]
glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0
After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0
After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Thu, 15 May 2014 02:47:28 +0000 (19:47 -0700)]
glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Sat, 24 May 2014 01:57:36 +0000 (18:57 -0700)]
glsl: Make ir_variable::max_ifc_array_access private
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Thu, 15 May 2014 01:36:57 +0000 (18:36 -0700)]
glsl: Store ir_variable::depth_layout using 3 bits
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,921 0
Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,480 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Ian Romanick [Wed, 14 May 2014 20:25:14 +0000 (13:25 -0700)]
glsl: Replace ir_variable::warn_extension pointer with an 8-bit index
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,297 0
After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,056 0
After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Tue, 13 May 2014 18:59:01 +0000 (11:59 -0700)]
glsl: Use accessors for ir_variable::warn_extension
The payoff for this will come in the next patch.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ian Romanick [Thu, 29 May 2014 00:09:45 +0000 (17:09 -0700)]
glsl: Eliminate unused built-in variables after compilation
After compilation (and before linking) we can eliminate quite a few
built-in variables. Basically, any uniform or constant (e.g.,
gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can
be eliminated. System values, vertex shader inputs (with one
exception), and fragment shader outputs that are not used and not
re-declared in the shader text can also be removed.
gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in
function ftransform. There are some complications with eliminating
these variables (see the comment in the patch), so they are not
eliminated.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.
v2: Don't remove any built-in with Transpose in the name.
v3: Fix comment typo noticed by Anuj.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Ian Romanick [Thu, 29 May 2014 00:05:14 +0000 (17:05 -0700)]
glsl: Validate that built-in uniforms have backing state
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Eric Anholt [Tue, 30 Sep 2014 18:23:29 +0000 (11:23 -0700)]
vc4: Don't forget to store stencil along with depth when storing either.
Otherwise, we'd replace the stencil in our packed depth/stencil with 0s.
Fixes about 50 piglit tests.
Mathias Fröhlich [Sun, 21 Sep 2014 06:54:00 +0000 (08:54 +0200)]
llvmpipe: Reuse llvmpipes LLVMContext in the draw context.
Reuse the LLVMContext already allocated in llvmpipe_context
for draw_llvm if ppossible. This should decrease the memory
footprint of an llvmpipe context.
v2: Fix compile with llvm disabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Sun, 13 Jul 2014 10:49:41 +0000 (12:49 +0200)]
llvmpipe: Make a llvmpipe OpenGL context thread safe.
This fixes the remaining problem with the recently introduced
global jit memory manager. This change again uses a memory manager
that is local to gallivm_state. This implementation still frees
the majority of the memory immediately after compilation.
Only the generated code is deferred until this code is no longer used.
This change and the previous one using private LLVMContext instances
I can now safely run several independent OpenGL contexts driven
by llvmpipe from different threads.
v3: Rebase on llvm-3.6 compile fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Mathias Fröhlich [Thu, 28 Aug 2014 17:49:35 +0000 (19:49 +0200)]
llvmpipe: Use two LLVMContexts per OpenGL context instead of a global one.
This is one step to make llvmpipe thread safe as mandated by the OpenGL
standard. Using the global LLVMContext is obviously a problem for
that kind of use pattern. The patch introduces two LLVMContext
instances that are private to an OpenGL context and used for all
compiles. One is put into struct draw_llvm and the other
one into struct llvmpipe_context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Jason Ekstrand [Tue, 30 Sep 2014 17:15:23 +0000 (10:15 -0700)]
i965/brw_reg: Make the accumulator register take an explicit width.
The big pile of patches I just pushed regresses about 25 piglit tests on
SNB. This fixes the regressions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Brian Paul [Mon, 29 Sep 2014 22:08:55 +0000 (16:08 -0600)]
llvmpipe: move lp_jit_screen_init() call after allocation of screen object
The screen argument isn't actually used by lp_jit_screen_init() at this
time, but let's move the call so that we pass a valid pointer.
v2: don't leak screen if lp_jit_screen_init() fails.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Tue, 30 Sep 2014 16:28:34 +0000 (10:28 -0600)]
tgsi: fix Semantic.Name assignment in tgsi_transform_input_decl()
Assign the sem_name parameter, not TGSI_SEMANTIC_GENERIC.
Fixes polygon stipple regression.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Mon, 29 Sep 2014 22:10:18 +0000 (16:10 -0600)]
util: simplify PIPE_TEXTURE_CUBE case in util_max_layer()
For cube resources, the array_size value should be 6. So handle
that case as we do for array texture resources. But assert that
array_size==6 just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Brian Paul [Mon, 29 Sep 2014 16:17:15 +0000 (10:17 -0600)]
softpipe: don't special case PIPE_TEXTURE_CUBE in softpipe_resource_layout()
As with the previous patch for llvmpipe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Mon, 29 Sep 2014 16:15:28 +0000 (10:15 -0600)]
llvmpipe: remove special case for PIPE_TEXTURE_CUBE in llvmpipe_texture_layout()
layers (aka array_size) should be 6 for cube textures so we don't need
to special-case it. But add an assertion just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Mon, 29 Sep 2014 16:14:01 +0000 (10:14 -0600)]
gallium: add doc note about cube textures and can_create_resource()
Just to be clear, and echo the description for resource_create().
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Brian Paul [Mon, 29 Sep 2014 16:11:41 +0000 (10:11 -0600)]
st/mesa: remove unneded PIPE_TEXTURE_CUBE check in st_texture_create()
Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so
there's no reason to special-case the pt.array_size = layers assignment.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Eric Anholt [Thu, 4 Sep 2014 23:05:00 +0000 (16:05 -0700)]
mesa: Drop the always-software-primitive-restart paths.
The core sw primitive restart code is still around, because i965 uses it
in some cases, but there are no drivers that want it on all the time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Eric Anholt [Thu, 4 Sep 2014 20:57:23 +0000 (13:57 -0700)]
gallium: Drop software-only primitive restart support.
The drivers not flagging primitive restart support are r300 swtcl, svga,
nv30, and vc4.
The point of primitive restart is to slightly reduce draw call overhead
for apps by batching multiple draws. If we do an extra pass to read the
index buffer and split back into multiple draws, we've entirely missed the
point. This is particularly bad for drivers that otherwise have hardware
IB reads, where the readback is probably uncached.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jason Ekstrand [Fri, 26 Sep 2014 23:08:52 +0000 (16:08 -0700)]
i965/fs: Properly calculate the number of instructions in calculate_register_pressure
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 12 Sep 2014 23:17:37 +0000 (16:17 -0700)]
i965/fs: Use the GRF for FB writes on gen >= 7
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that
functinoality for FB writes.
v2: Make handling of components more sane.
i965/fs: Force a high register for the final FB write
v2: Renamed the array for the range mappings and added a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 23:28:53 +0000 (16:28 -0700)]
i965/fs: Handle COMPR4 in LOAD_PAYLOAD
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 24 Sep 2014 21:51:22 +0000 (14:51 -0700)]
i965/fs: Constant propagate into LOAD_PAYLOAD
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 20 Sep 2014 04:03:25 +0000 (21:03 -0700)]
i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payload
If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then
we will need this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 22:16:20 +0000 (15:16 -0700)]
i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instruction
Previously, we were use the base_mrf parameter of fs_inst to store the MRF
location. In preparation for doing FB writes from the GRF, we now also
allow you to set inst->base_mrf to -1 and provide a source register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 11 Sep 2014 23:43:37 +0000 (16:43 -0700)]
i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 11 Sep 2014 23:13:15 +0000 (16:13 -0700)]
i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 11 Sep 2014 23:15:10 +0000 (16:15 -0700)]
i965/fs: Add a function for getting a component of a 8 or 16-wide register
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 30 Aug 2014 00:22:57 +0000 (17:22 -0700)]
i965/fs: Use the instruction execution size directly for texture generation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 17 Sep 2014 01:02:52 +0000 (18:02 -0700)]
i965/fs: Use exec_size instead of force_uncompressed in dump_instruction
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 16 Aug 2014 18:34:56 +0000 (11:34 -0700)]
i965/fs: Use instruction execution sizes instead of heuristics
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 16 Aug 2014 03:58:50 +0000 (20:58 -0700)]
i965/fs: Use instruction execution sizes to set compression state
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 5 Sep 2014 03:40:34 +0000 (20:40 -0700)]
i965/fs: Remove unneeded uses of force_uncompressed
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 16 Aug 2014 18:38:07 +0000 (11:38 -0700)]
i965/fs: Derive force_uncompressed from instruction exec_size
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 20 Sep 2014 03:36:52 +0000 (20:36 -0700)]
i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*
Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.
i965/fs: Make effective_width a variable instead of a function
i965/fs: Preserve effective width in constant propagation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 12 Sep 2014 05:33:52 +0000 (22:33 -0700)]
i965/fs: Better guess the width of LOAD_PAYLOAD
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 14 Aug 2014 20:56:24 +0000 (13:56 -0700)]
i965/fs: Add an exec_size field to fs_inst
This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.
i965/fs: Explicitly set instruction execute size a couple of places
i965/blorp: Explicitly set instruction execute sizes
Since blorp is all 16-wide and nothing isn't, in general, very careful
about register width, we'll just set it all explicitly.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 30 Aug 2014 00:18:42 +0000 (17:18 -0700)]
i965/fs: Determine partial writes based on the destination width
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 23:34:23 +0000 (16:34 -0700)]
i965/fs: Fix a bug in register coalesce
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappins were wrong.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 18 Sep 2014 19:16:25 +0000 (12:16 -0700)]
i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths. While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide which will be
necessary later when we start detecting execution sizes based on source and
destination registers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 18 Aug 2014 21:27:55 +0000 (14:27 -0700)]
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 21:42:40 +0000 (14:42 -0700)]
i965/fs: Handle printing of registers better.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 25 Sep 2014 19:06:42 +0000 (12:06 -0700)]
i965: Explicitly set widths on gen5 math instruction destinations.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 16 Aug 2014 17:48:18 +0000 (10:48 -0700)]
i965/fs: Make half() divide the register width by 2 and use it more
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 19:25:58 +0000 (12:25 -0700)]
i965/fs: Add a concept of a width to fs_reg
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently a
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.
This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 19 Sep 2014 23:42:10 +0000 (16:42 -0700)]
i965/fs: A little harmless refactoring of register_coalesce
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 19:23:47 +0000 (12:23 -0700)]
i965/brw_reg: Add a firsthalf function and use it in the generator
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 24 Sep 2014 00:22:09 +0000 (17:22 -0700)]
i965/fs: Copy propagate partial reads.
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the out to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.
Shader DB results:
total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 13 Sep 2014 18:49:55 +0000 (11:49 -0700)]
i965/fs: Refactor fs_inst::is_send_from_grf()
A switch statement is much easier to read/edit than a big giant or
statement.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 13 Sep 2014 00:49:49 +0000 (17:49 -0700)]
i965/fs: Clean up emit_fb_writes
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 22:56:47 +0000 (15:56 -0700)]
i965/fs: Print BAD_FILE registers in dump_instruction
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 20:14:09 +0000 (13:14 -0700)]
i965/fs: Make compact_virtual_grfs an optimization pass
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 10 Sep 2014 17:17:28 +0000 (10:17 -0700)]
i964/fs: Make immediate fs_reg constructors explicit
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 10 Sep 2014 18:28:27 +0000 (11:28 -0700)]
i965/fs: Make null_reg_* const members of fs_visitor instead of globals
We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 9 Sep 2014 01:34:28 +0000 (18:34 -0700)]
i965/fs: Use the var_from_vgrf helper function instead of doing it manually
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 24 Sep 2014 20:16:43 +0000 (13:16 -0700)]
i965/fs: Fix a bug with dead_code_eliminate on large writes
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 8 Sep 2014 22:26:24 +0000 (15:26 -0700)]
i965/fs: Use the UW type for the destination of VARYING_PULL_CONSTANT_LOAD instructions
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 6 Sep 2014 20:48:34 +0000 (13:48 -0700)]
i965/fs: Use offset a lot more places
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>