Jason Ekstrand [Sat, 20 Sep 2014 03:36:52 +0000 (20:36 -0700)]
i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*
Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.
i965/fs: Make effective_width a variable instead of a function
i965/fs: Preserve effective width in constant propagation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 12 Sep 2014 05:33:52 +0000 (22:33 -0700)]
i965/fs: Better guess the width of LOAD_PAYLOAD
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 14 Aug 2014 20:56:24 +0000 (13:56 -0700)]
i965/fs: Add an exec_size field to fs_inst
This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.
i965/fs: Explicitly set instruction execute size a couple of places
i965/blorp: Explicitly set instruction execute sizes
Since blorp is all 16-wide and, in general, isn't very careful about
register width, we'll just set it all explicitly.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 30 Aug 2014 00:18:42 +0000 (17:18 -0700)]
i965/fs: Determine partial writes based on the destination width
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.
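A minimal sketch of the rule described above, assuming 32-byte hardware registers. The function name and parameters are hypothetical, and any other conditions the real code may check (such as predication) are deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32 /* bytes per hardware register (assumed) */

/* Hypothetical sketch: once both halves of a SIMD16 VGRF are tracked
 * separately, a write only counts as partial when the destination region
 * covers less than one full hardware register. */
static bool
sketch_is_partial_write(int dst_width, int type_size)
{
   assert(dst_width > 0 && type_size > 0);
   return dst_width * type_size < REG_SIZE;
}
```

With this rule an 8-wide float write (32 bytes) is a full write even inside a SIMD16 program, while an 8-wide 16-bit write is partial.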
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 23:34:23 +0000 (16:34 -0700)]
i965/fs: Fix a bug in register coalesce
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappings were wrong.
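The missing check can be illustrated with a simplified model. This is not Mesa's register_coalesce code: the copy structure and can_coalesce() are hypothetical, and a VGRF is assumed to have at most 16 register offsets. Counting copies alone is not enough; each destination offset must come from the same source offset:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one MOV copying register offset src_off of the source
 * VGRF into register offset dst_off of the destination VGRF. */
struct copy {
   int src_off;
   int dst_off;
};

/* Coalescing is safe only if the copies cover every offset exactly once
 * and never rearrange channels (dst_off == src_off for every copy). */
static bool
can_coalesce(const struct copy *copies, int n, int size)
{
   bool seen[16] = { false };
   assert(size >= 0 && size <= 16);
   if (n != size)
      return false;
   for (int i = 0; i < n; i++) {
      if (copies[i].dst_off < 0 || copies[i].dst_off >= size)
         return false;
      if (copies[i].src_off != copies[i].dst_off)
         return false; /* channels were rearranged */
      if (seen[copies[i].dst_off])
         return false; /* same offset written twice */
      seen[copies[i].dst_off] = true;
   }
   return true;
}
```

A pair of MOVs that swaps two registers has the "proper number" of copies but fails the dst_off == src_off test.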
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 18 Sep 2014 19:16:25 +0000 (12:16 -0700)]
i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths. While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide, which will be
necessary later when we start detecting execution sizes based on source and
destination registers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 18 Aug 2014 21:27:55 +0000 (14:27 -0700)]
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align SIMD16 registers for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32 bytes. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
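The width-aware offsetting described above can be sketched with a simplified register model. The struct and function names are hypothetical, not Mesa's fs_reg API, and registers are assumed to be 32 bytes:

```c
#include <assert.h>

#define REG_SIZE 32 /* bytes per hardware register (assumed) */

/* Hypothetical, simplified register model for illustration only. */
struct sketch_reg {
   int reg;           /* register number */
   int subreg_offset; /* byte offset within the register, < REG_SIZE */
   int width;         /* number of channels, e.g. 1, 8 or 16 */
   int type_size;     /* bytes per channel, e.g. 4 for float */
};

/* Advance by an arbitrary number of bytes, carrying whole registers into
 * reg so the offset is not limited to a single register. */
static struct sketch_reg
sketch_byte_offset(struct sketch_reg r, int bytes)
{
   assert(bytes >= 0);
   int total = r.subreg_offset + bytes;
   r.reg += total / REG_SIZE;
   r.subreg_offset = total % REG_SIZE;
   return r;
}

/* Advance by delta logical components: each one occupies
 * width * type_size bytes, so a SIMD16 float spans two registers. */
static struct sketch_reg
sketch_offset(struct sketch_reg r, int delta)
{
   return sketch_byte_offset(r, delta * r.width * r.type_size);
}
```

For example, advancing a 16-wide float register by one component moves two registers, while advancing a width-1 float register by nine components lands four bytes into the next register.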
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
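The accumulation rule for LOAD_PAYLOAD sources amounts to a running sum. In this sketch the payload_src structure is hypothetical, and each source is assumed to be padded to whole 32-byte registers:

```c
#include <assert.h>
#include <stddef.h>

#define REG_SIZE 32 /* bytes per hardware register (assumed) */

/* Hypothetical description of one LOAD_PAYLOAD source. */
struct payload_src {
   int width;     /* channels */
   int type_size; /* bytes per channel */
};

/* Whole hardware registers occupied by a source (rounded up). */
static int
src_regs(struct payload_src s)
{
   return (s.width * s.type_size + REG_SIZE - 1) / REG_SIZE;
}

/* offsets[i] receives the register offset of source i within the payload:
 * the sum of the sizes of all earlier sources. Returns the total payload
 * size in registers. */
static int
payload_offsets(const struct payload_src *srcs, size_t n, int *offsets)
{
   assert(n == 0 || (srcs != NULL && offsets != NULL));
   int total = 0;
   for (size_t i = 0; i < n; i++) {
      offsets[i] = total;
      total += src_regs(srcs[i]);
   }
   return total;
}
```

Sources of width 8, 16 and 8 (all 32-bit) therefore start at register offsets 0, 1 and 3, for a four-register payload.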
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were wasting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 21:42:40 +0000 (14:42 -0700)]
i965/fs: Handle printing of registers better.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 25 Sep 2014 19:06:42 +0000 (12:06 -0700)]
i965: Explicitly set widths on gen5 math instruction destinations.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 16 Aug 2014 17:48:18 +0000 (10:48 -0700)]
i965/fs: Make half() divide the register width by 2 and use it more
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 19:25:58 +0000 (12:25 -0700)]
i965/fs: Add a concept of a width to fs_reg
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.
This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.
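The uniform case mentioned above can be sketched as follows. The enum, struct and function are hypothetical simplifications and ignore the LOAD_PAYLOAD exception:

```c
#include <assert.h>

/* Hypothetical register files for illustration only. */
enum sketch_file { SKETCH_GRF, SKETCH_UNIFORM };

struct sketch_reg {
   enum sketch_file file;
   int width; /* stored width; 1 for uniforms */
};

/* A uniform is stored with width 1 because its data is not repeated, but
 * as an operand it takes on the instruction's width. Other registers keep
 * their explicit width. */
static int
sketch_effective_width(struct sketch_reg r, int exec_size)
{
   assert(exec_size > 0);
   if (r.file == SKETCH_UNIFORM)
      return exec_size;
   return r.width;
}
```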
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 19 Sep 2014 23:42:10 +0000 (16:42 -0700)]
i965/fs: A little harmless refactoring of register_coalesce
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 13 Aug 2014 19:23:47 +0000 (12:23 -0700)]
i965/brw_reg: Add a firsthalf function and use it in the generator
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 24 Sep 2014 00:22:09 +0000 (17:22 -0700)]
i965/fs: Copy propagate partial reads.
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat one component of the output to 8 or 16 wide. This allows us to
eliminate the copy and simply use the one component of the register.
Shader DB results:
total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 13 Sep 2014 18:49:55 +0000 (11:49 -0700)]
i965/fs: Refactor fs_inst::is_send_from_grf()
A switch statement is much easier to read and edit than a giant OR
statement.
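The shape of this refactoring, shown with hypothetical opcodes (the real set checked by fs_inst::is_send_from_grf() differs):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes for illustration only. */
enum sketch_opcode {
   SK_OPCODE_ADD,
   SK_OPCODE_MOV,
   SK_OPCODE_FB_WRITE,
   SK_OPCODE_TEX,
   SK_OPCODE_PULL_CONSTANT_LOAD,
};

/* Instead of "return op == A || op == B || op == C ...;", list one case
 * label per opcode, which is easy to scan and to extend. */
static bool
sketch_is_send_from_grf(enum sketch_opcode op)
{
   switch (op) {
   case SK_OPCODE_FB_WRITE:
   case SK_OPCODE_TEX:
   case SK_OPCODE_PULL_CONSTANT_LOAD:
      return true;
   default:
      return false;
   }
}
```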
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 13 Sep 2014 00:49:49 +0000 (17:49 -0700)]
i965/fs: Clean up emit_fb_writes
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 22:56:47 +0000 (15:56 -0700)]
i965/fs: Print BAD_FILE registers in dump_instruction
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 16 Sep 2014 20:14:09 +0000 (13:14 -0700)]
i965/fs: Make compact_virtual_grfs an optimization pass
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.
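The "sudden renaming" comes from renumbering the VGRFs that remain in use so they are dense. A hypothetical sketch of that renumbering, assuming used VGRFs keep their relative order:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: give every VGRF still in use a new, dense index in
 * its original order; unused ones map to -1. Returns the new VGRF count. */
static int
compact_vgrf_numbers(const bool *used, int n, int *remap)
{
   assert(n >= 0);
   int next = 0;
   for (int i = 0; i < n; i++)
      remap[i] = used[i] ? next++ : -1;
   return next;
}
```

Every instruction that referenced VGRF i afterwards refers to remap[i], which is why dumps before and after this pass differ throughout.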
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 10 Sep 2014 17:17:28 +0000 (10:17 -0700)]
i965/fs: Make immediate fs_reg constructors explicit
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 10 Sep 2014 18:28:27 +0000 (11:28 -0700)]
i965/fs: Make null_reg_* const members of fs_visitor instead of globals
We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 9 Sep 2014 01:34:28 +0000 (18:34 -0700)]
i965/fs: Use the var_from_vgrf helper function instead of doing it manually
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 24 Sep 2014 20:16:43 +0000 (13:16 -0700)]
i965/fs: Fix a bug with dead_code_eliminate on large writes
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes were things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
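The corrected liveness check can be sketched as follows. The live[] array and write_is_dead() are hypothetical, not Mesa's dead_code_eliminate code: a write covering regs_written registers starting at reg_offset is dead only if none of the registers it actually touches are live.

```c
#include <assert.h>
#include <stdbool.h>

/* live[i] says whether register offset i of a VGRF is read later. */
static bool
write_is_dead(const bool *live, int vgrf_size, int reg_offset,
              int regs_written)
{
   assert(reg_offset >= 0 && reg_offset + regs_written <= vgrf_size);
   for (int i = reg_offset; i < reg_offset + regs_written; i++) {
      if (live[i])
         return false;
   }
   return true;
}
```

A two-register write at offset 1 of a VGRF whose offsets 0 and 3 are live is dead, even though the VGRF as a whole is not.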
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 8 Sep 2014 22:26:24 +0000 (15:26 -0700)]
i965/fs: Use the UW type for the destination of VARYING_PULL_CONSTANT_LOAD instructions
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 6 Sep 2014 20:48:34 +0000 (13:48 -0700)]
i965/fs: Use offset a lot more places
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 19 Aug 2014 23:11:36 +0000 (16:11 -0700)]
i965/fs: fix a comment in compact_virtual_grfs
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 19 Aug 2014 20:57:11 +0000 (13:57 -0700)]
i965/fs: Rewrite fs_visitor::split_virtual_grfs
The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all. It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split into one register of size 1 and two registers
of size 2.
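One way to sketch the size-5 example: start by allowing a split before every register, then forbid any split that would cut through a multi-register access. The access structure, split_vgrf() and the 16-register limit are hypothetical assumptions, not the rewritten Mesa pass:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VGRF_SIZE 16 /* assumed limit for this sketch */

/* A read or write of registers [offset, offset + size). */
struct access {
   int offset;
   int size;
};

/* Writes the size of each resulting piece into pieces[] and returns the
 * number of pieces. */
static int
split_vgrf(int vgrf_size, const struct access *acc, int n_acc, int *pieces)
{
   bool split_before[MAX_VGRF_SIZE] = { false };
   assert(vgrf_size > 0 && vgrf_size <= MAX_VGRF_SIZE);

   /* Optimistically allow a split before every register but the first. */
   for (int i = 1; i < vgrf_size; i++)
      split_before[i] = true;

   /* A multi-register access must stay in one piece. */
   for (int a = 0; a < n_acc; a++) {
      assert(acc[a].offset >= 0 && acc[a].offset + acc[a].size <= vgrf_size);
      for (int i = acc[a].offset + 1; i < acc[a].offset + acc[a].size; i++)
         split_before[i] = false;
   }

   int n = 0, start = 0;
   for (int i = 1; i <= vgrf_size; i++) {
      if (i == vgrf_size || split_before[i]) {
         pieces[n++] = i - start;
         start = i;
      }
   }
   return n;
}
```

With accesses covering registers {0}, {1, 2} and {3, 4}, a size-5 VGRF splits into pieces of size 1, 2 and 2.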
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 6 Sep 2014 17:37:22 +0000 (10:37 -0700)]
i965/fs_live_variables: Use var_from_vgrf instead of repeating the calculation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 26 Sep 2014 21:47:03 +0000 (14:47 -0700)]
i965/fs: Manually generate the meta fast-clear shader
Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we could use a replicated write and used it. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.
This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Michel Dänzer [Tue, 30 Sep 2014 02:16:52 +0000 (11:16 +0900)]
radeonsi: Pass the slice size to si_dma_copy_buffer
Otherwise some parts of tiled slices can be missed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Thu, 11 Sep 2014 02:49:16 +0000 (11:49 +0900)]
radeonsi: Catch more cases that can't be handled by si_dma_copy_buffer/tile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Wed, 10 Sep 2014 09:43:56 +0000 (18:43 +0900)]
radeonsi: Fix si_dma_copy(_tile) for compressed formats
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Wed, 10 Sep 2014 01:57:58 +0000 (10:57 +0900)]
radeonsi: Fix tiling mode index for stencil resources
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Chia-I Wu [Tue, 30 Sep 2014 08:21:15 +0000 (16:21 +0800)]
ilo: fix format of edge flag pointer
The VE format of edge flag pointers was changed in
780ce576bb1781f027797039693b98253ee4813e.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Tue, 30 Sep 2014 02:32:53 +0000 (10:32 +0800)]
ilo: add a pass to finalize ilo_ve_state
Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and moves the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Tue, 30 Sep 2014 07:18:09 +0000 (15:18 +0800)]
ilo: precalculate aligned depth buffer size
To replace the hacky zs_align_surface().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Sat, 27 Sep 2014 16:41:42 +0000 (00:41 +0800)]
ilo: use dynamic bo for rectlist vertices
The size is always 24 bytes. We can upload them to the dynamic buffer.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Thomas Hellstrom [Sun, 28 Sep 2014 15:17:22 +0000 (17:17 +0200)]
st/xa: Fix regression in xa_yuv_planar_blit()
Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx
textured video. Fix this by implementing scissors also in the yuv draw path.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Mon, 29 Sep 2014 19:22:04 +0000 (12:22 -0700)]
i965: Delete intel_chipset.h.
Unused; it was replaced by include/pci_ids/i965_pci_ids.h long ago.
Acked-by: Matt Turner <mattst88@gmail.com>
Alex Henrie [Fri, 26 Sep 2014 20:44:20 +0000 (14:44 -0600)]
driconf: Correct and update Catalan translation
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Alex Henrie [Fri, 26 Sep 2014 20:32:57 +0000 (14:32 -0600)]
driconf: Update Spanish translation
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Alex Henrie [Fri, 26 Sep 2014 20:31:57 +0000 (14:31 -0600)]
driconf: Synchronize po files
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Mon, 29 Sep 2014 22:58:15 +0000 (15:58 -0700)]
vc4: Don't try to do stores to buffers that aren't bound.
The code was kind of mixed up about which buffers were getting stored in the case
that a resolve bit was unset (which are set based on the GL state at draw
time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
Eric Anholt [Mon, 29 Sep 2014 22:31:23 +0000 (15:31 -0700)]
vc4: Shove some depth comparison bits down to where they're used.
Matt Turner [Sat, 30 Aug 2014 04:10:32 +0000 (21:10 -0700)]
i965: Use BRW_MATH_DATA_SCALAR when source regioning is scalar.
Notice the mistaken (but harmless) argument swapping in brw_math_invert().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 27 Sep 2014 16:42:33 +0000 (09:42 -0700)]
i965/compaction: Move variable declarations to their uses.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Matt Turner [Sat, 27 Sep 2014 16:39:49 +0000 (09:39 -0700)]
i965/compaction: Simplify jump target code.
My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to a complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Rob Clark [Mon, 29 Sep 2014 18:29:04 +0000 (14:29 -0400)]
freedreno/a3xx: re-emit shaders on variant change
We need to keep track of whether a state change other than frag/vert shader
state will require a different shader variant, and if
necessary mark the appropriate shader state as dirty. Otherwise we will
forget to re-emit the shader state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 29 Sep 2014 12:52:16 +0000 (08:52 -0400)]
freedreno/ir3: add some cmdline args
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 29 Sep 2014 14:44:46 +0000 (10:44 -0400)]
freedreno/a3xx: add support to emulate GL_CLAMP
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 29 Sep 2014 13:59:27 +0000 (09:59 -0400)]
freedreno: add texcoord clamp support to lowering
This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 29 Sep 2014 18:55:38 +0000 (14:55 -0400)]
freedreno: move bind_sampler_states to per-generation
Keep the existing function as a common helper. But this lets us move an
a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 27 Sep 2014 22:19:54 +0000 (18:19 -0400)]
freedreno/a3xx: fix border color order
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 26 Sep 2014 14:33:35 +0000 (10:33 -0400)]
freedreno/a3xx: add 32bit integer vtx formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Eric Anholt [Mon, 29 Sep 2014 19:34:17 +0000 (12:34 -0700)]
vc4: Add support for GL 1.1's stupid CLAMP mode.
We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
Eric Anholt [Mon, 29 Sep 2014 19:26:21 +0000 (12:26 -0700)]
vc4: Add support for texture border color.
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
Eric Anholt [Mon, 29 Sep 2014 18:33:13 +0000 (11:33 -0700)]
vc4: Add the necessary stubs for occlusion queries.
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
Eric Anholt [Thu, 25 Sep 2014 04:57:06 +0000 (21:57 -0700)]
vc4: Optimize out silly SUBs of 0.
Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test,
which I was looking at because it's failing to register allocate.
Eric Anholt [Thu, 25 Sep 2014 05:03:06 +0000 (22:03 -0700)]
vc4: Dump constant uniform values in VC4_DEBUG=qir.
Definitely helps when trying to understand and optimize a program.
Eric Anholt [Thu, 25 Sep 2014 04:37:12 +0000 (21:37 -0700)]
vc4: Turn a SEL_X_Y(x, 0) into SEL_X_0(x).
This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
Eric Anholt [Sun, 28 Sep 2014 01:57:20 +0000 (18:57 -0700)]
vc4: Add support for texture cube maps.
It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
Eric Anholt [Sun, 28 Sep 2014 02:10:34 +0000 (19:10 -0700)]
vc4: Rename the slice's size0.
In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.
Eric Anholt [Mon, 29 Sep 2014 16:39:46 +0000 (09:39 -0700)]
vc4: Stop trying to reuse temporaries that store uniform values.
Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
Tapani Pälli [Mon, 29 Sep 2014 12:02:57 +0000 (15:02 +0300)]
egl: setup screen iterator before using it
Commit 4ed23fd broke creation of pbuffer surfaces; this patch fixes
the failure, which was noticed when running Chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Chia-I Wu [Mon, 29 Sep 2014 08:58:14 +0000 (16:58 +0800)]
ilo: fix a missing 'else'
An 'else' is missing in the disassembler.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Kalyan Kondapally [Sat, 27 Sep 2014 02:44:06 +0000 (19:44 -0700)]
glsl: Allow texture2DProjLod and textureCubeLod in GL ES
According to the GLES spec (i.e. 1.0 and above), textureCubeLod and
texture2DProjLod are built-in functions. We seem to disable support
for these functions with GLES. This patch enables the support.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
Rob Clark [Sun, 28 Sep 2014 16:45:23 +0000 (12:45 -0400)]
configure.ac: bump libdrm_freedreno requirement
We need 2.4.57 for fd_bo_dmabuf() / fd_bo_from_dmabuf().
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Matt Turner [Sun, 7 Sep 2014 07:41:41 +0000 (00:41 -0700)]
glsl: Recognize open-coded pow(x, y).
pow(x, y) is equivalent to exp(log(x) * y).
instructions in affected programs: 578 -> 458 (-20.76%)
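The identity holds for x > 0 and any base b used consistently (for example exp2/log2), since b^(log_b(x) * y) = x^y. A pattern matcher for it can be sketched on a hypothetical expression tree; Mesa's GLSL IR (ir_expression) is considerably richer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree for illustration only. */
enum op { OP_VAR, OP_EXP, OP_LOG, OP_MUL, OP_POW };

struct expr {
   enum op op;
   struct expr *a, *b; /* operands; b is unused for unary ops */
};

/* If e is exp(log(x) * y) or exp(y * log(x)), rewrite it in place to
 * pow(x, y) and return 1; otherwise leave it alone and return 0. */
static int
try_recognize_pow(struct expr *e)
{
   if (e->op != OP_EXP || e->a == NULL || e->a->op != OP_MUL)
      return 0;

   struct expr *mul = e->a;
   assert(mul->a != NULL && mul->b != NULL);

   struct expr *log_side, *other;
   if (mul->a->op == OP_LOG) {
      log_side = mul->a;
      other = mul->b;
   } else if (mul->b->op == OP_LOG) {
      log_side = mul->b;
      other = mul->a;
   } else {
      return 0;
   }

   e->op = OP_POW;
   e->a = log_side->a; /* x */
   e->b = other;       /* y */
   return 1;
}
```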
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 8 Sep 2014 19:09:44 +0000 (12:09 -0700)]
i965/fs: Don't invalidate live intervals in saturate propagation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Matt Turner [Mon, 8 Sep 2014 19:06:49 +0000 (12:06 -0700)]
i965/fs: Ignore mov.sat instructions in interference check in sat prop.
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
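In a simplified model, the interference check now only cares whether some reader of the value is not a mov.sat. The reader structure and function are hypothetical; the real pass also checks types, control flow and other conditions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one instruction that reads the producer's result. */
struct reader {
   bool is_mov_sat;
};

/* Moving the saturate into the producer is safe only if every reader
 * would have saturated the value anyway. Another mov.sat reading an
 * already-saturated value produces the same result, so it no longer
 * blocks the propagation. */
static bool
can_propagate_saturate(const struct reader *readers, int n)
{
   assert(n >= 0);
   for (int i = 0; i < n; i++) {
      if (!readers[i].is_mov_sat)
         return false;
   }
   return n > 0;
}
```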
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Matt Turner [Mon, 8 Sep 2014 19:05:25 +0000 (12:05 -0700)]
i965/fs: Walk instructions in reverse in saturate propagation.
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rob Clark [Fri, 26 Sep 2014 14:33:11 +0000 (10:33 -0400)]
freedreno/a3xx: add flat interpolation mode
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 27 Sep 2014 15:45:26 +0000 (11:45 -0400)]
freedreno/a3xx: add LOD_BIAS
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 26 Sep 2014 21:56:08 +0000 (17:56 -0400)]
freedreno: turn missing caps into compile warnings
Get rid of the 'default' case (as suggested by imirkin) so compiler
warns us about missing caps. Also add some caps that were missing until
now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 26 Sep 2014 19:40:35 +0000 (15:40 -0400)]
freedreno: we have more than 0 viewports!
4155d1c7 'st/mesa: drop dependence on API profile in st_init_extensions'
broke freedreno because somehow 'PIPE_CAP_MAX_VIEWPORTS' fell through
the cracks. As a result, we reported zero viewports. So the state
tracker never bothered to give us any valid viewport!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Thu, 25 Sep 2014 21:14:05 +0000 (17:14 -0400)]
freedreno: update generated headers
Among other things, fixes a bug for fixed point registers/bitfields.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 26 Sep 2014 14:35:52 +0000 (10:35 -0400)]
freedreno: don't advertise mirror-clamp support
At least on a3xx, we cannot do it without some emulation in shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 26 Sep 2014 14:35:33 +0000 (10:35 -0400)]
freedreno: fix compiler warning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tom Stellard [Thu, 25 Sep 2014 19:55:40 +0000 (12:55 -0700)]
configure.ac: Compute LLVM_VERSION_PATCH using llvm-config
This is the only guaranteed way to get the patch level for llvm,
since the define cannot always be found in config.h depending
on the version of llvm or the build system used.
CC: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
Emil Velikov [Thu, 25 Sep 2014 14:26:13 +0000 (15:26 +0100)]
Remove Bluegene/L wrappers
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
any more since the removal of the static makefiles.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emil Velikov [Thu, 25 Sep 2014 13:16:16 +0000 (14:16 +0100)]
mesa: remove last DJGPP remains
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emil Velikov [Mon, 22 Sep 2014 17:58:19 +0000 (18:58 +0100)]
configure: use explicit enabled/disabled in config switch description
Rather than using double negatives such as "disable-opencl, default=no",
simply use enabled/disabled. This makes things a bit easier for the
reader and consistent throughout the file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 22 Sep 2014 17:41:58 +0000 (18:41 +0100)]
configure: ask vdpau.pc for the default location of the vdpau drivers
Rather than using hardcoded values, honor the value set at libvdpau
build time - i.e. the moduledir variable from vdpau.pc
Update the omx description to match reality while we're here.
Cc: Christian König <deathsimple@vodafone.de>
Cc: Alexandre Demers <alexandre.f.demers@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80615
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 22 Sep 2014 17:32:03 +0000 (18:32 +0100)]
configure: drop --with-egl-driver-dir switch
The location of the egl driver(s) is a matter that we should have
never exposed to the user. Currently the dri2 driver is built
into the libEGL loader, with the gallium based one soon to follow.
v2: Fold EGL_DRIVER_INSTALL_DIR within the makefiles. Suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80615
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 22 Sep 2014 17:12:00 +0000 (18:12 +0100)]
configure: remove non-functional --with-opencl-libdir
The parameter used to control where the gallium pipe-drivers
were installed, but was broken since
commit
45270fb0fd1abd7619933c2845f9dc74cdfbe6fd
Author: Matt Turner <mattst88@gmail.com>
Date: Thu Sep 13 10:45:01 2012 -0700
targets/pipe-loader: Convert to automake
Considering that nowadays the pipe-drivers can be used by
more than just the opencl target, even fixing this up will
not be the best idea.
Cc: Matt Turner <mattst88@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61415
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 10 Sep 2014 22:19:43 +0000 (15:19 -0700)]
glsl: Strip arrayness from ir_type_dereference_variable too
If the thing being dereferenced is a record or an array of records, it
should be treated as row-major. The ir_type_dereference_record path
already does this, and I think I intended to do the same for this path
in b17a4d5d.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83741
Cc: mesa-stable@lists.freedesktop.org
Ian Romanick [Wed, 10 Sep 2014 17:54:55 +0000 (10:54 -0700)]
glsl: Round struct size up to at least 16 bytes
Per std140 layout rule #9, the size of the structure is vec4 aligned. The
MAX2 in the loop ensures that sizes >= 16 bytes are vec4 aligned. The new
MAX2 after the loop ensures that sizes < 16 bytes are vec4 aligned.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Cc: mesa-stable@lists.freedesktop.org
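To make rule #9 concrete, here is a simplified C model of std140 struct
sizing. It is a sketch, not Mesa's glsl_type code: the `member` type and
`std140_struct_size()` are hypothetical, and it assumes each member's base
alignment and size are already known. A struct holding a single float comes
out at 16 bytes rather than 4, which is the case the new MAX2 after the loop
covers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Round value up to the next multiple of align. */
unsigned align_up(unsigned value, unsigned align)
{
   assert(align != 0);
   return (value + align - 1) / align * align;
}

/* Hypothetical member description: std140 base alignment and size in bytes. */
struct member {
   unsigned align;
   unsigned size;
};

/* Simplified model of std140 rule #9 (not Mesa's actual code): each member
 * starts at its own base alignment, the struct's alignment is the largest
 * member alignment rounded up to a vec4 (16 bytes), and the struct's size
 * is padded to a multiple of that alignment. */
unsigned std140_struct_size(const struct member *members, size_t count)
{
   unsigned size = 0;
   unsigned max_align = 0;

   for (size_t i = 0; i < count; i++) {
      size = align_up(size, members[i].align);
      size += members[i].size;
      max_align = MAX2(max_align, members[i].align);
   }

   /* The MAX2 with 16 handles structs whose members are all smaller than
    * a vec4; without it a one-float struct would report 4 bytes. */
   return align_up(size, MAX2(max_align, 16u));
}
```

For example, `{ float }` yields 16 bytes and `{ vec4, float }` yields 32.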
Ian Romanick [Tue, 9 Sep 2014 01:25:15 +0000 (18:25 -0700)]
glsl: Make sure row-major array-of-structure get correct layout
Whether or not the field is row-major (because it might be a bvec2 or
something) does not affect the array itself. We need to know whether an
array element in its entirety is row-major.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83506
Cc: mesa-stable@lists.freedesktop.org
Ian Romanick [Mon, 8 Sep 2014 19:23:39 +0000 (12:23 -0700)]
glsl: Make sure fields after small structs have correct padding
Previously the linker would correctly calculate the layout, but the
lower_ubo_reference pass would not apply correct alignment to fields
following small (less than 16-byte) nested structures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83533
Cc: mesa-stable@lists.freedesktop.org
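A hypothetical worked example of the padding involved, assuming only
single-precision members (so no base alignment exceeds 16 bytes): in
`struct { struct { float a; } s; float b; }`, `s` is padded to 16 bytes, so
`b` belongs at offset 16 rather than directly after the 4 bytes of `a`. The
helper below is illustrative only and is not the lower_ubo_reference code.

```c
#include <assert.h>

/* Round value up to the next multiple of align. */
unsigned align_up(unsigned value, unsigned align)
{
   assert(align != 0);
   return (value + align - 1) / align * align;
}

/* Hypothetical helper: std140 offset of the member that follows a nested
 * struct placed at struct_offset whose members span raw_size bytes.
 * Assumes single-precision members only, so the nested struct's alignment
 * is exactly 16 and it is padded to a multiple of 16 bytes. */
unsigned offset_after_nested_struct(unsigned struct_offset, unsigned raw_size,
                                    unsigned next_align)
{
   assert(struct_offset % 16 == 0);
   unsigned padded_size = align_up(raw_size, 16);
   return align_up(struct_offset + padded_size, next_align);
}
```

Here `offset_after_nested_struct(0, 4, 4)` gives 16, the offset of `b`.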
Chia-I Wu [Thu, 25 Sep 2014 08:54:28 +0000 (16:54 +0800)]
ilo: give gen6_draw_session a better prefix
gen6_draw_session is not GEN dependent. Rename it to ilo_render_draw_session.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 08:41:31 +0000 (16:41 +0800)]
ilo: make ilo_render opaque
It is not used outside the render code. There are also too many details in it
that we do not want other components to access directly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 04:02:33 +0000 (12:02 +0800)]
ilo: make ilo_render_emit_draw() direct
Remove emit_draw() and ILO_RENDER_DRAW indirections. With all emit functions
being direct now, ilo_render_estimate_size() and more can also be removed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 04:02:33 +0000 (12:02 +0800)]
ilo: make ilo_render_emit_rectlist() direct
Remove emit_rectlist() and ILO_RENDER_RECTLIST indirections.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 05:22:19 +0000 (13:22 +0800)]
ilo: clean up draw and rectlist state emission
Add these new high-level functions for draw and rectlist state emission:
  ilo_render_get_draw_dynamic_states_len()
  ilo_render_emit_draw_dynamic_states()
  ilo_render_get_rectlist_dynamic_states_len()
  ilo_render_emit_rectlist_dynamic_states()
  ilo_render_get_draw_surface_states_len()
  ilo_render_emit_draw_surface_states()
They are implemented in the new ilo_render_dynamic.c and ilo_render_surface.c.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 04:44:53 +0000 (12:44 +0800)]
ilo: sanity check ilo_render_get_*_len()
Assert that we never write more than what ilo_render_get_*_len() returns.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
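The get_len/emit pairing and this sanity check can be illustrated with a
minimal sketch. `struct writer`, `get_cmd_len()` and `emit_cmd()` are
hypothetical stand-ins, not ilo_builder or ilo_render functions, and the
header value is illustrative. The point is only that space is reserved from
the `_len()` function and the emitter asserts it stayed within that bound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command writer; not ilo's API. */
struct writer {
   uint32_t buf[64];
   unsigned used; /* DWords written so far */
};

/* Upper bound, in DWords, of what emit_cmd() may write. */
unsigned get_cmd_len(void)
{
   return 4;
}

/* Emit a hypothetical four-DWord command. */
void emit_cmd(struct writer *w)
{
   const unsigned start = w->used;
   const unsigned len = get_cmd_len();

   /* Reserve space based on the advertised length. */
   assert(start + len <= sizeof(w->buf) / sizeof(w->buf[0]));

   w->buf[w->used++] = 0x7a000002; /* illustrative header DWord */
   w->buf[w->used++] = 0;
   w->buf[w->used++] = 0;
   w->buf[w->used++] = 0;

   /* Sanity check: never write more than get_cmd_len() promised. */
   assert(w->used - start <= len);
}
```

If `emit_cmd()` later grows without `get_cmd_len()` being updated, the final
assertion fires in debug builds instead of silently overrunning the
reservation.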
Chia-I Wu [Thu, 25 Sep 2014 04:32:21 +0000 (12:32 +0800)]
ilo: simplify ilo_render_get_query_len()
For all supported query types, we always emit a PIPE_CONTROL. Call
ilo_render_get_flush_len() for simplicity and clarity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 04:10:00 +0000 (12:10 +0800)]
ilo: make ilo_render_emit_query() direct
Remove emit_query() and ILO_RENDER_QUERY indirections.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 04:02:33 +0000 (12:02 +0800)]
ilo: make ilo_render_emit_flush() direct
Remove emit_flush() and ILO_RENDER_FLUSH indirections.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Wed, 24 Sep 2014 07:24:25 +0000 (15:24 +0800)]
ilo: simplify ilo_render invalidation
ilo_render is based on ilo_builder. We should only care whether the builder
buffers are invalidated or the hardware context is invalidated. Replace the
flags-based ilo_render_invalidate() with ilo_render_invalidate_builder() and
ilo_render_invalidate_hw().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
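A hypothetical sketch of why two entry points can be clearer than a flags
argument: invalidating the builder only drops cached offsets into the builder
buffers, while invalidating the hardware only marks the GPU context as
needing full re-emission. The struct fields and function names below are
illustrative assumptions, not ilo's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical render state; names are illustrative only. */
struct render {
   unsigned cc_state_offset; /* offset of a cached state in builder buffers */
   bool have_cc_state;       /* whether cc_state_offset is still valid */
   bool hw_context_valid;    /* whether the GPU still holds our state */
};

/* Builder buffers were reset, so every recorded offset into them is stale. */
void render_invalidate_builder(struct render *r)
{
   assert(r != NULL);
   r->have_cc_state = false;
   r->cc_state_offset = 0;
}

/* The hardware context was lost, so all hardware state must be re-emitted. */
void render_invalidate_hw(struct render *r)
{
   assert(r != NULL);
   r->hw_context_valid = false;
}
```

Each function touches only its own concern, so callers no longer need to
pick the right combination of flags.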
Chia-I Wu [Thu, 25 Sep 2014 06:53:34 +0000 (14:53 +0800)]
ilo: add ilo_builder_{dynamic,surface}_used()
Return how many DWords are used in the dynamic and surface buffers, respectively.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 25 Sep 2014 06:46:33 +0000 (14:46 +0800)]
ilo: rename state buffer to dynamic buffer
Both the dynamic buffer and the surface buffer are state buffers, so we should
not use "state buffer" to refer to the former alone.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>