review.tizen.org Git - profile/ivi/mesa.git/log

radeon/llvm: Update and fix some comments

radeonsi: Remove use.sgpr* intrinsics, use load instructions instead

We now model loading uses sgpr values with LLVM IR load instructions that
use the USER_SGPR address space.

The definition of the sgpr parameter to the use_sgpr() helper function
in radeonsi_shader.c has changed so that you can pass raw sgpr values
rather than having to divide the sgpr value you want to use by the dword
width of the type you want to load.

radeonsi: Handle TGSI CONST registers

We now emit LLVM load instructions for TGSI CONST register reads,
which are lowered in the backend to S_LOAD_DWORD* instructions.

radeon/llvm: Remove AMDILIntrinsicInfo::GetDeclaration fuction body

This function was causing compile errors in the tablegen'd code for
some intrinsic definitions. I don't think we really need this function,
so I'm removing the function body just as a temporary solution. I'll
look into removing the entire AMDILIntrinsicInfo class later.

radeon/llvm: Remove AMDILTargetMachine

nouveau: unreference fences on resource destruction

nvc0: optimize blend cso by checking which by-RT data actually differs

Can save about 200 bytes of command buffer space.

nvc0: don't upload UCPs if the shader doesn't use them

nvc0/ir: allow 64-bit constant loads on nve4

Looks like only 128-bit access doesn't work.

nvc0/ir: fix texture barrier insertion to prevent WAW hazards

Fixes, for instance, object highlighting in Diablo 3 (wine).

nvc0/ir: TEX doesn't support JOIN modifier either

gallium: add st_api feature mask to prevent advertising MS visuals

v2: use a define for the maximum sample count
v3: also test odd sample counts (r300 supports MS3)

While multisample renderbuffers are supported by mesa, MS visuals
are not, so we need a way to tell dri/st not to advertise them even
if the gallium driver does support multisampled surfaces.

Otherwise applications selecting these non-functional visuals would
run into trouble ...

Reviewed-by: Brian Paul <brianp@vmware.com>

nv30: Fix generic passing to fragment program in NV34.

nv30: handle user index buffers

radeon/llvm: Use a custom inserter for MASK_WRITE

radeon/llvm: Use tablegen pattern to lower bitconvert

radeon/llvm: Use a custom inserter to lower FNEG

radeon/llvm: Use a custom inserter to lower CLAMP

radeon/llvm: Use a custom inserter to lower FABS

r600g: handle R16G16B16_FLOAT and R32G32B32_FLOAT in translate_colorswap

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=50318

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>

draw: fix primitive restart bug by using the index buffer offset

The code which scans the index buffer for restart indexes wasn't adding
the index buffer offset so we were always starting at offset=0. The
offset is usually zero so it wasn't noticed before.

Fixes a failure in the piglit primitive-restart test when testing
vertex data + index data in a single VBO.

NOTE: This is a candidate for the 8.0 branch.

svga: remove the special zero-stride vertex array code

This code actually hasn't been needed for some time now. We can just
treat a zero-stride vertex array like any other non-zero-stride array.

gallium/docs: beef up the docs related to color clamping

Reviewed-by: Marek Olšák <maraeo@gmail.com>

util: add GALLIUM_LOG_FILE option for logging output to a file

Useful for logging different runs to files and diffing, etc.

i965/msaa: Enable 4x MSAA on Gen7.

Basic 4x MSAA support now works on Gen7.  This patch enables it.

As with Gen6, MSAA support is still fairly preliminary.  In
particular, the following are not yet supported:
- 8x oversampling (Gen7 has hardware support for this, but we do not
  yet expose it).
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centrold interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
  GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
  GL_SAMPLE_COVERAGE_INVERT).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Implement manual blending operation for Gen7.

On Gen6, the blending necessary to blit an MSAA surface to a non-MSAA
surface could be accomplished with a single texturing operation.  On
Gen7, the WM program must fetch each sample and blend them together
manually.  From the Bspec (Shared Functions/Messages/Initiating
Message/Message Types/sample):

    [DevIVB+]:Number of Multisamples on the associated surface must be
    MULTISAMPLECOUNT_1.

This patch implements the manual blend operation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Modify blorp code to account for Gen7 MSAA layouts.

Since blorp uses color textures and render targets to do all its work
(even when blitting stencil and depth data), it always has to
configure the Gen7 GPU to use the new "sliced" MSAA layout. However,
when blitting stencil or depth data, the actual MSAA layout is
interleaved (as in Gen6). Therefore, blorp has to do extra coordinate
transformation work to account for the interleaving manually.

This patch causes blorp to perform the necessary extra coordinate
transformations.

It also modifies the blorp SURFACE_STATE setup code for Gen7, so that
it does not try to correct the surface width and height to account for
MSAA, since "sliced" MSAA layout doesn't affect the surface width or
height.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Validate Gen7 surface state constraints.

When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constaints come in to play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Properly handle sliced layout for Gen7.

Starting in Gen7, there are two possible layouts for MSAA surfaces:

- Interleaved, in which additional samples are accommodated by scaling
  up the width and height of the surface.  This is the only layout
  available in Gen6.  On Gen7 it is used for depth and stencil
  surfaces only.

- Sliced, in which the surface is stored as a 2D array, with array
  slice n containing all pixel data for sample n.  On Gen7 this layout
  is used for color surfaces.

The "Sliced" layout has an additional requirement: it must be used in
ARYSPC_LOD0 mode, which means that the surface doesn't leave any extra
room between array slices for miplevels other than 0.

This patch modifies the surface allocation functions to use the
correct layout when allocating MSAA surfaces in Gen7, and to set the
array offsets properly when using ARYSPC_LOD0 mode.  It also modifies
the code that populates SURFACE_STATE structures to ensure that
ARYSPC_LOD0 mode is selected in the appropriate circumstances.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Add defines for Gen7.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Enable blorp blits on Gen7.

Gen7 support for blorp (blits using the render bath) now works for
non-MSAA purposes. This patch enables it.

Since blorp operations re-use the logic for HiZ ops, this required
adding a case to the switch statement in gen7_blorp_emit_wm_config(),
to allow for the case where no HiZ op is being performed.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Implement proper texel fetch messages for Gen7.

On Gen6, texel fetch is always accomplished using the SAMPLE_LD
message, which accepts arguments (u, v, r, lod, si). On Gen7, there
are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
arguments (si, u, v).

*Technically, there are other texel fetch messages, but they are used
for "compressed" MSAA surfaces, which we don't yet support.

This patch adds the proper message types and argument orderings for
Gen7.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Use 16 pixel dispatch on Gen7.

Gen7 hardware requires us to enable at least one WM dispatch mode,
even if there is no program being dispatched to.  When this code was
only used for HiZ operations (which don't use a WM program), we used
32-pixel dispatch, because it didn't matter.  But blit programs are
compiled for 16-pixel dispatch.  So just enable 16-wide dispatch
unconditionally.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Enable 16-wide dispatch unconditionally rather than add the
unnecessary complication of using 32-wide dispatch when there is no WM
program.

i965/blorp: Allocate space for push constants on Gen7.

On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.

This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Set the dynamic state upper bound.

We know from previous bug fixes (commits
c25e5300cba7628b58df93ead14ebc3cc32f338c and
b2ace06cbbbb1021e2d7ace12a985c6406821939) that texture border color
doesn't work if the dynamic state upper bound is set to 0. Although
the blorp engine doesn't make use of texture borders, it seems like we
ought to err on the safe side and set this value properly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Factor gen6_blorp_emit_batch_head into separate functions.

This patch separates out the portions of gen6_blorp_emit_batch_head()
that emit 3DSTATE_MULTISAMPLE, 3DSTATE_SAMPLE_MASK, and
STATE_BASE_ADDRESS. This paves the way for making the blorp code work
on Gen7, where additional command packets
(3DSTATE_PUSH_CONSTANT_ALLOC_VS and 3DSTATE_PUSH_CONSTANT_ALLOC_PS)
need to be emitted before 3DSTATE_MULTISAMPLE.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Use MSDISPMODE_PERSAMPLE rendering when necessary

This patch modifies the "blorp" WM program so that it can be run in
MSDISPMODE_PERSAMPLE (which means that every single sample of a
multisampled render target is dispatched to the WM program, not just
every pixel).

Previously we were using the ugly hack of configuring multisampled
destination surfaces as single-sampled, and generating sample indices
other than zero by swizzling the pixel coordinates in the WM program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/blorp: Emit sample index in SAMPLE_LD message when necessary

This patch modifies the function brw_blorp_blit_program::texel_fetch()
to emit the SI (sample index) argument to the SAMPLE_LD message when
reading from a sample index other than zero.

Previously we were using the ugly hack of configuring multisampled
source surfaces as single-sampled, and accessing sample indices other
than zero by swizzling the texture coordinates in the WM program.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Generalize sampling code in preparation for Gen7

This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).

This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).

It will also will allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/msaa: Expand odd-sized MSAA surfaces to account for interleaving pattern.

Gen6 MSAA buffers (and Gen7 MSAA depth/stencil buffers) interleave
MSAA samples in a complex pattern that repeats every 2x2 pixel block.
Therefore, when allocating an MSAA buffer, we need to make sure to
allocate an integer number of 2x2 blocks; if we don't, then some of
the samples in the last row and column will be cut off.

Fixes piglit tests "EXT_framebuffer_multisample/unaligned-blit {2,4}
color msaa" on i965/Gen6.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

gallium/targets: pass ldflags parameter to MKLIB

Without passing the -ldflags parameter before $(LDFLAGS) in some cases
flags will be passed to MKLIB which it does not understand.
This might be -m64, -m32 or similar.

NOTE: This is a candidate for the 8.0 branch.

Signed-off-by: Thomas Gstädtner <thomas@gstaedtner.net>
Signed-off-by: Brian Paul <brianp@vmware.com>

Revert "r600g: set round_mode to truncate and get rid of tgsi_f2i on evergreen"

This reverts commit 60bf0f05b472e66bf1175fcec7a274dab6f7e2a3.

It seems round_mode behaves differently in some cases depending on the
instruction/slot. Reverting it for now.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=50232

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeon/llvm: add FLT_TO_UINT, UINT_TO_FLT instructions

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeon/llvm: prepare to revert the round mode state to default

Use TRUNC before FLT_TO_INT on evergreen/cayman.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeon/llvm: fix sampler index in llvm_emit_tex

Sampler index isn't a second source operand for some tgsi texture
instructions. Let's assume it's always the last.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=50230

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeon/llvm: fix opcode for RECIP_UINT_r600

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=50312

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeon/llvm/loader: convert hardcoded gpu name to option

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

r600g: add RECIP_INT, PRED_SETE_INT to r600_bytecode_get_num_operands

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=50315

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

i915g: Check for geometry shader earlier in i915_set_constant_buffer.

Fix resource leak defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>

scons: Fix SCons build infrastructure for FreeBSD.

This patch gets the FreeBSD SCons build working again. The build still
fails though.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>

radeon/llvm: Lower UDIV using the Selection DAG

radeon/llvm: Remove auto-generated AMDIL->ISA conversion code

radeon/llvm: Remove AMDIL instructions MULHI, SMUL

radeon/llvm: Remove AMDIL bitshift instructions (SHL, SHR, USHR)

radeon/llvm: Remove AMDIL FTOI and ITOF instructions

radeon/llvm: Remove AMDIL EXP* instructions

radeon/llvm: Remove AMDIL ADD instructions

radeon/llvm: Remove AMDIL binary instrutions (OR, AND, XOR, NOT)

radeon/llvm: Remove AMDILMachinePeephole pass

radeon/llvm: Remove AMDIL CMP instructions and associated lowering code

radeon/llvm: Remove AMDIL ROUND_NEAREST instruction

radeon/llvm: Remove AMDIL ROUND_POSINF instruction

radeon/llvm: Add custom SDNode for FRACT

radeon/llvm: Use -1 as true value for SET* integer instructions

radeon/llvm: Handle SETGE_INT, SETGE_UINT, and SETGT_UINT opcodes

Support for these was inadvertently dropped in commit
cee23ab246f22210b3063cdc47bdb45b3d943526

radeon/llvm: Avoid error with SI in EmitInstrWithCustomInserter()

We need to return immediately after inserting instructions that require
S_WAITCNT so that the parent class' custom inserter won't try to insert
them again.

tgsi: Initialize Padding struct fields.

Fix uninitialized scalar variable defects report by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>

i965: Gut the separate OpenGL ES extension enabling.

We should just set the bits of functionality that we support; the
GL/ES1/ES2 flags in extensions.c will take care of advertising the
appropriate extensions for the current API.

This enables the GL_EXT_texture_compression_dxt1 extension on ES1/ES2
when libtxc_dxtn is installed or the force_s3tc driconf option is set.
The main extension code set this up properly, but the ES-specific code
failed to do so.

Otherwise, the extension strings reported by es1_info, es2_info, and
glxinfo all remain the same.

This patch manually disables the ARB_framebuffer_object bit on ES
to preserve the behavior of 1c0f5d8324c4db2720247989ddc4a45315b55a85.

v2: Rebase, fix the i915 Makefile, and unconditionally set the
OES_draw_texture bit as core Mesa will only apply it to ES1 now.

Tested-by: Daniel Charles <daniel.charles@intel.com> [v1]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Remove the OES_draw_texture extension from ES2.

This extension appears to be written against ES 1.0.
In ES 2.0, you really want to be using FBOs instead.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: use cut index to handle primitive restart when possible

If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.

The VBO module's software handling of primitive restart is
used as a fall back.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: add flag to enable cut_index

When brw->prim_restart.enable_cut_index is set, the cut index
will be enabled when uploading index_buffer commands.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: create code path to handle primitive restart in hardware

For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.

The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl/tests: Add .gitignore for uniform initialization unit test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl/constant propagation: kill whole var if LHS involves array indexing.

When considering which components of a variable were killed by an
assignment, constant propagation would previously just use the write
mask of the assignment. This worked if the LHS of the assignment was
simple, e.g.:

v.xy = ...; // (assign (xy) (var_ref v) ...)

But it did the wrong thing if the LHS of the assignment involved an
array indexing operator, since in this case the write mask is always
(x):

v[i] = ...; // (assign (x) (deref_array (var_ref v) (var_ref i)) ...)

In general, we can't predict which vector component will be selected
by array indexing, so the only safe thing to do in this case is to
kill the entire variable.

Fixes piglit tests {fs,vs}-vector-indexing-kills-all-channels.shader_test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl/tests: Add test for uniform initialization by the linker

v2: Put unit tests in src/glsl/tests rather than tests/glsl.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Use initializers to configure samplers

Now that the linker handles initializers of samplers just like any
other uniform, a bunch of this annoying code is unnecessary.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

ir_to_mesa: Don't set initial uniform values again

This work is now done by the linker, so we don't need to keep doing it
here.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

ir_to_mesa: Propagate initial values in _mesa_associate_uniform_storage

The linker may have set initial values for uniforms. Propagate these
values to the driver's backing storage when it is first associated.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Propagate sampler uniform initializers to gl_shader_program::SamplerUnits

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Initialize samplers to 0, propagate sampler values to the gl_program

The spec requires that samplers be initialized to 0. Since this
differs from the 1-to-1 mapping of samplers to texture units assumed
by ARB assembly shaders (and the gl_program structure), be sure to
propagate this date from the gl_shader_program to the gl_program.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CC: Vadim Girlin <vadimgirlin@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49088

glsl: Set initial values for uniforms in the linker

v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.

v3: Minor comment change based on feedback from Ken.

Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen6+: Add support for GL_ARB_blend_func_extended.

v2: Add support for gen6, and don't turn it on if blending is
disabled. (fixes GPU hang), and note it in docs/GL3.txt

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Keep a computed value for dual source blend func with each buffer.

The i965 driver needed this as well for hardware setup, so instead of
duplicating the logic, just save it off.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>

i965/gen6+: Add support for fast depth clears.

Improves citybench high-res performance 3.0% +- 0.4%, n=10. Improves
Lightsmark 1024x768 performance 0.74% +/- 0.20% (n=78). No
significant difference on openarena (n=5, didn't fast clear) or nexuiz
(n=3).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965/gen6: Add CC viewport state setup to blorp code.

While it doesn't have the same warning in the simulator as in gen7,
let's emit it out of paranoia. We wouldn't want our resolves of some
previous clear to get clamped to some current clamping value.

Suggested-by: pretty much everyone

i965/gen7: Add CC viewport setup to blorp code.

When doing fast clears, a fulsim warning said that the batch was being
emitted without the viewport set up.  While the fast clear pass I was
looking at doesn't use the clear value, the later resolves which also
didn't set up the vieport would trigger the same.  It's not obvious
from the error message whether it meant "fast clear value gets clamped
to something you haven't defined" or "fast clear value doesn't get
clamped, and I saw it was out of the current (uninitialized) range,
and you probably wanted it clamped to that (uninitialized) range".  Be
paranoid and assume the first case.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Drop a layer of indirection in doing HiZ resolves.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Replace intel_need_resolve with the hiz ops it maps to.

Having this enum separate caused us to need a bunch of helper
functions to translate to the op to be executed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Add an interface for doing hiz ops from C code.

This required moving gen6_hiz_op, and I put it in intel_resolve_map.h
for the next commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Rename the clear function for this driver.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Simplify the remaining clear logic by relying on the meta clear.

The GLSL clear path doesn't need any buffer presence checks, since
those are already handled in the normal drawing path code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Switch blit color clears to tri clears on gen4/5.

Our understanding is that the 3D engine is supposed to be faster
anyway. We used to have more overhead in our tri clear path than we
do today, which would have led to this choice. But given that we
almost always see a depth clear along with a color clear, the path was
hardly exercised anyway.

Also, the color mask logic was broken in the presence of
GL_EXT_draw_buffers2's per-buffer colormask.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Remove dead logic for non-tri depth/stencil clears.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: We always have GLSL, so always use it for tri clears.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i915: Drop gen4+ code from the forked clear code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

intel: Fork the intel_clear.c file between i915 and i965.

This logic is wasted on i965 when we want to just always do GLSL tri
clears.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

st/mesa: set stObj->lastLevel in guess_and_alloc_texture

Fixes lockups/asserts with depthstencil-render-miplevels tests and r600g.
Should also fix https://bugs.freedesktop.org/show_bug.cgi?id=50033

NOTE: This is a candidate for the 8.0 branch.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

i965: Completely annotate the batch bo when aub dumping.

Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generate .aub file would be unannotated, making further data analysis
difficult.

This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.

The new annotation mechanism requires DRM version 2.4.34.

Reviewed-by: Eric Anholt <eric@anholt.net>

intel: When AUB dumping, flush before emitting final bitmap command.

When we are generating an AUB dump, we make a final call to
aub_dump_bmp() as the context is being destroyed, to ensure that any
rendering performed before the application exits can be seen during a
simulation run. However, we were doing this before flushing the batch
buffer; as a result simulation runs would not always see the effect of
all rendering commands.

This patch flushes the batch buffer just before making the final call
to aub_dump_bmp(), to ensure that all rendering is properly captured
in the final bitmap.

llvmpipe: Fix alpha testing precision on rgba8 formats.

This is a long standing problem, that recently surfaced with the change
to enable perspective correct color interpolation.

A fix for all possible formats is left to the future.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>