review.tizen.org Git - platform/upstream/mesa.git/log

Hyunjun Ko [Wed, 17 Oct 2018 12:57:27 +0000 (21:57 +0900)]

freedreno: adds nondraw param to fd_bc_alloc_batch

The nondraw flag needs to be specified when creating a batch through
fd_bc_alloc_batch, since it is better to create a batch through it
than through fd_batch_create.

Signed-off-by: Rob Clark <robdclark@gmail.com>

Rob Clark [Fri, 12 Oct 2018 20:27:59 +0000 (16:27 -0400)]

freedreno/a6xx: remove fd6_emit_render_cntl()

It was dead code carried over from a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>

Rob Clark [Sat, 15 Sep 2018 18:41:07 +0000 (14:41 -0400)]

freedreno/ir3: fix broken texcoord inputs

TODO: not sure if this is the best solution, but the current logic is broken
for texcoord inputs. It is definitely the simplest solution.

Fixes: 1a24f519663 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>

Rob Clark [Sat, 13 Oct 2018 16:34:09 +0000 (12:34 -0400)]

freedreno: fix off-by-one error in BEGIN_RING()

Signed-off-by: Rob Clark <robdclark@gmail.com>

Marek Olšák [Sun, 23 Sep 2018 00:03:27 +0000 (20:03 -0400)]

util: document a limitation of util_fast_udiv32

trivial

Matt Turner [Thu, 6 Sep 2018 18:15:55 +0000 (11:15 -0700)]

i965/fs: Add 64-bit int immediate support to dump_instructions()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Marek Olšák [Sun, 7 Oct 2018 02:44:36 +0000 (22:44 -0400)]

radeonsi: track context rolls better for the Vega scissor bug workaround

We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.

Marek Olšák [Sun, 7 Oct 2018 02:53:33 +0000 (22:53 -0400)]

radeonsi: emit sample locations for 1xAA only when the hw bug is present

Marek Olšák [Fri, 24 Aug 2018 04:29:04 +0000 (00:29 -0400)]

radeonsi: use compute shaders for clear_buffer & copy_buffer

Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.

Marek Olšák [Sun, 7 Oct 2018 00:56:32 +0000 (20:56 -0400)]

radeonsi: use copy_buffer in buffer_do_flush_region directly

Marek Olšák [Sun, 23 Sep 2018 02:02:32 +0000 (22:02 -0400)]

radeonsi: use faster integer division for instance divisors

We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.

This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords instead of 1.

This probably won't affect any apps.
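
A minimal sketch of the precompute-then-multiply idea, for illustration only.
It uses the classic round-up scheme for unsigned division by an invariant
divisor rather than the code from this commit; the names udiv_factors,
udiv_precompute and udiv_apply are hypothetical, and it needs a subtract and
three stored values instead of the four dwords mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the factors derived from one divisor. */
struct udiv_factors {
    uint32_t multiplier;
    uint32_t shift1;
    uint32_t shift2;
};

/* Precompute the factors once per divisor; d must be non-zero. */
static struct udiv_factors udiv_precompute(uint32_t d)
{
    assert(d != 0);

    /* l = ceil(log2(d)) */
    uint32_t l = 0;
    while (((uint64_t)1 << l) < d)
        l++;

    struct udiv_factors f;
    /* floor(2^32 * (2^l - d) / d) + 1, which always fits in 32 bits. */
    f.multiplier = (uint32_t)(((((uint64_t)1 << l) - d) << 32) / d + 1);
    f.shift1 = l < 1 ? l : 1;
    f.shift2 = l > 0 ? l - 1 : 0;
    return f;
}

/* Compute n / d using only a high multiply, add, subtract and two shifts. */
static uint32_t udiv_apply(uint32_t n, struct udiv_factors f)
{
    uint32_t t = (uint32_t)(((uint64_t)n * f.multiplier) >> 32); /* mul_hi */
    return (t + ((n - t) >> f.shift1)) >> f.shift2;
}
```

In a setup like the one described above, udiv_precompute would run once on
the CPU when a divisor is uploaded, leaving only udiv_apply per element.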

Marek Olšák [Sun, 23 Sep 2018 01:17:52 +0000 (21:17 -0400)]

ac: add helpers for fast integer division by a constant

Marek Olšák [Sat, 29 Sep 2018 01:43:49 +0000 (21:43 -0400)]

radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports

Marek Olšák [Sat, 29 Sep 2018 00:57:07 +0000 (20:57 -0400)]

radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband

We'll modify the quant mode there, which also affects the guardband
computation.

Marek Olšák [Sat, 29 Sep 2018 00:38:26 +0000 (20:38 -0400)]

radeonsi: don't re-upload the sample position constant buffer repeatedly

Marek Olšák [Sat, 29 Sep 2018 00:16:13 +0000 (20:16 -0400)]

radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally

Marek Olšák [Fri, 28 Sep 2018 22:49:29 +0000 (18:49 -0400)]

radeonsi: center viewport to improve guardband clipping for high resolutions

This will be more useful when we change the quant mode to increase subpixel
precision and decrease the viewport range (which might not be possible
if the viewport is not centered in the viewport range).

Marek Olšák [Sat, 29 Sep 2018 23:28:20 +0000 (19:28 -0400)]

radeonsi: save raster config in screen, add se_tile_repeat

Marek Olšák [Fri, 28 Sep 2018 04:38:10 +0000 (00:38 -0400)]

radeonsi: switch back to standard DX sample positions

Apps may rely on them.

Marek Olšák [Sun, 11 Sep 2016 20:15:04 +0000 (22:15 +0200)]

radeonsi: add GDS support to CP DMA

Marek Olšák [Fri, 21 Sep 2018 07:41:18 +0000 (03:41 -0400)]

radeonsi: rename si_gfx_* functions to si_cp_*

and write_event_eop -> release_mem

Marek Olšák [Fri, 21 Sep 2018 07:36:32 +0000 (03:36 -0400)]

radeonsi: make si_gfx_write_event_eop more configurable

Sergii Romantsov [Wed, 19 Sep 2018 16:21:11 +0000 (19:21 +0300)]

anv/skylake: disable ForceThreadDispatchEnable

On Skylake, enabling ForceThreadDispatchEnable causes a GPU hang.

-v2: ForceThreadDispatchEnable is enabled only for gen8; for gen9 and
higher, revert to enabling PixelShaderHasUAV.

-v3 (Jason Ekstrand): Rework the comments a bit.

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Lionel Landwerlin [Sun, 14 Oct 2018 12:12:50 +0000 (13:12 +0100)]

anv: Implement VK_EXT_pci_bus_info

Even though Intel GPUs are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Jose Fonseca [Fri, 12 Oct 2018 09:21:38 +0000 (10:21 +0100)]

appveyor: Cache pip's cache files.

It should speed up the Python packages installation.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Jose Fonseca [Fri, 12 Oct 2018 09:09:07 +0000 (10:09 +0100)]

appveyor: Update to newer Mako/winflexbison versions.

As that's what most people are bound to use.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Jose Fonseca [Fri, 12 Oct 2018 08:52:52 +0000 (09:52 +0100)]

appveyor: Update to MSVC 2017.

That's what we (and I suppose most people out there) are using now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Samuel Pitoiset [Tue, 16 Oct 2018 07:42:42 +0000 (09:42 +0200)]

radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT

This feature isn't used for now, so disable it until
wwm is fixed in LLVM.

Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Oct 2018 09:30:13 +0000 (11:30 +0200)]

radv: implement buffer to image operations for R32G32B32

This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Alex Smith [Mon, 15 Oct 2018 14:50:20 +0000 (15:50 +0100)]

ac/nir: Use context-specific LLVM types

LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.

Fixes frequent crashes seen when doing multithreaded pipeline creation.

Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Vadym Shovkoplias [Tue, 9 Oct 2018 16:09:10 +0000 (19:09 +0300)]

glsl: Check the subroutine associated functions names

Add a compile-time check for subroutine functions with
the same name. A similar check for intrastage linking already
landed in commit 5f0567a4f60.

From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification:

    "A program will fail to compile or link if any shader
     or stage contains two or more functions with the same
     name if the name is associated with a subroutine type."

Fixes:
    * no-overloads.vert

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
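
The rule quoted from the spec can be sketched as a simple pairwise name
check. This is an illustration under assumed data structures, not the
compiler's actual code: struct func_decl and has_subroutine_name_clash are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function declaration in one shader. */
struct func_decl {
    const char *name;
    bool is_subroutine; /* true if associated with a subroutine type */
};

/* Returns true if two functions share a name and at least one of them
 * is associated with a subroutine type. */
static bool has_subroutine_name_clash(const struct func_decl *funcs,
                                      size_t count)
{
    assert(funcs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (strcmp(funcs[i].name, funcs[j].name) != 0)
                continue;
            if (funcs[i].is_subroutine || funcs[j].is_subroutine)
                return true;
        }
    }
    return false;
}
```

Ordinary overloads (same name, no subroutine association) pass this check;
only a shared name involving a subroutine function is reported.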

Vadym Shovkoplias [Wed, 10 Oct 2018 10:51:28 +0000 (13:51 +0300)]

glsl/linker: Change the format of spec quotation

Also there is no "OpenGL ES Shading Language 4.00" spec,
so change it to GLSL 4.00 spec.

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Dave Airlie [Wed, 9 May 2018 03:21:43 +0000 (13:21 +1000)]

nir: fix clip cull lowering to not assert if GLSL already lowered.

If GLSL has already done the lowering, we'd rather not crash in this pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Kenneth Graunke [Mon, 15 Oct 2018 23:02:50 +0000 (16:02 -0700)]

i965: Add PCI IDs for new Amberlake parts that are Coffeelake based

See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel,
where José Roberto de Souza added this new PCI ID there.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Kenneth Graunke [Sun, 19 Aug 2018 17:15:12 +0000 (10:15 -0700)]

intel: disable FS IR validation in release mode.

We probably don't need to iterate, fprintf, and abort in release mode.

Reviewed-by: Matt Turner <mattst88@gmail.com>

Caio Marcelo de Oliveira Filho [Fri, 14 Sep 2018 18:41:39 +0000 (11:41 -0700)]

nir: Copy propagation between blocks

Extend the pass to propagate the copies information along the control
flow graph.  It performs two walks: the first collects the vars
that were written inside each node; the second applies copy
propagation using the list of copies previously available.  At each node
the list is invalidated according to results from the first walk.

This approach is simpler than a full data-flow analysis, but covers
various cases.  If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.

A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref).  However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'.  The current patch is a good
intermediate step towards more complex analysis.

The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).

Annotated shader-db results for Skylake:

    total instructions in shared programs: 15105796 -> 15105451 (<.01%)
    instructions in affected programs: 152293 -> 151948 (-0.23%)
    helped: 96
    HURT: 17

        All the HURTs and many HELPs are one instruction.  Looking
        at pass by pass outputs, the copy prop kicks in removing a
        bunch of loads correctly, which ends up altering which
        other optimizations kick in.  In those cases the copies would be
        propagated after lowering to SSA.

        In a few HELPs we are actually doing more than was
        possible previously, e.g. consolidating load_uniforms from
        different blocks.  Most of those are from
        shaders/dolphin/ubershaders/.

    total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
    cycles in affected programs: 151461830 -> 151367845 (-0.06%)
    helped: 2933
    HURT: 2950

        A lot of noise on both sides.

    total loops in shared programs: 4603 -> 4603 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 11085 -> 11073 (-0.11%)
    spills in affected programs: 23 -> 11 (-52.17%)
    helped: 1
    HURT: 0

        The shaders/dolphin/ubershaders/12.shader_test was able to
        pull a couple of loads from inside if statements and reuse
        them.

    total fills in shared programs: 23143 -> 23089 (-0.23%)
    fills in affected programs: 2718 -> 2664 (-1.99%)
    helped: 27
    HURT: 0

        All from shaders/dolphin/ubershaders/.

    LOST:   0
    GAINED: 0

The other generations follow the same overall shape.  The spills and
fills HURTs are all from the same game.

shader-db results for Broadwell:

    total instructions in shared programs: 15402037 -> 15401841 (<.01%)
    instructions in affected programs: 144386 -> 144190 (-0.14%)
    helped: 86
    HURT: 9

    total cycles in shared programs: 600912755 -> 600902486 (<.01%)
    cycles in affected programs: 185662820 -> 185652551 (<.01%)
    helped: 2598
    HURT: 3053

    total loops in shared programs: 4579 -> 4579 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 80929 -> 80924 (<.01%)
    spills in affected programs: 720 -> 715 (-0.69%)
    helped: 1
    HURT: 5

    total fills in shared programs: 93057 -> 93013 (-0.05%)
    fills in affected programs: 3398 -> 3354 (-1.29%)
    helped: 27
    HURT: 5

    LOST:   0
    GAINED: 2

shader-db results for Haswell:

    total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
    instructions in affected programs: 44992 -> 43374 (-3.60%)
    helped: 27
    HURT: 69

    total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
    cycles in affected programs: 7720673 -> 7687588 (-0.43%)
    helped: 1609
    HURT: 1416

    total loops in shared programs: 1830 -> 1830 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 1988 -> 1692 (-14.89%)
    spills in affected programs: 296 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 2103 -> 1668 (-20.68%)
    fills in affected programs: 438 -> 3 (-99.32%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 1

v2: Remove the DISABLE prefix from tests we now pass.

v3: Add comments about missing write_mask handling. (Caio)
    Add unreachable when switching on cf_node type. (Jason)
    Properly merge the component information in written map
    instead of replacing. (Jason)
    Explain how removal from written arrays works. (Jason)
    Use mode directly from deref instead of getting the var. (Jason)

v4: Register the local written mode for calls. (Jason)
    Prefer cf_node instead of node. (Jason)
    Clarify that remove inside iteration only works in backward
    iterations. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
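
The invalidation step of the second walk can be pictured with a toy model,
assuming each variable is just an index below 32 and each node's written set
is a bitmask. The real pass tracks derefs and components; struct copy_entry
and invalidate_copies are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy copy record: "dst currently holds the value of src". */
struct copy_entry {
    unsigned dst;
    unsigned src;
};

/* Keep only the copies that do not involve a variable written inside
 * the node, compacting the array in place. Returns the new count. */
static size_t invalidate_copies(struct copy_entry *copies, size_t count,
                                uint32_t written_mask)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        assert(copies[i].dst < 32 && copies[i].src < 32);
        uint32_t touched = (UINT32_C(1) << copies[i].dst) |
                           (UINT32_C(1) << copies[i].src);
        if (!(touched & written_mask))
            copies[kept++] = copies[i];
    }
    return kept;
}
```

In this model the first walk would produce written_mask per node, and the
second walk would call invalidate_copies before propagating into that node.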

Caio Marcelo de Oliveira Filho [Sat, 15 Sep 2018 01:17:51 +0000 (18:17 -0700)]

nir: Take call instruction into account in copy_prop_vars

Calls are not used yet (functions are inlined), but since new code is
already taking them into account, do it here too. The convention here
and in other places is that neither writable memory nor global
variables are assumed to remain unchanged across a call.

Also, explicitly state the modes affected (instead of using the
reverse logic) in one of the apply_for_barrier_modes calls.

Suggested by Jason.

v2: Consider local vars used by a call to be conservative, SPIR-V has
such cases. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Wed, 12 Sep 2018 21:33:59 +0000 (14:33 -0700)]

nir: Add tests for copy propagation of derefs

Also adds tests for removal of redundant loads, which we currently handle as
part of the copy propagation.

Note some tests involve multiple blocks and are currently DISABLED
because they (expectedly) fail.

v2: Add missing DISABLED prefix to "multi block" tests. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Wed, 15 Aug 2018 16:52:53 +0000 (09:52 -0700)]

nir: Remove handling of dead writes from copy_prop_vars

These are covered by another pass now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Thu, 30 Aug 2018 00:26:03 +0000 (17:26 -0700)]

intel/nir, freedreno/ir3: Use the separated dead write vars pass

No changes to shader-db for intel.
No changes to shader-db expected for freedreno.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Fri, 27 Jul 2018 20:56:35 +0000 (13:56 -0700)]

nir: Separate dead write removal into its own pass

Instead of doing this as part of the existing copy_prop_vars pass.

Separation makes it easier to expand the scope of both passes to be more
than per-block.  For copy propagation, the information about valid
copies comes from previous instructions, while the dead write removal
depends on information from later instructions ("has any instruction
used this deref before it is overwritten?").

Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.

v2: Remove entries from dynarray instead of marking items as
    deleted.  Use foreach_reverse. (Caio)

    (all from Jason)
    Do not cache nir_deref_path.  Not worth it for this patch.
    Clear unused writes when hitting a call instruction.
    Clean up enumeration of modes for barriers.
    Move metadata calls to the inner function.

v3: For copies, use the vector length to calculate the mask.

    (all from Jason)
    Use nir_component_mask_t when applicable.
    Rename functions for clarity.
    Consider local vars used by a call to be conservative (SPIR-V has
    such cases).
    Comment and assert the assumption that stores and copies are
    always to a deref that ends with a vector or scalar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
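
A toy, per-block version of the question quoted above, assuming instructions
are only whole-variable loads and stores. The names (struct instr,
mark_dead_writes, MAX_VARS) are hypothetical, and the real pass works on
derefs and component masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16
#define NO_WRITE ((size_t)-1)

enum op_kind { OP_STORE, OP_LOAD };

struct instr {
    enum op_kind kind;
    unsigned var; /* variable index, below MAX_VARS */
    bool dead;    /* set when the store is never read before an overwrite */
};

/* Marks stores that are overwritten before any load within one block.
 * Returns the number of stores marked dead. */
static size_t mark_dead_writes(struct instr *block, size_t count)
{
    size_t pending[MAX_VARS]; /* index of the last unread store per var */
    size_t removed = 0;

    for (size_t v = 0; v < MAX_VARS; v++)
        pending[v] = NO_WRITE;

    for (size_t i = 0; i < count; i++) {
        unsigned v = block[i].var;
        assert(v < MAX_VARS);

        if (block[i].kind == OP_LOAD) {
            pending[v] = NO_WRITE; /* the earlier store was used */
        } else {
            if (pending[v] != NO_WRITE) {
                block[pending[v]].dead = true;
                removed++;
            }
            pending[v] = i;
        }
    }
    /* Stores still pending at the end stay live: a later block may read them. */
    return removed;
}
```

Keeping the trailing stores is what makes this sketch per-block only, which
matches the limitation noted for the disabled multi-block tests.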

Caio Marcelo de Oliveira Filho [Wed, 29 Aug 2018 23:30:09 +0000 (16:30 -0700)]

nir: Add tests for dead write elimination

Note at the moment the pass called is nir_opt_copy_prop_vars, because
dead write elimination is implemented there.

Also added tests that involve identifying dead writes in multiple
blocks (e.g. the overwrite happens in another block). Those currently
fail as expected, so are marked to be skipped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Wed, 29 Aug 2018 23:30:09 +0000 (16:30 -0700)]

nir: Add test file for vars related passes

Add basic helpers for doing tests on the vars related optimization
passes.  The main goal is to lower the barrier to create tests during
development and debugging of the passes.  Full coverage is not a
requirement.

v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
    Move nir_imm_ivec2() to nir_builder.h. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Thu, 11 Oct 2018 00:14:34 +0000 (17:14 -0700)]

nir: Add nir_imm_ivec2 helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Caio Marcelo de Oliveira Filho [Wed, 12 Sep 2018 21:57:35 +0000 (14:57 -0700)]

util: Add foreach_reverse for dynarray

Useful to walk the array removing elements by swapping them with the
last element.

v2: Change iteration to make sure we never underflow. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
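
A small sketch of the pattern this helper is meant for, written against a
plain C array rather than util_dynarray. The function name remove_all is
hypothetical; the loop shape is one common way to walk backwards with an
unsigned index without underflowing.

```c
#include <assert.h>
#include <stddef.h>

/* Removes every element equal to value by overwriting it with the last
 * element. Order is not preserved. Returns the new element count. */
static size_t remove_all(int *items, size_t count, int value)
{
    assert(items != NULL || count == 0);

    /* "i-- > 0" tests before decrementing, so the body only ever sees
     * indices count-1 down to 0 and an empty array is skipped entirely. */
    for (size_t i = count; i-- > 0; ) {
        if (items[i] == value) {
            /* The last element was already visited, so moving it into
             * slot i cannot skip an unchecked entry. */
            items[i] = items[count - 1];
            count--;
        }
    }
    return count;
}
```

Walking forwards with the same swap would move an unvisited element into a
slot that has already been passed, which is why the reverse order matters.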

Eric Anholt [Tue, 18 Sep 2018 21:09:25 +0000 (14:09 -0700)]

v3d: Add support for hardware pack/unpack of half floats.

Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test
in half.

Eric Anholt [Wed, 26 Sep 2018 16:13:13 +0000 (09:13 -0700)]

nir: Expose nir_remove_unused_io_vars().

For gallium drivers where you want to do some linking at variant compile
time, you don't have the other producer/consumer shader on hand to modify.
By exposing the inner function, the driver can have the used varyings in
the compiled shader cache key and still do linking.

This is also useful for V3D, where the binning shader wants to only output
position and TF varyings. We've been removing those after nir_lower_io,
but this will be less driver-specific code and let more of the shader get
DCEed early in NIR.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Eric Anholt [Wed, 26 Sep 2018 18:33:09 +0000 (11:33 -0700)]

nir: Be sure to fix deref modes after demoting shader i/o vars to global.

Fixes assertion failures when calling nir_remove_unused_varyings() or
nir_remove_unused_io_vars().

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Eric Anholt [Tue, 18 Sep 2018 22:53:54 +0000 (15:53 -0700)]

gallium/ttn: Convert inputs and outputs to derefs of variables.

This means that TTN shaders more closely resemble GTN shaders: they have
inputs and outputs as variable derefs, with the variables having their
.driver_location already set up for you.

This will be useful for v3d to do input variable DCE in NIR, which we
can't do when the TTN shaders never have a pre-nir_lower_io stage.

Acked-by: Rob Clark <robdclark@gmail.com>

Eric Anholt [Wed, 19 Sep 2018 19:35:51 +0000 (12:35 -0700)]

gallium/ttn: Fix the type of gl_FragDepth.

In TGSI we have a vec4 of which only .z is used, but for NIR we should be
using a float, the same as other NIR. We were already moving TGSI's .z
to the .x channel.

Acked-by: Rob Clark <robdclark@gmail.com>

Kristian H. Kristensen [Mon, 17 Sep 2018 23:59:00 +0000 (16:59 -0700)]

freedreno/a6xx: Enable blitter

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>

Kristian H. Kristensen [Fri, 12 Oct 2018 22:42:05 +0000 (15:42 -0700)]

freedreno/a6xx: Update headers

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>

Kristian H. Kristensen [Mon, 8 Oct 2018 19:10:26 +0000 (12:10 -0700)]

freedreno/a6xx: Remove unnecessary GRAS_2D_BLIT_INFO write

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>

Jason Ekstrand [Mon, 15 Oct 2018 18:07:06 +0000 (13:07 -0500)]

anv: Don't advertise ASTC support on BSW

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

Samuel Pitoiset [Mon, 15 Oct 2018 12:48:16 +0000 (14:48 +0200)]

radv: do not force the flat qualifier for clip/cull distances

This fixes some new CTS tests that read clip/cull distances
from the fragment shader stage:

dEQP-VK.clipping.user_defined.clip_*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Mon, 15 Oct 2018 12:53:00 +0000 (14:53 +0200)]

radv: bump discreteQueuePriorities to 2

It's the minimum value required by the spec.

This fixes dEQP-VK.api.info.device.properties.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Jason Ekstrand [Sat, 13 Oct 2018 17:16:14 +0000 (12:16 -0500)]

anv: Split dispatch tables into device and instance

There's no reason why we need to generate trampoline functions for instance
functions or carry N copies of the instance dispatch table around for
every hardware generation.  Splitting the tables and being more
conservative shaves about 34K off .text and about 4K off .data when
built with clang.

Before splitting dispatch tables:

   text    data     bss     dec     hex filename
3224305  286216    8960 3519481  35b3f9 _install/lib64/libvulkan_intel.so

After splitting dispatch tables:

   text    data     bss     dec     hex filename
3190325  282232    8960 3481517  351fad _install/lib64/libvulkan_intel.so

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Kenneth Graunke [Mon, 15 Oct 2018 16:29:06 +0000 (09:29 -0700)]

i965: Drop assert about number of uniforms in ARB handling.

My recent prog_to_nir patch started making new sampler uniforms, which
apparently increased the number of parameters. We used to poke at the
one parameter directly, making it important that there was only one,
but we haven't done that in a while. It should be safe to just delete
the assertion.

Fixes: 1c0f92d8a8c "nir: Create sampler variables in prog_to_nir."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Jason Ekstrand [Sat, 13 Oct 2018 15:00:05 +0000 (10:00 -0500)]

vulkan: Add the fuchsia headers

These were missing in the last couple of spec updates.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Bas Nieuwenhuizen [Sat, 13 Oct 2018 17:20:02 +0000 (19:20 +0200)]

radv: Implement VK_EXT_pci_bus_info.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Kenneth Graunke [Mon, 30 Jul 2018 22:20:00 +0000 (15:20 -0700)]

gallium/u_transfer_helper: Add support for separate Z24/S8 as well.

u_transfer_helper already had code to handle treating packed Z32_S8
as separate Z32_FLOAT and S8_UINT resources, since some drivers can't
handle that interleaved format natively.

Other hardware needs depth and stencil as separate resources for all
formats. For example, V3D3 needs this for 24-bit depth as well.

This patch adds a new flag to lower all depth/stencil formats, and
implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left
as an exercise to the reader, preferably someone who has access to a
machine that uses that format.)

Reviewed-by: Eric Anholt <eric@anholt.net>

Kenneth Graunke [Mon, 30 Jul 2018 22:20:00 +0000 (15:20 -0700)]

gallium/format: Add a helper to combine separate Z24 and S8 stencil.

This new function takes separate Z24 depth and S8 stencil sources,
and packs them into a single combined Z24S8 buffer.

Reviewed-by: Eric Anholt <eric@anholt.net>
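
As an illustration of the packing, here is a minimal sketch that assumes
depth occupies the low 24 bits and stencil the high 8 bits of each 32-bit
texel. The function name pack_z24_s8 and the flat, stride-free signature are
hypothetical simplifications, not the helper added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine separate depth and stencil arrays into packed 32-bit texels.
 * Assumed layout: bits 0..23 depth, bits 24..31 stencil. */
static void pack_z24_s8(uint32_t *dst, const uint32_t *z_src,
                        const uint8_t *s_src, size_t count)
{
    assert(dst != NULL && z_src != NULL && s_src != NULL);

    for (size_t i = 0; i < count; i++) {
        /* Mask the depth word so stray upper bits cannot leak into
         * the stencil byte. */
        dst[i] = (z_src[i] & 0x00ffffffu) | ((uint32_t)s_src[i] << 24);
    }
}
```

A real helper would also take per-row strides for each buffer; they are
left out here to keep the bit manipulation in focus.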

Kenneth Graunke [Mon, 30 Jul 2018 22:20:47 +0000 (15:20 -0700)]

gallium/auxiliary: Add util_format_get_depth_only() helper.

This will be used by u_transfer_helper.c shortly, in order to split
packed depth-stencil into separate resources.

Reviewed-by: Eric Anholt <eric@anholt.net>

Kenneth Graunke [Fri, 24 Aug 2018 23:40:19 +0000 (16:40 -0700)]

nir: Create sampler variables in prog_to_nir.

This is needed for nir_gather_info to actually count the textures,
since it operates solely on variables.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Kenneth Graunke [Fri, 24 Aug 2018 08:34:30 +0000 (01:34 -0700)]

nir: Create sampler2D variables in nir_lower_{bitmap,drawpixels}.

This is needed for nir_gather_info to actually count the new textures,
since it operates solely on variables.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Jason Ekstrand [Sat, 13 Oct 2018 13:23:37 +0000 (08:23 -0500)]

spirv: Update SPIR-V json and headers to Khronos master

This corresponds to commit 801cca8104245c07e8cc532 on GitHub.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Sat, 13 Oct 2018 12:57:25 +0000 (14:57 +0200)]

vulkan: Update the XML and headers to 1.1.88

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

Vinson Lee [Wed, 10 Oct 2018 20:38:12 +0000 (13:38 -0700)]

r600/sb: Fix constant-logical-operand warning.

sb/sb_bc_parser.cpp:620:27: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
        if (cf->bc.op_ptr->flags && FF_GDS)
                                 ^  ~~~~~~
sb/sb_bc_parser.cpp:620:27: note: use '&' for a bitwise operation
        if (cf->bc.op_ptr->flags && FF_GDS)
                                 ^~
                                 &
sb/sb_bc_parser.cpp:620:27: note: remove constant to silence this warning
        if (cf->bc.op_ptr->flags && FF_GDS)
                                ~^~~~~~~~~

Fixes: da977ad90747 ("r600/sb: start adding GDS support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
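
The difference between the two operators is easy to reproduce in isolation.
In this sketch FLAG_GDS and both function names are hypothetical; the logical
form is true for any non-zero flags value, while the bitwise form tests the
one bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_GDS 0x4u /* hypothetical bit value, for illustration only */

/* Buggy form: "&&" only asks whether both operands are non-zero, so any
 * flag being set makes this true. Some compilers warn about it. */
static bool has_gds_logical(uint32_t flags)
{
    return flags && FLAG_GDS;
}

/* Intended form: "&" masks out everything except the GDS bit. */
static bool has_gds_bitwise(uint32_t flags)
{
    return (flags & FLAG_GDS) != 0;
}
```

With flags set to 0x1 the logical version reports true even though the GDS
bit is clear, which is the kind of mistake the warning points at.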

Rafael Antognolli [Tue, 4 Sep 2018 23:08:15 +0000 (16:08 -0700)]

i965/miptree: Use enum instead of boolean.

ISL_AUX_USAGE_NONE happens to be the same as "false", but let's do the
right thing and use the enum.

v2: fix intel_miptree_finish_depth too (Caio)

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Samuel Pitoiset [Fri, 12 Oct 2018 12:04:39 +0000 (14:04 +0200)]

radv: do not support blitting surfaces for R32G32B32 formats

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Jose Fonseca [Tue, 9 Oct 2018 15:46:21 +0000 (16:46 +0100)]

scons: Allow building with custom MSVC_USE_SCRIPT script.

SCons MSVC support relies on vcvarsall.bat to extract the PATH, CPP
includes, library paths, etc.

SCons also has a build env var named MSVC_USE_SCRIPT, which one can
use to point to an alternative vcvarsall.bat script.

This change exposes this MSVC_USE_SCRIPT build env variable as a SCons
command line variable. This will enable using MSVC outside Program
Files (e.g., network shares).

This change also links advapi32 library, necessary for the Windows
Registry API used by WGL state tracker, avoiding missing symbols.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Samuel Pitoiset [Wed, 3 Oct 2018 21:06:34 +0000 (23:06 +0200)]

radv: emit the GLC bit for SSBO loads/stores when needed

This fixes some new memory model tests:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Wed, 10 Oct 2018 08:42:19 +0000 (10:42 +0200)]

spirv/nir: handle memory access qualifiers for SSBO loads/stores

v2: - change how the access qualifiers are accumulated
v3: - duplicate members in struct_member_decoration_cb()
    - handle access qualifiers on variables
    - remove access qualifiers handling in _vtn_variable_load_store()
    - fix setting access qualifiers on type->array_element

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Tapani Pälli [Wed, 3 Oct 2018 10:46:42 +0000 (13:46 +0300)]

anv/android: we need git_sha1.h in include paths

Fixes: e4538b9 "anv: Implement VK_KHR_driver_properties"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>

Nanley Chery [Thu, 11 Oct 2018 23:31:08 +0000 (16:31 -0700)]

anv: Clear WM_HZ_OP overrides in init_device_state

This is basically a port of commit,
3ade766684933ac84e41634429fb693f85353c11
("i965: Disable 3DSTATE_WM_HZ_OP fields.")

The BDW+ docs describe how to use the 3DSTATE_WM_HZ_OP instruction in
the section titled, "Optimized Depth Buffer Clear and/or Stencil Buffer
Clear." It mentions that the packet overrides GPU state for the clear
operation and needs to be reset to 0s to clear the overrides. Depending
on the kernel, we may not get a context with the GPU state for this
packet zeroed. Do it ourselves just in case.

Prevents a number of GPU hangs when running crucible on ICL. I tried to
get the exact number of hangs that occurs without this patch, but was
unsuccessful. The test machine became unresponsive before completing the
full run.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jordan Justen [Wed, 10 Oct 2018 09:31:00 +0000 (02:31 -0700)]

i965/gen10+: Initialize new fields in STATE_BASE_ADDRESS

Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake."
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

Jordan Justen [Wed, 28 Mar 2018 08:29:18 +0000 (01:29 -0700)]

anv/gen9+: Initialize new fields in STATE_BASE_ADDRESS

Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake."
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

Jason Ekstrand [Thu, 11 Oct 2018 03:36:52 +0000 (22:36 -0500)]

nir: Add a bunch of b2[if] optimizations

The b2f and b2i conversions always produce zero or one which are both
representable in every type and size. Since b2i and b2f support all bit
sizes, we can just get rid of the conversion opcode.

total instructions in shared programs: 15089335 -> 15084368 (-0.03%)
instructions in affected programs: 212564 -> 207597 (-2.34%)
helped: 896
HURT: 0

total cycles in shared programs: 369831123 -> 369826267 (<.01%)
cycles in affected programs: 2008647 -> 2003791 (-0.24%)
helped: 693
HURT: 216

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
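
The reasoning in this message can be illustrated outside of NIR. The following is a minimal C sketch, not Mesa code: the helper names (b2i32, b2f32, b2f64, b2i64 and the *_via_* chains) are hypothetical stand-ins for the NIR opcodes. It only shows that a conversion applied to the result of b2i or b2f yields the same value as a direct b2i/b2f of the target type, because 0 and 1 are exactly representable everywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct bool conversions (hypothetical stand-ins for b2i/b2f). */
static inline int32_t b2i32(bool b) { return b ? 1 : 0; }
static inline int64_t b2i64(bool b) { return b ? 1 : 0; }
static inline float   b2f32(bool b) { return b ? 1.0f : 0.0f; }
static inline double  b2f64(bool b) { return b ? 1.0 : 0.0; }

/* Unoptimized chains: a second conversion applied to a b2i/b2f result. */
static inline float   f32_via_b2i32(bool b) { return (float)b2i32(b); }
static inline double  f64_via_b2f32(bool b) { return (double)b2f32(b); }
static inline int64_t i64_via_b2i32(bool b) { return (int64_t)b2i32(b); }
static inline int32_t i32_via_b2f32(bool b) { return (int32_t)b2f32(b); }
```

Each chain returns the same value as the matching direct helper for both false and true, which is why the outer conversion can be folded away.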

Jason Ekstrand [Thu, 11 Oct 2018 03:04:17 +0000 (22:04 -0500)]

intel/vec4: Fix nir_op_b2[fi] with 64-bit result

This is valid NIR but you can't actually hit this case today.  GLSL IR
doesn't have a bool to double opcode; it does f2d(b2f(x)).  In SPIR-V we
don't have any to/from bool conversion opcodes at all.  However, the
next commit will make us start generating it so we should be ready.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Jason Ekstrand [Wed, 10 Oct 2018 22:17:11 +0000 (15:17 -0700)]

intel/fs: Fix nir_op_b2[fi] with 64-bit result on Gen8 LP and Gen9 LP

Several of the Atom GPUs have additional restrictions on alignment when
moving < 64-bit source to a 64-bit destination.  All of the nir_op_*2*64
code generation paths respected this, but nir_op_b2[fi] did not.

Previous to commit a68dd47b911 it was not possible to generate such an
instruction from the GLSL path.  It may have been possible from SPIR-V,
but it's not clear.  The aforementioned patch converts a 64-bit
nir_op_fsign into a sequence of operations including a nir_op_b2f with a
64-bit result.  This "just works" everywhere except these Atom parts.

This problem was not detected during normal CI testing because the Atom
parts are not included in developer builds.

v2 (idr): Make the patch compile, and make some cosmetic changes.  Add a
commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319
Fixes: a68dd47b911 "nir/algebraic: Simplify fsat of fsign"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Vinson Lee [Wed, 3 Oct 2018 21:56:26 +0000 (14:56 -0700)]

egl: Use correct shared libraries suffix on macOS.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

Illia Iorin [Thu, 11 Oct 2018 15:06:18 +0000 (18:06 +0300)]

mesa: Fix pack_uint_Z_FLOAT32()

Fix pack_uint_Z_FLOAT32 by casting row data to float instead of uint.
Remove the duplicate function pack_uint_Z_FLOAT32_X24S8.
Edit the case in "_mesa_get_pack_uint_z_func" so that it now looks like
"_mesa_get_pack_float_z_func".
Remove the _mesa_problem call, which was added for debugging this issue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91433
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
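
The kind of fix described above can be sketched as follows. This is an illustration under assumptions, not the Mesa function: the names uint_z_to_float32 and pack_uint_z_float32_sketch are hypothetical, and the sketch assumes the source is a 32-bit unsigned normalized depth value that must be stored as a float in the range [0, 1].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a 32-bit unsigned normalized depth value onto [0.0, 1.0]. */
static inline float uint_z_to_float32(uint32_t z)
{
    return (float)((double)z / 4294967295.0);
}

/* Hypothetical packer: the destination holds a float, so the value is
 * converted and stored as a float rather than copied as a raw uint. */
static inline void pack_uint_z_float32_sketch(const uint32_t *src, void *dst)
{
    float f = uint_z_to_float32(*src);
    assert(f >= 0.0f && f <= 1.0f);
    memcpy(dst, &f, sizeof f);
}
```

Writing the raw integer bits into a float destination would make the depth value be reinterpreted rather than converted, which is the class of bug the message refers to.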

Rodrigo Vivi [Thu, 30 Aug 2018 21:39:57 +0000 (14:39 -0700)]

intel: Introducing Whiskey Lake platform

Whiskey Lake uses the same gen graphics as Coffee Lake, including some
ids that were previously marked as reserved on Coffee Lake but that
have now moved to the WHL page.

This follows the ids and approach used on kernel's commit
b9be78531d27 ("drm/i915/whl: Introducing Whiskey Lake platform")
and commit c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs")

v2: Lionel noticed that GT{1,2,3} in the kernel didn't follow the
spec with respect to the number of EUs, so the kernel has been updated.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Boyuan Zhang [Wed, 10 Oct 2018 19:08:44 +0000 (15:08 -0400)]

st/va: use provided sizes and coords for vlVaGetImage

vlVaGetImage should respect the width, height, and x and y coordinates that
are passed in. Therefore, pipe_box should be created with the passed-in values
instead of the surface width/height.

v2: add input size check, return error when size out of bounds
v3: fix the size check for vaimage
v4: add size adjustment for x and y coordinates

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Christian König <christian.koenig@amd.com>
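
The bounds check mentioned in v2 and v3 could look roughly like the sketch below. It is an assumption about the shape of such a check, not the code from the VA state tracker: the function name get_image_region_valid and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when the requested rectangle lies inside the surface and
 * is no larger than the destination image. 64-bit sums avoid wraparound. */
static inline bool get_image_region_valid(int x, int y,
                                          unsigned width, unsigned height,
                                          unsigned surf_width, unsigned surf_height,
                                          unsigned img_width, unsigned img_height)
{
    if (x < 0 || y < 0)
        return false;
    if ((uint64_t)x + width > surf_width)
        return false;
    if ((uint64_t)y + height > surf_height)
        return false;
    if (width > img_width || height > img_height)
        return false;
    return true;
}
```

A caller would reject the request with an error when this returns false, and otherwise build the copy box from x, y, width and height rather than from the full surface size.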

Samuel Pitoiset [Tue, 9 Oct 2018 10:26:42 +0000 (12:26 +0200)]

radv: implement clear operations for R32G32B32

This fixes crashes for some CTS:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.linear_*_*
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.*_linear_*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Mon, 8 Oct 2018 12:40:17 +0000 (14:40 +0200)]

radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats

R32G32B32 are weird formats and we are only going to support
some basic operations for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Wed, 10 Oct 2018 12:04:42 +0000 (14:04 +0200)]

radv: add a workaround for a VGT hang with prim restart and strips

Otherwise, Yakuza and The Evil Within hang the GPU with DXVK.
This apparently only works on Polaris.

Suggested by Marek.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Timothy Arceri [Thu, 11 Oct 2018 00:25:08 +0000 (11:25 +1100)]

glsl: remove redundant es_shader checks

The es check is already covered by the is_version() check.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Dave Airlie [Fri, 5 Oct 2018 00:52:51 +0000 (10:52 +1000)]

st/glsl_to_tgsi: initialise need_uarl in constructor

Found by coverity

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Dave Airlie [Thu, 4 Oct 2018 23:20:09 +0000 (09:20 +1000)]

glspirv: drop pointless assert (size_t is unsigned)

Found by coverity

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

Dave Airlie [Thu, 4 Oct 2018 23:30:44 +0000 (09:30 +1000)]

radv: remove unsigned comparison against 0

The value is always >= 0 here.

Found by coverity

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Dave Airlie [Thu, 4 Oct 2018 23:24:31 +0000 (09:24 +1000)]

radv: remove dead code for master_fd close

We have never opened master_fd at this point, so remove the code to
close it.

Found by coverity.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Dave Airlie [Thu, 4 Oct 2018 23:17:45 +0000 (09:17 +1000)]

radv: don't pass shader key by copy

Coverity pointed out we were copying 168 bytes here unnecessarily.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Dave Airlie [Thu, 4 Oct 2018 23:56:19 +0000 (09:56 +1000)]

anv: add missing unlock in error path.

Not going to matter, but be consistent.

Found by coverity

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: caf41c78c (anv/allocator: Support softpin in the BO cache)

Jason Ekstrand [Mon, 8 Oct 2018 17:22:35 +0000 (12:22 -0500)]

intel: Don't propagate conditional modifiers if a UD source is negated

This fixes a bug uncovered by my NIR integer division by constant
optimization series.

Fixes: 19f9cb72c8b "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e0 "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Jason Ekstrand [Thu, 13 Sep 2018 14:56:20 +0000 (09:56 -0500)]

util: Add tests for fast integer division by constants

While I generally trust ridiculousfish to have done his homework, we've
made some adjustments to suit the needs of mesa and it'd be good to
test those. Also, there's no better place than unit tests to clearly
document the different edge cases of the different methods.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Marek Olšák [Sat, 6 Oct 2018 01:42:16 +0000 (20:42 -0500)]

util: Add power-of-two divisor support to compute_fast_udiv_info

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Jason Ekstrand [Sat, 6 Oct 2018 01:29:31 +0000 (20:29 -0500)]

util: Generalize fast integer division to be variable bit-width

There's nothing inherently fixed-width in the code.  All that's required
to generalize it is to make everything internally 64-bit and pass
UINT_BITS in as a parameter to util_compute_fast_[us]div_info.  With
that, it can now handle 8, 16, 32, and 64-bit integer division by a
constant.

We also add support for division by 1 and by other powers of 2.  This is
useful if you want to divide by a uniform value in a shader where you
have the opportunity to adjust the uniform on the CPU before passing it
in.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Marek Olšák [Sat, 6 Oct 2018 01:28:40 +0000 (20:28 -0500)]

util: Add fast division helpers

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Marek Olšák [Sun, 23 Sep 2018 16:57:51 +0000 (12:57 -0400)]

util: import public domain code for integer division by a constant

Compilers can use this to generate optimal code for integer division
by a constant.

Additionally, an unsigned division by a uniform that is constant but not
known at compile time can still be optimized by passing 2-4 division
factors to the shader as uniforms and executing one of the fast_udiv*
variants. The signed division algorithm doesn't have this capability.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
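
The idea of replacing an unsigned division by a constant with a multiply and shifts can be sketched as below. This is not the imported code or Mesa's API: it is a minimal version of the well-known round-up multiplier scheme for 32-bit unsigned division, and the names udiv32_info, compute_udiv32_info and fast_udiv32 are hypothetical. The three precomputed factors (one multiplier and two shift counts) are the kind of values that could be computed on the CPU and passed to a shader as uniforms.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed factors for dividing by a fixed 32-bit divisor. */
struct udiv32_info {
    uint32_t multiplier;
    uint32_t shift1;   /* 0 only when the divisor is 1, else 1 */
    uint32_t shift2;   /* ceil(log2(d)) - 1, clamped at 0 */
};

static inline struct udiv32_info compute_udiv32_info(uint32_t d)
{
    struct udiv32_info info;
    uint32_t l = 0;

    assert(d != 0);
    /* l = ceil(log2(d)) */
    while (((uint64_t)1 << l) < d)
        l++;

    /* multiplier = floor(2^32 * (2^l - d) / d) + 1, which fits in 32 bits */
    uint64_t pow2 = (uint64_t)1 << l;
    info.multiplier = (uint32_t)(((pow2 - d) << 32) / d + 1);
    info.shift1 = l < 1 ? l : 1;
    info.shift2 = l > 0 ? l - 1 : 0;
    return info;
}

/* Compute n / d using only a widening multiply, adds and shifts. */
static inline uint32_t fast_udiv32(uint32_t n, struct udiv32_info info)
{
    uint32_t t = (uint32_t)(((uint64_t)info.multiplier * n) >> 32);
    return (t + ((n - t) >> info.shift1)) >> info.shift2;
}
```

For example, compute_udiv32_info(3) yields the multiplier 0x55555556 with both shifts equal to 1, and fast_udiv32(10, ...) with those factors evaluates to 3. Powers of two, including 1, fall out of the same formula with a multiplier of 1.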

Domain: Graphics System / GL;
