review.tizen.org Git - platform/upstream/mesa.git/log

st/nine: Do not set unused states for stateblocks

A lot of these states are used only for the context,
and are unused for stateblocks (which just uses the
changed.* fields instead for a lot of them).

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Fix aliasing states for stateblocks

If NINE_STATE_FF_MATERIAL is set, the stateblock will upload
its recorded materials matrix.
If NINE_STATE_FF_LIGHTING is set, the lighting set is uploaded.

These flags could be set by a NineDevice9_SetTransform call
or by setting some states related to ff, but that shouldn't trigger
these stateblock behaviours.

We don't need to follow the context states dirtied by render states.
NINE_STATE_FF_VSTRANSF is exactly the state controlling stateblock
updates of transformation matrices, NINE_STATE_FF is too broad.

These two changes avoid setting the two mentionned states when we
shouldn't.

Fixes: https://github.com/iXit/Mesa-3D/issues/320

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Never update device changed.* fields

The device state changed.* field are never used.
These fields are used only for stateblocks.

Avoid setting them at all for clarity.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Capture also default matrices for D3DSBT_ALL

We avoid allocating space for never unused matrices.
However we must do as if we had captured them.
Thus when a D3DSBT_ALL stateblock apply has fewer matrices
than device state, allocate the default matrices for the stateblock
before applying.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Mark transform matrices dirty for D3DSBT_ALL

D3DSBT_ALL stateblocks capture the transform matrices.

Fixes some d3d test programs not displaying properly.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Don't update unused world matrices

While to the application we have to track
accurately all 256 world matrices (including
in stateblocks), hw vertex processing enables
to set a limit to the number of world matrices
the hardware can access to in the advertised caps,
which is 8 for nine.

Thus don't bother in the stateblock code to send
the updated values for the unreachable matrices.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Remove two unused states.

NINE_STATE_MATERIAL was used incorrectly at one location.
Replace it with the correct state.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

st/nine: Remove commented nine_context_apply_stateblock

At some point the project was to adapt the
commented version to csmt.

The csmt rework enabled to fix some state aliasing
issues between stateblocks and internal state updates.
The commented version needs a lot of work to work with that.
Just drop it.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

nir: Fix array initializer

Empty initializer is not standard C. This fixes MSVC build.

Trivial.

anv: Return VK_ERROR_DEVICE_LOST from anv_device_set_lost

This lets us get rid of a bunch of duplicated error messages.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv/util: Split a vk_errorv helper out of vk_errorf

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

scons/svga: remove opt from the list of valid build types

This reverts commit a5fd54f8bf6713312fa5efd7ef5cd125557a0ffe.

The whole point was to add a way to pass -DVMX86_STATS to the build,
but we can do that with a command line argument when we invoke scons.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

intel/blorp: Define the clear value bounds for HiZ clears

Follow the restriction of making sure the clear value is between the min
and max values defined in CC_VIEWPORT. Avoids a simulator warning for
some piglit tests, one of them being:

./bin/depthstencil-render-miplevels 146 d=z32f_s8

Jason found this to fix incorrect clearing on SKL.

Fixes: 09948151ab1d5184b4dd9052bb1f710fa1e00a7b
("intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>

radv: remove duplicate brackets in version string

MESA_GIT_SHA1 resolves to either an empty "" string if not build from git,
or " (git-DEADBEEF)" if it is. No need to wrap it in additional "()".

Fixes: 9d40ec2cf6ec6d3d9d78 "radv: Add support for VK_KHR_driver_properties."
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

vulkan: drop always-true param

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radeon/vcn: use util function to get h264 profile idc

Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>

radeon/vce: use util function to get h264 profile idc

Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>

vl: get h264 profile idc

Adding a function for converting h264 pipe video profile to profile idc

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>

intel/nir: Use the OPT macro for more passes

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

spirv: Initialize subgroup destinations with the destination type

Instead of initializing them manually, just use the type that we already
have sitting there.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

spirv: Use the right bit-size for spec constant ops

Previously, we would always pull the bit size from the destination which
is wrong for opcodes like nir_ilt where the sources are variable-sized
but the destination is a fixed size. We were getting lucky before
because nir_op_ilt returns a 32-bit value and basically everyone who
uses spec constants uses 32-bit ones.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

nir/prog: Use nir_bany in kill handling

We have a helper that does exactly what the bany_inequal was doing. It
emits the same code but is a bit higher level and is designed to operate
on a bvec4.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl/nir: Use i2b instead of ine for fixing UBO/SSBO Booleans

They do the same thing in the end but i2b is a bit simpler. Also, let's
clean up the mess of code for SSBO handling with one line of builder.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/system_values: Use the bit size from the load_deref

This isn't a great solution for bit-sizes but we don't have a
particularly convenient way to get a bit size from the system value enum
and this keeps the lowering pass from changing it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/opt_if: Rework condition propagation

Instead of doing our own constant folding, we just emit instructions and
let constant folding happen. This is substantially simpler and lets us
use the nir_imm_bool helper instead of dealing with the const_value's
ourselves.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/search: Use the nir_imm_* helpers from nir_builder

This requires that we rework the interface a bit to use nir_builder but
that's a nice little modernization anyway.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/builder: Handle 16-bit floats in nir_imm_floatN_t

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/builder: Add a nir_imm_true/false helpers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/constant_folding: Use nir_src_as_bool for discard_if

Missed one while converting to the nir_src_as_* helpers.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/constant_folding: Add an unreachable to a switch

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

nir/validate: Print when the validation failed

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

anv: Handle the device loss abort in anv_device_set_lost

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: Add helpers for setting/checking device lost

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: Provide a error message with a DEVICE_LOST

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

anv: Fix sanitization of stencil state when the depth test is disabled

When depth testing is disabled, we shouldn't pay attention to the
specified depthCompareOp, and just treat it as always passing. Before,
if the depth test is disabled, but depthCompareOp is VK_COMPARE_OP_NEVER
(e.g. from the app having zero-initialized the structure), then
sanitize_stencil_face() would have incorrectly changed passOp to
VK_STENCIL_OP_KEEP.

v2: Roll the depthTestEnable check into the ds_aspect check below since
they now both do the same thing.

Fixes: 028e1137e6 "anv/pipeline: Be smarter about depth/stencil state"
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radv: implement image to image operations for R32G32B32

This should address the remaining failures in Batman Arkhman City.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: fix a comment in radv_meta_buffer_to_image_cs_r32g32b32()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add get_image_stride_for_r32g32b32() helper

For the special R32G32B32 paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add create_bview_for_r32g32b32() helper

For the special R32G32B32 paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add create_buffer_from_image() helper

For the special R32G32B32 paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

intel/compiler: Print message descriptor as immediate source

While disassembling send(c) instruction print message descriptor as
immediate source operand along with message descriptor. This allows
assembler to read immediate source operand and set bits accordingly.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

intel/compiler: Print hex representation along with floating point value

While encoding the immediate floating point values in instruction we use
values upto precision 9, but while disassembling, we print precision to
6 places, which round up the value and gives wrong interpretation for
encoded immediate constant.

To avoid misinterpretation of encoded immediate values in instruction
and disassembled output, print hex representation along with floating
point value which can be used by assembler in future.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

util: Change remaining uint32 cache ids to sha1

After discussion with Timothy Arceri. disk_cache_get_function_identifier
was using only the first byte of the sha1 build-id.  Replace
disk_cache_get_function_identifier with implementation from
radv_get_build_id.  Instead of writing a uint32_t it now writes to a
mesa_sha1.  All drivers using disk_cache_get_function_identifier are
updated accordingly.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")

freedreno: use fd_bc_alloc_batch instead of fd_batch_create.

Following the commit 2385d7b066 and 8e798e28f7, for resource dependancy
tracking.

Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
with FD_MESA_DEBUG=inorder

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: take reg->num out of union in ir3_register

To avoid wrong result when identifying the type of register.
Ie. If the reg is an array, it might be identified as address or
predicate register.

Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: disable unused groups

Don't leave vsconst/fsconst group enabled if we switch to shader with no
uniforms.

Fixes: abcdf5627a2 freedreno/a6xx: move const emit to state group
Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno: add useful assert

Would have been useful to catch the problem fixed in
8e798e28f736e22e9e1e4534ab42a36cde14b142

Signed-off-by: Rob Clark <robdclark@gmail.com>

swr/rast: ignore CreateElementUnorderedAtomicMemCpy

This function's API changed between LLVM 5 and 6. Compile errors occur
when building with LLVM 6+ if LLVM 5 was used for a dist tarball

CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107865
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

swr/rast: fix intrinsic/function for LLVM 7 compatibility

Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and
removed createInstructionSimplifierPass, which were both removed in LLVM
7.0.0

These changes combine patches we received from the community and our own
internal patches

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>

nvc0: increase NOUVEAU_TRANSFER_PUSHBUF_THRESHOLD to 1024 on Kepler+

Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82%
FPS improvement with Dirt Rally on my GTX 1060.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

radv: Emit enqueued pipeline barriers on event write.

Since the CPU can read them we need to execute any GPU->CPU
flushes before the event is written.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108524
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: Add support for VK_KHR_driver_properties.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

util: use C99 declaration in the for-loop set_foreach() macro

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

util: use C99 declaration in the for-loop hash_table_foreach() macro

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

gen: Add AMD_gpu_shader_int64.xml to tarball

CC: Ian Romanick <ian.d.romanick@intel.com>
CC: Marek Olšák <marek.olsak@amd.com>
Fixes: b3c17330e631695b5e5dc209ba9ea1a528618c97
("mesa: expose AMD_gpu_shader_int64")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>

gen: Add EXT_vertex_attrib_64bit.xml to dependency lists

Which is also required to put it in the tarball, a requirement for
building with meson from the tarball.

CC: Ian Romanick <ian.d.romanick@intel.com>
CC: Marek Olšák <marek.olsak@amd.com>
Fixes: 263c962cfdee6b43578ee5f28601309ea77d1434
("mesa: expose EXT_vertex_attrib_64bit")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>

anv: move variable to proper scope and mark as MAYBE_UNUSED

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: use snprintf() instead of memset()+strcpy()

snprintf() guarantees that it will not write more chars than allowed,
and that the string will be null-terminated, without the need to fill
the whole thing with zeroes to begin with.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: drop unused includes

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

autotools: include intel_tiled_memcopy.c

There are two problems with the fixed patch. First, it fails to create a
dependency on the sourced .c file, so changes to intel_tiled_memcpy.c
won't trigger a rebuild. It also doesn't get included in the dist
tarball.

Fixes: 11b1afdc92db98e93f2ca50beeb7fc481a11e708
("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>

meson: fix formatting and add extra_files to i965

extra_files is just a nice way to to tell certain IDEs (and those
reading the file) that this file is also a dependency. Meson will use
the .d file generated by the compiler to figure out what the target
actually depends on.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>

ir3_compiler/nir: fix imageSize() for buffer-backed images

GL_EXT_texture_buffer introduced texture buffers, which can be used
in shaders through a new type imageBuffer.

Because how image access is implemented in freedreno, calling
imageSize on an imageBuffer returns the size in bytes instead of texels,
which is incorrect.

This patch adds a division of imageSize result by the bytes-per-pixel
of the image format, when image is buffer-backed.

Fixes all tests under
dEQP-GLES31.functional.image_load_store.buffer.image_size.*

v2: Pre-compute and submit the log2 of the image format's bpp as shader
constant instead of emitting the LOG2 instruction in code. (Rob Clark)

v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin)

Reviewed-by: Rob Clark <robdclark@gmail.com>

nir: Fix array initializer.

Empty initializer is not standard C. This fixes MSVC build.

Trivial.

scons: Put to rest zombie texture_float build option.

I found a remnant of texture_float build option that wasn't removed in
commit 66673bef941af344314fe9c91cad8cd330b245eb

This patch removes it.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

anv: Allow presenting via a different GPU

anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for
this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not.
Apps which check for presentation support via the latter (all Feral
Vulkan games at least) will therefore fail.

This allows me to render on an Intel GPU and present to a display
connected to an AMD card (tested HD 530 + Vega 64).

v2: Rebase on current master.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: fix nir_copy_propagation test

Use nir_src_comp_as_uint() to read the proper second component, as
nir_src_as_uint() returns the first one.

v2: Use nir_src_comp_as_uint() [Jason]

Fixes: 16870de8a0a ("nir: Use nir_src_is_const and nir_src_as_* in core
code")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radv: call nir_link_xfb_varyings()

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: move nir_lower_io_to_scalar_early() to radv_link_shaders()

nir_lower_io_to_scalar_early() is really part of the link time
optimisations. Moving it here allows the code to be simplified
and also keeps the code easy to follow in the next patch.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

nir: add linking helper nir_link_xfb_varyings()

The linking opts shouldn't try removing or compacting XFB varyings
in the consumer. To avoid this we copy the always_active_io flag
from the producer.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

intel/compiler: Change src1 reg type to unsigned doubleword

To have uniform behavior while disassembling send(c) instruction use
register type of unsigned doubleword for src1 when message descriptor is
immediate value. Bspec does not specifiy anything for src1 immediate
default type.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>

mesa/glformats: Remove redundant helper _mesa_base_format_component_count

There exists _mesa_components_in_format() which already includes
all cases handled in _mesa_base_format_component_count().

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

nir/algebraic: Fix a typo in the bit size validation code

The conon_bit_class and canon_var_class variables got switched.

Fixes: 932c650e0b "nir/algebraic: Loosen a restriction on variables"
Reported-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

amd/common: check DRM version 3.27 for JPEG decode

JPEG was added after DRM version 3.26

Signed-off-by: Leo Liu <leo.liu@amd.com>
Fixes: 4558758c51749(amd/common: add vcn jpeg ip info query)
Cc: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

docs: update calendar

I'll take care of 18.2 releases series on Andres behalf.

CC: Andres Gomez <agomez@igalia.com>
CC: Dylan Baker <dylan@pnwbakers.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>

intel/decoders: fix end of batch limit

Pointer arithmetic...

v2: s/4/sizeof(uint32_t)/ (Eric)

v3: Give bytes to print_batch() in error_decode (Lionel)
Make clear what values we're dealing with in error_decode (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

radeonsi: enable vcn jpeg decode for raven

Enable vcn jpeg decode for raven.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

winsys/amdgpu: add vcn jpeg cs support

Add vcn jpeg cs support, align cs by no-op.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

amd/common: add vcn jpeg ip info query

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: implement jpeg target buffer cmd

Implement jpeg target buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: implement jpeg bitstream buffer cmd

Implement jpeg bitstream buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>

radeon/uvd: remove get mjpeg slice header

Move the previous get_mjpeg_slice_heaeder function and eoi from
"radeon/vcn" to "st/va".

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

st/va: get mjpeg slice header

Move the previous get_mjpeg_slice_heaeder function and eoi from
"radeon/vcn" to "st/va".

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: add jpeg decode implementation

Add a new file to handle VCN Jpeg decode specific functions. Use Jpeg
specific cmd sending function in end_frame call.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: separate send cmd call from end frame

Use function pointer for sending cmd in end_frame call. By doing this, we can
assign different cmd sending logics for Jpeg decode later.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: create cs based on ring type

Add RING_VCN_JPEG for VCN Jpeg decode, and keep RING_VCN_DEC for other codecs.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/winsys: add vcn jpeg ring type

Add a new ring type for vcn jpeg.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: add vcn jpeg decode interface

Add VCN Jpeg decode interfaces and register defines.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/vcn: move radeon decoder define to header file

Move radeon_decoder definition from "radeon_vcn_dec.c" to "radeon_vcn_dec.h",
so that it can be included by other files later.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

meson: update required amdgpu version to 2.4.95

VCN jpeg requires new hw ip

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

configure.ac: update libdrm amdgpu version to 2.4.95

VCN jpeg requires new hw ip

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radv: fix btoi for R32G32B32 when the dest offset is not 0

Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

i965/miptree: Use cpu tiling/detiling when mapping

Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.

Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.

v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)

v3: Add units to parameter names of tile_extents (Nanley Chery)
    Use _mesa_align_malloc for the shadow copy (Nanley)
    Continue using gtt maps on gen4 (Nanley)

v4: Use streaming_load_memcpy when detiling

v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
    takes precedence.  Add intel_miptree_access_raw, needed after
    rebasing on commit b499b85b0f2cc0c82b7c9af91502c2814fdc8e67.

v6: refactor to changes done for sse41 separation (Tapani)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>

i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear

The reference for MOVNTDQA says:

    For WC memory type, the nontemporal hint may be implemented by
    loading a temporary internal buffer with the equivalent of an
    aligned cache line without filling this data to the cache.
    [...] Subsequent MOVNTDQA reads to unread portions of the WC
    cache line will receive data from the temporary internal
    buffer if data is available.

This hidden cache line sized temporary buffer can improve the
read performance from wc maps.

v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy'
    separate sse41 to own static library (Tapani)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>

i965: expose type of memcpy instead of memcpy function itself

There is currently no use of returned memcpy functions outside
intel_tiled_memcpy. Patch changes intel_get_memcpy to return memcpy
type instead of actual function. This makes it easier later to separate
streaming load copy in to own static library.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

util: use *unsigned* ints for bit operations

Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every
time this macro is used.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

radv: s/abs/fabsf/ for floats

Fixes: a4c4efad89eceb26cf82 "radv: Rework guard band calculation"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

meson: drop option description relic

`platforms` is no longer a comma-separated string, and some of our
option descriptions are way too long already. Just drop the incorrect
bit.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

st/mesa: Record shader access qualifiers for images

They're not required to be the same as the access flag on the image
unit. For hardware that does shader image lowering based on the
qualifier (Intel), it may be required for state setup.

v2: (by Kenneth Graunke, incorporating feedback from Marek Olšák)
- Reduce both access and shader_access to uint16_t to avoid making
the pipe_image_view structure larger.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nir/algebraic: Provide descriptive asserts for bit size checks

This will hopefully make debugging opt_algebraic bit-size compile
failures easier.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

nir/algebraic: Loosen a restriction on variables

Previously, we would fail if a variable had an assigned but unknown bit
size X and we tried to assign it an actual bit size. However, this is
ok because, at the time we do the search, the variable does have an
actual bit size and it will match X because of the NIR rules.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>