mesa.git

	Commit message	Author	Age	Files	Lines
*	vc4: Enable NIR's lower_fmod option.	Kenneth Graunke	2019-06-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <[email protected]> Acked-by: Eric Anholt <[email protected]>
*	nir: Drop imov/fmov in favor of one mov instruction	Jason Ekstrand	2019-05-24	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
*	gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE.	Eric Anholt	2019-05-13	1	-1/+2
\| \| \| \| \| \| \| \|	The _LEVELS assumes that the max is always power of two. For V3D 4.2, we can support up to 7680 non-power-of-two MSAA textures, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <[email protected]>
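A small sketch of why a level count cannot express a limit such as 7680. It assumes the usual convention that a full mip chain with `levels` levels starts at an edge of 2^(levels - 1) texels; both helper names are hypothetical and are not gallium API.

```c
/* Largest 2D texture edge implied by a mip level count.
 * Assumes 1 <= levels <= 32. */
static unsigned max_size_from_levels(unsigned levels)
{
    return 1u << (levels - 1);
}

/* Number of mip levels for a given edge size: floor(log2(size)) + 1. */
static unsigned levels_from_size(unsigned size)
{
    unsigned levels = 0;
    while (size) {
        levels++;
        size >>= 1;
    }
    return levels;
}
```

Converting 7680 to a level count and back gives 4096, so a `_LEVELS` cap can only advertise 4096 or 8192 here, while a `_SIZE` cap can state 7680 directly.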
*	nir: allow specifying a set of opcodes in lower_alu_to_scalar	Jonathan Marek	2019-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	nir: Initialize lower_flrp_progress everywhere	Ian Romanick	2019-05-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989
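The bug class fixed here can be sketched with a stand-in macro. `RUN_PASS` and the two toy passes below are hypothetical; the only assumption carried over from the commit message is that a NIR_PASS-style macro sets the progress flag when a pass reports progress and otherwise leaves it untouched, so the flag has to start out initialized.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a NIR_PASS-style macro: it only ever writes
 * `true`, so the flag keeps its previous value when the pass does nothing. */
#define RUN_PASS(progress, pass, arg) \
    do { if (pass(arg)) (progress) = true; } while (0)

/* Toy passes operating on an int instead of a shader. */
static bool pass_that_does_nothing(int *x) { (void)x; return false; }
static bool pass_that_changes(int *x) { *x += 1; return true; }

static bool run_once(int *x, bool change)
{
    bool progress = false; /* without this, the no-op path reads garbage */

    if (change)
        RUN_PASS(progress, pass_that_changes, x);
    else
        RUN_PASS(progress, pass_that_does_nothing, x);

    return progress;
}
```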
*	nir: Use the flrp lowering pass instead of nir_opt_algebraic	Ian Romanick	2019-05-06	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
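The a(1-c)+bc form mentioned in this commit, written out in C as a minimal sketch. The function name `flrp_lowered` is hypothetical; this illustrates the arithmetic only and is not the NIR lowering pass or the i965 code generator.

```c
/* Linear interpolation between a and b by factor c, in the lowered form
 * a * (1 - c) + b * c quoted in the commit message. */
static float flrp_lowered(float a, float b, float c)
{
    return a * (1.0f - c) + b * c;
}
```

With c = 0 the result is a and with c = 1 it is b.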
*	nir: nir_shader_compiler_options: drop native_integers	Christian Gmeiner	2019-05-07	1	-1/+0
\| \| \| \| \| \| \| \|	Drivers which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	vc4: Fall back to renderonly if the vc4 driver doesn't have v3d.	Eric Anholt	2019-04-26	1	-1/+0
\| \| \| \| \| \| \|	I have a platform with vc4 display but V3D 4.x. We can fall back on kmsro's probing to bring up the v3d gallium driver. Acked-by: Rob Clark <[email protected]>
*	vc4: Use _mesa_hash_table_remove_key() where appropriate.	Eric Anholt	2019-04-26	1	-12/+9
\|
*	vc4: fix build	Rhys Perry	2019-04-15	1	-1/+0
\| \| \| \| \| \|	Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Fixes: 5131b7a43f8488a7 ('gallium: add support for formatted image loads')
*	Delete autotools	Dylan Baker	2019-04-15	2	-77/+0
\| \| \| \| \| \| \| \| \| \|	Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
*	gallium: add support for formatted image loads	Rhys Perry	2019-04-15	1	-0/+1
\| \| \| \| \| \| \| \|	v3: rebase v3: make use of u_pipe_screen_get_param_defaults Signed-off-by: Rhys Perry <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
*	nir: make nir_const_value scalar	Karol Herbst	2019-04-14	1	-1/+1
\| \| \| \| \| \| \| \| \|	v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
*	nir/i965/freedreno/vc4: add a bindless bool to type size functions	Timothy Arceri	2019-04-12	1	-1/+1
\| \| \| \| \| \| \|	This is required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <[email protected]>
*	vc4: Upload CS/VS UBO uniforms together.	Eric Anholt	2019-04-10	5	-181/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Same as I did for V3D, drop all this code trying to GC the non-indirectly-loaded uniforms from the UBO that's used for indirect access of gallium cb[0]. While it does successfully drop some of those, it came at the cost of uploading the VS's indirect uniforms twice, for the bin and render versions of the shader. With the UBO loads simplified, I was also able to easily backport V3D's change to pack a UBO offset into the uniform_data[] field so that we don't need to do the add of the uniform base in the shader. As a bonus, now vc4 doesn't depend on mesa/st type_size functions. total uniforms in shared programs: 25514 -> 25490 (-0.09%) total instructions in shared programs: 77019 -> 76836 (-0.24%)
*	vc4: Split UBO0 and UBO1 address uniform handling.	Eric Anholt	2019-04-10	3	-19/+20
\| \| \| \|	I'm going to extend how UBO0 works in a moment.
*	vc4: Don't forget to set the range when scalarizing our uniforms.	Eric Anholt	2019-04-10	1	-0/+2
\| \| \| \| \| \|	In the next commit, we'll want this for handling UBO access clamping. Reviewed-by: Kenneth Graunke <[email protected]>
*	st: Lower uniforms in st in the !PIPE_CAP_PACKED_UNIFORMS case as well.	Eric Anholt	2019-04-10	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o at the st level instead of the backend, packing uniforms with no padding at all, and lowering to UBOs. Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS leads to the driver needing to either link against the type size function in mesa/st, or duplicating it in the backend. Given that all backends want this lower-io as far as I can tell, just move it to mesa/st to resolve the link issue and avoid the driver author needing to understand st's uniforms layout. Incidentally, fixes uniform layout failures in nouveau in: dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex and I think in Lima as well. v2: fix indents Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Get rid of global registers	Jason Ekstrand	2019-04-09	1	-2/+0
\| \| \| \| \| \| \| \| \|	We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_value	Jason Ekstrand	2019-04-07	2	-18/+16
\| \| \| \| \|	Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	vc4: Use shared drm_find_modifier util	Alyssa Rosenzweig	2019-03-14	1	-15/+3
\| \| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	vc4: Switch the post-RA scheduler over to the DAG datastructure.	Eric Anholt	2019-03-11	1	-110/+73
\| \| \| \|	Just a small code reduction from shared infrastructure.
*	v3d: Use the DAG datastructure for QPU instruction scheduling.	Eric Anholt	2019-03-11	1	-3/+3
\| \| \| \|	Just a small code reduction from shared infrastructure.
*	vc4: Reuse list_for_each_entry_rev().	Eric Anholt	2019-03-11	1	-2/+2
\|
*	vc4: Switch over to using the DAG datastructure for QIR scheduling.	Eric Anholt	2019-03-11	1	-79/+55
\| \| \| \|	Just a small code reduction from shared infrastructure.
*	tgsi_to_nir: Produce optimized NIR for a given pipe_screen.	Timur Kristóf	2019-03-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this patch, tgsi_to_nir will output NIR that is tailored to the given pipe, by reading its capabilities and adjusting the NIR code to those capabilities similarly to how glsl_to_nir works. It also adds an optimization loop that brings the output NIR in line with what glsl_to_nir outputs. This is necessary for the same reason why glsl_to_nir has its own optimization loop: currently not every driver does these optimizations yet. For uses which cannot pass a pipe_screen we also keep a variant called tgsi_to_nir_noscreen which keeps the old behavior. Signed-Off-By: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Tested-by: Rob Clark <[email protected]> Acked-By: Eric Anholt <[email protected]>
*	drm-uapi: use local files, not system libdrm	Eric Engestrom	2019-02-14	6	-8/+8
\| \| \| \| \| \| \| \| \|	There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	gallium: add PIPE_CAP_MAX_VARYINGS	Karol Herbst	2019-02-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
*	vc4: Fix leak in HW queries error path	Ernestas Kulik	2019-01-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Reported by Coverity: in the case where there exist hardware and non-hardware queries, the code does not jump to err_free_query and leaks the query. CID: 1430194 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 9ea90ffb98fb ("broadcom/vc4: Add support for HW perfmon")
*	vc4: Enable NEON asm on meson cross-builds.	Eric Anholt	2019-01-28	1	-4/+6
\| \| \| \| \| \| \| \| \|	The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson cross-builds because of the need to run host binaries on the build system. vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my cross-builds. Fixes: ebcb4c2156e9 ("meson: Enable VC4's NEON assembly support.")
*	nir: add bit_size parameter to system values with multiple allowed bit sizes	Karol Herbst	2019-01-21	1	-2/+2
\| \| \| \| \| \| \| \| \|	v2: add assert to verify we have at least one valid bit_size v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	nir: rename nir_var_function to nir_var_function_temp	Karol Herbst	2019-01-19	1	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	nir: rename global/local to private/function memory	Karol Herbst	2019-01-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all threads of a work group, the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	vc4: Hook up perf_debug() output to GL_ARB_debug_output as well.	Eric Anholt	2018-12-20	2	-0/+3
\| \| \| \| \|	This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.
*	vc4: Wire up core pipe_debug_callback	Rhys Kidd	2018-12-20	2	-0/+14
\| \| \| \| \| \| \|	This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	vc4: Move the utile load/store functions to a header for reuse by v3d.	Eric Anholt	2018-12-19	2	-202/+11
\| \| \| \| \|	These implementations of whole-utile load/stores would be the same for v3d, though the layout of blocks of utiles has changed.
*	nir/opt_peephole_select: Don't peephole_select expensive math instructions	Ian Romanick	2018-12-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir/opt_peephole_select: Don't try to remove flow control around indirect loads	Ian Romanick	2018-12-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	vc4: Reuse nir_format_convert.h in our blend lowering.	Eric Anholt	2018-12-17	1	-33/+3
\| \| \| \| \|	These helpers came along after and have effectively the same implementation.
*	nir: Add a bool to int32 lowering pass	Jason Ekstrand	2018-12-16	1	-0/+2
\| \| \| \| \| \| \| \|	We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
*	nir: Rename Boolean-related opcodes to include 32 in the name	Jason Ekstrand	2018-12-16	1	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
*	vc4: Use the original bit size when scalarizing uniform loads.	Eric Anholt	2018-12-16	1	-1/+2
\| \| \| \| \| \|	Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <[email protected]>
*	vc4: Fix a leak of the transfer helper on screen destroy.	Eric Anholt	2018-12-07	1	-0/+3
\| \| \| \|	Fixes: d009463a6549 ("vc4: Switch to using u_transfer_helper for MSAA maps.")
*	nir: Make boolean conversions sized just like the others	Jason Ekstrand	2018-12-05	1	-4/+4
\| \| \| \| \| \| \| \| \|	Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
*	nir: Make nir_lower_clip_vs optionally work with variables.	Kenneth Graunke	2018-11-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <[email protected]>
*	vc4: Don't return a vc4 BO handle on a renderonly screen.	Eric Anholt	2018-11-15	1	-2/+4
\| \| \| \| \| \|	The handles exported need to be on the KMS device's fd, anything else is failure. Also, this code is assuming that the scanout resource has been created already, so assert it.
*	vc4: Make sure we make ro scanout resources for create_with_modifiers.	Eric Anholt	2018-11-15	1	-1/+9
\| \| \| \| \| \| \| \| \| \|	The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or SHARED, with the theory that given that you've got modifiers, that's all you need. However, we were looking at the tmpl.bind for setting up the KMS handle in the renderonly case, so we'd end up trying to use vc4's handle on the hx8357d fd. Fixes: 84ed8b67c56b ("vc4: Set shareable BOs as T tiled if possible")
*	vc4: Use the normal simulator ioctl path for CL submit as well.	Eric Anholt	2018-11-02	3	-13/+5
\| \| \| \|	The simulator no longer needs to look back into the gallium structs.
*	vc4: Maintain a separate GEM mapping of BOs in the simulator.	Eric Anholt	2018-11-02	2	-42/+58
\| \| \| \|	This will let us avoid looking back into the gallium driver's vc4_bo.
*	vc4: Take advantage of _mesa_hash_table_remove_key() in the simulator.	Eric Anholt	2018-11-02	1	-4/+2
\|