mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	radv: gather clip/cull distances in the shader info pass	Samuel Pitoiset	2019-09-06	2	-21/+25
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: move ac_fill_shader_info() to radv_nir_shader_info_pass()	Samuel Pitoiset	2019-09-06	2	-45/+38
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: merge radv_shader_variant_info into radv_shader_info	Samuel Pitoiset	2019-09-06	6	-293/+275
\| \| \| \| \| \| \|	Having two different structs is useless. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radeon: Fix mjpeg issue for ARCTURUS	Zhu, James	2019-09-06	1	-0/+1
\| \| \| \| \| \| \|	ARCTURUS mjpeg is using direct register access. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
*	radeon/vcn: add RENOIR VCN decode support	Leo Liu	2019-09-06	1	-4/+4
\| \| \| \| \| \| \|	It has the same VCN2.x block as Navi1x. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]>
*	glsl: Fix unroll of do{} while(false) like loops	Danylo Piliaiev	2019-09-06	2	-17/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For loops whose condition is false on the first iteration, the iteration count was wrongly calculated under the assumption that the loop's condition is true until it becomes false, meaning it is true at least once. Such loops are now reported as having 0 iterations. Similar to the fix e71fc7f2 done in NIR. Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
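The edge case described above can be illustrated with a minimal sketch. It is not the GLSL loop analysis itself: `trip_count` is a hypothetical helper for a simple `for (i = init; i < limit; i += step)` loop with small operands. It checks the condition for the first iteration before applying the closed-form count.

```c
#include <assert.h>

/* Hypothetical helper (not a Mesa function): number of times the body of
 *   for (i = init; i < limit; i += step) { ... }
 * executes. Assumes small operands so the arithmetic cannot overflow. */
static int trip_count(int init, int limit, int step)
{
    assert(step > 0);

    /* The condition may already be false on the first iteration; in that
     * case the body never runs and the closed form below must not be used. */
    if (!(init < limit))
        return 0;

    /* Ceiling division: the condition holds at least once here. */
    return (limit - init + step - 1) / step;
}
```

Without the early return, the closed form would be evaluated under the assumption that the condition holds at least once, which is the kind of miscount the commit message describes.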
*	tgsi_to_nir: Remove dependency on libglsl.	Timur Kristóf	2019-09-06	2	-14/+18
\| \| \| \| \| \| \| \| \|	This commit removes the GLSL dependency in TTN by manually recording the textures used and calling nir_lower_samplers instead of its GL counterpart. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
*	nir: Carve out nir_lower_samplers from GLSL code.	Timur Kristóf	2019-09-06	5	-127/+159
\| \| \| \| \| \| \| \| \| \| \| \|	Lowering samplers is needed to produce NIR that can actually be consumed by some gallium drivers, so it doesn't make sense to keep it only in the GLSL code. This commit introduces nir_lower_samplers to compiler/nir, while keeping the GL-specific function too. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
*	radeonsi: Release storage for smda_uploads when the context is destroyed	Gert Wollny	2019-09-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a memory leak in the flush code: Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 in __interceptor_realloc .../gcc-8.3.0/libsanitizer/asan/asan_malloc_linux.cc:105 #1 in si_buffer_do_flush_region src/gallium/drivers/radeonsi/si_buffer.c:573 #2 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:608 #3 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:597 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	android: mesa: revert "Enable asm unconditionally"	Mauro Rossi	2019-09-06	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch partially reverts 20294dc ("mesa: Enable asm unconditionally, ...") Android makefile build logic needs to disable assembler optimization in 32bit builds to avoid text relocations for libglapi.so shared Fixes the following build error with Android x86 32bit target: [ 0% 4/477] target SharedLib: libglapi (out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so) FAILED: out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so ... prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: warning: shared library text segment is not shareable prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: error: treating warnings as errors clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation) Fixes: 20294dc ("mesa: Enable asm unconditionally, now that gen_matypes is gone.") Signed-off-by: Mauro Rossi <[email protected]> Acked-by: Eric Engestrom <[email protected]>
*	radv/gfx10: always set ballot_mask_bits to 64	Samuel Pitoiset	2019-09-06	1	-2/+1
\| \| \| \| \| \| \| \|	The codegen handles it and it adds the correct casts. This fixes a bunch of LLVM validation errors when enabling Wave32 for compute. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	nir/lower_explicit_io: Handle 1 bit loads and stores	Caio Marcelo de Oliveira Filho	2019-09-05	1	-9/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit value, then Store it. These cases started to appear when we changed Anvil to use derefs for shared memory. v2: Use `bit_size` in a couple of places we were missing. (Jason) Reassign `value` instead of `src[0]`. (Jason) Fixes: 024a46a4079 ("anv: use derefs for shared memory access") Reviewed-by: Jason Ekstrand <[email protected]>
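The load/store conversion described above can be sketched as follows. This is a plain C analogy, not the NIR pass: `load_bool` and `store_bool` are hypothetical names, and encoding true as the 32-bit value 1 is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not Mesa code): a 1-bit value lives in memory as a
 * full 32-bit word. */

/* Load the 32-bit word, then convert it to a 1-bit value. */
static bool load_bool(const uint32_t *mem, unsigned index)
{
    return mem[index] != 0;
}

/* Convert the 1-bit value to 32 bits, then store it.
 * Assumption: true is written as 1, false as 0. */
static void store_bool(uint32_t *mem, unsigned index, bool value)
{
    mem[index] = value ? 1u : 0u;
}
```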
*	Revert "intel/fs: Move the scalar-region conversion to the generator."	Jason Ekstrand	2019-09-06	4	-5/+5
\| \| \| \| \| \| \| \| \| \|	This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that we're doing interpolation lowering in NIR, we can continue to stride the FS input registers directly in the brw_fs_nir code like we did before. This fixes SIMD32 fragment shaders which broke because lower_simd_width depended on the 0 stride to split PLN instructions correctly. Reviewed-by: Francisco Jerez <[email protected]>
*	intel/fs: Fix FB write inst groups	Jason Ekstrand	2019-09-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This commit does two things. First, it simplifies the way we compute the FB write group bit. There's no reason to use a ternary because inst->group / 16 can only be 0 or 1. Second, it fixes an order-of-operations bug where the ternary wasn't selecting between (1 << 11) and 0 but between (1 << 11) and 0 \| brw_dp_write_desc(...). Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes" Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
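The precedence problem is easy to reproduce in isolation. The sketch below is not the driver code: `desc_buggy` and `desc_fixed` are hypothetical names, and `write_desc` stands in for the value returned by brw_dp_write_desc(...). In C, `|` binds tighter than `?:`, so the unparenthesized form drops the descriptor bits whenever the group bit is selected.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the i965 compiler code. */
#define GROUP_BIT (1u << 11)

/* Parses as: cond ? GROUP_BIT : (0 | write_desc).
 * When the condition is true, write_desc is lost entirely. */
static uint32_t desc_buggy(unsigned group, uint32_t write_desc)
{
    return (group / 16) == 1 ? GROUP_BIT : 0 | write_desc;
}

/* group / 16 is only ever 0 or 1 here, so it can be shifted into bit 11
 * directly and OR'ed with the descriptor, with no ternary needed. */
static uint32_t desc_fixed(unsigned group, uint32_t write_desc)
{
    return ((uint32_t)(group / 16) << 11) | write_desc;
}
```

For group 0 both forms agree; they differ only for group 16, which is why such a bug can go unnoticed until the second half of a wider dispatch is exercised.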
*	lima/ppir: don't lower phis to scalar	Vasily Khoruzhick	2019-09-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	Utgard PP is vec4 architecture, so lowering phis to scalars increases instruction count and potentially interferes with spilling. Tested-by: Andreas Baierl <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
*	freedreno/a2xx: formats update	Jonathan Marek	2019-09-06	5	-250/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For render formats, update fd2_pipe2color to only work with HW supported render formats, and remove the format whitelist is_format_supported. This patch enables float render formats (which work). For vertex/texture formats, use a generic function which translates using the bitsize of the channels. Since we fake support for some vertex formats, check for these in is_format_supported to avoid enabling them as sampler formats. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
*	freedreno/a2xx: fix depth gmem restore	Jonathan Marek	2019-09-06	1	-15/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use fd_gmem_restore_format() to avoid trying to use unsupported Z24S8/Z16 render formats for gmem restore. Also apply this change to gmem2mem so it doesn't depend on fd2_pipe2color working with depth formats. gmem2mem/mem2gmem also doesn't need to use the swap/swizzle, since dst/src formats are the same. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
*	freedreno/a2xx: implement polygon offset	Jonathan Marek	2019-09-06	2	-0/+14
\| \| \| \| \| \| \| \| \|	Fixes failures in the following deqp tests: dEQP-GLES2.functional.polygon_offset.* Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: fix SRC_ALPHA_SATURATE for alpha blend function	Jonathan Marek	2019-09-06	1	-1/+6
\| \| \| \| \| \| \| \| \|	Fixes failures in the following deqp tests: dEQP-GLES2.functional.fragment_ops.src_alpha_saturate Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: ir2: update register state in scalar insert	Jonathan Marek	2019-09-06	1	-0/+6
\| \| \| \| \|	Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
*	freedreno/a2xx: ir2: fix incorrect instruction reordering	Jonathan Marek	2019-09-06	1	-0/+16
\| \| \| \| \|	Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
*	freedreno/a2xx: ir2: check opcode on the right instruction in export cp	Jonathan Marek	2019-09-06	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: ir2: fix saturate in cp	Jonathan Marek	2019-09-06	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: ir2: set lower_fdph	Jonathan Marek	2019-09-06	1	-0/+1
\| \| \| \| \| \| \| \|	The fdph opcode is not supported. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: ir2: remove pointcoord y invert	Jonathan Marek	2019-09-06	1	-4/+2
\| \| \| \| \| \| \| \| \|	Fixes the following deqp test: dEQP-GLES2.functional.shaders.builtin_variable.pointcoord Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/a2xx: ir2: fix lowering of instructions after float lowering	Jonathan Marek	2019-09-06	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \|	Some instructions generated by int/bool float lowering need to be lowered by opt_algebraic. Fixes: 43dbd7d6 Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalar	Vasily Khoruzhick	2019-09-06	1	-5/+21
\| \| \| \| \| \| \| \| \| \|	Utgard PP has a vector fcsel operation, but its condition is scalar. Add a filtering callback that checks whether the {b,f}csel condition is not scalar, so that {b,f}csel is lowered to scalar only in that case. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
*	nir: allow specifying filter callback in lower_alu_to_scalar	Vasily Khoruzhick	2019-09-06	16	-67/+113
\| \| \| \| \| \| \| \| \| \| \| \| \|	A set of opcodes isn't flexible enough in certain cases. E.g. Utgard PP has a vector conditional select operation, but its condition is always scalar. Lowering all the vector selects to scalar increases the instruction count, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
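The filter-callback idea can be sketched with toy types. Nothing below is the NIR API: `struct alu_instr`, `lower_filter_cb`, `lower_only_vector_cond_selects` and `count_lowered` are hypothetical, and the policy of lowering every non-select vector op is an assumption chosen to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction record (hypothetical, not a NIR type). */
struct alu_instr {
    unsigned num_components;   /* width of the result */
    unsigned cond_components;  /* width of the select condition, if any */
    bool is_select;            /* true for a conditional select */
};

/* Per-instruction filter: return true if the instruction should be lowered. */
typedef bool (*lower_filter_cb)(const struct alu_instr *instr, const void *data);

/* Example policy: keep vector selects whose condition is scalar, lower the rest. */
static bool lower_only_vector_cond_selects(const struct alu_instr *instr,
                                           const void *data)
{
    (void)data;
    if (!instr->is_select)
        return true;
    return instr->cond_components > 1;
}

/* Counts the vector instructions a pass would lower. A NULL callback means
 * "lower every vector instruction", mirroring an unfiltered pass. */
static unsigned count_lowered(const struct alu_instr *instrs, size_t n,
                              lower_filter_cb cb, const void *data)
{
    unsigned lowered = 0;
    for (size_t i = 0; i < n; i++) {
        if (instrs[i].num_components > 1 && (!cb || cb(&instrs[i], data)))
            lowered++;
    }
    return lowered;
}
```

A callback can inspect operands of a single instruction, which a fixed set of opcodes cannot express.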
*	util: android logging support	Rob Clark	2019-09-06	2	-2/+21
\| \| \| \| \| \| \| \| \|	In particular, it would be nice for failed debug_assert() msgs to show up in logcat. Signed-off-by: Rob Clark <[email protected]> Kristian H. Kristensen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	freedreno/ir3: allow copy propagation for relative	Rob Clark	2019-09-06	1	-9/+19
\| \| \| \| \| \| \| \| \| \|	This appears to work fine (with the additional constraint of keeping the indirect load in the same block that a0.x was loaded). We can probably lift this restriction on earlier gens after testing. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/ir3: fix cp cmps.s opt	Rob Clark	2019-09-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Need to use ir3_instr_set_address(), otherwise the instruction might not get added to the indirects table. This becomes a problem when we turn on copy propagation for relative accesses, as check_instr() in the sched pass won't realize there is an indirect consumer of address register load that is ready to be scheduled. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/ir3: assert that only single address	Rob Clark	2019-09-06	2	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	An instruction can reference only a single address register value. Add an assert to catch bugs. Also, address value should also be local to the same block as the instruction. (The one spot where changing the instruction address is actually legit needs to clear the address first.) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/ir3: fix mad copy propagation special case	Rob Clark	2019-09-06	1	-9/+35
\| \| \| \| \| \| \| \| \| \| \| \|	After the next patch enabling copy propagation for relative sources, we'll need to dereference the n'th src in valid_flags(), so we actually need to swap the sources before calling valid_flags(). But the logic was already a bit cumbersome, so move it into a helper function. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/ir3: fix addr/pred spilling	Rob Clark	2019-09-06	1	-7/+42
\| \| \| \| \| \| \| \| \|	The live_values and use_count were not being properly updated. This starts triggering problems with the next patch, where we allow copy propagation for RELATIV access. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	freedreno/ir3: cleanup "partially const" ubo srcs	Rob Clark	2019-09-06	1	-4/+52
\| \| \| \| \| \| \| \| \| \|	Move the constant part of the indirect offset into nir intrinsic base. When we have multiple indirect accesses with different constant offsets, this lets other opt passes clean up things to use a single address register value. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	lima/ppir: improve regalloc spill cost calculation	Erico Nunes	2019-09-05	1	-5/+49
\| \| \| \| \| \| \| \| \| \| \| \|	Now that spilling ops can be inserted into existing instructions, it makes sense to increase cost to spill registers that would cause the creation of a new instruction. Experimental results showed that penalizing too much due to this caused worse results, however it is beneficial as a tie resolver between registers with the same number of components. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
*	lima/ppir: optimizations in regalloc spilling code	Erico Nunes	2019-09-05	1	-90/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid creating unnecessary instructions for the load/store temp nodes when not required, to further reduce register pressure. The store_temp operation seems to be unable to do any spilling. At least the offline shader seems to never output instructions accessing swizzled components, and attempting to output that in ppir results in errors. So, force spilled registers to allocate a full vec4 register. This seems to be the optimal way as it is possible to always keep stores and temps in a single instruction that can be pipelined. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
*	lima/ppir: mark regalloc created ssa unspillable	Erico Nunes	2019-09-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	One ssa created in the spilling code in ppir_update_spilled_src was not properly being marked 'spilled', which made it a candidate for future spilling attempts. Since it was being inserted by the spilling code itself, let's mark it unspillable to avoid an infinite spilling loop. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
*	v3d: writes to magic registers aren't RF writes after THREND	Jose Maria Casanova Crespo	2019-09-05	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shaders must not attempt to write to the register files in the last three instructions, but that doesn't include the magic registers: nop ; nop ; thrsw; ldtmu.- * ERROR * nop ; nop nop ; nop v2: Simplify validation rules. (Eric Anholt) v3: Adjust validation even more. (Eric Anholt) Reviewed-by: Eric Anholt <[email protected]>
*	intel/dri: finish proper glthread	Sergii Romantsov	2019-09-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KWin was able to get a NULL context in the call to intelUnbindContext, but _mesa_glthread_finish is not resilient to that case. The case can be reproduced with these steps: 1. Create both glx and egl contexts 2. Make glx current 3. Make egl current 4. Reset glx context 5. Make egl current. The solution properly finishes the glthread context (the context is taken from the dri-context requested for unbinding, not from the saved current context). Piglit-test: https://gitlab.freedesktop.org/mesa/piglit/merge_requests/87 Cc: 19.1 19.2 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110814 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111271 Fixes: dca36d5516d0 (i965: Implement threaded GL support) Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	radv: Call nir_propagate_invariant()	Connor Abbott	2019-09-05	1	-0/+2
\| \| \| \| \| \| \| \|	Without this, invariant qualifiers don't do anything. Together with a fix to the game, this fixes flickering in No Man's Sky. Cc: [email protected] Reviewed-by: Samuel Pitoiset <[email protected]>
*	radeonsi/nir: Don't lower constant arrays to uniforms	Connor Abbott	2019-09-05	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	shader-db results: Totals: SGPRS: 3955968 -> 3954960 (-0.03 %) VGPRS: 2220220 -> 2220092 (-0.01 %) Spilled SGPRs: 11387 -> 11325 (-0.54 %) Spilled VGPRs: 97 -> 97 (0.00 %) Private memory VGPRs: 2528 -> 2528 (0.00 %) Scratch size: 2656 -> 2656 (0.00 %) dwords per thread Code Size: 76002204 -> 75994988 (-0.01 %) bytes LDS: 740 -> 740 (0.00 %) blocks Max Waves: 772776 -> 772787 (0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 16840 -> 15832 (-5.99 %) VGPRS: 16452 -> 16324 (-0.78 %) Spilled SGPRs: 1416 -> 1354 (-4.38 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 2016 -> 2016 (0.00 %) Scratch size: 2040 -> 2040 (0.00 %) dwords per thread Code Size: 953624 -> 946408 (-0.76 %) bytes LDS: 303 -> 303 (0.00 %) blocks Max Waves: 1622 -> 1633 (0.68 %) Wait states: 0 -> 0 (0.00 %) There were a large number of regressions in code size, but they seem to be because NIR unrolls some loops, which results in the table being replaced by a bunch of immediates on multiplies etc. -- this bloats code size since the table size is now included, but means that there are fewer loads so it's still a net positive. Reviewed-by: Timothy Arceri <[email protected]>
*	gallium: Plumb through a way to disable GLSL const lowering	Connor Abbott	2019-09-05	7	-1/+20
\| \| \| \| \| \| \| \| \| \|	For radeonsi, we will prefer the NIR pass as it'll generate better code (some index calculation and a single load vs. a load, then index calculation, then another load) and oftentimes NIR optimization can kick in and make all the access indices constant. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	st/nir: Don't lower indirects when linking	Connor Abbott	2019-09-05	1	-17/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I believe this was stuck here early because otherwise nir_opt_copy_prop_vars could undo what lower_io_to_temporaries does. However that has since been fixed. Also, we now use scratch for large variables so the comment is stale. On radeonsi these are the shader-db results: Totals: SGPRS: 3955968 -> 3955968 (0.00 %) VGPRS: 2220208 -> 2220220 (0.00 %) Spilled SGPRs: 11387 -> 11387 (0.00 %) Spilled VGPRs: 97 -> 97 (0.00 %) Private memory VGPRs: 2528 -> 2528 (0.00 %) Scratch size: 2656 -> 2656 (0.00 %) dwords per thread Code Size: 76002108 -> 76002204 (0.00 %) bytes LDS: 740 -> 740 (0.00 %) blocks Max Waves: 772779 -> 772776 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 176 -> 176 (0.00 %) VGPRS: 144 -> 156 (8.33 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 12104 -> 12200 (0.79 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 28 -> 25 (-10.71 %) Wait states: 0 -> 0 (0.00 %) The few small regressions are due to nir_opt_large_constants kicking in when indirect lowering happens to result in smaller code after optimization since the array is very simple. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	st/nir: Call nir_remove_unused_variables() in the opt loop	Connor Abbott	2019-09-05	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This prevents regressions when disabling indirect lowering. Sometimes the only use of an input array was copying it to the array created by nir_lower_io_to_temporaries, and without lowering indirects we wouldn't have eliminated the temporary array until after linking, which was too late to remove unused code in the producer. No shader-db changes with radeonsi NIR. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	ac/nir: Enable nir_opt_large_constants	Connor Abbott	2019-09-05	2	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vkpipeline-db numbers: Totals: SGPRS: 1740306 -> 1741322 (0.06 %) VGPRS: 1331124 -> 1331712 (0.04 %) Spilled SGPRs: 21201 -> 21316 (0.54 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 256 -> 256 (0.00 %) dwords per thread Code Size: 79022628 -> 78694788 (-0.41 %) bytes LDS: 6500 -> 6500 (0.00 %) blocks Max Waves: 301413 -> 301302 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 53633 -> 54649 (1.89 %) VGPRS: 53000 -> 53588 (1.11 %) Spilled SGPRs: 3454 -> 3569 (3.33 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5284232 -> 4956392 (-6.20 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 4239 -> 4128 (-2.62 %) Wait states: 0 -> 0 (0.00 %) (The biggest VGPR and max wave regression is due to unrolling a loop, which made the scheduler more aggressive, but in this case it's able to effectively hide latency so it's actually probably a win.) 
shader-db numbers with radeonsi NIR: Totals: SGPRS: 3526496 -> 3526512 (0.00 %) VGPRS: 2198576 -> 2198576 (0.00 %) Spilled SGPRs: 10463 -> 10463 (0.00 %) Spilled VGPRs: 86 -> 86 (0.00 %) Private memory VGPRs: 3182 -> 2528 (-20.55 %) Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread Code Size: 74117280 -> 74106140 (-0.02 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 775846 -> 775844 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 856 -> 872 (1.87 %) VGPRS: 680 -> 680 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 654 -> 0 (-100.00 %) Scratch size: 668 -> 0 (-100.00 %) dwords per thread Code Size: 49652 -> 38512 (-22.44 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 182 -> 180 (-1.10 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <[email protected]>
*	ac/nir: Support load_constant intrinsics	Connor Abbott	2019-09-05	1	-0/+55
\| \| \| \| \| \| \|	Setup a constant global variable that LLVM will stick in a .rodata section and generate PC-relative loads for. Reviewed-by: Marek Olšák <[email protected]>
*	radv/radeonsi: Don't count read-only data when reporting code size	Connor Abbott	2019-09-05	6	-4/+14
\| \| \| \| \| \| \| \| \| \|	We usually use these counts as a simple way to figure out if a change reduces the number of instructions or shrinks an instruction. However, since .rodata sections aren't executed, we shouldn't be counting their size for this analysis. Make the linker return the total executable size, and use it to report the more useful size in both drivers. Reviewed-by: Marek Olšák <[email protected]>
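The accounting change can be sketched as a sum over executable sections only. The types and names below (`struct section`, `executable_size`) are hypothetical and do not reflect the actual linker interface in either driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical section record (not the real linker data structure). */
struct section {
    const char *name;
    size_t size;       /* size in bytes */
    bool executable;   /* true for code, false for data such as .rodata */
};

/* Sum only the sections that are executed, so read-only data does not
 * inflate the reported code size. */
static size_t executable_size(const struct section *sections, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (sections[i].executable)
            total += sections[i].size;
    }
    return total;
}
```

With this split, moving a constant table into read-only data shows up as a code-size reduction only if it actually removes instructions.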
*	headers: remove redundant GL token from GL wrapper	Heinrich Fink	2019-09-05	1	-4/+0
\| \| \| \| \| \| \| \|	Removing GL_FRAMEBUFFER_FLIP_Y_MESA token from glheader.h as it is now provided by glext.h Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	clover: Fix build after clang r370122.	Hal Gentz	2019-09-04	2	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’: ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:203:81: error: no matching function for call to ‘clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, const char* const, const char const, clang::DiagnosticsEngine&)’ 203 \| c->getInvocation(), copts.data(), copts.data() + copts.size(), diag)) \| ^ In file included from /opt/llvm64/include/clang/Frontend/CompilerInstance.h:15, from ../mesa/src/gallium/state_trackers/clover/llvm/codegen.hpp:37, from ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:49: /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate: ‘static bool clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, llvm::ArrayRef<const char>, clang::DiagnosticsEngine&)’ 157 \| static bool CreateFromArgs(CompilerInvocation &Res, \| ^~~~~~~~~~~~~~ /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate expects 3 arguments, 4 provided Signed-off-by: Hal Gentz <[email protected]> Reviewed-by: Aaron Watry <[email protected]>