mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir: avoid uninitialized variable warning	Timothy Arceri	2019-01-07	1	-1/+1
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231
*	nir: Add nir_lower_tex options to lower sampler return formats.	Eric Anholt	2019-01-04	2	-0/+83
	I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and vc4, but nir could potentially do some useful stuff for us (like avoiding unpack/repacks) if we give it the information.
	v2: Skip lowering for txs/query_levels
	v3: Fix a crash on old-style shadow
	v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack the enum.
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Allow nir_format_unpack_int/sint to unpack larger values.	Eric Anholt	2019-01-04	1	-3/+8
	For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit integer samplers.
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: remove dead code from copy_prop_vars	Caio Marcelo de Oliveira Filho	2019-01-04	1	-1/+1
	When copy_prop_vars also took care of dead write handling, intrin was used as part of store_to_entry. Now it isn't, so this assignment isn't really used. Add a comment clarifying what happens to intrin.
	Fixes: 4dfa7adc100 "nir: Remove handling of dead writes from copy_prop_vars"
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	glsl/linker: complete documentation for assign_attribute_or_color_locations	Andres Gomez	2019-01-04	1	-9/+13
	Commit 27f1298b9d9 ("glsl/linker: validate attribute aliasing before optimizations") forgot to complete the documentation.
	Cc: Tapani Pälli <[email protected]>
	Signed-off-by: Andres Gomez <[email protected]>
	Reviewed-by: Tapani Pälli <[email protected]>
*	nir: merge some basic consecutive ifs	Timothy Arceri	2019-01-03	1	-0/+93
	After trying multiple times to merge if-statements with phis between them I've come to the conclusion that it cannot be done without regressions. The problem is that for some shaders we end up with a whole bunch of phis for the merged ifs, resulting in increased register pressure.
	So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does, so for radeonsi we only see a change (although it's a large change) in a single shader.
	Shader-db results i965 (SKL):
	total instructions in shared programs: 13098176 -> 13098152 (<.01%)
	instructions in affected programs: 1326 -> 1302 (-1.81%)
	helped: 4
	HURT: 0
	total cycles in shared programs: 332032989 -> 332037583 (<.01%)
	cycles in affected programs: 60665 -> 65259 (7.57%)
	helped: 0
	HURT: 4
	The cycles estimates reported by shader-db for i965 seem inaccurate as the only difference in the final code is the removal of the redundant condition evaluations and jumps.
	Also the biggest code reduction (~7%) for radeonsi was in a tomb raider tressfx shader but for some reason this does not get merged for i965.
	Shader-db results radeonsi (VEGA):
	Totals from affected shaders:
	SGPRS: 232 -> 232 (0.00 %)
	VGPRS: 164 -> 164 (0.00 %)
	Spilled SGPRs: 59 -> 59 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 14584 -> 13520 (-7.30 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 13 -> 13 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: add rewrite_phi_predecessor_blocks() helper	Timothy Arceri	2019-01-03	1	-20/+31
	This will also be used by the if merge pass in the following commit.
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: simplify does_varying_match()	Timothy Arceri	2019-01-03	1	-5/+2
	Reviewed-by: Alejandro Piñeiro <[email protected]>
*	nir: make use of does_varying_match() helper	Timothy Arceri	2019-01-03	1	-2/+1
	Reviewed-by: Alejandro Piñeiro <[email protected]>
*	nir: make nir_opt_remove_phis_impl() static	Timothy Arceri	2019-01-03	2	-2/+1
	Reviewed-by: Alejandro Piñeiro <[email protected]>
*	nir: add a way to print the deref chain	Caio Marcelo de Oliveira Filho	2019-01-02	2	-4/+14
	Makes debugging easier when we care about the deref chain and not the deref instruction itself. To make it take a const pointer, constify some of the static functions in nir_print.c.
	Reviewed-by: Eric Anholt <[email protected]>
*	compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()	Iago Toral Quiroga	2019-01-02	1	-0/+14
	The 16-bit polynomial execution doesn't meet Khronos precision requirements. Also, the half-float denorm range starts at 2^(-14) and with asin taking input values in the range [0, 1], polynomial approximations can lead to flushing relatively easily.
	An alternative is to use the atan2 formula to compute asin, which is the reference taken by Khronos to determine precision requirements, but that ends up generating too many additional instructions when compared to the polynomial approximation. Specifically, for the Intel case, doing this adds +41 instructions to the program for each asin/acos call, which looks like an undesirable trade-off.
	So for now we take the easy way out and fall back to using the 32-bit polynomial approximation, which is better (faster) than the 16-bit atan2 implementation and gives us better precision that matches Khronos requirements.
	v2: - Fall back to 32-bit using recursion (Jason).
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit frexp	Iago Toral Quiroga	2019-01-02	1	-2/+46
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit hyperbolic trigonometric functions	Iago Toral Quiroga	2019-01-02	1	-18/+26
	v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason)
	v3: - since we need to define one for fsub use it for fdiv too (Jason)
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit exp and log	Iago Toral Quiroga	2019-01-02	1	-2/+2
	v2: - use nir_fmul_imm helper (Jason)
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit atan2	Iago Toral Quiroga	2019-01-02	1	-7/+11
	v2: - fix huge_val for 16-bit, it was meant to be 2^14 not 10^14.
	v3: - rebase on top of new bool sized opcodes - use nir_b2f helper - use nir_fmul_imm helper
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit atan	Iago Toral Quiroga	2019-01-02	1	-12/+11
	v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason) - rebased on top of new sized boolean opcodes - use nir_b2f helper
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit acos	Iago Toral Quiroga	2019-01-02	1	-2/+3
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: implement 16-bit asin	Iago Toral Quiroga	2019-01-02	1	-9/+14
	v2: - use nir_fmul_imm and nir_fadd_imm helpers (Jason)
	v3: - missed one case where we need to replace nir_imm_float with nir_imm_floatN_t (Jason)
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/spirv: handle 16-bit float in radians() and degrees()	Iago Toral Quiroga	2019-01-02	1	-2/+2
	v2: - use nir_fmul_imm helper (Jason)
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers	Iago Toral Quiroga	2019-01-02	1	-0/+12
	Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/nir: add a nir_b2f() helper	Iago Toral Quiroga	2019-01-02	1	-0/+12
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: link time opt duplicate varyings	Timothy Arceri	2019-01-02	1	-0/+88
	If we are outputting the same value to more than one output component, rewrite the inputs to read from a single component.
	This will allow the duplicate varying components to be optimised away by the existing opts.
	shader-db results i965 (SKL):
	total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
	instructions in affected programs: 322601 -> 314257 (-2.59%)
	helped: 3080
	HURT: 8
	total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
	cycles in affected programs: 2584925 -> 2522944 (-2.40%)
	helped: 2975
	HURT: 477
	shader-db results radeonsi (VEGA):
	SGPRS: 31576 -> 31664 (0.28 %)
	VGPRS: 17484 -> 17064 (-2.40 %)
	Spilled SGPRs: 184 -> 167 (-9.24 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 583340 -> 569368 (-2.40 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 6162 -> 6270 (1.75 %)
	Wait states: 0 -> 0 (0.00 %)
	vkpipeline-db results RADV (VEGA):
	Totals from affected shaders:
	SGPRS: 14880 -> 15080 (1.34 %)
	VGPRS: 10872 -> 10888 (0.15 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 674016 -> 668396 (-0.83 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 2708 -> 2704 (-0.15 %)
	Wait states: 0 -> 0 (0.00 %)
	V2: bunch of tidy ups suggested by Jason
	Reviewed-by: Eric Anholt <[email protected]>
*	nir: rework nir_link_opt_varyings()	Timothy Arceri	2019-01-02	1	-16/+12
	This just cleans things up a little and makes things safer for derefs.
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
*	nir: add can_replace_varying() helper	Timothy Arceri	2019-01-02	1	-2/+14
	This will be reused by the following patch.
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
*	nir: rename nir_link_constant_varyings() nir_link_opt_varyings()	Timothy Arceri	2019-01-02	2	-2/+2
	The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants.
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
*	spirv: add support for SpvCapabilityStorageImageMultisample	Samuel Pitoiset	2018-12-20	2	-1/+5
	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs	Iago Toral Quiroga	2018-12-20	1	-0/+2
	The former expects to see SSA-only things, but the latter injects registers.
	The assertions in the lowering were not seeing this because they asserted on the bit_size values only, not on the is_ssa field, so add that assertion too.
	Fixes: 11dc1307794e "nir: Add a bool to int32 lowering pass"
	CC: [email protected]
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: properly find the entry to keep in copy_prop_vars	Caio Marcelo de Oliveira Filho	2018-12-19	1	-3/+16
	When copy propagation handles a store/copy, it iterates the current copy entries to remove aliases, but keeps the "equal" entry (if it exists) to be updated.
	The removal step may swap the entries around (to ensure there are no holes), invalidating previous iteration pointers. The bug was saving such a pointer to use later. Change the code to first perform the removals and then find the remaining right entry.
	This was causing updates to be lost since they were being made to an entry that was not part of the current copies.
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
	Fixes: b3c61469255 "nir: Copy propagation between blocks"
	Cc: [email protected]
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: properly clear the entry sources in copy_prop_vars	Caio Marcelo de Oliveira Filho	2018-12-19	1	-0/+3
	When updating a copy entry source value from a "non-SSA" (the data come from a copy instruction) to a "SSA" (the data or parts of it come from SSA values), it was possible to hold invalid data in ssa[0] depending on the writemask. Because of the union, ssa[0] could contain a pointer to a nir_deref_instr left over from previous non-SSA usage.
	Change the code to clean up the array before use to avoid leaving invalid data around.
	Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/algebraic: Don't put quotes around floating point literals	Ian Romanick	2018-12-18	2	-5/+13
	The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches.
	v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a variable is entirely alphabetic (suggested by Jason).
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Tested-by: Timothy Arceri <[email protected]> [v1]
	Reviewed-by: Timothy Arceri <[email protected]> [v1]
	Fixes: 6bcd2af0863 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
*	nir: Add a new lowering option to lower 3D surfaces from txd to txl.	Sagar Ghuge	2018-12-18	2	-1/+8
	Tested on gen9.
	v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
	Signed-off-by: Sagar Ghuge <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/dead_write_vars: Get modes directly from derefs	Jason Ekstrand	2018-12-18	1	-2/+1
	Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable.
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/copy_prop_vars: Get modes directly from derefs	Jason Ekstrand	2018-12-18	1	-6/+2
	Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable.
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/lower_wpos_center: Look at derefs for modes	Jason Ekstrand	2018-12-18	1	-2/+4
	This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed).
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/lower_io_to_scalar: Look at derefs for modes	Jason Ekstrand	2018-12-18	1	-3/+6
	This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed).
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/lower_io_arrays_to_elements: Look at derefs for modes	Jason Ekstrand	2018-12-18	1	-5/+8
	This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed).
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/linking_helpers: Look at derefs for modes	Jason Ekstrand	2018-12-18	1	-12/+11
	This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed).
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/propagate_invariant: Skip unknown vars	Jason Ekstrand	2018-12-18	1	-1/+1
	If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO.
	Reviewed-by: Timothy Arceri <[email protected]>
*	Revert "nir/lower_indirect: Bail early if modes == 0"	Ian Romanick	2018-12-18	1	-3/+0
	"There's no point in walking the program if we're never going to actually lower anything."
	Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done.
	This reverts commit 7f75cf2a9408b9af562e033ef6c1d1fd15141421.
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
	Suggested-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
	Tested-by: Clayton Craft <[email protected]>
	Cc: Kenneth Graunke <[email protected]>
*	nir/opt_peephole_select: Don't peephole_select expensive math instructions	Ian Romanick	2018-12-17	2	-9/+32
	On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions.
	This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6).
	v2: Remove stray #if block. Noticed by Thomas.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Thomas Helland <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir/opt_peephole_select: Don't try to remove flow control around indirect loads	Ian Romanick	2018-12-17	2	-11/+29
	That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive.
	No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select").
	v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
	v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c).
	v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir: Fix clamping of uints for image store lowering.	Eric Anholt	2018-12-17	1	-1/+1
	I botched some copy-and-paste and clamped to signed int max instead of uint max.
	Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl.
	Fixes: d3e046e76c06 ("nir: Pull some of intel's image load/store format conversion to nir_format.h")
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Release per-block metadata in nir_sweep	Ian Romanick	2018-12-16	1	-0/+9
	nir_sweep already marks all metadata invalid, so it is safe to release the memory here too.
	mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475
	gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811
	deus ex mankind divided 148: 62,845,304 => 62,829,640
	deus ex mankind divided 2890: 71,922,686 => 71,922,686
	dirt showdown 676: 69,238,607 => 69,238,607
	dolphin ubershaders 210: 77,822,072 => 77,822,072
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Fix holes in nir_instr	Ian Romanick	2018-12-16	1	-5/+5
	Found using pahole.
	Changes in peak memory usage according to Valgrind massif:
	mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331
	gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571
	deus ex mankind divided 148: 62,887,728 => 62,845,304
	deus ex mankind divided 2890: 72,399,750 => 71,922,686
	dirt showdown 676: 69,464,023 => 69,238,607
	dolphin ubershaders 210: 78,359,728 => 77,822,072
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/phi_builder: Use per-value hash table to store [block] -> def mapping	Ian Romanick	2018-12-16	1	-9/+36
	Replace the old array in each value with a hash table in each value.
	Changes in peak memory usage according to Valgrind massif:
	mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
	gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
	deus ex mankind divided 148: 62,887,728 => 62,887,728
	deus ex mankind divided 2890: 72,402,222 => 72,399,750
	dirt showdown 676: 74,466,431 => 69,464,023
	dolphin ubershaders 210: 109,630,376 => 78,359,728
	Run-time change for a full run on shader-db on my Haswell desktop (with -march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9 seconds on a 237 second run. The first time I sent this version of this patch out, the run-time data was quite different. I had misconfigured the script that ran the test, and none of the tests from higher GLSL versions were run. These are generally more complex shaders, and they are more affected by this change.
	The previous version of this patch used a single hash table for the whole phi builder. The mapping was from [value, block] -> def, so a separate allocation was needed for each [value, block] tuple. There was quite a bit of per-allocation overhead (due to ralloc), so the patch was followed by a patch that added the use of the slab allocator. The results of those two patches were not quite as good:
	mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
	gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
	deus ex mankind divided 148: 62,887,728 => 62,887,728
	deus ex mankind divided 2890: 72,402,222 => 72,402,222 *
	dirt showdown 676: 74,466,431 => 72,443,591 *
	dolphin ubershaders 210: 109,630,376 => 81,034,320 *
	The * denote tests that are better now. In the tests that are the same in both patches, the "after" peak memory usage was at a different location. I did not check the local peaks.
	Signed-off-by: Ian Romanick <[email protected]>
	Suggested-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/algebraic: Add some optimizations for D3D-style Booleans	Jason Ekstrand	2018-12-16	1	-0/+13
	D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans.
	Shader-db results on Kaby Lake:
	total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
	instructions in affected programs: 2457021 -> 2288154 (-6.87%)
	helped: 8318
	HURT: 10
	total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
	cycles in affected programs: 151029683 -> 137186984 (-9.17%)
	helped: 7749
	HURT: 682
	total loops in shared programs: 4431 -> 4399 (-0.72%)
	loops in affected programs: 32 -> 0
	helped: 21
	HURT: 0
	total spills in shared programs: 10290 -> 10051 (-2.32%)
	spills in affected programs: 2532 -> 2293 (-9.44%)
	helped: 18
	HURT: 18
	total fills in shared programs: 22203 -> 21732 (-2.12%)
	fills in affected programs: 3319 -> 2848 (-14.19%)
	helped: 18
	HURT: 18
	Note that a large chunk of the improvement is fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations.
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
	Tested-by: Bas Nieuwenhuizen <[email protected]>
*	nir/algebraic: Optimize 1-bit Booleans	Jason Ekstrand	2018-12-16	2	-86/+57
	Reviewed-by: Eric Anholt <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Tested-by: Bas Nieuwenhuizen <[email protected]>
*	nir: Switch to using 1-bit Booleans for almost everything	Jason Ekstrand	2018-12-16	13	-111/+96
	This is a squash of a few distinct changes:
	glsl,spirv: Generate 1-bit Booleans
	Revert "Use 32-bit opcodes in the NIR producers and optimizations"
	Revert "nir/builder: Generate 32-bit bool opcodes transparently"
	nir/builder: Generate 1-bit Booleans in nir_build_imm_bool
	Reviewed-by: Eric Anholt <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Tested-by: Bas Nieuwenhuizen <[email protected]>
*	nir: Add a bool to int32 lowering pass	Jason Ekstrand	2018-12-16	4	-0/+163
	We also enable it in all of the NIR drivers.
	Reviewed-by: Eric Anholt <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Tested-by: Bas Nieuwenhuizen <[email protected]>
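
Note on the "nir/algebraic: Don't put quotes around floating point literals" entry in the log above: the message says a quoted 1.0 is treated as a variable, so the rule matches far more than intended. The sketch below tries to illustrate that failure mode with a toy matcher. Everything in it is an assumption made for illustration: `match`, the two patterns, and the expression are hypothetical and are not taken from Mesa's nir_algebraic.py or nir_opt_algebraic.py. The only convention borrowed from the log entry is that a string operand names a variable while a bare number is a constant.

```python
def match(pattern, expr, bindings=None):
    """Toy pattern matcher (hypothetical, not Mesa code).

    Tuples are (opcode, operand, ...). A string operand in the pattern is a
    variable that binds to any sub-expression; a number must match exactly.
    Returns a dict of bindings on success, or None on failure.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str):
        # Variable: bind on first use, require the same value afterwards.
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        updated = dict(bindings)
        updated[pattern] = expr
        return updated
    if isinstance(pattern, tuple):
        # Opcode node: opcode and operand count must agree.
        if (not isinstance(expr, tuple) or len(expr) != len(pattern)
                or expr[0] != pattern[0]):
            return None
        for sub_pattern, sub_expr in zip(pattern[1:], expr[1:]):
            bindings = match(sub_pattern, sub_expr, bindings)
            if bindings is None:
                return None
        return bindings
    # Numeric literal: only an equal constant matches.
    return bindings if pattern == expr else None


# Illustrative patterns only; the real rule's shape is not shown in the log.
intended = ('iand', ('ineg', ('b2i', 'a')), 1.0)    # 1.0 is a constant
quoted = ('iand', ('ineg', ('b2i', 'a')), '1.0')    # '1.0' acts as a variable

# An expression whose second iand operand is NOT the constant 1.0.
expr = ('iand', ('ineg', ('b2i', 'cond')), 'y')

print(match(intended, expr))  # no match
print(match(quoted, expr))    # matches, with '1.0' bound to 'y'
```

In this toy model the quoted form accepts any second operand, which mirrors the "any iand involving a ('ineg', ('b2i', a)) matches" behaviour described in the commit message.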