path: root/src
Commit message | Author | Age | Files | Lines
* glsl/freedreno/panfrost: pass gl_context to the standalone compiler | Timothy Arceri | 2019-03-06 | 5 | -8/+15
  This allows us to use the ctx with glsl_to_nir() in a following patch.
  Reviewed-by: Eric Anholt <[email protected]>
* vulkan/overlay: drop dependency on validation layer headers | Lionel Landwerlin | 2019-03-06 | 4 | -213/+40
  v2: reimplement layer chain info getters (Eric)
  v3: make it compile.. (Lionel)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* vulkan/util: generate instance/device dispatch tables | Lionel Landwerlin | 2019-03-06 | 5 | -24/+148
  This will be used by the overlay instead of the system-installed validation layer helpers.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
* vulkan/util: make header available from c++ | Lionel Landwerlin | 2019-03-06 | 1 | -1/+9
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* iris: setup EdgeFlag Vertex Element when needed. | Jose Maria Casanova Crespo | 2019-03-06 | 3 | -15/+86
  If the vertex shader uses EdgeFlag, the hardware requires that it be set up as the last VERTEX_ELEMENT_STATE. If SGVS are added at draw time, we also need to reconfigure the last 3DSTATE_VF_INSTANCING so that its VertexElementIndex points to the new vertex element that contains the EdgeFlag.
  So if neither draw parameters nor the edge flag are used, the CSO generated at iris_create_vertex_element is sent directly in the batches. But if the edge flag is used, we adjust the last VERTEX_ELEMENT_STATE and the last 3DSTATE_VF_INSTANCING using the alternative edge flag versions that we generate at iris_create_vertex_element and store in the CSO.
  Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Include a count of register pressure in the RA failure dumps. | Eric Anholt | 2019-03-06 | 1 | -1/+13
  You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.
* radv: enable lower_mul_2x32_64 | Samuel Pitoiset | 2019-03-06 | 1 | -0/+1
  Fixes: 58bcebd987b ("spirv: Allow [i/u]mulExtended to use new nir opcode")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* st/nir: Move 64-bit lowering later | Jason Ekstrand | 2019-03-06 | 1 | -2/+5
  Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Move 64-bit lowering later | Jason Ekstrand | 2019-03-06 | 1 | -21/+13
  Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size.
  On the vs-isnan-dvec test from piglit:
  Before this commit:
    1684.63s user 17.29s system 99% cpu 28:28.24 total
    101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
    Peak memory usage (according to massif): 1.435 GB
  After this commit:
    179.64s user 7.75s system 99% cpu 3:07.92 total
    57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
    Peak memory usage (according to massif): 531.0 MB
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_doubles: Inline functions directly in lower_doubles | Jason Ekstrand | 2019-03-06 | 9 | -63/+53
  Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ourselves. This means that there are no more nasty functions lying around that the caller needs to worry about cleaning up.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/deref: Expose nir_opt_deref_impl | Jason Ekstrand | 2019-03-06 | 2 | -1/+2
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/inline_functions: Break inlining into a builder helper | Jason Ekstrand | 2019-03-06 | 3 | -40/+60
  This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined and only ever once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Inline functions in float64_funcs_to_nir | Jason Ekstrand | 2019-03-06 | 1 | -0/+5
  This doesn't really change anything as the functions will all get inlined anyway. However, it does let us do a bit of the work earlier and in a common place.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Add a shared helper for building float64 shaders | Jason Ekstrand | 2019-03-06 | 7 | -99/+70
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Drop an unneeded lower_constant_initializers call | Jason Ekstrand | 2019-03-06 | 1 | -2/+0
  Even though this is technically a step in the function inlining process as laid out in nir_inline_functions.c, it's not really needed. We already have constant initializers lowered here and no new ones are added by appending the softfp64 functions.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/debug: Add a debug flag to force software fp64 | Jason Ekstrand | 2019-03-06 | 3 | -2/+4
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Compile the fp64 program based on nir options | Jason Ekstrand | 2019-03-06 | 1 | -1/+2
  Instead of looking at the devinfo directly, look at the lowering options we provided to NIR. This is more accurate as it's now checking for "do we need full software lowering" rather than a hardware bit.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Teach loop unrolling about 64-bit instruction lowering | Jason Ekstrand | 2019-03-06 | 3 | -13/+79
  The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions, potentially with control flow. If loop unrolling isn't aware of this, it can unroll a loop 20 times which contains a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
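  The idea in the message above can be sketched in plain C. This is only an illustration under stated assumptions, not the NIR implementation: the `sketch_*` names, the opcode enum, and every cost weight are hypothetical and chosen only to show how a per-instruction estimate keeps an expensive lowered 64-bit op from being unrolled many times.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode kinds for illustration; this is not NIR's enum. */
enum sketch_op { SKETCH_OP_ADD, SKETCH_OP_MUL, SKETCH_OP_SQRT };

struct sketch_instr {
   enum sketch_op op;
   unsigned bit_size; /* 32 or 64 */
};

/* Estimated cost of one instruction. The weights are made up: they only
 * encode "a lowered 64-bit sqrt is far more expensive than a native op". */
static unsigned
sketch_instr_cost(const struct sketch_instr *instr, bool lowers_64bit)
{
   assert(instr->bit_size == 32 || instr->bit_size == 64);

   if (instr->bit_size != 64 || !lowers_64bit)
      return 1;

   switch (instr->op) {
   case SKETCH_OP_ADD:  return 4;
   case SKETCH_OP_MUL:  return 20;
   case SKETCH_OP_SQRT: return 300;
   }
   return 1;
}

/* Unroll only if the estimated size of the fully unrolled body fits
 * within the budget. */
static bool
sketch_should_unroll(const struct sketch_instr *body, unsigned num_instrs,
                     unsigned trip_count, bool lowers_64bit,
                     unsigned max_cost)
{
   unsigned cost = 0;
   for (unsigned i = 0; i < num_instrs; i++)
      cost += sketch_instr_cost(&body[i], lowers_64bit);
   return cost * trip_count <= max_cost;
}
```

  With a budget of 100, a loop body holding a single 64-bit sqrt and a trip count of 20 is unrolled when the op is native (estimated cost 20) and left alone when it will be lowered to software (estimated cost 6000).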
* nir: Expose double and int64 op_to_options_mask helpers | Jason Ekstrand | 2019-03-06 | 3 | -51/+23
  We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* compiler/nir: add an is_conversion field to nir_op_info | Iago Toral Quiroga | 2019-03-06 | 3 | -33/+47
  This is set to True only for numeric conversion opcodes.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer | Ian Romanick | 2019-03-06 | 1 | -0/+7
  In the old code, we would generate the exact same instruction for extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word trick only works for even numbered bytes.
  This fixes the (new) piglit test tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.
  v2: Use a SHR instead of an AND. This saves an instruction compared to using two moves. Suggested by Jason.
  Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
  Reviewed-by: Jason Ekstrand <[email protected]>
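  The bug described above can be modelled in plain C. This is a sketch of the arithmetic only, not the instruction sequence the i965 backend emits; the function names are hypothetical. It assumes the "mask-a-word trick" means selecting the 16-bit word that holds the byte and masking its low 8 bits, which is why bytes 0 and 1 give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old behaviour (assumption): pick the 16-bit word that
 * contains the byte, then keep its low 8 bits. Correct only for even
 * byte indices, because an odd byte sits in the word's high half. */
static uint8_t
sketch_extract_u8_mask_only(uint64_t value, unsigned byte)
{
   assert(byte < 8);
   uint16_t word = (uint16_t)(value >> (16 * (byte / 2)));
   return (uint8_t)(word & 0xff);
}

/* Model of the fix: an odd byte is obtained by shifting the word right
 * by 8 instead of masking it. */
static uint8_t
sketch_extract_u8_fixed(uint64_t value, unsigned byte)
{
   assert(byte < 8);
   uint16_t word = (uint16_t)(value >> (16 * (byte / 2)));
   if (byte & 1)
      return (uint8_t)(word >> 8);
   return (uint8_t)(word & 0xff);
}
```

  For the value 0x0807060504030201, the mask-only model returns 0x01 for both byte 0 and byte 1, while the shift-based model returns 0x01 and 0x02.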
* intel/fs: nir_op_extract_i8 extracts a byte, not a word | Ian Romanick | 2019-03-06 | 1 | -2/+4
  Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence unused parameter warning in brw_interpolation_map.c | Ian Romanick | 2019-03-06 | 3 | -7/+4
  The parameter is never used, and it's not part of a common interface idiom. Remove it.
  src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
  src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    const struct gen_device_info *devinfo)
    ^~~~~~~
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence many unused parameter warnings in brw_eu.h | Ian Romanick | 2019-03-06 | 1 | -9/+9
  In file included from src/intel/compiler/brw_eu_util.c:34:0:
  src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
  src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_message_desc_header_present(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
  src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_message_ex_desc(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
  src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
  src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
  src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
  src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
  src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
    brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
    ^~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
  src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
    unsigned exec_size, /**< 0 for SIMD4x2 */
    ^~~~~~~~~
  src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
  src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
    unsigned exec_size,
    ^~~~~~~~~
  Reviewed-by: Jason Ekstrand <[email protected]>
* meson: remove unused include_directories(vulkan) | Eric Engestrom | 2019-03-06 | 1 | -1/+1
  The correct include path is "vulkan/…".
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* radv: set num_components on vulkan_resource_index intrinsic | Lionel Landwerlin | 2019-03-06 | 3 | -10/+20
  In 61e009d2c4e4df we changed the number of components in the vulkan_resource_index intrinsic and forgot to update RADV's code for it.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Fixes: 61e009d2c4e4df ("spirv: Use the same types for resource indices as pointers")
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc() | Timothy Arceri | 2019-03-06 | 25 | -45/+45
  Replace done using:
    find ./src -type f -exec sed -i -- \
    's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;
  Acked-by: Karol Herbst <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename record_types -> struct_types | Timothy Arceri | 2019-03-06 | 2 | -10/+10
  Acked-by: Karol Herbst <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename record_location_offset() -> struct_location_offset() | Timothy Arceri | 2019-03-06 | 8 | -11/+11
  Replace done using:
    find ./src -type f -exec sed -i -- \
    's/record_location_offset(/struct_location_offset(/g' {} \;
  Acked-by: Karol Herbst <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename get_record_instance() -> get_struct_instance() | Timothy Arceri | 2019-03-06 | 5 | -7/+7
  Replace done using:
    find ./src -type f -exec sed -i -- \
    's/get_record_instance(/get_struct_instance(/g' {} \;
  Acked-by: Karol Herbst <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename is_record() -> is_struct() | Timothy Arceri | 2019-03-06 | 22 | -90/+90
  Replace was done using:
    find ./src -type f -exec sed -i -- \
    's/is_record(/is_struct(/g' {} \;
  Acked-by: Karol Herbst <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* nir/spirv: initial handling of OpenCL.std extension opcodes | Karol Herbst | 2019-03-05 | 12 | -3/+602
  Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions.
  Anyway, this is better than nothing and covers the most common builtins.
  v2: add hadd proof from Jason
      move some of the lowering into opt_algebraic and create new nir opcodes
      simplify nextafter lowering
      fix normalize lowering for inf
      rework upsample to use nir_pack_bits
      add missing files to build systems
  v3: split lines of iadd/sub_sat expressions
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
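  The message above mentions hadd and the saturating add without showing them. The following is a minimal sketch of the well-known overflow-free formulations for the unsigned 32-bit case; it is an assumption that the commit's lowering has this shape, and the `sketch_*` names are hypothetical. The halving add relies on the identity a + b = 2 * (a & b) + (a ^ b).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned halving add: floor((a + b) / 2) without a wider intermediate.
 * Since a + b == 2 * (a & b) + (a ^ b), halving gives the sum below,
 * which can never overflow 32 bits. */
static uint32_t
sketch_uhadd32(uint32_t a, uint32_t b)
{
   return (a & b) + ((a ^ b) >> 1);
}

/* Unsigned saturating add: clamp to UINT32_MAX when the sum wraps. */
static uint32_t
sketch_uadd_sat32(uint32_t a, uint32_t b)
{
   uint32_t sum = a + b;
   return sum < a ? UINT32_MAX : sum;
}

/* Spot checks of both helpers against a 64-bit reference computation. */
static void
sketch_check_hadd(void)
{
   const uint32_t samples[] = {
      0, 1, 3, 4, 0x7fffffffu, 0x80000000u, UINT32_MAX
   };
   const unsigned n = sizeof(samples) / sizeof(samples[0]);

   for (unsigned i = 0; i < n; i++) {
      for (unsigned j = 0; j < n; j++) {
         uint64_t wide = (uint64_t)samples[i] + samples[j];
         uint32_t sat = wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
         assert(sketch_uhadd32(samples[i], samples[j]) ==
                (uint32_t)(wide >> 1));
         assert(sketch_uadd_sat32(samples[i], samples[j]) == sat);
      }
   }
}
```

  Calling `sketch_check_hadd()` compares both helpers with the widened reference on a handful of edge values, including the cases where a plain 32-bit `a + b` would wrap.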
* nir/vtn: add support for SpvBuiltInGlobalLinearId | Karol Herbst | 2019-03-05 | 5 | -12/+45
  v2: use formula with fewer operations
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
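  The message does not spell out the formula. As a hedged illustration, a linear index over a 3D global range can be computed either term by term or in nested (Horner) form, which needs one multiplication less. Whether the commit uses exactly this form is an assumption; the function names are hypothetical and global offsets are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Term-by-term form: three multiplications and two additions. */
static uint64_t
sketch_linear_id_naive(const uint32_t id[3], const uint32_t size[3])
{
   assert(id[0] < size[0] && id[1] < size[1] && id[2] < size[2]);
   return (uint64_t)id[2] * size[1] * size[0] +
          (uint64_t)id[1] * size[0] +
          id[0];
}

/* Nested (Horner) form: two multiplications and two additions. */
static uint64_t
sketch_linear_id_nested(const uint32_t id[3], const uint32_t size[3])
{
   assert(id[0] < size[0] && id[1] < size[1] && id[2] < size[2]);
   return ((uint64_t)id[2] * size[1] + id[1]) * size[0] + id[0];
}
```

  For a global size of (4, 3, 2) and an id of (1, 2, 1), both forms give 21.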
* nir: add support for address bit sized system values | Karol Herbst | 2019-03-05 | 2 | -18/+29
  v2: add assert in else clause
      make local group intrinsics 32 bit wide
  v3: always use 32 bit constant for local_size
  v4: add comment by Jason
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: improve parsing of the memory model | Karol Herbst | 2019-03-05 | 3 | -7/+45
  v2: add some vtn_fail_ifs
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: replace magic numbers with M_PI | Karol Herbst | 2019-03-05 | 1 | -2/+2
  We define it inside 'include/c99_math.h' so it is safe to use.
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Implement VK_EXT_external_memory_host | Caio Marcelo de Oliveira Filho | 2019-03-05 | 4 | -1/+133
  v2: Ignore the import if handleType == 0. (Jason)
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Drop the V3D 3.x vpm read dead code elimination. | Eric Anholt | 2019-03-05 | 1 | -33/+2
  We now have NIR dead code eliminating our VPM reads, so this shouldn't be necessary.
* v3d: Eliminate the TLB and TLBU files. | Eric Anholt | 2019-03-05 | 4 | -41/+20
  We can just use the magic register file like we do for other magic waddrs.
* v3d: Use ldunif instructions for uniforms. | Eric Anholt | 2019-03-05 | 11 | -270/+27
  The idea is that for repeated use of the same uniform, we could avoid loading it on each consumer. The results look pretty good.
    total instructions in shared programs: 6413571 -> 6521464 (1.68%)
    total threads in shared programs: 154214 -> 154000 (-0.14%)
    total uniforms in shared programs: 2393604 -> 2119629 (-11.45%)
    total spills in shared programs: 4960 -> 4984 (0.48%)
    total fills in shared programs: 6350 -> 6418 (1.07%)
  Once we do scheduling at the NIR level, the register pressure (and thus also instructions) issues we see here will drop back down.
* v3d: Add support for register-allocating a ldunif to a QFILE_TEMP. | Eric Anholt | 2019-03-05 | 2 | -14/+77
  On V3D 4.x, we can use ldunifrf to load uniforms to any register, and this will let us schedule the ldunif wherever we want in the program.
* v3d: Drop the old class bits splitting up the accumulators. | Eric Anholt | 2019-03-05 | 1 | -7/+3
  This seems to be left over from vc4, and I don't use them any more.
* v3d: Add support for vir-to-qpu of ldunif instructions to a temp. | Eric Anholt | 2019-03-05 | 1 | -2/+15
  We can load a uniform to any register, so add support for non-ALU instructions with sig.ldunif to a temp.
* v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. | Eric Anholt | 2019-03-05 | 10 | -123/+77
  I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.
* v3d: Do uniform rematerialization spilling before dropping threadcount | Eric Anholt | 2019-03-05 | 1 | -8/+10
  This feels like the right tradeoff for threads vs uniforms, particularly given that we often have very short thread segments right now:
    total instructions in shared programs: 6411504 -> 6413571 (0.03%)
    total threads in shared programs: 153946 -> 154214 (0.17%)
    total uniforms in shared programs: 2387665 -> 2393604 (0.25%)
* v3d: Fix temporary leaks of temp_registers and when spilling. | Eric Anholt | 2019-03-05 | 1 | -5/+4
  On each iteration of successfully spilling a reg, we'd allocate another copy of temp_registers, and when decrementing thread count we'd allocate another copy of the graph. These all got cleaned up on freeing the compile.
* tgsi_to_nir: Set correct location for uniforms. | Timur Kristóf | 2019-03-05 | 1 | -0/+1
  Previously, only the driver_location was set for all variables, but constants need to use the location field instead. This change is necessary because the nine state tracker can produce non-packed constants whose location needs to be explicitly set.
  Signed-Off-By: Timur Kristóf <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Tested-by: Rob Clark <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* tgsi_to_nir: Improve interpolation modes. | Timur Kristóf | 2019-03-05 | 1 | -15/+21
  This patch extracts the interpolation mode translation into a separate function called ttn_translate_interp_mode, adds support for TGSI_INTERPOLATE_COLOR, which was missing, and also sets the proper interpolation mode on output variables, which was not done previously.
  Signed-Off-By: Timur Kristóf <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Tested-by: Rob Clark <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* tgsi_to_nir: use sampler variables and derefs | Kenneth Graunke | 2019-03-05 | 1 | -10/+79
  v2: fix is_shadow, is_array and txq
  Some drivers (e.g. iris) need the presence of sampler variables and derefs so that they can count them to determine the number of samplers used. This change also makes the output NIR closer to what glsl_to_nir outputs.
  Signed-Off-By: Timur Kristóf <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Tested-by: Rob Clark <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* tgsi_to_nir: Support FACE and POSITION properly. | Timur Kristóf | 2019-03-05 | 1 | -12/+68
  Previously, FACE was hard-coded as a sysval, but TTN emulated it incorrectly. Also, POSITION was not supported when it was a sysval. This patch fixes these by allowing both of them to be sysvals or inputs, based on driver capabilities. It also fixes the TGSI FACE emulation based on the TGSI spec.
  Signed-Off-By: Timur Kristóf <[email protected]>
  Tested-by: Andre Heider <[email protected]>
  Tested-by: Rob Clark <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>