path: root/src/gallium/drivers/iris/iris_program.c
Commit message | Author | Age | Files | Lines
* nir: Add explicit signs to image min/max intrinsics | Jason Ekstrand | 2019-08-21 | 1 | -6/+12
  This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
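Why the sign has to live in the opcode can be seen in plain C: the same 32 bits order differently when read as signed or unsigned. The sketch below is only an illustration; `min_as_signed` and `min_as_unsigned` are hypothetical names, not NIR or SPIR-V identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a signed image-atomic min: compares the
 * operands as two's-complement values, returns the winning bit pattern. */
uint32_t min_as_signed(uint32_t a, uint32_t b)
{
   int32_t sa, sb;
   memcpy(&sa, &a, sizeof sa);   /* reinterpret the bits, no conversion */
   memcpy(&sb, &b, sizeof sb);
   return sa < sb ? a : b;
}

/* Hypothetical stand-in for an unsigned image-atomic min. */
uint32_t min_as_unsigned(uint32_t a, uint32_t b)
{
   return a < b ? a : b;
}
```

With the operands `0xFFFFFFFF` and `1`, the signed variant keeps `0xFFFFFFFF` (it reads as -1) while the unsigned variant keeps `1`, so a single sign-agnostic "min" opcode would be ambiguous.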
* iris: Enable non coherent framebuffer fetch on broadwell | Sagar Ghuge | 2019-08-20 | 1 | -1/+1
  v2: Use GEN_GEN in iris_state (Kenneth Graunke)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add render target read entry in binding table | Sagar Ghuge | 2019-08-20 | 1 | -7/+43
  This will be used in the next patches to support non-coherent framebuffer fetch on Broadwell.
  v2: Fix comment (Kenneth Graunke)
  v3: 1) Fix a few nits (Caio)
      2) Add comment (Caio)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Fill a compiler statistics struct | Jason Ekstrand | 2019-08-12 | 1 | -6/+7
  This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary, driver-readable form of the same statistics we dump out to stderr when INTEL_DEBUG is set with a shader stage.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv,i965,iris: deduplicate setting of total_shared | Rhys Perry | 2019-08-08 | 1 | -2/+0
  v5: add patch
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Handle vertex shader with window space position | Danylo Piliaiev | 2019-08-06 | 1 | -0/+14
  Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION so let's actually implement it.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: add support for gl_ClipVertex in tess eval shaders | Timothy Arceri | 2019-08-01 | 1 | -1/+14
  Required for OpenGL compat support.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: add support for gl_ClipVertex in geometry shaders | Timothy Arceri | 2019-08-01 | 1 | -19/+31
  This will enable us to support the OpenGL compat profile.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Be more conservative about subgroup sizes in GL | Jason Ekstrand | 2019-07-24 | 1 | -0/+1
  The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
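The per-stage rule described above can be sketched as a small selection function. This is a hedged illustration, not the compiler's code: the enum and function names are hypothetical, reading "geometry stages" as VS/TCS/TES/GS is an assumption, and the fragment maximum and compute dispatch width are passed in because the commit message gives no numbers for them.

```c
#include <assert.h>

/* Hypothetical stage enum for this sketch only. */
enum stage_sketch {
   STAGE_VERTEX,
   STAGE_TESS_CTRL,
   STAGE_TESS_EVAL,
   STAGE_GEOMETRY,
   STAGE_FRAGMENT,
   STAGE_COMPUTE,
};

/* Subgroup size a GL shader would be told, following the commit text:
 * 8 for geometry stages, the maximum for fragment, the actual size for
 * compute. max_fs_size and cs_actual_size are caller-supplied assumptions. */
unsigned gl_claimed_subgroup_size(enum stage_sketch stage,
                                  unsigned max_fs_size,
                                  unsigned cs_actual_size)
{
   switch (stage) {
   case STAGE_FRAGMENT:
      return max_fs_size;
   case STAGE_COMPUTE:
      return cs_actual_size;
   default:
      /* Assumed to cover vertex, tessellation and geometry shaders. */
      return 8;
   }
}
```

For example, `gl_claimed_subgroup_size(STAGE_GEOMETRY, 32, 16)` yields 8 regardless of the other two arguments.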
* iris: change last_vue_stage() to look at uncompiled shaders | Timothy Arceri | 2019-07-19 | 1 | -3/+3
  This allows us to find the last vue stage before we have compiled the shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix key->input_vertices for 8_PATCH TCS mode. | Kenneth Graunke | 2019-07-11 | 1 | -1/+3
  We were failing to flag the program dirty when it changed. Also, we were unnecessarily setting key->input_vertices for SINGLE_PATCH mode, which would reduce program cache hits. Only set it if needed.
* iris: Only set key->flat_shade if COL0/COL1 are written. | Kenneth Graunke | 2019-07-11 | 1 | -1/+1
  This was just laziness on my part, we already added similar checks in the VS key handling. Just need to do it here too. Should improve cache hits.
* iris: Drop comment about var->data.binding not being set. | Kenneth Graunke | 2019-07-11 | 1 | -4/+0
  I refactored the sampler lowering passes a long time ago to ensure that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.
* iris: Drop comments about missing NOS | Kenneth Graunke | 2019-07-11 | 1 | -6/+0
  These stages don't need NOS. If they do, we can add it - the infrastructure is there if we need it someday.
* intel/compiler: Add a "base class" for program keys | Jason Ekstrand | 2019-07-10 | 1 | -15/+14
  Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Minor tidying | Kenneth Graunke | 2019-07-03 | 1 | -3/+0
* iris: move sysvals to their own constant buffer | Timur Kristóf | 2019-06-23 | 1 | -38/+22
  This commit moves the sysvals to a separate, new constant buffer at the end (before the shader constants). It also allows us to remove the special handling we had for cbuf0, and enables all constant buffers to support user-specified resources and user buffers.
  v2: (by Kenneth Graunke)
  - Rebase on the previous patch to fix system value uploading.
  - Fix disk cache num_cbufs calculation
  - Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs
  - Change upload_sysvals to assert that num_cbufs > 0 when num_system_values > 0.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Mark cbuf0 as not needing uploading every single time | Kenneth Graunke | 2019-06-23 | 1 | -3/+13
  I neglected to mark cbuf0_needs_upload = false after uploading it. The obvious fix regressed user clip plane tests, because of a second bug: we also forgot to mark that they may need re-uploading when changing shader programs (which may have more or less system values).
  Thanks to Timur Kristóf for catching the original issue.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timur Kristóf <[email protected]>
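The two halves of this fix form a common dirty-flag pattern: clear the flag once the upload has happened, and set it again whenever the bound program changes. A minimal sketch follows, with a hypothetical struct and hypothetical function names; only the `cbuf0_needs_upload` field name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stage state for this sketch. */
struct shader_state_sketch {
   bool cbuf0_needs_upload;
   unsigned upload_count;   /* only here so the effect is observable */
};

/* Binding a different program may change the set of system values,
 * so the constants have to be considered stale again. */
void sketch_bind_program(struct shader_state_sketch *shs)
{
   shs->cbuf0_needs_upload = true;
}

/* Upload only when flagged, then clear the flag so the next draw
 * does not upload the same data again. */
void sketch_upload_cbuf0(struct shader_state_sketch *shs)
{
   if (!shs->cbuf0_needs_upload)
      return;

   shs->upload_count++;               /* stands in for the real upload */
   shs->cbuf0_needs_upload = false;
}
```

Calling `sketch_upload_cbuf0` twice in a row performs one upload; a `sketch_bind_program` in between makes the second call upload again.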
* iris: Create binding table slot for num_work_groups only when needed | Caio Marcelo de Oliveira Filho | 2019-06-11 | 1 | -1/+4
  Reviewed-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Zero shs->cbuf0 when binding a passthrough TCS | Kenneth Graunke | 2019-06-07 | 1 | -0/+16
  Fixes valgrind errors when running two CTS tests back to back:
  - KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
  (The first test has an actual TCS, the second uses passthrough.)
* iris: Rename bind_state to bind_shader_state. | Kenneth Graunke | 2019-06-07 | 1 | -9/+9
  bind_state is possibly the worst name ever. For create, we used create_shader_state, which is more descriptive. Put shader in the name.
* iris: Sweep the NIR in iris_create_uncompiled_shader(). | Kenneth Graunke | 2019-06-07 | 1 | -0/+2
  We run a ton of backend specific passes here (mostly brw_preprocess_nir) and ought to sweep up any unused memory at this point, since we're going to hang on to this NIR for as long as the linked program lives.
* intel/nir: Stop returning the shader from helpers | Jason Ekstrand | 2019-06-05 | 1 | -1/+1
  Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix SO stride units for DrawTransformFeedback | Kenneth Graunke | 2019-06-03 | 1 | -1/+1
  Mesa measures in DWords. The hardware also claims to measure in DWords. Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ. Which means that it really measures in bytes. So, convert to bytes.
  Without this, our offset / stride denominator was 1/4th the size it should be, leading to 4x the vertex count that we should have had.
  Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
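The unit mismatch is easy to check with arithmetic. The sketch below assumes, as the message states, a write offset in bytes and a stride in DWords; the function name is hypothetical and the real driver may compute the count elsewhere (for instance on the GPU).

```c
#include <assert.h>

/* Vertex count for DrawTransformFeedback in this sketch:
 * bytes written divided by bytes per vertex. */
unsigned sketch_so_vertex_count(unsigned write_offset_bytes,
                                unsigned stride_dwords)
{
   unsigned stride_bytes = stride_dwords * 4;   /* 1 DWord = 4 bytes */
   assert(stride_bytes != 0);
   return write_offset_bytes / stride_bytes;
}
```

With 96 bytes written and a 4-DWord (16-byte) stride, the count is 6. Dividing by the unconverted stride of 4 would instead give 24, which is the 4x overcount the commit describes.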
* iris: Always reserve binding table space for NIR constants | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -4/+4
  Don't have a separate mechanism for NIR constants to be removed from the table. If unused, we will compact it away. The use_null_surface is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Print binding tables when INTEL_DEBUG=bt | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -0/+53
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Compact binding tables | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -39/+180
  Change the iris_binding_table to keep track of what surfaces are actually going to be used, then assign binding table indices just for those. Reducing unused bytes in those tables is valuable because Iris uses a reduced space for them.
  The rest of the driver can go from "group indices" (i.e. UBO #2) to BTI and vice-versa using helper functions. The value IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is not used or a certain BTI is not valid.
  The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be set to skip compacting the binding table.
  v2: (all from Ken) Use BITFIELD64_MASK helper. Improve comments. Assert all groups are marked as used when we have indirects.
  Reviewed-by: Kenneth Graunke <[email protected]>
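One way to picture the compaction is a per-group bitmask of used entries plus a starting offset: the BTI of a used entry is the offset plus the number of used entries below it. The sketch below is an assumption about how such a mapping could work, not the driver's code; the struct, the function names, the sentinel value and the local `mask64` helper (standing in for a helper such as BITFIELD64_MASK) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SURFACE_NOT_USED 0xa0a0a0a0u   /* hypothetical sentinel */

/* One surface group (e.g. UBOs): which entries are used, and where the
 * group starts in the compacted table. */
struct bt_group_sketch {
   uint64_t used_mask;
   uint32_t offset;
};

/* Mask with the low `bits` bits set. */
static inline uint64_t mask64(unsigned bits)
{
   return bits >= 64 ? ~0ull : (1ull << bits) - 1;
}

static inline unsigned popcount64(uint64_t v)
{
   unsigned n = 0;
   for (; v; v &= v - 1)   /* clear the lowest set bit each round */
      n++;
   return n;
}

/* Group index -> binding table index, or the sentinel if unused. */
uint32_t sketch_group_index_to_bti(const struct bt_group_sketch *g,
                                   unsigned index)
{
   assert(index < 64);
   if (!(g->used_mask & (1ull << index)))
      return SKETCH_SURFACE_NOT_USED;
   return g->offset + popcount64(g->used_mask & mask64(index));
}

/* Binding table index -> group index, or the sentinel if the BTI does
 * not belong to this group. */
uint32_t sketch_bti_to_group_index(const struct bt_group_sketch *g,
                                   uint32_t bti)
{
   uint32_t next_bti = g->offset;
   for (unsigned index = 0; index < 64; index++) {
      if (!(g->used_mask & (1ull << index)))
         continue;
      if (next_bti == bti)
         return index;
      next_bti++;
   }
   return SKETCH_SURFACE_NOT_USED;
}
```

For a group starting at offset 3 with entries 0, 2 and 4 used, the three entries land at BTIs 3, 4 and 5, and entry 1 maps to the sentinel.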
* iris: Create an enum for the surface groups | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -23/+25
  This will make it convenient to handle compacting and printing the binding table.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Handle binding table in the driver | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -102/+186
  Stop using brw_compiler to lower the final binding table indices for surface access. This is done by simply not setting the 'prog_data->binding_table.*_start' fields. Then make the driver perform this lowering.
  This is a better place to perform the binding table assignments, since the driver has more information and will also later consume those assignments to upload resources.
  This also prepares us for two changes: use ibc without having to implement binding table logic there; and remove unused entries from the binding table.
  Since the `block` field in brw_ubo_range now refers to the final binding table index, we need to adjust it before using it to index shs->constbuf.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -3/+10
  We'll change iris to perform lowering of the binding table indices earlier (before the backend kicks in), but the backend compiler uses the result of the analysis to identify load_ubo intrinsics, so we do the analysis after the lowering to have the right indices.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Enable nir_opt_large_constants | Jason Ekstrand | 2019-05-29 | 1 | -0/+66
  Shader-db results on Kaby Lake:
    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0
    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0
    LOST: 0
    GAINED: 2
  The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal Space Program
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Don't assume UBO indices are constant | Jason Ekstrand | 2019-05-29 | 1 | -1/+2
  It will be true for the constant/system value buffer because they use a constant zero but it's not true in general. If we ever got here when the source wasn't constant, nir_src_as_uint would assert.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: [email protected]
* iris: Move upload_ubo_ssbo_surf_state to iris_program.c | Jason Ekstrand | 2019-05-29 | 1 | -0/+45
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Clone before calling nir_strip and serializing | Kenneth Graunke | 2019-05-29 | 1 | -6/+8
  This is non-destructive and leaves the debugging information in place.
  Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Only store the SHA1 of the NIR in iris_uncompiled_shader | Kenneth Graunke | 2019-05-29 | 1 | -3/+1
  Jason pointed out that we don't need to keep an entire copy of the serialized NIR around, we just need the SHA1. This does change our disk cache key to be taking a SHA1 of a SHA1, which is a bit odd, but should work out and be faster and use less memory.
  Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Fix ALT mode regressions from shader cache | Kenneth Graunke | 2019-05-21 | 1 | -4/+6
  We were checking this based on nir->info.name, but with the shader cache enabled, nir_strip throws out the name, causing us to use IEEE mode for ARB programs. gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior.
  Fixes: dc5dc727d59 iris: Serialize the NIR to a blob we can use for shader cache purposes.
* iris: Cache assembly shaders in the on-disk shader cache | Dylan Baker | 2019-05-21 | 1 | -6/+43
  This implements storing and retrieving iris_compiled_shader objects from the on-disk shader cache.
  (by Dylan Baker and Kenneth Graunke)
* iris: Serialize the NIR to a blob we can use for shader cache purposes. | Kenneth Graunke | 2019-05-21 | 1 | -0/+21
  We will use a hash of the serialized NIR together with brw_prog_*_key (for NOS) as the disk cache key, where the disk cache contains actual assembly shaders.
  Reviewed-by: Dylan Baker <[email protected]>
* iris: Move iris_uncompiled_shader definition to iris_context.h | Kenneth Graunke | 2019-05-21 | 1 | -23/+0
  It had been internal to iris_program.c, but with the upcoming disk cache code, the "program module" is going to be spread across a couple source files. Into a header it goes!
  Now it lives alongside iris_compiled_shader, which makes sense.
  Reviewed-by: Dylan Baker <[email protected]>
* intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 | Kenneth Graunke | 2019-05-14 | 1 | -0/+9
  Our tessellation control shaders can be dispatched in several modes.
  - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases.
  - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases.
  - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH.
  This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases.
  The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much.
  Reviewed-by: Jason Ekstrand <[email protected]>
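The trade-off between the two implemented modes can be put into numbers with a small sketch. It follows the figures in the message above (8 channels per thread, one thread per patch vertex in 8_PATCH, a limit of 16 thread instances); the names are hypothetical, and rounding N / 8 up for SINGLE_PATCH is an assumption based on the remark that small patches leave channels disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode enum for this sketch only. */
enum tcs_mode_sketch {
   SKETCH_SINGLE_PATCH,
   SKETCH_8_PATCH,
};

/* Threads launched per dispatch for PATCHLIST_N. */
unsigned sketch_tcs_threads(enum tcs_mode_sketch mode, unsigned n)
{
   if (mode == SKETCH_8_PATCH)
      return n;            /* one thread per vertex in the patch */
   return (n + 7) / 8;     /* assumed round-up of N / 8 */
}

/* Patches processed by one dispatch. */
unsigned sketch_tcs_patches(enum tcs_mode_sketch mode)
{
   return mode == SKETCH_8_PATCH ? 8 : 1;
}

/* 8_PATCH is limited to PATCHLIST_16 and smaller per the message. */
bool sketch_can_use_8_patch(unsigned n)
{
   return n >= 1 && n <= 16;
}
```

For a 3-vertex patch, SINGLE_PATCH runs one thread with five idle channels per patch, while 8_PATCH runs three fully occupied threads that together cover eight patches.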
* iris: Set XY Clipping correctly. | Kenneth Graunke | 2019-04-29 | 1 | -1/+27
  I was setting it based off a pipe_rasterizer_state field that appears to be entirely dead outside of the draw module respecting it. I should be setting it when the primitive type reaching the SF is neither points nor lines.
  This is, unfortunately, rather dirty, as we have to look at the rasterizer state, the geometry shader state, the tessellation evaluation shader state, and the primitive type...
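The decision described above amounts to finding the primitive class produced by the last enabled stage and then checking that it is neither points nor lines. The sketch below is an assumption about how the four inputs could combine, not the driver's logic: every name is hypothetical, and modelling the rasterizer state as a fill mode that can turn triangles into points or lines is a guess at why that state matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primitive classes for this sketch. */
enum prim_class_sketch {
   SKETCH_POINTS,
   SKETCH_LINES,
   SKETCH_TRIANGLES,
};

/* Hypothetical snapshot of the state the commit says must be consulted. */
struct pipeline_sketch {
   bool has_gs;
   enum prim_class_sketch gs_output;     /* geometry shader output type */
   bool has_tes;
   enum prim_class_sketch tes_output;    /* tessellation output type */
   enum prim_class_sketch draw_prim;     /* primitive type of the draw */
   enum prim_class_sketch fill_mode;     /* SKETCH_TRIANGLES means "fill" */
};

bool sketch_xy_clip_enable(const struct pipeline_sketch *p)
{
   /* The last enabled stage decides what leaves the geometry pipeline. */
   enum prim_class_sketch reaching_sf =
      p->has_gs ? p->gs_output :
      p->has_tes ? p->tes_output : p->draw_prim;

   /* Assumption: a point or line fill mode converts triangles. */
   if (reaching_sf == SKETCH_TRIANGLES)
      reaching_sf = p->fill_mode;

   return reaching_sf != SKETCH_POINTS && reaching_sf != SKETCH_LINES;
}
```

A triangle draw with a geometry shader that emits points would therefore disable XY clipping in this model, even though the draw call itself uses triangles.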
* iris: Move iris_debug_recompile calls before uploading. | Kenneth Graunke | 2019-04-16 | 1 | -33/+33
  Order of operations is important, otherwise we'll find the program we just uploaded as the "old" compile and get confused why nothing is different between the two keys.
  Reviewed-by: Jordan Justen <[email protected]>
* iris: Print the reason for shader recompiles. | Kenneth Graunke | 2019-04-16 | 1 | -6/+30
  I was lazy earlier and hadn't bothered typing / refactoring this. Now I'm hitting some extra recompiles and would like to see why.
  Reviewed-by: Jordan Justen <[email protected]>
* glsl/nir: add support for lowering bindless images_derefs | Karol Herbst | 2019-04-12 | 1 | -1/+1
  v2: handle atomics as well
      make use of nir_rewrite_image_intrinsic
  v3: remove call to nir_remove_dead_derefs
  v4: (Timothy Arceri) don't actually call lowering yet
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]> (v3)
  Reviewed-by: Marek Olšák <[email protected]>
* nir: move brw_nir_rewrite_image_intrinsic into common code | Karol Herbst | 2019-04-12 | 1 | -1/+1
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* iris: Silence unused variable warnings in release mode | Kenneth Graunke | 2019-04-06 | 1 | -3/+2
* iris: avoid use after free in shader destruction | Dave Airlie | 2019-04-05 | 1 | -7/+49
  While playing with compute shaders, I was getting a random crash, and noticed that bind_state was using the old shader info for comparison, but gallium allows the shader to be deleted while bound, so this could lead to a use after free.
  This can't happen using the cso cache, as it tracks all of this.
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: initialize num_cbufs | Tapani Pälli | 2019-03-20 | 1 | -1/+1
  Currently initialized only if 'ish' is non-NULL.
  CID: 1444106
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix TES gl_PatchVerticesIn handling. | Kenneth Graunke | 2019-03-11 | 1 | -0/+7
  1. If we switch the TCS for one with a different number of output vertices, then the TES's gl_PatchVerticesIn value will change. We need to re-upload in this case. For now, re-emit constants whenever the TCS/TES are swapped out.
  2. If there is no TCS, then we can't grab gl_PatchVerticesIn from the TCS info. Since it's a passthrough, we can just use the primitive's patch count (like the TCS gl_PatchVerticesIn does).
  Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
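The two cases above reduce to choosing where the TES value comes from. A minimal sketch, with a hypothetical function name and parameters (the driver's actual fields are not shown in this log):

```c
#include <assert.h>
#include <stdbool.h>

/* Value the TES would see for gl_PatchVerticesIn in this sketch:
 * with a real TCS it is that shader's output vertex count; with a
 * passthrough TCS it is the patch size of the draw's primitive. */
unsigned sketch_tes_patch_vertices_in(bool has_tcs,
                                      unsigned tcs_vertices_out,
                                      unsigned draw_patch_vertices)
{
   return has_tcs ? tcs_vertices_out : draw_patch_vertices;
}
```

Because the result depends on which TCS is bound, swapping the TCS (or removing it) changes the value, which is why the constants are re-emitted when the TCS/TES change.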
* iris: Rework default tessellation level uploads | Kenneth Graunke | 2019-03-11 | 1 | -2/+23
  Now that we've added a system value uploading mechanism, we may as well reuse the same system for default tessellation levels. This simplifies the state upload code a bit.
  Also fixes: KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>