summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* anv/pipeline: Disable FS dispatch for pointless fragment shadersJason Ekstrand2018-08-031-4/+33
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/tools: add error2aub creation into autotoolsAndres Gomez2018-08-021-1/+23
| | | | | | | | | | | | | | Tarball distribution is done through "make distcheck". We include the meson targets also into autotools so they won't fail when building from the tarball. Fixes: 6a60beba408 ("intel/tools: Add an error state to aub translator") Cc: Jason Ekstrand <[email protected]> Cc: Lionel Landwerlin <[email protected]> Cc: Dylan Baker <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv/pipeline: Do cross-stage linking optimizationsJason Ekstrand2018-08-021-0/+11
| | | | | | | This appears to help the Aztec Ruins benchmark by about 2% on my Kaby Lake gt2 laptop. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Pull most of the anv_pipeline_compile_* into common codeJason Ekstrand2018-08-021-215/+92
| | | | | | | | | | This leaves us with a series of little anv_pipeline_compile_* functions which each take a compiler object, a mem_ctx, the stage to compile, and the previous stage for VUE linking purposes. Some of them do interesting things but most are little more than wrappers around brw_compile_*. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Add a separate "link" stageJason Ekstrand2018-08-021-128/+193
| | | | | | | | | | | This breaks compilation up a bit into "link" and "compile". In the "link" stage, new anv_pipeline_link_* helpers are called which are responsible for setting up the binding table and doing anything needed to properly link with the next stage in the pipeline if one exists. They are called in reverse order starting with the fragment shader so you can assume linking in later stages is already done. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Compile to NIR in compile_graphicsJason Ekstrand2018-08-021-161/+116
| | | | | | This pulls the SPIR-V to NIR step out into common code. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Recompile all shaders if any are missing from the cacheJason Ekstrand2018-08-021-4/+37
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Drop anv_pipeline_add_compiled_stageJason Ekstrand2018-08-022-19/+10
| | | | | | | We can set active_stages much more directly and then it's just candy around setting pipeline->stages[stage]. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Pull shader compilation out into a helper.Jason Ekstrand2018-08-021-108/+120
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Call anv_pipeline_compile_* in a loopJason Ekstrand2018-08-021-26/+30
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Hash the entire pipeline in one goJason Ekstrand2018-08-021-53/+94
| | | | | | | | | Instead of hashing each stage separately (and TES and TCS together), we hash the entire pipeline. This means we'll get fewer cache hits if they, for instance, re-use the same VS over and over again but it also means we can now safely do cross-stage optimizations. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Populate keys up-frontJason Ekstrand2018-08-021-55/+60
| | | | | | | | Instead of having each anv_pipeline_compile_* function populate the shader key, make it part of the anv_pipeline_stage struct and fill it out up-front. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipline: Add a helper struct for per-stage infoJason Ekstrand2018-08-022-95/+74
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* intel/compiler: Add brw_get_compiler_config_value for disk cacheJordan Justen2018-08-013-1/+41
| | | | | | | | | | | | | | | | | | | During code review, Jason pointed out that: 2b3064c0731 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags" Didn't account for INTEL_SCALER_* environment variables. To fix this, let the compiler return the disk_cache driver flags. Another possible fix would be to pull the INTEL_SCALER_* into INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I didn't think it was a good use of 4 more of these bits. (5 since INTEL_PRECISE_TRIG needs to be accounted for as well.) Cc: Jason Ekstrand <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Disable shader cache with INTEL_DEBUG=shader_timeJordan Justen2018-08-011-2/+5
| | | | | | | | | | | | | | | | | | | Shader time hard codes an index of the shader time buffer within the gen program. In order to support shader time in the disk shader cache, we'd need to add the shader time index into the program key. This should work, but probably is not worth it for this particular debug feature. Therefore, let's just disable the disk shader cache if the shader time debug feature is used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382 Fixes: 96fe36f7acc "i965: Enable disk shader cache by default" Cc: Eero Tamminen <[email protected]> Cc: Kenneth Graunke <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/pipeline: Add populate_tcs/tes_key helpersJason Ekstrand2018-08-011-3/+25
| | | | | | | They don't really do anything interesting, but it's more consistent this way. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Rework the parameters to populate_wm_prog_keyJason Ekstrand2018-08-011-22/+24
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: More aggressively optimize away color attachmentsJason Ekstrand2018-08-012-5/+14
| | | | | | | | | | Instead of just looking at the number of color attachments, look at which ones are actually used by the subpass. This lets us potentially throw away chunks of the fragment shader. In DXVK, for example, all subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this is very helpful in that case. Reviewed-by: Timothy Arceri <[email protected]>
* anv: Restrict the number of color regions to those actually writtenJason Ekstrand2018-08-011-0/+5
| | | | | | | | | | The back-end compiler emits the number of color writes specified by wm_prog_key::nr_color_regions regardless of what nir_store_outputs we have. Once we've gone through and figured out which render targets actually exist and are written by the shader, we should restrict the key to avoid extra RT write messages. Reviewed-by: Timothy Arceri <[email protected]>
* anv/pipeline: Fix up deref modes if we delete a FS outputJason Ekstrand2018-08-011-0/+5
| | | | | | | | | With the new deref instructions, we have to keep the modes consistent between the derefs and the variables they reference. Since we remove outputs by changing them to local variables, we need to run the fixup pass to fix the modes. Reviewed-by: Timothy Arceri <[email protected]>
* intel/nir: Call nir_lower_io_to_scalar_earlyJason Ekstrand2018-08-011-5/+12
| | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 15166953 -> 15073611 (-0.62%) instructions in affected programs: 2390284 -> 2296942 (-3.91%) helped: 16469 HURT: 505 total loops in shared programs: 4954 -> 4951 (-0.06%) loops in affected programs: 3 -> 0 helped: 3 HURT: 0 Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Split IO arrays into elementsJason Ekstrand2018-08-011-0/+4
| | | | | | | | | | | | | | | | | The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O variables which are arrays or matrices into a sequence of separate variables. This can help link-time optimization by allowing us to remove varyings at a more granular level. Shader-db results on Kaby Lake: total instructions in shared programs: 15177645 -> 15168494 (-0.06%) instructions in affected programs: 79857 -> 70706 (-11.46%) helped: 392 HURT: 0 Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Flag all slots of a flat input as flatJason Ekstrand2018-08-011-6/+9
| | | | | | | | | | | | | | | | Otherwise, only the first vec4 of a matrix or other complex type will get marked as flat and we'll interpolate the others. This was caught by a dEQP test which started failing because it did a SSO vs. non-SSO comparison. Previously, we did the interpolation wrong consistently in both versions. However, with one of Tim Arceri's NIR linkingpatches, we started splitting the matrix input into vectors at link time in the non-SSO version and it started getting correctly interpolated which didn't match the broken SSO version. As of this commit, they both get correctly interpolated. Fixes: e61cc87c757f8bc "i965/fs: Add a flat_inputs field to prog_data" Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* intel/nir: Use the correct scalar stage for consumers when linkingJason Ekstrand2018-08-011-1/+1
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* intel: tools: aubwrite: split gen[89] from gen10+Lionel Landwerlin2018-08-015-186/+416
| | | | | | | | | | | Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end of the context image. We select the largest size for the context image regardless of the generation. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* python: Explicitly use byte stringsMathieu Bridon2018-08-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | In both Python 2 and 3, zlib.Compress.compress() takes a byte string, and returns a byte string as well. In Python 2, the script was working because: 1. string literalls were byte strings; 2. opening a file in unicode mode, reading from it, then passing the unicode string to compress() would automatically encode to a byte string; On Python 3, the above two points are not valid any more, so: 1. zlib.Compress.compress() refuses the passed unicode string; 2. compressed_data, defined as an empty unicode string literal, can't be concatenated with the byte string returned by compress(); This commit fixes this by explicitly using byte strings where appropriate, so that the script works on both Python 2 and 3. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* python: Open file in binary modeMathieu Bridon2018-08-011-1/+1
| | | | | | | | | | | | | | | | | | The XML parser wants byte strings, not unicode strings. In both Python 2 and 3, opening a file without specifying the mode will open it for reading in text mode ('r'). On Python 2, the read() method of the file object will return byte strings, while on Python 3 it will return unicode strings. Explicitly specifying the binary mode ('rb') makes the behaviour identical in both Python 2 and 3, returning what the XML parser expects. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* python: Better get character ordinalsMathieu Bridon2018-08-011-2/+2
| | | | | | | | | | | | | | | | | In Python 2, iterating over a byte-string yields single-byte strings, and we can pass them to ord() to get the corresponding integer. In Python 3, iterating over a byte-string directly yields those integers. Transforming the byte string into a bytearray gives us a list of the integers corresponding to each byte in the string, removing the need to call ord(). This makes the script compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/compiler: implement 8-bit constant loadIago Toral Quiroga2018-08-011-0/+5
| | | | | | Fixes VK-GL-CTS CL#2567 Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: add setup_imm_(u)b helpersIago Toral Quiroga2018-08-012-0/+22
| | | | | | | | The hardware doesn't support byte immediates, so similar to setup_imm_df() for doubles, these helpers work by loading the constant value into a VGRF. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/icl: Disable binding table prefetchingTopi Pohjolainen2018-07-271-0/+7
| | | | | | | | | | | | | Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to disable prefetching of binding tables for ICLLP A0 and B0 steppings. It fixes multiple gpu hangs in ext_framebuffer_multisample* tests on ICLLP B0 h/w. Anuj: Add comments and commit message. Add gen 11 checks in the code. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: fix lower conversions to account for predicationIago Toral Quiroga2018-07-271-1/+4
| | | | | | | | The pass can create a temporary result for the instruction and then moves from it to the original destination, however, if the original instruction was predicated, the mov has to be predicated as well. Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
* i965: Combine both gl_PatchVerticesIn lowering passes.Kenneth Graunke2018-07-262-4/+2
| | | | | | | | | | | | | | | | | | Until now, we had separate passes for lowering gl_PatchVerticesIn to a statically known constant (for TES inputs when linked against a TCS), and a uniform in the other cases. Annoyingly, one had to be run before nir_lower_system_values, and the other afterward. This simplified the passes, but made life painful for the callers. This patch combines both into a single pass. If you give it a non-zero static count, it uses that. If you give it Mesa state slots, it turns it back into a built-in uniform. Otherwise, it does nothing. This also moves the i965 uniform lowering out to shared code. v2: Make token arrays const. Reviewed-by: Eric Anholt <[email protected]>
* intel/compiler: Delete dead VS intrinsic handling.Kenneth Graunke2018-07-261-12/+4
| | | | | | | These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we would have already hit the unreachable() in emit_system_values_block(). Reviewed-by: Jason Ekstrand <[email protected]>
* anv: drop unused local varsEric Engestrom2018-07-261-6/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: remove incorrect `UNUSED` flagEric Engestrom2018-07-261-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* intel: Make the decoder just store addresses for bases, not buffers.Kenneth Graunke2018-07-252-12/+12
| | | | | | | | | | The various base addresses are simply addresses. There may or may not be a buffer located at those addresses. So, it doesn't make much sense to request one. Just save the raw address so we can add it later, when asking about BOs at the final <base + offset> address. Suggested-by: Lionel Landwerlin <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Make the decoder handle STATE_BASE_ADDRESS not being a buffer.Kenneth Graunke2018-07-252-38/+46
| | | | | | | | | | | | | | | | | | | | | | | | Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all state for a given base in a single buffer. I'm working on a prototype which emits STATE_BASE_ADDRESS only once at startup, where each base address is a fixed 4GB region of the PPGTT. State may live in many buffers in that 4GB region, even if there isn't a buffer located at the actual base address itself. To handle this, we need to save the STATE_BASE_ADDRESS values across multiple batches, rather than assuming we'll see the command each time. Then, each time we see a pointer, we need to ask the driver for the BO map for that data. (We can't just use the map for the base address, as state may be in multiple buffers, and there may not even be a buffer at the base address to map.) v2: Fix things caught in review by Lionel: - Drop bogus bind_bo.size check. - Drop "get the BOs again" code - we just get the BOs as needed - Add a message about interface descriptor data being unavailable Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: don't crash on vkDestroyDevice(NULL)Eric Engestrom2018-07-251-1/+3
| | | | | | | | CovID: 1438132 Fixes: a99c9e63a07477634ab73 "anv: finish the binding_table_pool on destroyDevice when use_softpin" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
* anv: fix python whitespace warningEric Engestrom2018-07-251-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv: cleanup python importsEric Engestrom2018-07-252-3/+3
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv: remove unnecessary semicolons in pythonEric Engestrom2018-07-251-3/+3
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* intel: tools: dump: only store device id on successLionel Landwerlin2018-07-251-2/+2
| | | | | | | | We might fail on master node drm fd because we won't have the right permissions. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965, anv: Use INTEL_DEBUG for disk_cache driver flagsJordan Justen2018-07-242-1/+7
| | | | | | | | | | | | | | | | | | | Since various options within INTEL_DEBUG could impact code generation, we need to set the disk cache driver_flags parameter based on the INTEL_DEBUG flags in use. An example that will affect the program generated by i965 is the INTEL_DEBUG=nocompact option. The DEBUG_DISK_CACHE_MASK value is added to mask the settings of INTEL_DEBUG that can affect program generation. v2: * Use driver_flags (Tim) * Also update Anvil (Jason) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965, anv: Add extra unused character in disk_cache renderer temp stringJordan Justen2018-07-241-2/+2
| | | | | | | | | | | | This extra character should not be used by snprintf, but we make it available to verify that we printed the exact number we wanted, and didn't overflow. v2: * Also update Anvil Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: rename f2f16_undef to f2f16Karol Herbst2018-07-241-1/+1
| | | | | | | | | | | we need rounding modes on other conversions involving floats and it is easier to rename f2f16_undef than renaming all the other ones. v2: rebased on master Reviewed-by: Jason Ekstrand <[email protected]> Acked-by: Rob Clark <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* python: Use range() instead of xrange()Mathieu Bridon2018-07-241-1/+1
| | | | | | | | | | | | | | | | Python 2 has a range() function which returns a list, and an xrange() one which returns an iterator. Python 3 lost the function returning a list, and renamed the function returning an iterator as range(). As a result, using range() makes the scripts compatible with both Python versions 2 and 3. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* python: Better iterate over dictionariesMathieu Bridon2018-07-242-6/+6
| | | | | | | | | | | | | | | | In Python 2, dictionaries have 2 sets of methods to iterate over their keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems(). The former return lists while the latter return iterators. Python 3 dropped the method which return lists, and renamed the methods returning iterators to keys()/values()/items(). Using those names makes the scripts compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* intel: Make the disassembler take a const pointer to the assembly.Kenneth Graunke2018-07-242-4/+5
| | | | | | Disassembling doesn't modify the assembly. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Account for built-in uniforms in analyze_ubo_rangesJason Ekstrand2018-07-233-4/+40
| | | | | | | | | | | | The original pass only looked for load_uniform intrinsics but there are a number of other places that could end up loading a push constant. One obvious omission was images which always implicitly use a push constant. Legacy VS clip planes also get pushed into the shader. This fixes some new Vulkan CTS tests that test random combinations of bindings and, in particular, test lots of UBOs and images together. Cc: [email protected] Cc: Kenneth Graunke <[email protected]>