path: root/src
Commit message | Author | Age | Files | Lines
* intel/fs: Add INTEL_DEBUG=no32 debugging flag. | Francisco Jerez | 2020-04-28 | 3 | -2/+5
  This is useful in order to identify codegen issues caused by SIMD32. It doesn't currently have any effect on compute shaders since SIMD32 dispatch is only enabled for CS when it's strictly necessary to do so in order to support the workgroup size requested for the shader -- That might change in the future though when we hook up the SIMD32 heuristic to CS compilation.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders. | Francisco Jerez | 2020-04-28 | 1 | -7/+17
  The heuristic enables the SIMD32 fragment shader based on whether the IR performance modeling pass predicts it to have greater throughput than the SIMD16 and SIMD8 variants of the same shader. It would be straightforward to do the same thing in order to control whether SIMD16 dispatch is enabled, but it's pending additional performance evaluation.
  The INTEL_DEBUG=do32 option is left around in order to force the SIMD32 shader to be used regardless of the result of the heuristic, since it's useful as a debugging aid e.g. in order to identify SIMD32-specific codegen issues which may be masked by the SIMD32 heuristic, or cases where the heuristic is incorrectly disabling SIMD32 shaders that offer a performance advantage.
  Currently this is only enabled on Gen6+, since SIMD32 codegen support is incomplete on earlier platforms.
  Reviewed-by: Kenneth Graunke <[email protected]>
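  The decision described in the commit above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name, the plain float throughput arguments and the `force_simd32` flag (standing in for INTEL_DEBUG=do32) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: decide whether to enable the SIMD32 variant.
 * The three throughput values are assumed to come from some static
 * performance model (higher is better); force_simd32 models a debug
 * override in the spirit of INTEL_DEBUG=do32. */
static bool
use_simd32(float simd8_throughput, float simd16_throughput,
           float simd32_throughput, bool force_simd32)
{
   /* Throughput estimates are assumed to be non-negative. */
   assert(simd8_throughput >= 0.0f);
   assert(simd16_throughput >= 0.0f);
   assert(simd32_throughput >= 0.0f);

   /* The debug override wins regardless of the model's prediction. */
   if (force_simd32)
      return true;

   /* Enable SIMD32 only if it is predicted to beat both narrower variants. */
   return simd32_throughput > simd16_throughput &&
          simd32_throughput > simd8_throughput;
}
```

  With these assumptions, `use_simd32(1.0f, 2.0f, 3.0f, false)` enables SIMD32, while `use_simd32(1.0f, 3.0f, 2.0f, false)` does not unless the override flag is set.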
* intel/fs: Heap-allocate fs_visitors in brw_compile_fs(). | Francisco Jerez | 2020-04-28 | 1 | -38/+39
  This makes brw_compile_fs() look a bit more similar to brw_compile_cs(). It saves us three v*_shader_stats local variables, and will save us additional triplicated declarations as we start tracking IR performance analysis results.
  The triplicated cfg pointers are left around because they're set to NULL to mark specific dispatch modes as disabled (e.g. in order to enforce hardware restrictions). Doing the same thing with the visitor pointers would cause data leaks.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir: Import shader performance analysis pass. | Francisco Jerez | 2020-04-28 | 8 | -1/+1660
  This introduces an analysis pass intended to estimate several performance statistics of the shader, including cycle count latency and throughput values, based on static modeling. It has instruction performance information more comprehensive than the current scheduling pass for all platforms between Gen4-11, and works on both the FS and VEC4 back-end.
  The most immediate purpose of this pass is to implement a heuristic meant to determine whether using SIMD32 dispatch for a fragment shader can be expected to help more than it hurts. In addition this will allow the effect of passes run after scheduling (e.g. the TGL software scoreboard pass and the VEC4 dependency control pass) to be visible in shader-db statistics.
  But that isn't the end of the story, other potential applications of this pass (not part of this MR) I've been playing around with are:
  - Implement a similar SIMD16 heuristic allowing the identification of inefficient SIMD16 fragment shaders.
  - Implement similar SIMD16 and SIMD32 heuristics for the compute shader stage -- Currently compute shader builds always use the SIMD16 shader if available and never use the SIMD32 shader unless strictly necessary, which is suboptimal under certain conditions.
  - Hook up to the instruction scheduler in order to improve the accuracy of its timing information.
  - Use as heuristic in order to drive the selection of scheduling modes (Matt was experimenting with that).
  - Plug to the TGL software scoreboard pass in order to implement a more effective SBID token allocation algorithm, since in general the optimal token allocation depends on the timings of all instructions in the program.
  - Use its bottleneck detection functionality in order to implement a heuristic computing a more optimal bound for the number of fragment shader threads executed in parallel (by adjusting the MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).
  As a follow-up I'm planning to submit updated timing information for Gen12 platforms -- Everything else required to support Gen12 like SWSB handling is already included in this patch, but there were some IP concerns regarding the TGL timing parameters since they cannot currently be obtained with the documentation and hardware which is publicly available. The timing parameters for any previous Gen7-11 platforms can be obtained by anyone by sampling the timestamp register using e.g. shader_time, though I have some more convenient instrumentation coming up.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag(). | Francisco Jerez | 2020-04-28 | 1 | -2/+2
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. | Francisco Jerez | 2020-04-28 | 4 | -17/+17
  This will be re-usable by the IR performance analysis pass.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed(). | Francisco Jerez | 2020-04-28 | 1 | -2/+2
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Rename half() helpers to quarter(), allow index up to 3. | Francisco Jerez | 2020-04-28 | 4 | -14/+14
  Makes more sense considering SIMD32. Relaxing the assertion in brw_ir_fs.h will be required in order to avoid assertion failures on SNB with SIMD32 fragment shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir: Add missing initialization of backend_reg::offset during construction. | Francisco Jerez | 2020-04-28 | 1 | -1/+1
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout. | Francisco Jerez | 2020-04-28 | 1 | -0/+17
  In Gen12 the Poly 0 Info DWORD containing the Viewport Index and Render Target Index fields was moved from r0.0 to r1.1 in order to make room for dual-polygon dispatch. The render target message format was updated to expect that information in the same location, so we didn't need to make any changes for framebuffer fetch to work with SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32, since the render target message header is assembled from r0 and r2 instead of r1, and the r2 thread payload wasn't updated with an additional copy of the same information. We need to fix things up manually instead. This avoids a handful of EXT_shader_framebuffer_fetch regressions in combination with SIMD32 fragment shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32. | Francisco Jerez | 2020-04-28 | 1 | -2/+3
  This applies the same work-around I committed as b84fa0b31e67 "intel/fs/gen11: Work around dual-source blending hangs in combination with SIMD32." to Gen12, which seems to suffer from the same hardware bug found empirically. The failure mode seems to be identical.
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch. | Francisco Jerez | 2020-04-28 | 1 | -3/+10
  The Gen12 docs are rather contradictory regarding the dispatch configurations supported by the fragment shader -- The same table present in previous generations seems to imply that only one dispatch mode can be enabled when doing per-sample shading, but a restriction documented in the 3DSTATE_PS_BODY page implies the opposite: That SIMD32 can only be used in combination with some other dispatch mode. The latter seems to match the behavior of real hardware as far as I could tell from my testing: A bunch of multisample test-cases that do per-sample shading hang if we only provide a SIMD32 shader.
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Follow OpenGL conversion rules for values that exceed storage size | Dylan Baker | 2020-04-29 | 1 | -4/+33
  Section 2.2.2 (Data Conversions For State Query Commands) of the OpenGL 4.5 spec says:
    Following these steps, if a value is so large in magnitude that it cannot be represented by the returned data type, then the nearest value representable using that type is returned.
  The current code doesn't do the correct thing, because it truncates a long (potentially a 64-bit value) to an int.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
  Fixes: 53c36dfcfe3eb3749a53267f054870280afb0d71 ("replace IROUND with util functions")
  Reviewed-by: Matt Turner <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>
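  The rule quoted from the spec amounts to saturating rather than truncating. A minimal sketch of that idea, assuming a 64-bit source value and an `int` destination; the helper name is made up for illustration and is not the function touched by the commit:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: convert a 64-bit value to int by returning the
 * nearest representable value instead of discarding the upper bits. */
static int
clamp_int64_to_int(int64_t value)
{
   if (value > INT_MAX)
      return INT_MAX;   /* too large: nearest representable is INT_MAX */
   if (value < INT_MIN)
      return INT_MIN;   /* too small: nearest representable is INT_MIN */
   return (int)value;   /* in range: the conversion is exact */
}
```

  A plain cast such as `(int)value` would instead keep only the low bits on typical platforms, which is the truncation the commit message describes.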
* pan/bit: Add BITWISE test | Alyssa Rosenzweig | 2020-04-29 | 1 | -0/+32
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
* pan/bit: Interpret BI_BITWISE | Alyssa Rosenzweig | 2020-04-29 | 1 | -2/+22
  No shifting yet.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
* pan/bi: Handle iand/ior/ixor in NIR->BIR | Alyssa Rosenzweig | 2020-04-29 | 1 | -0/+18
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
* pan/bi: Pack BI_BITWISE | Alyssa Rosenzweig | 2020-04-29 | 2 | -2/+77
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
* pan/bi: Add bitwise modifiers | Alyssa Rosenzweig | 2020-04-29 | 2 | -0/+12
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
* freedreno/a6xx: invalidate tex state cache entries on rebind | Rob Clark | 2020-04-29 | 3 | -1/+30
  When a resource's backing bo changes, its seqno will be incremented, which would result in a new tex state cache key, and nothing to clean up the old tex state until the sampler view/state is destroyed. But in some games, that may never happen, or at least not happen before we run out of memory.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2830
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
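  The problem and the fix can be illustrated with a toy cache keyed by the resource's seqno. Everything below is an assumption made for the example (a fixed-size array, the struct layout, the function names); the real driver's cache is not shown in this log and certainly differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_CACHE_SLOTS 8   /* arbitrary size for the sketch */

/* Hypothetical cache entry: state built for a resource at a given seqno. */
struct tex_cache_entry {
   bool     used;
   uint32_t rsc_seqno;
};

struct tex_cache {
   struct tex_cache_entry slots[TEX_CACHE_SLOTS];
};

/* Store an entry keyed by the resource's current seqno.
 * Returns false if the toy cache is full. */
static bool
tex_cache_insert(struct tex_cache *cache, uint32_t rsc_seqno)
{
   assert(cache != NULL);
   for (size_t i = 0; i < TEX_CACHE_SLOTS; i++) {
      if (!cache->slots[i].used) {
         cache->slots[i].used = true;
         cache->slots[i].rsc_seqno = rsc_seqno;
         return true;
      }
   }
   return false;
}

/* On rebind, drop every entry keyed by the seqno the resource had before
 * its backing bo changed. Without this step the stale entries would stay
 * until the sampler view/state is destroyed. Returns the number evicted. */
static unsigned
tex_cache_invalidate(struct tex_cache *cache, uint32_t old_seqno)
{
   unsigned evicted = 0;

   assert(cache != NULL);
   for (size_t i = 0; i < TEX_CACHE_SLOTS; i++) {
      if (cache->slots[i].used && cache->slots[i].rsc_seqno == old_seqno) {
         cache->slots[i].used = false;
         evicted++;
      }
   }
   return evicted;
}
```

  Note that the eviction needs the seqno from before the bo change, which is why the rebind has to be signalled while the original value is still available.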
* freedreno: rebind_resource() *before* bo changes | Rob Clark | 2020-04-29 | 1 | -4/+2
  This will matter in the next patch, where we need the original rsc->seqno. It means slight shuffling of where we call rebind_resource() in the `fd_try_shadow_resource()` path.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
* freedreno: rebind resource in all contexts | Rob Clark | 2020-04-29 | 5 | -15/+27
  If the resource is rebound, we need to invalidate in all contexts.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
* freedreno: optimize rebind_resource() | Rob Clark | 2020-04-29 | 4 | -38/+123
  Track how resources are used, ie. which state they may potentially dirty if the backing bo is changed/reallocated, to optimize rebind_resource(). This will be more important in a later patch when we hook up eviction of entries in a6xx tex state cache.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
* freedreno: mark more state dirty when rebinding resources | Rob Clark | 2020-04-29 | 2 | -6/+16
  Plus a bonus typo fix.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
* freedreno: don't realloc idle bo's | Rob Clark | 2020-04-29 | 2 | -5/+13
  The `DISCARD_WHOLE_RESOURCE` is just a hint. And `rebind_resource()` is a bunch of faffing about (and going to get worse in a later patch), so let's not bother when the bo is already idle.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
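  In other words, the discard hint only justifies a reallocation when the existing bo is still in use by the GPU. A tiny sketch of that decision, with a made-up enum and function name and the busy state passed in as a plain flag (how the driver actually queries it is not shown here):

```c
#include <stdbool.h>

/* Hypothetical outcome of handling a "discard whole resource" hint. */
enum discard_action {
   DISCARD_REUSE_BO,    /* keep the existing bo, no rebind needed */
   DISCARD_REALLOC_BO,  /* allocate a fresh bo and rebind the resource */
};

/* Reallocate only when the hint is set AND the bo is still busy;
 * an idle bo can simply be written in place. */
static enum discard_action
choose_discard_action(bool discard_whole_resource, bool bo_busy)
{
   if (discard_whole_resource && bo_busy)
      return DISCARD_REALLOC_BO;
   return DISCARD_REUSE_BO;
}
```

  Treating the flag as a hint means the cheap path (reuse) is always a valid answer, so the expensive rebind is reserved for the case where it avoids a stall.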
* freedreno: small whitespace fix | Rob Clark | 2020-04-29 | 1 | -1/+1
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
* gallium/swr: Fix crashes and failures in vertex fetch | Jan Zielinski | 2020-04-28 | 2 | -3/+7
  This commit fixes two problems:
  - In some cases SWR does not correctly report to Gallium which formats are supported.
  - Incorrect LLVM instructions are used in vertex fetch in some situations
  Reviewed-by: Krzysztof Raszkowski <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4788>
* freedreno/log-parser: support to read gzip'd logs | Rob Clark | 2020-04-28 | 1 | -1/+8
  ~50MB gzip'd log files are nicer than ~300MB uncompressed
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
* freedreno/a6xx: pre-calculate expected vsc stream sizes | Rob Clark | 2020-04-28 | 8 | -1/+229
  We should only rely on overflow detection for indirect draws, where we have no other option.
  This doesn't use quite the worst-possible-case sizes, which in practice seem to be ~20x larger than what is required. But instead uses roughly half of that.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
* freedreno: add helper to estimate # of bins per pipe | Rob Clark | 2020-04-28 | 2 | -6/+24
  For vsc size calculation, we need to know the # of bins per pipe. Or at least the worst-case # of bins, assuming we don't eliminate an unused depth/stencil buffer.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
* freedreno/a6xx+tu: rename VSC_DATA/VSC_DATA2 | Rob Clark | 2020-04-28 | 8 | -134/+129
  These are the draw-stream and primitive-stream, so let's give them more descriptive names.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
* aco: fix vgpr nir_op_vecn with sgpr operands | Rhys Perry | 2020-04-28 | 1 | -2/+7
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: improve clamped integer addition disassembly workaround | Rhys Perry | 2020-04-28 | 1 | -3/+8
  Make it work with 16-bit and GFX10.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: add various GFX10 int16 opcodes | Rhys Perry | 2020-04-28 | 1 | -2/+11
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: fix sub-dword overwrite check in RA validator | Rhys Perry | 2020-04-28 | 1 | -1/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: fix sub-dword out-of-bounds check in RA validator | Rhys Perry | 2020-04-28 | 1 | -2/+2
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: add missing adjust_max_used_regs() | Rhys Perry | 2020-04-28 | 1 | -0/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: improve RA for uneven p_split_vector | Rhys Perry | 2020-04-28 | 1 | -1/+2
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: don't recurse in sub-dword get_reg_simple() | Rhys Perry | 2020-04-28 | 1 | -0/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: split self-intersecting copies instead of swapping | Rhys Perry | 2020-04-28 | 1 | -0/+33
  Example situation:
    v1 = {v0.hi, v1.lo}
    v0.hi = v1.hi
  The 4-byte copy's definition is completely used, but swapping it makes no sense. We have to split it to generate correct code:
    swap(v0.hi, v1.lo)
    swap(v0.hi, v1.hi)
  Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
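  The two swaps from the example can be checked with a small simulation over 16-bit register halves. This is only a sanity check of the example, not ACO code: the array layout and names are invented, and it assumes that `{v0.hi, v1.lo}` lists the low half of the destination first and that both copies happen in parallel (all sources are read before any destination is written).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for the sketch: one array slot per 16-bit register half. */
enum { V0_LO, V0_HI, V1_LO, V1_HI, NUM_HALVES };

/* Exchange the contents of two register halves. */
static void
swap_halves(uint16_t *regs, int a, int b)
{
   assert(a >= 0 && a < NUM_HALVES && b >= 0 && b < NUM_HALVES);
   uint16_t tmp = regs[a];
   regs[a] = regs[b];
   regs[b] = tmp;
}

/* The split sequence from the commit message. It is meant to implement the
 * parallel copy  v1.lo <- v0.hi,  v1.hi <- v1.lo,  v0.hi <- v1.hi,
 * which is a 3-cycle and therefore cannot be done with one 4-byte swap. */
static void
apply_split_copy(uint16_t *regs)
{
   swap_halves(regs, V0_HI, V1_LO);
   swap_halves(regs, V0_HI, V1_HI);
}
```

  Starting from distinct values in each half, the result after `apply_split_copy()` has the old `v0.hi` in `v1.lo`, the old `v1.lo` in `v1.hi`, and the old `v1.hi` in `v0.hi`, while `v0.lo` is untouched.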
* aco: fix neighboring register check in get_reg_simple() | Rhys Perry | 2020-04-28 | 1 | -1/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: check alignment of non-subdword registers in get_reg_specified() | Rhys Perry | 2020-04-28 | 1 | -0/+2
  When splitting a v6b vector into v1 and v2b components, we should ensure the v1 definition doesn't start at the upper half.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* aco: make RegisterFile::block() take a regclass | Rhys Perry | 2020-04-28 | 1 | -9/+9
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
* anv: Claim VK_EXT_robustness2 support | Jason Ekstrand | 2020-04-28 | 2 | -0/+18
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* anv: Handle null vertex buffer bindings | Jason Ekstrand | 2020-04-28 | 1 | -11/+20
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* anv: Handle NULL descriptors | Jason Ekstrand | 2020-04-28 | 5 | -73/+150
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* nir/combine_stores: Handle volatile | Jason Ekstrand | 2020-04-28 | 2 | -1/+66
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* nir/dead_write_vars: Handle volatile | Jason Ekstrand | 2020-04-28 | 2 | -0/+61
  We can't remove volatile writes and we can't combine them with other volatile writes so all we can do is clear the unused bits.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* nir/copy_prop_vars: Report progress when deleting self-copies | Jason Ekstrand | 2020-04-28 | 2 | -0/+138
  Fixes: 62332d139c8f6 "nir: Add a local variable-based copy prop..."
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* nir/copy_prop_vars: Handle volatile better | Jason Ekstrand | 2020-04-28 | 1 | -16/+18
  For deref_store, we can still delete invalid stores that write to statically OOB data. For everything, we need to make sure that we kill aliases of destinations even if it's volatile.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
* vulkan: Update Vulkan XML and headers to 1.2.139 | Jason Ekstrand | 2020-04-28 | 1 | -415/+1213
  Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>