Commit message | Author | Age | Files | Lines
* nir: Add functions to subtract and compare addresses | Caio Marcelo de Oliveira Filho | 2019-06-03 | 2 | -0/+54
  v2: Fix comparing addresses from formats that have more than one component by using nir_ball_iequal(). (Jason)
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_ball_iequal() helper | Caio Marcelo de Oliveira Filho | 2019-06-03 | 1 | -0/+13
  Similar to nir_bany_inequal(). Suggested by Jason.
  Reviewed-by: Jason Ekstrand <[email protected]>
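nir_ball_iequal() reduces a vector comparison to one boolean that is true only when every component matches; it is the logical negation of nir_bany_inequal(). The plain-C sketch below models only those semantics on small integer vectors. The names `vec_ball_iequal` and `vec_bany_inequal` are hypothetical and are not the NIR builder API; the 4-component limit is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: true if any of the first num_components differ. */
static bool
vec_bany_inequal(const int32_t *a, const int32_t *b, unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);
   for (unsigned i = 0; i < num_components; i++) {
      if (a[i] != b[i])
         return true;
   }
   return false;
}

/* "All components equal" is the negation of "any component differs". */
static bool
vec_ball_iequal(const int32_t *a, const int32_t *b, unsigned num_components)
{
   return !vec_bany_inequal(a, b, num_components);
}
```

For a multi-component address, such as a hypothetical (index, offset) pair, two addresses are equal only when every component matches, which is the reduction this helper provides.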
* mesa: ARB program parser should clean parameters | Sergii Romantsov | 2019-06-03 | 2 | -2/+13
  The program parser allocates a parameter list. On a parsing error, some variables were not freed. This patch adds the missing frees.
  Signed-off-by: Sergii Romantsov <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* freedreno/ir3: fix counting and printing for half registers. | Hyunjun Ko | 2019-06-03 | 4 | -9/+20
  v2: define 0x100 and use it for setting FS_OUTPUT_REG.HALF_PRECISION.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix up the half reg source even when src instr==NULL | Neil Roberts | 2019-06-03 | 1 | -3/+2
  Previously the loop for assigning registers bailed out early if the register had a null source. I think the intention is that in this case it isn't necessary to assign a register. However, it was also skipping the part that fixes up the types. This can happen if the instruction is copy-propagated into a move from a constant half-float input register; in that case the types still need to be fixed up.
  Fixes an assert in dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump when lowering the precision of the variables.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Add a 16-bit implementation of nir_op_imul | Neil Roberts | 2019-06-03 | 1 | -9/+15
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set dst type of alu instructions correctly. | Hyunjun Ko | 2019-06-03 | 1 | -5/+8
  Though it should be fixed up in the RA pass, it needs to be set correctly from the beginning according to the bitsize of the NIR dest.
  v2: It is better to fix up mad, fddx and fddy later in the RA pass.
  [small cleanup of fallout from imov/fmov removal]
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: adjust the bitsize of regs when an array loading. | Hyunjun Ko | 2019-06-03 | 2 | -7/+16
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert back to 32-bit values for half constant registers. | Hyunjun Ko | 2019-06-03 | 2 | -4/+54
  According to the blob driver, half constant registers seem to be handled only as 32-bit values within floating-point opcodes. So when a lower-precision pass is in effect, we need to convert the 16-bit values back to 32-bit values.
  Signed-off-by: Rob Clark <[email protected]>
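Converting a 16-bit constant back to a 32-bit value amounts to widening an IEEE 754 binary16 bit pattern to binary32. The sketch below shows that widening in portable C; `half_bits_to_float` is a hypothetical name (Mesa has its own half-float utilities, which this does not claim to reproduce), and it assumes `float` is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to a 32-bit float.
 * Assumes float is IEEE 754 binary32. */
static float
half_bits_to_float(uint16_t h)
{
   uint32_t sign     = (uint32_t)(h & 0x8000u) << 16;
   uint32_t exponent = (uint32_t)(h >> 10) & 0x1fu;
   uint32_t mantissa = (uint32_t)h & 0x3ffu;
   uint32_t bits;

   if (exponent == 0x1f) {
      /* Infinity or NaN: maximum exponent, keep the payload bits. */
      bits = sign | 0x7f800000u | (mantissa << 13);
   } else if (exponent != 0) {
      /* Normal number: rebias the exponent from 15 to 127. */
      bits = sign | ((exponent + 127 - 15) << 23) | (mantissa << 13);
   } else if (mantissa == 0) {
      /* Signed zero. */
      bits = sign;
   } else {
      /* Subnormal half: shift until the implicit leading 1 appears,
       * lowering the exponent once per shift. Every half subnormal
       * is a normal binary32 value. */
      exponent = 127 - 15 + 1;
      while (!(mantissa & 0x400u)) {
         mantissa <<= 1;
         exponent--;
      }
      bits = sign | (exponent << 23) | ((mantissa & 0x3ffu) << 13);
   }

   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}
```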
* freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov. | Hyunjun Ko | 2019-06-03 | 1 | -0/+16
  If the dest and src registers of an absneg opcode have different types, it shouldn't be considered a same-type mov.
  This patch becomes meaningful once we start using mediump information to lower precision to 16-bit.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set proper dst type for uniform according to the type of nir dest. | Hyunjun Ko | 2019-06-03 | 2 | -7/+14
  e.g. uniform mediump vec4 f;
  This patch has no effect yet, since there is no mediump lowering pass for now, but it will become meaningful when that pass lands in the near future.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION | Neil Roberts | 2019-06-03 | 2 | -6/+2
  Previously A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending on whether half_precision was set in the shader key. With support for mediump precision, different outputs can use different precisions, so a global shader state can't specify it. Instead, it now tries to copy the half-float-ness from the nir_variable for the output into the ir3_shader_variant, which is then used to decide whether to set half precision for each output.
  The a6xx version is copied from the a5xx code but has not been tested.
  v2: [Hyunjun Ko ([email protected])] Use the recently added half flag, which represents precision based on IR3_REG_HALF, to avoid duplication.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix loading half-float immediate vectors | Neil Roberts | 2019-06-03 | 1 | -3/+12
  Previously the code loading from a constant instruction always used the u32 pointer. If the constant is actually a 16-bit source, this produced the wrong values because the pointer was offset by the wrong size. This fixes it to use the u16 pointer.
  Signed-off-by: Rob Clark <[email protected]>
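The bug is an indexing-width mismatch: when constant storage packs 16-bit components, component i starts at byte offset 2*i, but indexing through a 32-bit view reads from byte offset 4*i. The sketch uses a hypothetical packed buffer rather than NIR's actual constant-value types, and only models the offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed storage for up to four constant components.
 * This models byte offsets only, not NIR's real constant types. */
struct packed_consts {
   unsigned char bytes[16];
   unsigned bit_size;            /* 16 or 32 */
};

/* Fixed behaviour: step through storage by the component's own size. */
static uint32_t
load_component(const struct packed_consts *c, unsigned i)
{
   assert(c->bit_size == 16 || c->bit_size == 32);
   assert(i < 4);
   if (c->bit_size == 16) {
      uint16_t v;
      memcpy(&v, c->bytes + i * sizeof(v), sizeof(v));
      return v;
   }
   uint32_t v;
   memcpy(&v, c->bytes + i * sizeof(v), sizeof(v));
   return v;
}

/* Old behaviour: always step by 32 bits, whatever the bit size. */
static uint32_t
load_component_as_u32(const struct packed_consts *c, unsigned i)
{
   assert(i < 4);
   uint32_t v;
   memcpy(&v, c->bytes + i * sizeof(v), sizeof(v));
   return v;
}
```

With four half-float components packed into the first 8 bytes, the 32-bit view of component 1 actually reads components 2 and 3.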
* freedreno/ir3: immediately schedule meta instructions | Rob Clark | 2019-06-03 | 1 | -0/+3
  They aren't real instructions and don't change the number of live values, so there is no point in them competing with real instructions.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: scheduler improvements | Rob Clark | 2019-06-03 | 2 | -13/+115
  For instructions that increase the number of live values, apply a threshold to avoid scheduling them too early. Also factor in the net change in the number of live values that scheduling an instruction would cause, to prioritize instructions that reduce the number of live values as that number grows.
  For manhattan:
    total instructions in shared programs: 27869 -> 28413 (1.95%)
    instructions in affected programs: 26756 -> 27300 (2.03%)
    helped: 102
    HURT: 87
    total full in shared programs: 1903 -> 1719 (-9.67%)
    full in affected programs: 1390 -> 1206 (-13.24%)
    helped: 124
    HURT: 9
  The reduction in register usage nets a ~20% gain in manhattan. (So getting mediump support should be a huge win for gles gfxbench.)
  It also significantly helps some of the more complex shadertoy shaders, like IQ's Piano (32 to 18 regs, doubling fps). The effect is less pronounced on smaller shaders.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
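The heuristic can be pictured as a scoring rule: each ready instruction has a net effect on live values (values it defines minus sources whose last use it is), and once the number of live values reaches a threshold the scheduler prefers the candidate that shrinks liveness the most. The sketch below is a hypothetical model; the struct, the threshold parameter, and the "first ready candidate" fallback are assumptions, not ir3_sched's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ready-list entry: how scheduling it changes live values. */
struct candidate {
   const char *name;
   int new_live;     /* values this instruction defines */
   int freed_live;   /* sources whose last use is this instruction */
};

static int
net_live_change(const struct candidate *c)
{
   return c->new_live - c->freed_live;
}

/* Below the threshold, take the first ready candidate (list order stands
 * in for the normal priority). At or above it, prefer the candidate that
 * reduces the number of live values the most. */
static const struct candidate *
pick_candidate(const struct candidate *ready, size_t count,
               int live_values, int threshold)
{
   assert(count > 0);
   if (live_values < threshold)
      return &ready[0];

   const struct candidate *best = &ready[0];
   for (size_t i = 1; i < count; i++) {
      if (net_live_change(&ready[i]) < net_live_change(best))
         best = &ready[i];
   }
   return best;
}
```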
* freedreno/ir3: sched should mark outputs used | Rob Clark | 2019-06-03 | 1 | -19/+35
  Account for shader outputs and values live in any direct/indirect successor block.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* mesa: EXT_dsa add selectorless matrix stack functions | Pierre-Eric Pelloux-Prayer | 2019-06-03 | 5 | -112/+925
  Allows the legacy matrix stacks to be manipulated without disturbing the matrix mode selector.
  Adapted from a patch from Chris Forbes.
  Reviewed-by: Marek Olšák <[email protected]>
* mesa: factor out enum -> matrix stack lookup | Pierre-Eric Pelloux-Prayer | 2019-06-03 | 1 | -54/+56
  Split this out from glMatrixMode since we're about to need it independently for EXT_DSA.
  Adapted from a commit by Chris Forbes.
  Reviewed-by: Marek Olšák <[email protected]>
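Factoring the lookup out lets a selectorless EXT_direct_state_access entry point (such as glMatrixPushEXT) find a stack from its enum argument without writing the current matrix-mode selector. The sketch uses hypothetical context and function names; the enum values are the standard GL_MODELVIEW, GL_PROJECTION and GL_TEXTURE tokens, and texture units and program matrices are left out.

```c
#include <assert.h>
#include <stddef.h>

#define GL_MODELVIEW  0x1700
#define GL_PROJECTION 0x1701
#define GL_TEXTURE    0x1702

/* Hypothetical, heavily simplified context. */
struct matrix_stack { int depth; };

struct context {
   struct matrix_stack modelview, projection, texture;
   struct matrix_stack *current;   /* selected by glMatrixMode() */
   unsigned matrix_mode;
};

/* Shared lookup: enum -> stack, or NULL for an unsupported enum. */
static struct matrix_stack *
get_named_matrix_stack(struct context *ctx, unsigned mode)
{
   switch (mode) {
   case GL_MODELVIEW:  return &ctx->modelview;
   case GL_PROJECTION: return &ctx->projection;
   case GL_TEXTURE:    return &ctx->texture;
   default:            return NULL;   /* caller would raise GL_INVALID_ENUM */
   }
}

/* glMatrixMode-style: changes the selector. */
static void
matrix_mode(struct context *ctx, unsigned mode)
{
   struct matrix_stack *stack = get_named_matrix_stack(ctx, mode);
   if (stack) {
      ctx->current = stack;
      ctx->matrix_mode = mode;
   }
}

/* Selectorless, glMatrixPushEXT-style: same lookup, selector untouched. */
static void
matrix_push_named(struct context *ctx, unsigned mode)
{
   struct matrix_stack *stack = get_named_matrix_stack(ctx, mode);
   if (stack)
      stack->depth++;
}
```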
* mesa: add new EXT_direct_state_access tokens | Timothy Arceri | 2019-06-03 | 1 | -0/+4
  Reviewed-by: Marek Olšák <[email protected]>
* glapi: add EXT_direct_state_access | Chris Forbes | 2019-06-03 | 3 | -0/+21
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* mesa: add a list of EXT_direct_state_access to dispatch sanity | Timothy Arceri | 2019-06-03 | 1 | -0/+210
  This extension is huge and this gives us a TODO list of functions to implement.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: init sctx->dma_copy before using it | Pierre-Eric Pelloux-Prayer | 2019-06-03 | 1 | -3/+3
  Commit a1378639ab19 reordered the context function initializations but broke the sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma. In that case sctx->dma_copy was assigned a value after being used in:
    sctx->b.resource_copy_region = sctx->dma_copy;
  This commit moves the FORCE_DMA special case after the sctx->dma_copy initialization.
  See https://bugs.freedesktop.org/show_bug.cgi?id=110422
  Signed-off-by: Marek Olšák <[email protected]>
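This is an ordering bug with a function pointer: copying `sctx->dma_copy` into `sctx->b.resource_copy_region` before `dma_copy` has been assigned copies a NULL pointer. The sketch reproduces the pattern with a hypothetical, trimmed-down context; only the two field names come from the commit message, and `sdma_copy` is an invented stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*copy_fn)(int *dst, int src);

/* Invented stand-in for the DMA copy implementation. */
static void sdma_copy(int *dst, int src) { *dst = src; }

/* Hypothetical stand-in for the radeonsi context. */
struct ctx {
   struct { copy_fn resource_copy_region; } b;
   copy_fn dma_copy;
};

/* Buggy order: the force-DMA special case runs before dma_copy is set. */
static void
init_buggy(struct ctx *sctx, bool force_dma)
{
   if (force_dma)
      sctx->b.resource_copy_region = sctx->dma_copy;   /* still NULL */
   sctx->dma_copy = sdma_copy;
}

/* Fixed order: initialize dma_copy first, then apply the special case. */
static void
init_fixed(struct ctx *sctx, bool force_dma)
{
   sctx->dma_copy = sdma_copy;
   if (force_dma)
      sctx->b.resource_copy_region = sctx->dma_copy;
}
```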
* d3dadapter9: Revert to old throttling limit value | Axel Davy | 2019-06-03 | 1 | -2/+4
  Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2 to 1:
    20909284f204091757c050aa40cfffaf3f981b9c
  No driver seems to override the default value. One user reports severe regressions in some games. For now, revert to the value 2 for nine.
  Cc: "19.1" [email protected]
  Signed-off-by: Axel Davy <[email protected]>
* ac: use amdgpu-flat-work-group-size | Marek Olšák | 2019-06-03 | 4 | -10/+15
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* u_blitter: don't fail mipmap generation for depth formats containing stencil | Marek Olšák | 2019-06-03 | 1 | -1/+2
  Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754
  Cc: 19.0 19.1 <[email protected]>
  Tested-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code | Christian Gmeiner | 2019-06-03 | 1 | -157/+0
  Now that we have the util function for the default values, we can get rid of the boilerplate.
  Signed-off-by: Christian Gmeiner <[email protected]>
* radv: flush pending query reset caches before copying results | Samuel Pitoiset | 2019-06-03 | 1 | -15/+25
  From the Vulkan spec 1.1.108:
    "vkCmdCopyQueryPoolResults is guaranteed to see the effect of previous uses of vkCmdResetQueryPool in the same queue, without any additional synchronization."
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: copy intrinsic type when lowering load input/uniform and store output | Jonathan Marek | 2019-06-03 | 2 | -0/+3
  Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Erico Nunes <[email protected]>
  Tested-by: Erico Nunes <[email protected]>
  Tested-by: Andreas Baierl <[email protected]>
* ac,radv: remove the vec3 restriction with LLVM 9+ | Samuel Pitoiset | 2019-06-03 | 4 | -11/+18
  This change requires LLVM r356755.
  32706 shaders in 16744 tests
  Totals:
    SGPRS: 1448848 -> 1455984 (0.49 %)
    VGPRS: 1016684 -> 1016220 (-0.05 %)
    Spilled SGPRs: 25871 -> 25815 (-0.22 %)
    Spilled VGPRs: 122 -> 122 (0.00 %)
    Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
    Code Size: 55324500 -> 55301152 (-0.04 %) bytes
    Max Waves: 235660 -> 235586 (-0.03 %)
  Totals from affected shaders:
    SGPRS: 293704 -> 300840 (2.43 %)
    VGPRS: 246716 -> 246252 (-0.19 %)
    Spilled SGPRs: 159 -> 103 (-35.22 %)
    Scratch size: 188 -> 180 (-4.26 %) dwords per thread
    Code Size: 8653664 -> 8630316 (-0.27 %) bytes
    Max Waves: 60811 -> 60737 (-0.12 %)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
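The percentages in these shader-db summaries are relative changes, (after - before) / before x 100; for example, SGPRS gives (1455984 - 1448848) / 1448848 = 7136 / 1448848, about 0.49%. The helper below recomputes a few rows as a sanity check against the two-decimal figures reported; `percent_change` and `matches_report` are hypothetical names, not part of any shader-db report script.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double
percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}

/* True if the recomputed change agrees with a value reported to two decimals. */
static int
matches_report(double before, double after, double reported)
{
   double diff = percent_change(before, after) - reported;
   if (diff < 0.0)
      diff = -diff;
   return diff <= 0.005;
}
```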
* nir: Return nir_type_invalid for non-numeric base types | Caio Marcelo de Oliveira Filho | 2019-05-31 | 1 | -2/+14
  Now that the type-gathering function looks at instructions that might have other types, return an invalid type instead of crashing. The invalid type will be properly ignored later.
  Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
  Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Drop unused locals from iris_clear.c to avoid warning | Caio Marcelo de Oliveira Filho | 2019-05-31 | 1 | -3/+0
  Reviewed-by: Jordan Justen <[email protected]>
* nir: remove bool lowering from lower_int_to_float | Jonathan Marek | 2019-05-31 | 3 | -71/+45
  Removes the bool_to_float logic from the int_to_float pass, so that the two can be used separately. Separate passes give better validation and make it possible to combine int lowering with the lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc generates bools; ftrunc lowering should probably be reworked). For now we always expect lower_bool to come after lower_int.
  Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: fix lower_{int,bool}_to_float for new mov opcode | Jonathan Marek | 2019-05-31 | 2 | -0/+2
  It is treated like the vecN instructions, which also have no type.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add lower_bitshift option | Jonathan Marek | 2019-05-31 | 4 | -3/+10
  Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
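Lowering `ishl` by a constant to a multiply relies on x << c being equal to x * 2^c for 32-bit integers, with both wrapping modulo 2^32. The sketch checks that identity on unsigned values, where C's wrapping is well defined; `lowered_ishl` is a hypothetical name, not a NIR function.

```c
#include <assert.h>
#include <stdint.h>

/* ishl by a constant c in [0, 31], expressed as a multiply by 2^c.
 * Unsigned arithmetic wraps modulo 2^32, matching the shift. */
static uint32_t
lowered_ishl(uint32_t x, unsigned c)
{
   assert(c < 32);
   return x * (UINT32_C(1) << c);
}
```

The multiplier is itself a compile-time constant, so in a float-only backend the lowered form presumably maps onto an ordinary float multiply, which a shift has no direct equivalent for.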
* nir: fix gather_ssa_types | Jonathan Marek | 2019-05-31 | 1 | -36/+52
  Consts and undefs can be used as different types (common with the "0" constant), so don't copy types from consts/undefs, only to them. This doesn't entirely solve the problem that the type given to a const could be wrong, but now the only realistic case is "0", which is the same when cast to float, so it doesn't matter for lower_int_to_float.
  The other change is to get type information for load input/uniform and store output, and use it to get correct results.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
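Why "0" is the harmless case: integer 0 and float 0.0 share the all-zero bit pattern, so a zero constant means the same thing whichever type it is read as. Small non-zero values do not share this property; integer 1 is 0x00000001, while 1.0f is 0x3f800000. The sketch checks this with memcpy and assumes IEEE 754 binary32 floats; `float_bits` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit unsigned integer
 * (assumes IEEE 754 binary32 floats). */
static uint32_t
float_bits(float f)
{
   uint32_t u;
   assert(sizeof(u) == sizeof(f));
   memcpy(&u, &f, sizeof(u));
   return u;
}
```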
* nir: add type information to load uniform/input and store output intrinsics | Jonathan Marek | 2019-05-31 | 4 | -10/+42
  This type information will be used by gather_ssa_types to get usable results.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: improvements to native_integers removal | Jonathan Marek | 2019-05-31 | 2 | -18/+2
  Improvements related to the patch that removed native_integers:
    * In glsl_to_nir, special cases for i2f, u2f, etc. are no longer needed
    * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
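sge and slt are the float-returning comparisons inherited from ARB programs: each yields 1.0 where the comparison holds and 0.0 otherwise. Lowering them, as lower_scmp is described as doing here, amounts to a boolean comparison followed by a bool-to-float conversion. The sketch models both forms per component; all function names are hypothetical, not NIR's.

```c
#include <assert.h>
#include <stdbool.h>

/* ARB-style set-on-comparison: produces 1.0f or 0.0f directly. */
static float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }
static float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

/* Lowered form: a real boolean comparison, then b2f. */
static float b2f(bool v) { return v ? 1.0f : 0.0f; }
static float sge_lowered(float a, float b) { return b2f(a >= b); }
static float slt_lowered(float a, float b) { return b2f(a < b); }
```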
* freedreno/a6xx: add 'type' to shader state key | Rob Clark | 2019-05-31 | 2 | -0/+2
  We could have identical texture state for both VS and FS, in which case the VS state is created first and the FS state maps to the identical cmdstream. As a result, the VS state is emitted twice and no FS state is emitted.
  Fixes:
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
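A state cache keyed only on texture state cannot tell a VS entry from an identical FS entry, so the FS lookup returns the VS stage's state object; adding the shader type to the key keeps them apart. The sketch uses a hypothetical key struct and a linear-scan cache; the real a6xx key contents and hashing are different.

```c
#include <assert.h>
#include <stdbool.h>

enum shader_type { SHADER_VERTEX, SHADER_FRAGMENT };

/* Hypothetical texture-state key; the real a6xx key holds more. */
struct tex_key {
   unsigned sampler_id;
   unsigned view_id;
   enum shader_type type;        /* the field this fix adds */
};

struct cache_entry {
   struct tex_key key;
   int stateobj;                 /* stand-in for the built cmdstream */
   bool used;
};

static bool
keys_equal(const struct tex_key *a, const struct tex_key *b, bool use_type)
{
   return a->sampler_id == b->sampler_id &&
          a->view_id == b->view_id &&
          (!use_type || a->type == b->type);
}

/* Return the cached state for key, building a new one on a miss.
 * Entries fill from the front; use_type == false models the old key. */
static int
lookup_state(struct cache_entry *cache, unsigned n, struct tex_key key,
             bool use_type, int *next_id)
{
   for (unsigned i = 0; i < n; i++) {
      if (!cache[i].used) {
         cache[i].key = key;
         cache[i].stateobj = (*next_id)++;
         cache[i].used = true;
         return cache[i].stateobj;
      }
      if (keys_equal(&cache[i].key, &key, use_type))
         return cache[i].stateobj;
   }
   assert(!"cache full");
   return -1;
}
```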
* freedreno/ir3: fix constlen versus indirect UBO | Rob Clark | 2019-05-31 | 1 | -1/+8
  If we access the address of the UBO indirectly, and there is no higher const emitted with direct access (like an immediate lowered to a uniform), the assembler won't figure out the correct constlen.
  Fixes:
    dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
    dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
    dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
    dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix GPU crash on small render targets | Rob Clark | 2019-05-31 | 1 | -0/+7
  Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
  Signed-off-by: Rob Clark <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: set more barrier bits | Rob Clark | 2019-05-31 | 1 | -0/+1
  The blob also sets the .l bit, and it seems to solve some intermittent failures with a couple of deqp tests:
    dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
    dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f
  Signed-off-by: Rob Clark <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: set (ss) on last_input if ldlv | Rob Clark | 2019-05-31 | 1 | -3/+12
  It seems like (ei) handling doesn't sync on (ss), so we could end up releasing varying storage before an ldlv for flat-shaded varyings completes. Keep track of whether we've done an (ss) since the last ldlv, and if not, add the (ss) flag to the last_input, which gets (ei).
  Noticed with dEQP-GLES3.functional.fragment_out.random.24 and dEQP-GLES3.functional.fragment_out.random.27, which previously passed by luck because ir3_sched ordered instructions in a way that resulted in a lucky (ss).
  Signed-off-by: Rob Clark <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: add assert | Rob Clark | 2019-05-31 | 1 | -0/+2
  The special handling for last_input assumes that all the varying loads are in the first block. Add an assert to catch anyone breaking that assumption.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* util/hash_table: Use fast modulo computation | Connor Abbott | 2019-05-31 | 2 | -37/+52
  While we're here, copy the size table from set.c to get rid of hard tabs in the hash_table.c version.
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
* util/set: Use fast modulo computation | Connor Abbott | 2019-05-31 | 2 | -37/+52
  Compilation times with my shader-db database:
    Difference at 95.0% confidence
      -1.22312 +/- 0.726033
      -0.283979% +/- 0.168254%
      (Student's t, pooled s = 1.02177)
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
* util: Add a helper for faster remainders | Connor Abbott | 2019-05-31 | 5 | -0/+210
  This should be at least as fast as using fast_idiv_by_const, and has the advantage that the precomputation is simple enough to be evaluated at Mesa-compile time for hash tables and sets which have a fixed table of possible divisors.
  Acked-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
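One well-known way to get a remainder with only a cheap per-divisor precomputation is the direct-remainder technique of Lemire, Kaser and Kurz, which this sketch assumes: precompute M = floor((2^64 - 1) / d) + 1 once, then n mod d is the high 64 bits of d * (M * n mod 2^64). Because M depends only on d, it could be a constant for a fixed table of table sizes. The names `remainder_magic`, `mul32x64_hi` and `fast_urem32` are hypothetical and are not claimed to match Mesa's util helper.

```c
#include <assert.h>
#include <stdint.h>

/* Per-divisor constant: ceil(2^64 / d), computed as floor((2^64-1)/d) + 1.
 * For d == 1 this wraps to 0, which still yields the right remainder, 0. */
static uint64_t
remainder_magic(uint32_t d)
{
   assert(d != 0);
   return UINT64_MAX / d + 1;
}

/* (d * lowbits) >> 64 for a 32-bit d and 64-bit lowbits, without
 * 128-bit types. No overflow: hi <= (2^32-1)^2 and (lo >> 32) < 2^32. */
static uint32_t
mul32x64_hi(uint32_t d, uint64_t lowbits)
{
   uint64_t lo = (uint64_t)d * (uint32_t)lowbits;
   uint64_t hi = (uint64_t)d * (lowbits >> 32);
   return (uint32_t)((hi + (lo >> 32)) >> 32);
}

/* n % d using a multiply-low and a multiply-high instead of a divide. */
static uint32_t
fast_urem32(uint32_t n, uint32_t d, uint64_t magic)
{
   uint64_t lowbits = magic * n;   /* wraps modulo 2^64 */
   return mul32x64_hi(d, lowbits);
}
```

In a hash table, the magic value for each entry of the size table would be computed once, and `hash % size` becomes `fast_urem32(hash, size, magic)`.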
* util/hash_table: Add specialized resizing add function | Connor Abbott | 2019-05-31 | 1 | -1/+27
  To keep it in sync with the set implementation.
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
* util/set: Add specialized resizing add function | Connor Abbott | 2019-05-31 | 1 | -3/+23
  A significant portion of the time spent in nir_opt_cse for the Dolphin ubershaders was in resizing the set. When resizing a hash table, we know in advance that each new element to be inserted will be different from every other element, so we don't have to compare them, and there will be no tombstone elements, so we don't have to worry about caching the first-seen tombstone. We add a specialized add function which skips these steps entirely, speeding up resizing.
  Compile-time results from my shader-db database:
    Difference at 95.0% confidence
      -2.29143 +/- 0.845534
      -0.529475% +/- 0.194767%
      (Student's t, pooled s = 1.08807)
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
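During a resize every reinserted key is already known to be distinct and the fresh table has no tombstones, so the insert can skip both the key comparison and the tombstone bookkeeping and take the first empty slot on the probe sequence. The sketch shows that shape with a hypothetical open-addressing table of non-zero 32-bit keys; the double-hashing step of 1 + hash % rehash and the multiplicative hash are assumptions for illustration, not a copy of set.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical open-addressing set of non-zero keys; 0 marks an empty slot. */
struct table {
   uint32_t *keys;
   uint32_t size;      /* prime, so every non-zero step visits all slots */
   uint32_t rehash;    /* smaller than size; derives the probe step */
   uint32_t entries;
};

static uint32_t
hash_u32(uint32_t k)
{
   return k * 2654435761u;   /* multiplicative hash, for illustration */
}

/* Resize-only insert: keys are distinct and there are no tombstones,
 * so take the first empty slot without comparing keys. */
static void
insert_for_resize(struct table *t, uint32_t key)
{
   assert(key != 0 && t->entries < t->size);
   uint32_t h = hash_u32(key);
   uint32_t addr = h % t->size;
   uint32_t step = 1 + h % t->rehash;
   while (t->keys[addr] != 0)
      addr = (addr + step) % t->size;
   t->keys[addr] = key;
   t->entries++;
}

/* Regular lookup, which does compare keys. */
static bool
table_contains(const struct table *t, uint32_t key)
{
   uint32_t h = hash_u32(key);
   uint32_t addr = h % t->size;
   uint32_t step = 1 + h % t->rehash;
   for (uint32_t probes = 0; probes < t->size; probes++) {
      if (t->keys[addr] == 0)
         return false;
      if (t->keys[addr] == key)
         return true;
      addr = (addr + step) % t->size;
   }
   return false;
}
```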
* util/hash_table: Pull out loop-invariant computations | Connor Abbott | 2019-05-31 | 1 | -14/+13
  To keep the set and hash table in sync. Note that some of this had already been done for hash tables, in particular pulling out the hash % ht->size computation.
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
* util/set: Pull out loop-invariant computations | Connor Abbott | 2019-05-31 | 1 | -16/+16
  Unfortunately GCC can't do this for us, probably because we call the key comparison function, which GCC can't prove won't modify arbitrary memory. This is a pretty hot function, so do the optimization manually to be sure the compiler will get it right.
  While we're here, make the computation of the new probe address use a single conditional subtract instead of a modulo, since we know that it won't ever get as big as 2 * ht->size before the modulo. Modulos tend to be pretty expensive operations.
  shader-db compile time results for my database:
    Difference at 95.0% confidence
      -2.24934 +/- 0.69897
      -0.516296% +/- 0.159993%
      (Student's t, pooled s = 0.983684)
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
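The conditional subtract is valid because both the current probe address and the step are already smaller than the table size, so their sum is below 2 * size and one subtraction at most brings it back into range. The sketch checks that equivalence exhaustively for small tables; `next_probe` is a hypothetical name, and the preconditions (step < size, size <= 2^31 so the sum cannot wrap) are assumptions the caller must uphold.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a probe address by step without a modulo.
 * Requires addr < size, step < size and size <= 2^31,
 * so addr + step < 2 * size and cannot wrap. */
static uint32_t
next_probe(uint32_t addr, uint32_t step, uint32_t size)
{
   assert(addr < size && step < size && size <= 0x80000000u);
   addr += step;
   if (addr >= size)
      addr -= size;
   return addr;
}
```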