summaryrefslogtreecommitdiffstats
path: root/src/freedreno
Commit message (Collapse)AuthorAgeFilesLines
* freedreno: Improve the pi approximations in trig lowering.Eric Anholt2019-06-041-2/+2
| | | | | | | | | | | | When comparing our sin/cos behavior to the closed source driver, I noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits). Fixes: dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52 dEQP-GLES3.functional.shaders.random.all_features.vertex.0 Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: Fix GCC build error.Vinson Lee2019-06-031-1/+1
| | | | | | | | | | | ../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 }, ^ Suggested-by: Kristian Høgsberg <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698 Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix counting and printing for half registers.Hyunjun Ko2019-06-032-7/+18
| | | | | v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix up the half reg source even when src instr==NULLNeil Roberts2019-06-031-3/+2
| | | | | | | | | | | | | | | | Previously the loop for assigning registers was bailing out early if the register had a null source. I think the intention is that in this case it isn’t necessary to assign a register. However it was also missing out the part to fix up the types. This can happen if the instruction is copy propagated to be a move from a constant half-float input register. In that case it still needs to fix up the types. Fixes assert in dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump when lowering the precision of the variables. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Add a 16-bit implementation of nir_op_imulNeil Roberts2019-06-031-9/+15
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set dst type of alu instructions correctly.Hyunjun Ko2019-06-031-5/+8
| | | | | | | | | | Though it should be fixed in RA pass, it needs to be set correctly from the beginning according to the bitsize of NIR dest. v2: Would be better for mad,fddx,fddy to fixup later in RA pass. [small cleanup of fallout from imov/fmov removal fallout] Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: adjust the bitsize of regs when an array loading.Hyunjun Ko2019-06-032-7/+16
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert back to 32-bit values for half constant registers.Hyunjun Ko2019-06-032-4/+54
| | | | | | | | | It seems to handle only 32-bit values for half constant registers within floating point opcodes according to the blob driver. So we need to convert back to 32-bit values from 16-bit values, when a lower precision pass is in effect. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.Hyunjun Ko2019-06-031-0/+16
| | | | | | | | | | If the type of dest reg and src reg of absneg opcode are different, it shouldn't be considered as same type mov. This patch becomes meaningful when we start to use mediump information for doing precision lowering to 16bit. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set proper dst type for uniform according to the type of nir ↵Hyunjun Ko2019-06-032-7/+14
| | | | | | | | | | | dest. eg. uniform mediump vec4 f; This patch means nothing since there's no mediump lowering pass for now, but will be meaningful when the pass land in the near future. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Fix loading half-float immediate vectorsNeil Roberts2019-06-031-3/+12
| | | | | | | | | Previously the code to load from a constant instruction was always using the u32 pointer. If the constant is actually a 16-bit source this would end up with the wrong values because the pointer would be offset by the wrong size. This fixes it to use the u16 pointer. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: immediately schedule meta instructionsRob Clark2019-06-031-0/+3
| | | | | | | | The aren't real instructions, and don't change # of live values, so no point in them competing with real instructions. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: scheduler improvementsRob Clark2019-06-032-13/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For instructions that increase the # of live values, apply a threshold to avoid scheduling them too early. And factor the net change of # of live values that would result from scheduling an instruction, to prioritize instructions that reduce number of live values as the number of live values increases. For manhattan: total instructions in shared programs: 27869 -> 28413 (1.95%) instructions in affected programs: 26756 -> 27300 (2.03%) helped: 102 HURT: 87 total full in shared programs: 1903 -> 1719 (-9.67%) full in affected programs: 1390 -> 1206 (-13.24%) helped: 124 HURT: 9 The reduction in register usage nets ~20% gain in manhattan. (So getting mediump support should be a huge win for gles gfxbench.) Also significantly helps some of the more complex shadertoy shaders, like IQ's Piano (32 to 18 regs, doubles fps). The effect is less pronounced on smaller shaders. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: sched should mark outputs usedRob Clark2019-06-031-19/+35
| | | | | | | | Account for shader outputs and values live in any direct/indirect successor block. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix constlen versus indirect UBORob Clark2019-05-311-1/+8
| | | | | | | | | | | | | | If we access the address of the UBO indirectly, and there is no higher const emitted w/ direct access (like an immediate lowered to uniform) the assembler won't figure out the correct constlen. Fixes: dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: set more barrier bitsRob Clark2019-05-311-0/+1
| | | | | | | | | | | Blob is also setting the .l bit, and it seems to solve some intermittent failures with a couple of deqp's: dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f Signed-off-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: set (ss) on last_input if ldlvRob Clark2019-05-311-3/+12
| | | | | | | | | | | | | | | It seems like (ei) handling doesn't sync on (ss), so we could end up in a situation where we release varying storage before an ldlv for flat shaded varyings completes. Keep track if we've done an (ss) since the last ldlv, and if not add (ss) flag to last_input which gets (ei). Noticed with dEQP-GLES3.functional.fragment_out.random.24 and dEQP-GLES3.functional.fragment_out.random.27, which previously passed by luck because ir3_sched ordered instructions in a way that resulted in a lucky (ss). Signed-off-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]>
* freedreno/ir3: add assertRob Clark2019-05-311-0/+2
| | | | | | | | | The special handling for last_input assumes that all the varying loads are in the first block. Add an assert to catch if anyone breaks that assumption. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix input ncomp for vertex shadersJonathan Marek2019-05-311-0/+1
| | | | | | | | | ncomp is never set for vertex shaders, but a3xx and a4xx still use it. Fixes: 831f1a05c0d freedreno/ir3: rework varying packing Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* spirv: Change spirv_to_nir() to return a nir_shaderCaio Marcelo de Oliveira Filho2019-05-291-4/+4
| | | | | | | | | | | | | | | spirv_to_nir() returned the nir_function corresponding to the entrypoint, as a way to identify it. There's now a bool is_entrypoint in nir_function and also a helper function to get the entry_point from a nir_shader. The return type reflects better what the function name suggests. It also helps drivers avoid the mistake of reusing internal shader references after running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be invalidated right in the first pass executed. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* turnip: Don't re-use entry_point pointer from spirv_to_nirCaio Marcelo de Oliveira Filho2019-05-291-7/+5
| | | | | | | | | | Replace its uses with nir_shader_get_entrypoint(), and change the helper function to return nir_shader *. This is a preparation to change spirv_to_nir() return type. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-2/+2
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* vulkan: fix build dependency issue with generated filesLionel Landwerlin2019-05-221-2/+1
| | | | | | | | | | | | | On machines with many cores, you can run into that issue : ../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory v2: Move declare_dependency around (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reported-by: Jan Ziak Cc: <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* freedreno: Log the number of loops in the shader for shader-db.Eric Anholt2019-05-162-0/+2
| | | | | | | | | | shader-db's report.py will use this to see when we've changed loop unrolling behavior on a shader and skip including other stats like instruction count from being considered for that shader, since they won't be useful as a proxy for real world performance in that case. Reviewed-by: Rob Clark <[email protected]> Tested-by: Eduardo Lima Mitev <[email protected]>
* freedreno: Move msm_drm.h to the same spot as other DRM uapi.Eric Anholt2019-05-147-341/+4
| | | | | | | The new location matches other drivers, and has a README about the rules for updating it. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Quiet compiler warnings on 64-bit.Eric Anholt2019-05-131-1/+1
| | | | | | | __u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining about the wrong type being passed in. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make emacs indent the way robclark's eclipse does.Eric Anholt2019-05-131-0/+3
| | | | | | | | | The .editorconfig helps with the tabs, but we've got this two-tabs-from-previous-indentation line continuation style that requires whacking the c-file-offsets. This will throw emacs warnings when first opening a file in the directory, press '!' to shut it up for the future. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make .editorconfig match .dir-locals.el.Eric Anholt2019-05-131-0/+5
| | | | | | | | The editorconfig takes precedence over dir-locals in emacs26 with editorconfig enabled, so the /.editorconfig was affecting these directories. Reviewed-by: Kristian H. Kristensen <[email protected]>
* tu/entrypoints: Import copyJason Ekstrand2019-05-131-0/+1
| | | | It's used without being imported
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-1/+1
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix rasterflat/glxgearsRob Clark2019-05-092-5/+5
| | | | | | | | Ofc legacy gl features that are broken don't trigger fails in deqp. I should remember to test glxgears more often. Fixes: 7ff6705b8d8 freedreno/ir3: convert to "new style" frag inputs Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move const_state to ir3_shaderRob Clark2019-05-079-22/+25
| | | | | | | | | | | | | | | | | | | For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were not (obvious) mismatches between the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess is added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immed's it lowers to the end of the immediate state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split out const_state setupRob Clark2019-05-073-52/+61
| | | | | | | Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move immediates to const_stateRob Clark2019-05-074-28/+27
| | | | | | | They are really part of the constant state, and it will moving things from ir3_shader_variant to ir3_shader if we combine them. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: consolidate const stateRob Clark2019-05-078-75/+90
| | | | | | | | Combine the offsets of differenet parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move ir3_pointer_size()Rob Clark2019-05-074-9/+9
| | | | | | | | Move to ir3_compiler so it doesn't depend on the compile context. Prep work for moving constant state from variant (where we have compile context) to shader (where we do not). Signed-off-by: Rob Clark <[email protected]>
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-2/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset ↵John Stultz2019-05-061-0/+2
| | | | | | | | | | | | | | | | | | | | | to Makefile.sources In commit 2f0b9d22495 ("freedreno/ir3: lower load_barycentric_at_offset") a new file was added that needs to also be added to the Makefile.sources list used by Android and SCons build system. Cc: Rob Clark <[email protected]> Cc: Emil Velikov <[email protected]> Cc: Amit Pundir <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Alistair Strachan <[email protected]> Cc: Greg Hartman <[email protected]> Cc: Tapani Pälli <[email protected]> Cc: Jason Ekstrand <[email protected]> Fixes: 2f0b9d22495 ("freedreno/ir3: lower load_barycentric_at_offset") Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: John Stultz <[email protected]>
* mesa: android: freedreno: build libfreedreno_{drm,ir3} static libsAmit Pundir2019-05-063-0/+122
| | | | | | | | | | | | | | | | | | | | Add libfreedreno_drm/ir3 to the build Cc: Rob Clark <[email protected]> Cc: Emil Velikov <[email protected]> Cc: Amit Pundir <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Alistair Strachan <[email protected]> Cc: Greg Hartman <[email protected]> Cc: Tapani Pälli <[email protected]> Cc: Jason Ekstrand <[email protected]> Fixes: b4476138d5a ("freedreno: move drm to common location") Fixes: aa0fed10d35 ("freedreno: move ir3 to common location") Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Amit Pundir <[email protected]> [jstultz: Tweaked to add extra ir3 files from master] Signed-off-by: John Stultz <[email protected]>
* freedreno: update generated headersRob Clark2019-05-047-30/+56
| | | | | | Corrects tex state ubwc pitch/size Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove assertRob Clark2019-05-041-1/+0
| | | | | | | | | | | Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20 ca3eb5db665cbcc2de5a5d3158e3dc68f86e5822 went from silently truncating the constant state, which was also the wrong thing to do, to an assert. Which then showed up in a couple of dEQPs. Actually there is nothing wrong with larger constant file so just drop the assert. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fb read supportRob Clark2019-05-023-7/+33
| | | | | | | | Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/drm: expose GMEM_BASE addressRob Clark2019-05-023-0/+9
| | | | | | | Needed for sampling from tile buffer (GMEM). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: add some ubo range related assertsRob Clark2019-05-022-1/+5
| | | | | | | And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo loweringRob Clark2019-05-023-1/+9
| | | | | | | | | It isn't quite as simple as not running the pass, since with packed varyings we get load_ubo for block==0 (ie. the "real" uniforms). So instead run the pass normally but decline to lower anything in block > 0 Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix lowered ubo region alignmentRob Clark2019-05-021-1/+1
| | | | | | | | | | | Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. Which the code already mostly did, except for aligning the first UBO region itself (ie. the one after block==0 which is the "real" uniforms). Fixes: 893425a607a freedreno/ir3: Push UBOs to constant file Fixes: 3c8779af325 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix shader variants vs UBO analysisRob Clark2019-05-021-1/+3
| | | | | | | | | | | Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: 893425a607a freedreno/ir3: Push UBOs to constant file Fixes: 3c8779af325 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fixes for half reg in/outRob Clark2019-04-304-13/+39
| | | | | | | Needs to update max_half_reg, or be remapped to full reg and update max_reg accordingly, depending on generation.. Signed-off-by: Rob Clark <[email protected]>
* turnip: update to use the new features struct namesEric Engestrom2019-04-301-5/+5
| | | | | | | | | These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times. See also: 90108deb277d33d19233 "anv: Update to use the new features struct names" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Chia-I Wu <[email protected]>