path: root/src/freedreno
Commit message | Author | Age | Files | Lines
* freedreno/a6xx: pre-calculate userconst stateobj size | Rob Clark | 2019-09-12 | 1 | -0/+1
  The AnTuTu "garden" benchmark overflows the fixed-size constbuffer stateobject, so let's be more clever and calculate the actual (potentially slightly pessimistic) size.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: Implement primitive count queries on GPU | Kristian H. Kristensen | 2019-09-06 | 2 | -0/+5
  The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or tessellation, since these stages add primitives at runtime. Use the WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and implement a hw query for this.
  Reviewed-by: Rob Clark <[email protected]>
* freedreno/a2xx: formats update | Jonathan Marek | 2019-09-06 | 1 | -3/+3
  For render formats, update fd2_pipe2color to only work with HW supported render formats, and remove the format whitelist is_format_supported. This patch enables float render formats (which work).
  For vertex/texture formats, use a generic function which translates using the bitsize of the channels.
  Since we fake support for some vertex formats, check for these in is_format_supported to avoid enabling them as sampler formats.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* freedreno/a2xx: implement polygon offset | Jonathan Marek | 2019-09-06 | 1 | -0/+2
  Fixes failures in the following deqp tests:
    dEQP-GLES2.functional.polygon_offset.*
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalar | Vasily Khoruzhick | 2019-09-06 | 1 | -1/+1
  A set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has a vector conditional select operation, but the condition is always scalar. Lowering all the vector selects to scalar increases the instruction count, so we need a way to filter only those ops that can't be handled in hardware.
  Reviewed-by: Qiang Yu <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Vasily Khoruzhick <[email protected]>
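The idea of replacing a fixed opcode set with a filter callback can be sketched roughly as follows. This is a toy model in Python, not the NIR API: `Alu`, `lower_alu_to_scalar` (as written here) and `keep_vector_select` are hypothetical names, and the filter simply keeps every vector select, which is a simplification of the Utgard PP case described in the commit message.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Alu:
    """Toy ALU instruction: an opcode name and a vector width."""
    op: str
    num_components: int


def lower_alu_to_scalar(instrs: List[Alu],
                        filter_cb: Optional[Callable[[Alu], bool]] = None) -> List[Alu]:
    """Split vector instructions into scalar ones.

    If filter_cb is given, only instructions for which it returns True
    are lowered; everything else is passed through unchanged.
    """
    out: List[Alu] = []
    for instr in instrs:
        wanted = filter_cb is None or filter_cb(instr)
        if instr.num_components > 1 and wanted:
            # One scalar copy of the op per component.
            out.extend(Alu(instr.op, 1) for _ in range(instr.num_components))
        else:
            out.append(instr)
    return out


def keep_vector_select(instr: Alu) -> bool:
    """Hypothetical filter: the hardware handles vector 'select' natively."""
    return instr.op != "select"
```

With a program of one 4-wide `add` and one 4-wide `select`, lowering without a filter yields eight scalar instructions, while passing `keep_vector_select` yields four scalar adds plus the untouched vector select.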
* freedreno/ir3: allow copy propagation for relative | Rob Clark | 2019-09-06 | 1 | -9/+19
  This appears to work fine (with the additional constraint of keeping the indirect load in the same block that a0.x was loaded). We can probably lift this restriction on earlier gens after testing.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix cp cmps.s opt | Rob Clark | 2019-09-06 | 1 | -1/+1
  Need to use ir3_instr_set_address(), otherwise the instruction might not get added to the indirects table. This becomes a problem when we turn on copy propagation for relative accesses, as check_instr() in the sched pass won't realize there is an indirect consumer of the address register load that is ready to be scheduled.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: assert that only single address | Rob Clark | 2019-09-06 | 2 | -0/+5
  An instruction can reference only a single address register value. Add an assert to catch bugs. The address value should also be local to the same block as the instruction.
  (The one spot where changing the instruction address is actually legit needs to clear the address first.)
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix mad copy propagation special case | Rob Clark | 2019-09-06 | 1 | -9/+35
  After the next patch enabling copy propagation for relative sources, we'll need to dereference the n'th src in valid_flags(), so we actually need to swap the sources before calling valid_flags().
  But the logic was already a bit cumbersome, so move it into a helper function.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: fix addr/pred spilling | Rob Clark | 2019-09-06 | 1 | -7/+42
  The live_values and use_count were not being properly updated. This starts triggering problems with the next patch, where we allow copy propagation for RELATIV access.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: cleanup "partially const" ubo srcs | Rob Clark | 2019-09-06 | 1 | -4/+52
  Move the constant part of the indirect offset into nir intrinsic base. When we have multiple indirect accesses with different constant offsets, this lets other opt passes clean up things to use a single address register value.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
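As a rough illustration of why splitting off the constant part helps, the toy model below represents an offset as a sum of terms, where integers are constants and strings stand for dynamic values. `split_offset` is a hypothetical helper for this sketch, not a function from the ir3 or NIR sources.

```python
def split_offset(terms):
    """Split a summed offset into (dynamic part, constant base).

    `terms` is a list whose ints are constant addends and whose strs
    name dynamic (indirect) values. The dynamic part is returned as a
    sorted tuple so that equal dynamic parts compare equal.
    """
    base = sum(t for t in terms if isinstance(t, int))
    dynamic = tuple(sorted(t for t in terms if isinstance(t, str)))
    return dynamic, base


# Three accesses that index with the same dynamic value "i" but
# different constant offsets.
accesses = [["i", 0], ["i", 4], ["i", 8]]

# Without the split, each full offset is a distinct address value.
unsplit_addresses = {tuple(map(str, a)) for a in accesses}

# With the split, all three share one dynamic address value and differ
# only in their constant base.
split = [split_offset(a) for a in accesses]
shared_addresses = {dynamic for dynamic, _ in split}
```

In this model the three accesses need three distinct address values before the split and only one afterwards, with the constants 0, 4 and 8 carried in the base.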
* freedreno/drm-shim: fix mem leak | Eric Engestrom | 2019-09-04 | 1 | -3/+4
  Fixes: 494ecef6b42198ab6c3e ("freedreno: Add support for drm-shim.")
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: use uniform base | Rob Clark | 2019-09-03 | 1 | -4/+4
  When lowering from ubo, use the constant base field in the load_uniform instruction for the constant part of the offset. Doesn't change much for constant indexing, but this will help for indirect indexing because constant-folding can't completely clean up the result.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/drm: fix 64b iova shifts | Rob Clark | 2019-09-03 | 1 | -10/+4
  Should shift before splitting the 64b iova into dwords.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
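A minimal sketch of why the order matters, assuming a signed shift where a negative value means a right shift (an assumption of this example, as are all function names). Splitting first loses any bits that should cross the 32-bit boundary between the two dwords.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_iova(iova, shift):
    """Shift a 64-bit address; a negative shift is a right shift."""
    if shift < 0:
        return (iova & MASK64) >> -shift
    return (iova << shift) & MASK64


def split_dwords(value):
    """Return (low dword, high dword) of a 64-bit value."""
    return value & MASK32, (value >> 32) & MASK32


def reloc_dwords(iova, shift):
    """Shift the full 64-bit value first, then split into dwords."""
    return split_dwords(shift_iova(iova, shift))


def reloc_dwords_split_first(iova, shift):
    """For comparison only: split first, then shift each dword alone.

    Bits that should move from one dword into the other are dropped.
    """
    lo, hi = split_dwords(iova)
    if shift < 0:
        return lo >> -shift, hi >> -shift
    return (lo << shift) & MASK32, (hi << shift) & MASK32
```

For example, shifting `0x100000000` right by one bit should give a low dword of `0x80000000`; the split-first variant shifts the set bit out of the high dword and produces zero in both halves.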
* freedreno/ir3: Link directly to Sethi-Ullman paper | Alyssa Rosenzweig | 2019-08-30 | 1 | -1/+1
  Allow a direct link to the PDF itself from the authors themselves, rather than a paywall splash page.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Acked-by: Rob Clark <[email protected]>
* freedreno/ir3: do better job of marking convergence points | Rob Clark | 2019-08-28 | 1 | -35/+28
  Fixes:
    dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_vertex
    dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_fragment
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: maintain predecessors/successors | Rob Clark | 2019-08-28 | 1 | -2/+42
  While resolving jumps to skip intermediate jumps from the structured CFG, maintain the successors and predecessors correctly.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: convert block->predecessors to set | Rob Clark | 2019-08-28 | 5 | -18/+19
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: Add explicit signs to image min/max intrinsics | Jason Ekstrand | 2019-08-21 | 4 | -8/+16
  This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3+a6xx: same VBO state for draw/binning | Rob Clark | 2019-08-13 | 5 | -17/+141
  Worth ~+20% on gl_driver2
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: track # of driver params | Rob Clark | 2019-08-13 | 2 | -10/+32
  To avoid emitting unneeded const state.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unneeded ir3_ra() args | Rob Clark | 2019-08-13 | 3 | -9/+3
  Signed-off-by: Rob Clark <[email protected]>
* nir: replace nir_move_load_const() with nir_opt_sink() | Rhys Perry | 2019-08-12 | 1 | -1/+1
  This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops).
  v2: actually delete nir_move_load_const.c
  v3: fix nir_opt_sink() usage in freedreno
  v3: update Makefile.sources
  v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
  v4: handle if uses
  v4: fix handling of nested loops
  v5: re-write adjust_block_for_loops
  v5: re-write setting of use_block for if uses
  Signed-off-by: Rhys Perry <[email protected]>
  Co-authored-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* spirv: Drop lower_workgroup_access_to_offsets | Caio Marcelo de Oliveira Filho | 2019-08-10 | 1 | -1/+0
  Intel drivers are not using this anymore, and turnip still doesn't have compute shaders, so it won't make a difference to stop using this option.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Acked-by: Rob Clark <[email protected]>
* mesa: freedreno: Android.registers.mk: Fix up register xml.h file generation | John Stultz | 2019-08-07 | 1 | -3/+34
  The current Android.registers.mk file causes build failures that look like:
    FAILED: external/mesa3d/src/freedreno/Android.registers.mk:49: error: implicit rules are obsolete: out/target/product/linaro_db845c/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/%.xml.h
  Caused by the following Android build rule change:
    https://android.googlesource.com/platform/build/+/HEAD/Changes.md#implicit_rules
  I tried to replace this with something similar to the static pattern suggested in the URL above, but ended up getting all the xml.h files generated using only the first a2xx.xml source file. So I've fallen back to explicitly defining the make rules for each.
  Additionally, we needed to provide the proper LOCAL_EXPORT_C_INCLUDE_DIRS and add the defined static library to the components that depend on the register headers.
  Acked-by: Eric Anholt <[email protected]>
  Signed-off-by: John Stultz <[email protected]>
* mesa: Add ir3/ir3_nir_imul.c generation to Android.mk | John Stultz | 2019-08-07 | 1 | -1/+2
  With current master we're seeing build failures with AOSP:
    error: undefined symbol: ir3_nir_lower_imul
  This is due to the ir3_nir_imul.c file not being generated in the Android.mk files.
  This patch simply adds it to the Android build, after which things build and boot OK on db410c.
  Cc: Rob Clark <[email protected]>
  Cc: Emil Velikov <[email protected]>
  Cc: Amit Pundir <[email protected]>
  Cc: Sumit Semwal <[email protected]>
  Cc: Alistair Strachan <[email protected]>
  Cc: Greg Hartman <[email protected]>
  Cc: Tapani Pälli <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Signed-off-by: John Stultz <[email protected]>
* meson: replace libmesa_util with idep_mesautil | Eric Engestrom | 2019-08-03 | 1 | -1/+1
  This automates the include_directories and dependencies tracking so that all users of libmesa_util don't need to add them manually.
  Next commit will remove the ones that were only added for that reason.
  Signed-off-by: Eric Engestrom <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
  Tested-by: Vinson Lee <[email protected]>
* freedreno: update registers | Rob Clark | 2019-08-02 | 2 | -4/+42
  Pull in some updates of VSC regs
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/drm: convert ring_pool to child_pool | Rob Clark | 2019-08-02 | 3 | -6/+29
  Worth another couple percent at driver2
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/drm: remove idx_lock | Rob Clark | 2019-08-02 | 3 | -29/+24
  Since it ends up contended, it is a bit of a bottleneck for workloads with high driver overhead. Worth nearly +10% at gfxbench driver2.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: a2xx: implement texture tiling | Jonathan Marek | 2019-08-02 | 1 | -1/+1
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix for array/reg store vs meta instructions | Rob Clark | 2019-07-29 | 1 | -1/+4
  fishgl.com has a shader which does roughly:
    foo = texture(...);
    if (bar)
      foo = texture(...);
  After lowering phi webs to regs we end up with a vec4 reg (array). But since it was not an indirect access, we try to skip the extra mov. As a result, the per-component fanout (split) meta instructions store directly to the reg (array), which doesn't work out in RA.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: Fix data race on making the shader's id. | Eric Anholt | 2019-07-29 | 1 | -1/+2
  The value is only used for IR3_DBG_DISASM, but it cleans up the helgrind output.
  Reviewed-by: Rob Clark <[email protected]>
* freedreno: Take a lock around shader variant creation. | Eric Anholt | 2019-07-29 | 2 | -0/+7
  Shaders are shared across contexts in gallium (part of making it so that you get shader compile at link time, for shader-db and to reduce compiles at draw time). So, we need to protect from variant creation for a shader from multiple threads at the same time.
  Reviewed-by: Rob Clark <[email protected]>
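The general pattern of guarding a shared, lazily filled variant cache with a lock can be sketched as below. This is a simplified Python model, not the freedreno code: the `Shader` class, its `get_variant` method and the `compile_fn` callback are hypothetical.

```python
import threading


class Shader:
    """Toy shader object shared between several contexts (threads)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn      # called as compile_fn(key)
        self._variants = {}             # key -> compiled variant
        self._lock = threading.Lock()   # guards lookup and creation

    def get_variant(self, key):
        # Lookup and creation happen under one lock, so two threads
        # asking for the same key cannot both compile it.
        with self._lock:
            variant = self._variants.get(key)
            if variant is None:
                variant = self._compile(key)
                self._variants[key] = variant
            return variant
```

Holding the lock across the compile keeps the sketch simple at the cost of serializing all variant creation for that shader; whether the real driver makes the same trade-off is not stated in the commit message.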
* freedreno: Fix data races with allocating/freeing struct ir3. | Eric Anholt | 2019-07-29 | 1 | -1/+1
  There is a single ir3_compiler in the screen, and each context may be compiling ir3 shaders, which call ir3_create. ralloc doesn't do any locking on its own, so eventually you can end up racing to break ralloc's linked lists.
  We really don't want struct ir3 to live as long as the compiler (maybe struct ir3_shader's lifetime, if anything), so you'd better be freeing it anyway.
  Fixes: 8fe20762433d ("freedreno/ir3: convert over to ralloc")
  Reviewed-by: Rob Clark <[email protected]>
* anv+tu+radv: delete unusable dev_icd.json | Eric Engestrom | 2019-07-26 | 1 | -13/+0
  As per previous commit, Meson doesn't support using uninstalled libs, they're simply not ready until `ninja install` is run, so delete them.
  Suggested-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]> # for anv
  Reviewed-by: Eric Anholt <[email protected]> # for tu
  Reviewed-by: Bas Nieuwenhuizen <[email protected]> # for radv
* freedreno: Add support for drm-shim. | Eric Anholt | 2019-07-25 | 4 | -0/+224
  I'm using this for shader-db analysis on x86_64 systems.
  Reviewed-by: Rob Clark <[email protected]>
* freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper. | Eric Anholt | 2019-07-18 | 1 | -88/+51
  Cuts a bunch of boilerplate.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_sample to the NIR lowering helper. | Eric Anholt | 2019-07-18 | 1 | -48/+30
  Cuts out a ton of boilerplate.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Convert load_barycentric_at_offset to the NIR lowering helper. | Eric Anholt | 2019-07-18 | 1 | -39/+19
  Cuts out a ton of boilerplate.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Generate headers from xml files | Kristian H. Kristensen | 2019-07-10 | 24 | -23892/+14106
  Reviewed-by: Eric Engestrom <[email protected]>
  Acked-by: Rob Clark <[email protected]>
* tu: add exported symbols check | Eric Engestrom | 2019-07-10 | 1 | -0/+13
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* nir: Add lower_rotate flag and set to true in all drivers | Sagar Ghuge | 2019-07-01 | 1 | -0/+2
  Signed-off-by: Sagar Ghuge <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* freedreno: update generated registers | Rob Clark | 2019-07-01 | 7 | -16/+23
  Corrects the a3xx texconst state for TILE_MODE.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: small cleanup | Rob Clark | 2019-06-28 | 1 | -1/+1
  `target` cannot be NULL here.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix missing (ss) in dummy bary.f case | Rob Clark | 2019-06-28 | 1 | -0/+5
  In case we need to insert a dummy bary.f for the (ei) flag, it also needs (ss) so we don't release varying storage to the next VS wave before the ldlv has completed.
  Fixes random failures in:
    dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.*
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Only upload the used part of UBO0 to the constant buffer. | Eric Anholt | 2019-06-24 | 2 | -5/+13
  We were pessimistically uploading all of it in case of indirection, but we can just bump that when we encounter indirection.
  total constlen in shared programs: 2529623 -> 2485933 (-1.73%)
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
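One way to picture the "bump on indirection" approach is the sketch below: track the highest byte touched by constant-offset accesses, and fall back to the whole buffer only when an indirect access is seen. The function name and the access representation (`None` for an indirect access) are assumptions of this example, not the driver's data structures.

```python
def ubo0_upload_size(accesses, ubo_size):
    """Return how many bytes of UBO 0 need uploading.

    `accesses` is a list of (offset, size) tuples for accesses at a
    known constant offset, or None for an indirect access whose range
    is unknown at compile time.
    """
    used = 0
    for access in accesses:
        if access is None:
            # Indirect access: any part of the buffer may be read.
            return ubo_size
        offset, size = access
        used = max(used, offset + size)
    # Never upload more than the buffer actually holds.
    return min(used, ubo_size)
```

With constant accesses at offsets 0 and 32, each 16 bytes wide, only the first 48 bytes of a 256-byte buffer are uploaded; adding a single indirect access raises that to the full 256 bytes.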
* freedreno: Stop treating UBO 0 specially in UBO uploading. | Eric Anholt | 2019-06-24 | 2 | -7/+0
  ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer).
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. | Daniel Schürmann | 2019-06-24 | 1 | -2/+0
  That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions.
  This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'.
  Tested-by: Eric Anholt <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
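The "five least significant bits" rule can be illustrated with a small Python model of a 32-bit bitfield mask and an unsigned bitfield extract. This is a sketch of the semantics as described in the commit message, not a copy of the NIR opcode definitions; the function names and argument order are chosen for the example.

```python
MASK32 = 0xFFFFFFFF


def bfm(bits, offset):
    """Bitfield mask: `bits` ones starting at bit `offset`.

    Only the five least significant bits of each operand are used.
    """
    bits &= 0x1F
    offset &= 0x1F
    return (((1 << bits) - 1) << offset) & MASK32


def ubfe(base, offset, bits):
    """Unsigned bitfield extract of `bits` bits starting at `offset`.

    Only the five least significant bits of `offset` and `bits` are used,
    so a requested width of 32 behaves like a width of 0.
    """
    offset &= 0x1F
    bits &= 0x1F
    if bits == 0:
        return 0
    return ((base & MASK32) >> offset) & ((1 << bits) - 1)
```

One consequence of the masking is that out-of-range operands wrap: `bfm(36, 40)` gives the same mask as `bfm(4, 8)`, and a width of 32 yields an empty mask rather than all ones.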
* freedreno: Only upload UBO pointers for UBOs that haven't been lowered. | Eric Anholt | 2019-06-21 | 1 | -1/+7
  total constlen in shared programs: 2485933 -> 2462236 (-0.95%)
  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>