summaryrefslogtreecommitdiffstats
path: root/src/broadcom
Commit message (Collapse)AuthorAgeFilesLines
* v3d: use inc/dec tmu operation with image atomic sub/add of 1Alejandro Piñeiro2019-07-121-5/+11
| | | | | | | | | | | | | | | | | | This allows to remove a mov of 1/-1, as it is implicit with the operation. As with atomic inc/dec/add, usual shader-db set doesn't include any GLES shader using it. So using as workaround vk-gl-cts shaders, we get this: total instructions in shared programs: 1217013 -> 1217006 (<.01%) instructions in affected programs: 53 -> 46 (-13.21%) helped: 2 HURT: 0 One of the helped shader went from 40 to 34 instructions. Reviewed-by: Eric Anholt <[email protected]>
* v3d: refactor some code from v3d40_vir_emit_image_load_storeAlejandro Piñeiro2019-07-121-33/+29
| | | | | | | And moved to new auxiliar method v3d40_image_load_store_tmu_op, equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little. Reviewed-by: Eric Anholt <[email protected]>
* v3d: use inc/dec tmu operation with atomic sub/add of 1Alejandro Piñeiro2019-07-122-6/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Among other things, this avoid the need of loading 1/-1 constants (so one less operation). The removed comment suggest the option of adding support on NIR for inc/dec. Intel just uses an auxiliar method to get which hw operation is needed, so no lowering is needed. And at the same time, being so small, seems unreasonable to try to add a general one on NIR itself. It is more easy to just adapt the method here (that is what the patch does right now). It is worth to note that we are not getting any change on shader-db stats because all those methods are used on the usual shader-db set with shaders needing GLSL > 4.2. In general there aren't too many GLSL ES 3.1 tests. As an alternative, we captured the GLES3/GLSL31/GLS32 used on vk-gl-cts, even if that is not a real life usage of shaders. With those we get the following: total instructions in shared programs: 1217022 -> 1217013 (<.01%) instructions in affected programs: 117 -> 108 (-7.69%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1 helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09% 95% mean confidence interval for instructions value: -2.07 -0.93 95% mean confidence interval for instructions %-change: -10.54% -5.64% Instructions are helped. Note that the shaders helped are really low because most of the vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute shaders. Although right now there is a branch around with CS support, the usual is doing the stats against master. Reviewed-by: Eric Anholt <[email protected]>
* v3d: remove redefinition of tmu operations on nir_to_virAlejandro Piñeiro2019-07-121-38/+21
| | | | | | | | | | | | They are already defined, although is a slightly different format on the generated packet headers, so it was needed to change how it is used on nir_to_vir. In addition to allow to remove some duplicated headers, it will allow to define just one get_op_for_atomic_add aux method later to support using inc/dec instead of add of 1/-1. Reviewed-by: Eric Anholt <[email protected]>
* v3d: tweak initial comment on pack generator scriptAlejandro Piñeiro2019-07-121-1/+1
| | | | | | | As the files it mentions to use as reference has slightly different names. Reviewed-by: Eric Anholt <[email protected]>
* v3d: remove unused definitionsIago Toral Quiroga2019-07-121-7/+0
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: move implementation of some intrinsics to separate helpersIago Toral Quiroga2019-07-121-78/+90
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: emit correct lowering for logic ops with RGB10A2 render targetsIago Toral Quiroga2019-07-121-12/+64
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: emit correct lowering for logic ops with integer render targetsIago Toral Quiroga2019-07-122-9/+47
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: add lowering for OpenGL logic operationsIago Toral Quiroga2019-07-124-0/+279
| | | | | | | | | | | | | | | | | | | | | This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware. The pass doesn't handle MSAA targets yet. Fixes the following piglit tests: spec/!opengl 1.0/gl-1.0-logicop/* spec/!opengl 1.1/gl-1.1-xor spec/!opengl 1.1/gl-1.1-xor-copypixels It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation. v2: fix checks for allowed variable location and maximum render target (Eric) Reviewed-by: Eric Anholt <[email protected]>
* v3d: acquire scoreboard lock before first tlb readIago Toral Quiroga2019-07-123-0/+34
| | | | | | | | | | | | | | | | | | | Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <[email protected]>
* v3d: implement tile buffer color read intrinsicIago Toral Quiroga2019-07-121-0/+100
| | | | | | | | | | | | We will be emitting this intrinsic to signal TLB color loads when we implement OpenGL logic operations, where we need to blend the fragment shader color output with the existing color in the render target. Per-sample TLB reads are not supported yet. v2: fix the offset into the color_reads array (Eric). Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix size of color_reads and sample_colors arraysIago Toral Quiroga2019-07-121-2/+2
| | | | | | | | | We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color. v2: we want to store each color component separately, so scale by 4 too. Reviewed-by: Eric Anholt <[email protected]>
* v3d: add color formats and swizzles to the fragment shader keyIago Toral Quiroga2019-07-121-0/+9
| | | | | | | We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <[email protected]>
* v3d: add helpers to emit ldtlb and ldtlbu signalsIago Toral Quiroga2019-07-121-0/+24
| | | | | | | | | | The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence. Reviewed-by: Eric Anholt <[email protected]>
* v3d: handle tlb read dependency tracking as if they were writesIago Toral Quiroga2019-07-121-1/+1
| | | | | | Tile buffer reads are emitted as ordered sequences and cannot be reordered. Reviewed-by: Eric Anholt <[email protected]>
* v3d: instructions with the ldtlb and ldtlbu signals are tlb instructionsIago Toral Quiroga2019-07-121-0/+3
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: tlb loads cannot be removedIago Toral Quiroga2019-07-121-0/+2
| | | | | | | Loads from the tile buffer are emitted in ordered sequences so we cannot eliminate or reorder any of them. Reviewed-by: Eric Anholt <[email protected]>
* v3d: the ldtlbu signal reads an implicit uniformIago Toral Quiroga2019-07-121-0/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: handle ldtlb and ldtlbu signals during disassemblyIago Toral Quiroga2019-07-121-0/+2
| | | | | | | | We already have code to print these signals but the early return in the code that checks if any signals are present present was missing the checks for them, so it would skip printing them unless they were paired with other signals. Reviewed-by: Eric Anholt <[email protected]>
* nir: Add lower_rotate flag and set to true in all driversSagar Ghuge2019-07-011-0/+1
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.Daniel Schürmann2019-06-241-1/+0
| | | | | | | | | | | That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* v3d: implement simultaneous peripheral access exceptions for V3D 4.1+Iago Toral Quiroga2019-06-183-6/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results: total instructions in shared programs: 9117550 -> 9102719 (-0.16%) instructions in affected programs: 1752873 -> 1738042 (-0.85%) helped: 7076 HURT: 478 helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2 helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07% HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1 HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54% 95% mean confidence interval for instructions value: -2.00 -1.92 95% mean confidence interval for instructions %-change: -1.58% -1.50% Instructions are helped. total max-temps in shared programs: 1327774 -> 1327728 (<.01%) max-temps in affected programs: 1025 -> 979 (-4.49%) helped: 47 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1 helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17% 95% mean confidence interval for max-temps value: -1.06 -0.82 95% mean confidence interval for max-temps %-change: -8.89% -5.49% Max-temps are helped. Reviewed-by: Eric Anholt <[email protected]>
* v3d: do not setup execute flags for else block in uniform control flowIago Toral Quiroga2019-06-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Either all channels executed the 'then' block, in which case all channels will directly jump to the 'endif' block at the end of the 'then' block, or all channels execute the 'else' block (so no execution masking is necessary). Shader-db results: total instructions in shared programs: 9119238 -> 9117550 (-0.02%) instructions in affected programs: 401252 -> 399564 (-0.42%) helped: 855 HURT: 77 total uniforms in shared programs: 3022622 -> 3022605 (<.01%) uniforms in affected programs: 3566 -> 3549 (-0.48%) helped: 17 HURT: 0 total max-temps in shared programs: 1327762 -> 1327774 (<.01%) max-temps in affected programs: 619 -> 631 (1.94%) helped: 2 HURT: 15 Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix checking twice auf flagAlejandro Piñeiro2019-06-131-1/+1
| | | | | | | | | Seems a C&P error, and should check for auf/muf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902 Fixes: 8f065596d22ab000c53f "v3d: Add an optimization pass for redundant flags updates." Reviewed-by: Eric Anholt <[email protected]>
* v3d: don't emit point coordinates varyings if the FS doesn't read themIago Toral Quiroga2019-06-073-5/+22
| | | | | | | We still need to emit them in V3D 3.x since there there is no mechanism to disable them. Reviewed-by: Eric Anholt <[email protected]>
* v3d: add a helper to track variables that need point coordinatesIago Toral Quiroga2019-06-071-5/+10
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix scheduling dependency tracking for ALU with small immediatesIago Toral Quiroga2019-06-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were not accountint for small immediates in the B mux so the scheduler was interpreting these are regular register file accesses, which could lead to additional (incorrect) write-read dependencies. Shader-db changes: total instructions in shared programs: 9163664 -> 9137263 (-0.29%) instructions in affected programs: 3931035 -> 3904634 (-0.67%) helped: 12457 HURT: 2563 total max-temps in shared programs: 1325787 -> 1325597 (-0.01%) max-temps in affected programs: 5746 -> 5556 (-3.31%) helped: 186 HURT: 16 helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1 helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88% 95% mean confidence interval for max-temps value: -1.04 -0.84 95% mean confidence interval for max-temps %-change: -4.16% -3.07% Max-temps are helped. Reviewed-by: Eric Anholt <[email protected]>
* v3d: Enable NIR's lower_fmod option.Kenneth Graunke2019-06-051-0/+1
| | | | | | | | | | Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <[email protected]> Acked-by: Eric Anholt <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-2/+1
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-1/+1
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Initialize lower_flrp_progress everywhereIan Romanick2019-05-091-1/+1
| | | | | | | | | | | | | | | | I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-1/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-292-4/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* v3d: Fix detection of TMU write sequences in register spilling.Eric Anholt2019-04-261-2/+9
| | | | | | | | We can't use the QPU functions to detect this until register allocation is done and we've moved inst->dst into inst->qpu. Fixes bad TMU sequences from register spilling in KHR-GLES31.core.compute_shader.shared-max.
* v3d: Fix detection of the last ldtmu before a new TMU op.Eric Anholt2019-04-261-3/+3
| | | | | We were looking at the start instruction, instead of scanning through the list of following instructions to find any more ldtmus.
* v3d: Re-add support for memory_barrier_shared.Eric Anholt2019-04-261-0/+1
| | | | | | | | Looks like I lost it in a rebase conflict resolution. We'd hit the unknown intrinsic assertion in KHR-GLES31.core.compute_shader.shared-struct. Fixes: 6b1c65982509 ("v3d: Add Compute Shader compilation support.")
* v3d: Add a note about i/o indirection for future performance work.Eric Anholt2019-04-261-0/+7
|
* v3d: Assert that we do request the normal texturing return data.Eric Anholt2019-04-261-0/+2
| | | | | An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.
* v3d: Fix atomic cmpxchg in shaders on hardware.Eric Anholt2019-04-181-3/+13
| | | | | | | | In what might be my first case of finding a divergence between hardware and simpenrose for v3d 4.x, it seems that despite what the spec claims, you actually need specific values in the TYPE field for atomic ops. Fixes dEQP-GLES31.functional.*.compswap.*
* v3d: Fix an invalid reuse of flags generation from before a thrsw.Eric Anholt2019-04-181-0/+4
| | | | | Noticed while debugging the last GLES 3.1 failure, though it doesn't seem to affect that bug.
* v3d: Always set up the qregs for CSD payload.Eric Anholt2019-04-161-10/+2
| | | | | | | | We were failing to set up payload[1] for use by LocalInvocationIndex/ID and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID wasn't used (possibly because you only have one workgroup). You're always going to use payload[1], and payload[0] is common enough and we have DCE in the backend to clean it up if it happens to not be used.
* v3d: Only look up the 3rd texture gather offset for non-arrays.Eric Anholt2019-04-161-1/+1
| | | | | | | Fixes assertion failures in the CTS since Karol's cleanup when NIR started noticing that we were reading an invalid component. Fixes: 5450f1c9fb09 ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")
* Delete autotoolsDylan Baker2019-04-154-158/+0
| | | | | | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
* nir: make nir_const_value scalarKarol Herbst2019-04-141-1/+1
| | | | | | | | | v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* v3d: Use the new lower_to_scratch implementation for indirects on temps.Eric Anholt2019-04-127-10/+191
| | | | | | | | | | | | | We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.
* v3d: Detect the correct number of QPUs and use it to fix the spill size.Eric Anholt2019-04-121-0/+3
| | | | | We were missing a * 4 even if the particular hardware matched our assumption.
* v3d: Add missing dumping for the spill offset/size uniforms.Eric Anholt2019-04-121-0/+8
|
* v3d: Add missing base offset to CS shared memory accesses.Eric Anholt2019-04-121-9/+20
| | | | | This code is so touchy, trying to emit the minimum amount of address math. Some day we'll move it all to NIR, I hope.