summaryrefslogtreecommitdiffstats
path: root/src/broadcom
Commit message (Collapse)AuthorAgeFilesLines
* nir: Add lower_rotate flag and set to true in all driversSagar Ghuge2019-07-011-0/+1
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.Daniel Schürmann2019-06-241-1/+0
| | | | | | | | | | | That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* v3d: implement simultaneous peripheral access exceptions for V3D 4.1+Iago Toral Quiroga2019-06-183-6/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results: total instructions in shared programs: 9117550 -> 9102719 (-0.16%) instructions in affected programs: 1752873 -> 1738042 (-0.85%) helped: 7076 HURT: 478 helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2 helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07% HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1 HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54% 95% mean confidence interval for instructions value: -2.00 -1.92 95% mean confidence interval for instructions %-change: -1.58% -1.50% Instructions are helped. total max-temps in shared programs: 1327774 -> 1327728 (<.01%) max-temps in affected programs: 1025 -> 979 (-4.49%) helped: 47 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1 helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17% 95% mean confidence interval for max-temps value: -1.06 -0.82 95% mean confidence interval for max-temps %-change: -8.89% -5.49% Max-temps are helped. Reviewed-by: Eric Anholt <[email protected]>
* v3d: do not setup execute flags for else block in uniform control flowIago Toral Quiroga2019-06-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Either all channels executed the 'then' block, in which case all channels will directly jump to the 'endif' block at the end of the 'then' block, or all channels execute the 'else' block (so no execution masking is necessary). Shader-db results: total instructions in shared programs: 9119238 -> 9117550 (-0.02%) instructions in affected programs: 401252 -> 399564 (-0.42%) helped: 855 HURT: 77 total uniforms in shared programs: 3022622 -> 3022605 (<.01%) uniforms in affected programs: 3566 -> 3549 (-0.48%) helped: 17 HURT: 0 total max-temps in shared programs: 1327762 -> 1327774 (<.01%) max-temps in affected programs: 619 -> 631 (1.94%) helped: 2 HURT: 15 Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix checking twice auf flagAlejandro Piñeiro2019-06-131-1/+1
| | | | | | | | | Seems a C&P error, and should check for auf/muf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902 Fixes: 8f065596d22ab000c53f "v3d: Add an optimization pass for redundant flags updates." Reviewed-by: Eric Anholt <[email protected]>
* v3d: don't emit point coordinates varyings if the FS doesn't read themIago Toral Quiroga2019-06-073-5/+22
| | | | | | | We still need to emit them in V3D 3.x since there there is no mechanism to disable them. Reviewed-by: Eric Anholt <[email protected]>
* v3d: add a helper to track variables that need point coordinatesIago Toral Quiroga2019-06-071-5/+10
| | | | Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix scheduling dependency tracking for ALU with small immediatesIago Toral Quiroga2019-06-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were not accountint for small immediates in the B mux so the scheduler was interpreting these are regular register file accesses, which could lead to additional (incorrect) write-read dependencies. Shader-db changes: total instructions in shared programs: 9163664 -> 9137263 (-0.29%) instructions in affected programs: 3931035 -> 3904634 (-0.67%) helped: 12457 HURT: 2563 total max-temps in shared programs: 1325787 -> 1325597 (-0.01%) max-temps in affected programs: 5746 -> 5556 (-3.31%) helped: 186 HURT: 16 helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1 helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88% 95% mean confidence interval for max-temps value: -1.04 -0.84 95% mean confidence interval for max-temps %-change: -4.16% -3.07% Max-temps are helped. Reviewed-by: Eric Anholt <[email protected]>
* v3d: Enable NIR's lower_fmod option.Kenneth Graunke2019-06-051-0/+1
| | | | | | | | | | Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <[email protected]> Acked-by: Eric Anholt <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-2/+1
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-1/+1
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Initialize lower_flrp_progress everywhereIan Romanick2019-05-091-1/+1
| | | | | | | | | | | | | | | | I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-1/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-292-4/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* v3d: Fix detection of TMU write sequences in register spilling.Eric Anholt2019-04-261-2/+9
| | | | | | | | We can't use the QPU functions to detect this until register allocation is done and we've moved inst->dst into inst->qpu. Fixes bad TMU sequences from register spilling in KHR-GLES31.core.compute_shader.shared-max.
* v3d: Fix detection of the last ldtmu before a new TMU op.Eric Anholt2019-04-261-3/+3
| | | | | We were looking at the start instruction, instead of scanning through the list of following instructions to find any more ldtmus.
* v3d: Re-add support for memory_barrier_shared.Eric Anholt2019-04-261-0/+1
| | | | | | | | Looks like I lost it in a rebase conflict resolution. We'd hit the unknown intrinsic assertion in KHR-GLES31.core.compute_shader.shared-struct. Fixes: 6b1c65982509 ("v3d: Add Compute Shader compilation support.")
* v3d: Add a note about i/o indirection for future performance work.Eric Anholt2019-04-261-0/+7
|
* v3d: Assert that we do request the normal texturing return data.Eric Anholt2019-04-261-0/+2
| | | | | An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.
* v3d: Fix atomic cmpxchg in shaders on hardware.Eric Anholt2019-04-181-3/+13
| | | | | | | | In what might be my first case of finding a divergence between hardware and simpenrose for v3d 4.x, it seems that despite what the spec claims, you actually need specific values in the TYPE field for atomic ops. Fixes dEQP-GLES31.functional.*.compswap.*
* v3d: Fix an invalid reuse of flags generation from before a thrsw.Eric Anholt2019-04-181-0/+4
| | | | | Noticed while debugging the last GLES 3.1 failure, though it doesn't seem to affect that bug.
* v3d: Always set up the qregs for CSD payload.Eric Anholt2019-04-161-10/+2
| | | | | | | | We were failing to set up payload[1] for use by LocalInvocationIndex/ID and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID wasn't used (possibly because you only have one workgroup). You're always going to use payload[1], and payload[0] is common enough and we have DCE in the backend to clean it up if it happens to not be used.
* v3d: Only look up the 3rd texture gather offset for non-arrays.Eric Anholt2019-04-161-1/+1
| | | | | | | Fixes assertion failures in the CTS since Karol's cleanup when NIR started noticing that we were reading an invalid component. Fixes: 5450f1c9fb09 ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")
* Delete autotoolsDylan Baker2019-04-154-158/+0
| | | | | | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
* nir: make nir_const_value scalarKarol Herbst2019-04-141-1/+1
| | | | | | | | | v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* v3d: Use the new lower_to_scratch implementation for indirects on temps.Eric Anholt2019-04-127-10/+191
| | | | | | | | | | | | | We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.
* v3d: Detect the correct number of QPUs and use it to fix the spill size.Eric Anholt2019-04-121-0/+3
| | | | | We were missing a * 4 even if the particular hardware matched our assumption.
* v3d: Add missing dumping for the spill offset/size uniforms.Eric Anholt2019-04-121-0/+8
|
* v3d: Add missing base offset to CS shared memory accesses.Eric Anholt2019-04-121-9/+20
| | | | | This code is so touchy, trying to emit the minimum amount of address math. Some day we'll move it all to NIR, I hope.
* v3d: Add Compute Shader compilation support.Eric Anholt2019-04-123-4/+44
| | | | | | | | While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.
* v3d: Replace the old shader-db env var output with the ARB_debug_output.Eric Anholt2019-04-123-30/+4
| | | | | | | | | We're using ARB_debug_output for the main shader-db, but I had this env var left around from the shader-db-2 support (vc4 apitrace-based). Keep the env var around since it's nice sometimes to get the stats on a shader you're optimizing without having to do a shader-db run, but drop the old formatting that's not useful and keeps tricking me when I go to add another measurement to the shader-db output.
* v3d: Include the number of max temps used in the shader-db output.Eric Anholt2019-04-121-1/+29
| | | | | This gives us finer-grained feedback on how we're doing on register pressure than "did we trigger a new shader to spill or drop thread count?"
* v3d: Add and use a define for the number of channels in a QPU invocation.Eric Anholt2019-04-123-3/+9
| | | | | A shader invocation always executes 16 channels together, so we often end up multiplying things by this magic 16 number. Give it a name.
* nir/i965/freedreno/vc4: add a bindless bool to type size functionsTimothy Arceri2019-04-121-1/+1
| | | | | | | This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <[email protected]>
* v3d: Add an optimization pass for redundant flags updates.Eric Anholt2019-04-115-0/+143
| | | | | | | | | | | | Our exec masking introduces lots of redundant flags updates, and even without that there will be cases where NIR comparisons on the same sources for different reasons may generate the same comparison instruction before the selection. total instructions in shared programs: 6492930 -> 6460934 (-0.49%) total uniforms in shared programs: 2117460 -> 2115106 (-0.11%) total spills in shared programs: 4983 -> 4987 (0.08%) total fills in shared programs: 6408 -> 6416 (0.12%)
* nir: Get rid of global registersJason Ekstrand2019-04-091-1/+0
| | | | | | | | | We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: prefer using nir_src_comp_as_int over nir_src_as_const_valueKarol Herbst2019-04-072-11/+8
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* v3d: Add some more new packets for V3D 4.x.Eric Anholt2019-04-041-0/+131
| | | | The T/G shader references and common state will be needed for GLES 3.2.
* v3d: Bump the maximum texture size to 4k for V3D 4.x.Eric Anholt2019-04-042-3/+4
| | | | | | | 4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: [email protected]
* v3d: Remove some dead members of struct v3d_compile.Eric Anholt2019-03-211-12/+0
| | | | These are more vc4 leftovers.
* v3d: Upload all of UBO[0] if any indirect load occurs.Eric Anholt2019-03-213-129/+1
| | | | | | | | | | | | | | | The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).
* v3d: Move constant offsets to UBO addresses into the main uniform stream.Eric Anholt2019-03-213-9/+16
| | | | | | | | | | We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
* v3d: Rename v3d_tmu_config_data to v3d_unit_data.Eric Anholt2019-03-212-9/+9
| | | | | | I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.
* v3d: Fix leak of the mem_ctx after the DAG refactor.Eric Anholt2019-03-121-2/+2
| | | | | | Noticed while trying to get a CTS run again. Fixes: 33886474d646 ("v3d: Use the DAG datastructure for QPU instruction scheduling.")
* v3d: Use the DAG datastructure for QPU instruction scheduling.Eric Anholt2019-03-111-114/+72
| | | | Just a small code reduction from shared infrastructure.
* v3d: Reuse list_for_each_entry_rev().Eric Anholt2019-03-111-2/+2
|
* v3d: Include a count of register pressure in the RA failure dumps.Eric Anholt2019-03-061-1/+13
| | | | | | You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.
* v3d: Drop the V3D 3.x vpm read dead code elimination.Eric Anholt2019-03-051-33/+2
| | | | | We now have NIR dead code eliminating our VPM reads, so this shouldn't be necessary.
* v3d: Eliminate the TLB and TLBU files.Eric Anholt2019-03-054-41/+20
| | | | We can just use the magic register file like we do for other magic waddrs.