path: root/src/gallium/drivers/freedreno
Commit message | Author | Age | Files | Lines
* nir: move to compiler/Emil Velikov2016-01-263-5/+5
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* nir: move shader_enums.[ch] to compilerEmil Velikov2016-01-262-2/+2
| | | | | | | | | This way one can reuse it in glsl, nir or other infrastructure without pulling nir as dependency. Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* freedreno/a4xx: Add support for adreno 430cstout2016-01-211-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: make opc array static constChristian Gmeiner2016-01-211-1/+1
| | | | | Signed-off-by: Christian Gmeiner <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno: implement emit_string_markerRob Clark2016-01-212-1/+28
| | | | | | Writes string to cmdstream in payload of a no-op packet. Signed-off-by: Rob Clark <[email protected]>
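A minimal sketch of the idea described above, assuming the marker string is copied into whole 32-bit words with the last word zero-padded. `pack_marker_payload` is a hypothetical helper, not the driver's function; a real driver would also emit a packet header sized to the returned word count, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not freedreno code): packs `len` bytes of `str`
 * into 32-bit words at `out`, zero-padding the final word, and returns
 * the number of words written.  The caller would use that count as the
 * payload size of the no-op packet. */
static size_t
pack_marker_payload(const char *str, size_t len,
                    uint32_t *out, size_t max_words)
{
    size_t nwords = (len + 3) / 4;      /* round up to whole dwords */

    assert(nwords <= max_words);
    memset(out, 0, nwords * sizeof(uint32_t));  /* provides the padding */
    memcpy(out, str, len);
    return nwords;
}
```

For example, a 5-byte string occupies two words, with the last three bytes of the second word set to zero.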
* gallium: add GREMEDY_string_markerRob Clark2016-01-211-0/+1
| | | | | | | | | | Since the GREMEDY extensions are normally only exposed by the gremedy debugger (and could possibly trigger debug paths in the app), we don't expose the extension by default, but instead only with ST_DEBUG=gremedy. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: use smaller threadsize for more registersRob Clark2016-01-181-2/+5
| | | | | | | | Once we go past half of the "GPR" register file, it seems like we need to run frag shader with smaller threadsize. (The vertex shader already runs at TWO_QUADS, which is the minimum.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: per-generation OUT_IB packetRob Clark2016-01-189-6/+43
| | | | | | | | | | Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix mad 3rd src delay calcRob Clark2016-01-171-1/+1
| | | | | | | In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: better array register allocationRob Clark2016-01-162-9/+51
| | | | | | | Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: array offset can be negativeRob Clark2016-01-165-12/+13
| It at least happens with some piglit tests, like
|
|   $piglit/bin/vp-address-01
|
|   VERT
|   DCL IN[0]
|   DCL IN[1]
|   DCL OUT[0], POSITION
|   DCL OUT[1], COLOR
|   DCL CONST[0..7]
|   DCL ADDR[0]
|     0: ARL ADDR[0].x, IN[1].xxxx
|     1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
|     2: DP4 OUT[0].x, CONST[4], IN[0]
|     3: DP4 OUT[0].y, CONST[5], IN[0]
|     4: DP4 OUT[0].z, CONST[6], IN[0]
|     5: DP4 OUT[0].w, CONST[7], IN[0]
|     6: END
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: workaround bug/featureRob Clark2016-01-161-0/+9
| | | | | | | | | | Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: array reworkRob Clark2016-01-169-363/+365
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor/simplify cpRob Clark2016-01-161-87/+82
| | | | | | | | | If we handle separately the special case of eliminating output mov (which includes keeps and various other cases where we don't have a consuming instruction's src register to collapse things into), we can simplify the logic. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix incorrect decoding of mov instructionsRob Clark2016-01-161-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove unused tgsi tokens ptrRob Clark2016-01-161-1/+0
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: bit of ra refactorRob Clark2016-01-161-25/+20
| | | | | | | Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: cosmetic de-indentRob Clark2016-01-161-36/+34
| | | | | | Collapse two nested if's into one to reduce indent level. Signed-off-by: Rob Clark <[email protected]>
* gallium/st: add pipe_context::generate_mipmap()Charmaine Lee2016-01-141-0/+1
| This patch adds a new interface to support hardware mipmap generation.
| PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new
| interface is supported; if not supported, the state tracker will fall back
| to mipmap generation by rendering/texturing.
|
| v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
| v3: add format to the generate_mipmap interface to allow mipmap generation
|     using a format other than the resource format
| v4: fix return type of trace_context_generate_mipmap()
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFERNicolai Hähnle2016-01-141-0/+1
| | | | | | | | | It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior. Reviewed-by: Marek Olšák <[email protected]>
* freedreno: add ir3_compiler to gitignoreIlia Mirkin2016-01-081-0/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENTIlia Mirkin2016-01-081-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERSIlia Mirkin2016-01-081-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add caps for POSITION and FACE system valuesMarek Olšák2016-01-081-0/+2
| | | | | | | v2: document the integer behavior Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: add caps to expose support for multi indirect drawsIlia Mirkin2016-01-071-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H supportIlia Mirkin2016-01-031-0/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: use NIR_PASS helper macrosRob Clark2016-01-031-19/+28
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: we require block_index metadataRob Clark2016-01-031-0/+2
| | | | | | | | Found during NIR_TEST_CLONE=1 piglit run. We were using block->index but forgetting to require it, which caused things to not work with a cloned shader that didn't preserve block_index. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor NIR IR handlingRob Clark2016-01-037-111/+202
| | | | | | | | | Immediately convert into NIR and do an initial key-agnostic lowering/ optimization pass. This should let us share most of the per-variant transformations between each variant, and hopefully minimize the draw- time variant creation part of the compilation process. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unnecessary unreachable() caseRob Clark2016-01-031-2/+0
| | | | | | | It will still hit a compile_assert() in emit_tex, which has the advantage of dumping out the offending shader. Signed-off-by: Rob Clark <[email protected]>
* u_upload_mgr: allow specifying PIPE_USAGE_* for the upload bufferMarek Olšák2016-01-022-2/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: remove alignment parameter from u_upload_createMarek Olšák2016-01-022-4/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_alloc manuallyMarek Olšák2016-01-025-4/+8
| | | | | | | | | | The fixed alignment of u_upload_mgr will go away. This is the first step. The motivation is that one u_upload_mgr can have multiple users, each allocating from the same buffer, but requiring a different alignment. Reviewed-by: Nicolai Hähnle <[email protected]>
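The motivation above can be sketched as a bump allocator in which each caller passes its own alignment instead of the buffer carrying a single fixed one. This is an illustration under assumptions, not the u_upload_mgr API: `struct upload_buf` and `upload_alloc` are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one upload buffer shared by several users. */
struct upload_buf {
    uint32_t size;    /* total bytes available in the buffer */
    uint32_t offset;  /* next free byte */
};

/* Reserves `bytes` at the next offset that satisfies the caller's own
 * `alignment` (assumed to be a power of two).  Returns the start offset
 * of the reservation, or UINT32_MAX if it does not fit. */
static uint32_t
upload_alloc(struct upload_buf *buf, uint32_t bytes, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round the current offset up to the requested alignment. */
    uint32_t start = (buf->offset + alignment - 1) & ~(alignment - 1);

    if (start > buf->size || bytes > buf->size - start)
        return UINT32_MAX;

    buf->offset = start + bytes;
    return start;
}
```

With this shape, a vertex-data user asking for 4-byte alignment and a constant-buffer user asking for 256-byte alignment can share one buffer, and padding is only inserted where a particular request needs it.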
* gallium: add PIPE_CAP_DRAW_PARAMETERSIlia Mirkin2015-12-301-0/+1
| | | | | | | | This allows the state tracker to know that the various draw parameters are available in vertex shaders. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* nir: Get rid of function overloadsJason Ekstrand2015-12-282-7/+7
| | | | | | | | | | | | | | | When Connor originally drafted NIR, he copied the same function+overload system that GLSL IR had with a few names changed. However, this double-indirection is not really needed and has only served to confuse people. Instead, let's just have functions which may not have unique names and may or may not have an implementation. If someone wants to do overload resolving, they can have a hash table based function+overload system in the overload resolving pass. There's no good reason to keep it in core NIR. Reviewed-by: Connor Abbott <[email protected]> Acked-by: Kenneth Graunke <[email protected]> ir3 bits are Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: spelling..Rob Clark2015-12-231-6/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* nir: Add a writemask to store intrinsics.Kenneth Graunke2015-12-221-0/+8
| Tessellation control shaders need to be careful when writing outputs.
| Because multiple threads can concurrently write the same output variables,
| we need to only write the exact components we were told.
|
| Traditionally, for sub-vector writes, we've read the whole vector, updated
| the temporary, and written the whole vector back. This breaks down with
| concurrent access.
|
| This patch prepares the way for a solution by adding a writemask field to
| store_var intrinsics, as well as the other store intrinsics. It then updates
| all producers to emit a writemask of "all channels enabled". It updates
| nir_lower_io to copy the writemask to output store intrinsics.
|
| Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks by
| doing a read-modify-write cycle (which is safe, because local variables are
| specific to a single thread).
|
| This should have no functional change, since no one actually emits partial
| writemasks yet.
|
| v2: Make nir_validate momentarily assert that writemasks cover the complete
|     value - we shouldn't have partial writemasks yet (requested by Jason
|     Ekstrand).
| v3: Fix accidental SSBO change that arose from merge conflicts.
| v4: Don't try to handle writemasks in ir3_compiler_nir - my code for
|     indirects was likely wrong, and TTN doesn't generate partial writemasks
|     today anyway. Change them to asserts as requested by Rob Clark.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]> [v3]
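To illustrate what a partial writemask means for a four-component store, here is a small sketch. It is not NIR code: `store_masked` is a hypothetical function, and the convention that bit i of the mask selects component i is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical illustration (not NIR code): copies only the components
 * of `src` whose bit is set in `writemask` (bit i selects component i),
 * leaving the other components of `dst` untouched.  A mask of 0xf is the
 * "all channels enabled" case. */
static void
store_masked(float dst[4], const float src[4], unsigned writemask)
{
    assert((writemask & ~0xfu) == 0);   /* only four channels exist */

    for (int i = 0; i < 4; i++) {
        if (writemask & (1u << i))
            dst[i] = src[i];
    }
}
```

When `dst` is thread-local, this is the read-modify-write cycle the message calls safe. On a shared output, the untouched components must not be written back at all, which is why the mask has to travel with the store.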
* freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabledRob Herring2015-12-181-1/+1
| | | | | | | | | Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit builds. Cc: "11.0 11.1" <[email protected]> Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: fix fragcoord.z + fragdepthRob Clark2015-12-152-5/+5
| | | | | | | | | | | It seems like disabling earlyz on a4xx also, by default, disables fragcoord.z to the FS. For frag shaders that both read fragcoord(.z) and write fragdepth, we need to set some extra bits to prevent a lockup. This lets us get rid of the hack of disabling fragcoord.z (which prevented 0ad from lockups, but resulted in rendering corruption). Also fixes fbo-depth-sample-compare. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2015-12-156-92/+231
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/cmdline: don't dump nir by defaultRob Clark2015-12-151-3/+1
| | | | | | By default we only want the disasm dumped, which we get anyways. Signed-off-by: Rob Clark <[email protected]>
* nir: Get rid of *_indirect variants of input/output load/store intrinsicsJason Ekstrand2015-12-101-32/+47
| There is some special-casing needed in a competent back-end. However, they
| can do their special-casing easily enough based on whether or not the
| offset is a constant. In the meantime, having the *_indirect variants adds
| special cases in a number of places where they don't need to be and, in
| general, only complicates things. To complicate matters, NIR had no way to
| convert an indirect load/store to a direct one in the case that the
| indirect was a constant so we would still not really get what the back-ends
| wanted. The best solution seems to be to get rid of the *_indirect variants
| entirely.
|
| This commit is a bunch of different changes squashed together:
|
|  - nir: Get rid of *_indirect variants of input/output load/store intrinsics
|  - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
|  - nir/lower_io: Get rid of load/store_foo_indirect
|  - i965/fs: Get rid of load/store_foo_indirect
|  - i965/vec4: Get rid of load/store_foo_indirect
|  - tgsi_to_nir: Get rid of load/store_foo_indirect
|  - ir3/nir: Use the new unified io intrinsics
|  - vc4: Do all uniform loads with byte offsets
|  - vc4/nir: Use the new unified io intrinsics
|  - vc4: Fix load_user_clip_plane crash
|  - vc4: add missing src for store outputs
|  - vc4: Fix state uniforms
|  - nir/lower_clip: Update to the new load/store intrinsics
|  - nir/lower_two_sided_color: Update to the new load intrinsic
|
| NIR and i965 changes are
| Reviewed-by: Kenneth Graunke <[email protected]>
| NIR indirect declarations and vc4 changes are
| Reviewed-by: Eric Anholt <[email protected]>
| ir3 changes are
| Reviewed-by: Rob Clark <[email protected]>
| NIR changes are
| Acked-by: Rob Clark <[email protected]>
* freedreno: little clean up in fd_create_surfaceSerge Martin2015-12-091-15/+16
| | | | | | in order to avoid returning an invalid address if CALLOC_STRUCT returns NULL. Signed-off-by: Rob Clark <[email protected]>
* freedreno: change to goto failSerge Martin2015-12-091-4/+2
| | | | | | in fd_resource_transfer_map, like the other error cases Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix bind_sampler_states when hwcso is NULLSerge Martin2015-12-093-0/+9
| | | | | | | | | src/gallium/tests/trivial/compute.c expects samplers to be cleaned when the samplers list is NULL. As in radeon, the function behaves as if the number of samplers parameter were set to 0. [small s/hwsco/hwcso/ typo fix] Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: nir shader prints with 'disasm' debug optionRob Clark2015-12-051-2/+2
| | | | | | | | Move these to 'disasm' instead of the more verbose 'optmsgs' since, like the tgsi dumps, it is useful without the more verbose compiler logging enabled. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert scheduler back to recursive algoRob Clark2015-12-042-127/+204
| I've played with a few different approaches to tweak instruction priority
| according to how much they increase/decrease register pressure, etc. But
| nothing seems to change the fact that compared to original
| (pre-multiple-block-support) scheduler, in some edge cases we are
| generating shaders w/ 5-6x higher register usage.
|
| The problem is that the priority queue approach completely loses the
| dependency between instructions, and ends up scheduling all paths at the
| same time.
|
| Original reason for switching was that recursive approach relied on
| starting from the shader outputs array. But we can achieve more or less
| the same thing by starting from the depth-sorted list.
|
| shader-db results:
|
|   total instructions in shared programs: 113350 -> 105183 (-7.21%)
|   total dwords in shared programs: 219328 -> 211168 (-3.72%)
|   total full registers used in shared programs: 7911 -> 7383 (-6.67%)
|   total half registers used in shader programs: 109 -> 109 (0.00%)
|   total const registers used in shared programs: 21294 -> 21294 (0.00%)
|
|            half   full  const  instr  dwords
|   helped      0    322      0    711     215
|   hurt        0    163      0     38       4
|
| The shaders hurt tend to gain a register or two. While there are also a lot
| of helped shaders that only lose a register or two, the more complex ones
| tend to lose significantly more registers used. In some more extreme
| cases, like glsl-fs-convolution-1.shader_test it is more like 7 vs 34
| registers!
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't reuse a0.x across blocksRob Clark2015-12-041-7/+14
| | | | | | | | It causes confusion in sched if we need to split_addr() since otherwise we wouldn't easily know which block the new addr instr will be scheduled in. So just side-step the whole situation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rename ir3_block::bdRob Clark2015-12-043-11/+11
| | | | | | | | We'll need to add similar for ir3_instruction, but following the pattern to use 'id' seems confusing. Let's just go w/ generic 'data' as the name. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: assign varying locations laterRob Clark2015-11-264-29/+37
| | | | | | | | | | | Rather than assigning inloc up front, when we don't yet know if it will be unused, assign it last thing before the legalize pass. Also, realize when inputs are unused (since for frag shaders we can't rely on them being removed from ir->inputs[]). This doesn't make sense if we don't also dynamically assign the inlocs, since we could end up telling the hw the wrong # of varyings (since we currently assume that the # of varyings and max-inloc are related). Signed-off-by: Rob Clark <[email protected]>