path: root/src/gallium/drivers/freedreno
Commit message | Author | Age | Files | Lines
* gallium/st: add pipe_context::generate_mipmap() | Charmaine Lee | 2016-01-14 | 1 | -0/+1
    This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fall back to mipmap generation by rendering/texturing.

    v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
    v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format
    v4: fix return type of trace_context_generate_mipmap()

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Jose Fonseca <[email protected]>
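The capability check and fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Mesa code: every `sketch_*` name is hypothetical, the boolean field stands in for PIPE_CAP_GENERATE_MIPMAP, and the assumption that the driver hook reports success with a boolean is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Gallium types; not the real Mesa structs. */
struct sketch_resource {
   int last_level;
   bool mips_valid;
};

struct sketch_context {
   bool cap_generate_mipmap; /* models PIPE_CAP_GENERATE_MIPMAP */
   /* Optional driver hook; assumed to return false if the driver declines. */
   bool (*generate_mipmap)(struct sketch_context *ctx,
                           struct sketch_resource *res, int format);
};

enum sketch_path { SKETCH_PATH_HW, SKETCH_PATH_RENDER };

/* Example driver hook that always succeeds. */
static bool
sketch_hw_generate(struct sketch_context *ctx, struct sketch_resource *res,
                   int format)
{
   (void)ctx;
   (void)format;
   res->mips_valid = true;
   return true;
}

/* Placeholder for the render/texture based fallback path. */
static void
sketch_render_mipmap(struct sketch_resource *res)
{
   res->mips_valid = true;
}

/* Try the hardware path only when the cap is advertised, else fall back. */
static enum sketch_path
sketch_generate_mipmap(struct sketch_context *ctx,
                       struct sketch_resource *res, int format)
{
   assert(ctx != NULL && res != NULL);
   if (ctx->cap_generate_mipmap && ctx->generate_mipmap &&
       ctx->generate_mipmap(ctx, res, format))
      return SKETCH_PATH_HW;

   sketch_render_mipmap(res);
   return SKETCH_PATH_RENDER;
}
```

A driver that leaves the cap disabled, as this commit does for freedreno, would always take the `SKETCH_PATH_RENDER` branch.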
* gallium: add PIPE_CAP_INVALIDATE_BUFFER | Nicolai Hähnle | 2016-01-14 | 1 | -0/+1
    It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior.

    Reviewed-by: Marek Olšák <[email protected]>
* freedreno: add ir3_compiler to gitignore | Ilia Mirkin | 2016-01-08 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT | Ilia Mirkin | 2016-01-08 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS | Ilia Mirkin | 2016-01-08 | 1 | -0/+2
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium: add caps for POSITION and FACE system values | Marek Olšák | 2016-01-08 | 1 | -0/+2
    v2: document the integer behavior

    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* gallium: add caps to expose support for multi indirect draws | Ilia Mirkin | 2016-01-07 | 1 | -0/+2
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support | Ilia Mirkin | 2016-01-03 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: use NIR_PASS helper macros | Rob Clark | 2016-01-03 | 1 | -19/+28
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: we require block_index metadata | Rob Clark | 2016-01-03 | 1 | -0/+2
    Found during a NIR_TEST_CLONE=1 piglit run. We were using block->index but forgetting to require it, so things did not work with a cloned shader that didn't preserve block_index.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor NIR IR handling | Rob Clark | 2016-01-03 | 7 | -111/+202
    Immediately convert into NIR and do an initial key-agnostic lowering/optimization pass. This should let us share most of the per-variant transformations between each variant, and hopefully minimize the draw-time variant creation part of the compilation process.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unnecessary unreachable() case | Rob Clark | 2016-01-03 | 1 | -2/+0
    It will still hit a compile_assert() in emit_tex, which has the advantage of dumping out the offending shader.

    Signed-off-by: Rob Clark <[email protected]>
* u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer | Marek Olšák | 2016-01-02 | 2 | -2/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: remove alignment parameter from u_upload_create | Marek Olšák | 2016-01-02 | 2 | -4/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_alloc manually | Marek Olšák | 2016-01-02 | 5 | -4/+8
    The fixed alignment of u_upload_mgr will go away. This is the first step.

    The motivation is that one u_upload_mgr can have multiple users, each allocating from the same buffer, but requiring a different alignment.

    Reviewed-by: Nicolai Hähnle <[email protected]>
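To illustrate why per-allocation alignment helps when several users share one buffer, here is a minimal bump-allocator sketch. It is not the u_upload_mgr implementation; `sketch_upload` and `sketch_upload_alloc` are hypothetical names, and the real manager also handles buffer reallocation and mapping, which this omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared upload buffer: only its size and a bump offset. */
struct sketch_upload {
   size_t size;   /* total bytes in the buffer */
   size_t offset; /* first unused byte */
};

/*
 * Sub-allocate 'bytes' at an offset aligned to 'alignment' (a power of two).
 * Returns the start offset, or (size_t)-1 if the request does not fit.
 */
static size_t
sketch_upload_alloc(struct sketch_upload *up, size_t bytes, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

   /* Round the current offset up to this caller's alignment. */
   size_t start = (up->offset + alignment - 1) & ~(alignment - 1);

   if (start > up->size || bytes > up->size - start)
      return (size_t)-1;

   up->offset = start + bytes;
   return start;
}
```

With the alignment passed per call, a user needing 64-byte alignment only pays the padding for its own allocations, while a user with 4-byte requirements packs tightly behind it.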
* gallium: add PIPE_CAP_DRAW_PARAMETERS | Ilia Mirkin | 2015-12-30 | 1 | -0/+1
    This allows the state tracker to know that the various draw parameters are available in vertex shaders.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* nir: Get rid of function overloads | Jason Ekstrand | 2015-12-28 | 2 | -7/+7
    When Connor originally drafted NIR, he copied the same function+overload system that GLSL IR had with a few names changed. However, this double-indirection is not really needed and has only served to confuse people. Instead, let's just have functions which may not have unique names and may or may not have an implementation. If someone wants to do overload resolving, they can have a hash table based function+overload system in the overload resolving pass. There's no good reason to keep it in core NIR.

    Reviewed-by: Connor Abbott <[email protected]>
    Acked-by: Kenneth Graunke <[email protected]>
    ir3 bits are Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: spelling.. | Rob Clark | 2015-12-23 | 1 | -6/+6
    Signed-off-by: Rob Clark <[email protected]>
* nir: Add a writemask to store intrinsics. | Kenneth Graunke | 2015-12-22 | 1 | -0/+8
    Tessellation control shaders need to be careful when writing outputs. Because multiple threads can concurrently write the same output variables, we need to only write the exact components we were told.

    Traditionally, for sub-vector writes, we've read the whole vector, updated the temporary, and written the whole vector back. This breaks down with concurrent access.

    This patch prepares the way for a solution by adding a writemask field to store_var intrinsics, as well as the other store intrinsics. It then updates all producers to emit a writemask of "all channels enabled". It updates nir_lower_io to copy the writemask to output store intrinsics.

    Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks by doing a read-modify-write cycle (which is safe, because local variables are specific to a single thread).

    This should have no functional change, since no one actually emits partial writemasks yet.

    v2: Make nir_validate momentarily assert that writemasks cover the complete value - we shouldn't have partial writemasks yet (requested by Jason Ekstrand).
    v3: Fix accidental SSBO change that arose from merge conflicts.
    v4: Don't try to handle writemasks in ir3_compiler_nir - my code for indirects was likely wrong, and TTN doesn't generate partial writemasks today anyway. Change them to asserts as requested by Rob Clark.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]> [v3]
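The semantics of a writemask on a vec4 store can be sketched as below. This is only an illustration of the idea in plain C: `sketch_store_masked` is a hypothetical helper, not a NIR function, and it models the single-threaded case where updating selected channels in place is safe.

```c
#include <assert.h>

/*
 * Store 'src' into 'dst', touching only the channels whose bit is set in
 * 'writemask' (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w).
 */
static void
sketch_store_masked(float dst[4], const float src[4], unsigned writemask)
{
   assert((writemask & ~0xfu) == 0);

   for (unsigned c = 0; c < 4; c++) {
      if (writemask & (1u << c))
         dst[c] = src[c];
   }
}
```

A mask of `0xf` corresponds to the "all channels enabled" case that every producer emits after this commit; a partial mask such as `0x5` leaves y and w untouched, which is what concurrent writers of the same output rely on.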
* freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled | Rob Herring | 2015-12-18 | 1 | -1/+1
    Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit builds.

    Cc: "11.0 11.1" <[email protected]>
    Signed-off-by: Rob Herring <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
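The log does not show the one-line change itself, so the following is only a sketch of the usual way to avoid this class of warning, under the assumption that a pointer was being cast to a wider integer: go through `uintptr_t`, which always matches the pointer size. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a pointer to 64 bits without a size-mismatched cast. */
static uint64_t
sketch_ptr_to_u64(const void *ptr)
{
   /* The intermediate uintptr_t cast is pointer-sized on every target. */
   return (uint64_t)(uintptr_t)ptr;
}

/* Reverse conversion; the value must fit in a pointer on this target. */
static void *
sketch_u64_to_ptr(uint64_t value)
{
   assert(value <= UINTPTR_MAX);
   return (void *)(uintptr_t)value;
}
```

On a 32-bit build, a direct `(uint64_t)ptr` cast is what triggers the "cast from pointer to integer of different size" diagnostic; the two-step cast expresses the widening explicitly.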
* freedreno/a4xx: fix fragcoord.z + fragdepth | Rob Clark | 2015-12-15 | 2 | -5/+5
    It seems like disabling earlyz on a4xx also, by default, disables fragcoord.z to the FS. For frag shaders that both read fragcoord(.z) and write fragdepth, we need to set some extra bits to prevent a lockup.

    This lets us get rid of the hack of disabling fragcoord.z (which prevented 0ad from lockups, but resulted in rendering corruption).

    Also fixes fbo-depth-sample-compare.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2015-12-15 | 6 | -92/+231
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/cmdline: don't dump nir by default | Rob Clark | 2015-12-15 | 1 | -3/+1
    By default we only want the disasm dumped, which we get anyways.

    Signed-off-by: Rob Clark <[email protected]>
* nir: Get rid of *_indirect variants of input/output load/store intrinsics | Jason Ekstrand | 2015-12-10 | 1 | -32/+47
    There is some special-casing needed in a competent back-end. However, they can do their special-casing easily enough based on whether or not the offset is a constant. In the meantime, having the *_indirect variants adds special cases in a number of places where they don't need to be and, in general, only complicates things. To complicate matters, NIR had no way to convert an indirect load/store to a direct one in the case that the indirect was a constant, so we would still not really get what the back-ends wanted. The best solution seems to be to get rid of the *_indirect variants entirely.

    This commit is a bunch of different changes squashed together:

    - nir: Get rid of *_indirect variants of input/output load/store intrinsics
    - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
    - nir/lower_io: Get rid of load/store_foo_indirect
    - i965/fs: Get rid of load/store_foo_indirect
    - i965/vec4: Get rid of load/store_foo_indirect
    - tgsi_to_nir: Get rid of load/store_foo_indirect
    - ir3/nir: Use the new unified io intrinsics
    - vc4: Do all uniform loads with byte offsets
    - vc4/nir: Use the new unified io intrinsics
    - vc4: Fix load_user_clip_plane crash
    - vc4: add missing src for store outputs
    - vc4: Fix state uniforms
    - nir/lower_clip: Update to the new load/store intrinsics
    - nir/lower_two_sided_color: Update to the new load intrinsic

    NIR and i965 changes are Reviewed-by: Kenneth Graunke <[email protected]>
    NIR indirect declarations and vc4 changes are Reviewed-by: Eric Anholt <[email protected]>
    ir3 changes are Reviewed-by: Rob Clark <[email protected]>
    NIR changes are Acked-by: Rob Clark <[email protected]>
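The back-end special-casing mentioned here, choosing a direct or indirect access depending on whether the offset is constant, might look roughly like this. The types are hypothetical simplifications and not the NIR source or intrinsic structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical offset operand: either a known constant or a runtime value. */
struct sketch_src {
   bool is_const;
   int value; /* only meaningful when is_const is true */
};

enum sketch_load_kind { SKETCH_LOAD_DIRECT, SKETCH_LOAD_INDIRECT };

/*
 * With a single unified load intrinsic, the back-end decides for itself:
 * a constant offset folds into the base index, anything else needs
 * address-register style indirect addressing.
 */
static enum sketch_load_kind
sketch_classify_load(int base, const struct sketch_src *offset,
                     int *direct_index)
{
   assert(offset != NULL && direct_index != NULL);

   if (offset->is_const) {
      *direct_index = base + offset->value;
      return SKETCH_LOAD_DIRECT;
   }
   return SKETCH_LOAD_INDIRECT;
}
```

Because the decision is made on the operand rather than on the intrinsic opcode, an indirect offset that later optimizes to a constant automatically takes the direct path.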
* freedreno: little clean up in fd_create_surface | Serge Martin | 2015-12-09 | 1 | -15/+16
    in order to avoid returning an invalid address if CALLOC_STRUCT returns NULL.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: change to goto fail | Serge Martin | 2015-12-09 | 1 | -4/+2
    in fd_resource_transfer_map, like the other error cases

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix bind_sampler_states when hwcso is NULL | Serge Martin | 2015-12-09 | 3 | -0/+9
    src/gallium/tests/trivial/compute.c expects samplers to be cleaned when the samplers list is NULL. Like in radeon, the function behaves as it does when the number of samplers parameter is set to 0.

    [small s/hwsco/hwcso/ typo fix]
    Signed-off-by: Rob Clark <[email protected]>
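The behavior described, treating a NULL sampler list like a count of zero, can be sketched as follows. The struct and function are hypothetical simplifications; the real freedreno callback takes a pipe_context, a shader stage, and a start slot, none of which are modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SAMPLERS 16

/* Hypothetical per-stage sampler state. */
struct sketch_texstate {
   void *samplers[SKETCH_MAX_SAMPLERS];
   unsigned num_samplers;
};

/* Bind 'nr' samplers; a NULL 'hwcso' list unbinds everything. */
static void
sketch_bind_sampler_states(struct sketch_texstate *tex, unsigned nr,
                           void **hwcso)
{
   assert(nr <= SKETCH_MAX_SAMPLERS);

   /* A NULL list is handled exactly like a request to bind 0 samplers. */
   if (!hwcso)
      nr = 0;

   for (unsigned i = 0; i < nr; i++)
      tex->samplers[i] = hwcso[i];

   /* Clear any slots left over from a previous, larger binding. */
   for (unsigned i = nr; i < tex->num_samplers; i++)
      tex->samplers[i] = NULL;

   tex->num_samplers = nr;
}
```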
* freedreno/ir3: nir shader prints with 'disasm' debug option | Rob Clark | 2015-12-05 | 1 | -2/+2
    Move these to 'disasm' instead of the more verbose 'optmsgs' since, like the tgsi dumps, it is useful without the more verbose compiler logging enabled.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: convert scheduler back to recursive algo | Rob Clark | 2015-12-04 | 2 | -127/+204
    I've played with a few different approaches to tweak instruction priority according to how much they increase/decrease register pressure, etc. But nothing seems to change the fact that compared to the original (pre-multiple-block-support) scheduler, in some edge cases we are generating shaders w/ 5-6x higher register usage.

    The problem is that the priority queue approach completely loses the dependency between instructions, and ends up scheduling all paths at the same time.

    The original reason for switching was that the recursive approach relied on starting from the shader outputs array. But we can achieve more or less the same thing by starting from the depth-sorted list.

    shader-db results:

    total instructions in shared programs: 113350 -> 105183 (-7.21%)
    total dwords in shared programs: 219328 -> 211168 (-3.72%)
    total full registers used in shared programs: 7911 -> 7383 (-6.67%)
    total half registers used in shader programs: 109 -> 109 (0.00%)
    total const registers used in shared programs: 21294 -> 21294 (0.00%)

                half   full   const   instr   dwords
    helped         0    322       0     711      215
    hurt           0    163       0      38        4

    The shaders hurt tend to gain a register or two. While there are also a lot of helped shaders that only lose a register or two, the more complex ones tend to lose significantly more registers used. In some more extreme cases, like glsl-fs-convolution-1.shader_test, it is more like 7 vs 34 registers!

    Signed-off-by: Rob Clark <[email protected]>
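The general shape of a recursive, dependency-following scheduler can be sketched as below. This is a toy model, not the ir3 scheduler: the `sketch_*` types are hypothetical, and delay slots, register pressure, and address-register handling are all ignored. It only shows how recursion keeps each dependency chain together instead of interleaving every ready path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_SRCS 3

/* Hypothetical instruction node with links to the values it consumes. */
struct sketch_instr {
   int id;
   int nsrcs;
   struct sketch_instr *srcs[SKETCH_MAX_SRCS];
   bool scheduled;
};

/*
 * Schedule 'instr' after recursively scheduling its sources. Emitted ids
 * are appended to 'order'; the updated count is returned.
 */
static int
sketch_sched_instr(struct sketch_instr *instr, int *order, int count)
{
   assert(instr != NULL && instr->nsrcs <= SKETCH_MAX_SRCS);

   if (instr->scheduled)
      return count;

   for (int i = 0; i < instr->nsrcs; i++)
      count = sketch_sched_instr(instr->srcs[i], order, count);

   instr->scheduled = true;
   order[count++] = instr->id;
   return count;
}

/* Walk a (depth-sorted) list of roots, scheduling each in turn. */
static int
sketch_sched_block(struct sketch_instr **list, int n, int *order)
{
   int count = 0;
   for (int i = 0; i < n; i++)
      count = sketch_sched_instr(list[i], order, count);
   return count;
}
```

In this model each source's producers are emitted immediately before their consumer, so a value is created close to where it is used, which is the property the commit message credits for the lower register usage.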
* freedreno/ir3: don't reuse a0.x across blocks | Rob Clark | 2015-12-04 | 1 | -7/+14
    It causes confusion in sched if we need to split_addr() since otherwise we wouldn't easily know which block the new addr instr will be scheduled in. So just side-step the whole situation.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rename ir3_block::bd | Rob Clark | 2015-12-04 | 3 | -11/+11
    We'll need to add similar for ir3_instruction, but following the pattern to use 'id' seems confusing. Let's just go w/ generic 'data' as the name.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: assign varying locations later | Rob Clark | 2015-11-26 | 4 | -29/+37
    Rather than assigning inloc up front, when we don't yet know if it will be unused, assign it last thing before the legalize pass.

    Also, realize when inputs are unused (since for frag shaders we can't rely on them being removed from ir->inputs[]). This doesn't make sense if we don't also dynamically assign the inloc's, since we could end up telling the hw the wrong # of varyings (since we currently assume that the # of varyings and max-inloc are related..)

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: use instr flag to mark unused instructions | Rob Clark | 2015-11-26 | 4 | -14/+24
    Rather than magic depth value, which won't be available in later stages.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: rework vinterp/vpsrepl | Rob Clark | 2015-11-26 | 1 | -12/+36
    Same as previous commit, for a4xx.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: rework vinterp/vpsrepl | Rob Clark | 2015-11-26 | 1 | -12/+37
    Make the interpolation / point-sprite replacement mode setup deal with varying packing. In a later commit, we switch to packing just the varying components that are actually used by the frag shader, so we won't be able to assume everything is vec4's aligned to vec4, which would highly confuse the previous vinterp/vpsrepl logic.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add support for a few gs5 ops | Ilia Mirkin | 2015-11-23 | 1 | -0/+27
    Tested on a4xx. This is part of the builtins added by ARB_gpu_shader5 and GLSL ES 3.10.

    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_query_lod support | Ilia Mirkin | 2015-11-23 | 2 | -6/+20
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: re-emit program on dirty framebuffer | Ilia Mirkin | 2015-11-23 | 1 | -1/+1
    The program emit depends on certain fb details. Make sure those get updated when the fb changes.

    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: use a factor of 32767 for snorm8 blending | Ilia Mirkin | 2015-11-23 | 1 | -5/+34
    It appears that the hardware wants the integer to be scaled the same way that the hardware representation is. snorm16 uses one of the float factors, so this is only relevant for snorm8.

    This fixes a number of subcases of bin/fbo-blending-formats GL_EXT_texture_snorm

    Signed-off-by: Ilia Mirkin <[email protected]>
    Cc: [email protected]
* freedreno/a4xx: only compute texture offset once for the view | Ilia Mirkin | 2015-11-23 | 3 | -13/+6
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_view support | Ilia Mirkin | 2015-11-23 | 3 | -8/+10
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add formats for ARB_texture_buffer_object_rgb32 support | Ilia Mirkin | 2015-11-23 | 3 | -3/+9
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_rgb10_a2ui support | Ilia Mirkin | 2015-11-23 | 2 | -2/+3
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add astc formats | Ilia Mirkin | 2015-11-23 | 2 | -1/+39
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: support 16384 texels in buffer texture | Ilia Mirkin | 2015-11-23 | 2 | -5/+4
    Looks like the width field's bitmask was off-by-one.

    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_buffer_range support | Ilia Mirkin | 2015-11-23 | 3 | -15/+41
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add polygon mode support | Ilia Mirkin | 2015-11-23 | 4 | -4/+26
    Signed-off-by: Ilia Mirkin <[email protected]>
* nir: s/nir_type_unsigned/nir_type_uint | Jason Ekstrand | 2015-11-23 | 1 | -1/+1
    v2: do the same in tgsi_to_nir (Samuel)
    v3: added missing cases after rebase (Iago)
    v4: Add a blank space after '#' in one of the comments (Matt)

    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* freedreno/a4xx: disable blending and alphatest for integer rt0 | Ilia Mirkin | 2015-11-21 | 1 | -2/+13
    Signed-off-by: Ilia Mirkin <[email protected]>
    Cc: [email protected]
* freedreno/a4xx: fix independent blend | Ilia Mirkin | 2015-11-21 | 2 | -2/+3
    This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Cc: [email protected]