path: root/src/gallium/drivers/freedreno
Commit message | Author | Age | Files | Lines
* freedreno/ir3: convert scheduler back to recursive algo | Rob Clark | 2015-12-04 | 2 | -127/+204

  I've played with a few different approaches to tweak instruction priority according to how much they increase/decrease register pressure, etc. But nothing seems to change the fact that, compared to the original (pre-multiple-block-support) scheduler, in some edge cases we are generating shaders w/ 5-6x higher register usage.

  The problem is that the priority queue approach completely loses the dependency between instructions, and ends up scheduling all paths at the same time.

  The original reason for switching was that the recursive approach relied on starting from the shader outputs array. But we can achieve more or less the same thing by starting from the depth-sorted list.

  shader-db results:

    total instructions in shared programs: 113350 -> 105183 (-7.21%)
    total dwords in shared programs: 219328 -> 211168 (-3.72%)
    total full registers used in shared programs: 7911 -> 7383 (-6.67%)
    total half registers used in shader programs: 109 -> 109 (0.00%)
    total const registers used in shared programs: 21294 -> 21294 (0.00%)

             half  full  const  instr  dwords
    helped      0   322      0    711     215
    hurt        0   163      0     38       4

  The shaders hurt tend to gain a register or two. While there are also a lot of helped shaders that only lose a register or two, the more complex ones tend to lose significantly more registers used. In some more extreme cases, like glsl-fs-convolution-1.shader_test, it is more like 7 vs 34 registers!

  Signed-off-by: Rob Clark <[email protected]>
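The difference between the two approaches can be illustrated with a minimal sketch of recursive scheduling. It is not the ir3 code: `struct instr`, `sched_recursive` and `sched_block` are hypothetical names, and the sketch assumes the caller has already sorted the root list the way the commit describes. Each instruction's sources are emitted before the instruction itself, so one dependency chain is finished before the next one starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRCS 4

/* Hypothetical, simplified instruction node; not the real ir3_instruction. */
struct instr {
    int id;
    bool scheduled;
    size_t nsrc;
    struct instr *src[MAX_SRCS];
};

/* Emit all sources of 'in' first, then 'in' itself. Returns the new
 * number of entries written to 'order'. */
static size_t sched_recursive(struct instr *in, int *order, size_t n)
{
    assert(in->nsrc <= MAX_SRCS);
    if (in->scheduled)
        return n;
    for (size_t i = 0; i < in->nsrc; i++)
        n = sched_recursive(in->src[i], order, n);
    in->scheduled = true;
    order[n++] = in->id;
    return n;
}

/* Walk a list of instructions (assumed pre-sorted by the caller) and
 * schedule each one together with everything it depends on. */
static size_t sched_block(struct instr **list, size_t count, int *order)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        n = sched_recursive(list[i], order, n);
    return n;
}
```

With two independent chains feeding two outputs, this emits the whole first chain before touching the second, which is the property the priority queue version lost.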
* freedreno/ir3: don't reuse a0.x across blocks | Rob Clark | 2015-12-04 | 1 | -7/+14

  It causes confusion in sched if we need to split_addr() since otherwise we wouldn't easily know which block the new addr instr will be scheduled in. So just side-step the whole situation.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rename ir3_block::bd | Rob Clark | 2015-12-04 | 3 | -11/+11

  We'll need to add similar for ir3_instruction, but following the pattern to use 'id' seems confusing. Let's just go w/ generic 'data' as the name.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: assign varying locations later | Rob Clark | 2015-11-26 | 4 | -29/+37

  Rather than assigning inloc up front, when we don't yet know if it will be unused, assign it last thing before the legalize pass.

  Also, realize when inputs are unused (since for frag shaders we can't rely on them being removed from ir->inputs[]). This doesn't make sense if we don't also dynamically assign the inlocs, since we could end up telling the hw the wrong # of varyings (since we currently assume that the # of varyings and max-inloc are related..)

  Signed-off-by: Rob Clark <[email protected]>
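As a rough illustration of late assignment, the sketch below hands out locations only to inputs that are actually read, so the total stays in step with the highest assigned location. `struct input` and `assign_inlocs` are hypothetical, and packing by component count is an assumption made for the example, not something this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input record; the real ir3 structures differ. */
struct input {
    bool used;        /* does the frag shader actually read it? */
    unsigned ncomp;   /* number of components read */
    int inloc;        /* assigned location, or -1 if unused */
};

/* Assign packed locations to used inputs only. Returns the total number
 * of locations consumed, which is what would be reported to the hw. */
static unsigned assign_inlocs(struct input *in, size_t count)
{
    unsigned next = 0;

    assert(in != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!in[i].used) {
            in[i].inloc = -1;   /* unused inputs take no location */
            continue;
        }
        in[i].inloc = (int)next;
        next += in[i].ncomp;
    }
    return next;
}
```

Running this as the last step means an input that turns out to be dead never leaves a hole in the location range.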
* freedreno/ir3: use instr flag to mark unused instructions | Rob Clark | 2015-11-26 | 4 | -14/+24

  Rather than magic depth value, which won't be available in later stages.

  Signed-off-by: Rob Clark <[email protected]>
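A small sketch of the idea, with hypothetical names (`INSTR_UNUSED`, `struct instr`, and both helpers are illustrative, not the real ir3 identifiers): a dedicated flag bit survives even if a later pass overwrites or no longer computes the depth field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real ir3 flag name may differ. */
#define INSTR_UNUSED (1u << 0)

struct instr {
    unsigned flags;
    unsigned depth;   /* may be recomputed or stale in later passes */
};

/* Mark an instruction as unused without touching its depth. */
static void instr_mark_unused(struct instr *in)
{
    assert(in != NULL);
    in->flags |= INSTR_UNUSED;
}

/* The query depends only on the flag, never on a magic depth value. */
static bool instr_is_unused(const struct instr *in)
{
    assert(in != NULL);
    return (in->flags & INSTR_UNUSED) != 0;
}
```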
* freedreno/a4xx: rework vinterp/vpsrepl | Rob Clark | 2015-11-26 | 1 | -12/+36

  Same as previous commit, for a4xx.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: rework vinterp/vpsrepl | Rob Clark | 2015-11-26 | 1 | -12/+37

  Make the interpolation / point-sprite replacement mode setup deal with varying packing. In a later commit, we switch to packing just the varying components that are actually used by the frag shader, so we won't be able to assume everything is vec4's aligned to vec4, which would highly confuse the previous vinterp/vpsrepl logic.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add support for a few gs5 ops | Ilia Mirkin | 2015-11-23 | 1 | -0/+27

  Tested on a4xx. This is part of the builtins added by ARB_gpu_shader5 and GLSL ES 3.10.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_query_lod support | Ilia Mirkin | 2015-11-23 | 2 | -6/+20

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: re-emit program on dirty framebuffer | Ilia Mirkin | 2015-11-23 | 1 | -1/+1

  The program emit depends on certain fb details. Make sure those get updated when the fb changes.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: use a factor of 32767 for snorm8 blending | Ilia Mirkin | 2015-11-23 | 1 | -5/+34

  It appears that the hardware wants the integer to be scaled the same way that the hardware representation is. snorm16 uses one of the float factors, so this is only relevant for snorm8.

  This fixes a number of subcases of bin/fbo-blending-formats GL_EXT_texture_snorm

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: only compute texture offset once for the view | Ilia Mirkin | 2015-11-23 | 3 | -13/+6

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_view support | Ilia Mirkin | 2015-11-23 | 3 | -8/+10

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add formats for ARB_texture_buffer_object_rgb32 support | Ilia Mirkin | 2015-11-23 | 3 | -3/+9

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_texture_rgb10_a2ui support | Ilia Mirkin | 2015-11-23 | 2 | -2/+3

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add astc formats | Ilia Mirkin | 2015-11-23 | 2 | -1/+39

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: support 16384 texels in buffer texture | Ilia Mirkin | 2015-11-23 | 2 | -5/+4

  Looks like the width field's bitmask was off-by-one.

  Signed-off-by: Ilia Mirkin <[email protected]>
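To see why a one-bit difference matters at exactly this size, here is a small sketch of packing a value into a register bitfield. The commit does not say how wide the width field was or became; the 14 and 15 bits mentioned below are purely illustrative, and `field_max` and `field_pack` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in a register field 'bits' wide (1..31). */
static uint32_t field_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    return (UINT32_C(1) << bits) - 1;
}

/* Place 'value' into a field starting at bit 'shift'. Asserts instead
 * of silently truncating when the value does not fit. */
static uint32_t field_pack(uint32_t value, unsigned shift, unsigned bits)
{
    assert(shift + bits <= 32);
    assert(value <= field_max(bits));
    return value << shift;
}
```

A 14-bit field tops out at 16383, so a width of 16384 would be cut off by the mask, while a 15-bit field holds it.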
* freedreno/a4xx: add ARB_texture_buffer_range support | Ilia Mirkin | 2015-11-23 | 3 | -15/+41

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add polygon mode support | Ilia Mirkin | 2015-11-23 | 4 | -4/+26

  Signed-off-by: Ilia Mirkin <[email protected]>
* nir: s/nir_type_unsigned/nir_type_uint | Jason Ekstrand | 2015-11-23 | 1 | -1/+1

  v2: do the same in tgsi_to_nir (Samuel)
  v3: added missing cases after rebase (Iago)
  v4: Add a blank space after '#' in one of the comments (Matt)

  Reviewed-by: Iago Toral Quiroga <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* freedreno/a4xx: disable blending and alphatest for integer rt0 | Ilia Mirkin | 2015-11-21 | 1 | -2/+13

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: fix independent blend | Ilia Mirkin | 2015-11-21 | 2 | -2/+3

  This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: enable ARB_base_instance support | Ilia Mirkin | 2015-11-21 | 1 | -1/+1

  We already pass in start_instance in fd4_draw. Expose the extension.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: set fetchsize in mem2gmem texture restore | Ilia Mirkin | 2015-11-21 | 1 | -1/+2

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add 11_11_10_float vertex type support | Ilia Mirkin | 2015-11-21 | 2 | -1/+2

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: fix 3d texture setup | Ilia Mirkin | 2015-11-21 | 3 | -3/+7

  Same fix as on a3xx - set the second (tiny) layer size bitfield to the smallest level's size so that the hw knows not to minify beyond that.

  This fixes texelFetch sampler3D piglits.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: only align slices in non-layer_first textures | Ilia Mirkin | 2015-11-21 | 1 | -2/+4

  When layer is the container, slices are tightly packed inside of each layer. We don't need any additional alignment. On a3xx, each slice contains all the layers, so having alignment makes sense.

  This fixes a whole slew of array-related piglits, including texelFetch and tex-miplevel-selection varieties.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev | Ilia Mirkin | 2015-11-20 | 2 | -4/+8

  Same as commit 84d087aea but for a4xx. The RE'd enums had the same issue too.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: use hardware RGTC texture samplers | Ilia Mirkin | 2015-11-20 | 6 | -24/+19

  a4xx hardware has real support for RGTC so there's no need to fake it like we do on a3xx. Undo the hacks, and keep track of an "internal format" of a resource, which on a3xx will be different, triggering the transfer-time conversions to take place.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: hook up RGB565 format | Ilia Mirkin | 2015-11-20 | 2 | -1/+2

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: logic op handling | Ilia Mirkin | 2015-11-20 | 6 | -29/+35

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add 16-bit unorm/snorm format texturing/rendering | Ilia Mirkin | 2015-11-20 | 2 | -25/+47

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: point regid to "red" even for alpha-only rb formats | Ilia Mirkin | 2015-11-20 | 1 | -7/+0

  Looks like a4xx hw does this in a more standard way and we don't need to hack around it like we do on a3xx.

  Fixes GL_ALPHA formats in fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno: always set all border colors | Ilia Mirkin | 2015-11-20 | 1 | -30/+8

  Instead of playing the guessing game as to which texture format reads from which border color encoding type, just write both of them always.

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: fix dst_alpha blend for RGBX render targets | Ilia Mirkin | 2015-11-20 | 3 | -5/+32

  There are no native RGBX render formats, so we must manually force dst_alpha to be one, same as for a3xx.

  Signed-off-by: Ilia Mirkin <[email protected]>
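One common way to force destination alpha to one is to rewrite the blend factors that read it. The sketch below shows that remapping with a hypothetical `enum blend_factor` and a hypothetical `fix_dst_alpha` helper; Gallium uses its own PIPE_BLENDFACTOR_* values, and this is not necessarily how the commit implements the fix.

```c
#include <assert.h>

/* Hypothetical blend-factor enum for illustration only. */
enum blend_factor {
    FACTOR_ZERO,
    FACTOR_ONE,
    FACTOR_SRC_ALPHA,
    FACTOR_DST_ALPHA,
    FACTOR_INV_DST_ALPHA,
    FACTOR_COUNT
};

/* With destination alpha treated as 1.0, DST_ALPHA becomes ONE and
 * (1 - DST_ALPHA) becomes ZERO. Other factors are left unchanged. */
static enum blend_factor fix_dst_alpha(enum blend_factor f)
{
    assert((int)f >= 0 && (int)f < FACTOR_COUNT);
    switch (f) {
    case FACTOR_DST_ALPHA:
        return FACTOR_ONE;
    case FACTOR_INV_DST_ALPHA:
        return FACTOR_ZERO;
    default:
        return f;
    }
}
```

A driver would apply such a remap to both the RGB and alpha source/destination factors only when the bound render target is an RGBX-style format.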
* freedreno/a4xx: add BPTC support | Ilia Mirkin | 2015-11-20 | 2 | -0/+8

  Signed-off-by: Ilia Mirkin <[email protected]>
* nir: Add nir_texop_samples_identical opcode | Ian Romanick | 2015-11-19 | 1 | -0/+3

  This is the NIR analog to GLSL IR ir_samples_identical.

  v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by Ken and Jason.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* freedreno/a4xx: fix 5_5_5_1 texture sampler format | Ilia Mirkin | 2015-11-19 | 1 | -1/+1

  This fixes teximage-colors, fbo-generatemipmap-formats, and probably others (in relation to the RGB5 formats, others still fail).

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: add depth clamp and halfz clip | Ilia Mirkin | 2015-11-19 | 3 | -4/+9

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: allow seamless cubemap filtering to be enabled per-texture | Ilia Mirkin | 2015-11-19 | 3 | -1/+3

  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: support lod_bias | Ilia Mirkin | 2015-11-19 | 2 | -0/+7

  The lower layers assume that we support this, and it's been core since GL 1.4. This fixes a slew of piglit tests, especially around tex-miplevel-selection.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* freedreno/a4xx: add fake RGTC support (required for GL3) | Rob Clark | 2015-11-18 | 3 | -1/+22

  The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support (required for GL3)'

  TODO some more r/e.. maybe we get lucky and hw supports some of this directly? For now this will help us enable gl3.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: add compressed texture formats | Rob Clark | 2015-11-18 | 2 | -2/+26

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2015-11-18 | 5 | -11/+37

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: expose GLSL 140 and fake MSAA for GL3.0/3.1 support | Ilia Mirkin | 2015-11-18 | 1 | -2/+2

  Signed-off-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix texture buffers, enable offsets | Ilia Mirkin | 2015-11-18 | 3 | -14/+32

  The main issue is that the current logic looked into cso->u.tex, which is the wrong side of the union to look into for texture buffers. While I was at it, it was easy enough to add the logic to handle offsets (first_element).

  - reduce texture buffer size limit (determined experimentally)
  - don't look at first/last levels, instead look at first/last element
  - include the first element offset
  - set offset alignment to 16 (determined experimentally)

  Signed-off-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
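The union mix-up described above can be sketched with a simplified stand-in for a sampler view. `struct view` and both helpers are hypothetical; only the `u.tex` side and the first/last element fields are taken from the commit text, and the real Gallium struct has more members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical sampler view: the same storage is read either
 * as mip levels (textures) or as an element range (buffers). */
struct view {
    bool is_buffer;
    union {
        struct { unsigned first_level, last_level; } tex;
        struct { unsigned first_element, last_element; } buf;
    } u;
};

/* Number of elements covered by a buffer view (inclusive range). */
static unsigned view_elements(const struct view *v)
{
    assert(v != NULL && v->is_buffer);   /* u.tex would be meaningless here */
    return v->u.buf.last_element - v->u.buf.first_element + 1;
}

/* Byte offset of the first element, given the element size in bytes. */
static unsigned view_byte_offset(const struct view *v, unsigned elem_size)
{
    assert(v != NULL && v->is_buffer);
    return v->u.buf.first_element * elem_size;
}
```

Reading `u.tex.first_level` on a buffer view would reinterpret the element range as mip levels, which is the kind of bug the commit describes.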
* freedreno: add support for conditional rendering, required for GL3.0 | Ilia Mirkin | 2015-11-18 | 6 | -6/+56

  A smarter implementation would make it possible to attach this to emit state for the BY_REGION versions to avoid breaking the tiling. But this is a start.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add fake RGTC support (required for GL3) | Ilia Mirkin | 2015-11-18 | 5 | -27/+175

  Also throw in LATC while we're at it (same exact format). This could be made more efficient by keeping a shadow compressed texture to use for returning at map time. However... it's not worth it for now... presumably compressed textures are not updated often.

  Lastly fix up Z32S8 transfers to non-0 layers.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev | Ilia Mirkin | 2015-11-18 | 2 | -4/+8

  The previously RE'd formats were from an ES driver implementing OES_vertex_type_10_10_10_2 and thus backwards. A future change could add the 2_10_10_10 support.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: fix for stk binning pass hang | Rob Clark | 2015-11-18 | 3 | -19/+76

  We'd end up in a state where shader uses no inputs, yet num_elements is greater than zero. Triggered by a TF vertex shader which did:

    gl_Position = vec4(0.0, 0.0, 0.0, 0.0);

  resulting in a binning pass variant with no inputs.

  Includes equiv fix in a4xx, even though we don't have binning-pass enabled yet on a4xx.

  Signed-off-by: Rob Clark <[email protected]>