path: root/src/gallium/drivers/freedreno
Commit message (Author, Age, Files, Lines)
* freedreno/a6xx: disallow UBWC for x24s8 (Rob Clark, 2019-06-17, 1 file, -4/+15)
| | | | | | | | | Fixes: dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: un-swap X24S8_UINT (Rob Clark, 2019-06-17, 2 files, -5/+6)
| | | | | | | | | | | | | | | | | | | The stencil is actually in the .w component, but we used to use SWAP to remap the channels. This doesn't work when tiled/ubwc. Fixes: dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d_array dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_cube dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d_array dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_cube dEQP-GLES31.functional.stencil_texturing.misc.base_level dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_pot dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_npot dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot dEQP-GLES31.functional.texture.border_clamp.sampler.uint_stencil Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: re-enable UBWC for depth/stencil (Rob Clark, 2019-06-15, 1 file, -0/+2)
| | | | | | | | Now that we can blit depth/stencil in a way that plays nicely with UBWC, re-enable it. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle z24s8/z24x8 blits with u_blitter (Rob Clark, 2019-06-15, 2 files, -25/+11)
| | | | | | | | | | Now that it can turn these blits into rendering to RB6_Z24_UNORM_S8_UINT it can properly handle cases where only one of depth+stencil is being blit. And this avoids lying about the format, which completely doesn't work when UBWC is used. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle fallback for rewritten blits ourself (Rob Clark, 2019-06-15, 1 file, -11/+37)
| | | | | | | | | For re-written z/s blits, we want to use the re-written `pipe_blit_info` even if we have to fall back to the 3d pipe (`u_blitter`). So handle that fallback ourselves. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: rename variable (Rob Clark, 2019-06-15, 1 file, -39/+39)
| | | | | | | | | The name 'separate' doesn't make a whole lot of sense, as in only one of the cases is the blit actually split. But split out from previous patch in an attempt to reduce the noise. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: consolidate z/s blit handling (Rob Clark, 2019-06-15, 1 file, -67/+46)
| | | | | | | This will get even simpler with the next patch Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix MAX_INDICES (Rob Clark, 2019-06-13, 1 file, -2/+1)
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/blitter: remove dead code (Rob Clark, 2019-06-13, 1 file, -7/+0)
| | | | | | | The src/dst format is overridden from the pipe_blit_info, so this logic just serves to confuse the reader. Signed-off-by: Rob Clark <[email protected]>
* freedreno: turn staging cube into 2d-array (Rob Clark, 2019-06-13, 1 file, -0/+2)
| | | | | | | Since we could only need a subset of the layers, and otherwise we trigger an assert in util_max_layer() Signed-off-by: Rob Clark <[email protected]>
* freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0) (Nicolai Hähnle, 2019-06-12, 5 files, -12/+12)
| | | | | | | | | This is more expressive and simplifies a subsequent change. v2: - fix one more call-site after rebase Reviewed-by: Marek Olšák <[email protected]>
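The idea behind this cleanup can be sketched with a simplified stand-in for a growable array. The `toy_dynarray` type and both helpers below are hypothetical illustrations, not Mesa's actual `util_dynarray` implementation; the sketch only assumes that clearing means "set the used size to zero and keep the allocation".

```c
#include <stdlib.h>

/* Hypothetical, simplified growable byte array (not Mesa's util_dynarray). */
struct toy_dynarray {
   void *data;
   size_t size;      /* bytes currently in use */
   size_t capacity;  /* bytes currently allocated */
};

/* Set the used size, growing the allocation when needed.
 * Returns 1 on success, 0 if the allocation failed. */
static int toy_dynarray_resize(struct toy_dynarray *a, size_t newsize)
{
   if (newsize > a->capacity) {
      void *p = realloc(a->data, newsize);
      if (!p)
         return 0;
      a->data = p;
      a->capacity = newsize;
   }
   a->size = newsize;
   return 1;
}

/* Same effect as toy_dynarray_resize(a, 0), but the intent is explicit
 * and there is no failure path for the caller to think about. */
static void toy_dynarray_clear(struct toy_dynarray *a)
{
   a->size = 0;
}
```

In this sketch a call site that previously wrote `toy_dynarray_resize(&a, 0)` reads as `toy_dynarray_clear(&a)`, which is the kind of substitution the commit describes.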
* freedreno/a5xx: enable a540 (Rob Clark, 2019-06-11, 2 files, -2/+14)
| | | | | Tested-by: Jeffrey Hugo <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: enable UBWC by default (Rob Clark, 2019-06-11, 3 files, -18/+3)
| | | | | | | | Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the obsolete comment (and bonus, drop unused softpin debug flag) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: disallow UBWC for z24s8 (Rob Clark, 2019-06-11, 1 file, -1/+0)
| | | | | | | | | | | | | | This is slightly annoying because it *mostly* works.. but we have some issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before we can enable UBWC by default. For now it is a step forward to at least enable it for non-z/s while we figure out how to blit z24s8+UBWC. (The basic issue is that pretending z24s8 is an equivalently sized rgba format for the purpose of blitting falls apart when UBWC is in the picture.) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: use correct UBWC reg builders (Rob Clark, 2019-06-11, 2 files, -11/+11)
| | | | | | | | No functional change, the registers have the same layout as MRT flags pitch reg. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: disable UBWC for some formats (Rob Clark, 2019-06-11, 1 file, -2/+0)
| | | | | | | | | | | | An older blob claims to support UBWC w/ r32ui and r32i, but not r32f. Results from deqp indicate that it doesn't work with r32ui and r32i. This *could* also just mean that use as "IBO" (image) is more limited than as texture, although blob also doesn't seem to bother to try to use UBWC with images at all, so hard to know for sure. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle non-UWC-compatible image views (Rob Clark, 2019-06-11, 5 files, -1/+45)
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle non-UBWC-compatible texture views (Rob Clark, 2019-06-11, 3 files, -0/+23)
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: add helper to uncompress UBWC resource (Rob Clark, 2019-06-11, 2 files, -0/+37)
| | | | | | | | | | | | | | | | | | | | | We'll need this for a few edge cases, like image/sampler view that uses a format that UBWC does not support with a resource originally created in a format that UBWC does support. NOTE we *could* in some cases do an in-place uncompress. But that has a couple potential sharp edges: 1) the uncompressed buffer could have different layout, ie. a5xx with meta and pixel data of layers/levels interleaved. 2) if it comes mid-batch, it would force flush, or somehow fixing up cmdstream for draws already emitted. But with the resource shadowing approach we can rely on batch re-ordering to avoid splitting things.. older draws see the older compressed version, newer draws see the new uncompressed version of the rsc. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: handle images in rebind_resource() (Rob Clark, 2019-06-11, 1 file, -0/+9)
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: allow null discard box in shadow path (Rob Clark, 2019-06-11, 1 file, -4/+10)
| | | | | | | When uncompressing a UBWC buffer, we don't want to discard anything. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: swap UBWC state in shadow path (Rob Clark, 2019-06-11, 1 file, -0/+4)
| | | | | | | | | It doesn't come up yet, as so far we only hit this path with linear buffers. But it will when we start re-using the shadow path for uncompressing UBWC buffers. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: add modifier param to fd_try_shadow_resource() (Rob Clark, 2019-06-11, 1 file, -3/+5)
| | | | | | | | To uncompress UBWC, I want to re-use the shadow path, but we'll need a way to request that the new buffer is not compressed. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: correct modifier for UBWC buffers (Rob Clark, 2019-06-11, 1 file, -0/+3)
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a5xx: Fix indirect draw max_indices calculation (Eduardo Lima Mitev, 2019-06-11, 1 file, -2/+1)
| | | | | | | | | | | | | | | | | The number of elements to draw should not be affected by the offset. A similar fix was submitted for a6xx at 79180a05. Fixes these dEQP tests on a5xx: dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500 Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: re-arrange program stageobj/group (Rob Clark, 2019-06-07, 4 files, -30/+58)
| | | | | | | | | | | | | | Split out a separate program config state group to run early before the other groups. This seems to help w/ intermittent "missed tiles" (although I had assumed that was a mem2gmem issue), or at least I can't reproduce that issue with this patch, but can without. It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix hangs with newer sqe fw (Rob Clark, 2019-06-07, 1 file, -32/+81)
| | | | | | | | | | | | | | | | | | | With the newer (v1.76) fw, we were getting hangs (compared to older v1.66 fw). Re-work the GMEM code to structure things a bit closer to the blob. This moves some PKT7 packets from IB2 to IB1, which I think is what was confusing SQE and causing it to get stuck in an infinite loop. But in general structuring things at least closer to the same way blob does makes it easier to compare cmdstream. Note: this is a bit on the large side for what I'd normally consider for stable.. but right now it is looking like it is the newer fw that is headed for linux-firmware. This should defn have some soak time on master, but probably a good idea for this patch to end up in distro mesa builds by the time a630_sqe.fw hits linux-firmware. Cc: [email protected] Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: WFI before RB_CCU_CNTL writes (Rob Clark, 2019-06-07, 2 files, -0/+4)
| | | | | | | | | | | | This seems to be in a block of non buffered/context regs. Blob always WFIs before write, so probably a good idea. Annoyingly, compared to earlier gens, it is a bit harder to tell from the register offset whether it is a buffered reg, it isn't as simple as everything below 0x2000, it seems. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: don't pre-dispatch texture fetch on accident (Rob Clark, 2019-06-07, 1 file, -1/+4)
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix issues with gallium HUD (Rob Clark, 2019-06-07, 1 file, -5/+8)
| | | | | | | | | | | | | In some cases the draw for the text wasn't working. This seems to be fixed by resyncing some of the "golden registers" from blob (initial values were based on somewhat older blob version). Perhaps good to have a bit of soak time on master, but would be good to eventually land in 19.x stable branches. Cc: [email protected] Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Drop struct stage array (Kristian H. Kristensen, 2019-06-07, 1 file, -144/+80)
| | | | | | | | | | | | | | | This now boils down to just picking between binning or vertex shader and dummy_fs or real fs, which we can do in a couple of lines of code instead. The constlen logic isn't doing what it thinks it's doing, both constlens at this point MAX2(s[VS].constlen, align(state->bs->constlen, 4)); are binning shader constlens. We'll have to revisit the constlen logic, but this commit doesn't change how it works. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
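For readers unfamiliar with the expression quoted in this message, the arithmetic it performs can be sketched as follows. The helper names here (`SKETCH_MAX2`, `sketch_align`, `combined_constlen`) are local stand-ins written for this illustration and are assumptions, not the driver's code; the sketch only shows what "the larger of one constlen and another constlen rounded up to a multiple of 4" evaluates to.

```c
/* Local stand-in for a "maximum of two values" macro. */
#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Round value up to the next multiple of alignment.
 * Assumes alignment is a power of two. */
static unsigned sketch_align(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: combine a vertex-shader constlen with a
 * binning-shader constlen the way the quoted expression does. */
static unsigned combined_constlen(unsigned vs_constlen, unsigned bs_constlen)
{
   return SKETCH_MAX2(vs_constlen, sketch_align(bs_constlen, 4));
}
```

The commit's point is that, at that place in the code, both inputs were in fact binning-shader constlens, so the maximum did not compare what it appeared to compare; the sketch does not model that bug, only the arithmetic.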
* freedreno/a6xx: Drop support for SS6_DIRECT shader upload (Kristian H. Kristensen, 2019-06-07, 1 file, -30/+3)
| | | | | | | a6xx only supports indirect shaders. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: Share shader_t_to_opcode (Kristian H. Kristensen, 2019-06-07, 3 files, -35/+21)
| | | | | | | | We have a similar function in fd6_program.c. Move to fd6_emit.h and share. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo (Kristian H. Kristensen, 2019-06-07, 1 file, -31/+24)
| | | | | | | | | There's already a bit of duplicated logic here and tessellation will add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the process. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno: Move fd4_size2indextype() helper to freedreno_util.h (Kristian H. Kristensen, 2019-06-07, 2 files, -13/+13)
| | | | | | | In preparation for refactoring fd6_draw.c a bit. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Combine lower_fmod16/32 back into a single lower_fmod. (Kenneth Graunke, 2019-06-05, 1 file, -1/+1)
| | | | | | | | | | | | | | We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <[email protected]>
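As background for what an fmod lowering typically produces, here is a minimal sketch of the usual floor-based identity, `a - b * floor(a / b)`. This illustrates the general technique and is not a copy of the NIR rule; `sketch_floor` and `lowered_fmod` are hypothetical names, and the floor helper is a deliberately simple version that only handles values fitting in a `long`.

```c
/* Floor for values that fit in a long; written out by hand so the
 * sketch does not depend on the math library. */
static float sketch_floor(float x)
{
   float t = (float)(long)x;        /* the cast truncates toward zero */
   return (t > x) ? t - 1.0f : t;   /* step down for negative fractions */
}

/* fmod expressed with only divide, floor, multiply and subtract. */
static float lowered_fmod(float a, float b)
{
   return a - b * sketch_floor(a / b);
}
```

With this definition the result takes the sign of `b`, so `lowered_fmod(-5.5f, 2.0f)` is `0.5f` rather than `-1.5f`. The same sequence applies regardless of the operand bit size, which fits the commit's argument that a single flag is enough.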
* freedreno/a6xx: Use VALIDREG in next_regid() helper (Kristian H. Kristensen, 2019-06-05, 1 file, -6/+6)
| | | | Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Remove dead code from a5xx (Kristian H. Kristensen, 2019-06-05, 1 file, -10/+0)
| | | | Reviewed-by: Rob Clark <[email protected]>
* freedreno: Drop invalid scissor optimization. (Eric Anholt, 2019-06-04, 1 file, -7/+0)
| | | | | | | We do support TF now, so it's no longer valid. Besides, if we want this optimization, we should probably have mesa/st doing it right for everyone. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Add printf pattern string. (Bas Nieuwenhuizen, 2019-06-04, 1 file, -1/+1)
| | | | | | | Some new flag setting disallows it due to being a security risk. Fixes: c9c1e261064 "mesa: prevent common string formatting security issues" Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: fix counting and printing for half registers. (Hyunjun Ko, 2019-06-03, 2 files, -2/+2)
| | | | | v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION (Neil Roberts, 2019-06-03, 2 files, -6/+2)
| | | | | | | | | | | | | | | | | | | Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending on whether half_precision was set in the shader key. With support for mediump precision, it is possible to have different outputs use different precisions. That means we can’t have a global shader state to specify it. Instead it now tries to copy the half-float-ness from the nir_variable for the output into the ir3_shader_variant. This is then used to decide whether to set half-precision for each output. The a6xx version is copied from the a5xx code but it has not been tested. v2. [Hyunjun Ko ([email protected])] There's the half flag recently added, which represents precision based on IR3_REG_HALF. Now use this flag to avoid duplication. Signed-off-by: Rob Clark <[email protected]>
* nir: remove bool lowering from lower_int_to_float (Jonathan Marek, 2019-05-31, 1 file, -0/+1)
| | | | | | | | | | | | | | Removes the bool_to_float logic from the int_to_float pass, so that both can be used separately. By having separate passes we have better validation and it makes it possible to use with the lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should probably be reworked). For now we always expect lower_bool to come after lower_int. Also fixes f2i32 to become ftrunc and adds u2f/f2u cases. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add lower_bitshift option (Jonathan Marek, 2019-05-31, 1 file, -0/+1)
| | | | | | | | | Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
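The rewrite this option enables rests on a simple identity: a left shift by a constant `n` equals a multiply by `2^n` (with the usual unsigned wrap-around). A minimal sketch, using a hypothetical helper name and assuming 32-bit unsigned values and a shift count below 32:

```c
#include <stdint.h>

/* Compute "x << shift" as "x * 2^shift" for a constant shift in [0, 31].
 * The factor is built by repeated doubling so no shift is used at all,
 * which is the property a float-only backend would care about. */
static uint32_t ishl_as_mul(uint32_t x, unsigned shift)
{
   uint32_t factor = 1;
   for (unsigned i = 0; i < shift; i++)
      factor *= 2u;
   return x * factor;   /* wraps modulo 2^32, like the shift does */
}
```

In a compiler the factor would be folded at compile time because the shift amount is a constant; the loop here only stands in for that constant folding.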
* freedreno/a6xx: add 'type' to shader state key (Rob Clark, 2019-05-31, 2 files, -0/+2)
| | | | | | | | | | | | | | | | | | | | We could have identical texture state for both VS and FS.. which would result in VS state getting created first, and FS state mapping to the identical cmdstream. Resulting in VS state getting emitted twice and no FS state emitted. Fixes: dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
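The failure mode described here is a cache-key collision: two lookups that should yield different state objects compare equal because the key omits the shader stage. A minimal sketch under stated assumptions follows; `sketch_tex_key`, its fields, and `keys_equal` are hypothetical and much simpler than the driver's real key.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical shader stages for the illustration. */
enum sketch_stage { SKETCH_VS = 0, SKETCH_FS = 1 };

/* Hypothetical cache key. Without the 'type' field, a VS lookup and an
 * FS lookup with identical texture state would hash and compare equal. */
struct sketch_tex_key {
   uint32_t tex_seqno;  /* stands in for "which textures are bound" */
   uint32_t type;       /* shader stage the state object is built for */
};

/* Byte-wise comparison, as a hash table keyed on the raw struct would do.
 * Both fields are uint32_t, so the struct has no padding bytes. */
static int keys_equal(const struct sketch_tex_key *a,
                      const struct sketch_tex_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

With the stage in the key, identical texture bindings for VS and FS map to two distinct cache entries, so each stage gets its own state object emitted.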
* freedreno/a6xx: fix GPU crash on small render targets (Rob Clark, 2019-05-31, 1 file, -0/+7)
| | | | | | | Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels Signed-off-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]>
* spirv: Change spirv_to_nir() to return a nir_shader (Caio Marcelo de Oliveira Filho, 2019-05-29, 1 file, -4/+4)
| | | | | | | | | | | | | | | spirv_to_nir() returned the nir_function corresponding to the entrypoint, as a way to identify it. There's now a bool is_entrypoint in nir_function and also a helper function to get the entry_point from a nir_shader. The return type reflects better what the function name suggests. It also helps drivers avoid the mistake of reusing internal shader references after running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be invalidated right in the first pass executed. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Drop imov/fmov in favor of one mov instruction (Jason Ekstrand, 2019-05-24, 1 file, -14/+11)
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* gallium: Change PIPE_CAP_TGSI_FS_FBFETCH bool to PIPE_CAP_FBFETCH count (Kenneth Graunke, 2019-05-23, 1 file, -1/+1)
| | | | | | | | | | | | | | TGSI's FBFETCH instruction currently only supports reading from a single render target, but NIR intrinsics can support multiple render targets. radeonsi can only support fetching from RT 0, but other drivers may be able to support fetching from any render target. To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?" to an integer number of render targets which can be fetched. Reviewed-by: Marek Olšák <[email protected]>
* freedreno/a6xx: WFI in program stateobj too (Rob Clark, 2019-05-20, 1 file, -0/+2)
| | | | | | | | | | | This "fixes" hangs seen w/ various android games. I think it is a similar issue to the one with constant state: we need to avoid CP_LOAD_STATE until the previous draw completes. It isn't entirely clear why blob doesn't need to do this, but it might have a different way to accomplish the same thing. Signed-off-by: Rob Clark <[email protected]>