Commit message (Author, Age, Files, Lines)
* radeonsi/gfx10: set cache control registers (Marek Olšák, 2019-07-03, 1 file, -0/+19)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders (Marek Olšák, 2019-07-03, 8 files, -11/+71)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: set PA_SC_TILE_STEERING_OVERRIDE (Marek Olšák, 2019-07-03, 3 files, -0/+5)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: add a workaround for stencil HTILE with mipmapping (Marek Olšák, 2019-07-03, 6 files, -12/+28)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: disable DCC with MSAA (Marek Olšák, 2019-07-03, 1 file, -1/+6)
  It was only enabled for 2x MSAA anyway.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix GL_LINE polygon mode for decomposed primitives (Marek Olšák, 2019-07-03, 5 files, -3/+24)
  We need to tell PA to accept edge flags generated by the input assembler, because decomposed primitives shouldn't draw inner edges.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix NGG GS color clamping (Marek Olšák, 2019-07-03, 1 file, -0/+4)
  Just need to pass the input from ES to GS. Everything else is done.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix vertex color clamping for TES (Marek Olšák, 2019-07-03, 1 file, -5/+18)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: unbind NGG shaders when destroyed (Marek Olšák, 2019-07-03, 1 file, -0/+9)
  This fixes glsl-max-varyings, which creates shaders, draws, and then destroys them.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: don't use the GS workaround for triangle strips w/ adjacency (Marek Olšák, 2019-07-03, 1 file, -1/+1)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: don't do the query buffer atomic for blit shaders (Marek Olšák, 2019-07-03, 1 file, -23/+26)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: update spi_map if API VS (as NGG) changes and PS doesn't (Marek Olšák, 2019-07-03, 1 file, -1/+3)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0 (Marek Olšák, 2019-07-03, 1 file, -0/+8)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: prefetch HW GS when NGG is used (Marek Olšák, 2019-07-03, 1 file, -2/+2)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load (Nicolai Hähnle, 2019-07-03, 1 file, -3/+1)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix PS exports for SPI_SHADER_32_AR (Marek Olšák, 2019-07-03, 1 file, -3/+9)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: set DLC for loads when GLC is set (Marek Olšák, 2019-07-03, 4 files, -17/+34)
  This fixes L1 shader array cache coherency.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix shader images (Marek Olšák, 2019-07-03, 1 file, -2/+3)
  Don't promote 2D image instructions to 3D, and don't set z=BASE_ARRAY.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: set the DCC constant encoding flag (Marek Olšák, 2019-07-03, 1 file, -1/+2)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix intensity formats (Marek Olšák, 2019-07-03, 5 files, -14/+23)
  Move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: allocate GDS BOs for streamout (Marek Olšák, 2019-07-03, 4 files, -10/+40)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: make sure GDS is idle between IBs (Marek Olšák, 2019-07-03, 2 files, -9/+28)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement streamout (Nicolai Hähnle, 2019-07-03, 4 files, -33/+618)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement streamout-related queries (Nicolai Hähnle, 2019-07-03, 12 files, -3/+903)
  The NGG hardware pipeline doesn't track these statistics automatically, and in fact *cannot* track them automatically when API geometry shaders are involved, so we accumulate statistics in the shader using atomic adds.
  This implementation accumulates statistics via the memory system and the RW buffer descriptor setup. We could use GDS, but since these atomics aren't latency-sensitive, that basically just trades off L2$ bandwidth vs. export bus bandwidth. One single memory transaction per shader workgroup doesn't seem too bad.
  The result ring buffer in memory is needed either way to avoid pipeline stalls.
  The shader code contains the atomic unconditionally, though the GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The atomic is simply discarded by the shader hardware in that case.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: enable the workaround for unaligned vertex fetch (Nicolai Hähnle, 2019-07-03, 1 file, -1/+3)
  Yes, really. Note that non-format buffer loads are unaffected and work just fine with unaligned pointers (as long as SH_MEM_CONFIG is set up correctly, which amdgpu ensures).
  Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: re-order the initialization order in si_compile_tgsi_main (Nicolai Hähnle, 2019-07-03, 1 file, -32/+32)
  It's useful to be able to access gs_ngg_scratch before creating the main wrapping branch.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: apply DCC MSAA blend workaround (Nicolai Hähnle, 2019-07-03, 1 file, -3/+1)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_emit_global_shader_pointers (Nicolai Hähnle, 2019-07-03, 1 file, -1/+12)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_init_tess_factor_ring (Nicolai Hähnle, 2019-07-03, 1 file, -1/+4)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: initialize EXEC for TES-as-NGG (without geometry shader) (Nicolai Hähnle, 2019-07-03, 1 file, -1/+3)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: use correct VGPR for instance ID in LS shader (Nicolai Hähnle, 2019-07-03, 1 file, -2/+7)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_shader_hs (Nicolai Hähnle, 2019-07-03, 1 file, -7/+26)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_create_sampler_state (Nicolai Hähnle, 2019-07-03, 1 file, -5/+10)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: double the number of tessellation offchip buffers per SE (Nicolai Hähnle, 2019-07-03, 1 file, -2/+4)
  Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale the number of offchip buffers accordingly.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement get_tess_ring_descriptor (Nicolai Hähnle, 2019-07-03, 1 file, -7/+14)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: mask DCC tile swizzle by alignment (Nicolai Hähnle, 2019-07-03, 2 files, -2/+7)
  DCC alignment can be less than the alignment of the main surface. In that case, the DCC tile swizzle needs to be masked accordingly.
  Should have no impact on pre-gfx10.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement hardware MSAA resolve (Nicolai Hähnle, 2019-07-03, 5 files, -2/+17)
  MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile optimization that we use on gfx9 and earlier does not work.
  Be very explicit about how the swizzle mode of the temporary surface is selected.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix binding on si_update_scratch_relocs (Nicolai Hähnle, 2019-07-03, 1 file, -3/+7)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: set llvm_has_working_vgpr_indexing (Nicolai Hähnle, 2019-07-03, 1 file, -3/+2)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement load_const_buffer_desc_fast_path (Nicolai Hähnle, 2019-07-03, 1 file, -7/+14)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: take PRIMID from the correct output when exported by GS (Nicolai Hähnle, 2019-07-03, 1 file, -2/+2)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: change location of instance ID shader input (Nicolai Hähnle, 2019-07-03, 1 file, -2/+11)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: set USER_DATA_ADDR offset for geometry shaders (Nicolai Hähnle, 2019-07-03, 1 file, -2/+8)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_emit_derived_tess_state (Nicolai Hähnle, 2019-07-03, 1 file, -2/+6)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_shader_gs (Nicolai Hähnle, 2019-07-03, 1 file, -15/+29)
  This is only used in the legacy, non-NGG path.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement preload_ring_buffers (Nicolai Hähnle, 2019-07-03, 1 file, -11/+20)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: implement si_set_ring_buffer (Nicolai Hähnle, 2019-07-03, 1 file, -2/+9)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: allow rectangle outputs from NGG primitive shader (Nicolai Hähnle, 2019-07-03, 1 file, -0/+1)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: emit VGT_GS_OUT_PRIM_TYPE from draw and add it to VS_STATE (Nicolai Hähnle, 2019-07-03, 5 files, -48/+52)
  With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.
  The VS_STATE is required for both streamout and culling from a vertex shader without pre-compiling outprim-specific variants. We could consider compiling specialized variants in the future. We could also consider compiling the NGG logic as an epilog.
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: NGG geometry shader PM4 and upload (Nicolai Hähnle, 2019-07-03, 5 files, -29/+316)
  Acked-by: Bas Nieuwenhuizen <[email protected]>
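The "unbind NGG shaders when destroyed" commit deals with a test that creates shaders, draws, and then destroys them. A minimal sketch of the general pattern, assuming a simplified context that caches a pointer to the currently bound shader; the types and `destroy_ngg_shader` are hypothetical and are not radeonsi's actual code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified types: a shader object and a context
 * that remembers which NGG shader is currently bound. */
struct ngg_shader {
   int id;
};

struct gfx_context {
   struct ngg_shader *bound_ngg;
};

/* Destroy a shader, first clearing any binding that still points at it,
 * so the context never keeps a dangling pointer to freed memory. */
static void destroy_ngg_shader(struct gfx_context *ctx, struct ngg_shader *shader)
{
   assert(ctx != NULL && shader != NULL);
   if (ctx->bound_ngg == shader)
      ctx->bound_ngg = NULL;
   free(shader);
}
```

Destroying a shader that is not bound leaves the current binding untouched; only a matching pointer is cleared.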
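The "set DLC for loads when GLC is set" commit states a simple rule. A sketch of that rule, assuming cache-policy bits modeled as a plain bitmask; the `POLICY_*` names and `load_cache_policy` are hypothetical illustrations, not Mesa's actual enum or helper:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache-policy bits for a memory load. */
enum load_cache_policy_bits {
   POLICY_GLC = 1u << 0, /* globally coherent load */
   POLICY_SLC = 1u << 1, /* system level coherent */
   POLICY_DLC = 1u << 2, /* gfx10: device level coherent (L1 shader array cache) */
};

/* Returns the policy to use for a load: on gfx10, any load that
 * requests GLC also gets DLC, so it stays coherent with respect to the
 * L1 shader array cache. Other bits pass through unchanged. */
static unsigned load_cache_policy(unsigned requested, bool is_gfx10)
{
   /* DLC is only meaningful on gfx10, so callers must not request it earlier. */
   assert(!(requested & POLICY_DLC) || is_gfx10);

   unsigned policy = requested;
   if (is_gfx10 && (policy & POLICY_GLC))
      policy |= POLICY_DLC;
   return policy;
}
```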
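The "implement streamout-related queries" commit describes accumulating statistics with one atomic memory transaction per workgroup, where a null query buffer causes the atomic to be discarded. A CPU-side sketch of that idea using C11 atomics, assuming a NULL pointer stands in for the null buffer descriptor; `struct gs_query_counters`, its fields, and `workgroup_accumulate` are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the query buffer: two counters that many
 * workgroups may update concurrently. */
struct gs_query_counters {
   _Atomic uint64_t primitives_generated;
   _Atomic uint64_t primitives_emitted;
};

/* Models one workgroup: sum the per-thread values locally, then issue a
 * single atomic add per counter. A NULL buffer models the null
 * descriptor used when no query is active: the add is simply skipped. */
static void workgroup_accumulate(struct gs_query_counters *buf,
                                 const uint32_t *generated,
                                 const uint32_t *emitted, size_t n)
{
   assert(n == 0 || (generated != NULL && emitted != NULL));

   uint64_t gen = 0, emit = 0;
   for (size_t i = 0; i < n; i++) {
      gen += generated[i];
      emit += emitted[i];
   }

   if (buf == NULL)
      return;
   atomic_fetch_add(&buf->primitives_generated, gen);
   atomic_fetch_add(&buf->primitives_emitted, emit);
}
```

Summing locally before the atomic keeps the memory traffic at one transaction per counter per workgroup, which matches the trade-off the commit message describes.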
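The "double the number of tessellation offchip buffers per SE" commit rests on simple arithmetic: one gfx10 shader engine corresponds to two gfx9 shader engines. A sketch of that scaling, assuming the per-SE count tuned for gfx9 is passed in; the function names and the values used in the check are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

/* Per-SE offchip buffer count: doubled on gfx10, because one gfx10
 * shader engine covers the work of two gfx9 shader engines. */
static unsigned offchip_buffers_per_se(unsigned gfx9_buffers_per_se, bool is_gfx10)
{
   return is_gfx10 ? 2 * gfx9_buffers_per_se : gfx9_buffers_per_se;
}

/* Total offchip buffers across the chip for a given number of SEs. */
static unsigned total_offchip_buffers(unsigned gfx9_buffers_per_se,
                                      unsigned num_se, bool is_gfx10)
{
   assert(num_se > 0);
   return offchip_buffers_per_se(gfx9_buffers_per_se, is_gfx10) * num_se;
}
```

With this scaling, a gfx10 part with N shader engines gets the same total as a gfx9 part with 2N shader engines.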
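The "mask DCC tile swizzle by alignment" commit says the swizzle must be masked when the DCC alignment is smaller than the main surface's. A sketch, assuming as a simplification that the tile swizzle is stored in 256-byte units (hence the shift by 8) and is ORed into a DCC base address aligned to `dcc_alignment`; `dcc_address_with_swizzle` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Combine a DCC base address with the surface's tile swizzle.
 * dcc_alignment must be a power of two (in bytes). */
static uint64_t dcc_address_with_swizzle(uint64_t dcc_va, unsigned tile_swizzle,
                                         uint64_t dcc_alignment)
{
   assert(dcc_alignment != 0 && (dcc_alignment & (dcc_alignment - 1)) == 0);
   assert((dcc_va & (dcc_alignment - 1)) == 0);

   uint64_t swizzle = (uint64_t)tile_swizzle << 8;
   /* The VA is only guaranteed to be aligned to dcc_alignment, so only
    * the bits below that alignment are free to hold the swizzle; drop
    * the rest so they cannot collide with real address bits. */
   swizzle &= dcc_alignment - 1;
   return dcc_va | swizzle;
}
```

When the DCC alignment is at least as large as the swizzle range, the mask removes nothing, which is consistent with the commit's note that pre-gfx10 behavior should be unaffected.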