summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: decrease GS copy shader user SGPRs to 2Marek Olšák2016-04-222-3/+3
| | | | | | | | const buffers are no longer used since the clip plane const buffer was moved to RW buffers Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: shorten slot masks to 32 bitsMarek Olšák2016-04-224-63/+61
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up shader resource limit definitionsMarek Olšák2016-04-223-23/+12
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move default tess level constant buffer to RW buffersMarek Olšák2016-04-225-10/+35
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move sample positions constant buffer to RW buffersMarek Olšák2016-04-223-4/+5
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move clip plane constant buffer to RW buffersMarek Olšák2016-04-224-14/+12
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rework polygon stippling to use constant buffer instead of textureMarek Olšák2016-04-226-101/+55
| | | | | | | | | add it to the RW_BUFFERS descriptor array now the slot masks don't have to have 64 bits Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: generalize si_set_constant_bufferMarek Olšák2016-04-221-10/+17
| | | | | | | this will be used in the next commit Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make RW buffer descriptor array global, not per shader stageMarek Olšák2016-04-222-51/+43
| | | | | | | v2: also simplify invalidation of RW buffer bindings (squashed) Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename and rearrange RW buffer slotsMarek Olšák2016-04-224-30/+39
| | | | | | | | | - use an enum - use a unique slot number regardless of the shader stage (the per-stage slots will go away for RW buffers) Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add config parameter to si_shader_apply_scratch_relocs.Bas Nieuwenhuizen2016-04-214-3/+5
| | | | | | | shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* swr: add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT to get_paramTim Rowley2016-04-211-0/+1
| | | | Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/radeon: Silence possibly uninitialized variable warning.Bas Nieuwenhuizen2016-04-211-1/+1
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable loading into CE RAM.Bas Nieuwenhuizen2016-04-213-0/+14
| | | | | | | | | | We need to enable a bit in the CONTEXT_CONTROL packet for the loads to work. v2: Style issues. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use defines for CONTEXT_CONTROL instead of magic values.Bas Nieuwenhuizen2016-04-212-2/+5
| | | | | | | | v2: Use field names provided by Nicolai. v3: Updated to use CONTEXT_CONTROL prefix. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gk110/ir: make use of IMUL32I for all immediatesSamuel Pitoiset2016-04-201-1/+1
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.1 11.2" <[email protected]>
* gk110/ir: do not overwrite def value with zero for EXCH opsSamuel Pitoiset2016-04-201-6/+15
| | | | | | | | | | This is only valid for other atomic operations (including CAS). This fixes an invalid opcode error from dmesg. While we are it, make sure to initialize global addr to 0 for other atomic operations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.1 11.2" <[email protected]>
* nir: rename nir_foreach_block*() to nir_foreach_block*_call()Connor Abbott2016-04-205-5/+5
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nvc0: avoid tex read fault from compute shaders on GK110Samuel Pitoiset2016-04-201-0/+3
| | | | | | | | | | | | After some investigation, it seems like that disabling the UNK02C4 command avoid a read fault with texelFetch() from a compute shader. I have no clue on what this method actually does, but this avoid the GPU to hang with basic-texelFetch.shader_test without introducing any compute-related regressions. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* swr: fix resource backed constant buffersTim Rowley2016-04-202-7/+7
| | | | | | | | | | Code was using an incorrect address for the base pointer. v2: use swr_resource_data() utility function. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979 Reviewed-by: Bruce Cherniak <[email protected]> Tested-by: Markus Wick <[email protected]>
* nouveau: codegen: Add support for OpenCL global memory buffersHans de Goede2016-04-201-2/+10
| | | | | | | | | | | | | | | Add support for OpenCL global memory buffers, note this has only been tested with regular load and stores and likely needs more work for e.g. atomic ops. Tested with piglet on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 | Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nouveau: codegen: Use FILE_MEMORY_BUFFER for buffersHans de Goede2016-04-206-5/+13
| | | | | | | | | | | | | | | | | | | | | | Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commits changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglet on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 | Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* freedreno/a4xx: lower srgb in shader for astc texturesRob Clark2016-04-197-6/+62
| | | | | | | | | | | | | | This *seems* like a hw bug, and maybe only applies to certain a4xx variants/revisions. But setting the SRGB bit in sampler view state (texconst0) causes invalid alpha for ASTC textures. Work around this by doing the srgb->linear conversion in the shader instead. This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb* (The remaining fails seem to be a bug w/ ASTC + linear filtering, also possibly a420.0 specific.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: cleanup fd_set_sampler_viewsRob Clark2016-04-191-37/+24
| | | | | | | The separate FS/VS entrypoints are no longer used since a3ed98f. So just inline them. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: enable TGSI support cap for compute shadersBas Nieuwenhuizen2016-04-192-7/+30
| | | | | | | | | | | | v2: Use chip_class instead of family. v3: Check kernel version for SI. v4: Preemptively allow amdgpu winsys for SI. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Consider input SGPR count for compute shader SGPR count.Bas Nieuwenhuizen2016-04-192-6/+13
| | | | | | | | si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then recompute the rsrc1 register to use the new SGPR count. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen2016-04-193-2/+8
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up compute flushBas Nieuwenhuizen2016-04-192-18/+8
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do not do two full flushes on every compute dispatchBas Nieuwenhuizen2016-04-195-22/+17
| | | | | | | | | | | | | | | | | | | | v2: Add more CS_PARTIAL_FLUSH events. Essentially every place with waits on finishing for pixel shaders also has a write after read hazard with compute shaders. Invalidating L2 waits implicitly on pixel and compute shaders, so, we don't need a CS_PARTIAL_FLUSH for switching FBO. v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2. According to Marek the INV_GLOBAL_L2 events don't wait for compute shaders to finish, so wait for them explicitly. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: split setting graphics and compute descriptorsBas Nieuwenhuizen2016-04-194-14/+59
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split texture decompression for compute shadersBas Nieuwenhuizen2016-04-194-4/+16
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update predicate condition for compute dispatchesBas Nieuwenhuizen2016-04-192-0/+15
| | | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute dispatchBas Nieuwenhuizen2016-04-191-27/+77
| | | | | | | | | | v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: only emit compute shader state when switching shadersBas Nieuwenhuizen2016-04-192-59/+88
| | | | | | | | | | | | v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: rework compute scratch bufferBas Nieuwenhuizen2016-04-193-93/+47
| | | | | | | | | | | | | | | Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do per cs setup for compute shaders once per csBas Nieuwenhuizen2016-04-193-32/+48
| | | | | | | | | | | | Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't pass scratch buffer to user SGPRsBas Nieuwenhuizen2016-04-191-8/+0
| | | | | | | | As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split input upload off from si_launch_gridBas Nieuwenhuizen2016-04-191-41/+52
| | | | | | | | | | | | | | | | Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen2016-04-191-18/+58
| | | | | | | v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update shader count for compute shadersBas Nieuwenhuizen2016-04-191-1/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set maximum work group size based on block sizeBas Nieuwenhuizen2016-04-191-0/+12
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement shared atomicsBas Nieuwenhuizen2016-04-191-1/+76
| | | | | | | | | | v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement shared memory load/storeBas Nieuwenhuizen2016-04-191-2/+82
| | | | | | | | | | v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: add shared memoryBas Nieuwenhuizen2016-04-194-0/+37
| | | | | | | | | | | | | | | Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: lower compute shader argumentsBas Nieuwenhuizen2016-04-192-0/+50
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use CE for all descriptors.Bas Nieuwenhuizen2016-04-191-10/+64
| | | | | | | | | | v2: Load previous list for new CS instead of re-emitting all descriptors. v3: Do radeon_add_to_buffer_list in si_ce_upload. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Replace list_dirty with a mask.Bas Nieuwenhuizen2016-04-192-17/+29
| | | | | | | We can then upload only the dirty ones with the constant engine. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE uploader.Bas Nieuwenhuizen2016-04-193-0/+37
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Allocate chunks of CE ram.Bas Nieuwenhuizen2016-04-192-9/+27
| | | | | | | | | v2: Use 32 byte alignment. v3: Don't allocate CE space for vertex buffer descriptors. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization.Bas Nieuwenhuizen2016-04-192-0/+27
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>