summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* gallium/radeon: Silence possibly uninitialized variable warning.Bas Nieuwenhuizen2016-04-211-1/+1
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: Silence possibly uninitialized variable warning.Bas Nieuwenhuizen2016-04-211-0/+3
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable loading into CE RAM.Bas Nieuwenhuizen2016-04-213-0/+14
| | | | | | | | | | We need to enable a bit in the CONTEXT_CONTROL packet for the loads to work. v2: Style issues. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use defines for CONTEXT_CONTROL instead of magic values.Bas Nieuwenhuizen2016-04-212-2/+5
| | | | | | | | v2: Use field names provided by Nicolai. v3: Updated to use CONTEXT_CONTROL prefix. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: fix preamble IB sizeThomas Hindoe Paaboel Andersen2016-04-211-0/+1
| | | | | | | | | The missing break caused the IB size to be overwritten with the size of IB_CONST. This was introduced in: 7201230582e060aa2eb79c825d3188b437ef7bb8 Signed-off-by: Marek Olšák <[email protected]>
* gk110/ir: make use of IMUL32I for all immediatesSamuel Pitoiset2016-04-201-1/+1
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.1 11.2" <[email protected]>
* gk110/ir: do not overwrite def value with zero for EXCH opsSamuel Pitoiset2016-04-201-6/+15
| | | | | | | | | | This is only valid for other atomic operations (including CAS). This fixes an invalid opcode error from dmesg. While we are it, make sure to initialize global addr to 0 for other atomic operations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.1 11.2" <[email protected]>
* nir: rename nir_foreach_block*() to nir_foreach_block*_call()Connor Abbott2016-04-205-5/+5
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nvc0: avoid tex read fault from compute shaders on GK110Samuel Pitoiset2016-04-201-0/+3
| | | | | | | | | | | | After some investigation, it seems like that disabling the UNK02C4 command avoid a read fault with texelFetch() from a compute shader. I have no clue on what this method actually does, but this avoid the GPU to hang with basic-texelFetch.shader_test without introducing any compute-related regressions. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* swr: fix resource backed constant buffersTim Rowley2016-04-202-7/+7
| | | | | | | | | | Code was using an incorrect address for the base pointer. v2: use swr_resource_data() utility function. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979 Reviewed-by: Bruce Cherniak <[email protected]> Tested-by: Markus Wick <[email protected]>
* nouveau: codegen: Add support for OpenCL global memory buffersHans de Goede2016-04-201-2/+10
| | | | | | | | | | | | | | | Add support for OpenCL global memory buffers, note this has only been tested with regular load and stores and likely needs more work for e.g. atomic ops. Tested with piglet on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 | Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nouveau: codegen: Use FILE_MEMORY_BUFFER for buffersHans de Goede2016-04-206-5/+13
| | | | | | | | | | | | | | | | | | | | | | Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commits changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglet on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 | Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* st/dri: implement the GL interop DRI extension (v2.2)Marek Olšák2016-04-201-0/+258
| | | | | | | | | | v2: - set interop_version - simplify the offset_after macro v2.1: - use version numbers, remove offset_after - set "out_driver_data_written" v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too - add whandle.offset to buf_offset - disable the minmax cache for GL_TEXTURE_BUFFER
* st/dri: Fix RGB565 EGLImage creationNicolas Dufresne2016-04-201-20/+24
| | | | | | | | | | When creating egl images we do a bytes to pixel conversion by deviding by 4 regardless of the pixel format. This does not work for RGB565. In this patch, we avoid useless conversion and use proper API when the conversion cannot be avoided. Signed-off-by: Nicolas Dufresne <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* st/dri: Factor out DRI2 to PIPE_FORMAT conversionNicolas Dufresne2016-04-201-34/+27
| | | | | | | | This code is already duplicated twice and will be useful again. This will also help when adding formats. Signed-off-by: Nicolas Dufresne <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* freedreno/a4xx: lower srgb in shader for astc texturesRob Clark2016-04-197-6/+62
| | | | | | | | | | | | | | This *seems* like a hw bug, and maybe only applies to certain a4xx variants/revisions. But setting the SRGB bit in sampler view state (texconst0) causes invalid alpha for ASTC textures. Work around this by doing the srgb->linear conversion in the shader instead. This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb* (The remaining fails seem to be a bug w/ ASTC + linear filtering, also possibly a420.0 specific.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: cleanup fd_set_sampler_viewsRob Clark2016-04-191-37/+24
| | | | | | | The separate FS/VS entrypoints are no longer used since a3ed98f. So just inline them. Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: improved lowering for LRPRussell King2016-04-191-35/+20
| | | | | | | | | | | Provide an improved lowering for LRP, which can be implemented in two MAD instructions with a bit of rearranging of the equation, rather than the literal implementation of two multiplies, an add and a subtract. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: improved lowering for XPDRussell King2016-04-191-22/+13
| | | | | | | | | Improve XPD lowering to consume less instructions by using the MAD instruction to perform the multiply and subtraction together. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: add support for lowering TRUNCRussell King2016-04-192-0/+85
| | | | | | | | | | | | | | Add support for lowering TRUNC using the following sequence: FRC tmpA, |src| SUB tmpA, |src|, tmpA CMP dst, -tmpA, tmpA Note that this is incompatible with FRC lowering. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: add support for lowering FLR and CEILRussell King2016-04-192-20/+149
| | | | | | | | | | | | | | | | Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD instructions for GPUs that support FRC but not FLR or CEIL. Since these uses FRC, it is invalid to ask for FLR or CEIL to be lowered along with FRC, so add an assert to catch this invalid configuration. We also need to deal with FLR instructions emitted by the lowering code. Fix these up with the FRC+SUB equivalent when FLR lowering is enabled. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* radeonsi: enable TGSI support cap for compute shadersBas Nieuwenhuizen2016-04-192-7/+30
| | | | | | | | | | | | v2: Use chip_class instead of family. v3: Check kernel version for SI. v4: Preemptively allow amdgpu winsys for SI. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Consider input SGPR count for compute shader SGPR count.Bas Nieuwenhuizen2016-04-192-6/+13
| | | | | | | | si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then recompute the rsrc1 register to use the new SGPR count. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen2016-04-193-2/+8
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up compute flushBas Nieuwenhuizen2016-04-192-18/+8
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do not do two full flushes on every compute dispatchBas Nieuwenhuizen2016-04-195-22/+17
| | | | | | | | | | | | | | | | | | | | v2: Add more CS_PARTIAL_FLUSH events. Essentially every place with waits on finishing for pixel shaders also has a write after read hazard with compute shaders. Invalidating L2 waits implicitly on pixel and compute shaders, so, we don't need a CS_PARTIAL_FLUSH for switching FBO. v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2. According to Marek the INV_GLOBAL_L2 events don't wait for compute shaders to finish, so wait for them explicitly. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: split setting graphics and compute descriptorsBas Nieuwenhuizen2016-04-194-14/+59
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split texture decompression for compute shadersBas Nieuwenhuizen2016-04-194-4/+16
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update predicate condition for compute dispatchesBas Nieuwenhuizen2016-04-192-0/+15
| | | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute dispatchBas Nieuwenhuizen2016-04-191-27/+77
| | | | | | | | | | v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: only emit compute shader state when switching shadersBas Nieuwenhuizen2016-04-192-59/+88
| | | | | | | | | | | | v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: rework compute scratch bufferBas Nieuwenhuizen2016-04-193-93/+47
| | | | | | | | | | | | | | | Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do per cs setup for compute shaders once per csBas Nieuwenhuizen2016-04-193-32/+48
| | | | | | | | | | | | Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't pass scratch buffer to user SGPRsBas Nieuwenhuizen2016-04-191-8/+0
| | | | | | | | As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split input upload off from si_launch_gridBas Nieuwenhuizen2016-04-191-41/+52
| | | | | | | | | | | | | | | | Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen2016-04-191-18/+58
| | | | | | | v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update shader count for compute shadersBas Nieuwenhuizen2016-04-191-1/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set maximum work group size based on block sizeBas Nieuwenhuizen2016-04-191-0/+12
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement shared atomicsBas Nieuwenhuizen2016-04-191-1/+76
| | | | | | | | | | v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement shared memory load/storeBas Nieuwenhuizen2016-04-191-2/+82
| | | | | | | | | | v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: add shared memoryBas Nieuwenhuizen2016-04-194-0/+37
| | | | | | | | | | | | | | | Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: lower compute shader argumentsBas Nieuwenhuizen2016-04-192-0/+50
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use CE for all descriptors.Bas Nieuwenhuizen2016-04-191-10/+64
| | | | | | | | | | v2: Load previous list for new CS instead of re-emitting all descriptors. v3: Do radeon_add_to_buffer_list in si_ce_upload. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: Add u_bit_scan_consecutive_range64.Bas Nieuwenhuizen2016-04-191-0/+14
| | | | | | | | | For use by radeonsi. v2: Make sure that it works for all 64 bits set. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Replace list_dirty with a mask.Bas Nieuwenhuizen2016-04-192-17/+29
| | | | | | | We can then upload only the dirty ones with the constant engine. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE uploader.Bas Nieuwenhuizen2016-04-193-0/+37
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Allocate chunks of CE ram.Bas Nieuwenhuizen2016-04-192-9/+27
| | | | | | | | | v2: Use 32 byte alignment. v3: Don't allocate CE space for vertex buffer descriptors. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization.Bas Nieuwenhuizen2016-04-192-0/+27
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE packet definitions.Bas Nieuwenhuizen2016-04-191-0/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Create CE IB.Bas Nieuwenhuizen2016-04-195-1/+54
| | | | | | | | | | | | | | | | | | | Based on work by Marek Olšák. v2: Add preamble IB. Leaves the load packet in the space calculation as the radeon winsys might not be able to support a premable. The added space calculation may look expensive, but is converted to a constant with (at least) -O2 and -O3. v3: - Fix code style. - Remove needed space for vertex buffer descriptors. - Fail when the preamble cannot be created. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>