summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* freedreno: cleanup fd_set_sampler_viewsRob Clark2016-04-191-37/+24
| | | | | | | The separate FS/VS entrypoints are no longer used since a3ed98f. So just inline them. Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: improved lowering for LRPRussell King2016-04-191-35/+20
| | | | | | | | | | | Provide an improved lowering for LRP, which can be implemented in two MAD instructions with a bit of rearranging of the equation, rather than the literal implementation of two multiplies, an add and a subtract. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: improved lowering for XPDRussell King2016-04-191-22/+13
| | | | | | | | | Improve XPD lowering to consume less instructions by using the MAD instruction to perform the multiply and subtraction together. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: add support for lowering TRUNCRussell King2016-04-192-0/+85
| | | | | | | | | | | | | | Add support for lowering TRUNC using the following sequence: FRC tmpA, |src| SUB tmpA, |src|, tmpA CMP dst, -tmpA, tmpA Note that this is incompatible with FRC lowering. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* tgsi/lowering: add support for lowering FLR and CEILRussell King2016-04-192-20/+149
| | | | | | | | | | | | | | | | Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD instructions for GPUs that support FRC but not FLR or CEIL. Since these uses FRC, it is invalid to ask for FLR or CEIL to be lowered along with FRC, so add an assert to catch this invalid configuration. We also need to deal with FLR instructions emitted by the lowering code. Fix these up with the FRC+SUB equivalent when FLR lowering is enabled. Signed-off-by: Russell King <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* radeonsi: enable TGSI support cap for compute shadersBas Nieuwenhuizen2016-04-192-7/+30
| | | | | | | | | | | | v2: Use chip_class instead of family. v3: Check kernel version for SI. v4: Preemptively allow amdgpu winsys for SI. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Consider input SGPR count for compute shader SGPR count.Bas Nieuwenhuizen2016-04-192-6/+13
| | | | | | | | si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then recompute the rsrc1 register to use the new SGPR count. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen2016-04-193-2/+8
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up compute flushBas Nieuwenhuizen2016-04-192-18/+8
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do not do two full flushes on every compute dispatchBas Nieuwenhuizen2016-04-195-22/+17
| | | | | | | | | | | | | | | | | | | | v2: Add more CS_PARTIAL_FLUSH events. Essentially every place with waits on finishing for pixel shaders also has a write after read hazard with compute shaders. Invalidating L2 waits implicitly on pixel and compute shaders, so, we don't need a CS_PARTIAL_FLUSH for switching FBO. v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2. According to Marek the INV_GLOBAL_L2 events don't wait for compute shaders to finish, so wait for them explicitly. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: split setting graphics and compute descriptorsBas Nieuwenhuizen2016-04-194-14/+59
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split texture decompression for compute shadersBas Nieuwenhuizen2016-04-194-4/+16
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update predicate condition for compute dispatchesBas Nieuwenhuizen2016-04-192-0/+15
| | | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute dispatchBas Nieuwenhuizen2016-04-191-27/+77
| | | | | | | | | | v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: only emit compute shader state when switching shadersBas Nieuwenhuizen2016-04-192-59/+88
| | | | | | | | | | | | v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: rework compute scratch bufferBas Nieuwenhuizen2016-04-193-93/+47
| | | | | | | | | | | | | | | Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do per cs setup for compute shaders once per csBas Nieuwenhuizen2016-04-193-32/+48
| | | | | | | | | | | | Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't pass scratch buffer to user SGPRsBas Nieuwenhuizen2016-04-191-8/+0
| | | | | | | | As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split input upload off from si_launch_gridBas Nieuwenhuizen2016-04-191-41/+52
| | | | | | | | | | | | | | | | Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen2016-04-191-18/+58
| | | | | | | v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update shader count for compute shadersBas Nieuwenhuizen2016-04-191-1/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set maximum work group size based on block sizeBas Nieuwenhuizen2016-04-191-0/+12
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement shared atomicsBas Nieuwenhuizen2016-04-191-1/+76
| | | | | | | | | | v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement shared memory load/storeBas Nieuwenhuizen2016-04-191-2/+82
| | | | | | | | | | v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: add shared memoryBas Nieuwenhuizen2016-04-194-0/+37
| | | | | | | | | | | | | | | Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: lower compute shader argumentsBas Nieuwenhuizen2016-04-192-0/+50
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use CE for all descriptors.Bas Nieuwenhuizen2016-04-191-10/+64
| | | | | | | | | | v2: Load previous list for new CS instead of re-emitting all descriptors. v3: Do radeon_add_to_buffer_list in si_ce_upload. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: Add u_bit_scan_consecutive_range64.Bas Nieuwenhuizen2016-04-191-0/+14
| | | | | | | | | For use by radeonsi. v2: Make sure that it works for all 64 bits set. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Replace list_dirty with a mask.Bas Nieuwenhuizen2016-04-192-17/+29
| | | | | | | We can then upload only the dirty ones with the constant engine. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE uploader.Bas Nieuwenhuizen2016-04-193-0/+37
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Allocate chunks of CE ram.Bas Nieuwenhuizen2016-04-192-9/+27
| | | | | | | | | v2: Use 32 byte alignment. v3: Don't allocate CE space for vertex buffer descriptors. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization.Bas Nieuwenhuizen2016-04-192-0/+27
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE packet definitions.Bas Nieuwenhuizen2016-04-191-0/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Create CE IB.Bas Nieuwenhuizen2016-04-195-1/+54
| | | | | | | | | | | | | | | | | | | Based on work by Marek Olšák. v2: Add preamble IB. Leaves the load packet in the space calculation as the radeon winsys might not be able to support a premable. The added space calculation may look expensive, but is converted to a constant with (at least) -O2 and -O3. v3: - Fix code style. - Remove needed space for vertex buffer descriptors. - Fail when the preamble cannot be created. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: Enlarge const IB size.Bas Nieuwenhuizen2016-04-191-8/+20
| | | | | | | | | | | | Necessary to prevent performance regressions due to extra flushing. Probably should enlarge it even further when also updating uniforms through the CE, but this seems large enough for now. v2: Add preamble IB. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: Add support for const IB.Marek Olšák2016-04-193-5/+124
| | | | | | v2: Use the correct IB to update request (Bas Nieuwenhuizen) v3: Add preamble IB. (Bas Nieuwenhuizen) Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: split IB data into a new structure in preparation for CEMarek Olšák2016-04-194-47/+48
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/radeon: move ring_type into winsysesMarek Olšák2016-04-195-10/+11
| | | | | | Not used by drivers. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* llvmpipe: Call LLVMShutdown before exiting.Jose Fonseca2016-04-191-0/+2
| | | | | | So that LLVM frees its globals. Trivial.
* llvmpipe: Avoid LLVMGetGlobalContext in tests.Jose Fonseca2016-04-195-6/+24
| | | | Trivial.
* llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.Jose Fonseca2016-04-191-4/+34
| | | | | | | | | 64bits MSVCRT's exp2f(-inf) returns -inf instead of 0. Tested with MSVC 2013's CRT. (I haven't tried 2015 yet.) Also this does not happen with MinGW. Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Test more vector lengths.Jose Fonseca2016-04-191-13/+30
| | | | | | | | | | All power of two of up native vector length. There is actually a bug in lp_build_round for v2, whereby it doesn't round to nearest. Fixing is left to the future, but the test is now able to expect it to fail. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Avoid llvm::sys::getProcessTriple().Jose Fonseca2016-04-191-3/+3
| | | | | | | Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3 onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway, Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Remove lp_get_module_id.Jose Fonseca2016-04-194-12/+15
| | | | | | Just keep a copy of the module_name in gallivm. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix MCJIT with LLVM 3.3.Jose Fonseca2016-04-191-3/+3
| | | | | | | | | | | | One needs to call setJITMemoryManager for LLVM 3.3, instead of setMCJITMemoryManager. This regressed in commits 065256df/75ad4fe7 when trying to make the code to build with LLVM 3.6. Tested MCJIT with LLVM 3.3 to 3.6. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Make MCJIT a runtime option.Jose Fonseca2016-04-191-75/+72
| | | | | | | | | | | | | On the LLVM versions that support it, so we can easily switch between MCJIT/old-jit for testing. The new option is GALLIVM_MCJIT. Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes segfault, both on Linux and Windows. I'm almost certain this used to work, so there probably is a regression somewhere. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use LLVMSetTarget.Jose Fonseca2016-04-191-3/+9
| | | | | | Instead of LLVM C++ interfaces. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use LLVMPrintValueToString where available.Jose Fonseca2016-04-191-35/+10
| | | | | | | | | | | | | And llvm::raw_string_ostream where not (LLVM 3.3). Thereby eliminating yet another dependency on unstable LLVM interfaces. As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which was disabled, probably due to C++ issues.) Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/tests: Update UTIL_FORMAT_MAX_* defines.Jose Fonseca2016-04-192-3/+7
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* Revert "nv50/ra: `isinf()` is in namespace `std` since C++11."Jose Fonseca2016-04-191-4/+0
| | | | | | This reverts commit f525db6358fbaa7b4296d2e6484e0b1ae703ac78. It was superseeded by commit 649704f1f7c9e1d0990d34a76154b2eb656bee42.