summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen2016-04-191-18/+58
| | | | | | | v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update shader count for compute shadersBas Nieuwenhuizen2016-04-191-1/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set maximum work group size based on block sizeBas Nieuwenhuizen2016-04-191-0/+12
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement shared atomicsBas Nieuwenhuizen2016-04-191-1/+76
| | | | | | | | | | v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement shared memory load/storeBas Nieuwenhuizen2016-04-191-2/+82
| | | | | | | | | | v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: add shared memoryBas Nieuwenhuizen2016-04-194-0/+37
| | | | | | | | | | | | | | | Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: lower compute shader argumentsBas Nieuwenhuizen2016-04-192-0/+50
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Use CE for all descriptors.Bas Nieuwenhuizen2016-04-191-10/+64
| | | | | | | | | | v2: Load previous list for new CS instead of re-emitting all descriptors. v3: Do radeon_add_to_buffer_list in si_ce_upload. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: Add u_bit_scan_consecutive_range64.Bas Nieuwenhuizen2016-04-191-0/+14
| | | | | | | | | For use by radeonsi. v2: Make sure that it works for all 64 bits set. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Replace list_dirty with a mask.Bas Nieuwenhuizen2016-04-192-17/+29
| | | | | | | We can then upload only the dirty ones with the constant engine. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE uploader.Bas Nieuwenhuizen2016-04-193-0/+37
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Allocate chunks of CE ram.Bas Nieuwenhuizen2016-04-192-9/+27
| | | | | | | | | v2: Use 32 byte alignment. v3: Don't allocate CE space for vertex buffer descriptors. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE synchronization.Bas Nieuwenhuizen2016-04-192-0/+27
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add CE packet definitions.Bas Nieuwenhuizen2016-04-191-0/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Create CE IB.Bas Nieuwenhuizen2016-04-195-1/+54
| | | | | | | | | | | | | | | | | | | Based on work by Marek Olšák. v2: Add preamble IB. Leaves the load packet in the space calculation as the radeon winsys might not be able to support a premable. The added space calculation may look expensive, but is converted to a constant with (at least) -O2 and -O3. v3: - Fix code style. - Remove needed space for vertex buffer descriptors. - Fail when the preamble cannot be created. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: Enlarge const IB size.Bas Nieuwenhuizen2016-04-191-8/+20
| | | | | | | | | | | | Necessary to prevent performance regressions due to extra flushing. Probably should enlarge it even further when also updating uniforms through the CE, but this seems large enough for now. v2: Add preamble IB. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: Add support for const IB.Marek Olšák2016-04-193-5/+124
| | | | | | v2: Use the correct IB to update request (Bas Nieuwenhuizen) v3: Add preamble IB. (Bas Nieuwenhuizen) Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: split IB data into a new structure in preparation for CEMarek Olšák2016-04-194-47/+48
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallium/radeon: move ring_type into winsysesMarek Olšák2016-04-195-10/+11
| | | | | | Not used by drivers. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* llvmpipe: Call LLVMShutdown before exiting.Jose Fonseca2016-04-191-0/+2
| | | | | | So that LLVM frees its globals. Trivial.
* llvmpipe: Avoid LLVMGetGlobalContext in tests.Jose Fonseca2016-04-195-6/+24
| | | | Trivial.
* llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.Jose Fonseca2016-04-191-4/+34
| | | | | | | | | 64bits MSVCRT's exp2f(-inf) returns -inf instead of 0. Tested with MSVC 2013's CRT. (I haven't tried 2015 yet.) Also this does not happen with MinGW. Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Test more vector lengths.Jose Fonseca2016-04-191-13/+30
| | | | | | | | | | All power of two of up native vector length. There is actually a bug in lp_build_round for v2, whereby it doesn't round to nearest. Fixing is left to the future, but the test is now able to expect it to fail. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Avoid llvm::sys::getProcessTriple().Jose Fonseca2016-04-191-3/+3
| | | | | | | Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3 onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway, Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Remove lp_get_module_id.Jose Fonseca2016-04-194-12/+15
| | | | | | Just keep a copy of the module_name in gallivm. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix MCJIT with LLVM 3.3.Jose Fonseca2016-04-191-3/+3
| | | | | | | | | | | | One needs to call setJITMemoryManager for LLVM 3.3, instead of setMCJITMemoryManager. This regressed in commits 065256df/75ad4fe7 when trying to make the code to build with LLVM 3.6. Tested MCJIT with LLVM 3.3 to 3.6. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Make MCJIT a runtime option.Jose Fonseca2016-04-191-75/+72
| | | | | | | | | | | | | On the LLVM versions that support it, so we can easily switch between MCJIT/old-jit for testing. The new option is GALLIVM_MCJIT. Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes segfault, both on Linux and Windows. I'm almost certain this used to work, so there probably is a regression somewhere. Reviewed-by: Roland Scheidegger <[email protected]>
* scons: Show the unit test full path.Jose Fonseca2016-04-191-1/+1
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use LLVMSetTarget.Jose Fonseca2016-04-191-3/+9
| | | | | | Instead of LLVM C++ interfaces. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use LLVMPrintValueToString where available.Jose Fonseca2016-04-191-35/+10
| | | | | | | | | | | | | And llvm::raw_string_ostream where not (LLVM 3.3). Thereby eliminating yet another dependency on unstable LLVM interfaces. As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which was disabled, probably due to C++ issues.) Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/tests: Update UTIL_FORMAT_MAX_* defines.Jose Fonseca2016-04-192-3/+7
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* Revert "nv50/ra: `isinf()` is in namespace `std` since C++11."Jose Fonseca2016-04-191-4/+0
| | | | | | This reverts commit f525db6358fbaa7b4296d2e6484e0b1ae703ac78. It was superseeded by commit 649704f1f7c9e1d0990d34a76154b2eb656bee42.
* vc4: Fix fbo-generatemipmap-formats for NPOT.Eric Anholt2016-04-181-0/+20
| | | | | | | Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but we only get one value to control the stride of the src and dst for single sampled buffers. A RCL tile blit from level != 1 to level == 0 would therefore load from the wrong stride.
* vc4: Remove unused "immediates" fieldEric Anholt2016-04-181-1/+0
| | | | This was for TGSI, which we no longer have to deal with.
* i965: Define miptree map functions static (trivial)Ben Widawsky2016-04-181-2/+2
| | | | | | | | | | | | | They were already declared as such. It was changed here: commit 31f0967fb50101437d2568e9ab9640ffbcbf7ef9 Author: Ian Romanick <[email protected]> Date: Wed Sep 2 14:43:18 2015 -0700 i965: Make intel_miptree_map_raw static Cc: Ian Romanick <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Properly handle ldexp(0.0f, non-zero-exp).Matt Turner2016-04-181-4/+6
|
* gallivm: convert size query to using a set of parameters.Dave Airlie2016-04-197-97/+51
| | | | | | | | | | This isn't currently that easy to expand, so fix it up before expanding it later to include dynamic samplers. [airlied: use some local variables (Roland)] Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* swr: dereference cbuf/zbuf/views on context destroyTim Rowley2016-04-181-0/+15
| | | | | | Fixes resource memory leaks. Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: fix grouping issue w/ reverse swizzlesRob Clark2016-04-181-1/+17
| | | | | | | | | | | | | | | | | | | | | | | When we have something like: MOV OUT[n], IN[m].wzyx the existing grouping code was missing a potential conflict. Due to input needing to be sequential scalar regs, we have: IN: x <-> y <-> z <-> w which would be grouped to: OUT: w <-> z2 <-> y2 <-> x (where the 2 denotes a copy/mov) but that can't actually work. We need to realize that x and w are already in the same chain, not just that they aren't both already in new chain being built. With this fixed, we probably no longer need the hack from f68f6c0. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: use enums in si_shader.hMarek Olšák2016-04-181-93/+119
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use enums in r600_query.hMarek Olšák2016-04-181-20/+23
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always use PFP_SYNC_ME when doing flushes and waitsMarek Olšák2016-04-182-1/+10
| | | | | | This is typically used by the closed driver before SURFACE_SYNC. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits tooMarek Olšák2016-04-181-11/+14
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add safety assertions for meta cache flushesMarek Olšák2016-04-181-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use ACQUIRE_MEM on the graphics ringMarek Olšák2016-04-181-18/+8
| | | | | | | | | It's only required on the compute ring. This matches the closed driver. The compute flag is removed to prevent confusion and Bas's compute shader patches remove it in the whole function. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove TODO and correct a comment in si_emit_cache_flushMarek Olšák2016-04-181-2/+1
| | | | | | Yes, that flag is really needed. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't flush CB/DB caches for performance countersMarek Olšák2016-04-181-3/+6
| | | | | | | | I'm not sure about this. This will make the engines go idle, but the caches will be unflushed. This should match app behavior without performance counters, which can be a good thing. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: don't flush CB/DB caches for timestamp queriesMarek Olšák2016-04-182-2/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/util: fix undefined shift to the last bit in u_bit_scanMarek Olšák2016-04-181-1/+1
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/util: fix u_bit_scan_consecutive_range for mask == 0xffffffffMarek Olšák2016-04-181-1/+7
| | | | | | | | The second ffs returns 0, yielding count == -1. v2: change 1 to 1u Reviewed-by: Bas Nieuwenhuizen <[email protected]>