The separate FS/VS entrypoints have not been used since a3ed98f, so
just inline them.
Signed-off-by: Rob Clark <[email protected]>
Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
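The rearrangement can be checked on scalars. This is a minimal sketch, assuming TGSI's definition LRP dst = src0 * src1 + (1 - src0) * src2 and modelling MAD as a * b + c; the helper names are hypothetical and the exact operand order is illustrative rather than copied from the patch. Since t*x + (1 - t)*y equals t*x + (y - t*y), the first MAD produces y - t*y and the second adds t*x.

```c
#include <assert.h>

/* Model of a TGSI-style MAD: a * b + c. */
static float mad(float a, float b, float c)
{
    return a * b + c;
}

/* Literal LRP: two multiplies, an add and a subtract. */
static float lrp_literal(float t, float x, float y)
{
    return t * x + (1.0f - t) * y;
}

/* Rearranged LRP: t*x + (1 - t)*y == t*x + (y - t*y), i.e. two MADs. */
static float lrp_two_mad(float t, float x, float y)
{
    float tmp = mad(-t, y, y); /* MAD tmpA, -src0, src2, src2 -> y - t*y */
    return mad(t, x, tmp);     /* MAD dst, src0, src1, tmpA   -> t*x + tmpA */
}
```

The rearranged form also drops the need for a 1.0 immediate, since no (1 - t) term is ever materialized.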
Improve XPD lowering to use fewer instructions by using the
MAD instruction to perform the multiply and subtraction together.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
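A per-component view of the idea, as a sketch only: a literal lowering needs two swizzled vector multiplies and a subtraction, while folding the subtraction into a MAD leaves one MUL and one MAD. The vec3 type and function names are hypothetical, the swizzles shown are one valid choice rather than necessarily the patch's, and the w component is ignored.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

/* Model of a TGSI-style MAD: a * b + c. */
static float mad(float a, float b, float c) { return a * b + c; }

/* Cross product as MUL + MAD over swizzled operands (w ignored). */
static vec3 xpd_mul_mad(vec3 a, vec3 b)
{
    /* MUL tmpA.xyz, src1.yzx, src0.zxy */
    vec3 tmp = { b.y * a.z, b.z * a.x, b.x * a.y };
    /* MAD dst.xyz, src0.yzx, src1.zxy, -tmpA.xyz */
    vec3 dst = { mad(a.y, b.z, -tmp.x),
                 mad(a.z, b.x, -tmp.y),
                 mad(a.x, b.y, -tmp.z) };
    return dst;
}
```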
Add support for lowering TRUNC using the following sequence:
FRC tmpA, |src|
SUB tmpA, |src|, tmpA
CMP dst, src, -tmpA, tmpA
Note that this is incompatible with FRC lowering.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
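A scalar model of the sequence, assuming TGSI semantics FRC(x) = x - floor(x) and CMP dst = (src0 < 0) ? src1 : src2, with src as the condition operand. The FRC stand-in below is hypothetical (it computes floor via an integer cast, so it only holds for |x| < 2^63). FRC followed by SUB yields floor(|src|), and CMP restores the sign. Because the sequence itself emits FRC, it cannot be combined with FRC lowering.

```c
#include <assert.h>

/* Stand-in for hardware FRC: x - floor(x). The floor is derived from a
 * cast that rounds toward zero, so this model assumes |x| < 2^63. */
static float frc(float x)
{
    float t = (float)(long long)x;      /* rounds toward zero */
    float fl = (t > x) ? t - 1.0f : t;  /* adjust to floor for negative non-integers */
    return x - fl;
}

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* TGSI-style CMP: src0 < 0 ? src1 : src2. */
static float cmp(float src0, float src1, float src2)
{
    return src0 < 0.0f ? src1 : src2;
}

static float trunc_lowered(float src)
{
    float tmp = frc(abs_f(src));  /* FRC tmpA, |src| */
    tmp = abs_f(src) - tmp;       /* SUB tmpA, |src|, tmpA -> floor(|src|) */
    return cmp(src, -tmp, tmp);   /* negate the result when src < 0 */
}
```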
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL. Since
these use FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.
We also need to deal with FLR instructions emitted by the lowering
code. Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
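The two identities behind FRC/SUB and FRC/ADD are floor(x) = x - FRC(x) and ceil(x) = x + FRC(-x). The sketch below models them on scalars, assuming TGSI's FRC(x) = x - floor(x); the FRC stand-in and function names are hypothetical, and the operand order is an assumption since the message only names the instruction pairs. It also shows why FRC itself must remain native: both lowerings consume it.

```c
#include <assert.h>

/* Stand-in for hardware FRC: x - floor(x), valid for |x| < 2^63. */
static float frc(float x)
{
    float t = (float)(long long)x;      /* rounds toward zero */
    float fl = (t > x) ? t - 1.0f : t;  /* adjust to floor for negative non-integers */
    return x - fl;
}

static float flr_lowered(float src)
{
    float tmp = frc(src);   /* FRC tmpA, src */
    return src - tmp;       /* SUB dst, src, tmpA */
}

static float ceil_lowered(float src)
{
    float tmp = frc(-src);  /* FRC tmpA, -src */
    return src + tmp;       /* ADD dst, src, tmpA */
}
```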
v2: Use chip_class instead of family.
v3: Check kernel version for SI.
v4: Preemptively allow amdgpu winsys for SI.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
v2: Add more CS_PARTIAL_FLUSH events.
Essentially every place that waits for pixel shaders to finish
also has a write-after-read hazard with compute shaders.
Invalidating L2 implicitly waits for pixel and compute shaders,
so we don't need a CS_PARTIAL_FLUSH for switching FBO.
v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.
According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
v2: - Use radeon_set_sh_reg_seq.
- Set predicate bit for conditional rendering.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
v2: - Check earlier whether anything changed
- Use emitted_program instead of emitted_bo to prevent
shaders with shader->bo = NULL confusing the check
- Use radeon_set_sh_reg*
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Instead of having a scratch buffer per program, have one per
context.
Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.
v2: Fix style issue.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
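A minimal sketch of the per-context approach, under stated assumptions: the struct and function are hypothetical, malloc stands in for a GPU buffer allocation, and only scratch_waves mirrors a name from the message. Every kernel sizes its request against the context-wide wave count, so the buffer is reused unless a kernel needs more bytes per wave. A real driver would also have to keep the old buffer alive until in-flight dispatches finish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-context state: one scratch buffer shared by all kernels. */
struct scratch_ctx {
    void *scratch_buf;
    size_t scratch_size;    /* bytes currently allocated */
    unsigned scratch_waves; /* waves the buffer must cover (cf. sctx->scratch_waves) */
};

/* Make sure the shared buffer covers bytes_per_wave for every wave.
 * Returns 0 on success, -1 if allocation fails (the old buffer is kept). */
static int ensure_scratch(struct scratch_ctx *ctx, size_t bytes_per_wave)
{
    assert(ctx);
    size_t needed = bytes_per_wave * ctx->scratch_waves;

    if (needed <= ctx->scratch_size)
        return 0; /* already large enough: reuse it for this kernel too */

    void *buf = malloc(needed); /* stand-in for a GPU buffer allocation */
    if (!buf)
        return -1;
    free(ctx->scratch_buf);
    ctx->scratch_buf = buf;
    ctx->scratch_size = needed;
    return 0;
}
```

Sizing by the context-wide wave count is what the removed per-kernel calculation used to refine; as the message notes, that only paid off when a dispatch had fewer waves than scratch_waves.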
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.
v2: - Use radeon_set_sh_reg_seq.
- Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
As far as I can see we use relocations for clover too.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.
v2: - Clarified commit message.
- Use radeon_set_sh_reg_seq.
- Also upload input buffer for clover kernels, even when
input_size is 0, as it contains grid parameters.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
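The reason a dynamically allocated input buffer removes the serialization is that each dispatch gets its own slice, so a later dispatch never overwrites arguments an earlier one may still be reading. The sketch below is a hypothetical linear sub-allocator illustrating that idea; it does not reproduce u_upload_alloc's actual signature or behaviour, and a real allocator would start a new buffer when full instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UPLOAD_SIZE 4096

/* Hypothetical upload buffer: slices are handed out front to back. */
struct upload_buf {
    unsigned char data[UPLOAD_SIZE];
    size_t offset; /* first free byte */
};

/* Copy size bytes into a fresh slice aligned to align (a power of two).
 * Returns the slice offset, or -1 if the buffer is full. */
static long upload_data(struct upload_buf *u, const void *src, size_t size,
                        size_t align)
{
    assert(align && (align & (align - 1)) == 0);
    size_t start = (u->offset + align - 1) & ~(align - 1);
    if (start > UPLOAD_SIZE || size > UPLOAD_SIZE - start)
        return -1;
    memcpy(u->data + start, src, size);
    u->offset = start + size;
    return (long)start;
}
```

Two dispatches of the same kernel then read their inputs (including grid parameters) from different offsets.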
v2: Moved scratch_enabled initialization after compile.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
v2: - Use single region
- Use get_memory_ptr
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
v2: - Use single region
- Combine address calculation
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.
v2: - Use ctx->i8.
- Dropped null-check for declare_memory_region.
- Changed memory region array to single region.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
v2: Load previous list for new CS instead of re-emitting
all descriptors.
v3: Do radeon_add_to_buffer_list in si_ce_upload.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
For use by radeonsi.
v2: Make sure that it works when all 64 bits are set.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
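The subject line of this commit is not preserved here, so the helper below is a guess: a hypothetical 64-bit "scan consecutive range" function that pops the lowest run of set bits from a mask. It illustrates why the all-ones case needs care: building a 64-bit run mask as (1 << 64) - 1 is undefined behaviour in C, because the shift count equals the operand width.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; mask must be non-zero. */
static int lowest_set_bit64(uint64_t mask)
{
    int i = 0;
    assert(mask != 0);
    while (!(mask & 1)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Hypothetical helper: remove the lowest run of consecutive set bits from
 * *mask and report its start and length. *mask must be non-zero. */
static void scan_consecutive_range64(uint64_t *mask, int *start, int *count)
{
    if (*mask == UINT64_MAX) {
        /* A 64-bit run would need (1 << 64), which is undefined in C. */
        *start = 0;
        *count = 64;
        *mask = 0;
        return;
    }
    *start = lowest_set_bit64(*mask);
    /* The run ends at the first zero bit at or above start. */
    *count = lowest_set_bit64(~(*mask >> *start));
    /* count < 64 here, so this shift is well defined. */
    *mask &= ~(((UINT64_C(1) << *count) - 1) << *start);
}
```

Calling it in a loop until the mask is zero visits each contiguous range once, e.g. 0xF0F yields (start 0, count 4) then (start 8, count 4).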
We can then upload only the dirty ones with the constant engine.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
v2: Use 32 byte alignment.
v3: Don't allocate CE space for vertex buffer descriptors.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Based on work by Marek Olšák.
v2: Add preamble IB.
Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a preamble.
The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.
v3: - Fix code style.
- Remove needed space for vertex buffer descriptors.
- Fail when the preamble cannot be created.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Necessary to prevent performance regressions due to extra flushing.
Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.
v2: Add preamble IB.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Not used by drivers.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
So that LLVM frees its globals.
Trivial.
Trivial.
The 64-bit MSVCRT's exp2f(-inf) returns -inf instead of 0. Tested with
MSVC 2013's CRT. (I haven't tried 2015 yet.)
This does not happen with MinGW.
Reviewed-by: Roland Scheidegger <[email protected]>
All powers of two up to the native vector length.
There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest. Fixing is left to the future, but the test is now
able to expect it to fail.
Reviewed-by: Roland Scheidegger <[email protected]>
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway.
Reviewed-by: Roland Scheidegger <[email protected]>
Just keep a copy of the module_name in gallivm.
Reviewed-by: Roland Scheidegger <[email protected]>
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.
This regressed in commits 065256df/75ad4fe7 when trying to make the
code build with LLVM 3.6.
Tested MCJIT with LLVM 3.3 to 3.6.
Reviewed-by: Roland Scheidegger <[email protected]>
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.
The new option is GALLIVM_MCJIT.
Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes a
segfault, both on Linux and Windows. I'm almost certain this used to
work, so there probably is a regression somewhere.
Reviewed-by: Roland Scheidegger <[email protected]>
Instead of LLVM C++ interfaces.
Reviewed-by: Roland Scheidegger <[email protected]>
And llvm::raw_string_ostream where not (LLVM 3.3).
Thereby eliminating yet another dependency on unstable LLVM interfaces.
As a bonus, this also sends LLVM IR to OutputDebugStringA on MSVC (which
was disabled, probably due to C++ issues.)
Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
This reverts commit f525db6358fbaa7b4296d2e6484e0b1ae703ac78.
It was superseded by commit 649704f1f7c9e1d0990d34a76154b2eb656bee42.