| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <eric at anholt.net>
|
|
|
|
| |
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
v2: - guard gallivm files with "with_llvm" instead of "dep_llvm.found()"
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> (v1)
|
|
|
|
|
|
|
| |
This is a left-over from my version of adding the new format
after rebasing on Eric's version.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Increase the max allowed vector size from 256 to 512.
No piglit llvmpipe regressions running on avx2.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6).
V3D 3.3 introduces an MMU (no more CMA allocations) and support for
GLES3.1. This driver is not currently conformant, though that will be a
target as soon as possible.
V3D 3.x parts use a new texture tiling layout common across many Broadcom
graphics parts including and the HVS scanout engine. It also massively
changes the QPU instructions, introducing a common physical register file
(no more A/B split) and half-float instructions, while removing the 4x8
unorm instructions in favor of half-float for talking to fixed function
interfaces. Because so much has changed, vc5 is implemented in a separate
gallium driver, using only the XML code-generation support from vc4.
v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit
returns. Fix up a bit of MRT setup. Sync the simulator to kernel
behavior a bit more. Improve uniform debugging code. Rebase on
QIR->VIR rename. Move texture state mostly to the CSOs. Improve
cache flushing on the simulator. Fix program deletion
use-after-frees.
Acked-by: Dave Airlie <[email protected]> (uabi plan)
Acked-by: Daniel Vetter <[email protected]> (uabi plan)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is how VC4 stores 5551 textures, which we need to support for
GL_OES_required_internalformat.
v2: Extend commit message, fix svga driver build, add BE ordering from
Roland.
v3: Rebase on PIPE_FORMAT_R10G10B10X2_UNORM addition.
Reviewed-by: Marek Olšák <[email protected]> (v2)
Reviewed-by: Nicolai Hähnle <[email protected]> (v2)
|
|
|
|
|
|
|
|
| |
The uploaders can own transfers which need to be unmapped. Destroy them
before the final sync (they're not used from the driver thread anyway)
so that the transfer_unmap call is processed by the driver.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
The viewport state was an identity anyway.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This approach allows drivers to set their own vertex shader and skip
compilation of u_blitter vertex shaders.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
radeonsi won't set it.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The intrinsic is gone, causing shader compilation to crash.
While here, also change the fallback code to match what llvm's auto-updater
of these intrinsics would do (except that there will still be zext/trunc
instructions in there), which should ensure that the sequence gets recognized
and fused back into a pabs in the end (I didn't test this, and it's possible
even the old sequence would get recognized, but I don't see a reason why we
shouldn't use the same sequence in any case).
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
v2: set swizzled usage mask
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It has always been a usage mask *after* swizzling.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
tex offsets are not "Src" operands.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
All opcodes are handled.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In check_os_altivec_support(), allow control of Altivec (first PPC vector
instruction set) code generation via a new environmental control,
GALLIVM_ALTIVEC, which is expected to take on a value of 1 or 0.
The default is to enable Altivec code generation.
This environmental control of Altivec code generation is initially
available only #ifdef DEBUG.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In lp_build_create_jit_compiler_for_module(), advance the minimum
version of LLVM for VSX code generation to 4.0; this is the minimum
revision at which several known VSX code generation bugs are fixed:
https://llvm.org/bugs/show_bug.cgi?id=25503 (fixed in 3.8.1)
https://llvm.org/bugs/show_bug.cgi?id=26775 (fixed in 3.8.1)
https://llvm.org/bugs/show_bug.cgi?id=33531 (fixed in 4.0)
An llc performance bug introduced in LLVM 4.0,
https://llvm.org/bugs/show_bug.cgi?id=34647
is still pending as of LLVM 5.0, but only has a pronounced effect on
one of the Piglit tests: ext_transform_feedback-max-varyings.
All changes tested via Piglit.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In init_native_targets, allow the passing of additional options to
the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control.
This option is available only #ifdef DEBUG, initially.
At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions()
declaration.
v2: Fix compile error with old llvm versions (sroland)
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In gallivm_compile_module, fix a typo in the
debug_printf("Invoke as \"llc ..." message.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
| |
include libsync.h only when libdrm is compiled in
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
This should be sufficient for testing all kernel/libdrm/radeonsi codepaths
that are used by radeonsi.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
for C++ editors
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The operation performed is all the same as LODQ, but with the usual
differences between dx10 and GL texture opcodes, that is separate resource
and sampler indices (plus result swizzling, and setting z/w channels
to zero).
Reviewed-by: Jose Fonseca <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The status quo is quite the mess:
1. tgsi_exec will do a per-channel computation, and store the dst[0]
result (significand) correctly for each channel. The dst[1] result
(exponent) will be written to the first bit set in the writemask.
So per-component calculation only works partially.
2. r600 will only do a single computation. It will replicate the
exponent but not the significand.
3. The docs pretend that there's per-component calculation, but even
get dst[0] and dst[1] confused.
4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions,
and kind-of assumes that everything is replicated, generating this for
the dvec4 case:
DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw
Settle on the simplest behavior, which is single-component calculation
with replication, document it, and adjust tgsi_exec and r600.
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb
here was too small for that purpose. Allowing any valid align2 value that
u_mm's 32-bit addressing can represent will still catch most cases of
people passing in a byte alignment.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Acked-by: Christian König <[email protected]>
|
|
|
|
| |
Acked-by: Christian König <[email protected]>
|
|
|
|
| |
Acked-by: Christian König <[email protected]>
|
|
|
|
| |
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
No longer used.
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
It will replace previous deint function with abilities of
scaling and field deinterlacing
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
It will add Bob deint ability to interlaced video for HW encoder
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
So that it can be re-used
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is not used anywhere in the codebase. It's a hashtable
implementation that is based around cso_hash, and is therefore
(and as mentioned in a comment in the source) quite similar to
u_hash_table.
CC: Brian Paul<[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|