Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
| |
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
| |
When the host is running on softpipe/llvmpipe the maximum number of
samples for multisampling is 1. GL 3.0 requires at least 4 samples, and
softpipe/llvmpipe get around this by enabling PIPE_CAP_FAKE_SW_MSAA.
This patch mimics softpipe/llvmpipe behavior in virgl by enabling the
same PIPE_CAP_FAKE_SW_MSAA workaround when the max sample count reported
by the host is 1. This change allows virgl on a softpipe/llvmpipe host
to advertise support for GL 3.0 and beyond.
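The capability decision described above can be sketched in a few lines of C. The struct and function names are hypothetical, not virgl's real identifiers. The sketch only models the gate on the host-reported sample count, not how the state tracker then emulates MSAA.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the caps virgl receives from the host. */
struct host_caps_sketch {
    unsigned max_samples;
};

/* Enable the fake-MSAA workaround only when the host cannot
 * multisample at all, mirroring softpipe/llvmpipe. */
static bool want_fake_sw_msaa(const struct host_caps_sketch *caps)
{
    return caps->max_samples == 1;
}

/* GL 3.0 needs at least 4 samples; the workaround is what lets a
 * single-sample host still clear this bar. */
static bool msaa_ok_for_gl30(const struct host_caps_sketch *caps)
{
    return caps->max_samples >= 4 || want_fake_sw_msaa(caps);
}
```

A host reporting 2 samples is the interesting edge: it gets no workaround and still falls short of the GL 3.0 requirement.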
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
| |
This patch refactors a substantial amount of code in preparation for
mipmaps. In particular, we now have a correct slice abstraction based
on offsets; cpu/gpu are no longer arbitrary pointers. We additionally
shuffle around other code to accompany these changes and clean up how
tiled textures are handled, while drawing some attention to the blit
code.
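An offset-based slice abstraction can be sketched as follows: each mip level stores only an offset and a stride, and CPU and GPU addresses are derived from the buffer's base on demand. The struct, the linear layout and the 64-byte alignment are all assumptions for illustration; Panfrost's real (including tiled) layout differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-level slice: just an offset and a row stride. */
struct slice_sketch {
    size_t offset;
    size_t stride;
};

/* Assumed per-level alignment; the real driver's value may differ. */
#define SLICE_ALIGN 64

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);   /* a must be a power of two */
}

/* Lay out `levels` linear mip levels and return the total buffer size. */
static size_t layout_slices(struct slice_sketch *slices, unsigned levels,
                            unsigned width, unsigned height, unsigned bpp)
{
    size_t offset = 0;
    for (unsigned l = 0; l < levels; ++l) {
        unsigned w = (width >> l) ? (width >> l) : 1;
        unsigned h = (height >> l) ? (height >> l) : 1;
        offset = align_up(offset, SLICE_ALIGN);
        slices[l].offset = offset;
        slices[l].stride = (size_t)w * bpp;
        offset += slices[l].stride * h;
    }
    return offset;
}

/* Both views are the buffer base plus the same offset. */
static uint8_t *slice_cpu(uint8_t *bo_cpu, const struct slice_sketch *s)
{
    return bo_cpu + s->offset;
}

static uint64_t slice_gpu(uint64_t bo_gpu, const struct slice_sketch *s)
{
    return bo_gpu + s->offset;
}
```

For a 16x16 RGBA8 texture with three levels, this yields offsets 0, 1024 and 1280, and a total size of 1344 bytes.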
Mipmaps are still disabled at this point, as autogeneration is not yet
implemented; enabling as-is would cause regressions.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
In fact, the native "fpow" instruction only does half of it; more work
is needed for the actual instruction. For now, just lower.
Fixes: 1ea42894c ("panfrost/midgard: Implement fpow")
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes
dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.int_to_bool_fragment
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Although this is not functional (and the command stream side is not
aiming for ES3 right now), this is enough to run dEQP-GLES3 shader
tests with the version override directive. This is useful, as some ES3
shader features can occur in ES2-class shaders due to lowering.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
On Midgard, float ops support standard source modifiers (abs/neg) and
destination modifiers (sat/pos/round). Integer ops do not support these,
however. To cope, we use native NIR source modifiers for floats, but
lower them away to iabs/ineg for integers, implementing those ops
simultaneously to avoid regressions.
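The split between free float modifiers and explicit integer ops can be modeled in plain C. This is a hypothetical model, not the Midgard compiler's code. It assumes, as in NIR, that abs is applied before neg. The integer helpers stand in for the separate iabs/ineg instructions; as in C, negating INT_MIN is not handled.

```c
#include <stdbool.h>

/* Hypothetical source operand: float ops may carry abs/neg for free. */
struct src_mod_sketch { bool abs, neg; };

static float apply_float_mods(float v, struct src_mod_sketch m)
{
    if (m.abs) v = v < 0.0f ? -v : v;   /* abs first... */
    if (m.neg) v = -v;                  /* ...then neg */
    return v;
}

/* Integer ops have no modifiers, so the same effects become real ALU ops. */
static int op_ineg(int v) { return 0 - v; }
static int op_iabs(int v) { return v < 0 ? -v : v; }
```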
Fixes the integer tests in
dEQP-GLES2.functional.shaders.operator.unary_operator.minus.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes
dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.bool_to_int_fragment
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes
dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.int_to_bool_vertex
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes
dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_bvec2_x_vertex
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes dEQP-GLES2.functional.shaders.linkage.varying_type_vec2 (among
many others).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Some of these are not yet fully functional due to related bugs, but this is
the correct op mapping. The native ball/bany opcodes act on vec4s
unconditionally. That said, both ball and bany have the nice property
that duplicating an argument does not affect their output, so the
default "hanging swizzles" allow us to implement 2/3-component opcodes
correctly, implicitly lowering.
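The "duplicating an argument does not change the output" argument can be checked exhaustively in C. The sketch assumes that the default hanging swizzle repeats the last live component (for example .xyyy for a vec2). All names are hypothetical.

```c
#include <stdbool.h>

/* Native-style ops: always consume all four components. */
static bool ball4(const bool v[4]) { return v[0] && v[1] && v[2] && v[3]; }
static bool bany4(const bool v[4]) { return v[0] || v[1] || v[2] || v[3]; }

/* Widen an n-component input (n = 2 or 3) by repeating its last component,
 * like a hanging swizzle. Repeats cannot change an AND or an OR. */
static void widen(const bool *v, unsigned n, bool out[4])
{
    for (unsigned i = 0; i < 4; ++i)
        out[i] = v[i < n ? i : n - 1];
}
```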
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Though they output scalars, they need a vector unit to make sense.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Whereas a normal fcsel acts on a boolean input in r31.w, the fcsel_i
variant acts on an integer input in r31.w, which can be preloaded with
an instruction like imov (with the appropriate negate flag on the
source).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
This preliminary implementation should handle some basic cases. Future
work should scissor the FRAGMENT job as well for efficiency.
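A basic scissor can be implemented by intersecting the per-draw clip box with the scissor rectangle. This C sketch assumes the scissor is applied that way; the box struct, its exclusive maximum, and the helper names are hypothetical.

```c
/* Hypothetical bounding box in framebuffer pixels, max exclusive. */
struct box_sketch { unsigned minx, miny, maxx, maxy; };

static unsigned umax(unsigned a, unsigned b) { return a > b ? a : b; }
static unsigned umin(unsigned a, unsigned b) { return a < b ? a : b; }

/* Intersect the clip box with the scissor; an empty result is clamped so
 * that max never falls below min. */
static struct box_sketch clip_to_scissor(struct box_sketch vp,
                                         struct box_sketch sc)
{
    struct box_sketch r = {
        umax(vp.minx, sc.minx), umax(vp.miny, sc.miny),
        umin(vp.maxx, sc.maxx), umin(vp.maxy, sc.maxy),
    };
    if (r.maxx < r.minx) r.maxx = r.minx;
    if (r.maxy < r.miny) r.maxy = r.miny;
    return r;
}
```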
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Our viewport code hardcoded a number of wrong assumptions, which sort of
sometimes worked but was definitely wrong (and broke most of dEQP). This
corrects the logic, accounting for flipped-Y framebuffers, which
fixes... most of dEQP.
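In Gallium-style viewport state (window = ndc * scale + translate), a flipped-Y framebuffer typically appears as a negative Y scale. This makes a naive `translate - scale` / `translate + scale` bounds calculation produce inverted bounds. The C sketch below shows one way to derive pixel bounds that do not depend on the sign of the scale. The struct mirrors the scale/translate layout but is a hypothetical stand-in.

```c
#include <math.h>

/* Hypothetical viewport: window = ndc * scale + translate. */
struct viewport_sketch { float scale[3]; float translate[3]; };

/* fabsf makes the bounds independent of the scale's sign, so flipped and
 * unflipped viewports covering the same pixels agree. */
static void viewport_bounds(const struct viewport_sketch *vp,
                            float *minx, float *miny,
                            float *maxx, float *maxy)
{
    *minx = vp->translate[0] - fabsf(vp->scale[0]);
    *maxx = vp->translate[0] + fabsf(vp->scale[0]);
    *miny = vp->translate[1] - fabsf(vp->scale[1]);
    *maxy = vp->translate[1] + fabsf(vp->scale[1]);
}
```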
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
Fixes issues in most of dEQP-GLES2.functional.shaders.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
| |
This fixes piglit clearbuffer-mixed-format
Reviewed-by: Brian Paul <[email protected]>
| |
This gets the basevertex from the draw depending on whether
it's an indexed or non-indexed draw.
We still fail a transform feedback test for vertex id, as
the vertex id is actually an index id and isn't getting translated
properly to a vertex id; suggestions on how/where to fix that are welcome.
Reviewed-by: Brian Paul <[email protected]>
| |
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting the number of samplers based on the uniform vars instead
of the number of cat5 instructions. We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks. Track whether we need
derivatives explicitly and use that to enable the state.
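Tracking derivative use explicitly, rather than inferring it from the sampler count, can be sketched like this. The opcode set, the struct, and the choice that implicit-LOD texture fetches need derivatives while explicit-LOD ones do not are all illustrative assumptions, not ir3's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical opcode subset for the sketch. */
enum op_sketch { OP_ADD, OP_DSX, OP_DSY, OP_TEX_IMPLICIT_LOD, OP_TXL };

struct shader_info_sketch {
    unsigned num_samplers;   /* counted from uniform declarations */
    bool need_derivatives;   /* set from the instructions themselves */
};

/* A shader may use dsx/dsy with zero samplers, so scan the ops directly. */
static void scan_ops(struct shader_info_sketch *info,
                     const enum op_sketch *ops, unsigned n)
{
    for (unsigned i = 0; i < n; ++i) {
        if (ops[i] == OP_DSX || ops[i] == OP_DSY ||
            ops[i] == OP_TEX_IMPLICIT_LOD)
            info->need_derivatives = true;
    }
}
```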
Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
| |
iris is thread-safe, so enable csmt for a ~5% performance boost.
Signed-off-by: Andre Heider <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
| |
From the "Alpha Coverage" section of SKL PRM Volume 7:
  "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
   hardware, regardless of the state setting for this feature."
From OpenGL spec 4.6, "15.2 Shader Execution":
  "The built-in integer array gl_SampleMask can be used to change
   the sample coverage for a fragment from within the shader."
From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
  "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
   is generated where each bit is determined by the alpha value at the
   corresponding sample location. The temporary coverage value is then
   ANDed with the fragment coverage value to generate a new fragment
   coverage value."
Similar wording can be found in Vulkan spec 1.1.100,
"25.6. Multisample Coverage".
Thus we need to compute the alpha to coverage dithering manually in the
shader and replace the sample mask store with the bitwise AND of the
sample mask and the alpha to coverage dithering mask.
The following formula is used to compute the final sample mask:
  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
                0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <[email protected]> for creating it.
It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
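The formula above can be transcribed directly into C to check the "number of ones proportional to alpha" property on the CPU. The function names are hypothetical. Mapping a NaN alpha to 0 is an assumption added so that the float-to-integer cast is always defined.

```c
#include <stdint.h>

/* clamp(a, 0.0, 1.0); NaN is mapped to 0 (an assumption of this sketch). */
static float clamp01(float a)
{
    if (!(a > 0.0f)) return 0.0f;
    if (a > 1.0f) return 1.0f;
    return a;
}

/* dither_mask from the commit message, with m in [0, 16]. */
static uint32_t a2c_dither_mask(float src0_alpha)
{
    uint32_t m = (uint32_t)(16.0f * clamp01(src0_alpha));
    return 0x1111u * ((0xfea80u >> (m & ~3u)) & 0xfu) |
           0x0808u * (m & 2u) |
           0x0100u * (m & 1u);
}

/* sample_mask = sample_mask & dither_mask */
static uint32_t apply_a2c(uint32_t sample_mask, float src0_alpha)
{
    return sample_mask & a2c_dither_mask(src0_alpha);
}
```

For alpha = 0.5, m is 8 and the mask is 0xaaaa. Exactly half of the lowest 2, 4, 8 or 16 bits are set, and the full 16-bit mask has exactly m ones for every m from 0 to 16.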
GEN6 hardware does not have an issue with simultaneous use of the sample
mask and alpha to coverage; however, due to the wrong sending order of
oMask and src0_alpha, it is still affected.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
| |
If the geometry shader emits a point size, we failed to find it here;
use the correct API to look it up.
Fixes:
tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test
Reviewed-by: Brian Paul <[email protected]>
| |
With indirect rendering it's fine to set the instance count
parameter to 0, and expect the rendering to be ignored.
Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands
on softpipe
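The v2 change (return before touching state) follows a common pattern: bail out on an empty indirect draw before any state is modified, so nothing needs to be restored. The context struct and the counters standing in for "changing fpstate" are hypothetical.

```c
/* Hypothetical draw context; the counters stand in for real state. */
struct draw_ctx_sketch {
    unsigned state_changes;   /* e.g. saving/changing FP state */
    unsigned draws_executed;
};

static void draw_indirect_sketch(struct draw_ctx_sketch *ctx,
                                 unsigned instance_count)
{
    /* instance_count == 0 is legal for indirect draws: nothing to render,
     * and returning here means no state has to be undone. */
    if (instance_count == 0)
        return;
    ctx->state_changes++;
    ctx->draws_executed++;
}
```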
v2: return earlier before changing fpstate
Reviewed-by: Brian Paul <[email protected]>
| |
The wait here is unnecessary since we have a pool of back buffers, and
the wait for the swap buffer will happen before the pixmap is presented.
At the same time, the previous back buffer will be put back into the
pool for reuse after the check for the PresentIdleNotify event.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
| |
In Android O, Mesa needs to statically link libexpat so that
it's in the same VNDK namespace.
v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)
Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by: Kishore Kadiyala <[email protected]>
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
| |
Report 320 for a6xx, which isn't *quite* true (no geom/tess, in
particular), but other caps keep the reported GL and GLSL versions
correct (3.1 / 3.10 es). But reporting 320 will switch on
EXT_gpu_shader5, which is the goal.
Signed-off-by: Rob Clark <[email protected]>
| |
Adds a new cap to allow drivers to expose higher shading language
versions in GLES contexts, to avoid having to report an artificially
low version for the benefit of GL contexts.
The motivation is to expose EXT_gpu_shader5 even though a driver may
not support all the features needed for the corresponding GL extension
(ARB_gpu_shader5).
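The effect of a separate ES-only shading language level can be sketched as a simple selection. This does not use the real Gallium cap name; the struct, the "0 means not reported" convention and the function names are hypothetical. The 320 threshold for EXT_gpu_shader5 comes from the a6xx commit, which states that reporting 320 switches that extension on.

```c
#include <stdbool.h>

/* Hypothetical caps: a desktop GLSL level and an optional ES level. */
struct glsl_caps_sketch {
    unsigned glsl_level;   /* used for GL contexts */
    unsigned essl_level;   /* 0 = not reported, fall back to glsl_level */
};

static unsigned effective_level(const struct glsl_caps_sketch *caps,
                                bool is_gles)
{
    if (is_gles && caps->essl_level)
        return caps->essl_level;
    return caps->glsl_level;
}

static bool exposes_ext_gpu_shader5(const struct glsl_caps_sketch *caps,
                                    bool is_gles)
{
    return is_gles && effective_level(caps, is_gles) >= 320;
}
```

The point of the design is that a GLES context can see 320 while a GL context on the same driver keeps the lower, honest level.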
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
| |
Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a
FunctionCallee wrapper type, and use it.").
In file included from ./rasterizer/jitter/builder.h:158:0,
from swr_shader.cpp:35:
./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, const llvm::Twine&)’:
./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’
Function* pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy));
^
Suggested-by: Philip Meulengracht <[email protected]>
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Alok Hota <[email protected]>
| |
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.
While we are at it, factorize a little bit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
| |
Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince
gcc to not inline __gen_uint and results in a lot of packing code ending
up out-of-line with lots of stack copying. To ameliorate this, only
insert the check inside the packer if DEBUG is defined and instead
perform the validation checking before submitting the batch to the
kernel. This should give accurate results if --track-origins=yes is
used, and failing that we can recompile in full debug mode to check on
insertion.
Improve drawoverhead baseline by 25% with a default build with
valgrind-dev installed (with effectively no loss of vg coverage).
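The split described above (per-field checks only in DEBUG builds, a single whole-batch check before submission) can be sketched as follows. `VALGRIND_CHECK_MEM_IS_DEFINED` is the real Memcheck client request. The `HAVE_VALGRIND` guard, the `CHECK_DEFINED` wrapper, and both function names are hypothetical stand-ins, not the genxml packers or the iris submission path.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_VALGRIND
#include <valgrind/memcheck.h>
#define CHECK_DEFINED(p, n) VALGRIND_CHECK_MEM_IS_DEFINED(p, n)
#else
#define CHECK_DEFINED(p, n) ((void)(p), (void)(n))
#endif

/* Packer helper: keeping the check out of non-DEBUG builds keeps this
 * small enough for the compiler to inline. */
static inline uint32_t pack_field_sketch(uint32_t v, unsigned start)
{
#ifdef DEBUG
    CHECK_DEFINED(&v, sizeof(v));
#endif
    return v << start;
}

/* One check over the whole batch just before it would go to the kernel. */
static void submit_batch_sketch(const uint32_t *batch, size_t dwords)
{
    CHECK_DEFINED(batch, dwords * sizeof(*batch));
    /* ... hand the batch to the kernel here ... */
}
```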
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Caught by Chris Wilson; split out from his valgrind patch.
| |
There are other cases where we need to disable early-z, like image
writes. So rename to something more generic.
Signed-off-by: Rob Clark <[email protected]>
| |
Improves drawoverhead baseline scores by 1.17x.
| |
Improves drawoverhead baseline score by 1.86x.
| |
This brings the drawoverhead 16 Tex w/ no state change score from
22% of baseline to 97% of baseline.
| |
Fixes assertions when disabling bucket allocators.
| |
The swizzling was putting in a float one instead of an integer 1.
This fixes a lot of arb_texture_view-rendering-formats cases.
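The bug is easy to see at the bit level: for a pure-integer format, a constant-one swizzle channel must hold the integer 1. The float 1.0f read back as an integer is 0x3f800000. This sketch assumes IEEE 754 single precision, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern a constant-one swizzle channel should carry. */
static uint32_t swizzle_one_bits(int is_integer_format)
{
    if (is_integer_format)
        return 1u;
    float one = 1.0f;
    uint32_t bits;
    memcpy(&bits, &one, sizeof(bits));   /* 0x3f800000 under IEEE 754 */
    return bits;
}
```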
Reviewed-by: Brian Paul <[email protected]>
| |
I don't think this really buys us anything and TG4 with cubemap arrays
falls over because sampler == 2, but otherwise works fine.
Fixes:
./bin/textureGather fs shadow r CubeArray repeat
on softpipe with ARB_gpu_shader5 enabled.
Reviewed-by: Brian Paul <[email protected]>
| |
Fixes piglits if ARB_gpu_shader5 is enabled
Reviewed-by: Brian Paul <[email protected]>
| |
These didn't deal with the width == 32 case that TGSI is defined with.
Fixes piglit tests if ARB_gpu_shader5 is enabled.
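The width == 32 case matters because building the mask as `(1u << width) - 1` is undefined behavior in C when width is 32. The unsigned extract below assumes GLSL-style bitfieldExtract semantics: width 0 yields 0, and offset + width must not exceed 32. The function name is hypothetical, not the TGSI executor's.

```c
#include <stdint.h>

/* Unsigned bitfield extract that handles width == 32 without shifting
 * by 32. Assumes offset + width <= 32. */
static uint32_t ubfe_sketch(uint32_t value, unsigned offset, unsigned width)
{
    if (width == 0)
        return 0;
    uint32_t mask = width >= 32 ? 0xffffffffu : ((1u << width) - 1u);
    return (value >> offset) & mask;
}
```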
Reviewed-by: Brian Paul <[email protected]>
| |
The idea was that we could skip uploading the constant-indexed uniform
data and just upload the uniforms that are variably-indexed. However,
since the VS bin and render shaders may have a different set of uniforms
used, this meant that we had to upload the UBO for each of them. The
first case is generally a fairly small impact (usually the uniform array
is the most space, other than a couple of FSes in shader-db), while the
second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms
instead of 18k.
Given that the optimization is of dubious value, has a big downside, and
is quite a bit of code, just drop it. No change in shader-db. No change
on 3DMMES2 (n=15).
| |
We'd end up with the constant offset in the uniform stream anyway, since
they're bigger than small immediates. Avoids the extra uniforms and adds
in the shader in favor of just adding once on the CPU.
shader-db:
total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
| |
I want to reuse this for encoding small constant UBO/SSBO offsets into the
uniform stream to reduce the extra uniform loads and adds for the small
constant offsets.
| |
v2: only emit offsets if those are !0
Signed-off-by: Karol Herbst <[email protected]>