| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
This reverts commit a79b93269cf340ce4d23b5b34100039bcaafc841.
Reviewed-by: Jose Maria Casanova <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
| |
This triggers lowering in the state-tracker, which makes things a bit
simpler.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This will replace PIPE_SHADER_CAP_SCALAR_ISA.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Upcoming changes to sub optimization will make this pass required. Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Four bytes of src_surf will be compressed into a 32-bits data and
stored into dst_surf, and dst_surf is read as z-order, so its width
must be aligned to multiples of 8(4x2) before divided by 2.
Signed-off-by: Zhaowei Yuan <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111266
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenGL ES spec requires that the value of gl_PointSize is clamped
to an implementation-dependent range matching what is advertised by
GL_ALIASED_POINT_SIZE_RANGE. For VC4 this is [1.0, 512.0], but the
hardware won't clamp to the minimum side of the range and won't render
points with a size strictly smaller than 1.0 either, so we need to
clamp manually. For points larger than the maximum size of the range
the hardware clamps automatically.
Fixes piglit test:
spec/!opengl 2.0/vs-point_size-zero
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We have a cap bit for gallium and a GLSL compiler flag to control this.
Just trust what GLSL gives us and stop forcing it. In order for this to
be safe, we have to advertise another cap in some of the gallium
drivers.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: ea5b7de138b ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled")
Tested-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Cuts out a bunch of boilerplate.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR. This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.
Reviewed-by: Marek Olšák <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The _LEVELS assumes that the max is always power of two. For V3D 4.2, we
can support up to 7680 non-power-of-two MSAA textures, which will let X11
support dual 4k displays on newer hardware.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't know why I thought NIR_PASS always set the progress variable.
Derp.
Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32
only for vertex shaders. For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc. On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Driver which do not support native integers should use a lowering
pass to go from integers to floats.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
I have a platform with vc4 display but V3D 4.x. We can fall back on
kmsro's probing to bring up the v3d gallium driver.
Acked-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Fixes: 5131b7a43f8488a7 ('gallium: add support for formatted image loads')
|
|
|
|
|
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
v3: rebase
v3: make use of u_pipe_screen_get_param_defaults
Signed-off-by: Rhys Perry <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v2)
|
|
|
|
|
|
|
| |
This required to calculate sizes correctly when we have bindless
samplers/images.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Same as I did for V3D, drop all this code trying to GC the
non-indirectly-loaded uniforms from the UBO that's used for indirect
access of gallium cb[0]. While it does successfully drop some of those,
it came at the cost of uploading the VS's indirect unifroms twice, for the
bin and render versions of the shader.
With the UBO loads simplified, I was also able to easily backport V3D's
change to pack a UBO offset into the uniform_data[] field so that we don't
need to do the add of the uniform base in the shader.
As a bonus, now vc4 doesn't depend on mesa/st type_size functions.
total uniforms in shared programs: 25514 -> 25490 (-0.09%)
total instructions in shared programs: 77019 -> 76836 (-0.24%)
|
|
|
|
| |
I'm going to extend how UBO0 works in a moment.
|
|
|
|
|
|
| |
In the next commit, we'll want this for handling UBO access clamping.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o
at the st level instead of the backend, packing uniforms with no padding
at all, and lowering to UBOs.
Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS
leads to the driver needing to either link against the type size function
in mesa/st, or duplicating it in the backend. Given that all backends
want this lower-io as far as I can tell, just move it to mesa/st to
resolve the link issue and avoid the driver author needing to understand
st's uniforms layout.
Incidentally, fixes uniform layout failures in nouveau in:
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex
and I think in Lima as well.
v2: fix indents
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Just a small code reduction from shared infrastructure.
|
|
|
|
| |
Just a small code reduction from shared infrastructure.
|
| |
|
|
|
|
| |
Just a small code reduction from shared infrastructure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, tgsi_to_nir will output NIR that is tailored to
the given pipe, by reading its capabilities and adjusting the NIR code
to those capabilities similarly to how glsl_to_nir works.
It also adds an optimization loop that brings the output NIR in line
with what glsl_to_nir outputs. This is necessary for the same reason
why glsl_to_nir has its own optimization loop: currently not every
driver does these optimizations yet.
For uses which cannot pass a pipe_screen we also keep a variant
called tgsi_to_nir_noscreen which keeps the old behavior.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Acked-By: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Signed-off-by: Karol Herbst <[email protected]>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.
CID: 1430194
Signed-off-by: Ernestas Kulik <[email protected]>
Fixes: 9ea90ffb98fb ("broadcom/vc4: Add support for HW perfmon")
|