| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Nouveau apparently uses the u_screen helper but prints a warning in the
default case, so running any GL program would start grumbling.
Fixes: 8fa54bc5490 gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit.
Reviewed-by: Karol Herbst <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Signed-off-by: Karol Herbst <[email protected]>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The strategy is to keep a CPU-side counter of the direct invocations,
and a GPU-side counter of the indirect invocations, and then add them
together for queries.
The specific technique is a macro which multiplies a list of integers
together and accumulates the product into SCRATCH registers held inside
of the context. Another macro will read those values out and add them to
the passed-in cpu-side counter to be stored in a query buffer the same
way that all the other statistics are stored.
Original implementation by Rhys Perry, redone by Ilia Mirkin to use the
SCRATCH temporaries.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Acked-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
| |
Acked-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
| |
[imirkin: add a few more "long" prefixes to safen things up]
Acked-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: (Karol Herbst <[email protected]>
* fix Value setup for the builtins
Signed-off-by: Boyan Ding <[email protected]>
[imirkin: track the fp64 flag when switching ops to calls]
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Boyan Ding <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Boyan Ding <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Not quite perfect, but at least we don't end up with random values in
the query buffer.
Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the NO_WAIT variants, we would jump into the ALWAYS case for both
nested and inverted occlusion queries. However if the query had
previously completed, the application could reasonably expect that the
render condition would follow that result.
To resolve this, we remove the nesting distinction which unnecessarily
created an imbalance between the regular and inverted cases (since
there's no "zero" condition mode). We also use the proper comparison if
we know that the query has completed (which could happen as a result of
an earlier get_query_result call).
Fixes KHR-GL45.conditional_render_inverted.functional
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d
tiling, they just need the correct inputs. Supply them.
We also have to deal with the case where a 2d "layer" of a 3d image is
bound. In this case, we supply the z coordinate separately to the
shader, which has to optionally treat every 2d case as if it could be a
slice of a 3d texture.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to pre-set a bunch of extra arguments to a texture instruction
in order to force the RA to allocate a register at the boundary of 4.
However with the levelZero optimization, which removes a LOD argument
when it's uniformly equal to zero, we undid that logic by removing an
extra argument. As a result, we could end up with insufficient alignment
on the second wide texture argument.
Instead we switch to a different method of achieving the same result.
The logic runs during the constraint analysis of the RA, and adds unset
sources as necessary right before being merged into a wide argument.
Fixes MISALIGNED_REG errors in Hitman when run with bindless textures
enabled on a GK208.
Fixes: 9145873b152 ("nvc0/ir: use levelZero flag when the lod is set to 0")
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Atomic operations don't update the local cache, which means that we
would have to issue CCTL operations in order to get the updated values.
When we know that a buffer is primarily used for atomic operations, it's
easier to just avoid the caching at that level entirely.
The same issue persists for non-atomic buffers, which will have to be
fixed separately.
Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware does not natively support FIXED and DOUBLE formats. If
those are used in an indirect draw, they have to be converted. Our
conversion tries to be clever about only converting the data that's
needed. However for indirect, that won't work.
Given that DOUBLE or FIXED are highly unlikely to ever be used with
indirect draws, read the indirect buffer on the CPU and issue draws
directly.
Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gives me an performance boost of 0.2% in pixmark_piano on my gk106, gm204 and
gp107.
reduces the amount of generated convert instructions by roughly 30% in
shader-db.
v2: only for 32 bit operations
move some common code out of the switch
handle OP_SAT with modifiers
v3: only for registers and const memory
rework if clauses
merge isCvt into this patch
v4: merge isCvt into its use
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The text segment is shared among multiple contexts, while each one has
its own bufctx. So when reallocating the text segment, some contexts may
end up with stale values in their bufctx's. Instead limit the exposure
to the bufctx to within a single draw.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may have to flush the cache if there are any textures presently bound
that refer to the outgoing framebuffer. This is only checked at
validation time.
Fixes a number of dEQP-GLES3.functional.fbo.color.repeated_clear.sample.*
tests, which would bind a texture, then clear it while the binding was
in effect, and then render to a different texture. This seems legal
under the "no feedback loops" rule.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes deqp tests:
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isamplercube_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usamplercube_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.isampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.usampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
Fixes: f821e80213e38e93f96255b3deacb737a600ed40
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
instructions
fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3
CC: <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.
Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This logic mirrors what we do on nv50. The relatively new
texture_subdata callback can cause this to happen with 3D textures,
which is triggered at least by xonotic, and probably many piglits.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the last-active context gets deleted, the pushbuf doesn't have a
bufctx to reference. Then there could be a sequence of binds which would
trigger a reset on that bin before validation was done. Instead we just
pass in the bufctx in question directly.
All other instances of PUSH_RESET happen strictly after a validation is
run.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The whole user_priv thing is a mess, but as long as it's there, it
basically has to map 1:1 to the cur_ctx. Unfortunately we were setting
user_priv to some context, then that context could get deleted without
any draws/validations in it, leading user_priv to become NULL, with
cur_ctx still pointing at some old context. Then we wouldn't run the
switch logic, which in turn led to a NULL bufctx being dereferenced.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Same as on nv50, the TXF op always uses the TSC bound to slot 0,
returning blank values if nothing is bound.
An earlier change arranges for the TSC entries list to always have valid
data at entry 0, so here we just make use of it.
Fixes arb_texture_buffer_object-subdata-sync among others.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This was used for implementing FBFETCH. However that uses TXF, which
doesn't do much with a TSC. The only important bit is that sRGB-decoding
works as expected, which we can achieve since all samplers we ever
generate enable sRGB-decoding. Always point to entry 0 in the TSC table,
and ensure that even before it ever gets initialized, the sRGB-decoding
enable bit is set.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
opnd() might delete the passed in instruction, but it's used through
i->srcExists() later in visit
v2: use continue instead return
v3: use brackets for the outer if/else chain
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
multiple threads can write to those at the same time
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
this race condition is pretty harmless, but also pretty trivial to fix
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't seem like the exact number has too much effect on the
performaince in "teximage". However setting it to just about anything
prevents some OOMs from getting hit. These values are not well-tuned,
but don't seem too bad.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Since the max attrib stride is 2048, the max src offset makes sense as
2047.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All TXF operations implicitly use sampler 0, and fail if it's not bound
to anything. This does not happen in LINKED_TSC mode, but we don't
currently use this.
We ensure that TSC entry at id 0 has the SRGB conversion bit enabled
(and all samplers we normally generate will too). Then when the TSC at
*slot* 0 (not to be confused with entry 0 in the global TSC table) is
unbound, we bind it to entry 0. This way, TXF operations are not
dependent on there being a regular sampler bound there.
Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are
particularly susceptible to this as they don't bind a sampler.)
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.
Fixes a regression in multiple windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
"cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
dnz flag only applies for multiplications (e.g. to make 0 * Infinity
becomes 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
flag no longer makes sense, and upsets the GM107 emitter (since it looks
at the ftz and dnz flags together).
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On nv50, certain operations must happen on regs below 64, due to
encoding requirements. First of all, we add infrastructure to enforce
this. Secondly we change the spill order to first spill RIG nodes that
are unconstrained, followed by ones that are.
This makes the gamecube logo shadertoy compile properly. Curiously, if
we adjust the spill order so that we first spill the constrained RIG
nodes instead, the RA also succeeds. However it seems more logical to
first spill the unconstrained ones.
While we're at it, drop the nv50 max register to reserve r127 as the
zero register of last resort (r63 is preferred).
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of the size restriction existing in two places, and potentially
being applied twice, we move this together. Ops with 16-bit register
addresses can only take a short reg, and ops with immediates can only
take a short reg.
Of course we leave the immediate 0 in place since we know that it will
be replaced by r63/r127 down the line, so don't treat zeroes as an
immediate.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We removed the op from the BB, but it was still listed in its sources'
uses. This could trip up some logic down the line which analyzes all the
uses of an l-value, e.g. spilling.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)':
warning: control reaches end of non-void function [-Wreturn-type]
Reported-by: Moiman@freenode
Fixes: f821e80213e38e93f96255b3deacb737a600ed40
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TEXS, TLD4 and TLD4S are variants of tex instructions which are more
scalar, which gives RA more freedom and is less likely to insert silly
MOVs to satisfy quad registers.
shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs : 803620 -> 798045 (-0.69%)
total shared used in shared programs : 639636 -> 639636 (0.00%)
total local used in shared programs : 24648 -> 24648 (0.00%)
total bytes used in shared programs : 82103400 -> 81330696 (-0.94%)
local shared gpr inst bytes
helped 0 0 3648 10647 10647
hurt 0 0 464 205 205
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|