| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Noticed by Brian. Trivial.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Now the lowering pass is fixed, reenable ARB_cull_distance.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Now the lowering pass is fixed we can reenable culling.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug that breaks cull distances. The problem
is the max array accessors can't tell the difference between
an never accessed unsized array and an accessed at location 0
unsized array. This leads to converting an undeclared unused
gl_ClipDistance inside or outside gl_PerVertex to a size 1
array. However we need to the number of active clip distances
to work out the starting point for the cull distances, and
this offset by one when it's not being used isn't possible
to distinguish from the case were only the first element is
accessed. I tried to use ->used for this, but that doesn't
work when gl_ClipDistance is part of an interface block.
So this changes things so that max_array_access is an int
and initialised to -1. This also allows unsized arrays to
proceed further than that could before, but we really shouldn't
mind as they will get eliminated if nothing uses them later.
For initialised uniforms we no longer change their array
size at runtime, if these are unused they will get eliminated
eventually.
v2: use ralloc_array (Ilia)
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This format does not support alpha blending, according to the SNB PRM.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
gcc6 warns about this.
Acked-by: Matt Turner <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
When we have the geometry extensions, enable querying of the new param.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
OES_geometry_shader has wording to allow xfb when using Draw*Indirect
and DrawElements.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When passing in GL_RGBA or other base formats, we will try to upgrade
the format to whatever the passed in type was. However not all drivers
(notably nv30) support 32F textures, and so this would lead to crashes
down the line. Only upgrade when the relevant extensions are available.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we still have TGSI_OPCODE_CMP's info, which causes a number of
later logic to go wrong. This fixes
dEQP-GLES2.functional.shaders.functions.control_flow.return_in_if_vertex
on nv30.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Technically, this was introduced with GL 4.4. However, I believe it
was intended to be retroactive. As far as I know, AMD has never
supported primitive restart with patches, while NVidia and Intel do.
This necessitated the need for a query which would allow applications
to figure out whether this was usable or not.
I decided to expose it everywhere ARB_tessellation_shader is exposed.
(It's also in both OES and EXT_tessellation_shader.)
Enable this for i965 and Gallium drivers which expose the capability.
v2: Fix a bug in the state_tracker code (caught by Ilia Mirkin).
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10364
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This lets the rest of the backend know that the uniform pull constant
load opcodes don't respect channel enables -- Without this the
register allocator has no way to know that the return payload of a
pull constant load is not per-channel and spills of the destination
will be broken under non-uniform control flow.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This should be working fine now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94997
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the spilling code attempts to guess the scratch message
block size from the dispatch width of the shader, which is plain wrong
for SIMD-lowered instructions (frequently but not exclusively
encountered in SIMD32 shaders) or for instructions with register
region data types of size other than 32 bit.
Instead try to use the SIMD component size of the instruction which in
some cases will allow the dataport to apply the correct channel mask
to the scratch data read or written. In the spill case the block size
needs to be clamped to the number of MRF registers reserved for
spilling. In the unspill case I didn't even bother because we
currently have no 100% accurate way to determine whether a source
region is per-channel or whether it contains things like headers that
don't respect channel boundaries -- That's fine, because the unspill
is marked force_writemask_all we can just use the largest allowable
scratch message size.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
instruction.
This prevents the application of an incorrect channel mask by the
scratch write instruction for spilled variables that don't have an
exact one-to-one correspondence between channels of the variable and
32-bit components of the scratch write instruction.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This makes sure that unspills restore the exact contents of the
variable in scratch space into the GRF without applying channel
masking, which is incorrect under control flow for things like message
headers or vectors of heterogeneous types that don't properly respect
channel boundaries.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes emit_(un)spill even more stupid by removing the logic that
decides what execution size each scratch read or write send message
should have and instead relying on the caller to specify an
appropriate execution size via the builder argument. This makes sense
because the caller will need to act differently based on the scratch
message width (e.g. emit an additional unspill before the instruction
if the execution width and channel layout of the spill doesn't match
the instruction's).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This seems cleaner than exposing an implementation detail of
brw_fs_reg_allocate.cpp to the world, and will give the caller control
over the instruction execution flags (e.g. force_writemask_all) that
are applied to the scratch read and write instructions.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now the execution controls (e.g. channel group,
force_writemask_all, exec_size) of the instruction had been completely
ignored by spilling, even though that can lead to a mismatch between
the channel mask applied to the contents of the (un)spilled memory and
the GRF source or destination of the instruction. In some cases we'll
actually want the (un)spill messages to be marked force_writemask_all
regardless of whether the instruction has it set, but that will have
to be handled specially by the caller.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
To avoid some some spurious warnings about comparison signedness in
the following commits.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
And as we're at it fix the calculation to allocate a larger block of
registers for 32-wide dispatch.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"
On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);
However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.
Signed-off-by: Nicolas Boichat <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
[Emil Velikov: s/explicitely/explicitly/]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation relied on the std140 alignment rules to
avoid handling misalignment in the case where we are loading more than
2 double components from a vector, which requires to emit a second load
message.
This alternative implementation deals with misalignment and is more
flexible going forward.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code didn't deal with explicit function indexes properly.
It also handed out the indexes at link time, when we really
need them in the lowering pass to create the correct if ladder.
So this patch moves assigning the non-explicit indexes earlier,
fixes the lowering pass and the lookups to get the correct values.
This fixes a few of:
GL45-CTS.explicit_uniform_location.subroutine-index-*
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes:
GL45-CTS.shader_subroutine.subroutine_uniform_reset
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code was implementing the ACTIVE_SUBROUTINE_UNIFORMS
incorrectly, using the number of types not the number of
uniforms. This is different than the locations as the
locations may be sparsly allocated.
This fixes:
GL43-CTS.shader_subroutine.four_subroutines_with_two_uniforms
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
GLSL spec says this doesn't generate an error.
Fixes:
GL45-CTS.explicit_uniform_location.subroutine-loc
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
_mesa_GetActiveSubroutineUniformiv needs to check
against the number of types here.
Noticed while playing with ogl conform.
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes:
GL45-CTS.direct_state_access.queries_errors
The ARB_direct_state_access spec agrees.
v2: move check down further (Ilia)
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Caught by Coverity (CID 1362021). Caused by commit 015f2207c.
|
|
|
|
|
|
| |
We would have segfaulted in the above code if prog could be NULL.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the transform feedback object is paused when ending, then there are
no new snapshots to add to the tally. In fact, we haven't written a
starting snapshot, so we'd best not try and compute (end - start).
Just load the existing tally so we can convert it to the number of
vertices written and store it to the final result location.
This is the Haswell+ equivalent of the previous commit.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the transform feedback object is paused, then we've already written
an ending counter snapshot. We don't want to write another one.
This fixes assertions in GL33-CTS.transform_feedback.api_errors_test,
which calls EndTransformfeedback after PauseTransformFeedback. On the
next BeginTransformFeedback, we tried to tally up the results, and saw
an odd number of snapshots (due to the double-end), and tripped an
assertion.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This way, the driver's EndTransformFeedback() hook can tell whether the
transform feedback operation was paused. It's also convenient to have
Paused remain false until the driver's PauseTransformFeedback hook
finishes.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For cull distance GLSL will let unsized unused arrays get
into the backend, we should nuke those straight away, to
save caring about them later.
This fixes:
arb_separate_shader_objects/linker/large-number-of-unused-varyings
as a side effect (even without culling changes).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case
of what I expected, so we always take the negate_value case. It doesn't
really matter.
v2: Write src0 before src1 in ADD instructions (requested by Matt).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.
This patch actually eliminates the associated recompiles. The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.
This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We'd like the comparisons to mean "the exact same bits". Comparing
doubles won't do that for NaN values or positive vs. negative zero.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were locking the Shared->Mutex and then using calling functions like
_mesa_HashInsert that do additional per-hash-table locking internally.
Instead just lock each hash-table's mutex and use functions like
_mesa_HashInsertLocked and the new _mesa_HashRemoveLocked.
In order to do this, we need to remove the locking from
_mesa_HashFindFreeKeyBlock since it will always be called with the
per-hash-table lock taken.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Cuts 6K of .text.
text data bss dec hex filename
5772372 264648 29320 6066340 5c90a4 lib/i965_dri.so before
5766074 264648 29320 6060042 5c780a lib/i965_dri.so after
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This trivial fix to error-handling corrects the sign of drm error
codes before passing them to strerror.
Identified by Coverity: CID1358581
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ken suggested instead of a big and complicated optimization pass, to
just recognize the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.
total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
instructions in affected programs: 814563 -> 795219 (-2.37%)
helped: 3336
HURT: 10
total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
cycles in affected programs: 10582686 -> 10263428 (-3.02%)
helped: 2438
HURT: 691
total spills in shared programs: 1811 -> 1789 (-1.21%)
spills in affected programs: 85 -> 63 (-25.88%)
helped: 4
total fills in shared programs: 3143 -> 3109 (-1.08%)
fills in affected programs: 167 -> 133 (-20.36%)
helped: 4
LOST: 2
GAINED: 36
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The next patch wants to inspect the LOD argument and do something
different if it's 0.0f. But at that point we've emitted a MOV for it and
we just have a register to look at.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
prog_data->base.dispatch_grf_start_reg = simd16_grf_start;
brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
uint8_t simd8_grf_start, simd16_grf_start;
brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);
brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
unsigned simd8_grf_used, simd16_grf_used;
(and more)
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
|
|
|
|
|
|
|
| |
Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
|
|
|
|
|
| |
Pretty useless, as it's in debugging code. Found by Coverity (CID
1257016).
|