| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes:
GL45-CTS.direct_state_access.queries_errors
The ARB_direct_state_access spec agrees.
v2: move check down further (Ilia)
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Caught by Coverity (CID 1362021). Caused by commit 015f2207c.
|
|
|
|
|
|
| |
We would have segfaulted in the above code if prog could be NULL.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the transform feedback object is paused when ending, then there are
no new snapshots to add to the tally. In fact, we haven't written a
starting snapshot, so we'd best not try and compute (end - start).
Just load the existing tally so we can convert it to the number of
vertices written and store it to the final result location.
This is the Haswell+ equivalent of the previous commit.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the transform feedback object is paused, then we've already written
an ending counter snapshot. We don't want to write another one.
This fixes assertions in GL33-CTS.transform_feedback.api_errors_test,
which calls EndTransformFeedback after PauseTransformFeedback. On the
next BeginTransformFeedback, we tried to tally up the results, and saw
an odd number of snapshots (due to the double-end), and tripped an
assertion.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
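The double-end problem described above can be illustrated with a small model. This is only a sketch: struct xfb_object and the xfb_* helpers are hypothetical stand-ins, not the driver's actual data structures. Snapshots are assumed to be stored as alternating start/end counter values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SNAPSHOTS 16

/* Hypothetical stand-in for a driver's transform feedback object. */
struct xfb_object {
   bool paused;
   uint64_t snapshots[MAX_SNAPSHOTS]; /* alternating start/end counters */
   size_t count;
};

static void
xfb_snapshot(struct xfb_object *obj, uint64_t counter)
{
   assert(obj->count < MAX_SNAPSHOTS);
   obj->snapshots[obj->count++] = counter;
}

static void
xfb_begin(struct xfb_object *obj, uint64_t counter)
{
   obj->paused = false;
   xfb_snapshot(obj, counter);   /* starting snapshot */
}

static void
xfb_pause(struct xfb_object *obj, uint64_t counter)
{
   xfb_snapshot(obj, counter);   /* ending snapshot */
   obj->paused = true;
}

static void
xfb_end(struct xfb_object *obj, uint64_t counter)
{
   /* If we are paused, the ending snapshot was already written. */
   if (!obj->paused)
      xfb_snapshot(obj, counter);
   obj->paused = false;
}

static uint64_t
xfb_tally(const struct xfb_object *obj)
{
   uint64_t total = 0;

   /* An odd count means a start or end snapshot has no partner. */
   assert(obj->count % 2 == 0);
   for (size_t i = 0; i < obj->count; i += 2)
      total += obj->snapshots[i + 1] - obj->snapshots[i];
   return total;
}
```

Without the paused check in xfb_end(), a begin/pause/end sequence would record three snapshots and the assertion in xfb_tally() would trip.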
|
|
|
|
|
|
|
|
|
|
|
| |
This way, the driver's EndTransformFeedback() hook can tell whether the
transform feedback operation was paused. It's also convenient to have
Paused remain false until the driver's PauseTransformFeedback hook
finishes.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For cull distance, GLSL will let unsized, unused arrays get
into the backend. We should nuke those straight away to
save caring about them later.
This fixes:
arb_separate_shader_objects/linker/large-number-of-unused-varyings
as a side effect (even without culling changes).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case
of what I expected, so we always take the negate_value case. It doesn't
really matter.
v2: Write src0 before src1 in ADD instructions (requested by Matt).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.
This patch actually eliminates the associated recompiles. The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.
This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
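As a rough illustration of the multiply/add approach, the sketch below models the Y transformation as a scale and a bias that would be uploaded as uniform values. The struct and function names are hypothetical, and the flip convention (which kind of FBO needs flipping) is left to the caller as an assumption.

```c
#include <stdbool.h>

/* Hypothetical uniform pair applied to the Y coordinate. */
struct wpos_transform {
   float scale;
   float bias;
};

/* Pick the uniform values at draw time instead of baking them into
 * the compiled shader. */
static struct wpos_transform
wpos_transform_for(bool flip_y, float drawable_height)
{
   struct wpos_transform t;

   t.scale = flip_y ? -1.0f : 1.0f;
   t.bias = flip_y ? drawable_height : 0.0f;
   return t;
}

/* What the shader would compute: y' = y * scale + bias. */
static float
wpos_apply_y(struct wpos_transform t, float y)
{
   return y * t.scale + t.bias;
}
```

Because only the two uniform values change when the target or the drawable height changes, the same compiled shader can be reused in both cases.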
|
|
|
|
|
|
|
|
| |
We'd like the comparisons to mean "the exact same bits". Comparing
doubles won't do that for NaN values or positive vs. negative zero.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
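A minimal sketch of the difference between value comparison and bit comparison of doubles. The helper name double_bits_equal is made up for illustration and is not taken from the patch.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* True only if a and b have the exact same bit pattern. */
static bool
double_bits_equal(double a, double b)
{
   return memcmp(&a, &b, sizeof(a)) == 0;
}
```

With a plain == comparison, a NaN never equals itself and 0.0 equals -0.0. With the bitwise comparison, a NaN matches a copy of itself and the two zeros differ.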
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were locking the Shared->Mutex and then calling functions like
_mesa_HashInsert that do additional per-hash-table locking internally.
Instead just lock each hash-table's mutex and use functions like
_mesa_HashInsertLocked and the new _mesa_HashRemoveLocked.
In order to do this, we need to remove the locking from
_mesa_HashFindFreeKeyBlock since it will always be called with the
per-hash-table lock taken.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Cuts 6K of .text.
   text    data     bss     dec     hex filename
5772372  264648   29320 6066340  5c90a4 lib/i965_dri.so before
5766074  264648   29320 6060042  5c780a lib/i965_dri.so after
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This trivial fix to error-handling corrects the sign of drm error
codes before passing them to strerror.
Identified by Coverity: CID1358581
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of a big and complicated optimization pass, Ken suggested
just recognizing the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.
total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
instructions in affected programs: 814563 -> 795219 (-2.37%)
helped: 3336
HURT: 10
total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
cycles in affected programs: 10582686 -> 10263428 (-3.02%)
helped: 2438
HURT: 691
total spills in shared programs: 1811 -> 1789 (-1.21%)
spills in affected programs: 85 -> 63 (-25.88%)
helped: 4
total fills in shared programs: 3143 -> 3109 (-1.08%)
fills in affected programs: 167 -> 133 (-20.36%)
helped: 4
LOST: 2
GAINED: 36
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The next patch wants to inspect the LOD argument and do something
different if it's 0.0f. But at that point we've emitted a MOV for it and
we just have a register to look at.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
prog_data->base.dispatch_grf_start_reg = simd16_grf_start;
brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
uint8_t simd8_grf_start, simd16_grf_start;
brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);
brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
unsigned simd8_grf_used, simd16_grf_used;
(and more)
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
|
|
|
|
|
|
|
| |
Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
|
|
|
|
|
| |
Pretty useless, as it's in debugging code. Found by Coverity (CID
1257016).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only place that actually used the type parameter was the GS visitor,
and it was always passed glsl_type::int. Just remove the parameter.
brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
const glsl_type *type)
^
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The MaxComputeWorkGroupInvocations constant is used in
compute_version_es2() instead of extensions->ARB_compute_shader
as ES has lower requirements than desktop GL.
Both i965 and gallium set this constant before enabling compute support.
Signed-off-by: Daniel Scharrer <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Pointed out by Coverity.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The code which used this was removed quite a while ago.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a5d7e144eaf43fee37e6ff9e2de194407087632b, Connor generalized the
exec_size halving code to handle more cases. As part of this, he made
it not halve anything if the region accessed falls completely in a
single register.
Unfortunately, it started producing some invalid regions:
-add(16) g6<1>F g10<8,8,1>UW -g1<0,1,0>F { align1 compr };
-add(16) g8<1>F g12<8,8,1>UW -g1.1<0,1,0>F { align1 compr };
+add(16) g6<1>F g10<16,16,1>UW -g1<0,1,0>F { align1 compr };
+add(16) g8<1>F g12<16,16,1>UW -g1.1<0,1,0>F { align1 compr };
Here, the UW source region completely fits within a register. However,
we have to use instruction compression because the destination region
spans two registers. <16,16,1> is invalid because it's compressed.
To handle this, skip the "everything fits in one register" case and
fall through to the exec_size halving case when compressed.
Fixes hundreds of Piglit regressions on GM965.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
brw_reg_from_fs_reg() needs to know whether the instruction will be
compressed or not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables:
- GL_OES_sample_shading
- GL_OES_sample_variables
- GL_OES_shader_multisample_interpolation
On Gen8, we pass all the CTS tests, and all but 4 of the dEQP-GLES31
tests (dealing with 1x/2x MSAA at half rate sampling). We believe
those 4 dEQP-GLES31 tests are incorrect.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following build error due to the libmesa_nir dependency:
In file included from external/mesa/src/mesa/state_tracker/st_glsl_to_nir.cpp:44:0:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
#include "nir_opcodes.h"
^
compilation terminated.
build/core/binary.mk:706: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The format_desc swizzle describes where in the array each color channel
comes from - but the existing code was written as if each entry in the
swizzle described the meaning of an array element.
Fixes piglit's arb_copy_image-format-swizzle.
Cc: "11.1 11.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Prep work for next patch.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling. Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.
Reviewed-by: Francisco Jerez <[email protected]>
Tested-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
ARB_vertex_attrib_64bit was the only feature missing.
v2: we can expose 4.2 instead of 4.1 (Ian Romanick)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.
Instead, we need to use the number of slots used by the attributes.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Do not use total attributes because a dvec3/dvec4 attribute requires two
slots. So rather use total attribute slots.
v2: do not use loop to calculate required attribute slots (Kenneth
Graunke)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Double-precision types require 1 slot in VUE for double and dvec2, and 2 slots for
anything else.
Reviewed-by: Kenneth Graunke <[email protected]>
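The slot rule stated above can be sketched as follows. struct attrib_desc and both helpers are hypothetical and only model the counting. They are not the driver's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one vertex attribute. */
struct attrib_desc {
   bool is_double;
   unsigned components;   /* 1..4 */
};

/* double and dvec2 fit in one vec4 slot; dvec3 and dvec4 need two.
 * Single-precision attributes always take one slot. */
static unsigned
attrib_slots(const struct attrib_desc *a)
{
   return (a->is_double && a->components > 2) ? 2 : 1;
}

/* Total slots, which can exceed the number of attributes. */
static unsigned
total_attrib_slots(const struct attrib_desc *attribs, size_t n)
{
   unsigned total = 0;

   for (size_t i = 0; i < n; i++)
      total += attrib_slots(&attribs[i]);
   return total;
}
```

For a vec4, a dvec2, a dvec3 and a dvec4 this yields 1 + 1 + 2 + 2 slots, which is why the attribute count alone is not enough.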
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VS Thread Payload handles attributes in URB as vec4, no matter if they
are actually single or double precision.
So with double-precision types, the value ends up in the registers split
into 32-bit chunks, in different positions.
We need to shuffle the chunks to get the doubles correctly.
v2:
* Extra blank line. Add { } on if body (Ian Romanick)
* Use dest directly (Kenneth Graunke)
Reviewed-by: Kenneth Graunke <[email protected]>
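A simplified model of the shuffle, under the assumption that the low and high 32-bit halves of each double arrive in two separate arrays (one entry per channel) and that double is a 64-bit IEEE type. The function name and the array layout are illustrative only. The real code operates on registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Recombine per-channel low/high 32-bit chunks into 64-bit doubles. */
static void
shuffle_32bit_to_64bit(const uint32_t *lo, const uint32_t *hi,
                       double *dst, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      uint64_t bits = ((uint64_t)hi[i] << 32) | lo[i];

      /* Copy the bit pattern; a cast would convert the integer value. */
      memcpy(&dst[i], &bits, sizeof(bits));
   }
}
```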
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The HW has a restriction that only vertical stride may cross register
boundaries. Until now this was only handled on VGRFs at
brw_reg_from_fs_reg, but it is also needed for attributes.
v2:
* Remove reference to commit id on commit message (Juan Suarez)
* Simplify code that computes final exec_size (Ian Romanick)
* Use REG_SIZE on that same code (Kenneth Graunke)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Add an assertion to detect this case.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Broadwell specification, structure VERTEX_ELEMENT_STATE
description:
"When SourceElementFormat is set to one of the *64*_PASSTHRU
formats, 64-bit components are stored in the URB without any
conversion. In this case, vertex elements must be written as 128
or 256 bits, with VFCOMP_STORE_0 being used to pad the output
as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red component into
the URB, Component 1 must be specified as VFCOMP_STORE_0 (with
Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0
in order to output a 256-bit vertex element. Likewise, use of
R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0
in order to output a 256-bit vertex element."
Use 128 bits to write double and dvec2 vertex elements, and 256 bits for
dvec3 and dvec4 vertex elements.
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Signed-off-by: Antia Puentes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
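The padding rule in the quoted text can be sketched as a small table generator. The enum below is local to this example: VFCOMP_STORE_0 and VFCOMP_NOSTORE are named after the quote, VFCOMP_STORE_SRC is assumed as the name for "store the source component", and the numeric values are arbitrary rather than hardware encodings.

```c
#include <assert.h>

/* Local, illustrative component controls (values are arbitrary). */
enum vfcomp {
   VFCOMP_NOSTORE,
   VFCOMP_STORE_SRC,
   VFCOMP_STORE_0
};

/* Fill the four component controls for a *64*_PASSTHRU element with
 * num_doubles 64-bit components (1 = double ... 4 = dvec4). */
static void
passthru_components(unsigned num_doubles, enum vfcomp comp[4])
{
   assert(num_doubles >= 1 && num_doubles <= 4);

   /* double/dvec2 are written as 128 bits, dvec3/dvec4 as 256 bits. */
   unsigned stored = num_doubles <= 2 ? 2 : 4;

   for (unsigned i = 0; i < 4; i++) {
      if (i < num_doubles)
         comp[i] = VFCOMP_STORE_SRC;
      else if (i < stored)
         comp[i] = VFCOMP_STORE_0;   /* pad up to 128 or 256 bits */
      else
         comp[i] = VFCOMP_NOSTORE;
   }
}
```

For a single double this gives STORE_SRC, STORE_0, NOSTORE, NOSTORE, and for a dvec3 it gives three STORE_SRC entries followed by STORE_0, matching the two examples in the quote.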
|
|
|
|
|
|
|
|
|
|
| |
This commit adds support for PASSTHRU format when pushing
double-precision attributes.
Check glarray->Doubles in order to know if we should choose a format
that does a conversion to float, or just passthru the 64-bit double.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I recently fixed a bug in the Piglit tests:
https://lists.freedesktop.org/archives/piglit/2016-May/019802.html
With that patch in place, we pass all the tests. So, turn it on.
We could probably expose this earlier than Gen8, but the extension
says that OpenGL 4.0 is required, and all of our tests are written
against GLSL 4.00 (which is only supported on Gen8+).
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Coverity issue 1361544 found an instance where the tcs variable is
checked for NULL, but unconditionally dereferenced later in the same
function.
Reviewed-by: Kenneth Graunke <[email protected]>
|