| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
We will add new gl_extensions structures that capture the environment
variable extension overrides and are available early in context
creation.
This will allow a driver to take actions during its initialization
based on the extension overrides.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously setting:
MESA_EXTENSION_OVERRIDE=-GL_MESA_ham_sandwich
Would cause Mesa to advertise support for the GL_MESA_ham_sandwich
extension, even though the override specifically asked for it to be
disabled.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
v2: enable also for i915 (Ian)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Petri Latvala <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Support inactive uniforms that have explicit location set in
glUniform* functions.
v2: remove unnecessary extension check, use new define (Ian)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch adds new implementation dependent value required by the
GL_ARB_explicit_uniform_location extension. Default value for user
assignable locations is calculated as sum of MaxUniformComponents
for each stage.
v2: fix descriptor in get_hash_params.py (Petri)
v3: simpler formula for calculating initial value (Ian)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've used the LD sampler message for pull constant loads on earlier
hardware for some time, and also were already using it for the FS on
Broadwell. This patch makes us use it for Broadwell VS/GS as well.
I believe that when I wrote this code in 2012, we still used the data
port in some cases, and I somehow neglected to convert it while
rebasing.
Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821%
(n = 17). Many other applications should benefit similarly: this speeds
up uniform array access in the VS, which is commonly used for skinning
shaders, among other things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Tested-by: Ben Widawsky <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I actually added MOCS support for these things, but forgot to delete the
corresponding perf_debug() warnings.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
Somehow I missed this when adding all of the other MOCS values.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When faced with code such as:
mov vgrf31.0:UD, 960D
mov vgrf31.1:UD, vgrf30.xxxx:UD
The dead code eliminator didn't consider reg_offsets, so it decided that
the second instruction was writing was writing to the same register as
the first one, and eliminated the first one. But they're actually
different registers.
This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above
code, vgrf31.0 represents the offset into the shader_time buffer where
the data should be written, and vgrf31.1 represents the actual time
data. With a completely undefined offset, results were...unexpected.
I think this is probably one of the few cases (maybe only case) where we
generate multiple MOVs to a large VGRF. Normally, we just use them as
texturing results; the other SEND-from-GRF uses a size 1 VGRF.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
"shader_time_add" is a lot more informative than "op152".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes several clang constant-logical-operand warnings such as
the following.
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: warning: use of logical '||' with constant operand [-Wconstant-logical-operand]
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^ ~~~~~~~~~~~
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: note: use '|' for a bitwise operation
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^~
|
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Cc: "10.2" <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The comment for _mesa_is_type_integer is confusing because it says that it
returns whether the type is an “integer (non-normalized)” format. I don't
think it makes sense to say whether a type is normalized or not because it
depends on what format it is used with. For example, GL_RGBA+GL_UNSIGNED_BYTE
is normalized but GL_RGBA_INTEGER+GL_UNSIGNED_BYTE isn't. If the normalized
comment is just a mistake then it still doesn't make much sense because it is
missing the packed-pixel types such as GL_UNSIGNED_INT_5_6_5. If those were
added then it effectively just returns type != GL_FLOAT.
That function was only used in _mesa_is_enum_format_or_type_integer. This
function effectively checks whether the format is non-normalized or the type
is an integer. I can't think of any situation where that check would make
sense.
As far as I can tell neither of these functions have ever been used anywhere
so we should just remove them to avoid confusion.
These functions were added in 9ad8f431b2a47060bf05517246ab0fa8d249c800.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a multisampled texture is used for sampling the fast clear color value
needs to be programmed into the surface state. This was being left as all
zeroes so if the surface was cleared to a value other than black then it
wouldn't work properly. This doesn't matter for single-sample textures because
in that case the MCS buffer is resolved before it is used as a texture source.
https://bugs.freedesktop.org/show_bug.cgi?id=79729
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Cc: "10.1 10.2" <[email protected]>
|
|
|
|
|
|
|
| |
Too many levels of indirection.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We only need to alter the default state if we're emitting MOVs for
header related fields. So, we can simply move the push/pop of state in
to the if (header_present) block, bypassing it in the common case.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit dc2d3a7f5c217a7cee92380fbf503924a9591bea, Iago accidentally
moved fire_fb_write() above the brw_pop_insn_state(), which caused the
SEND to lose its predication and change from WE_normal to WE_all.
Haswell uses predicated SENDs for discards, so this broke Piglit's
tests for discards.
We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked,
but the actual FB write itself should respect those. So, pop state
first, and force it again around the single MOV.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
|
|
|
|
|
|
|
| |
Will simplify the automated conversion if we want to allow compiling the
driver for a single generation.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
| |
I'm adding vec4 CSE, and I want to diff the files.
|
|
|
|
|
|
|
|
|
|
| |
This makes sure to use a no-op swizzle while iteratively rendering each
level of a mipmap otherwise we may loose components and effectively
apply the swizzle twice by the time these levels are sampled.
Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would emit the comparison, emit an AND to mask off extra
bits from the comparison result, then convert the result to float. Now,
do the comparison, then use a cleverly constructed SEL to pick either
0.0f or 1.0f.
No piglit regressions on Ivybridge.
total instructions in shared programs: 1642311 -> 1639449 (-0.17%)
instructions in affected programs: 136533 -> 133671 (-2.10%)
GAINED: 0
LOST: 0
Programs that are affected appear to save between 1 and 5 instuctions
(just by skimming the output from shader-db report.py.
v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove
extraneous fix_3src_operand (suggested by Matt). The latter change
required swapping the order of the operands and using predicate_inverse.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_vec4_visitor.cpp:2717:1: warning: unused parameter 'ir' [-Wunused-parameter]
brw_vec4_visitor.cpp:2723:1: warning: unused parameter 'ir' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
And delete the incorrect comment.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add Intel driver hook for glGetTexImage to accelerate the case of reading
texture image into a PBO. This case gets huge performance gains by using
GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed
by memcpy.
No regressions on Piglit tests with Intel driver.
Performance gain (1280 x 800 FBO, Ivybridge):
glGetTexImage + glMapBufferRange with patch 1.45 msec
glGetTexImage + glMapBufferRange without patch 4.68 msec
v3: (by Kenneth Graunke)
- Fix compile after Eric's change to drop the tiling argument
to intel_miptree_create_for_bo.
- Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit
regressions.
- Squash in several whitespace and coding style fixes.
|
|
|
|
|
|
|
|
|
| |
We need to invalidate the live intervals when inserting new
instructions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
When walking backwards, we want to stop at the head sentinel, which is
where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL.
Fixes random crashes, as well as valgrind errors.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
Giving the meta clear program a meaningful name makes it easier to find
in output such as INTEL_DEBUG=fs or INTEL_DEBUG=shader_time. We already
did so for integer programs, but neglected to label the primary program.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These used to call different math emitters (brw_math vs. brw_math2).
Now that they both call gen6_math, they're virtually identical.
When unrolling SIMD16 to multiple SIMD8 operations, we should take care
not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up
with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's
valid.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These functions are basically identical, so we should combine them.
However, they're so trivial, we may as well just fold them into their
only call sites.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These are trivial to combine: we should just avoid checking the second
operand if it's brw_null_reg.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's now a single line of code, so we may as well fold it into the
caller.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Usually, I try to use "brw" for functions that apply to all generations,
and "gen4" for dead end/legacy code that is only used on Gen4-5.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our existing functions, brw_math and brw_math2, had unclear roles:
Gen4-5 used brw_math for both unary and binary math functions; it never
used brw_math2. Since operands are already in message registers, this
is reasonable.
Gen6+ used brw_math for unary math functions, and brw_math2 for binary
math functions, duplicating a lot of code. The only real difference was
that brw_math used brw_null_reg() for src1.
This patch improves brw_math2's assertions to allow both unary and
binary operations, renames it to gen6_math(), and drops the Gen6+ code
out of brw_math().
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This is more typical C++ style.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thread switching on control flow instructions is a documented workaround
for Gen4-5 errata. As far as I can tell, it hasn't been needed since
Sandybridge. Thread switching is not free, so in theory this may help
performance slightly.
Flow control instructions with the "switch" flag cannot be compacted, so
removing it will make these instructions compactable. (Of course, we
still have to implement compaction for flow control instructions...)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs: 22606 -> 22385 (-0.98%)
No programs were hurt by this patch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Cody Northrop <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing else uses GL-types here.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
It's not used.
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
| |
Untested.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 07af0ab changed fs_inst to have 0 sources for texture opcodes
in emit_texture_gen5 (Ironlake, Sandybrige) while fs_generator still
uses a single source from brw_reg struct. Patch sets src as reg_undef
which matches the behavior before the constructor got changed.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79534
|
|
|
|
|
|
|
|
|
|
| |
_mesa_streaming_load_memcpy is defined in main/streaming-load-memcpy.c
I'm adding it to the dricore lib
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Adrian Negreanu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes:
include/c11/threads_posix.h: In function 'cnd_timedwait':
include/c11/threads_posix.h:140:21: error: storage size of 'abs_time' isn't known
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Adrian Negreanu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes:
In file included from
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_exec_api.c:445:0:
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_attrib_tmp.h:28:38:
fatal error: util/u_format_r11g11b10f.h: No such file or directory
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Adrian Negreanu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes linker error:
ld:
.../libmesa_dri_common_intermediates/libmesa_dri_common.a(dri_util.o):
in function globalDriverAPI:dri_util.c(.data.rel+0x0): error:
undefined reference to 'driDriverAPI'
As an example, you can see that mesa_dri_drivers
also uses common/libmegadriver_stub (src/mesa/drivers/dri/Makefile.am)
The _stub part might be confusing, but
it actually provides the dri-driver shared lib constructor,
megadriver_stub_init, which will later on load the real
platform dependent part and call
l __driDriverGetExtensions_<platform>
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Adrian Negreanu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
| |
So that android part can also use $(megadriver_stub_FILES)
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Adrian Negreanu <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>
|