| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Saves a measly 20 bytes on IA32 and nothing on x64. Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670111 228340 22552 6921003 699b2b lib/i965_dri.so after
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64. On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.
Before:
0000000000000830 <_mesa_Uniform4fv>:
830: 48 83 ec 10 sub $0x10,%rsp
834: 49 89 d0 mov %rdx,%r8
837: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 83e <_mesa_Uniform4fv+0xe>
83e: 89 f8 mov %edi,%eax
840: 89 f1 mov %esi,%ecx
842: 41 b9 02 00 00 00 mov $0x2,%r9d
848: 64 48 8b 3a mov %fs:(%rdx),%rdi
84c: 48 8b 97 c8 01 02 00 mov 0x201c8(%rdi),%rdx
853: 48 8b 72 70 mov 0x70(%rdx),%rsi
857: 6a 04 pushq $0x4
859: 89 c2 mov %eax,%edx
85b: e8 00 00 00 00 callq 860 <_mesa_Uniform4fv+0x30>
860: 48 83 c4 18 add $0x18,%rsp
864: c3 retq
After:
00000000000007f0 <_mesa_Uniform4fv>:
7f0: 48 83 ec 10 sub $0x10,%rsp
7f4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 7fb <_mesa_Uniform4fv+0xb>
7fb: 41 b9 02 00 00 00 mov $0x2,%r9d
801: 64 48 8b 08 mov %fs:(%rax),%rcx
805: 48 8b 81 c8 01 02 00 mov 0x201c8(%rcx),%rax
80c: 6a 04 pushq $0x4
80e: 4c 8b 40 70 mov 0x70(%rax),%r8
812: e8 00 00 00 00 callq 817 <_mesa_Uniform4fv+0x27>
817: 48 83 c4 18 add $0x18,%rsp
81b: c3 retq
Saves a measly 416 bytes of text on x64. Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670131 228340 22552 6921023 699b3f lib/i965_dri.so after
6343348 293872 29880 6667100 65bb5c lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.
v2: Rebase on GL_ARB_gpu_shader_fp64.
v3: Rebase on GL_ARB_gpu_shader_int64.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By putting the parameters first that match the parameters to the call
site, 4 (of 16) instructions are saved at _mesa_UniformMatrix4fv on
x64. On IA32, the details of the instructions change, but it is the
same count and mix of instructions.
Before:
0000000000001380 <_mesa_UniformMatrix4fv>:
1380: 48 83 ec 10 sub $0x10,%rsp
1384: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 138b <_mesa_UniformMatrix4fv+0xb>
138b: 41 89 f8 mov %edi,%r8d
138e: 41 89 f1 mov %esi,%r9d
1391: 0f b6 d2 movzbl %dl,%edx
1394: 64 48 8b 38 mov %fs:(%rax),%rdi
1398: 48 8b b7 c8 01 02 00 mov 0x201c8(%rdi),%rsi
139f: 48 8b 76 70 mov 0x70(%rsi),%rsi
13a3: 68 06 14 00 00 pushq $0x1406
13a8: 51 push %rcx
13a9: 52 push %rdx
13aa: b9 04 00 00 00 mov $0x4,%ecx
13af: ba 04 00 00 00 mov $0x4,%edx
13b4: e8 00 00 00 00 callq 13b9 <_mesa_UniformMatrix4fv+0x39>
13b9: 48 83 c4 28 add $0x28,%rsp
13bd: c3 retq
After:
0000000000001360 <_mesa_UniformMatrix4fv>:
1360: 48 83 ec 10 sub $0x10,%rsp
1364: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 136b <_mesa_UniformMatrix4fv+0xb>
136b: 0f b6 d2 movzbl %dl,%edx
136e: 64 4c 8b 00 mov %fs:(%rax),%r8
1372: 49 8b 80 c8 01 02 00 mov 0x201c8(%r8),%rax
1379: 68 06 14 00 00 pushq $0x1406
137e: 6a 04 pushq $0x4
1380: 6a 04 pushq $0x4
1382: 4c 8b 48 70 mov 0x70(%rax),%r9
1386: e8 00 00 00 00 callq 138b <_mesa_UniformMatrix4fv+0x2b>
138b: 48 83 c4 28 add $0x28,%rsp
138f: c3 retq
Saves a measly 576 bytes of text on x64.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670131 228340 22552 6921023 699b3f lib/i965_dri.so after
6343924 293872 29880 6667676 65bd9c lib64/i965_dri.so before
6343348 293872 29880 6667100 65bb5c lib64/i965_dri.so after
v2: Rebase on GL_ARB_gpu_shader_fp64.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is C++, so we can mix code and declarations. Doing so allows
constification.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort. Passing it in will
truncate the stride to 0, which has...surprising results.
The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes,
but divide by 4 when programming it. So we need to handle values up to
131,072. Switch from GLshort to int32_t to avoid the truncation.
Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.
v2: Use int32_t as negative values can be used (Jason).
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register. With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky. Use a UW type to avoid this.
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong. But it turns out that
there is a legitimate reason to use it, so let's do so.
The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions
All of the functions were passing 1 to _mesa_uniform instead of passing
count.
Fixes 16 unsed parameter warnings like:
main/uniforms.c: In function ‘_mesa_Uniform1i64vARB’:
main/uniforms.c:1692:47: warning: unused parameter ‘count’ [-Wunused-parameter]
_mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
^~~~~
This is why I build with extra warnings enabled. Unfortunately, there
are so many unused parameter warnings in Mesa that I didn't notice these
added warnings for over 6 months. :(
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.
Firefox with WebGL2 enabled hits this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456
v2: protect against a double destroy in st_create_context_priv and callers.
Cc: 17.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.
The functionality is also included in GL_ARB_ES2_compatibility. Ever
OpenGL core-profile driver currently exposes both extensions. I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems. No functionality
is removed.
I have left this extension in place for compatibility profile. There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.
Three other other alternatives considered:
- Remove the string from compatibility profile drivers but leave the
functionality in place.
- Add a flag to expose the extension string, and set it in every OpenGL
driver that does not expose GL_ARB_ES2_compatibility (and those
drivers only). I tried this. You can't have two instances of an
extension in the extension table (one dummy_true for ES1 and one with
a flag for compatibility profile), so the implementation requires a
bit of effort.
- Only expose the extension in compatibility if the version is less
than 2.0. I didn't see an easy way to do this.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format. In this case, we handle
stencil properly but we give blorp the wrong ISL format. Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT which is the wrong
size was causing GPU hangs.
Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
state_tracker/st_glsl_to_tgsi.cpp:302:28: warning: ‘glsl_to_tgsi_instruction::tex_type’
is too small to hold all values of ‘enum glsl_base_type’
glsl_base_type tex_type:4;
Fixes: 8ce53d4a2f3 ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason, Curro): Add stencil also even though it is not
enabled yet.
Cc: 17.0 <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
This allows eglCreateImageKHR to access P010 surfaces created by vaapi
Signed-off-by: Rainer Hochecker <[email protected]>
Acked-by: Ben Widawky <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes GL45-CTS.gpu_shader_fp64.built_in_functions.
v2: use DDIV unconditionally (Roland)
Reviewed-by: Roland Scheidegger <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]> (v1)
Tested-by: Glenn Kennard <[email protected]>
Tested-by: James Harvey <[email protected]>
Cc: 17.0 <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
There are some line wrapping violations here but those lines will get
deleted in the following patch.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.
The new field will be used in a following patch for another
clean-up.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.
Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.
Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the infomation about the current program in use which is still
valid.
V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.
Acked-by: Edward O'Callaghan <[email protected]>
To avoid build regressions the following 2 patches were squashed in to
this commit:
mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()
These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.
Reviewed-by: Lionel Landwerlin <[email protected]>
Acked-by: Edward O'Callaghan <[email protected]>
mesa: update active relinked program
This likely fixes a subroutine bug were
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.
Acked-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
This reverts commit c95380c4044237d73fb537511667c3c8f658fcee.
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen8 adds Q/UQ types. We attempted to change the types back to DF in the
generator (commit c95380c40), but an assertion added in the FP64 series
(commit e481dcc3) triggers before that code has a chance to execute.
In fact, using Q/UQ in the IR and then changing to DF in the generator
would not work in the presence of source modifiers, etc.
Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This is basically the same as happens for doubles.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Integer comparison functions (e.g., nir_op_ilt) are handled in the next
commit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2 (idr): Make the "from" type in a cast unsized. This reduces the
number of required cast operations at the expensive slightly more
complex code. However, this will be a dramatic improvement when other
sized integer types are added. Suggested by Connor.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Fixup assertion in brw_reg_type_to_hw_type to allow
BRW_REGISTER_TYPE_{UQ,Q} on Gen8+.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It seems like maybe this should return a different type based on Gen. Q
and UQ only exist on Gen8+, but, based on the old comment, I believe
previous Gens can generate 64-bit moves.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's much easier to do this in the generator rather than while coming
out of NIR. brw_type_for_nir_type doesn't know the Gen, so we'd have to
add a bunch of plumbing. The alternate fix is to not emit int64 moves
for doubles in the first place... but that seems even more difficult.
This change won't catch non-MOV instructions that try to use 64-bit
integer types on Gen < 8. This may convert certain kinds of bugs in to
different kinds of bugs that are more difficult to detect (since the
assertions in the function won't catch them).
NOTE: I don't think anything can emit mixed-type 64-bit moves until the
same platform supports both ARB_gpu_shader_fp64 and
ARB_gpu_shader_int64. When we enable int64 on Gen < 8, we can solve
this problem other ways.
This prevents regressions on HSW in the next patch.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just add operations to the switch statement here.
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Just add types into unsupported or double equivalent spots.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This hooks up the API to the internals for 64-bit integer uniforms.
v2: update to use non-strict aliased alternatives
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the builtins and the lexer support.
To avoid too many warnings, it adds basic support to the type in a few
other places in mesa, mostly in the trivial places.
It also adds a query to be used later for if a type is an integer 32 or 64.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This just adds the usual boilerplate in mesa core.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just add the boilerplate xml code.
v2 (idr): Update dispatch_sanity. Only add extension functions in core
profile.
v3 (idr): Remove comment line from gl_API.xml. Suggested by Matt.
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ian Romanick <[email protected]> [v1]
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Do this in general_restrictions_based_on_operand_types() because the two
rules that "Special Cases for Byte Operations" relax are checked there.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
A function is necessary to handle immediate types.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
src1 must be a descriptor (including the information to determine that
the SEND is doing an extended math operation), but src0 can actually be
null since it serves as the source of the implicit GRF -> MRF move.
Reviewed-by: Kenneth Graunke <[email protected]>
|