| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the
assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in
update_fp().
This is because we were calling st_validate_state() without first
updating Mesa state with _mesa_update_state().
The regression came from commit 83b589301f4a150f4 "st/mesa: fix
frontbuffer glReadPixels regressions".
The new piglit gl-1.0-simple-readbuffer test exercises this.
Cc: "11.1 11.2" <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is used as a bitfield, so it seems cleaner to keep it unsigned.
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurrences of 1 to 1u.
v2: add an assert for bitfield size and use 1u << idx
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]> (v1)
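The signed-shift problem described above can be illustrated with a minimal sketch. The helper name `set_bit` is hypothetical and is not taken from the patch; it only shows the `1u << idx` form together with a size assert of the kind mentioned in v2.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not from the patch): set bit `idx` in an
 * unsigned bitfield. */
static unsigned
set_bit(unsigned mask, unsigned idx)
{
   /* Shifting by the full width of the type or more is also undefined,
    * so check the index against the bitfield size first. */
   assert(idx < sizeof(mask) * CHAR_BIT);

   /* 1u keeps the shift in unsigned arithmetic. With a plain 1, an idx
    * equal to the sign-bit position would shift into the sign bit of a
    * signed int, which is undefined behavior in C. */
   return mask | (1u << idx);
}
```

With a 32-bit `unsigned`, `set_bit(0u, 31)` is well defined, whereas `1 << 31` is not.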
|
|
|
|
|
|
|
|
| |
Handle the case of ARB_framebuffer_no_attachments.
Also, kill off a dead debug printf() call while we are here.
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use PIPE_FORMAT_NONE to indicate which MSAA modes are supported
by a framebuffer with no attachments.
V.2:
Rewrite MSAA mode loop to be more general.
V.3:
Move comment to right place after loop was rewritten.
V.4: [airlied]
remove unneeded variable, and assert, and unneeded pipe assignment
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set default values for the constants required in
ARB_framebuffer_no_attachments and obtain the number
of layers from PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS.
We also obtain the MaxFramebufferSamples value using
a query back to the driver for PIPE_FORMAT_NONE.
V.1:
Merge if branch predicates into one branch.
Move const init into st_init_limits()
[airlied: whitespace fixup]
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change references to gl_framebuffer::Width, Height, MaxNumLayers
and Visual::samples to use the _mesa_geometric_ convenience functions
for those places where the geometry of the gl_framebuffer is needed.
This is in contrast to the geometry of the intersection of the
attachments of the gl_framebuffer.
This patch paves the way to enable GL_ARB_framebuffer_no_attachments
for all gallium drivers.
V.2:
Remove intermediate variable state.
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
V.2:
Change 'N.B.,' to 'NOTE:'.
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were walking over the shader source to figure out which
inputs should be marked flat. Now, we can just pull it out of prog_data.
This is needed for properly setting up 3DSTATE_SF/SBE for Vulkan and it
also means that it will get properly cached.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This is needed by the Vulkan driver
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In the Vulkan driver we use a single flat input instead of a uniform
because setting up push constants is more disruptive to the pipeline than
setting up another vertex input. This uses the number of uniforms as a key
to keep it working for the GL driver.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in
a file that's linked into libi965_compiler.la.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Cc: 11.1 11.2 <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change we create the UBO and SSBO arrays separately from the
beginning rather than putting them into a combined array and splitting
it apart later.
A bug with UBO and SSBO stage reference querying is also fixed, as
we now use the block index to look up the references in the separate
arrays, not the combined buffer block array.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
They are compute-shader only and that's where the code for doing atomics on
shared variables lives, so it seems to make sense.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless. This commit allows other NIR producers to be
more lazy and not provide one at all.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There may not be a previous block. In this case, there's no real work
to do, so just continue on to the next one.
v2: Update for bblock->prev() API change.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The bblock_t::prev/prev_const/next/next_const API returns bblock_t
pointers, rather than exec_nodes. So it's a bit surprising.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.
According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly. Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.
When enabled, fixes 16 dEQP precision tests
dEQP-GLES31.functional.shaders.builtin_functions.precision.
{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec3,vec4}.
at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware). Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
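The amplitude scaling described above can be sketched on the CPU as follows. This is only an illustration of the arithmetic: `scale_trig_result` is a hypothetical name, and the raw value passed in stands in for what the hardware instruction would return.

```c
#include <assert.h>

/* Scale factor quoted in the commit message. */
#define TRIG_SCALE 0.99997f

/* Hypothetical helper: apply the workaround to a raw SIN/COS result by
 * scaling the amplitude down slightly. */
static float
scale_trig_result(float raw)
{
   return raw * TRIG_SCALE;
}
```

Feeding in the worst-case overshoot quoted above (1.000027) gives a result back inside [-1.0, 1.0], at the cost of one extra multiply per sin or cos call.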
|
|
|
|
|
|
|
|
|
|
|
| |
See commit 3b0279a69 - this restriction is documented in the "Surface
Format" field of RENDER_SURFACE_STATE.
Looking at newer documentation, this restriction appears to exist on
Haswell, but no longer applies on Gen8+.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
| |
This was returning the fragment shader value.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This extension is identical to ARB_base_instance. Reuse the same
entrypoints.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
The extension spec was extended to also support ES. This functionality
is provided all the way back to ES 1.0.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As the relevant extensions get implemented, the lines should be
uncommented. I believe this is (almost) everything needed for those GL
versions though.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
The if always returns so no need for an else.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to simplify the code and drop InterfaceBlockStageIndex
which is a per stage array of integers the size of all blocks in the
program combined including duplicates across stages. Adding a stage
ref per block will use less memory.
Reviewed-by: Kenneth Graunke <[email protected]>
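A rough sketch of the per-block stage reference idea, assuming it is stored as a bitmask with one bit per shader stage. All type, field, and function names here are hypothetical and are not Mesa's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stage enumeration, for illustration only. */
enum ex_stage { EX_STAGE_VERTEX, EX_STAGE_FRAGMENT, EX_STAGE_COMPUTE };

/* One small bitmask per block replaces a per-stage array of integers
 * sized to the total number of blocks in the program. */
struct ex_block {
   const char *name;
   uint8_t stageref;   /* bit N is set when stage N references the block */
};

static void
ex_block_add_stage(struct ex_block *b, enum ex_stage s)
{
   b->stageref |= (uint8_t)(1u << s);
}

static bool
ex_block_used_by(const struct ex_block *b, enum ex_stage s)
{
   return (b->stageref & (1u << s)) != 0;
}
```

Querying whether a stage references a block then becomes a single bit test on the block itself.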
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSO, the GL_PROGRAM_INPUT and GL_PROGRAM_OUTPUT interfaces refer to
the first and last shader stage linked into a program. This may not be
the vertex and fragment shader stages.
So, subtracting VERT_ATTRIB_GENERIC0 and FRAG_RESULT_DATA0 is bogus.
We need to subtract VERT_ATTRIB_GENERIC0 for VS inputs,
FRAG_RESULT_DATA0 for FS outputs, and VARYING_SLOT_VAR0 for other cases.
Note that built-in variables get a location of -1.
Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var_explicit_location
- program_input.location.separable_fragment.var_array_explicit_location
- program_output.location.separable_vertex.var_explicit_location
- program_output.location.separable_vertex.var_array_explicit_location
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
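The base-selection rule described above can be sketched as below. The `EX_`-prefixed constants are placeholders with made-up values standing in for VERT_ATTRIB_GENERIC0, FRAG_RESULT_DATA0, and VARYING_SLOT_VAR0, and the function is a hypothetical illustration rather than the code in the patch.

```c
#include <assert.h>

/* Placeholder values for illustration only; the real Mesa enums differ. */
enum {
   EX_VERT_ATTRIB_GENERIC0 = 16,
   EX_FRAG_RESULT_DATA0 = 2,
   EX_VARYING_SLOT_VAR0 = 32
};

enum ex_stage { EX_STAGE_VERTEX, EX_STAGE_GEOMETRY, EX_STAGE_FRAGMENT };
enum ex_iface { EX_PROGRAM_INPUT, EX_PROGRAM_OUTPUT };

/* Hypothetical helper: convert a recorded slot into the location
 * reported to the application. */
static int
ex_resource_location(int recorded, enum ex_stage stage, enum ex_iface iface)
{
   /* Built-in variables carry -1; pass it through untouched. */
   if (recorded < 0)
      return -1;

   if (stage == EX_STAGE_VERTEX && iface == EX_PROGRAM_INPUT)
      return recorded - EX_VERT_ATTRIB_GENERIC0;   /* VS inputs */

   if (stage == EX_STAGE_FRAGMENT && iface == EX_PROGRAM_OUTPUT)
      return recorded - EX_FRAG_RESULT_DATA0;      /* FS outputs */

   return recorded - EX_VARYING_SLOT_VAR0;         /* everything else */
}
```

With a separable fragment program, the first linked stage is the fragment shader, so its inputs take the varying base rather than the vertex attribute base.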
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were recording locations for all variables, even ones without an
explicit location set. Implement the rules from the spec, and record
-1 in the resource list accordingly. Make program_resource_location
stop doing math on negative values. Remove hacks that are no longer
necessary now that we've stopped doing that.
Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var
- program_input.location.separable_fragment.var_array
- program_output.location.separable_vertex.var
- program_output.location.separable_vertex.var_array
v2: Delete more code
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A program will either have gl_VertexID or gl_VertexIDMESA (the lowered
zero-based version), not both. Just spoof it in the resource list so
the hacks are done in a single place.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
System values are just built-in input variables that we've opted to
special-case out of convenience. We need to consider all inputs,
regardless of how we've classified them.
Unfortunately, there's one exception: we shouldn't add gl_BaseVertex
unless ARB_shader_draw_parameters is enabled, because it doesn't
actually exist in the language, and shouldn't be counted in the
GL_ACTIVE_RESOURCES query.
Fixes dEQP-GLES31.functional.program_interface_query.program_input.
resource_list.compute.empty, which expects gl_NumWorkGroups to appear
in the resource list.
v2: Delete more code
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This is necessary for ARB_texture_stencil8 support on classic drivers.
Presumably Gallium works because it implements its own ChooseTexFormat.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IR's.
The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The value 0 for unknown has been chosen so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
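The benefit of using 0 for "unknown" can be shown with a small sketch. The enum, struct, and function names are hypothetical and do not correspond to the real tgsi_scan_shader types; the point is only that a zero-initialized struct already reads as "unknown".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical property values; 0 is reserved for "unknown". */
enum ex_property { EX_PROP_UNKNOWN = 0, EX_PROP_A, EX_PROP_B };

/* Hypothetical scan result, standing in for the real info struct. */
struct ex_shader_info {
   enum ex_property prop;
   unsigned num_instructions;
};

/* A scanner that zero-initializes its output needs no extra code to
 * flag a property that the shader never declared. */
static void
ex_scan_init(struct ex_shader_info *info)
{
   memset(info, 0, sizeof(*info));
}
```

A driver can then compare the field against `EX_PROP_UNKNOWN` without tracking separately whether the property was present.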
|
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Builds with gallium enabled fail on x86 with linker error:
external/mesa3d/src/mesa/vbo/vbo_exec_array.c:127: error: undefined reference to '_mesa_uint_array_min_max'
The problem is sse_minmax.c is not included in the libmesa_st_mesa
library. Since the SSE4.1 files are needed for both libmesa_st_mesa
and libmesa_dricore, move SSE4.1 files into a separate static library
that can be used by both.
Cc: "11.1 11.2" <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
This is the same ext as ARB_draw_buffers_blend (plus some core
functionality that already exists). Add the alias entrypoints.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell GT2 and GT3 have a minimum of 64 entries. Hardcoding 32
is not legal.
v2: Delete stale comment (caught by Alejandro).
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the Sandybridge PRM's description of the resinfo message,
the .z value returned will be Depth == 0 ? 0 : Depth + 1. The earlier
PRMs have the same table.
This means we return 0 for array textures with a single slice, when
we ought to return 1. Just override it to max(depth, 1).
Fixes 10 dEQP-GLES3.functional tests on Sandybridge:
shaders.texture_functions.texturesize.sampler2darray_fixed_vertex
shaders.texture_functions.texturesize.sampler2darray_fixed_fragment
shaders.texture_functions.texturesize.sampler2darray_float_vertex
shaders.texture_functions.texturesize.sampler2darray_float_fragment
shaders.texture_functions.texturesize.isampler2darray_vertex
shaders.texture_functions.texturesize.isampler2darray_fragment
shaders.texture_functions.texturesize.usampler2darray_vertex
shaders.texture_functions.texturesize.usampler2darray_fragment
shaders.texture_functions.texturesize.sampler2darrayshadow_vertex
shaders.texture_functions.texturesize.sampler2darrayshadow_fragment
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
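The max(depth, 1) override mentioned above amounts to the following clamp. The function name is hypothetical; the real fix is emitted as shader instructions rather than C.

```c
#include <assert.h>

/* Hypothetical helper: clamp the .z value returned by resinfo so that
 * an array texture with a single slice reports 1 instead of 0. */
static unsigned
ex_fix_resinfo_depth(unsigned resinfo_z)
{
   return resinfo_z > 1u ? resinfo_z : 1u;
}
```

Values of 1 or more pass through unchanged, so only the single-slice case is affected.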
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
None of the callers actually wanted what it did. In ptn_xpd, you only
ever want a vec3 swizzle. In ptn_tex, you want a swizzle that matches
the number of required texture coordinates.
shader-db results:
G45:
total instructions in shared programs: 4011240 -> 4010911 (-0.01%)
instructions in affected programs: 59232 -> 58903 (-0.56%)
helped: 114
HURT: 0
total cycles in shared programs: 84314194 -> 84313220 (-0.00%)
cycles in affected programs: 779150 -> 778176 (-0.13%)
helped: 110
HURT: 13
Ironlake:
total instructions in shared programs: 6397262 -> 6396605 (-0.01%)
instructions in affected programs: 117402 -> 116745 (-0.56%)
helped: 227
HURT: 0
total cycles in shared programs: 128889798 -> 128888524 (-0.00%)
cycles in affected programs: 1214644 -> 1213370 (-0.10%)
helped: 179
HURT: 44
Sandy Bridge:
total instructions in shared programs: 8467391 -> 8467384 (-0.00%)
instructions in affected programs: 3107 -> 3100 (-0.23%)
helped: 10
HURT: 6
total cycles in shared programs: 117580120 -> 117573448 (-0.01%)
cycles in affected programs: 103158 -> 96486 (-6.47%)
helped: 84
HURT: 11
Ivy Bridge:
total instructions in shared programs: 7774255 -> 7774258 (0.00%)
instructions in affected programs: 1677 -> 1680 (0.18%)
helped: 8
HURT: 6
total cycles in shared programs: 65743828 -> 65739190 (-0.01%)
cycles in affected programs: 89312 -> 84674 (-5.19%)
helped: 78
HURT: 23
Haswell:
total instructions in shared programs: 7107172 -> 7107150 (-0.00%)
instructions in affected programs: 2048 -> 2026 (-1.07%)
helped: 16
HURT: 0
total cycles in shared programs: 64653636 -> 64647486 (-0.01%)
cycles in affected programs: 86836 -> 80686 (-7.08%)
helped: 85
HURT: 17
Broadwell and Skylake:
total instructions in shared programs: 8447529 -> 8447507 (-0.00%)
instructions in affected programs: 2038 -> 2016 (-1.08%)
helped: 16
HURT: 0
total cycles in shared programs: 66418670 -> 66413416 (-0.01%)
cycles in affected programs: 90110 -> 84856 (-5.83%)
helped: 83
HURT: 20
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The KIL instruction doesn't have a destination, so ptn_kil never uses
dest.
program/prog_to_nir.c: In function ‘ptn_kil’:
program/prog_to_nir.c:547:38: warning: unused parameter ‘dest’ [-Wunused-parameter]
ptn_kil(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src)
^
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The extension is identical to GL_OES_copy_image. But dEQP has tests that
want the EXT variant.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We require the full ARB_gpu_shader5 for now, but in the future some
other CAP could get exposed to indicate that only the multisample-related
behavior of ARB_gpu_shader5 is available.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|