| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask. One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.
Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhemingly the common case) can be handled by a single
thread. This has several intriguing properties:
- Accessing input arrays with gl_InvocationID as the index is a simple
SIMD8 URB read with g1 as the header. No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
concurrency concerns are gone. (We don't do this yet, though.)
v2: Drop first_non_payload_grf change, as it was always adding 0
(caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm about to implement a scalar TCS backend, and I'd rather not
duplicate all of this code there.
One change is that we now write the tessellation levels from all
TCS threads, rather than just the first. This is pretty harmless,
and was easier. The IF/ENDIF needed for that are gone; otherwise
the generated code is basically identical.
I chose to emit load/store intrinsics directly because it was easier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This prevents a crash when a NULL src is passed with a non-NULL length.
fixes: dEQP-GLES31.functional.debug.object_labels.query_length_only
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95252
Signed-off-by: Mark Janes <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This will be later required for 3D ASTC formats.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes
GL43-CTS.copy_image.samples_missmatch
which otherwise asserts in the radeonsi driver.
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This prevents GL43-CTS.khr_debug.labels_non_debug from
memcpying all over the stack and crashing.
v2: actually fix the test.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL43-CTS.texture_view.errors checks for GL_INVALID_VALUE
here but we catch these problems in the dimensionsOK check
and return the wrong error value.
This fixes:
GL43-CTS.texture_view.errors.
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes getteximage-depth piglit failures on radeonsi.
Cc: 11.1 11.2 <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Spotted by Coverity
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
From Section 7.3.1.1 (Naming Active Resources) of the OpenGL 4.5 spec:
"For the property LOCATION_COMPONENT, a single integer indicating the first
component of the location assigned to an active input or output variable is
written to params. For input and output variables with a component specified
by a layout qualifier, the specified component is written. For all other
input and output variables, the value zero is written."
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL 4.5 Core Profile section 7.1, in the documentation for
CompileShader, says: "Changing the source code of a shader object with
ShaderSource does not change its compile status or the compiled shader
code."
According to Karol Herbst, the game "Divinity: Original Sin - Enhanced
Edition" depends on this odd quirk of the spec.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551
Signed-off-by: Jamey Sharp <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Otherwise we won't be able to regenerate the source file(s).
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of cascading support for various different implementations of
GLX, all three options are now specified through the --enable-glx
option:
--enable-glx=dri : Enable the DRI-based GLX
--enable-glx=xlib : Enable the classic Xlib-based GLX
--enable-glx=gallium-xlib : Enable the gallium Xlib-based GLX
--enable-glx[=yes] : Defaults to dri if DRI is enabled, else
gallium-xlib if gallium is enabled, else
xlib
This removes the --enable-xlib-glx option and fixes a bug in which both
the classic xlib-glx and gallium xlib-glx implementations were getting
built causing different versioned and conflicting libGL libraries to be
installed.
v2: Changes from various review feedback from Emil:
a) Fixed typos
b) Corrected help docs for new option
c) Added appropriate a-b and r-b tags in commit msg
d) Fixed various GLX related dependency checks.
v3: Rebased to current master and added changelog in commit msg
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94086
Acked-by: Brian Paul <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When index - t->temps_size is greater than 4096, allocating space for
temporaries on demand will miserably crash. This can happen when a game
uses a lot of temporaries like the recent released Tomb raider.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Thomas Faller <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In optimized builds, visit(ir_expression *) experiences inlining with gcc that
leads the function to have a roughly 32KB stack frame. This is a problem given
that the function is called recursively. In non-optimized builds, the stack
frame is much smaller, hence one gets crashes that happen only in optimized
builds.
Arguably there is a compiler bug or at least severe misfeature here. In any
case, the easy thing to do for now seems to be moving the bulk of the
non-recursive code into a separate function. This is sufficient to convince my
version of gcc not to blow up the stack frame of the recursive part. Just to be
sure, add the gcc-specific noinline attribute to prevent this bug from
reoccuring if inliner heuristics change.
v2: put ATTRIBUTE_NOINLINE into macros.h
Cc: "11.1 11.2" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95133
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95026
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92850
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional subtests:
draw_indirect.negative.command_offset_not_in_buffer_signed32_wrap
draw_indirect.negative.command_offset_not_in_buffer_unsigned32_wrap
These tests use really large values that overflow GLsizeiptr, at
which point the buffer size isn't less than "end".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95138
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/
and similar expressions for nir_foreach_use_safe, etc.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
and similar expressions for nir_foreach_instr_safe etc.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The old comment was a bit terse. Also, change the function return
type to bool.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
A later patch will add lower_flrp64 option to NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the OpenGLES 3.1 CTS:
* ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos
Because this is triggering the error message after the normal API
validation phase, we don't have the API function name available, and
therefore we generate an error message without the draw call name:
Mesa: User error: GL_INVALID_OPERATION in draw call (vertex buffers are mapped)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95142
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample.
Current logic divides given layer of one by number of samples (four)
trashing the layer to zero. Layer adjustment is only to be used with
non-interleaved msaa surfaces where samples for particular layer are
in multiple slices.
I copy-pasted a bit of documentation from
brw_blorp.c::brw_blorp_compute_tile_offsets().
Also took the opportunity to fix the comment regarding sampling
as 2D, cube textures are the only exception.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stencil texturing is required by ES 3.1. Apparently we never actually
turned it on. Do that now. Also turn on the desktop extension.
Fixes nine dEQP-GLES31.functional tests:
stencil_texturing.format.stencil_index8_2d
texture.border_clamp.formats.stencil_index8.nearest_size_pot
texture.border_clamp.formats.stencil_index8.nearest_size_npot
texture.border_clamp.formats.stencil_index8.gather_size_pot
texture.border_clamp.formats.stencil_index8.gather_size_npot
texture.border_clamp.unused_channels.stencil_index8
state_query.internal_format.renderbuffer.stencil_index8_samples
state_query.internal_format.texture_2d_multisample.stencil_index8_samples
state_query.internal_format.texture_2d_multisample_array.stencil_index8_samples
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ES prohibits this, but GL appears to allow it. We at least need this
much, or else we'll crash as there's no source to read from.
This fixed crashes in the ES tests before I realized I needed to
prohibit stencil instead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes
- ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8
- ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We called intel_miptree_get_image_offset() to get the image offsets
for the current level/slice, but then proceeded to ignore the results
and clobber level/slice 0 every time.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94713
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
I want to add another condition. Moving the indirect_offset.file
check out a level should make this a little easier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Often, we don't need a full 4 channels worth of data from the sampler.
For example, depth comparisons and red textures only return one value.
To handle this, the sampler message header contains a mask which can
be used to disable channels, and reduce the message length (in SIMD16
mode on all hardware, and SIMD8 mode on Broadwell and later).
We've never used it before, since it required setting up a message
header. This meant trading a smaller response length for a larger
message length and additional MOVs to set it up.
However, Skylake introduces a terrific new feature: for headerless
messages, you can simply reduce the response length, and it makes
the implicit header contain an appropriate mask. So to read only
RG, you would simply set the message length to 2 or 4 (SIMD8/16).
This means we can finally take advantage of this at no cost.
total instructions in shared programs: 9091831 -> 9073067 (-0.21%)
instructions in affected programs: 191370 -> 172606 (-9.81%)
helped: 2609
HURT: 0
total cycles in shared programs: 70868114 -> 68454752 (-3.41%)
cycles in affected programs: 35841154 -> 33427792 (-6.73%)
helped: 16357
HURT: 8188
total spills in shared programs: 3492 -> 1707 (-51.12%)
spills in affected programs: 2749 -> 964 (-64.93%)
helped: 74
HURT: 0
total fills in shared programs: 4266 -> 2647 (-37.95%)
fills in affected programs: 3029 -> 1410 (-53.45%)
helped: 74
HURT: 0
LOST: 1
GAINED: 143
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous behavior would only allocate one register and then write
four thus potentially stomping three innocent bystanders.
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|