| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commits a39a8fbbaa12 ("nir: move to compiler/") and eb63640c1d38
("glsl: move to compiler/") broke Android builds. Fix them.
There is also a missing dependency between generated NIR headers and
several libraries. This isn't a new issue, but seems to have been
exposed by the NIR move.
Built with i915, i965, freedreno, r300g, r600g, vc4, and virgl enabled.
Cc: "11.2" <[email protected]>
Cc: Mauro Rossi <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit 574a92b048ae2b482982c3f156182970d551ca94)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not correct to CSE these multiplies
mul.sat dst1, -a, b
mul.sat dst2, a, b
by emitting a negated MOV from dst1 to dst2:
mul.sat dst1, -a, b
mov dst2, -dst1
Take 2.0*2.0 for example. The first multiply would produce 0.0 and the
second would produce 1.0.
Fixes bad generated code in 18 to 22 shaders:
instructions in affected programs: 432 -> 464 (7.41%)
helped: 4
HURT: 18
Cc: [email protected]
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 1567da1e2820d4c1a6c14f4598ad3addba6bc788)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change "mesa/readpix: Don't clip in _mesa_readpixels()" caused a
few piglit regressions. The failing tests use glReadPixels to read
from the front color buffer. The problem is we were trying to read
from a non-existant front color buffer. The front color buffer is
created on demand in st/mesa. Since the missing buffer bounds were
effectively 0 x 0 the glReadPixels was totally clipped and returned
early.
The fix involves creating the real front color buffer when we're about
to try reading from it.
Tested with llvmpipe and VMware driver on Linux, Windows.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94253
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94257
Cc: [email protected]
Reviewed-by: Roland Scheidegger <[email protected]>
(cherry picked from commit 83b589301f4a150f4b1b13fd3ffd9f6d98ee6546)
|
|
|
|
|
|
|
|
|
|
|
| |
See commit 9db2098d for the i965 version of this.
This fixes depth in a bunch of dEQP EXT_texture_border_clamp tests. And
probably other ones as well.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
If the destination is a renderbuffer, dst_tex_image will be NULL. This
fixes the *to_renderbuffer dEQP copy image tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
It's basically the same thing as GL_ARB_texture_stencil8 except that
glCopyTexImage isn't supported, so add STENCIL_INDEX to the list of
invalid GLES formats for glCopyTexImage.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
- LOD must be provided in .w for TXF (even for buffer textures)
- User buffer must be valid at draw time
- Must have a sampler associated with the sampler view
This makes PBO uploads work again on nouveau.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The base format is a function of the user-requested format, while the
driver format is not. So we should use the base format instead.
The driver format can be anything. Specifically in the stencil-only
case, it might be a depth/stencil format. However we still want to
refuse such an attachment when bound to GL_DEPTH.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Avoid a per-pixel multiply.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This reduces a glTexImage(GL_RGBA, GL_UNSIGNED_BYTE) hot spot in when
storing the texture as BGRA.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of discarding the texture we created, keep it around in case
the next glDrawPixels draws the same image again. This is intended
to help application which draw the same image several times in a row,
either within a frame or subsequent frames.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
| |
Noticed by Brian Paul.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
src/mesa/main/texstore.c:92:22: warning: ‘map_1032’ defined but not used [-Wunused-const-variable]
static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, ONE };
^~~~~~~~
src/mesa/main/texstore.c:91:22: warning: ‘map_3210’ defined but not used [-Wunused-const-variable]
static const GLubyte map_3210[6] = { 3, 2, 1, 0, ZERO, ONE };
^~~~~~~~
src/mesa/main/texstore.c:90:22: warning: ‘map_identity’ defined but not used [-Wunused-const-variable]
static const GLubyte map_identity[6] = { 0, 1, 2, 3, ZERO, ONE };
^~~~~~~~~~~~
These appear to be unused since:
commit 8ec6534b266549cdc2798e2523bf6753924f6cde
Author: Iago Toral Quiroga <[email protected]>
AuthorDate: Wed Oct 15 13:42:11 2014 +0200
mesa: Use _mesa_format_convert to implement texstore_rgba.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning:
‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function]
fs_copy_prop_dataflow::dump_block_data() const
^~~~~~~~~~~~~~~~~~~~~
From looking at git history, it looks like this is intended to be unused
(ie. just for adding on-demand debug prints)
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5fd848f6c9ee ("program: Use _mesa_geometric_samples to calculate
gl_NumSamples") broken Android builds. Add the missing include path "main"
to framebuffer.h like other includes in prog_statevars.c.
Cc: Neil Roberts <[email protected]>
Cc: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From ARB_sample_shading:
"gl_NumSamples is the total number of samples in the framebuffer,
or one if rendering to a non-multisample framebuffer"
So make sure to always pass in at least 1.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O`Callaghan <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the calculation of current uniforms to
link_uniforms, which makes use of UniformRemapTable which
stores all the reserved uniform locations.
Location assignment for implicit uniforms now tries to use
any gaps left in the table after the location assignment
for explicit uniforms. This gives us more space to store more
uniforms.
Patch is based on earlier patch with following changes/additions:
1: Move the counting of explicit locations to
check_explicit_uniform_locations and then pass
the number to link_assign_uniform_locations.
2: Count the number of empty slots in UniformRemapTable
and store them in a list_head.
3: Try to find an empty slot for implicit locations from
the list, if that fails resize UniformRemapTable.
Fixes following CTS tests:
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array
Signed-off-by: Tapani Pälli <[email protected]>
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93696
|
|
|
|
|
|
|
|
|
|
|
|
| |
This basically saves the current pipeline state, sets up state for
rendering, constructs a set of textured quads, renders, then restores
the previous pipeline state.
It shouldn't be hard to implement a similar function for non-gallium
drives. With some code refactoring, the vertex definition code could
probably be shared.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves the performance of applications which use glXUseXFont()
or wglUseFontBitmaps() and glCallLists() to draw bitmap text.
Basically, we collect all the glBitmap images from the display lists
and put them into a texture atlas. To render the bitmaps for a
glCallLists() command, we render a set of textured quads where each
quad is textured with one bitmap image. Actually, the rendering part
has to be done by the Mesa driver or Mesa/gallium state tracker.
Note that GLUT demos that use glutBitmapCharacter() don't benefit
from this.
v2, per Nicolai Hähnle:
- check the max tex rect size is at least 1024.
- add comment in dd.h that texture_rectangle is required.
- in _mesa_DeleteLists(), try to delete the atlas before the list(s)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Gallium doesn't present these as GL_RED-style. A swizzle is necessary to
present the proper data in the unused components.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The restriction on multisampled integer texture formats only applies to
GLES 3.0, so don't apply it to GLES 3.1 contexts. This fixes a slew of
dEQP-GLES31.functional.state_query.internal_format.*
tests, which now all pass.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Every stage has a corresponding 3DSTATE_CONSTANT_XS packet, so having
the code to create and emit push constant buffers in genX_vs_state.c
is a little strange. Moving it to a separate file seems more logical.
v2 [Ken]: Rebase on master, explain motivation in the commit message.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
And use it in brw_fs_nir.cpp.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.
On Ironlake:
total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
instructions in affected programs: 326604 -> 323322 (-1.00%)
helped: 1411
total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
cycles in affected programs: 18950290 -> 18867176 (-0.44%)
helped: 2419
HURT: 328
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though it's a no-op, it's important to keep track of the type so
that we can pick the properly-signed op later on.
This fixes dEQP-GLES3.functional.shaders.precision.uint.highp_div_fragment,
which ended up using IDIV instead of UDIV.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Note that this results in a different transformation for the viewport's
Z axis (depth range), but that doesn't matter for this case.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On gen7 (Ivy Bridge, Haswell), we will get a GPU hang if an indirect
dispatch is used, but one of the dimensions is 0.
Therefore we use predicated rendering on the GPGPU_WALKER command to
handle this case.
Fixes piglit test: spec/arb_compute_shader/zero-dispatch-size
From the ARB_compute_shader spec, under DispatchCompute:
"If the work group count in any dimension is zero, no work groups are
dispatched."
And then for DispatchComputeIndirect:
... "is equivalent (assuming no errors are generated) to calling
DispatchCompute with <num_groups_x>, <num_groups_y> and
<num_groups_z>" ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94100
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Noticed by Ilia when I was trying to figure out why some app was failing
to use ETC2.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the number of uniform blocks is less than 12,
ARB_uniform_buffer_object can't be enabled and the maximum GL version
is not even 3.1...
This fixes a regression introduced in 7c79c1e (st/mesa: add compute
shader state) if the maximum number of uniform blocks allowed for
compute shaders is less than 12. This happens on Kepler but this might
also affect other Gallium drivers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reported-by: Tobias Klausmann <[email protected]>
Tested-by: Tobias Klausmann <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tobias Klausmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The ARB_compute_shader spec says:
"If the work group count in any dimension is zero, no work groups
are dispatched."
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See Ivy Bridge PRM, Volume 2, Part 2, 1.8.4 INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: "This field indicates how much shared local
memory the thread group requires. The amount is specified in 4k
blocks, but only powers of 2 are allowed: 0, 4k, 8k, 16k, 32k and 64k
per half-slice."
For Haswell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: With text identical to the Ivy Bridge PRM.
For Broadwell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 6, bits 20:16: With text identical to the Ivy Bridge PRM.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Rename to 'tex_attr' to be a bit more clear.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
This simplifies the error handling code too.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Just a little cleaner.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
The glClear, glBitmap and glDrawPixels code now use a new st_draw_quad()
helper function.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Define a new st_util_vertex structure which is a bit smaller (9 floats
versus the previous 12 floats per vertex). Clean up the glClear,
glDrawPixels and glBitmap code that sets up the vertex data and does the
drawing so it's all very similar. This can lead to more consolidation.
v2: add assertion that vertex buffer slot == 0 to catch possible future
change in cso_get_aux_vertex_buffer_slot() behavior.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Alos use the opportunity to mark inputs constant. (Context has to be
given as read-write to intel_miptree_supports_non_msrt_fast_clear()
to support debug output).
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be re-used to initialize auxiliary buffers in lossless
compression case.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|