| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
There's no reason to compute texel size, stride, etc. if there's no
image data to map.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
And add a const qualifier.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a segmentation fault in conform divzero.c test.
This happens when glTexImage(level, width=0, height=0) is called. We
don't allocate texture memory in that case so the ImageSlices array
was never allocated.
Cc: "10.1" <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Straightforward fix to properly load dest->color with color data, as
opposed to position data as previously implemented.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27499
Cc: "10.1" <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Fixes assertion failures with radeonsi.
Tested-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents buffer overflow w/ llvmpipe when running piglit
bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto
v2: Compute the framebuffer size as the minimum size, as pointed out by
Brian; compacted code; ran piglit quick test list (with no
regressions.)
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HiZ operations make the depth/render caches out of sync with the sampler
caches. We need to arrange for a TC flush to happen before the target
buffer is used by the sampler. Calling brw_render_cache_set_add_bo
makes that happen.
On previous generations, brw_blorp_exec took care of flushing the
texture cache by calling intel_batchbuffer_emit_mi_flush after doing
any rendering. If we were to use the normal drawing path, then
brw_postdraw_set_buffers_need_resolve would handle this.
On Broadwell, we don't use BLORP, and we don't emit a rectangle
primitive via the normal drawing path. The 3DSTATE_WM_HZ_OP and
PIPE_CONTROL implicitly make drawing happen. So, none of our existing
code makes this flush happen - we need to do it directly.
Fixes 11 Piglit copyteximage subtests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77223
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77226
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
No need to use 32-bits to store 15 and 12.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
To fix MSVC build.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Section 4.3.1, page 220, of OpenGL 3.3 specification explains
the error conditions for glreadPixels():
"If the format is DEPTH_STENCIL, then values are taken from
both the depth buffer and the stencil buffer. If there is
no depth buffer or if there is no stencil buffer, then the
error INVALID_OPERATION occurs. If the type parameter is
not UNSIGNED_INT_24_8 or FLOAT_32_UNSIGNED_INT_24_8_REV,
then the error INVALID_ENUM occurs."
Fixes failing Khronos CTS test packed_depth_stencil_error.test
V2: Avoid code duplication
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the OpenGL 4.4 spec page 275:
"If pname is FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE, param will
contain the format of components of the specified attachment,
one of FLOAT, INT, UNSIGNED_INT, SIGNED_NORMALIZED, or
UNSIGNED_NORMALIZED for floating-point, signed integer,
unsigned integer, signed normalized fixedpoint, or unsigned
normalized fixed-point components respectively. If no data
storage or texture image has been specified for the attachment,
param will contain NONE. This query cannot be performed for a
combined depth+stencil attachment, since it does not have a
single format."
Fixes Khronos CTS test: packed_depth_stencil_parameters.test
Khronos Bug# 9170
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Benjamin Bellec <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The '**used' pointer was pointing into the stObj->sampler_views array.
If 'free' was null, we'd realloc that array, thus making the 'used'
pointer invalid. This soon led to memory errors.
Just change the pointer to be '*used' so it points directly at the
pipe_sampler_view.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid looping over 32/48/96 (!!) tex image units every draw, most of
which we don't care about.
Improves performance on everyone's favorite not-a-benchmark by 2.9% on
Haswell.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This gives us a better bound for some hot loops in the drivers than
MAX_COMBINED_TEXTURE_IMAGE_UNITS, which is ridiculously large on modern
hardware, and only getting worse as more shader stages are added.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Back when I originally wrote this code, force_sechalf was only used for
Gen4 code, so I didn't bother hooking it up. However, it's used more
generally these days. In particular, we use it for computing
gl_SamplePosition.
Fixes Piglit's spec/ARB_sample_shading/builtin-gl-sample-position tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77222
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously only allowed coalescing registers that interfere (i.e.,
whose live ranges overlap) if the destination register's live range was
entirely inside the source's live range. This is unnecessary -- we only
need to check for interfering writes in the intersection of their live
ranges.
total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
instructions in affected programs: 84751 -> 83734 (-1.20%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than any old control flow. Muchnick's algorithm just checks for
interfering writes between the MOV and the end of the program. Handling
this when you have backward branches is hard, so don't, but there's no
reason to bail if you see forward branches.
instructions in affected programs: 4270 -> 4248 (-0.52%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were starting at the beginning of the instruction list, rather than
with the MOV instruction itself. This allows us to coalesce after
control flow.
Excluding the shaders from an unreleased title, the shader-db results:
total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
instructions in affected programs: 678772 -> 669196 (-1.41%)
GAINED: 5
LOST: 0
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
And avoid rewriting other instructions unnecessarily. Removes a few
self-moves we weren't able to handle because they were components of a
large VGRF.
instructions in affected programs: 830 -> 826 (-0.48%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Otherwise there's nothing to do.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sets up the proper execution mask for sends in SIMD16 mode.
Fixes Piglit's glsl-fs-normalmatrix, glsl-fs-uniform-array-2,
glsl-fs-uniform-array-6, and glsl-fs-uniform-array-7 on Ironlake,
which regressed when I enabled SIMD16 pull parameter support in
commit b207e88b25e526d0f1ada7b19605b880a27866dc.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes failures in Khronos OpenGL CTS test proxy_textures_invalid_samples
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes failures in Khronos OpenGL CTS test conditional_render_test9
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gl_ViewportIndex doesn't get its own varying slot. It is stored
in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_ViewportIndex'.
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gl_Layer doesn't get its own varying slot. It is stored in
VARYING_SLOT_PSIZ.y. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_Layer'.
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fs-deref-literal-array-of-structs.shader_test
This allows the following shader code to work without a weird crash:
struct Foo {
int value[1];
};
int actual_value = Foo[2](Foo(int[1](100)), Foo(int[1](200)))[i].value[0];
Signed-off-by: Maarten Lankhorst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
(i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
since this test falls off the blorp path now due to format conversion)
V2: Use scissor instead of just mangling the rects, to avoid texcoord
rounding problems. (Thanks Marek)
V3: Rebase on Eric's CTSI meta changes; re-add _mesa_update_state in the
CTSI path so that _mesa_clip_blit sees the correct bounds.
Signed-off-by: Chris Forbes <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
Reviewed-by: Anuj Phogat <[email protected]>
Tested-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the spec:
<renderbuffertarget> must be RENDERBUFFER and <renderbuffer>
should be set to the name of the renderbuffer object to be
attached to the framebuffer. <renderbuffer> must be either
zero or the name of an existing renderbuffer object of type
<renderbuffertarget>, otherwise an INVALID_OPERATION error is
generated.
This patch changes the previous returned GL_INVALID_VALUE to
GL_INVALID_OPERATION.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76894
Cc: [email protected]
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.
This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates a
n ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.
Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.
To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL 4.0 spec, page 306 suggests an INVALID_OPERATION in glGetTexImage
if :
"format is one of the integer formats in table 3.3 and the internal
format of the texture image is not integer, or format is not one of
the integer formats in table 3.3 and the internal format is integer."
V2: Use helper function _mesa_is_format_integer()
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This function will be used in the following patch.
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mesa currently returns 4 when GL_VERTEX_ATTRIB_ARRAY_SIZE is queried
for a vertex array initially set up with size=GL_BGRA. This patch
makes changes to return size=GL_BGRA as required by the spec.
Fixes Khronos OpenGL CTS test: vertex_array_bgra_basic.test
V2: Use array->Format instead of adding a new variable
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
| |
This reverts commit f092e8951ce5212ba3cbb382ce3a6666eb6c9bed.
Didn't mean to push this...
|
|
|
|
| |
Otherwise there's nothing to do.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This manifested as rendering failures or sometimes GPU hangs in
compositors when they accidentally got MSAA visuals due to a bug in the X
Server. Today we decided that the problem in compositors was equivalent
to a corruption bug we'd noticed recently in resizing MSAA-visual
glxgears, and debugging got a lot easier.
When we allocate our MCS MT, libdrm takes the size we request, aligns it
to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to it.
When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
data gets stored to the first 20 or so scanlines of each of the 3 tiled
pages in that row, even though only 2 of those pages were allocated by
libdrm. In the glxgears case, the missing 3rd page happened to
consistently be the static VBO that got mapped right after the first MCS
allocation, so corruption only appeared once window resize made us throw
out the old MCS and then allocate the same BO to back the new MCS.
Instead, just memset the amount of data we actually asked libdrm to
allocate for, which will be smaller (more efficient) and not overrun.
Thanks go to Kenneth for doing most of the hard debugging to eliminate a
lot of the search space for the bug.
Cc: "10.0 10.1" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
Reviewed-by: Kenneth Graunke <[email protected]>
|