| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
... and associated file(s).
No longer needed since commit 057259655e7 ("i965: Don't link libmesa or
libdri_test_stubs into tests")
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.
Thanks to Curro and Jordan for helping track this down!
On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
0.255654% (n=25).
On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
0.432771% (n=64).
On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
by roughly 1.03x.
- Improves performance in Synmark's G/43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.
We should really fix this properly. However, the pitfall has existed
for ages, so for now we should at least detect it.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These are linear, not powers of two, and much more limited.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB. But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.
This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.
Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now. A subsequent commit will take care of that.
Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere.
We were underallocating the scratch buffer by a factor of 128/70.
v2: Rename threads_per_subslice to scratch_ids_per_subslice
(suggested by Jordan Justen).
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.
Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing. We need to allocate enough
for every possible thread that could run.
Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'll use this for compute shader thread counts and scratch space
calculations shortly.
Note that subslices are referred to as "half slices" on Ivybridge.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Skylake changes the representation of shared local memory size:
Size | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
-------------------------------------------------------------------
Gen7-8 | 0 | none | none | 1 | 2 | 4 | 8 | 16 |
-------------------------------------------------------------------
Gen9+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
The old formula would substantially underallocate the amount of space.
This fixes GPU hangs on Skylake when running with full thread counts.
v2: Fix the Vulkan driver too, use a helper function, and fix the table
in the comments and commit message.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An update in graphics specs has deleted the halign and valign fields
from XY_FAST_COPY_BLT command. See mesa commit 97f0f91.
Cc: Ben Widawsky <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier. It all works, we just need to emit the states
for the extra planes.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
| |
In the future int64 support will have the same requirements.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put
<option name="precise_trig" value="true"/>
in the proper drirc file.
V2: Make option name more generic
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Stephane Marchesin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fixes some cases of:
GL45-CTS.cull_distance.functional
on Skylake.
Reviewed-by: Chris Forbes <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.
const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.
This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):
es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.
Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}
Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.
Cc: <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.
v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled
(with added 16x sample support) now passes with this patch.
Cc: "12.0" <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.
Acked-by: Eduardo Lima <[email protected]>
Acked-by: Antia Puentes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.
Signed-off-by: Jason Ekstrand <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.
We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.
We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
v2:
* Create variable during lowering pass. (Ken)
v3:
* Don't create a variable, but instead just insert an intrisic call
to load a uniform from the allocated location. (Jason)
v4:
* Don't run this pass if thread_local_id_index < 0
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.
It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.
The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants
v3:
* Simplify code based with Jason's suggestions.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
* simd16/32 fixes (curro)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Move lower flag to context constants. (Ken)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes 64 Vulkan CTS tests per gen
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Isolines aren't reversed. commit 5b2d8c2273c6f fixed this for the vec4
TES backend, but not the scalar one.
Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
This can occur when max_vertices=0 is explicitly specified.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This INTEL_DEBUG option disables lossless compression (also known
as render buffer compression).
v2: (Matt) Use likely(!lossless_compression_disabled) instead of
!likely(lossless_compression_disabled)
(Grazvydas) Update docs/envvars.html
Cc: "12.0" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants. We at least
need to initialize them to zero. We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).
Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.
(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
The driver was adding the skip components but always for buffer 0.
This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0 11.2" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:
"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|