| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put
<option name="precise_trig" value="true"/>
in the proper drirc file.
V2: Make option name more generic
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Stephane Marchesin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fixes some cases of:
GL45-CTS.cull_distance.functional
on Skylake.
Reviewed-by: Chris Forbes <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.
const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.
This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):
es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.
Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}
Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.
Cc: <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.
v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled
(with added 16x sample support) now passes with this patch.
Cc: "12.0" <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.
Acked-by: Eduardo Lima <[email protected]>
Acked-by: Antia Puentes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.
Signed-off-by: Jason Ekstrand <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.
We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.
We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
v2:
* Create variable during lowering pass. (Ken)
v3:
* Don't create a variable, but instead just insert an intrisic call
to load a uniform from the allocated location. (Jason)
v4:
* Don't run this pass if thread_local_id_index < 0
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.
It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.
The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants
v3:
* Simplify code based with Jason's suggestions.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
* simd16/32 fixes (curro)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Move lower flag to context constants. (Ken)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes 64 Vulkan CTS tests per gen
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Isolines aren't reversed. commit 5b2d8c2273c6f fixed this for the vec4
TES backend, but not the scalar one.
Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
This can occur when max_vertices=0 is explicitly specified.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This INTEL_DEBUG option disables lossless compression (also known
as render buffer compression).
v2: (Matt) Use likely(!lossless_compression_disabled) instead of
!likely(lossless_compression_disabled)
(Grazvydas) Update docs/envvars.html
Cc: "12.0" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants. We at least
need to initialize them to zero. We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).
Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.
(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
The driver was adding the skip components but always for buffer 0.
This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0 11.2" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:
"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
single-GRF writes.
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already. The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.
Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa514f2d06d0e4b33bfe6bca83142fbf0 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loops.
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction. In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ. Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
regions_overlap().
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
ARB_compute_shader was the last feature missing.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We know that there cannot be any destination dependency race if we
reach the beginning or end of the program without having found any
other instruction the send could possibly race with. This avoids
emitting a pile of useless moves at the beginning or end of the
program in the most common case in which the program has a single
basic block only.
On the original i965 I get the following shader-db results:
total instructions in shared programs: 3354165 -> 3215637 (-4.13%)
instructions in affected programs: 3183065 -> 3044537 (-4.35%)
helped: 13498
HURT: 0
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Just to make sure we keep the SIMD lowering pass tidy when we
introduce additional logic to try to optimize out the copy
instructions used to zip and unzip the destination and source regions
into multiple packed regions of the lowered instruction width.
Shouldn't cause any functional changes.
Reviewed-by: Jason Ekstrand <[email protected]>
|