| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once upon a time (commit 8313f44409) Paul added code for the unlit
centroid workaround (WaCopyUnlitCentroidBarys). His commit message
claims it fixed the EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled} piglit tests but does not say
on which platform, though he cites the IVB PRM.
"3DSTATE_WM [DevIVB, DevHSW]" says
"[DevIVB]: Workaround: When Centroid Barycentric mode is required, HW
may produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
I later disabled it for Haswell (commit f6db414f3c) with no known ill
effects.
The Sandybridge page does not have this text, but the workarounds
database (see WaCopyUnlitCentroidBarys) says the issues applies *only*
to Sandybridge, and in fact in commit 1a2de7dce8fc I note that disabling
the workaround on Sandybridge causes the tests Paul originally mentioned
to fail.
So this is, and always has been, a huge confusing mess.
Disabling the workaround indeed causes the tests Paul originally
mentioned to fail on Sandybridge but not on Ivybridge/Baytrail.
On Ivybridge:
total instructions in shared programs: 6914901 -> 6909599 (-0.08%)
instructions in affected programs: 106766 -> 101464 (-4.97%)
helped: 884
total cycles in shared programs: 70874764 -> 70813774 (-0.09%)
cycles in affected programs: 794144 -> 733154 (-7.68%)
helped: 688
HURT: 186
LOST: 1
GAINED: 6
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae
was entirely broken. A missing "break" made the load_interpolated_input
case always fall through to "default" and hit a "continue", making it
not actually move any load_interpolated_input intrinsics at all.
It would only move the simple load_barycentric_* intrinsics, which
don't emit any code anyway, making it basically useless.
The initial version I sent of the pass worked, but I apparently
failed to verify that the simplified version in v2 actually worked.
With the obvious fix applied (so we actually tried to move
load_interpolated_input intrinsics), I discovered a second bug: we
weren't moving the offset SSA def to the top, breaking SSA validation.
The new version of the pass actually moves load_interpolated_input
intrinsics and all their dependencies, as intended.
Papers over GPU hangs on Ivybridge and Baytrail caused by the
recent NIR FS input rework by restoring the old behavior.
(I'm not honestly sure why they hang with PLN not at the top.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We would have hit a segfault already if this could be null.
Fixes Coverity warning spotted by Matt.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
intel_mipmap_tree::logical_depth0 is now in number of 2D slices so we no
longer need to be multiplying by 6.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that the logical_depth0 field is in number of 2D slices, we don't need
to be multiplying by 6 when creating the surface. It wasn't hurting
anything primarily because we get the actual length from the view which was
already handling it correctly.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need
for us to multiply by 6 when we go to fill out a blorp surface state.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We can do this in NIR now. No need to keep a GLSL pass lying around for
it.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the following piglit tests on gen6+:
tex-miplevel-selection textureProjGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRectShadow
tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4
tex-miplevel-selection textureProjGradOffset 2DRectShadow
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 7f53fead5c we treat every location as using all
four components so we only need special handling for
doubles when they cross multiple locations.
This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations
where the outputs array would overflow when a dmat2 was stored at
the max varying location i.e 30.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
From the redundant redundant department.
Reported-by: Michael Schellenberger Costa <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We always resort to the pull model for instanced GS inputs. So, we'd
better include the VUE handles, or else we can't actually pull anything.
Ian reports that on his branch with OES_geometry_shader enabled,
this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests::
- instanced.draw_2_instances_geometry_2_invocations
- instanced.draw_2_instances_geometry_8_invocations
- instanced.draw_4_instances_geometry_2_invocations
- instanced.draw_4_instances_geometry_8_invocations
- instanced.draw_8_instances_geometry_2_invocations
- instanced.draw_8_instances_geometry_8_invocations
- instanced.geometry_2_invocations
- instanced.geometry_32_invocations
- instanced.geometry_8_invocations
- instanced.geometry_max_invocations
- instanced.geometry_output_different_2_invocations
- instanced.geometry_output_different_32_invocations
- instanced.geometry_output_different_8_invocations
- instanced.geometry_output_different_max_invocations
- instanced.invocation_output_vary_by_attribute
- instanced.invocation_output_vary_by_texture
- instanced.invocation_output_vary_by_uniform
- query.primitives_generated_instanced
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
We do this for all other stages.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Here we create a new output_generic_reg array with the ability to
store the dst_reg for each component of user defined varyings.
This is needed as the previous code only stored the dst_reg based
on the varying location which meant packed varyings would overwrite
each other.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For example where n=3 first_component=1 this will give us
0xE (WRITEMASK_YZW).
V2:
Add assert to check first component is <= 4 (Suggested by Ken)
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will be used to swizzle components to the beginning or end
of the vector based on the component layout qualifier and whether
we are doing a load or store.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
| |
Acked-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
| |
Here we use the component qualifier (which is the first component)
as an offset when loading output varyings.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than trying to work out the total number of components
used at a location we simply treat all outputs as vec4s. This
removes the need for complex code looping over varyings to match
packed locations and the need for storing the total number of
components used at each location.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We will use this for output varyings. To make component
packing simpler we will just treat all varyings as vec4s.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
TCS/TES/GS and now FS all handle these in stage-specific functions.
CS don't have inputs, so VS was the only one left using this code.
Move it to the VS-specific function for clarity.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We no longer use this message. As far as I can tell, it's fairly
useless - the equivalent information is provided in the payload.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front. The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location. Instead,
each intrinsic specifies exactly what's needed.
This should unblock Timothy's work on GL_ARB_enhanced_layouts.
Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages. The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.
Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable. We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.
On Broadwell:
total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209
total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10
total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10
LOST: 3
GAINED: 3
The differences mostly appear to be slight changes in MOVs.
v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
flag rather than passing it directly to nir_lower_io. Use the
unreachable() macro rather than assert in one place. (Review
feedback from Chris Forbes.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, i965 interpolates all FS inputs at the top of the program.
This has advantages and disadvantages, but I'd like to keep that policy
while reworking this code. We can consider changing it independently.
The next patch will make the compiler generate PLN instructions "on the
fly", when it encounters an input load intrinsic, rather than doing it
for all inputs at the start of the program.
To emulate this behavior, we introduce an ugly pass to move all NIR
load_interpolated_input and payload-based (not interpolator message)
load_barycentric_* intrinsics to the shader's start block.
This helps avoid regressions in shader-db for cases such as:
if (...) {
...load some input...
} else {
...load that same input...
}
which CSE can't handle, because there's no dominance relationship
between the two loads. Because the start block dominates all others,
we can CSE all inputs and emit PLNs exactly once, as we did before.
Ideally, global value numbering would eliminate these redundant loads,
while not forcing them all the way to the start block. When that lands,
we should consider dropping this hacky pass.
Again, this pass currently does nothing, as i965 doesn't generate these
intrinsics yet. But it will shortly, and I figured I'd separate this
code as it's relatively self-contained.
v2: Dramatically simplify pass - instead of creating new instructions,
just remove/re-insert their list nodes (suggested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When working with a non-multisampled render target, asking for "sample"
interpolation locations doesn't make sense. We demote them to centroid.
In a couple of patches, brw_compute_barycentric_modes will begin looking
at these intrinsics to determine the barycentric modes. fs_visitor also
will use them to code-generate pixel interpolator messages or payload
references. Handling the "but what if it's not MSAA?" logic ahead of
time in a NIR pass simplifies things and prevents duplicated logic.
This patch doesn't actually do anything useful yet as we don't generate
these intrinsics. I decided to keep it separate as it's self-contained,
in the hopes of shrinking the "convert everything" patch for reviewers.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Sky Lake PRM:
"For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port
Surfaces, the range of this field is [0,340], indicating the number of
cube array elements (equal to the number of underlying 2D array elements
divided by 6). For other surfaces, this field must be zero."
In other words, the depth field for cube maps is in number of cubes not
number of 2-D slices so we need to divide by 6. ISL will do this correctly
for us assuming that we provide it with the correct array bounds which it
expects to be in 2-D slices. It appears as if we've been doing this wrong
ever since we first added cube map arrays for Sandy Bridge and the change
to ISL made things slightly worse. While we're at it, we now need to remoe
the shader hacks we've always done since they were only needed because we
were setting the depth field six times too large.
v2: Fix the vec4 backend as well (not sure how I missed this).
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches what we do for cube maps where logical_depth0 is in number of
face-layers rather than number of cubes. This does mean that we will
temporarily be setting the surface bounds too loose for cube map textures
but we are already setting them too loose for cube arrays and we will be
fixing that in the next commit anyway.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: "12.0 11.2 11.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GL API and mesa internals do this differently than we do. In GL, there
is no depth parameter for 1-D arrays and height is used. In the i965
miptree code we do the sane thing and make height == 1 and use depth for
number of slices. This makes for a mismatch every time we create a 1-D
array texture from GL. Instead of actually solving this problem, we just
said "1-D is hard, let's make sure it works no matter which way we pass the
parameters" and called it a day.
This commit fixes the one GL -> i965 transition point where we weren't
already handling 1-D array textures to do the right thing and then replaces
the magic fixup code with an assert that you're doing the right thing.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: "12.0 11.2 11.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the spec allows for {server,client}_wait_sync to be called without
currently bound context, while our implementation requires context
pointer.
v2: Add a mutex and acquire it for the duration of
brw_fence_client_wait() and brw_fence_is_completed() as suggested
by Chad.
Cc: "11.2 12.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Signed-off-by: Tomasz Figa <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a 10-20% performance regression in OglCSDof caused by commit
5a8c89038abab0184ea72664ab390ec6ca58b4d6, which made images (in the
image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB.
This seems sketchy, as the default PTE value is supposed to be
WB LLC eLLC, which is the same as our MOCS WB setting. It's only
supposed to change when using a surface for display, which won't
ever happen for images. Something may be wrong in the kernel...
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
assertion assumed the Gen7+ accumulator rules. A future patch will
allow this instruction on at least Gen6, so update the assertion.
v2: Use get_lowered_simd_width instead of open coding it. Suggested by
Curro.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]> [v1]
|
|
|
|
|
|
|
| |
v2: Rebase on changes to previous two patches.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt. Also update comment about problem values with
LZD(abs(x)). Suggested by Curro.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This uses one less instruction.
v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested
by Curro.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With the existing lowering passes, the functions from this extension
become a bunch of bit twiddling operations that have always been
supported.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This extension does not depend on the Gen. It only depends on the
availability of GLSL 1.30.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Thanks to rebase fail, recent surface state changes (commits 7e951cd56,
8521ce1a7, and 69c0dc5c53) effectively reverted 727a9b24933 and 367cf3a2e3e
which was unintentional. This should bring it back.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|