| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->wm.base.prog_data = &brw->wm.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_wm_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->gs.base.prog_data = &brw->gs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_gs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->tes.base.prog_data = &brw->tes.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tes_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tcs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->vs.base.prog_data = &brw->vs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
Similar to brw_context(...), intel_texture_object(...), and so on.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
What is the difference between a 'driver_fence' and a 'fence'? Do the
characters 'driver_' add anything helpful? Nope. They do, though, add an
extra 7 chars and pull your eyeballs away to ask "huh? what's that?" one
microsecond too many.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This is yet another patch for the great renaming begun long ago.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We locked an unitialized mutex in the callstack
glClientWaitSync
intel_gl_client_wait_sync
brw_fence_client_wait_sync
because we forgot to initialize it in intel_gl_fence_sync.
(The EGLSync codepath didn't have this bug. It initialized the mutex in
intel_dri_create_sync).
We also forgot to tear down (mtx_destroy) the mutex when destroying
the sync object.
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anv programs the hardware to use L3 data cache if we use either SSBOs or
images in the shaders, we can program i965 the same way.
gl_shader_program has a bit of a confusing named field with
'NumAtomicBuffers'. It doesn't tell how many buffers are accessed by the
shader in an atomic way but instead the number of atomic counters
manipulated by the shader.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The atom that uploads push constants listens to _NEW_TRANSFORM for
legacy clip plane handling. On Sandybridge, the gen6_vs_state atom
emits 3DSTATE_CONSTANT_VS as well as 3DSTATE_VS, so it needs to listen
to the same set of conditions.
However, it looks like Gen7 doesn't need this. The push constant atom
emits 3DSTATE_CONSTANT_VS directly, and the gen7_vs_state atom that
emits 3DSTATE_VS doesn't have a dependency on ctx->Transform.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We need to free prog_data for TCS/TES too.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
CACHE_NEW_CS_PROG hasn't existed in quite a long time...the old
comment was there, but not the actual bit.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
3DSTATE_PS doesn't need this. 3DSTATE_PS_EXTRA however does,
for brw_color_buffer_write_enabled().
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Needed for user clip plane enables. Broken since this code was
introduced.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to add more operations.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Fixes unused variable warning in release build.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
This function is only ever used by an assert() this fixes an
unused function warning in release builds.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
This fixes an unused variable warning on release builds.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch causes 2 regressions in khronos' gles cts tests
on various intel platforms.
Failing tests:
ES3-CTS.functional.state_query.integers.viewport_getinteger
ES3-CTS.functional.state_query.integers.viewport_getfloat
Here is an explanation of what's causing the failures:
CTS tests are not clamping the x, y location of the viewport's
bottom-left corner as recommended by ARB_viewport_array and
OES_viewport_array:
"The location of the viewport's bottom-left corner, given by (x,y), are
clamped to be within the implementation-dependent viewport bounds range.
The viewport bounds range [min, max] tuple may be determined by
calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES"
Khronos CTS merge request to fix the test case:
https://gitlab.khronos.org/opengl/cts/merge_requests/399
V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In core profile, we support up to 16 viewports. However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.
Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor. This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.
This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new
BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case. The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
| |
Using consistent naming allows us to create macros more easily.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Using consistent naming allows us to create macros more easily.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
There's an assert right above this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
The warning was also the wrong location, it should have been
in the else.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The code already skips doing the depth stall on gen >= 8, and as we
enable new platforms this assertion will fail needlessly. Instead of
changing the caller, make this simple change.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
These will be used by the on disk shader cache.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
|
|
|
|
|
|
| |
We already pass the shader so we can just get the stage from this.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).
This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
| |
It's already advertised because the version.c extension checks are
fulfilled, but we didn't actually claim support, so trying to create
a ES 3.2 context would fail.
It's all done, and the CTS results look good, so let's turn it on.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Implement querying this attribute in intelImageExtension and bump
version of intelImageExtension.
Signed-off-by: Chuanbo Weng <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not [originally] intended for upstream. Should cause a GPU hang if
some thread is executed with a non-contiguous dispatch mask breaking
assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any
CTS, DEQP or Piglit regressions, while replacing
brw_stage_has_packed_dispatch() with a dummy implementation that
unconditionally returns true on top of this patch causes multiple GPU
hangs.
v2: Refactor into a separate function instead of emitting the test
code directly from emit_nir_code(), drop VEC4 test and clean up
slightly for upstream. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
dispatch case.
This avoids emitting a few extra instructions required to take the
dispatch mask into account when it's known to be tightly packed.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dispatch.
The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL. This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions. Disable the optimization
in that case.
Reviewed-by: Jason Ekstrand <[email protected]>
v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
implement hardware generation check.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes. It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1. In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: Francisco Jerez <[email protected]>
[ Francisco Jerez: Fix a couple of typos, add mask register type
assertion, clarify reason why ce0 can have bits set for disabled
channels, clarify that this may only be a problem when thread
dispatch doesn't pack channels tightly in the SIMD thread. Apply
same treatment to Align16 path. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The state register sr0 is really a collection of dwords not a SIMD8
anything. It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.
Signed-off-by: Jason Ekstrand <[email protected]>
[ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ]
Reviewed-by: Francisco Jerez <[email protected]>
|