| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
| |
This fixes a crash when running the arb_get_program_binary-api-errors
piglit test twice.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
In future we might want to try avoid calling nir_serialize() but
this works for now.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This better represents that the ir could be either tgsi or nir.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It makes no sense to compact before, and the description of
nir_compact_varyings() confirms that.
Polaris10:
Totals from affected shaders:
SGPRS: 108528 -> 108128 (-0.37 %)
VGPRS: 74548 -> 74500 (-0.06 %)
Spilled SGPRs: 844 -> 814 (-3.55 %)
Code Size: 3007328 -> 2992932 (-0.48 %) bytes
Max Waves: 16019 -> 16009 (-0.06 %)
Vega10:
Totals from affected shaders:
SGPRS: 106088 -> 106232 (0.14 %)
VGPRS: 74652 -> 74700 (0.06 %)
Spilled SGPRs: 692 -> 658 (-4.91 %)
Code Size: 2967708 -> 2953028 (-0.49 %) bytes
Max Waves: 18178 -> 18162 (-0.09 %)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Fixes piglit test glsl-arb-fragment-coord-conventions
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Split up the op properties table into generation-specific bits, and only
use the kepler ones on kepler. Fixes some CTS images tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
All this information can be retrieved from the TIC directly. Avoid
having to dip into the constbuf information about the image.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility. This lets us push up to 4 UBO ranges.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance. We also need a brand new kernel that supports
context isolation - on older kernels, newly created contexts inherit
register state from whatever happened to be running. So, setting this
would have catastrophic impact on other drivers such as libva, Beignet,
or older Mesa.
See commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b where we did this
once before, but had to revert it in commit 013d33122028f2492da90a03a.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Kernel 4.16 has proper context isolation, which means we can change
the L3 configuration without worrying about that leaking to other
newly created contexts, breaking the assumptions of other userspace.
So, disable our workaround to reprogram it back to the default.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
GP10B requires the use of GP100_COMPUTE_CLASS instead of
GP104_COMPUTE_CLASS as is used for other non-GP100 chips.
Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit reworked the checks intel_from_planar() to check the
right individual cases for regular/planar/aux buffers, and do size
checks in all cases.
Unfortunately, the aux size check was broken, and required the aux
surface to be allocated with the correct aux stride, but full image
height (!).
As the ISL aux surface is not recorded in the DRIimage, we cannot easily
access it to check. Instead, store the aux size from when we do have the
ISL surface to hand, and check against that later when we go to access
the aux surface.
Signed-off-by: Daniel Stone <[email protected]>
Fixes: c2c4e5bae3ba ("i965: Fix bugs in intel_from_planar")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
User SGPRs changes:
VS: 14 -> 9
TCS: 14 -> 10
TES: 10 -> 6
GS: 8 -> 4
GSCOPY: 2 -> 1
PS: 9 -> 5
Merged VS-TCS: 24 -> 16
Merged VS-GS: 18 -> 11
Merged TES-GS: 18 -> 11
SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)
v2: - the shader cache needs to take address32_hi into account
- set amdgpu-32bit-address-high-bits
Reviewed-by: Samuel Pitoiset <[email protected]> (v1)
|
|
|
|
|
|
|
| |
State trackers must use a user buffer or const_uploader,
or set pipe_resource::flags same as const_uploader->flags.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
| |
|
| |
|
|
|
|
| |
Required by radeonsi for optimal behavior.
|
|
|
|
| |
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PIPE_CONTROL command description says:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Meta is awful and we'd like to stop using it. Implementing this using
BLORP allows us to stop trashing a bunch of GL state every time.
This follows the structure of st_generate_mipmap().
compute_num_levels is lifted directly from there.
Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3)
on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher
resolutions).
v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5
because it's not implemented yet.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
I want to use compute_num_levels inside i965. Rather than duplicating
it, move it from mesa/st to core Mesa, and make it non-static.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes a race condition in building targets that link in freedreno.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105120
Fixes: 0bbecc5a8548883f76a7 ("meson: define driver dependencies")
Signed-off-by: Dylan Baker <[email protected]>
Acked-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
| |
fix gcc8 compiler error for KNL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105029
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
in template gen_llvm.hpp
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consolidate archrst draw events into single draw event with an attribute
that represents the type of draw
- Add handlers for new private proto versions of DrawInstancedEvent,
DrawIndexedInstancedEvent, DrawInstancedSplitEvent, and
DrawIndexedInstancedSplitEvent
- Convert the draw events to generic DrawInfoEvents
- parse_proto_event_fields() replaces 'AR_DRAW_TYPE' as a field type with
'uint32_t'. This draw type is actually an enum, but can be represented
as an unsigned integer.
- is_draw_or_dispatch() recognizes DrawInfoEvent as a draw event
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Added support for another full translation path in fetch jitter.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Convert portions of the C sampler to the rasty SIMD lib.
Also fix SRL call with a non-immediate. Don't count on the compiler
automagically converting an srli call to srl if the shift count isn't
an immediate.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
"typename SIMD_T::TypeName" --> "TypeName<SIMD_T>"
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Use a new function to denote that we want to get offset to next component
and hide the fact that GEP is used underneath.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
We were passing a garbage handle. Let's not do that.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use llvm intrinsic masked.gather instead of manual unroll for the cases
where we have vector of pointers. Improves llvm IR debug experience by
reducing a ton of IR to a single intrinsic call. Also seems to reduce
overall stack use considerably.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Together with correct detection of clipDistance NaNs when no cullDistance is set
Reviewed-by: Bruce Cherniak <[email protected]>
|