| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
The vec4 backend will lower it.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
The uint versions zero extend while the int versions sign extend.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
i965/fs was the only consumer, and we're now doing the lowering in NIR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
We'll want to have different lowering options set for scalar/vector
stages.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Strangely the return and parameter types were reversed.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable GL_OES_geometry_shader enums for OpenGL ES 3.1.
V4: EXTRA tokens updated according to comments from Ilia Mirkin.
V5: Account for check_extra does not evaluate "or" lazy. Fix issues
with EXTRA_EXT_FB_NO_ATTACH_CS.
Signed-off-by: Marta Lofstedt <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which ends up being a lot of register moves that carefully preserve the
nop value. Instead don't do the stores.
total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs : 910139 -> 895501 (-1.61%)
total local used in shared programs : 15328 -> 15328 (0.00%)
local gpr inst
helped 0 7442 4693
hurt 0 90 2687
Most of the helped/hurt instruction changes are by one or two ops
because can no longer do quad-wide stores in all cases.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do
a, b = tex()
if () c = a
else c = b
which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.
This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.
This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.
As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).
Signed-off-by: Ilia Mirkin <[email protected]> # tested on GF108, GT215
Tested-by: Nick Sarnie <[email protected]> # GK208
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
Since we emulate clip-planes, the clip-vertex is used within the VS
itself (thanks to nir_lower_clip). So just ignore it as a VS output.
Fixes a boatload of piglit tests that were asserting on unknown
varying slot.
(Also unrelated spelling/typo fix.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
With glsl_to_nir we end up with local variables, instead of global, for
arrays.
Note that we'll eventually have to do something more clever, I think,
when we support multiple functions, but that will probably take some
work in a few places.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Something we start to see with glsl_to_nir.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE.
But glsl_to_nir plus nir_lower_system_values this becomes an intrinsic.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Experimentally derived max size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
We need to store a lot more info now with per-buffer address/size.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v1)
v1 -> v2: add arg_begin/arg_end around buffer array
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v2)
v1 -> v2: use TGSI_MEMBAR defines
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
v1 -> v2: some 80 char reformatting
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and
adds PROGRAM_BUFFER to the list. These are used purely inside
glsl_to_tgsi conversion.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently any access params (coherent/volatile/restrict) are being lost
when lowering to the ssbo load/store intrinsics. Keep track of the
variable being used, and bake its access params in as the last arg of
the load/store intrinsics.
If the variable is accessed via an instance block, then 'variable'
points to the instance block variable and not the field inside the
instance block that we are accessing. In order to check access
parameters for the field itself we need to detect this case and keep
track of the corresponding field struct so we can extract the specific
field access information from there instead.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v1)
v1 -> v2: add tracking of struct field
v2 -> v3: minor adjustments based on Iago's feedback
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Interfaces can have image properties set in case they are buffer
interfaces. Make sure not to lose this information.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862
(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Very modest effect, but it's clearly the right thing to do.
total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs : 910157 -> 910131 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
local gpr inst bytes
helped 0 55 85 85
hurt 0 26 20 20
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Reduces calls up to 50%
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following shader-db results on GK110:
total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs : 910187 -> 910157 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
local gpr inst bytes
helped 0 18 821 821
hurt 0 0 0 0
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add a debug option to select the LLVM SI Machine Scheduler.
R600_DEBUG=sisched
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_gpu_shader_fp64 spec says:
"This extension does not support interpolation of double-precision
values; doubles used as fragment shader inputs must be qualified as
"flat"."
Fixes the regressions added by commit 781d278:
arb_gpu_shader_fp64-double-gettransformfeedbackvarying
arb_gpu_shader_fp64-tf-interleaved
arb_gpu_shader_fp64-tf-interleaved-aligned
arb_gpu_shader_fp64-tf-separate
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93878
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more. Prevents up to
seconds of lag with window movement in X with xcompmgr -c. There may
be useful tuning to do in the future, but for now this gets us
usability.
Cc: "11.0 11.1" <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
|