| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
and svga_set_sampler_views(). If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gcc 4.9.3 shows the following error:
brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
return brw_names[slot - VARYING_SLOT_MAX];
This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch define the macro the same it is done for dri and
vdpau targets.
Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11
LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG4Simple : VAEntrypointVLD
VAProfileMPEG4AdvancedSimple : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Baseline : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It allows to call nouveau_vp3_bsp_next multiple times
between one begin/end.
It is required to support st/va.
https://bugs.freedesktop.org/show_bug.cgi?id=89969
Signed-off-by: Julien Isorce <[email protected]>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
The counter was not set but used by the nouveau driver.
It is required otherwise visual output is garbage.
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Christian Koenig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the same tests that commit 8cf2e892f was attempting to fix:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
as confirmed by Samuel.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: Samuel Iglesias Gonsálvez <[email protected]>
Cc: Marta Lofstedt <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 8cf2e892fca20c4776b4a07c39918343cb2d4e0e. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: Samuel Iglesias Gonsálvez <[email protected]>
Cc: Marta Lofstedt <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.
v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.0 11.1" <[email protected]>
|
|
|
|
|
|
|
|
| |
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.0 11.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. Hovewer it picked the wrong value.
This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Marta Lofstedt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
It's the same behavior that we use for later LLVM.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
the parameter will be used later
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
the parameter will be used later
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Note these are a bit uglier, due to avoidance of GNU C extensions. But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Found during NIR_TEST_CLONE=1 piglit run. We were using block->index
but forgetting to require it. Causing things to not work with a cloned
shader which didn't preserve block_index.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nested functions are supported as an extension in GNU C, but Clang
don't support them.
This fixes compilation errors when (manually) building compute.c,
or by setting --enable-gallium-tests to the configure script.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% case because usually coherent
buffers are not so used.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
If there's a linked TES program, we should just use the actual
primitive mode. If not, just guess triangles (as we did before).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.
However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.
Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.
Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).
Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.
All input loads can use the same header - we don't need to re-expand
g0 every time. CSE accomplishes this, saving instructions.
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0
total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums. When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.
In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum. This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].
This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.
This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend. I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Diagnostics sent during code generation and the every error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.
Removed an fprintf that could never be triggered.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to send shader debug info via the context's debug callback.
Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.
Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.
Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.
This a prerequisite for allowing a variable alignment for suballocations.
Reviewed-by: Nicolai Hähnle <[email protected]>
|