| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Add support for 0 pitch in fetch.
Add support for USCALE/SSCALE for 32bit integer fetches.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
Sync now uses a callback to ensure that it's called by the last
thread moving past a DC. This will help with the new counter
handling.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Avoid nested declarations of the same name within a single function.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
This works out to be a wash in terms of memory usage: We use more memory
to store the separate ALU instructions, but we optimize out a lot of code
as well. The main result, though, is that we do more of our work at link
time rather than draw time.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't want to bake the whole array into the FS key, because of the
hashing overhead. But we can keep a set of the arrays seen, and use a
pointer to the copy in as the array's proxy.
Between this and the previous patch, gl-1.0-blend-func now passes on
hardware, where previously it was filling the 256MB CMA area with shaders
and OOMing.
Drops 712 shaders from shader-db.
|
|
|
|
|
|
|
| |
The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but
only the VS dereferences it.
Drops 754 shaders from shader-db.
|
|
|
|
| |
It's a pretty big block, and I was about to make it bigger.
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, the X server may accumulate stale Present event contexts
if a client performs several video decoding sessions using the same
window.
v2: Based on Chris Wilson's review:
* Use xcb_discard_reply() instead of free(xcb_request_check())
Reviewed-and-Tested-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were baking in the LOD of the source level to each shader. Instead,
pass it in as a uniform -- this requires storing it to a temp register,
but that's better than compiling a ton of separate shaders:
total instructions in shared programs: 115032 -> 115036 (0.00%)
instructions in affected programs: 96 -> 100 (4.17%)
LOST: 572
|
|
|
|
|
|
| |
This helps in debugging memory pressure. It would be nice if we could
tell valgrind about it all the way from allocation time to destroy, but we
need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
|
|
|
|
| |
Cc: "12.0" <[email protected]>
|
| |
|
|
|
|
|
|
| |
The ranges are in units of bytes, not dwords. This wasn't caught by
piglit tests because ttn tends to make one big uniform file, so we only
had one UBO range with a src and dst offset of 0.
|
|
|
|
| |
I keep wanting to see this version of the NIR.
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Small decrease in draw call overhead.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
v2: rebase on top of Brian's commit
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
for piglit with the pipelined hang detection mode
v2: rebase on top of Brian's commit
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97140
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
LLVM doesn't use it.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
swr rasterizer contains numerous data transfers between vectors
and ordinary C types. Fixing for strict aliasing will take time.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
it cut off the upper 32 bits
Cc: [email protected]
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This can be used by the driver to get the command line which started
the process. Will be used by the VMware driver for extra logging.
For now, this is only implemented for Linux via /proc/self/cmdline
and Windows via GetCommandLine().
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch eliminates the redundant SetVertexBuffers and
SetIndexBuffer commands that are emitted for rebind purpose.
With this patch, the set commands will be skipped, but we will still
reference the associated resources to allow the kernel to
bring in the resources.
Tested with Lightsmark2008, Valley, MTT glretrace, piglit, conform.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are cases where we hit u_vbuf path due to alignment or pitch-
alignment restrictions, but for an output-format that u_vbuf does not
support translating (yet the driver does support natively). In which
case we hit the memcpy() path and don't care that u_vbuf doesn't
understand it.
Fixes crash with debug build of mesa in:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95000
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes failure to initialize the force_first_level flag, causing
failures in piglit levelclamp.
|
|
|
|
|
|
| |
This reverts commit d1fe26a62862f4e47a799222dca1bc1dc14ca4af.
Replacing a resource leak with a segfault isn't the solution.
|
|
|
|
|
|
|
| |
CovID: 401540
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Francesco Ansanelli <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Francesco Ansanelli <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Francesco Ansanelli <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
To silence missing initializers warning
Signed-off-by: Francesco Ansanelli <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, the bitshift would be performed on a simple int (32 bits on
most systems), overflow, and then be cast to 64 bits.
CovID: 1362461
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
CovID: 1362445, 1362446
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Some apps, like warsow, create a bazillion contexts but don't render on
most of them.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Would be nice if we could also have lockdep, like in the linux kernel.
But this is better than nothing.
Signed-off-by: Rob Clark <[email protected]>
|