| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
It crashes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
|
|
|
|
|
|
|
| |
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.
This fixes unflushed fences that can get stuck due to empty IBs.
|
|
|
|
|
|
|
|
| |
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Set the flag on when dual instance encoding is supported,
otherwise set it to off.
Signed-off-by: Boyuan Zhang <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
| |
just FYI, the kernel receives priority/4
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
free 60..63, move CP_DMA up
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
and rename the enum
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is just a workaround. The problem is described in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541
v2: say that it's only between the current context and aux_context
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
They can lead to VM faults and worse, which goes against the GL robustness
promises.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: take actual writemasks into account
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Also, prepare for using tgsi_array_info.
This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.
This also allows us to get rid of get_alloca_for_array.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
To have the same signature as get_array_range.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
+23% Bioshock Infinite performance.
v2: - use the new fence_finish interface
- allow deferred fences with multiple contexts
- clear the ctx pointer after a deferred flush
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
the context
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
mainly for monitoring visible VRAM congestion
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Mine is longer than 64 bytes.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some ideas copied from Jakob Sinclair's implementation, but the color
clearing is completely different.
v2: remove leftover code, disable conditional rendering
disable render condition cleanly
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The radeon kernel module doesn't have the firmware query interface, so the
corresponding values will remain 0.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This reverts commit b403eb338533894ee012a96bf55653996c92ec7c.
Not needed.
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
We don't wanna use unflushed fences when we have multiple contexts.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This will be used as a counter for whether fence_finish needs to flush
the IB.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
The following patches will use this.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
LLVM doesn't use it.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This reverts commit f84e9d749fbb6da73a60fb70e6725db773c9b8f8.
Bioshock Infinite no longer hangs.
|
|
|
|
|
|
| |
There is less noise in CPU profile data now.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|