| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
+23% Bioshock Infinite performance.
v2: - use the new fence_finish interface
- allow deferred fences with multiple contexts
- clear the ctx pointer after a deferred flush
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
the context
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
mainly for monitoring visible VRAM congestion
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Mine is longer than 64 bytes.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some ideas copied from Jakob Sinclair's implementation, but the color
clearing is completely different.
v2: remove leftover code, disable conditional rendering
disable render condition cleanly
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The radeon kernel module doesn't have the firmware query interface, so the
corresponding values will remain 0.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This reverts commit b403eb338533894ee012a96bf55653996c92ec7c.
Not needed.
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
We don't wanna use unflushed fences when we have multiple contexts.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This will be used as a counter for whether fence_finish needs to flush
the IB.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
The following patches will use this.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
LLVM doesn't use it.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This reverts commit f84e9d749fbb6da73a60fb70e6725db773c9b8f8.
Bioshock Infinite no longer hangs.
|
|
|
|
|
|
| |
There is less noise in CPU profile data now.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
always set
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This way we have unlimited UVD sessions.
v2: only enable it when kernel supports it as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 513fccdfb68e6a71180e21827f071617c93fd09b.
Bioshock Infinite hangs with that.
|
|
|
|
|
|
| |
This is the bare minimum for reporting the error to the user.
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
Required by our UVD code.
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
Replace the previous hardcoded value with newly defined parameters
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
no change in behavior
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
whole buffer objects are not needed
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Just send it whenever it is allocated.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
| |
Destroying a not allocated buffer is harmless.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It's actually not very clever to claim to support H.264
and then fail to create a decoder.
v2: prefix FW macro with UVD_.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
| |
v2: fix other tabs as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.
As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.
Cc: 12.0 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays. allocas instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.
For arrays that have 16 or less 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.
In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.
This fixes the piglit test:
spec/glsl-1.50/execution/geometry/max-input-component
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
bld->indirect_files is never set, so this function always returns false.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.
Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: keep using r600_texture_reference
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
It is the same for all levels.
Reviewed-by: Marek Olšák <[email protected]>
|