| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
The time spent in the function dropped by 37% for torcs.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: use PRId64 instead of PRIx64
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
this cleanup is based on the vulkan driver, which seems to do the same thing
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This would be a fix if the value was used anywhere.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
also move it to draw_vbo, because it should be 0 in most cases
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
to move the big conditional statement out of draw_vbo
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
it's the default and the name will change to +fp64-fp16-denormals.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
we don't use on-chip tess.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes a vertex data corruption issue if some of the vertex streams
go through the MMU and some don't.
Signed-off-by: Lucas Stach <[email protected]>
Tested-by: Philipp Zabel <[email protected]>
Acked-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Forgot the prefix ...
Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since LLVM revision 293359 DumpModule gets only implemented when
either a debug build or LLVM_ENABLE_DUMP is set.
This patch adds a direct replacement for the function for radv and
radeonsi, However, as I don't know a good place to put common LLVM
code for all three I inlined the implementation for LLVMPipe.
v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.
total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs: 5338 -> 5213 (-2.34%)
|
|
|
|
|
|
|
|
|
|
| |
shader-db results:
total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs: 27417 -> 26570 (-3.09%)
The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has almost no effect on shader-db:
total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs: 4486 -> 4525 (0.87%)
Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.
The advantage is that this should significantly reduce time in the compiler.
|
|
|
|
|
|
|
|
| |
Already handled by the build.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Analogous to previous commits.
Cc: George Kyriazis <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Analogous to previous commits.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Correctly handled by the build systems.
Cc: Roland Scheidegger <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Already handled by configure.ac
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Does not match the function definition or how it's used. Triggers the
following warning in AppVeyor
svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration
Cc: Charmaine Lee <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Cc: Axel Davy <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Already implicitly handled by the build system.
Signed-off-by: Emil Velikov <[email protected]>
Tested-by: Aaron Watry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Cc: Aaron Watry <[email protected]>
Cc: Francisco Jerez <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Fixes: 4610e5ef28e "freedreno/ir3: fix sin/cos"
Cc: "12.0 13.0" <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Nicolas Dechesne <[email protected]>
Reported-by: Nicolas Dechesne <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Tested-by: Nicolas Dechesne <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.
Also update the docs accordingly, including the missing note about i965.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v1.1: move to using a normal CAP. (Marek)
v2: fill in the cap everywhere
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined. But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you'd no longer have glibc's
NEON memcpy applied at which point we'd be doing 16 uncached reads
instead of 64/(NEON memcpy granularity), for about a 30% performance
hit. By hand writing the assembly, we can get a whole cacheline
loaded at a time.
Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.
Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+). We may want to do runtime detection for the Raspbian case, in
the future.
Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
|
|
|
|
| |
This paves the way for building it twice, with NEON assembly or not.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
R600_DEBUG="info" can be used to display that size, as well as
the total amount of VRAM/GTT.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
In swr_update_derived() update texture and sampler state on a new fragment
shader. GALLIUM_HUD can update fs using a previously bound texture and
sampler.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.
v2: - update the commit message
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes R11G11B10_FLOAT, because it's in the category of "OTHER",
meaning that it doesn't have any channel description.
Cc: 17.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|