| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
I was hacking something stupid in doom, and hit an assert for the bitcast
following this, it definitely looks like this should be the number of 32-bit
components, not the instr level ones.
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
We can avoid adding the buffer in the non-local case, this will
avoid all the overhead of the indirect call.
Reviewed-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The next patch will try and avoid calling the indirect function.
v2: add a missing conversion.
Reviewed-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
The function that calls us has just added the buffer to the
list already, no need to try and add it again.
Reviewed-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.
Reviewed-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears the latest dota2 vulkan uses this,
and we get a hang in VR mode without it.
v2: remove finishme I left in after finishing.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Andres Rodriguez <[email protected]>
Cc: "17.2 17.3" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
the CmdBindPipeline path down the cpu usage path for the
multithreading demo as it no longer has to traverse MAX_SETS
to find the active descriptor sets.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This is just a simple refactor.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Cc: "17.2 17.3" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These are produced by nir_lower_bitmap(), adding the missing derefence
would cause other issues that need to be hacked around such as
skipping sampler lowering and uniform location assignment, so this
change seems the correct way to go.
Fixes 194 piglit crashes on radeonsi using NIR.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This should reduce the time where compute units are idle, mainly
for meta operations because they use a bunch of compute shaders.
This seems to have a really minor positive effect for Talos, at least.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.
This also makes the WSI a bit smarter in case the first preferred
heap does not exist.
Reviewed-by: Dave Airlie <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
| |
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This should make sure we don't treat exports buffers as local
bos.
Fixes: a639d40f13 (radv: add support for local bos. (v3))
Tested-by: Andres Rodriguez <[email protected]>
Reviewed-by: Andres Rodriguez <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
DOW3 appears to hit this path.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes regression in:
dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513
Fixes: de889794134e ("radv: Implement VK_AMD_shader_info")
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise we will leak them, load duplicates from disk rather
than memory and never write items loaded from disk to the apps
pipeline cache.
Fixes: fd24be134ffd 'radv: make use of on-disk cache'
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.
When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.
v2: Improvements to resource usage reporting
v3: Disassembly string must be null terminated (string_buffer's length
does not include the terminator)
v4: Fixed LDS reporting. (Bas)
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Serious Sam Fusion 2017 uses a huge number of occlusion queries,
and the allocated query pool buffer is greater than 4096 bytes.
This slightly improves performance (tested in Ultra) from
117.2 FPS to 119.7 FPS (~+2%) on my RX480.
This also improves Talos, from 69 FPS to 72/73 FPS (~+5%).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Only needed when the CS path is used.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the new kernel interfaces for reduced cs overhead,
We only set the local flag for memory allocations that don't have
a dedicated allocation and ones that aren't imports.
v2: add to all the internal buffer creation paths.
v3: missed some command submission paths, handle 0/empty bo lists.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When binding a new pipeline, we applied all dynamic states
without checking if they really need to be re-emitted. This
doesn't seem to be useful for the meta operations because only
the viewports/scissors are updated.
This should reduce the number of commands added to the IB
when a new graphics pipeline is bound.
Also, rename radv_dynamic_state_copy() to radv_bind_dynamic_state()
and set the dirty flags directly there.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
The depth bounds test values are either set at pipeline
creation or dynamically using vkCmdSetDepthBounds().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This was the same between si and ac.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was unused.
v2: drop args.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|