mesa.git

	Commit message	Author	Age	Files	Lines
*	radeonsi: enable ARB_bindless_texture	Samuel Pitoiset	2017-06-14	1	-1/+3
|	This has only been tested on RX480. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add support for loading bindless images	Samuel Pitoiset	2017-06-14	1	-7/+21
|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add support for loading bindless samplers	Samuel Pitoiset	2017-06-14	1	-3/+12
|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: invalidate buffers which are made resident if needed	Samuel Pitoiset	2017-06-14	1	-0/+34
|	When a buffer becomes resident, check whether it has been invalidated; if so, update the descriptor and the dirty flag. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: upload new descriptors when resident buffers are invalidated	Samuel Pitoiset	2017-06-14	2	-0/+148
|	When texture buffers are invalidated, the address in the resident descriptor has to be updated, but we can't create a new descriptor because the resident handle has to stay the same. Instead, use the WRITE_DATA packet, which allows updating memory directly, but graphics/compute have to be idle in case the GPU is reading the descriptor. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
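The constraint here is that a bindless handle must stay valid while the buffer behind it gets new storage, so the descriptor is patched in place instead of being replaced. A minimal CPU-side model of that idea, assuming a handle is simply the index of a descriptor slot; all names are hypothetical, and the real driver writes the new address with a WRITE_DATA packet on the GPU rather than a plain store:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a bindless handle is the index of a descriptor slot,
 * and each descriptor stores the GPU address of the buffer it views. */
#define NUM_SLOTS 4

struct buffer_desc {
    uint64_t va;   /* GPU virtual address of the backing storage */
    uint32_t size; /* size in bytes */
};

struct desc_table {
    struct buffer_desc slots[NUM_SLOTS];
};

/* Called after the buffer behind `handle` was invalidated and received new
 * storage: patch only the address, so the handle the application already
 * holds keeps referring to the same slot. */
void update_resident_desc(struct desc_table *t, uint32_t handle, uint64_t new_va)
{
    assert(handle < NUM_SLOTS);
    t->slots[handle].va = new_va;
}
```

The handle value never changes; only the memory it refers to is rewritten, which is why the GPU must not be reading that descriptor at the same moment.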
*	radeonsi: only decompress resident textures/images when used	Samuel Pitoiset	2017-06-14	1	-2/+11
|	When the currently bound shaders don't use any bindless textures or images, it's useless to decompress the resident resources. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: track use of bindless samplers/images from tgsi_shader_info	Samuel Pitoiset	2017-06-14	5	-5/+46
|	This adds some new helper functions to know whether the current draw call (or compute dispatch) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
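Taken together, these two commits amount to a per-draw check: each shader records whether it uses bindless samplers or images, and the resident-resource decompression pass runs only if some bound shader does. A hypothetical sketch; the struct and field names are stand-ins for what the driver derives from tgsi_shader_info, not its actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-shader summary, filled in once after shader analysis. */
struct shader_info_summary {
    bool uses_bindless_samplers;
    bool uses_bindless_images;
};

#define NUM_STAGES 5 /* e.g. VS, TCS, TES, GS, PS */

/* True if any bound stage uses bindless samplers or images.
 * Unbound stages are represented by NULL. */
bool draw_uses_bindless(const struct shader_info_summary *stages[NUM_STAGES])
{
    for (size_t i = 0; i < NUM_STAGES; i++) {
        if (stages[i] && (stages[i]->uses_bindless_samplers ||
                          stages[i]->uses_bindless_images))
            return true;
    }
    return false;
}

/* Hypothetical draw-time hook: skip decompressing resident textures/images
 * when no bound shader can access them. Returns whether the pass ran. */
bool maybe_decompress_resident(const struct shader_info_summary *stages[NUM_STAGES])
{
    if (!draw_uses_bindless(stages))
        return false;
    /* ... loop over the resident textures/images and decompress them ... */
    return true;
}
```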
*	radeonsi: decompress resident textures/images before graphics/compute	Samuel Pitoiset	2017-06-14	3	-0/+114
|	Similar to the existing decompression code path except that it loops over the list of resident textures/images. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: decompress DCC for resident textures/images	Samuel Pitoiset	2017-06-14	2	-0/+83
|	Analogous to bound textures/images. We should also update the resident descriptors and disable COMPRESSION_EN to avoid useless DCC fetches, but I'm postponing this optimization to a separate series. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: only add descriptors in presence of resident handles	Samuel Pitoiset	2017-06-14	1	-0/+6
|	This won't help much except for applications that use a ton of resident handles, but it will reduce the winsys overhead a little bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add all resident buffers to the current CS	Samuel Pitoiset	2017-06-14	3	-0/+52
|	Resident buffers have to be added to every new command stream. This could be slightly improved by skipping it when the current shaders don't use any bindless textures/images, but applications usually use bindless for almost every draw call, and the winsys thread might help when buffers are added early. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: implement ARB_bindless_texture	Samuel Pitoiset	2017-06-14	3	-0/+285
|	This implements the Gallium interface. Decompression of resident textures/images will follow in the next patches. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add a slab allocator for bindless descriptors	Samuel Pitoiset	2017-06-14	4	-0/+119
|	For each texture/image handle, we need to allocate a new buffer for the bindless descriptor. But when the number of buffers added to the current CS becomes high, the overhead in the winsys (and in the kernel) becomes significant. To reduce this bottleneck, the idea is to suballocate the bindless descriptors using a slab similar to the one used in the winsys. Currently, a buffer can hold 1024 bindless descriptors, but this limit is arbitrary and could be changed in the future. Once a slab is allocated, the "base" buffer is added to a per-context list. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
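The suballocation scheme can be illustrated with a small free-list slab: one "base" buffer is carved into fixed-size descriptor slots, new slabs are chained onto a per-context list, and each allocation records which slab and slot it came from. This is a hypothetical CPU-side sketch; the descriptor size, the calloc-backed storage, and all names are assumptions, and only the 1024-descriptors-per-buffer figure comes from the commit message:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DESCS_PER_SLAB 1024 /* from the commit message; arbitrary */
#define DESC_SIZE      64   /* hypothetical descriptor size in bytes */

struct slab {
    uint8_t *base;                      /* stands in for the GPU buffer */
    uint32_t free_list[DESCS_PER_SLAB]; /* stack of free slot indices */
    uint32_t num_free;
    struct slab *next;                  /* per-context list of base buffers */
};

struct desc_allocator {
    struct slab *slabs;
};

struct desc_ref {
    struct slab *slab;
    uint32_t slot;
};

static struct slab *slab_create(void)
{
    struct slab *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->base = calloc(DESCS_PER_SLAB, DESC_SIZE);
    if (!s->base) {
        free(s);
        return NULL;
    }
    /* Push slots in reverse so slot 0 is handed out first. */
    for (uint32_t i = 0; i < DESCS_PER_SLAB; i++)
        s->free_list[i] = DESCS_PER_SLAB - 1 - i;
    s->num_free = DESCS_PER_SLAB;
    return s;
}

/* Take a free slot from an existing slab; create a new slab only when all
 * are full, so the buffer count grows 1024x slower than the descriptor count. */
int desc_slot_alloc(struct desc_allocator *a, struct desc_ref *out)
{
    struct slab *s = a->slabs;
    while (s && s->num_free == 0)
        s = s->next;
    if (!s) {
        s = slab_create();
        if (!s)
            return -1;
        s->next = a->slabs;
        a->slabs = s;
    }
    out->slab = s;
    out->slot = s->free_list[--s->num_free];
    return 0;
}

void desc_slot_free(struct desc_ref *ref)
{
    ref->slab->free_list[ref->slab->num_free++] = ref->slot;
}

void desc_allocator_destroy(struct desc_allocator *a)
{
    while (a->slabs) {
        struct slab *next = a->slabs->next;
        free(a->slabs->base);
        free(a->slabs);
        a->slabs = next;
    }
}
```

Only the slab buffers on the per-context list would need to be added to the command stream, rather than one buffer per handle.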
*	radeonsi: add si_set_shader_image_desc() helper	Samuel Pitoiset	2017-06-14	1	-32/+47
|	To share some common code between bound and bindless images. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add si_set_sampler_view_desc() helper	Samuel Pitoiset	2017-06-14	1	-43/+52
|	To share some common code between bound and bindless textures. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add si_init_descriptor_list() helper	Samuel Pitoiset	2017-06-14	1	-0/+15
|	This will be used to initialize resident descriptors for bindless textures/images. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add PIPE_CAP_BINDLESS_TEXTURE	Samuel Pitoiset	2017-06-14	1	-0/+1
|	Whether bindless texture operations are supported by the underlying driver. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack si_context better	Marek Olšák	2017-06-12	1	-18/+18
|	There isn't much to gain here. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack si_framebuffer better	Marek Olšák	2017-06-12	1	-6/+6
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack si_sampler_view better	Marek Olšák	2017-06-12	1	-2/+2
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack si_buffer_resources better	Marek Olšák	2017-06-12	1	-4/+5
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack struct si_descriptors better	Marek Olšák	2017-06-12	1	-15/+15
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack struct si_vertex_elements better	Marek Olšák	2017-06-12	1	-9/+10
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: replace si_vertex_elements::elements with separate fields	Marek Olšák	2017-06-12	4	-14/+14
|	It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: rename si_vertex_element -> si_vertex_elements	Marek Olšák	2017-06-12	4	-6/+6
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: allocate si_state_rasterizer::pm4_poly_offset only when needed	Marek Olšák	2017-06-12	2	-2/+14
|	Each element takes over 700 bytes. Reviewed-by: Nicolai Hähnle <[email protected]>
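Allocating a large per-state command buffer lazily means rasterizer states that never need it pay only for a pointer. A hypothetical sketch of the pattern; the struct layout, the 720-byte stand-in size, and the offset_enabled flag are assumptions, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a prebuilt register packet (700+ bytes). */
struct pm4_packet {
    unsigned char data[720];
};

struct rasterizer_state {
    bool offset_enabled;
    /* Only allocated when polygon offset can be used; NULL otherwise. */
    struct pm4_packet *poly_offset;
};

/* Returns false only if an allocation that was actually needed failed. */
bool rasterizer_state_init(struct rasterizer_state *rs, bool offset_enabled)
{
    rs->offset_enabled = offset_enabled;
    rs->poly_offset = NULL;
    if (offset_enabled) {
        rs->poly_offset = calloc(1, sizeof(*rs->poly_offset));
        if (!rs->poly_offset)
            return false;
    }
    return true;
}

void rasterizer_state_fini(struct rasterizer_state *rs)
{
    free(rs->poly_offset); /* free(NULL) is a no-op */
    rs->poly_offset = NULL;
}
```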
*	radeonsi: pack si_state_rasterizer fields	Marek Olšák	2017-06-12	1	-16/+16
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy	Marek Olšák	2017-06-12	3	-5/+14
|	The previous patch helps with this. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs	Marek Olšák	2017-06-12	3	-6/+10
|	The next patch will benefit from this. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs	Marek Olšák	2017-06-12	4	-16/+15
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't emit DB_STENCIL_CONTROL if it has no effect	Marek Olšák	2017-06-12	1	-1/+2
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: fix missing num_L2_invalidates increment	Marek Olšák	2017-06-12	1	-0/+1
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: get rid of more compressed_colortex_mask names	Marek Olšák	2017-06-12	4	-18/+18
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0	Juan A. Suarez Romero	2017-06-08	1	-0/+2
|	LLVMAddEarlyCSEMemSSAPass() is only defined in LLVM 4.0 and later. Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass") Signed-off-by: Marek Olšák <[email protected]>
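A compile-time guard keeps the call out of builds against older LLVM versions. To stay self-contained, the sketch below uses a hypothetical LLVM_VERSION_HEX macro (encoded as 0xMMmm) and a stub pass manager; it is not the Mesa code, which calls LLVMAddEarlyCSEMemSSAPass() on a real LLVMPassManagerRef:

```c
#include <assert.h>

/* Hypothetical stand-in for the LLVM version the build targets, 0xMMmm. */
#ifndef LLVM_VERSION_HEX
#define LLVM_VERSION_HEX 0x0400
#endif

/* Stub pass manager that only records which passes were added. */
struct pass_manager {
    int num_passes;
    int has_early_cse_memssa;
};

static void add_pass(struct pass_manager *pm) { pm->num_passes++; }

void build_pass_list(struct pass_manager *pm)
{
    add_pass(pm); /* a pass available in every supported LLVM version */
#if LLVM_VERSION_HEX >= 0x0400
    /* Compiled in only when the newer entry point is known to exist. */
    add_pass(pm);
    pm->has_early_cse_memssa = 1;
#endif
}
```

Building with an older version value (for example `-DLLVM_VERSION_HEX=0x0309`) drops the second pass entirely instead of failing to link.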
*	gallium/radeon: don't allocate HTILE in a separate buffer	Marek Olšák	2017-06-08	3	-19/+8
|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: rename depth decompress functions	Marek Olšák	2017-06-08	1	-16/+15
|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: rename shader resource decompress masks to their true meaning	Marek Olšák	2017-06-08	3	-28/+28
|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: rename is_compressed_colortex -> color_needs_decompression	Marek Olšák	2017-06-08	1	-5/+5
|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2)	Marek Olšák	2017-06-08	2	-15/+21
|	The workaround causes a massive performance decrease on 1-SE parts (Cape Verde, Hainan, Oland). The performance regression is already part of 17.0 and 17.1. v2: check tess_uses_prim_id Cc: 17.0 17.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't update dependent states if it has no effect (v2)	Marek Olšák	2017-06-08	3	-12/+76
|	This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows:
|
|	                    IB size    si_update_shaders calls
|	Borderlands 2       -10%       -27%
|	Deus Ex: MD         -5%        -11%
|	Talos Principle     -8%        -30%
|
|	v2: always dirty cb_render_state in set_framebuffer_state
|	Reviewed-by: Nicolai Hähnle <[email protected]>
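The pattern behind this commit and the clip_regs one is to compare the inputs a dependent state actually reads before marking it dirty, so a redundant change emits nothing. A hypothetical sketch for the shader-bind case; the struct, the chosen fields, and the dirty flag are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what the clip-register state reads from a VS. */
struct vs_info {
    uint8_t clipdist_mask;
    uint8_t culldist_mask;
    bool writes_psize;
};

struct ctx {
    const struct vs_info *vs;
    bool clip_regs_dirty;
};

static bool clip_inputs_differ(const struct vs_info *a, const struct vs_info *b)
{
    if (!a || !b)
        return a != b; /* binding or unbinding a shader always counts */
    return a->clipdist_mask != b->clipdist_mask ||
           a->culldist_mask != b->culldist_mask ||
           a->writes_psize != b->writes_psize;
}

/* Bind a new vertex shader, but only dirty the clip registers if the
 * outputs they depend on differ from the previous shader's. */
void bind_vs(struct ctx *c, const struct vs_info *vs)
{
    if (clip_inputs_differ(c->vs, vs))
        c->clip_regs_dirty = true;
    c->vs = vs;
}
```

Switching between two shaders with identical clip-related outputs leaves the flag untouched, so no register packets are re-emitted for that change.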
*	radeonsi: update clip_regs on shader state changes only when it's needed	Marek Olšák	2017-06-07	1	-3/+32
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector	Marek Olšák	2017-06-07	4	-16/+25
|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add a new helper si_get_vs	Marek Olšák	2017-06-07	2	-17/+19
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: isolate real framebuffer changes from the decompression passes (v3)	Samuel Pitoiset	2017-06-07	3	-2/+28
|	When a stencil buffer is part of the framebuffer state, it is decompressed, but because it's bindless, all draw calls set stencil_dirty_level_mask to 1.
|
|	v2: Marek
|	- set the flags outside the loop
|	- also clear and set framebuffer.do_update_surf_dirtiness there
|	- do it in the DB->CB copy path too
|	v3: Marek
|	- save and restore the do_update_surf_dirtiness flag
|
|	Signed-off-by: Marek Olšák <[email protected]>
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: do EarlyCSEMemSSA LLVM pass	Marek Olšák	2017-06-07	1	-0/+2
|	so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass.
|
|	This also fixes:
|	- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
|	- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
|
|	The code size decrease is a positive, but the register usage change isn't. There is a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown and GRID Autosport. EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE.
|
|	SGPRS: 1935420 -> 1938076 (0.14 %)
|	VGPRS: 1645504 -> 1645988 (0.03 %)
|	Spilled SGPRs: 2493 -> 2651 (6.34 %)
|	Spilled VGPRs: 107 -> 115 (7.48 %)
|	Private memory VGPRs: 1332 -> 1332 (0.00 %)
|	Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
|	Code Size: 61981592 -> 61890012 (-0.15 %) bytes
|	Max Waves: 371847 -> 371798 (-0.01 %)
|
|	Reviewed-by: Nicolai Hähnle <[email protected]>
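The percentages in these statistics are relative changes, (after - before) / before × 100. A small helper reproduces them; the function name is ours and not part of any shader-db tool:

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, spilled SGPRs going from 2493 to 2651 is +158, and 158 / 2493 ≈ 6.34 %, which is why a tiny absolute change in a small counter shows up as a large percentage next to the 0.14 % for total SGPRs.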
*	radeonsi: remove 8 bytes from si_shader_key	Marek Olšák	2017-06-07	3	-14/+17
|	We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
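Placing mutually exclusive fields in a union lets them share storage, which is where the 8 bytes come from. A hypothetical illustration; the member names are invented and do not mirror the real si_shader_key::mono layout:

```c
#include <assert.h>
#include <stdint.h>

/* Two per-stage fields that are never needed at the same time. */
struct mono_separate {
    uint64_t stage_a_bits; /* hypothetical: only meaningful for stage A */
    uint64_t stage_b_bits; /* hypothetical: only meaningful for stage B */
};

/* Same information, sharing one 8-byte slot. */
struct mono_union {
    union {
        uint64_t stage_a_bits;
        uint64_t stage_b_bits;
    } u;
};
```

The trade-off is that code must only read the member matching the shader stage, and if the key is compared or hashed byte-wise, the whole key should be zeroed before the active member is filled in.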
*	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC	Marek Olšák	2017-06-07	2	-8/+14
|	Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size.
|
|	Before:
|	heaven_x64: ls=1f1 tcs=1f1, lds=32K
|	heaven_x64: ls=31 tcs=31, lds=24K
|	heaven_x64: ls=71 tcs=71, lds=28K
|
|	After:
|	heaven_x64: ls=3f tcs=3f, lds=24K
|	heaven_x64: ls=7 tcs=7, lds=13K
|	heaven_x64: ls=f tcs=f, lds=17K
|
|	All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (Tight per-component input/output packing might help even more.)
|
|	It's unknown whether this improves performance.
|
|	Tested-by: Edmondo Tommasina <[email protected]>
|	Tested-by: Dieter Nützel <[email protected]>
|	Reviewed-by: Nicolai Hähnle <[email protected]>
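The order of unique IO indices matters if the per-vertex LDS stride is set by the highest written slot rather than by the number of slots written. That is an assumption made here for illustration, though it is consistent with the masks above. The sketch computes the slot count from each mask, with a hypothetical 16-byte (vec4) slot size:

```c
#include <assert.h>
#include <stdint.h>

/* Number of slots up to and including the highest set bit (0 for 0). */
unsigned last_bit(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Hypothetical per-vertex stride: one 16-byte vec4 per slot up to the
 * highest written one, so unused slots below it still cost space. */
unsigned stride_bytes(uint64_t outputs_written)
{
    return last_bit(outputs_written) * 16;
}
```

0x31 and 0x7 both have three bits set, but the highest slot drops from 6 to 3 once PSIZE/CLIPDIST no longer sit between the low and GENERIC indices, which lines up with the 24K to 13K drop in the listing.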
*	radeonsi: clean up decompress blend state names	Marek Olšák	2017-06-07	4	-10/+10
|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: enable TC-compatible stencil compression on VI	Marek Olšák	2017-06-07	3	-5/+8
|	Most things are in place. Ideally we won't see decompress blits for stencil anymore. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: prevent a race when the previous shader's main part is missing	Marek Olšák	2017-06-07	1	-0/+2
|	Reviewed-by: Nicolai Hähnle <[email protected]>