summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: reload PS inputs with direct indexing at each use (v2)Marek Olšák2016-09-143-22/+41
| | | | | | | | | | | | | | | | | | | | | The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get rid of constant buffer preloadingMarek Olšák2016-09-141-24/+14
| | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1152636 -> 1146340 (-0.55 %) VGPRS: 728198 -> 727371 (-0.11 %) Spilled SGPRs: 3776 -> 2218 (-41.26 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35835152 -> 35841268 (0.02 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222372 -> 222559 (0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: get rid of img/buf/sampler descriptor preloading (v2)Marek Olšák2016-09-141-132/+47
| | | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222221 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) v2: merge codepaths where possible Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename get_sampler_desc -> load_sampler_descMarek Olšák2016-09-141-11/+11
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: cosmetic changes in si_shader.cMarek Olšák2016-09-141-3/+5
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: load streamout buffer descriptors before use (v2)Marek Olšák2016-09-141-33/+14
| | | | | | v2: inline the code and remove the conditional that's a no-op now Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Implement job shufflingEric Anholt2016-09-148-194/+333
| | | | | | | | | | | | | | | Track rendering to each FBO independently and flush rendering only when necessary. This lets us avoid the overhead of storing and loading the frame when an application momentarily switches to rendering to some other texture in order to continue rendering the main scene. Improves glmark -b desktop:effect=shadow:windows=4 by 27% Improves glmark -b desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4 by 17% While I haven't tested other apps, this should help X rendering a lot, and I've heard GLBenchmark needed it too.
* vc4: Handle resolve skipping at job submit time.Eric Anholt2016-09-143-31/+37
| | | | | | This is done in vc4_flush currently, but I'm going to make the job always track the surfaces it might be rendering to instead of putting in the destinations at flush time.
* vc4: Move the render job state into a separate structure.Eric Anholt2016-09-1412-255/+287
| | | | | This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Always unref the current job surfaces at job reset time.Eric Anholt2016-09-143-36/+21
| | | | | Drops some tricky logic in vc4_flush() trying to update the pointers, and fixes a broken lack of unref for MSAA surfaces at context destroy time.
* vc4: Move job-submit skip cases to vc4_job_submit().Eric Anholt2016-09-142-12/+12
| | | | For calling job_submit() directly, I need the skipping here.
* vc4: Move bin CL trailer to job_submit() time.Eric Anholt2016-09-142-11/+14
| | | | | To implement job shuffling, I want to be able to call submit() on specific jobs, turning vc4_flush() into the context's flush-all-jobs hook.
* vc4: Simplify the DISCARD_RANGE handlingEric Anholt2016-09-141-12/+15
| | | | | | | It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the logic out caught two bugs in it: We would try to do so on cubemaps (even though we're only mapping 1 of the 6 slices), and we would break persistent coherent mappings by trying to reallocate when we shouldn't.
* vc4: Fix incorrect clearing of Z/stencil when cleared separately.Eric Anholt2016-09-143-15/+38
| | | | | | | | | | | | | | | | | The clear of Z or stencil will end up clearing the other as well, instead of masking. There's no way around this that I know of, so if we are clearing just one then we need to draw a quad. Fixes a regression in the job-shuffling code, where the clear values move to the job and don't just have the last clear's value laying around when you do glClear(DEPTH) and then glClear(STENCIL) separately (ext_framebuffer_multisample-clear 4 depth)). This causes regressions in ext_framebuffer_multisample/multisample-blit depth and ext_framebuffer_multisample/no-color depth, but these were formerly false positives due to the reference image also being black. Now the reference and test images are both being drawn, and it looks like there's an incorrect resolve of depth during blitting to an MSAA FBO.
* glsl: add core plumbing for GL_ANDROID_extension_pack_es31aIlia Mirkin2016-09-134-16/+47
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: introduce glPrimitiveBoundingBoxARB entrypointIlia Mirkin2016-09-133-19/+40
| | | | | | | | | This requires a bit of rejiggering, since normally ES entrypoints alias core ones, not vice-versa. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: add a GLES3.2 enums section, and expose new MS line width paramsIlia Mirkin2016-09-134-10/+46
| | | | | | | | | | | | | | This also exposes them for ARB_ES3_2_compatibility. While both specs refer to the new MS line width parameters being separate from the existing AA line widths, reality begs to differ. It's the same on all hardware currently supported by mesa. Should hardware come along that wants these to be different, they're easy enough to separate out. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v1) Reviewed-by: Kenneth Graunke <[email protected]>
* aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()Sirisha Gandikota2016-09-133-10/+12
| | | | | | | | | Earlier, the loop pretends to loop over instructions from "start" to "end", but the callers always pass 8192 for end, which is some huge bogus value. The real loop termination condition is send-with-EOT or 0. (Ken) Signed-off-by: Sirisha Gandikota <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* aubinator: Make gen_disasm_disassemble handle split sendsSirisha Gandikota2016-09-131-7/+12
| | | | | | | | | | | | Skylake adds new SENDS and SENDSC opcodes, which should be handled in the send-with-EOT check. Make an is_send() helper that checks if the opcode is SEND/SENDC/SENDS/SENDSC (Ken) v2: Make is_send() much more crispier, Mix declaration and code to make the code compact (Ken) Signed-off-by: Sirisha Gandikota <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* aubinator: Simplify print_dword_val() methodSirisha Gandikota2016-09-131-8/+4
| | | | | | | | | | Remove the float/dword union and use the iter->p[f->start / 32] directly as printf formatter %08x expects uint32_t (Ken) v2: Make the cleanup much more crispier (Ken) Signed-off-by: Sirisha Gandikota <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/image: Set correct base_array_layer and array_len for storage imagesJason Ekstrand2016-09-131-0/+4
| | | | | | | Since Vulkan doesn't allow single-slice 3D storage images, we need to just set the base_array_layer and array_len to the full size of the 3-D LOD. Signed-off-by: Jason Ekstrand <[email protected]>
* Revert "intel/isl: Ignore base_array_layer and array_len for 3D storage..."Jason Ekstrand2016-09-131-6/+2
| | | | | | | | | This reverts commit 3943888c94beca69e575b8d3d1ec7a6cbf474ee4. It turns out that commit was pretty-much bogus since it breaks binding a 3-D texture as a 2-D storage image. The correct fix for the Vulkan CTS tests needs to be in the Vulkan driver itself rather than ISL. Signed-off-by: Jason Ekstrand <[email protected]>
* anv: Use blorp for doing MSAA resolvesJason Ekstrand2016-09-135-881/+121
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv: Use blorp for ClearColorImageJason Ekstrand2016-09-132-21/+56
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv: Delete meta_blit2dJason Ekstrand2016-09-134-1590/+0
| | | | | | | | Everything that we were once using the blit2d framework for is now done with blorp. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv/blorp: Add a gcd_pow2_u64 helper and use it for buffer alignmentsJason Ekstrand2016-09-131-24/+24
| | | | | | | | This is a lot cleaner and easier to read than the old piles of if statements. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp for CopyBuffer and UpdateBufferJason Ekstrand2016-09-133-181/+187
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp for CopyImageJason Ekstrand2016-09-132-158/+67
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp for CopyBufferToImageJason Ekstrand2016-09-132-125/+16
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp for CopyImageToBufferJason Ekstrand2016-09-132-16/+134
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Use blorp to implement VkBlitImageJason Ekstrand2016-09-135-750/+144
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv: Make image_get_surface_for_aspect_mask constJason Ekstrand2016-09-133-7/+8
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv: Add initial blorp supportJason Ekstrand2016-09-137-0/+400
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/anv: Use #defines for all __gen_ helpersJason Ekstrand2016-09-131-5/+6
| | | | | | | This allows us to #undef them later if we don't want them to persist Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* anv: Generalize emit_urb_setupJason Ekstrand2016-09-132-20/+45
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/pipeline: Roll compute_urb_partition into emit_urb_setupJason Ekstrand2016-09-133-156/+138
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Use #defines for all __gen_ helpersJason Ekstrand2016-09-131-5/+6
| | | | | | | This allows us to #undef them later if we don't want them to persist Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+Jason Ekstrand2016-09-131-1/+14
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* isl/state: Don't set QPitch for GEN4_3D surfacesJason Ekstrand2016-09-131-1/+16
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/blorp: Rework alloc_binding_tableJason Ekstrand2016-09-132-10/+11
| | | | | | | | | | | | | The original blorp_alloc_binding_table helper was supposed to return the binding table offset and map along with the surface state maps. This isn't quite what we want, however. What we really want is the binding table offsets, surface state offsets, and surface state maps. In the GL driver, the binding table map *is* an array of surface state offsets. However, in Vulkan, this isn't quite true as the entries in the binding table are surface state offsets combined with another binding table block offset. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* tgsi/scan: don't set interp flags for inputs only used by INTERP instructionsMarek Olšák2016-09-131-48/+57
| | | | | | | | | radeonsi depends on the interp flags a little bit too much. This fixes 9 randomly failing tests: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.* Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix FP64 UBO loads with indirect uniform block indexingMarek Olšák2016-09-131-2/+1
| | | | | | | No known tests. Cc: [email protected] Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: don't assume GTT if the VRAM flag isn't setMarek Olšák2016-09-131-3/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up CP DMA emit codeMarek Olšák2016-09-131-84/+60
| | | | | | | | Unify the clear and copy paths, clean up the definitions. It looks more like a rework. It's a preparation for GDS support, which might or might not come. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print the IB and buffer list in VM fault reportsMarek Olšák2016-09-131-1/+2
| | | | | | This is a fallout from reworking the debug flags. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add sampler view BOs to the BO list lastMarek Olšák2016-09-131-7/+10
| | | | | | | | | If si_sampler_view_add_buffer ends up flushing, then the code in begin_new_cs would previously have added the buffer(s) for whatever was previously bound to that slot. Now it would add only the new buffer. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-133-16/+56
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set new r600_resource fields correctly in other places tooMarek Olšák2016-09-131-0/+11
| | | | | | | | | | | | | This was missed in: commit 0d2e43fcb1198a6e67c85feadb1ca8c360ddc284 Author: Marek Olšák <[email protected]> Date: Thu Aug 18 16:30:00 2016 +0200 gallium/radeon: derive buffer placement and flags only at initialization Tested-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: dump shader buffers and imagesMarek Olšák2016-09-133-3/+49
| | | | | | this was unimplemented Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: fix a crash in resource_get_handleMarek Olšák2016-09-131-1/+1
| | | | | | broken recently Reviewed-by: Nicolai Hähnle <[email protected]>