summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* util: Use ZSTD for shader cache if possibleDylan Baker2019-11-114-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows ZSTD instead of ZLIB to be used for compressing the shader cache. On a 72 core system emulating skl with a full shader-db (with i965): ZSTD: 1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache) 225.40s user 10.87s system 3810% cpu 6.201 total (warm cache) 154M (235M on disk) ZLIB: 2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache) 229.15s user 10.63s system 3906% cpu 6.139 total (warm cache) 163M (244M on disk) Tim Arceri sees (8 core ryzen and a full shader-db): ZSTD: 2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache) 418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache) 454.3 MB (681.7 MB on disk) ZLIB: 3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache) 425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache) 470.3 MB (701.4 MB on disk) Reviewed-by: Eric Engestrom <[email protected]> (v1) Reviewed-by: Eric Anholt <[email protected]>
* egl: avoid local modifications for eglext.h Khronos standard header fileLaurent Carlier2019-11-112-11/+11
| | | | | | | | | | Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2 So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers. Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external" Cc: [email protected] Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* egl: move #include of local headers out of Khronos headersEric Engestrom2019-11-113-3/+4
| | | | | | Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* intel/fs: Lower large local arrays to scratchJason Ekstrand2019-11-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 14929212 -> 14880028 (-0.33%) instructions in affected programs: 72428 -> 23244 (-67.91%) helped: 6 HURT: 2 helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624 helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08% HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178 HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97% 95% mean confidence interval for instructions value: -11947.03 -348.97 95% mean confidence interval for instructions %-change: -125.72% 202.37% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 368585300 -> 342557344 (-7.06%) cycles in affected programs: 28144921 -> 2116965 (-92.48%) helped: 6 HURT: 2 helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682 helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28% HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788 HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59% 95% mean confidence interval for cycles value: -5900438.73 -606550.27 95% mean confidence interval for cycles %-change: -140.79% 146.16% Inconclusive result (%-change mean confidence interval includes 0). total spills in shared programs: 9243 -> 8901 (-3.70%) spills in affected programs: 2718 -> 2376 (-12.58%) helped: 4 HURT: 4 total fills in shared programs: 21831 -> 10141 (-53.55%) fills in affected programs: 11804 -> 114 (-99.03%) helped: 6 HURT: 2 total sends in shared programs: 815912 -> 815912 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 1 GAINED: 3 The helped shaders are all compute shaders in Aztec Ruins. There is also a compute shader in synmark2 OglCSDof that's helped but it doesn't show up in above shader-db results because it went from SIMD8 to SIMD16. That shader improves enough to yield an 15-20% performance boost to the benchmark as a whole on my KBL laptop. The hurt shaders are a couple shaders in Kerbal Space Program and a couple in Aztec Ruins. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Implement the new load/store_scratch intrinsicsJason Ekstrand2019-11-115-17/+241
| | | | | | | | | | | | | | | | | This commit fills in a number of different pieces: 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the new intrinsics. This involves simple plumbing work as well as a tiny bit of extra logic to always scalarize scratch intrinsics 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics into byte/dword scattered read/write messages which use the A32 stateless model. 3. Add code to lower_surface_logical_send to handle dword scattered messages and the A32 stateless model. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Plumb devinfo through lower_mem_access_bit_sizesJason Ekstrand2019-11-113-9/+14
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: refactor surface header setupJason Ekstrand2019-11-111-23/+16
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add DWord scattered read/write opcodesJason Ekstrand2019-11-115-0/+66
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizesJason Ekstrand2019-11-111-37/+15
| | | | | | | The new helper solves most of the annoying problems with data wrangling in brw_nir_lower_mem_access_bit_sizes. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add tests for nir_extract_bitsJason Ekstrand2019-11-112-0/+167
|
* nir/builder: Add a nir_extract_bits helperJason Ekstrand2019-11-111-37/+80
| | | | | | | | | | This new helper is better than nir_bitcast_vector because it's able to take a (mostly) arbitrary range from the source vector. The only requirement is that first_bit has to be aligned to the smaller of the two bit sizes. It wouldn't be hard to lift that requirement but it's reasonable for now. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* egl: fix _EGL_NATIVE_PLATFORM fallbackEric Engestrom2019-11-111-9/+0
| | | | | | | | | When the X11 or Haiku platforms were compiled in, they would bypass the `_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead. Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* anv: Unify GetDeviceQueue and GetDeviceQueue2Ricardo Garcia2019-11-111-4/+8
| | | | | | | | | Avoid duplicating some checks and code by making anv_GetDeviceQueue a subcase of anv_GetDeviceQueue2, like radv does. Signed-off-by: Ricardo Garcia <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* panfrost: Select format-specific blending intrinsicsAlyssa Rosenzweig2019-11-113-9/+41
| | | | | | | | | | | If we have an accelerated path for a particular framebuffer format, let's use it to save a bunch of instructions in a blend shader. [Tomeu: Only use the faster intrinsic on >T760] Signed-off-by: Alyssa Rosenzweig <[email protected]> Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Pack load/store masksAlyssa Rosenzweig2019-11-111-2/+30
| | | | | | | | | While most load/store operations on 32-bit/vec4 intriniscally, some are not and have special type-size-dependent semantics for the mask. We need to convert into this native format. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_panAlyssa Rosenzweig2019-11-111-0/+20
| | | | | | | | We can use the native Midgard ops for this, depending what chip we're on. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Identify ld_color_buffer_u8_as_fp16*Alyssa Rosenzweig2019-11-112-2/+7
| | | | | | | | | | There are two versions of this opcode, depending what version of the ISA you're using. I'm not sure if there's a semantic difference; I think there might be some slight subtleties but it's too early to know at this stage. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* nir: Add load_output_u8_as_fp16_pan intrinsicAlyssa Rosenzweig2019-11-111-0/+6
| | | | | | | | | This is a single opcode, at least on newer Midgard chips. It's easier to have this represented in NIR rather than trying to optimize out the conversions, so let's add the intrinsic. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Set depth and stencil for SFBD based on the formatTomeu Vizoso2019-11-114-21/+36
| | | | | Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* zink: correct depth-stencil formatErik Faye-Lund2019-11-111-1/+1
| | | | | | | | | | | | | | | | | | When using packed vulkan-formats on little-endian systems, we need to swap the components for the gallium formats. And since Zink isn't big-endian safe yet, little-endian is the only endianess we care about right now. This fixes a bunch of piglit tests, amongs others: - spec@arb_depth_texture@depth-level-clamp - spec@arb_depth_texture@depthstencil-render-miplevels * d=z24 - spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit - spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels - spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels - spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels Signed-off-by: Erik Faye-Lund <[email protected]> Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")
* zink/spirv: add support for nir_op_flrpErik Faye-Lund2019-11-111-0/+15
| | | | | | | | This fixes the following piglit: spec@ati_fragment_shader@ati_fragment_shader-render-fog Signed-off-by: Erik Faye-Lund <[email protected]>
* egl: Mention if swrast is being forcedChris Wilson2019-11-111-0/+2
| | | | | | | | | | | | | | The system can be disabling HW acceleration unbeknown to the user, leading to a long debug session trying to work out which component is failing. A quick mention that it is the environment override would be very useful. v2: Use more generic "CPU renderer" and so try to avoid jargon. Reviewed-By: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Martin Peres <[email protected]>
* spirv: Sort out the mess that is sampled imageJason Ekstrand2019-11-092-15/+24
| | | | | | | | | | | | | | This commit makes two major changes. First, we add a second case to OpLoad for sampled images which constructs a vtn_sampled_image and stashes that rather than stashing a pointer to the combined image sampler like we do for bare samplers and images. This should be more in line with how SPIR-V is intended to work and hopefully doesn't cause any weird problems. The second is a rework of vtn_handle_texture to assume that everything has an image but not everything has a sampler. We also add a vtn_fail_if for the case where a texture instructions require a sampler but none is provided. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Add a vtn_decorate_pointer helperJason Ekstrand2019-11-091-26/+41
| | | | | | | | | | This helper makes a duplicate copy of the pointer if any new access flags are set at this stage. This way we don't end up propagating access flags further than they actual SPIR-V decorations. In several instances where we create new pointers, we still call the decoration helper directly because no copy is needed. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Remove the type from sampled_imageJason Ekstrand2019-11-094-8/+2
| | | | | | | We have types on all vtn_values at this point so there's no reason to carry the redundant type information. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* freedreno/ir3: also track # of nops for shader-dbRob Clark2019-11-093-1/+7
| | | | | | | | | | | | | The instruction count is (mostly) a measure of what optimization passes can do, while # of nops is more an indication of how effectively the scheduler is balancing register pressure vs instruction count. So track these independently. (There could be opportunities to rematerialize values to reduce register pressure, swapping some nop's with other alu instructions, so nothing is truely independent.. but it is still useful to break these stats out.) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: sync disasm changes from envytoolsRob Clark2019-11-092-24/+94
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: fix SP_FS_MRT_REG.HALF_PRECISIONRob Clark2019-11-091-1/+1
| | | | | | Set flag based on actual output reg type. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix SP_FS_MRT_REG.HALF_PRECISIONRob Clark2019-11-091-1/+1
| | | | | | | We should really be setting this based on the actual output register type. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove obsolete commentRob Clark2019-11-091-4/+0
| | | | | | | | The meta PHI instruction was removed long ago. And fanin/fanout themselves to not contribute actual instructions (at least not by the time you get to sched, they may prevent copy-propagating away a mov) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/ra: remove ir print after livein/outRob Clark2019-11-091-1/+0
| | | | | | | The IR hasn't changed at this point, so it isn't really adding any value. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/ra: move regs_count==0 checkRob Clark2019-11-091-9/+2
| | | | | | | | Fold it in to writes_gpr() (since a register that does not reference any registers by definition does not write a register). This lets us avoid having to handle this case in a few other places. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: ir3_print tweaksRob Clark2019-11-092-47/+102
| | | | | | Handle HALF/HIGH flags in all cases, and colorize SSA src notation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: use SSA flag on dest register tooRob Clark2019-11-094-45/+48
| | | | | | | | | | | | We did this in some places before, but not consistantly. But it will be useful for two-pass RA, to identify which registers have already been assigned. While we are cleaning this up, use __ssa_src() and new __ssa_dst() helper more consistently. (If nothing else, this reduces the # of callers of ir3_reg_create() to audit that we didn't miss something) Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split pre-coloring to it's own functionRob Clark2019-11-091-3/+12
| | | | Signed-off-by: Rob Clark <[email protected]>
* spirv: Don't leak GS initialization to other stagesCaio Marcelo de Oliveira Filho2019-11-081-1/+2
| | | | | | | | | | | | The stage specific fields of shader_info are in an union. We've likely been lucky that this value was either overwritten or ignored by other stages. The recent change in shader_info layout in commit 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes") made this issue visible. Fixes: cf2257069cb ("nir/spirv: Set a default number of invocations for geometry shaders") Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* compiler: pack shader_info from 160 bytes to 96 bytesMarek Olšák2019-11-081-66/+66
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/linker: pass shader_info to analyze_clip_cull_usage directlyMarek Olšák2019-11-081-16/+9
| | | | | | This will be needed by the next commit. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* radeonsi/nir: fix compute shader crash due to nir_binary == NULLMarek Olšák2019-11-081-2/+12
| | | | | | This partially reverts 8b30114dda8. Fixes: 8b30114dda8 "radeonsi/nir: call nir_serialize only once per shader"
* radeonsi/nir: call nir_serialize only once per shaderMarek Olšák2019-11-081-21/+21
| | | | | | | | We were calling it twice. First serialize it, then use it to compute the cache key. Reviewed-by: Timothy Arceri <[email protected]>
* util: add blob_finish_get_bufferMarek Olšák2019-11-082-0/+14
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* u_format: Fix swizzle of A1R5G5B5.Eric Anholt2019-11-081-1/+1
| | | | | | | Found once I started using the generated unpack code from the Mesa side. Fixes: 4bbaac3782ad ("gallium: Add some more channel orderings of packed formats.") Reviewed-by: Erik Faye-Lund <[email protected]>
* virgl: support emulating planar image samplingDavid Stevens2019-11-081-1/+6
| | | | | | | | | | Mesa emulates planar format sampling with per-plane samplers. Virgl now supports this by allowing the plane index to be passed when creating a sampler view from a planar image. With this change, mesa now passes that information to virgl. Signed-off-by: David Stevens <[email protected]> Reviewed-by: Lepton Wu <[email protected]>
* gallium/swr: Enable some ARB_gpu_shader5 extensionsKrzysztof Raszkowski2019-11-082-3/+4
| | | | | | | | | Enable / add to features.txt: - Enhanced textureGather. - Geometry shader instancing. - Geometry shader multiple streams. Reviewed-by: Jan Zielinski <[email protected]>
* gallium/swr: Fix GS invocation issuesKrzysztof Raszkowski2019-11-081-2/+7
| | | | | | | - Fixed proper setting gl_InvocationID. - Fixed GS vertices output memory overflow. Reviewed-by: Jan Zielinski <[email protected]>
* ac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format.Timur Kristóf2019-11-082-0/+6
| | | | | | | | | | It happens that some games try to access a vertex buffer without a valid format. This case was incorrectly handled by ac_get_tbuffer_format which made ACO emit an invalid instruction. Signed-off-by: Timur Kristóf <[email protected]> Cc: 19.3 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* panfrost: Try to evict unused BOs from the cacheBoris Brezillon2019-11-084-6/+61
| | | | | | | | | | | | | | | | | | | | | | The panfrost BO cache can only grow since all newly allocated BOs are returned to the cache (unless they've been exported). With the MADVISE ioctl that's not a big issue because the kernel can come and reclaim this memory, but MADVISE will only be available on 5.4 kernels. This means an app can currently allocate a lot memory without ever releasing it, leading to some situations where the OOM-killer kicks in and kills the app (or even worse, kills another process consuming more memory than the GL app) to get some of this memory back. Let's try to limit the amount of BOs we keep in the cache by evicting entries that have not been used for more than one second (if the app stopped allocating BOs of this size, it's likely to not allocate similar BOs in a near future). This solution is based on the VC4/V3D implementation. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Move BO cache related fields to a sub-structBoris Brezillon2019-11-083-18/+21
| | | | | | | | | We will soon introduce an LRU list to evict BOs that have been unused for more than 1 second. Let's first move all BO cache fields to a sub-struct to clarify which fields are used by the BO caching logic. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Switch base for vertex texturing on T720Alyssa Rosenzweig2019-11-081-11/+16
| | | | | | | | There aren't texture pipeline registers anymore; instead, space is shared with work and ldst registers for output and input respectively. We need to shift the base registers to represent this correctly. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Pass shader stage to disassemblerAlyssa Rosenzweig2019-11-084-4/+7
| | | | | | | Vertex texturing behaves differently from fragment texturing on some GPUs. Signed-off-by: Alyssa Rosenzweig <[email protected]>