Commit message | Author | Age | Files | Lines
...
* virgl: use virgl_transfer_inline_write even less | Gurchetan Singh | 2019-02-15 | 1 | -1/+1
    We've noticed the Team Fortress 2 engine seems to make many small calls to glBufferSubData(..). Let's pick our heuristic based on the resource base width, not the size of a particular upload. This will cause transfers to be batched together in the transfer queue.

    Relevant glbench microbenchmark:

    Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
    After:  buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: use transfer queue | Gurchetan Singh | 2019-02-15 | 5 | -18/+36
    This improves the Unigine Valley benchmark by 3 to 10 fps (depending on the scene). It also improves the Team Fortress 2 benchmark from 6 fps to 13 fps (host: 20 fps).

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: introduce transfer queue | Gurchetan Singh | 2019-02-15 | 5 | -0/+390
    Transfers will be placed here at unmap time instead of incurring a VM exit. There's an attempt to deduplicate intersecting 1D transfers, which are surprisingly common.

    This can also help with mipmapped texture upload and smaller textures, where the majority of the time is spent in the guest kernel / QEMU -- not virglrenderer. This is shown by the GLbench texture upload benchmark:

    Before: texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
    After:  texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec

    v2: Split up list iteration functions (@gerddie)
    v3: Support for optimizing glBufferSubData

    Reviewed-by: Gert Wollny <[email protected]>
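The deduplication of intersecting 1D transfers described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct range1d`, `ranges_intersect`, `range_union` and `queue_add` are hypothetical names, not the driver's actual types or functions, and the sketch merges an incoming range into the first queued range it overlaps without any further coalescing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 1D transfer range covering bytes [x, x + width). */
struct range1d {
   uint32_t x;
   uint32_t width;
};

/* True when the two half-open ranges share at least one byte. */
static bool ranges_intersect(struct range1d a, struct range1d b)
{
   return a.x < b.x + b.width && b.x < a.x + a.width;
}

/* Grow `queued` so that it also covers `incoming`. */
static void range_union(struct range1d *queued, struct range1d incoming)
{
   uint32_t start = queued->x < incoming.x ? queued->x : incoming.x;
   uint32_t end_q = queued->x + queued->width;
   uint32_t end_i = incoming.x + incoming.width;
   uint32_t end = end_q > end_i ? end_q : end_i;

   queued->x = start;
   queued->width = end - start;
}

/* Merge `incoming` into the first intersecting queued range, or append it.
 * Returns the new number of queued ranges. */
static unsigned queue_add(struct range1d *queue, unsigned count,
                          unsigned capacity, struct range1d incoming)
{
   for (unsigned i = 0; i < count; i++) {
      if (ranges_intersect(queue[i], incoming)) {
         range_union(&queue[i], incoming);
         return count;
      }
   }

   assert(count < capacity);
   queue[count] = incoming;
   return count + 1;
}
```

With this sketch, queuing bytes [0, 16) and then [8, 24) leaves a single queued range covering [0, 24), so only one transfer would need to be submitted.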
* virgl: add encoder functions for new protocol | Gurchetan Singh | 2019-02-15 | 2 | -0/+28
    Let's encode the new protocol with new helper functions.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: make winsys modifications for encoded transfers | Gurchetan Singh | 2019-02-15 | 5 | -6/+21
    The idea is to have two command buffers:

    1) One for transfers
    2) One for commands, which can include transfers

    At flush time, (2) will be filled. Otherwise, (1) will be used to submit transfers if there are enough of them.

    v2: Pass size directly to cmd_buf_create (@gerddie)

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: add extra checks in virgl_res_needs_flush_wait | Gurchetan Singh | 2019-02-15 | 1 | -4/+9
    This is motivated by the following scenario:

    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glFlush(..)
    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glBufferSubData(GL_ARRAY_BUFFER, ...)

    This increases @davidriley's Team Fortress 2 apitrace from 1 fps to 6 fps and helps with the Chromium glbench microbenchmarks:

    Before:
    texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
    buffer_upload_dynamic_array_12 = 0.02 mbytes_sec
    buffer_upload_dynamic_array_576 = 1.07 mbytes_sec

    After:
    texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
    buffer_upload_dynamic_array_12 = 2.22 mbytes_sec
    buffer_upload_dynamic_array_576 = 164.89 mbytes_sec

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: pass virgl transfer to virgl_res_needs_flush_wait | Gurchetan Singh | 2019-02-15 | 5 | -14/+22
    Reviewed-by: Gert Wollny <[email protected]>
* virgl: keep track of number of computations | Gurchetan Singh | 2019-02-15 | 2 | -3/+3
    It's good to keep track of these things.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: limit command length to 16 bits | Gurchetan Singh | 2019-02-15 | 2 | -5/+8
    Much of our logic is based around the idea that the upper 16 bits of a command dword can encode the length of the command. Now that the command buffer is >= 2^16 - 1, we should check for this.

    v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS

    Reviewed-by: Gert Wollny <[email protected]>
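A minimal sketch of why the length check matters, assuming a header dword whose upper 16 bits hold the command length in dwords. The split of the lower 16 bits into an 8-bit command and an 8-bit object type is an assumption made for illustration, and `EXAMPLE_MAX_CMD_DWORDS`, `example_cmd_header` and `example_cmd_length` are hypothetical names rather than the driver's own.

```c
#include <assert.h>
#include <stdint.h>

/* Largest length that fits in the upper 16 bits (hypothetical name). */
#define EXAMPLE_MAX_CMD_DWORDS 0xffffu

/* Pack a header dword: length in bits 16..31; the command/object split
 * of the low 16 bits is an assumption for this sketch. */
static uint32_t example_cmd_header(uint32_t cmd, uint32_t obj,
                                   uint32_t len_dwords)
{
   assert(cmd <= 0xff && obj <= 0xff);
   /* Without this check, a longer command would silently lose its
    * high length bits when shifted into a 32-bit header. */
   assert(len_dwords <= EXAMPLE_MAX_CMD_DWORDS);
   return cmd | (obj << 8) | (len_dwords << 16);
}

/* Recover the length from a packed header. */
static uint32_t example_cmd_length(uint32_t header)
{
   return header >> 16;
}
```

Any length above 0xffff cannot be represented in such a header, which is why the encoder has to reject or split oversized commands before packing them.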
* virgl: use virgl_transfer in inline write | Gurchetan Singh | 2019-02-15 | 1 | -26/+40
    Let's define a helper function and use it. This commit also allows resources to be emitted into different command buffers.

    Like the ioctls, send 0 for layer_stride and stride. If we actually send the real values, there are various assumptions in virglrenderer for non-1D buffers that may need to be modified.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: add protocol for resource transfers | Gurchetan Singh | 2019-02-15 | 2 | -0/+12
    Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this uses the resource's already attached iovecs rather than the command buffer to transfer the data.

    v2: Used (1 << 16) not (1 << 15) [@gerddie]

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: when creating / freeing transfers, pass slab pool directly | Gurchetan Singh | 2019-02-15 | 4 | -14/+14
    This will allow us to destroy transfers without having a pointer to the context.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: unmap uploader at flush time | Gurchetan Singh | 2019-02-15 | 1 | -2/+3
    This should save some memory when allocating and freeing transfers.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: make alignment smaller when uploading index user buffers | Gurchetan Singh | 2019-02-15 | 1 | -1/+1
    Since we're just uploading to guest memory, let's just align to dword size.

    Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually")
    Reviewed-by: Gert Wollny <[email protected]>
* virgl: track level cleanliness rather than resource cleanliness | Gurchetan Singh | 2019-02-15 | 6 | -14/+20
    This allows a minor optimization for texture upload.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: don't mark unclean after a flush | Gurchetan Singh | 2019-02-15 | 1 | -1/+0
    The guest memory is still clean until host GL touches it, which we should track elsewhere.

    Reviewed-by: Gert Wollny <[email protected]>
* virgl: use virgl_resource_dirty helper | Gurchetan Singh | 2019-02-15 | 6 | -16/+19
    Reviewed-by: Gert Wollny <[email protected]>
* virgl: add ability to do finer grain dirty tracking | Gurchetan Singh | 2019-02-15 | 8 | -13/+15
    There are levels to cleanliness.

    Reviewed-by: Gert Wollny <[email protected]>
* panfrost: Improve logging and patch memory leaks | Alyssa Rosenzweig | 2019-02-15 | 2 | -49/+48
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Don't align framebuffer dims | Alyssa Rosenzweig | 2019-02-15 | 1 | -2/+2
    Fixes regressions with EGL clients

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Implement PIPE_QUERY_OCCLUSION_COUNTER | Alyssa Rosenzweig | 2019-02-15 | 1 | -1/+8
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Identify MALI_OCCLUSION_PRECISE bit | Alyssa Rosenzweig | 2019-02-15 | 2 | -5/+7
    Setting this is required for desktop-style occlusion queries.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* drirc/i965: add option to disable 565 configs and visuals | Tapani Pälli | 2019-02-15 | 2 | -0/+18
    We have cases where we would not like to expose these.

    v2: call the option allow_rgb565_configs for consistency with existing allow_rgb10_configs (Eric, Jason)

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* panfrost: Backport driver to Mali T600/T700 | Alyssa Rosenzweig | 2019-02-15 | 6 | -262/+340
    There are a few differences between Mali T860 (Panfrost's primary reference target) and the older Midgard generations (T600/T700):

    - Miscellaneous different magic numbers. It's not clear what these numbers mean on either the old or new configurations yet.

    - Errata fixes. T800 is the final Midgard generation and presumably the least buggy. Older Midgard has some extra hardware errata we have to work around.

    - SFBD vs MFBD split. Essentially, older Midgard uses a Single FrameBuffer Descriptor (SFBD), which corresponds to single render-target rendering. Newer Midgard (T760+) uses a Multiple FrameBuffer Descriptor (MFBD), allowing multiple RTs. On ES 2.0, these descriptors serve the same function, but we implement both, depending on the version of the hardware.

    - CPU bitness. 32-bit systems generally use 32-bit GPU descriptors, and vice versa for 64-bit. Our target T760 systems are 32-bit whereas our target T860 systems are 64-bit. More work is needed in this area.

    This patch fixes these areas to support older Midgard hardware. It is tested on Mali T760 and Mali T860.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Fix build; depend on libdrm | Alyssa Rosenzweig | 2019-02-15 | 1 | -0/+1
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* nir/dead_cf: Stop relying on liveness analysis | Jason Ekstrand | 2019-02-14 | 1 | -21/+39
    The liveness analysis pass is fairly expensive because it has to build large bit-sets and run a fix-point algorithm on them. Instead of requiring liveness for detecting if values escape a CF node, just take advantage of the structured nature of NIR and use block indices instead. This only requires the block index metadata, which is the fastest metadata we have to generate.

    No shader-db changes on Kaby Lake

    Reviewed-by: Timothy Arceri <[email protected]>
* nir/dead_cf: Inline cf_node_has_side_effects | Jason Ekstrand | 2019-02-14 | 1 | -41/+32
    We want to handle live SSA values differently and it's going to involve walking the instructions. We can make it a single instruction walk if we combine it with cf_node_has_side_effects.

    Reviewed-by: Timothy Arceri <[email protected]>
* intel/fs: Bail in optimize_extract_to_float if we have modifiers | Jason Ekstrand | 2019-02-14 | 1 | -0/+9
    This fixes a bug in RuneScape where we were optimizing x >> 16 to an extract and then negating and converting to float. The NIR to fs pass was dropping the negate on the floor, breaking a geometry shader and causing it to render nothing.

    Fixes: 1f862e923cb "i965/fs: Optimize float conversions of byte/word..."
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
    Tested-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* swr: set PIPE_CAP_MAX_VARYINGS correctly | Ilia Mirkin | 2019-02-14 | 1 | -0/+2
    Unfortunately swr was missed in the original commit. The number of varyings should generally match up to what's reported as the shader caps for fragment inputs.

    Fixes: 6010d7b8e8be (gallium: add PIPE_CAP_MAX_VARYINGS)
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Alok Hota <[email protected]>
    Cc: 19.0 <[email protected]>
* intel/fs: Silence a compiler warning | Jason Ekstrand | 2019-02-14 | 1 | -2/+1
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Silence some compiler warnings in release builds | Jason Ekstrand | 2019-02-14 | 2 | -4/+4
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv/blorp: Delete a pointless assert | Jason Ekstrand | 2019-02-14 | 1 | -5/+0
    Just a little higher up in the function we assert that the aspect masks are actually equal so there's no reason for the weaker check. Also, the temporary variables were causing compiler warnings in release builds.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Silence a couple of warnings in release builds | Jason Ekstrand | 2019-02-14 | 2 | -1/+3
    [28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
    ../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
    ../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
        unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
                 ^~~~~~~~~~
    [36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
    ../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
     instr_each_src_and_dest_is_ssa(nir_instr *instr)
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Eliminate dead input/output variables after translation. | Kenneth Graunke | 2019-02-14 | 1 | -5/+20
    spirv_to_nir can generate input/output variables which are illegal for the current shader stage, which would cause nir_validate_shader to balk.

    After my recent commit to start decorating arrays as compact, dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started hitting validation errors due to outputs in a TCS (not intended for the TCS at all) not being per-vertex arrays.

    Thanks to Jason Ekstrand for suggesting this approach.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
    Fixes: ef99f4c8d17 compiler: Mark clip/cull distance arrays as compact before lowering.
    Reviewed-by: Juan A. Suarez <[email protected]>
* anv: Put MOCS in the correct location | Kenneth Graunke | 2019-02-14 | 1 | -2/+2
    My patch to switch from struct-based MOCS to numeric MOCS accidentally divided all MOCS entries by 2 in the Vulkan driver.

    MOCS on Gen9+ is just an array index into a table. But in the hardware packets, the index starts at bit 1. So we need to shift it.

    Fixes: 0b44644ca68 (genxml: Consistently use a numeric "MOCS" field)
    Reviewed-by: Jason Ekstrand <[email protected]>
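The shift described above can be illustrated with a small sketch. It assumes only what the message states, namely that the table index begins at bit 1 of the packet field; `example_pack_mocs` and `example_unpack_mocs` are hypothetical helpers, not functions from the driver or genxml.

```c
#include <assert.h>
#include <stdint.h>

/* Place a MOCS table index into a packet field where the index
 * starts at bit 1 (hypothetical helper). */
static uint32_t example_pack_mocs(uint32_t mocs_index)
{
   return mocs_index << 1;
}

/* How a consumer that reads the index from bit 1 upward would see
 * the field (hypothetical helper). */
static uint32_t example_unpack_mocs(uint32_t field)
{
   return field >> 1;
}
```

Writing the raw index into the field without the shift makes the consumer see half the intended index, which matches the "divided all MOCS entries by 2" symptom: an unshifted 5 is read back as 2.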
* spirv: Add missing break | Ian Romanick | 2019-02-14 | 1 | -0/+1
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Fixes: c6465fec0c5 ("spirv: add SpvCapabilityInt64Atomics")
    CID: 1442555
* util/tests: compile to something sensible in release builds | Eric Engestrom | 2019-02-14 | 12 | -0/+24
    assert()-based tests make no sense without asserts, so make sure asserts are compiled in, even if the rest of the code has asserts turned off.

    Signed-off-by: Eric Engestrom <[email protected]>
    Acked-by: Lionel Landwerlin <[email protected]>
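One standard C way to keep asserts compiled into a single test file, regardless of build flags, is to undefine `NDEBUG` before including `<assert.h>`. The sketch below shows that technique only; whether these commits use exactly this mechanism is not stated in the message, and `example_add` and `example_test_add` are hypothetical names.

```c
/* Force assert() on for this file even when the build defines NDEBUG.
 * This must come before <assert.h> is included. */
#undef NDEBUG
#include <assert.h>

/* Hypothetical function under test. */
static int example_add(int a, int b)
{
   return a + b;
}

/* An assert()-based test: with NDEBUG defined this would compile to
 * nothing and could never fail. */
static void example_test_add(void)
{
   assert(example_add(2, 2) == 4);
   assert(example_add(-1, 1) == 0);
}
```

Because `<assert.h>` redefines `assert` according to the current state of `NDEBUG` each time it is included, the `#undef` only affects the file that contains it and leaves the rest of a release build untouched.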
* anv/tests: compile to something sensible in release builds | Eric Engestrom | 2019-02-14 | 5 | -0/+10
    assert()-based tests make no sense without asserts, so make sure asserts are compiled in, even if the rest of the code has asserts turned off.

    Signed-off-by: Eric Engestrom <[email protected]>
    Acked-by: Lionel Landwerlin <[email protected]>
* etnaviv: drop duplicate #define | Eric Engestrom | 2019-02-14 | 1 | -4/+0
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* st/dri: drop duplicate #define | Eric Engestrom | 2019-02-14 | 1 | -4/+0
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* gbm: drop duplicate #defines | Eric Engestrom | 2019-02-14 | 1 | -8/+0
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* drm-uapi: use local files, not system libdrm | Eric Engestrom | 2019-02-14 | 95 | -109/+103
    There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h"

    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* drm-uapi/README: remove explicit list of driver names | Eric Engestrom | 2019-02-14 | 1 | -2/+2
    These headers are used by a lot more than just the intel drivers nowadays.

    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* radv: fix radv_fixup_vertex_input_fetches() | Samuel Pitoiset | 2019-02-14 | 1 | -1/+1
    We should check that num_channels is 4, otherwise that breaks the world. Sorry for the short breakage.

    Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches")
    Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: reduce the number of loaded channels for vertex input fetches | Samuel Pitoiset | 2019-02-14 | 1 | -2/+79
    It's unnecessary to load more channels than the vertex attribute format. The remaining channels are filled with 0 for y and z, and 1 for w.

    29077 shaders in 15096 tests
    Totals:
    SGPRS: 1321605 -> 1318869 (-0.21 %)
    VGPRS: 935236 -> 932252 (-0.32 %)
    Spilled SGPRs: 24860 -> 24776 (-0.34 %)
    Code Size: 49832348 -> 49819464 (-0.03 %) bytes
    Max Waves: 242101 -> 242611 (0.21 %)

    Totals from affected shaders:
    SGPRS: 93675 -> 90939 (-2.92 %)
    VGPRS: 58016 -> 55032 (-5.14 %)
    Spilled SGPRs: 172 -> 88 (-48.84 %)
    Code Size: 2862740 -> 2849856 (-0.45 %) bytes
    Max Waves: 15474 -> 15984 (3.30 %)

    This mostly helps Croteam games (Talos/Sam2017).

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
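The channel fill rule above (0 for y and z, 1 for w) can be sketched on the CPU as follows. This is an illustration only: `example_expand_fetch` is a hypothetical helper, it handles float data only as a simplifying assumption, and the real change operates on shader IR rather than on arrays like this.

```c
#include <assert.h>

/* Expand `num_channels` loaded values (1..4) to a full xyzw vector,
 * filling missing y and z with 0 and missing w with 1.
 * Hypothetical helper; float formats only. */
static void example_expand_fetch(const float *loaded, unsigned num_channels,
                                 float out[4])
{
   static const float defaults[4] = {0.0f, 0.0f, 0.0f, 1.0f};

   assert(num_channels >= 1 && num_channels <= 4);
   for (unsigned i = 0; i < 4; i++)
      out[i] = i < num_channels ? loaded[i] : defaults[i];
}
```

For a two-channel attribute holding (0.5, 0.25), only two values need to be fetched and the expanded result is (0.5, 0.25, 0, 1).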
* radv: store vertex attribute formats as pipeline keys | Samuel Pitoiset | 2019-02-14 | 3 | -3/+21
    The formats will be used for reducing the number of loaded channels.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use MAX_{VBS,VERTEX_ATTRIBS} when defining max vertex input limits | Samuel Pitoiset | 2019-02-14 | 1 | -2/+2
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: make use of ac_build_expand_to_vec4() in visit_image_store() | Samuel Pitoiset | 2019-02-14 | 3 | -8/+6
    And make ac_build_expand() a static function.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* freedreno: Use the NIR lowering for isign. | Eric Anholt | 2019-02-14 | 2 | -14/+1
    I think this will save an instruction and hopefully not increase any other costs (possibly the immediate -1 and 1?), but I haven't actually tested.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
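The message does not spell out the lowering itself. One common way to express integer sign with only min/max and the immediates -1 and 1 is to clamp the value to [-1, 1], sketched below in plain C. Treat the exact form as an assumption; `example_imax`, `example_imin` and `example_isign` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static int32_t example_imax(int32_t a, int32_t b) { return a > b ? a : b; }
static int32_t example_imin(int32_t a, int32_t b) { return a < b ? a : b; }

/* Integer sign as a clamp to [-1, 1]: negative values become -1,
 * zero stays 0, positive values become 1. */
static int32_t example_isign(int32_t x)
{
   return example_imin(example_imax(x, -1), 1);
}

/* Small self-check covering each sign case. */
static void example_isign_selfcheck(void)
{
   assert(example_isign(-5) == -1);
   assert(example_isign(0) == 0);
   assert(example_isign(7) == 1);
}
```

Expressed this way, the operation needs two instructions plus two immediates, which is consistent with the message's concern about the cost of the -1 and 1 constants.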
* intel: Use the NIR lowering for isign. | Eric Anholt | 2019-02-14 | 3 | -31/+1
    Drops one instruction from fs-sign-int.shader_test. No change in shader-db due to it having 0 instances of sign(genIType).

    This may hurt isign64 if algebraic runs before int64 lowering, but I wasn't sure how to mark the algebraic opt as "every bit size but 64".

    v2: Update commit message about shader-db.

    Reviewed-by: Ian Romanick <[email protected]> (v1)