path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* virgl: ARB_texture_barrier support | Dave Airlie | 2018-08-14 | 6 | -3/+24
    Reviewed-by: Tomeu Vizoso <[email protected]>
* meson: Build with Python 3 | Mathieu Bridon | 2018-08-10 | 6 | -12/+12
    Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter.
    Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa.
    Signed-off-by: Mathieu Bridon <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
* vc4: Implement texture_subdata() to directly upload tiled data. | Eric Anholt | 2018-08-08 | 1 | -1/+39
    This avoids a memcpy into a temporary in the upload path.
    Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)
* vc4: Handle partial loads/stores of tiled textures. | Eric Anholt | 2018-08-08 | 3 | -60/+155
    Previously, we would load out the tile-aligned area, update the raster copy, and store it back. This was a huge cost for XPutImage calls to the screen under glamor.
    Instead, implement a general load/store path that walks over the source x/y writing into the corresponding pixel of the destination (using clever math from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/). If things are aligned, we go through the previous utile-at-a-time loop.
    Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
    Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
    Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
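The per-pixel walk described in the commit above can be illustrated with the masked-increment trick from the linked article. The sketch below is a minimal, hypothetical model: it assumes a 4x4 tile stored in Morton (Z-order) layout with one byte per pixel, which is not vc4's real utile layout, and the helpers `deposit_bits()`, `masked_increment()` and `store_partial_tile()` are invented for illustration rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4x4 tile in Morton order: x bits occupy bit positions
 * 0 and 2, y bits occupy positions 1 and 3 (not vc4's real layout). */
#define TILE_XMASK 0x5u /* 0b0101 */
#define TILE_YMASK 0xAu /* 0b1010 */

/* Scatter the low bits of `value` into the set bits of `mask`
 * (a software version of a bit-deposit operation). */
static uint32_t deposit_bits(uint32_t value, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask); /* lowest set bit of mask */
        if (value & bit)
            result |= lowest;
        mask &= mask - 1; /* drop that bit and move to the next one */
    }
    return result;
}

/* Increment a coordinate whose bits are spread across `mask`.
 * Subtracting the mask sets every non-mask bit, so the carry ripples
 * across the gaps; the AND then clears those filler bits again. */
static uint32_t masked_increment(uint32_t v, uint32_t mask)
{
    return (v - mask) & mask;
}

/* Store a w x h raster rectangle into the tile at (x0, y0), walking
 * the source linearly and updating the swizzled x/y parts separately. */
static void store_partial_tile(uint8_t tile[16], const uint8_t *src,
                               uint32_t src_stride, uint32_t x0,
                               uint32_t y0, uint32_t w, uint32_t h)
{
    assert(x0 + w <= 4 && y0 + h <= 4);

    uint32_t ybits = deposit_bits(y0, TILE_YMASK);
    for (uint32_t y = 0; y < h; y++) {
        uint32_t xbits = deposit_bits(x0, TILE_XMASK);
        for (uint32_t x = 0; x < w; x++) {
            tile[xbits | ybits] = src[y * src_stride + x];
            xbits = masked_increment(xbits, TILE_XMASK);
        }
        ybits = masked_increment(ybits, TILE_YMASK);
    }
}
```

Because x and y live in disjoint bit masks, the inner loop only advances the x part and ORs in the y part that stays fixed for the row, so no per-pixel multiply or divide by the tile size is needed.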
* vc4: Compile the LT image helper per cpp we might load/store. | Eric Anholt | 2018-08-08 | 1 | -2/+31
    For the partial load/store support I'm about to add, we want the memcpy to be compiled out to a single load/store. This should also eliminate the calls to vc4_utile_width/height().
    Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
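A minimal sketch of the per-cpp specialization idea, assuming a hypothetical `copy_pixels()` helper and pixel sizes of 1, 2, 4, 8 or 16 bytes (these names and sizes are illustrative, not vc4's). Dispatching once on the runtime `cpp` gives the compiler a constant `memcpy()` size in each case, which it can usually lower to a single load and store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic body: with `cpp` a compile-time constant at each call site
 * below, the compiler can turn memcpy(dst, src, cpp) into one move. */
static inline void copy_pixels_cpp(void *dst, const void *src,
                                   uint32_t count, uint32_t cpp)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (uint32_t i = 0; i < count; i++)
        memcpy(d + i * cpp, s + i * cpp, cpp);
}

/* Dispatch once on the runtime cpp so that each case instantiates
 * the inline helper with a literal size. */
static void copy_pixels(void *dst, const void *src, uint32_t count,
                        uint32_t cpp)
{
    switch (cpp) {
    case 1:  copy_pixels_cpp(dst, src, count, 1);  break;
    case 2:  copy_pixels_cpp(dst, src, count, 2);  break;
    case 4:  copy_pixels_cpp(dst, src, count, 4);  break;
    case 8:  copy_pixels_cpp(dst, src, count, 8);  break;
    case 16: copy_pixels_cpp(dst, src, count, 16); break;
    default: assert(!"unsupported cpp");           break;
    }
}
```

Whether each copy really becomes one instruction depends on the compiler inlining the helper and on the optimization level.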
* vc4: Refactor to reuse the LT tile walking code. | Eric Anholt | 2018-08-08 | 1 | -24/+34
* svga: use pipe_sampler_view::target in svga_set_sampler_views() | Brian Paul | 2018-08-08 | 1 | -1/+1
    instead of the underlying texture's target. This fixes an issue where the TGSI sampler type was not agreeing with the sampler view target/type.
    In particular, this fixes a Mint 19 XFCE desktop scaling issue because the TGSI code was using a RECT sampler but the sampler view's underlying texture was PIPE_TEXTURE_2D.
    We want to use the sampler view's type rather than the underlying resource, as we do for the view's surface format.
    No piglit regressions.
    VMware issue 2156696.
    Reviewed-by: Neha Bhende <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
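A simplified sketch of the distinction the commit draws, using stand-in structs that keep only the two fields it mentions (`pipe_sampler_view::target` and the view's underlying texture). The helper names `view_target_old()` and `view_target_new()` are hypothetical; the real `svga_set_sampler_views()` does considerably more.

```c
#include <assert.h>

/* Simplified stand-ins for the gallium types named in the commit;
 * the real definitions carry many more fields and enum values. */
enum pipe_texture_target { PIPE_TEXTURE_2D, PIPE_TEXTURE_RECT };

struct pipe_resource {
    enum pipe_texture_target target;
};

struct pipe_sampler_view {
    enum pipe_texture_target target; /* how the shader samples the view */
    struct pipe_resource *texture;   /* underlying storage */
};

/* Conceptual "before": the resource's target decides the sampler type. */
static enum pipe_texture_target
view_target_old(const struct pipe_sampler_view *sv)
{
    return sv->texture->target;
}

/* Conceptual "after": the view's own target is used, which matches the
 * sampler type the TGSI shader was compiled against. */
static enum pipe_texture_target
view_target_new(const struct pipe_sampler_view *sv)
{
    return sv->target;
}
```

In the Mint 19 case from the commit message, a RECT view over a 2D resource is exactly where the two functions disagree.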
* svga: use SVGA3D_RS_FILLMODE for vgpu9 | Brian Paul | 2018-08-08 | 3 | -26/+37
    I'm not sure why we didn't support this in the past, but fillmode is supported by all renderers nowadays.
    Also fix the logic in svga_create_rasterizer_state() to avoid a few swtnl cases.
    No piglit regressions.
    Reviewed-by: Neha Bhende <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
* svga: add TGSI_SEMANTIC_FACE switch case in svga_swtnl_update_vdecl() | Brian Paul | 2018-08-08 | 1 | -0/+1
    Fixes a failed assertion when running the Piglit polygon-mode-face test. The test still does not pass, though.
    Reviewed-by: Neha Bhende <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
* vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels. | Eric Anholt | 2018-08-07 | 1 | -1/+2
    We won't have an FD if we're just having the server wait on a fence created by eglCreateSyncKHR(). Our seqno fences will happen in order, so server-side waits are no-ops in that case.
    Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete
    Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* vc4: Ignore samplers for finding uniform offsets. | Eric Anholt | 2018-08-07 | 1 | -3/+14
    Fixes:
      dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
      dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
      dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
      dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex
    Cc: [email protected]
* vc4: Extend dumping of uniforms in QIR and in the command stream. | Eric Anholt | 2018-08-07 | 3 | -13/+68
    Similar to what I did for V3D, provide some description of the uniforms.
* vc4: Pull uinfo->data[i] dereference out to the top of the loop. | Eric Anholt | 2018-08-07 | 1 | -20/+18
    Reduces the size of vc4_uniforms.o by about 10%. We would basically always end up loading the cacheline of uinfo->data[i] anyway, so it should be good for performance as well as making the code a bit cleaner.
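A sketch of the hoisting pattern, assuming a hypothetical `struct uniform_info` whose `contents`/`data` arrays loosely follow the names in the commit message (this is not vc4's actual layout). Reading `uinfo->data[i]` once into a local gives every switch case the same value without each case emitting its own load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uniform-info layout, loosely modeled on the names in the
 * commit message (uinfo->data[i]); not vc4's real structures. */
enum uniform_kind { UNIFORM_CONSTANT, UNIFORM_SCALED };

struct uniform_info {
    uint32_t count;
    const enum uniform_kind *contents;
    const uint32_t *data;
};

static void write_uniforms(const struct uniform_info *uinfo, uint32_t *out)
{
    for (uint32_t i = 0; i < uinfo->count; i++) {
        /* Load data[i] once at the top of the loop instead of
         * re-dereferencing uinfo->data[i] inside every case. */
        uint32_t data = uinfo->data[i];

        switch (uinfo->contents[i]) {
        case UNIFORM_CONSTANT:
            out[i] = data;
            break;
        case UNIFORM_SCALED:
            out[i] = data * 2;
            break;
        }
    }
}
```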
* vc4: Make sure to emit a tile coordinates between two MSAA loads. | Eric Anholt | 2018-08-07 | 1 | -12/+11
    The HW only executes a load once the tile coordinates packet happens, and only tracks one at a time, so by emitting our two MSAA loads back to back we would end up with an undefined color or Z buffer.
    The simulator doesn't seem to care, but sync up the RCL generation with the kernel anyway.
    Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
* vc4: Respect a sampler view's first_layer field. | Eric Anholt | 2018-08-07 | 1 | -1/+3
    Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
    Cc: [email protected]
* virgl: add ARB_shader_clock support | Dave Airlie | 2018-08-08 | 2 | -1/+3
    Reviewed-by: Erik Faye-Lund <[email protected]>
* radeonsi: set GLC=1 for all write-only shader resources | Marek Olšák | 2018-08-07 | 1 | -2/+19
* radeonsi: don't load block dimensions into SGPRs if they are not variable | Marek Olšák | 2018-08-07 | 3 | -7/+7
* swr: don't export swr_create_screen_internal | Emil Velikov | 2018-08-07 | 2 | -2/+1
    With the earlier rework, the user and provider of the symbol are within the same binary. Thus there's no point in exporting the function.
    Spotted while reviewing a patch from Chuck that nearly added another unneeded PUBLIC function.
    Cc: Chuck Atkins <[email protected]>
    Cc: Tim Rowley <[email protected]>
    Fixes: f50aa21456d ("swr: build driver proper separate from rasterizer")
    Signed-off-by: Emil Velikov <[email protected]>
    Tested-by: Chuck Atkins <[email protected]>
    Reviewed-By: George Kyriazis <[email protected]>
* virgl: update virgl_hw.h from virglrenderer | Erik Faye-Lund | 2018-08-07 | 1 | -1/+26
    This just makes sure we're currently up-to-date with what virglrenderer has.
    Signed-off-by: Erik Faye-Lund <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
* virgl: rename msaa_sample_positions -> sample_locations | Erik Faye-Lund | 2018-08-07 | 2 | -5/+5
    This matches what this field is called in virglrenderer's copy of the header.
    This reduces the diff between the two different versions of virgl_hw.h, and should make it easier to upgrade the file in the future.
    Signed-off-by: Erik Faye-Lund <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
* vc4: Fix a leak of the no-vertex-elements workaround BO. | Eric Anholt | 2018-08-06 | 1 | -0/+2
    Fixes: bd1925562ad1 ("vc4: Convert the driver to emitting the shader record using pack macros.")
* vc4: Fix context creation when syncobjs aren't supported. | Eric Anholt | 2018-08-06 | 1 | -2/+6
    Noticed when trying to run current Mesa on rpi's downstream kernel.
    Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* v3d: Emit the VCM_CACHE_SIZE packet. | Eric Anholt | 2018-08-06 | 2 | -0/+9
    This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping.
    Cc: "18.2" <[email protected]>
* v3d: Drop "VC5" from the renderer string. | Eric Anholt | 2018-08-06 | 1 | -1/+1
    VC5 isn't a useful name any more; just stick to v3d.
* nvc0/ir: return 0 in imageLoad on incomplete textures | Karol Herbst | 2018-08-04 | 2 | -3/+31
    We already guarded all OP_SULDP against out-of-bounds accesses, but we ended up just reusing whatever value was stored in the dest registers.
    Fixes CTS test shader_image_load_store.incomplete_textures
    v2: fix for loads not ending up with predicates (bindless_texture)
    v3: fix replacing the def
    Cc: <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
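A C model of the behavioral change, not nouveau IR: the hypothetical `image_load_before()` mimics a predicated load that leaves the destination register untouched when the access is out of bounds, while `image_load_after()` writes an explicit 0. An incomplete texture is modeled, as an assumption, by a width of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image binding; width 0 models an incomplete texture. */
struct sketch_image {
    const uint32_t *texels;
    uint32_t width;
};

/* Before: the load is predicated off when out of bounds, so the
 * destination keeps whatever value the register already held. */
static uint32_t image_load_before(const struct sketch_image *img,
                                  uint32_t x, uint32_t dest)
{
    if (x < img->width)
        dest = img->texels[x];
    return dest;
}

/* After: the out-of-bounds path writes an explicit 0. */
static uint32_t image_load_after(const struct sketch_image *img,
                                 uint32_t x, uint32_t dest)
{
    (void)dest; /* the stale register value is no longer observable */
    return x < img->width ? img->texels[x] : 0;
}
```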
* gm200/ir: optimize rcp(sqrt) to rsq | Karol Herbst | 2018-08-04 | 1 | -1/+10
    Mitigates the shaders hurt by adding native sqrt:
    total instructions in shared programs : 5456166 -> 5454825 (-0.02%)
    total gprs used in shared programs    : 647522 -> 647551 (0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58288696 -> 58274448 (-0.02%)
                     local  shared     gpr    inst   bytes
            helped       0       0       0     516     516
              hurt       0       0      27       2       2
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
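The rewrite itself is a small peephole on the IR. The sketch below uses a toy single-operand instruction struct, which is an assumption and not nouveau's codegen classes, to show the match-and-replace step.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR: each instruction has one op and one operand, given as a
 * pointer to the instruction that defines it. Hypothetical model. */
enum op { OP_MOV, OP_SQRT, OP_RCP, OP_RSQ };

struct insn {
    enum op op;
    struct insn *src; /* defining instruction of the operand, or NULL */
};

/* Rewrite rcp(sqrt(x)) into rsq(x). Returns 1 if the pattern matched. */
static int fold_rcp_sqrt(struct insn *rcp)
{
    struct insn *sqrt_insn = rcp->src;

    if (rcp->op != OP_RCP || sqrt_insn == NULL || sqrt_insn->op != OP_SQRT)
        return 0;

    rcp->op = OP_RSQ;          /* one instruction instead of two */
    rcp->src = sqrt_insn->src; /* read x directly */
    return 1;
}
```

A production pass may also need to respect precision requirements before replacing `1/sqrt(x)` with a reciprocal square root; that check is not modeled here, and the now-unused sqrt is left for dead-code elimination.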
* gm200/ir: add native OP_SQRT support | Karol Herbst | 2018-08-04 | 4 | -2/+14
    ./GpuTest /test=pixmark_piano 1024x640 30sec: 301 -> 327 points
    shader-db:
    total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
    total gprs used in shared programs    : 647530 -> 647522 (-0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58459304 -> 58288696 (-0.29%)
                     local  shared     gpr    inst   bytes
            helped       0       0      27    8281    8281
              hurt       0       0      21     431     431
    v2: use NVISA_GM200_CHIPSET
    Reviewed-by: Ilia Mirkin <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* radeonsi: cosmetic changes | Marek Olšák | 2018-08-04 | 5 | -6/+5
* amd: remove support for LLVM 5.0 | Marek Olšák | 2018-08-03 | 3 | -36/+3
    Users are encouraged to switch to LLVM 6.0, released in March 2018.
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: add new R600_DEBUG test "testclearbufperf" | Darren Powell | 2018-08-02 | 8 | -11/+170
    Signed-off-by: Darren Powell <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* swr: Remove unnecessary memset call | Vlad Golovkin | 2018-08-02 | 1 | -1/+0
    Zeroing memory after calloc is not necessary. This also avoids a possible crash when allocation fails, because memset is called before checking screen for NULL.
    Fixes: a29d63ecf71546c4798c6 ("swr: refactor swr_create_screen to allow for proper cleanup on error")
    Reviewed-by: Eric Engestrom <[email protected]>
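A minimal sketch of the allocation pattern the commit moves to, using a hypothetical `struct sketch_screen` in place of the real swr screen: `calloc()` already returns zero-filled memory, so the only thing left to do before touching the object is the NULL check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical screen object standing in for the swr one. */
struct sketch_screen {
    int initialized;
    int refcount;
};

static struct sketch_screen *sketch_create_screen(void)
{
    /* calloc already zero-fills the allocation, so no memset is needed,
     * and the NULL check comes before any access to the object. */
    struct sketch_screen *screen = calloc(1, sizeof(*screen));
    if (screen == NULL)
        return NULL;

    screen->initialized = 1;
    return screen;
}
```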
* ac,radeonsi: reduce optimizations for complex compute shaders on older APUs (v2) | Marek Olšák | 2018-08-01 | 4 | -9/+43
    This makes dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 finish sooner on older CPUs; otherwise it gets killed and we fail the test.
    Acked-by: Dave Airlie <[email protected]>
* vc4: Fix automake linking error. | Juan A. Suarez Romero | 2018-08-01 | 1 | -0/+9
      CXXLD gallium_dri.la
      ../../../../src/gallium/drivers/vc4/.libs/libvc4.a(vc4_cl_dump.o): In function `vc4_dump_cl':
      src/gallium/drivers/vc4/vc4_cl_dump.c:45: undefined reference to `clif_dump_init'
      src/gallium/drivers/vc4/vc4_cl_dump.c:82: undefined reference to `clif_dump_destroy'
      ../../../../src/broadcom/cle/.libs/libbroadcom_cle.a(cle_libbroadcom_cle_la-v3d_decoder.o): In function `v3d_field_iterator_next':
      src/broadcom/cle/v3d_decoder.c:902: undefined reference to `clif_lookup_bo'
    Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107423
    CC: Eric Anholt <[email protected]>
    Acked-by: Eric Anholt <[email protected]>
    Reviewed-by: Andres Gomez <[email protected]>
* python: Use the unicode_escape codec | Mathieu Bridon | 2018-08-01 | 1 | -1/+1
    Python 2 had string_escape and unicode_escape codecs. Python 3 only has the latter. These work the same as far as we're concerned, so let's use the future-proof one.
    However, the rest of the code expects unicode strings, so we need to decode them again.
    Signed-off-by: Mathieu Bridon <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
* virgl: enable FBFETCH if virglrenderer supports it | Erik Faye-Lund | 2018-08-01 | 2 | -1/+3
    This changes the following dEQP-GLES31 cases from NotSupported to Pass for me:
      - dEQP-GLES31.functional.blend_equation_advanced.state_query.*
      - dEQP-GLES31.functional.blend_equation_advanced.basic.*
      - dEQP-GLES31.functional.blend_equation_advanced.srgb.*
      - dEQP-GLES31.functional.blend_equation_advanced.msaa.*
      - dEQP-GLES31.functional.blend_equation_advanced.barrier.*
      - dEQP-GLES31.functional.draw_buffers_indexed.overwrite_*advanced_blend_eq*
      - dEQP-GLES31.functional.state_query.indexed.blend_equation_advanced_*
      - dEQP-GLES31.functional.debug.negative_coverage.*.advanced_blend.*
    Signed-off-by: Erik Faye-Lund <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* virgl: add texture_barrier stub | Erik Faye-Lund | 2018-08-01 | 1 | -0/+7
    In gallium, supporting FBFETCH means supporting non-coherent fetches, but in virglrenderer, for technical reasons, this is backed by coherent fetches instead. This means we don't need to do anything for the barriers.
    However, if we don't have a texture_barrier implementation, we get crashes because the non-coherent extension is exposed. So, let's leave this as a NOP for now.
    [airlied: I've got a more complete impl of this somewhere, once we land the host side].
    Reviewed-by: Dave Airlie <[email protected]>
    Signed-off-by: Erik Faye-Lund <[email protected]>
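A sketch of the stub approach, assuming a simplified context struct with a `texture_barrier` function pointer; the struct and its parameter list are stand-ins, not the exact gallium `pipe_context` definition. Installing an empty function means callers that invoke the hook unconditionally no longer dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified context vtable (not the real pipe_context). */
struct sketch_context {
    void (*texture_barrier)(struct sketch_context *ctx, unsigned flags);
};

/* The host backs framebuffer fetch with coherent fetches, so there is
 * nothing to flush; the stub only exists so the hook is never NULL. */
static void sketch_texture_barrier(struct sketch_context *ctx,
                                   unsigned flags)
{
    (void)ctx;
    (void)flags;
}

static void sketch_init_context(struct sketch_context *ctx)
{
    ctx->texture_barrier = sketch_texture_barrier;
}
```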
* virgl: enable robustness if the host exposes it | Dave Airlie | 2018-08-01 | 2 | -1/+3
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: Support ARB_framebuffer_no_attachments | Dave Airlie | 2018-08-01 | 4 | -1/+23
    This uses new protocol to send the default sizes to the host.
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: add initial ARB_compute_shader support | Dave Airlie | 2018-08-01 | 7 | -7/+153
    This hooks up compute shader creation and launch grid support.
    Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi: report supported EQAA combinations from is_format_supported | Marek Olšák | 2018-07-31 | 1 | -16/+20
    Framebuffer without attachments now supports 16 samples.
    Tested-by: Dieter Nützel <[email protected]>
* radeonsi: use storage_samples instead of color_samples in most places | Marek Olšák | 2018-07-31 | 6 | -41/+26
    and use pipe_resource::nr_storage_samples instead of r600_texture::num_color_samples.
    Tested-by: Dieter Nützel <[email protected]>
* gallium: add storage_sample_count parameter into is_format_supported | Marek Olšák | 2018-07-31 | 31 | -6/+101
    Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTS | Marek Olšák | 2018-07-31 | 16 | -0/+17
    Tested-by: Dieter Nützel <[email protected]>
* virgl: also mark sampler views as dirty | Gurchetan Singh | 2018-08-01 | 1 | -1/+2
    When texture buffers are used as images in compute shaders, the guest never sees the modified data since the TBO is always marked as clean.
    Fixes most dEQP-GLES31.functional.image_load_store.buffer.* tests.
    Example test cases:
      dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
      dEQP-GLES31.functional.image_load_store.buffer.qualifiers.coherent_r32f
      dEQP-GLES31.functional.image_load_store.buffer.format_reinterpret.rgba8_rgba8ui
    Note: a virglrenderer-side patch is also needed to bind TBOs correctly.
    Reviewed-by: Dave Airlie <[email protected]>
* virgl: add memory barrier support | Dave Airlie | 2018-08-01 | 5 | -0/+29
    Reviewed-by: Gert Wollny <[email protected]>
* virgl: add TXQS support | Dave Airlie | 2018-08-01 | 2 | -1/+3
    Reviewed-by: Gert Wollny <[email protected]>
* virgl: add initial images support (v2) | Dave Airlie | 2018-08-01 | 8 | -0/+105
    v2: add max image samples support
    Reviewed-by: Gert Wollny <[email protected]>
* etnaviv: fix typo in query names | Christian Gmeiner | 2018-07-31 | 1 | -2/+2
    Fixes: d0bed0b4944d ("etnaviv: support HI performance counters")
    Cc: [email protected]
    Signed-off-by: Christian Gmeiner <[email protected]>
    Reviewed-by: Chris Healy <[email protected]>
* v3d: Include commands to run the BCL and RCL in CLIF dumps. | Eric Anholt | 2018-07-30 | 1 | -10/+1