Commit message | Author | Age | Files | Lines
* panfrost/meson: Remove subdir for nondrm | Alyssa Rosenzweig | 2019-02-25 | 1 | -1/+0
  This change fixes cross builds with the (temporary) non-DRM overlay.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Use tiler fast path (performance boost) | Alyssa Rosenzweig | 2019-02-25 | 1 | -4/+38
  For reasons that are still unclear (speculation included in the comment added in this patch), the tiler? metadata has a fast path that we were not enabling; there looks to be a possible time/memory tradeoff, but the details remain unclear.

  Regardless, this patch improves performance dramatically. Particular wins are for geometry-heavy scenes. For instance, glmark2-es2's Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious when vsync is disabled, as in glmark2-es2-wayland.

  With this patch, on GLES 2.0 samples not involving FBOs, it appears performance is converging with (and sometimes surpassing) the blob.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* nir/builder: Don't emit no-op swizzles | Jason Ekstrand | 2019-02-24 | 1 | -1/+9
  The nir_swizzle helper is used some on its own, but it's also called by nir_channel and nir_channels, which are used everywhere. It's pretty quick to check while we're walking the swizzle anyway whether or not it's an identity swizzle. If it is, we now don't bother emitting the instruction. Sure, copy-prop will clean it up for us, but there's no sense making more work for the optimizer than we have to.

  Reviewed-by: Ian Romanick <[email protected]>
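The identity check described in this commit can be sketched in a few lines of Python. This is only an illustration under assumptions: the `swizzle` function, the `(name, num_components)` value representation, and the `instrs` list are hypothetical stand-ins, not the NIR builder API.

```python
def swizzle(instrs, src, swiz):
    """Return src reordered by swiz, emitting a move only when one is needed.

    src is a hypothetical (name, num_components) pair and instrs is the
    list of instructions emitted so far.
    """
    name, num_components = src
    # The swizzle is an identity if every channel selects itself.
    is_identity = all(chan == i for i, chan in enumerate(swiz))
    if is_identity and len(swiz) == num_components:
        return src  # no-op swizzle: nothing to emit
    suffix = "".join("xyzw"[chan] for chan in swiz)
    new_name = name + "." + suffix
    instrs.append(("mov", new_name, name, tuple(swiz)))
    return (new_name, len(swiz))


instrs = []
vec4 = ("ssa_1", 4)
same = swizzle(instrs, vec4, [0, 1, 2, 3])   # identity: returns vec4 itself
xy = swizzle(instrs, vec4, [0, 1])           # narrower than the source: still emitted
wzyx = swizzle(instrs, vec4, [3, 2, 1, 0])   # reordering: emitted
```

Note that a swizzle selecting only a leading subset of channels is not treated as a no-op in this sketch, because the result has fewer components than the source.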
* nir/split_vars: Don't compact vectors unnecessarily | Jason Ekstrand | 2019-02-24 | 1 | -0/+6
  Reviewed-by: Alejandro Piñeiro <[email protected]>
* st/mesa: remove unused header-file | Erik Faye-Lund | 2019-02-24 | 3 | -43/+0
  This header has been unused since f8f2520e88c ("st/mesa: Remove unnecessary headers"). And in the more than 8 years since, this hasn't been useful. So let's just get rid of it.

  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* configure: fix test portability | Maya Rashish | 2019-02-24 | 1 | -2/+2
  From the bash manual:

    string1 == string2
    string1 = string2
      True if the strings are equal. = should be used with the test command for POSIX conformance.
* meson: ensure that xmlpool_options.h is generated for gallium targets that need it | David Shao | 2019-02-24 | 5 | -5/+5
  Fixes: 68076b87474e7959c161 "meson: build gallium vdpau state tracker"
  Fixes: 22a817af8a89eb3c762f "meson: build gallium xvmc state tracker"
  Fixes: 5a785d51a6d68ec676ce "meson: build gallium va state tracker"
  Fixes: 0ba909f0f111824223bc "meson: build gallium xa state tracker"
  Fixes: 1d36dc674d528b93bec3 "meson: build gallium omx state tracker"
  Reviewed-by: Eric Engestrom <[email protected]>
* vulkan/overlay: Add fps counter | Matthias Lorenz | 2019-02-24 | 1 | -0/+17
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109747
* Revert "anv: add support for INTEL_DEBUG=bat" | Lionel Landwerlin | 2019-02-24 | 1 | -49/+0
  This reverts commit e4d88396d259c4ec6032d2834d1c9073d55e9b45.

  Apologies, I pushed the wrong commit.
* anv: add support for INTEL_DEBUG=bat | Lionel Landwerlin | 2019-02-23 | 1 | -0/+49
  As requested by Ken ;)

  Signed-off-by: Lionel Landwerlin <[email protected]>
* etnaviv: blt: mark used src resource as read from | Christian Gmeiner | 2019-02-23 | 1 | -0/+2
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Lucas Stach <[email protected]>
  Reviewed-by: Boris Brezillon <[email protected]>
* etnaviv: rs: mark used src resource as read from | Christian Gmeiner | 2019-02-23 | 1 | -0/+1
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Lucas Stach <[email protected]>
  Reviewed-by: Boris Brezillon <[email protected]>
* gallium/auxiliary/vl: Fix duplicate symbol build errors. | Vinson Lee | 2019-02-22 | 2 | -6/+6
  CXXLD gallium_dri.la
  duplicate symbol _compute_shader_video_buffer in:
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
  duplicate symbol _compute_shader_weave in:
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
  duplicate symbol _compute_shader_rgba in:
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
      ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)

  Fixes: 9364d66cb7f7 ("gallium/auxiliary/vl: Add video compositor compute shader render")
  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: James Zhu <[email protected]>
* nir: fix MSVC build | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+1
  Zero initialize struct with {0} instead of {}.
* nir/copy_prop_vars: add tests for load/store elements of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -0/+139
  Test using array deref on vectors in loads and stores. These are marked DISABLED_ as this optimization is currently not done.

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: nir_build_deref_follower accept array derefs of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+3
  Code itself already supports it, just make sure we can use it for those cases.

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: change test helper to get intrinsics | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -83/+56
  Replace find_next_intrinsic(intrinsic, after) with get_intrinsic(intrinsic, index). This makes it slightly more convenient to check the resulting loads/stores/copies, since in most tests we know which one we care about. The cost is to perform more traversals, but for such tests this is not a problem.

  Added the ASSERT_EQ() on count to some tests missing it, so the indices queried are always expected to find something.

  Also, drop two nir_print_shader leftover calls in a test.

  v2: Remove redundant assertions. nir_src_comp_as_uint already asserts what we need. (Jason)

  Reviewed-by: Jason Ekstrand <[email protected]>
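The index-based lookup can be illustrated with a small Python sketch. It assumes a shader is just a flat list of hypothetical `(intrinsic_name, payload)` tuples; the real helper lives in the C++ test fixture and walks NIR instructions, so only the lookup pattern is meant to carry over.

```python
def count_intrinsics(instrs, intrinsic):
    """Count how many instructions in the list use the given intrinsic."""
    return sum(1 for name, _payload in instrs if name == intrinsic)


def get_intrinsic(instrs, intrinsic, index):
    """Return the index-th instruction using the intrinsic, or None.

    Each call re-traverses the whole list, which is the extra cost the
    commit message accepts for test code.
    """
    seen = 0
    for instr in instrs:
        if instr[0] == intrinsic:
            if seen == index:
                return instr
            seen += 1
    return None


shader = [
    ("load_deref", "in0"),
    ("store_deref", "out0"),
    ("load_deref", "in1"),
    ("store_deref", "out1"),
]
# Check the count first so that every index queried afterwards must exist.
num_stores = count_intrinsics(shader, "store_deref")
second_store = get_intrinsic(shader, "store_deref", 1)
```

Asserting on the count before indexing mirrors the ASSERT_EQ() the commit adds, so a missing instruction shows up as a clear count mismatch instead of a null lookup.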
* nir/copy_prop_vars: keep track of components in copy_entry | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -33/+48
  When a copy_entry is SSA, store not only the nir_ssa_def* for each component, but also the source component they come from. At the moment this is always a match (i.e. 'component[i] == i'), because all the operations for a copy_entry happen using definitions with the same size. This prepares the code for array_derefs of vectors, in which 'component[i] != i'.

  Also, extract setting all SSA components into a function of its own.

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: add debug helpers | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+87
  Disabled by default, to be used during development. Adding those so I don't rewrite some ad-hoc version of them every time I'm working with this pass.

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/copy_prop_vars: don't get confused by array_deref of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -0/+28
  For now these derefs are not handled, so don't let these get into the copies list -- which would cause wrong propagations.

  For load_derefs, do nothing.
  For store_derefs, invalidate whatever the store is writing to.
  For copy_derefs, invalidate whatever the copy is writing to.

  These cases will happen once derefs to SSBOs/UBOs are kept around long enough to get optimized by copy_prop_vars.

  Reviewed-by: Jason Ekstrand <[email protected]>
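The three rules above can be modelled with a toy Python sketch. Everything here is an assumption made for illustration: the `deref` dictionaries, the `copies` mapping from variable name to known value, and the `process` function are hypothetical and far simpler than the data structures in the actual pass.

```python
def deref(var, array_of_vector=False):
    """Hypothetical deref: a variable name plus an 'unhandled' marker."""
    return {"var": var, "array_of_vector": array_of_vector}


def process(copies, kind, dst=None, src=None, value=None):
    """Update the known-values table for one load/store/copy instruction."""
    if kind == "load":
        if src["array_of_vector"]:
            return None                      # unhandled deref: do nothing
        return copies.get(src["var"])        # propagate a known value, if any
    if kind == "store" and not dst["array_of_vector"]:
        copies[dst["var"]] = value           # handled store: remember the value
        return None
    handled_copy = (kind == "copy"
                    and not dst["array_of_vector"]
                    and not src["array_of_vector"]
                    and src["var"] in copies)
    if handled_copy:
        copies[dst["var"]] = copies[src["var"]]
        return None
    # Unhandled store or copy: never record it, just forget the destination.
    copies.pop(dst["var"], None)
    return None


copies = {}
process(copies, "store", dst=deref("v"), value="ssa_1")
known = process(copies, "load", src=deref("v"))
skipped = process(copies, "load", src=deref("v", array_of_vector=True))
still_known = "v" in copies
process(copies, "store", dst=deref("v", array_of_vector=True), value="ssa_2")
```

After the last store, nothing is known about `v` any more, so a later load cannot be replaced with the stale `ssa_1`.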
* nir: allow nir_lower_phis_to_scalar() on more src types | Timothy Arceri | 2019-02-23 | 1 | -3/+13
  Rather than only lowering if all srcs are scalarizable we instead check that at least one src is scalarizable.

  We change undef type to return false otherwise it will cause regressions when it is the only scalarizable src.

  total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
  instructions in affected programs: 1153797 -> 959239 (-16.86%)
  helped: 581
  HURT: 74

  total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
  cycles in affected programs: 129809402 -> 120648352 (-7.06%)
  helped: 571
  HURT: 131

  total spills in shared programs: 57947 -> 29130 (-49.73%)
  spills in affected programs: 53364 -> 24547 (-54.00%)
  helped: 351
  HURT: 0

  total fills in shared programs: 51310 -> 25468 (-50.36%)
  fills in affected programs: 44882 -> 19040 (-57.58%)
  helped: 351
  HURT: 0

  Reviewed-by: Jason Ekstrand <[email protected]>
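The change from "all sources" to "at least one source" can be sketched as a pair of predicates. This is a simplified model under assumptions: the source kinds, the `SCALARIZABLE_KINDS` set, and the function names are hypothetical, and the real pass inspects NIR instructions instead of strings.

```python
# Hypothetical set of source kinds that are cheap to split per component.
SCALARIZABLE_KINDS = {"alu", "load_const", "load_input"}


def is_src_scalarizable(kind, undef_counts):
    """Return True if a phi source of this kind can be scalarized."""
    if kind == "undef":
        return undef_counts
    return kind in SCALARIZABLE_KINDS


def should_lower_phi_old(src_kinds):
    """Old rule: every source must be scalarizable (undef counted as one)."""
    return all(is_src_scalarizable(k, undef_counts=True) for k in src_kinds)


def should_lower_phi_new(src_kinds):
    """New rule: one scalarizable source is enough, but undef no longer counts."""
    return any(is_src_scalarizable(k, undef_counts=False) for k in src_kinds)


mixed = ["alu", "phi"]          # one scalarizable source, one not
only_undef = ["undef", "phi"]   # undef would be the only scalarizable source
```

With these inputs the old rule rejects `mixed` while the new rule accepts it, and the new rule rejects `only_undef`, which is the case the commit message says would otherwise regress.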
* swr/rast: bypass size limit for non-sampled textures | Alok Hota | 2019-02-22 | 1 | -1/+3
  This fixes a bug where SWR will fail to render in cases with large buffer allocations, e.g. very large meshes whose vertex buffers exceed 2GB.

  CC: <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* tgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics | Marek Olšák | 2019-02-22 | 1 | -1/+3
  This might have decreased performance for radeonsi/tgsi, because most shaders claimed they used bindless.

  Cc: 18.3 19.0 <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* iris: Add gitlab-ci build testing | Jordan Justen | 2019-02-22 | 1 | -1/+1
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* freedreno/a6xx: cube image fix | Rob Clark | 2019-02-22 | 1 | -0/+4
  Note that emit_intrinsic_load_image() already swaps a .3d flag with an .a flag. I tried doing things the other way around (going back to .3d) but that didn't work. And treating cube images as 2d array is also what blob does, so let's just go with that.

  Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.*

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix border-color offset | Rob Clark | 2019-02-22 | 2 | -3/+3
  Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when run after a test that binds textures used in a vertex shader.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't hardcode wrmask | Rob Clark | 2019-02-22 | 1 | -5/+6
  Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow and a few other similar tests that do multiple texture fetches into individual components of a packet output. Mostly works around the issue mentioned in ra_block_find_definers().

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix race condition | Rob Clark | 2019-02-22 | 3 | -5/+16
  rsc->write_batch can be cleared behind our back, so we need to acquire the lock *before* deref'ing.

  Signed-off-by: Rob Clark <[email protected]>
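The "lock before dereferencing" pattern can be shown with a short Python sketch. All names here (`Batch`, `Resource`, `screen_lock`, `flush_writer`) are hypothetical, and Python's garbage collector stands in for the explicit reference a C driver would have to take while holding the lock.

```python
import threading


class Batch:
    """Hypothetical batch that just counts how often it was flushed."""

    def __init__(self):
        self.flushed = 0

    def flush(self):
        self.flushed += 1


class Resource:
    """Hypothetical resource whose write_batch may be cleared by another thread."""

    def __init__(self, batch=None):
        self.write_batch = batch


screen_lock = threading.Lock()


def flush_writer(rsc):
    """Flush the resource's pending writer, if it still has one."""
    # Read the shared field only while holding the lock, and keep a local
    # reference so the batch cannot disappear between the check and the use.
    with screen_lock:
        batch = rsc.write_batch
    if batch is None:
        return False
    batch.flush()
    return True


def clear_writer(rsc):
    """What another thread might do concurrently."""
    with screen_lock:
        rsc.write_batch = None


batch = Batch()
rsc = Resource(batch)
first = flush_writer(rsc)    # writer present: flushed
clear_writer(rsc)
second = flush_writer(rsc)   # writer cleared: nothing to do
```

The racy variant would test `rsc.write_batch` and then read the field again to call `flush()`, leaving a window in which another thread can clear it.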
* vulkan: Fix 32-bit build for the new overlay layer | Kenneth Graunke | 2019-02-22 | 1 | -3/+3
  vulkan_core.h defines non-dispatchable handles as (struct object *) on 64-bit systems, but uint64_t on 32-bit systems. The former can be implicitly cast to void *, but the latter requires an explicit cast.

  While here, %lu is the wrong format specifier for uint64_t on 32-bit systems, so use PRIu64, fixing a warning.

  Reported-by: Mike Lothian <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: advertise 8 subpixel precision bits | Juan A. Suarez Romero | 2019-02-22 | 2 | -1/+6
  On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is used to select between 8 bit subpixel precision (value 0) or 4 bit subpixel precision (value 1). As this value is not set, it takes the value 0, so 8 bits are used.

  On the other side, the Vulkan CTS reference rasterizer, which is used to compute the expected values for the tests, uses 8 bit precision. If it is changed to use 4 bit, as ANV was advertising so far, some of the tests fail.

  So it seems ANV is actually using 8 bits.

  v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)
  v3: use _8Bit definition as value (Jason)
  v4: (by Jason)
      anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect
      This field was added on gen8 even though there's an identically defined one in 3DSTATE_SF.

  CC: Jason Ekstrand <[email protected]>
  CC: Kenneth Graunke <[email protected]>
  CC: 18.3 19.0 <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: add missing field values for 3DSTATE_SF | Juan A. Suarez Romero | 2019-02-22 | 6 | -6/+24
  Fill out "Vertex Sub Pixel Precision Select" possible values.

  CC: 18.3 19.0 <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* radv: Allow interpolation on non-float types. | Bas Nieuwenhuizen | 2019-02-22 | 1 | -10/+9
  In particular structs containing floats and 16-bit floating point types.

  Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
  Fixes: da295946361 "spirv: Only split blocks"
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Fix float16 interpolation set up. | Bas Nieuwenhuizen | 2019-02-22 | 6 | -16/+94
  float16 types can have non-flat interpolation so set up the HW correctly for that.

  Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50: disable compute | Ilia Mirkin | 2019-02-22 | 1 | -1/+1
  It causes more trouble than it's worth. Now vl tries to create compute shaders without all the proper checking. Since there's really no (current) way to use compute on nv50, just mark it disabled.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742
  Fixes: f6ac0b5d71 ("gallium/auxiliary/vl: Add compute shader to support video compositor render")
  Signed-off-by: Ilia Mirkin <[email protected]>
* intel: fix urb size for CFL GT1 | Lionel Landwerlin | 2019-02-22 | 1 | -0/+1
  Same 192KB amount as SKL/KBL GT1 applies.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Fixes: de7ed0ba5522 ("i965/CFL: Add PCI Ids for Coffee Lake.")
* isl: the display engine requires 64B alignment for linear surfaces | Samuel Iglesias Gonsálvez | 2019-02-22 | 1 | -0/+8
  v2: Add PRM quote (Lionel)

  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
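As a worked example of a 64B alignment requirement, the Python sketch below rounds a linear surface's row pitch up to a multiple of 64 bytes. It assumes the requirement applies to the row pitch, which the commit title does not spell out, and the function names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and (alignment & (alignment - 1)) == 0
    return (value + alignment - 1) & ~(alignment - 1)


def linear_row_pitch(width_px, bytes_per_px, for_display):
    """Row pitch in bytes of a linear surface, padded to 64B when scanned out."""
    pitch = width_px * bytes_per_px
    return align_up(pitch, 64) if for_display else pitch


# A 1366-pixel-wide RGBA8 row is 5464 bytes, which is not a multiple of 64.
padded = linear_row_pitch(1366, 4, for_display=True)
unpadded = linear_row_pitch(1366, 4, for_display=False)
```

Widths whose byte size is already a multiple of 64, such as 1920 pixels at 4 bytes each, are left unchanged.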
* virgl: Enable mixed color FBO attachments only when the host supports it | Gert Wollny | 2019-02-22 | 2 | -1/+2
  Signed-off-by: Gert Wollny <[email protected]>
  Reviewed-by: Elie Tournier <[email protected]>
* android: intel/isl: remove redundant building rules | Mauro Rossi | 2019-02-22 | 1 | -13/+0
  Fixes the following build error:

  including ./external/mesa/Android.mk ...
  build/core/base_rules.mk:183: *** external/mesa/src/intel: MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
  make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1

  ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c, and that source file includes the isl_tiled_memcpy.c source.

  Fixes: 96bb328 ("iris: add Android build")
  Signed-off-by: Mauro Rossi <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* Revert "iris: Enable auxiliary buffer support" | Kenneth Graunke | 2019-02-21 | 1 | -0/+3
  This reverts commit cd0ced49e7957182d23e21657445b720184ea425.

  It breaks glxgears rendering.
* iris: Enable -msse2 and -mstackrealign | Kenneth Graunke | 2019-02-21 | 1 | -3/+3
  This is needed for gen_clflush.h intrinsics to work on 32-bit builds. i965 and anv both set these, and iris needs to as well.

  Tested-by: Mark Janes <[email protected]>
* intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply. | Francisco Jerez | 2019-02-21 | 1 | -3/+11
  Even though the hardware spec claims that any "integer DWord multiply" operation is affected by the regioning restrictions of CHV/BXT/GLK, this is inconsistent with the behavior of the simulator and with empirical evidence -- Return false from has_dst_aligned_region_restriction() for such instructions as a micro-optimization.

  Tested-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Implement extended strides greater than 4 for IR source regions. | Francisco Jerez | 2019-02-21 | 1 | -3/+10
  Strides up to 32B can be implemented for the source regions of most instructions by leveraging either the vertical or the horizontal stride of the hardware Align1 region. The main motivation for this is that currently the lower_integer_multiplication() pass will happily double the stride of one of the 32-bit sources, which can blow up if the stride of the original source was already the maximum value allowed by the hardware.

  An alternative would be to use the regioning legalization pass in order to lower such strides into the composition of multiple legal strides, but that would be somewhat less efficient.

  This showed up as a regression from my commit cbea91eb57a501bebb1ca2 in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a pre-existing problem that had affected conformance on other platforms without native support for integer multiplication. CHV/BXT were getting around it because the code I removed in that commit had the "fortunate" side effect of emitting narrower regions that didn't hit the hardware stride limit after lowering. Beyond fixing the regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on ICL (that's why this patch is marked for inclusion in mesa-stable even though the original regressing patch was not).

  According to Jason, a nearly equivalent change had been committed previously as e8c9e65185de3e821e1 and then (mistakenly?) reverted as a31d0382084c8aa8.

  Cc: [email protected]
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
  Reported-by: Mark Janes <[email protected]>
  Tested-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Cap dst-aligned region stride to maximum representable hstride value. | Francisco Jerez | 2019-02-21 | 1 | -5/+23
  This is required in combination with the following commit, because otherwise if a source region with an extended 8+ stride is present in the instruction (which we're about to declare legal) we'll end up emitting code that attempts to write to such a region, even though strides greater than four are still illegal for the destination.

  Tested-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Lower integer multiply correctly when destination stride equals 4. | Francisco Jerez | 2019-02-21 | 1 | -4/+8
  Because the "low" temporary needs to be accessed with word type and twice the original stride, attempting to preserve the alignment of the original destination can potentially lead to instructions with illegal destination stride greater than four. Because the CHV/BXT alignment restrictions are now being enforced by the regioning lowering pass run after lower_integer_multiplication(), there is no real need to preserve the original strides anymore.

  Note that this bug can be reproduced on stable branches, but back-porting would be non-trivial, because the fix relies on the regioning lowering pass recently introduced.

  Tested-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Exclude control sources from execution type and region alignment calculations. | Francisco Jerez | 2019-02-21 | 3 | -4/+68
  Currently the execution type calculation will return a bogus value in cases like:

    mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

  Which will be considered to have a 32-bit integer execution type even though the actual indirect move operation will be carried out with 16-bit precision.

  Similarly there's no need to apply the CHV/BXT double-precision region alignment restrictions to such control sources, since they aren't directly involved in the double-precision arithmetic operations emitted by these virtual instructions. Applying the CHV/BXT restrictions to control sources was expected to be harmless if mildly inefficient, but unfortunately it exposed problems at codegen level for virtual instructions (namely the SHUFFLE instruction used for the Vulkan 1.1 subgroup feature) that weren't prepared to accept control sources with an arbitrary strided region.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
  Reported-by: Mark Janes <[email protected]>
  Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
  Tested-by: Anuj Phogat <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: clone instruction set rather than removing individual entries | Timothy Arceri | 2019-02-22 | 1 | -3/+3
  This reduces the time spent in nir_opt_cse() by almost a half.

  The massif tool from valgrind reported no change in peak memory use with the large dolphin uber shaders I used for testing.

  Reviewed-by: Thomas Helland <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
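A rough Python model of the idea in the title: each block works on its own clone of the set of available expressions, so nothing has to be removed one entry at a time when the walk leaves the block. The block dictionaries, the string "expressions", and `cse_block` as written here are hypothetical simplifications, not the NIR implementation.

```python
def cse_block(block, dominating):
    """Remove redundant expressions in block and its dominance-tree children.

    dominating holds the expressions already computed by dominating blocks.
    Returns the number of instructions removed.
    """
    # Work on a private copy; the caller's set is never modified, so there
    # is nothing to undo entry by entry on the way back up.
    available = set(dominating)
    kept, removed = [], 0
    for expr in block["instrs"]:
        if expr in available:
            removed += 1            # already computed by a dominator
        else:
            available.add(expr)
            kept.append(expr)
    block["instrs"] = kept
    for child in block["children"]:
        removed += cse_block(child, available)
    return removed


left = {"instrs": ["a+b", "e-f"], "children": []}
right = {"instrs": ["e-f"], "children": []}
root = {"instrs": ["a+b", "c*d", "a+b"], "children": [left, right]}
total_removed = cse_block(root, set())
```

The `e-f` in `right` survives because `left` is a sibling, not a dominator, and its private set is discarded when the walk returns from it.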
* genxml: Remove extra space in gen4/45/5 field name | Jordan Justen | 2019-02-21 | 3 | -15/+15
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* genxml/gen_bits_header.py: Use regex to strip no alphanum chars | Jordan Justen | 2019-02-21 | 1 | -26/+4
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
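Stripping non-alphanumeric characters with a single regular expression might look like the Python sketch below. The pattern and the `to_alphanum` name are assumptions chosen for illustration; the character class the script actually uses is not shown in this log.

```python
import re

# Assumed pattern: anything that is not an ASCII letter or digit.
NON_ALPHANUM = re.compile(r"[^A-Za-z0-9]+")


def to_alphanum(name):
    """Drop every run of non-alphanumeric characters from a field name."""
    return NON_ALPHANUM.sub("", name)


field = to_alphanum("Vertex Sub Pixel Precision Select")
```

A single compiled pattern replaces what would otherwise be a chain of `str.replace()` calls, one per unwanted character.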
* iris: Enable auxiliary buffer support | Kenneth Graunke | 2019-02-21 | 1 | -3/+0
  This currently regresses KHR-GL4x.compute_shader.resource-texture, but that's a pre-existing bug (https://bugs.freedesktop.org/109113) which should be fixed up once we have fast clear support.
* iris: Flag ALL_DIRTY_BINDINGS on aux state change. | Rafael Antognolli | 2019-02-21 | 3 | -21/+29
  If we change the aux state for a given resource, we need to re-emit the binding table pointers for any stage that has such a resource bound. Since we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.