Commit message | Author | Age | Files | Lines
* nir: Add ir3-specific version of most SSBO intrinsics | Eduardo Lima Mitev | 2019-03-13 | 1 | -0/+27

  These are ir3-specific versions of SSBO intrinsics that add an extra source to hold the element offset (in dwords), which is what the backend instructions need. The original byte-offset source provided by NIR is not replaced because the backend still needs it on a4xx and a5xx.

  Reviewed-by: Rob Clark <[email protected]>
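A minimal sketch of the relationship between the two sources, assuming 32-bit elements and dword-aligned accesses: the element offset is the byte offset divided by four, and both values are kept side by side. The struct and function names are hypothetical, not part of NIR or ir3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of offset sources for one SSBO access. */
struct ssbo_offsets {
   uint32_t byte_offset;  /* original NIR source, still used on a4xx/a5xx */
   uint32_t dword_offset; /* extra source: element offset in 32-bit units */
};

/* Derive both sources from a byte offset, assuming 32-bit elements and a
 * dword-aligned access. */
static struct ssbo_offsets
ssbo_offsets_from_bytes(uint32_t byte_offset)
{
   assert((byte_offset & 3u) == 0);
   struct ssbo_offsets o = { byte_offset, byte_offset >> 2 };
   return o;
}
```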
* docs: update calendar, add news item, and link release notes for 19.0.0 | Dylan Baker | 2019-03-13 | 3 | -10/+30
* docs: Add SHA256 sums for 19.0.0 | Dylan Baker | 2019-03-13 | 1 | -1/+2
* docs: Add release notes for 19.0.0 | Dylan Baker | 2019-03-13 | 1 | -2/+2402
* egl/dri: Avoid out of bounds array access | Kevin Strasser | 2019-03-13 | 1 | -2/+4

  indexConfigAttrib iterates over every index in the DRI driver, possibly exceeding __DRI_ATTRIB_MAX. In other words, if the DRI driver has newer attributes, libEGL will end up reading from uninitialized memory through dri2_to_egl_attribute_map[].

  Signed-off-by: Kevin Strasser <[email protected]>
  Cc: [email protected]
  Reviewed-by: Emil Velikov <[email protected]>
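The fix amounts to a bounds check before indexing the translation table. In this sketch, `ATTRIB_MAX` and `attrib_map` are hypothetical stand-ins for `__DRI_ATTRIB_MAX` and `dri2_to_egl_attribute_map[]`, and returning 0 for unknown attributes is an assumed convention:

```c
#include <assert.h>

/* Hypothetical stand-ins for __DRI_ATTRIB_MAX and dri2_to_egl_attribute_map[]. */
#define ATTRIB_MAX 4
static const int attrib_map[ATTRIB_MAX] = { 10, 11, 12, 13 };

/* Return the mapped EGL attribute, or 0 when the driver reports an
 * attribute index newer than the table knows about. */
static int
map_attrib(unsigned index)
{
   if (index >= ATTRIB_MAX)
      return 0; /* skip instead of reading past the end of the table */
   return attrib_map[index];
}
```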
* iris: Use streaming loads to read from tiled surfaces | Chris Wilson | 2019-03-13 | 2 | -2/+5

  Always use the streaming load for reading back from the tiled surface when mapping the resource (since we know we have Broadwell+, all of our target CPUs support SSE4.1). This means we hit the fast WC handling paths on Atoms (without LLC), and for big Core (with LLC) using the streaming load is no less efficient, as we do not require the tiled buffer to be pulled into the CPU cache.

  Reviewed-by: Kenneth Graunke <[email protected]>
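As a rough illustration of a streaming-load readback, the sketch below copies 16-byte chunks with the SSE4.1 `_mm_stream_load_si128` intrinsic (MOVNTDQA). It is not iris's copy routine: the helper names, the alignment requirements, and the runtime CPU check with a `memcpy` fallback are assumptions made so the example stands alone. MOVNTDQA pays off on write-combining mappings; on ordinary write-back memory it may behave like a normal aligned load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_STREAM_LOAD 1

/* Copy using MOVNTDQA streaming loads; only this function is compiled for SSE4.1. */
__attribute__((target("sse4.1")))
static void
copy_streaming_sse41(void *dst, const void *src, size_t len)
{
   for (size_t off = 0; off < len; off += 16) {
      __m128i v = _mm_stream_load_si128((__m128i *)((const char *)src + off));
      _mm_storeu_si128((__m128i *)((char *)dst + off), v);
   }
}
#endif

/* Hypothetical readback helper: src must be 16-byte aligned and len a
 * multiple of 16. */
static void
copy_from_wc(void *dst, const void *src, size_t len)
{
   assert(((uintptr_t)src & 15u) == 0 && (len & 15u) == 0);
#ifdef HAVE_X86_STREAM_LOAD
   if (__builtin_cpu_supports("sse4.1")) {
      copy_streaming_sse41(dst, src, len);
      return;
   }
#endif
   memcpy(dst, src, len); /* fallback for CPUs or compilers without SSE4.1 */
}
```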
* iris: Use coherent allocation for PIPE_RESOURCE_STAGING | Chris Wilson | 2019-03-13 | 3 | -1/+24

  On !llc machines (Atoms), reading from a linear buffer is slow, and so copying from one resource into the linear staging buffer is still slow. However, we can tell the GPU to snoop the CPU cache when reading from and writing to the staging buffer, eliminating the slow uncached reads.

  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Use PIPE_BUFFER_STAGING for the query objects | Chris Wilson | 2019-03-13 | 1 | -1/+1

  We prefer fast CPU access to read back the query results.

  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Combine store_derefs to improve code from SPIR-V | Caio Marcelo de Oliveira Filho | 2019-03-13 | 1 | -0/+1

  Due to the lack of a write mask in SPIR-V stores, generators may produce multiple stores to the same vector but using different array derefs. Use the combining store pass to clean this up.

  For example,

      layout(binding = 3) buffer block {
          vec4 v;
      };

      void main() {
          v.x = 11;
          v.y = 22;
      }

  after going to SPIR-V and NIR, ends up with two store_derefs to v[0] and v[1]

      vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
      vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */
      intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */
      vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */)
      vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */
      intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */

  producing two different sends instructions on SKL. The combining pass transforms the snippet above into

      vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
      vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17
      intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */

  producing a single sends instruction.

  v2: Move this from spirv_to_nir into the general optimization pass for the Intel compiler. (Jason)

  Reviewed-by: Jason Ekstrand <[email protected]>
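The effect of combining can be modeled as a vec4 value plus a write mask. In this hypothetical sketch, each scalar store to a component sets that component's bit, so the `v.x = 11; v.y = 22;` example ends with mask 0x3, the `wrmask=xy` (3) of the combined store_deref. A real pass must also emit the pending store whenever an intervening instruction could read or alias that memory; that part is omitted here.

```c
#include <assert.h>

/* Hypothetical pending vec4 store: a value plus a write mask
 * (bit i set means component i is written). */
struct vec4_store {
   float value[4];
   unsigned wrmask;
};

/* Fold a scalar store to component `comp` into the pending vector store. */
static void
combine_scalar_store(struct vec4_store *pending, unsigned comp, float v)
{
   assert(comp < 4);
   pending->value[comp] = v;  /* a later store to the same component wins */
   pending->wrmask |= 1u << comp;
}
```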
* intel/nir: Combine store_derefs after vectorizing IO | Caio Marcelo de Oliveira Filho | 2019-03-13 | 1 | -0/+1

  Shader-db results for skl:

      total instructions in shared programs: 15232903 -> 15224781 (-0.05%)
      instructions in affected programs: 61246 -> 53124 (-13.26%)
      helped: 221
      HURT: 0

      total cycles in shared programs: 371440470 -> 371398018 (-0.01%)
      cycles in affected programs: 281363 -> 238911 (-15.09%)
      helped: 221
      HURT: 0

  Results for bdw are very similar.

  Reviewed-by: Jason Ekstrand <[email protected]>
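The percentages in these shader-db summaries are relative to the "before" value; for example, (53124 - 61246) / 61246 is about -13.26%. A trivial helper, assuming that convention, reproduces them:

```c
#include <assert.h>

/* Percentage change between two shader-db totals, relative to `before`. */
static double
pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```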
* nir: Add a pass to combine store_derefs to same vector | Caio Marcelo de Oliveira Filho | 2019-03-13 | 6 | -0/+580

  v2: (all from Jason)
      Reuse existing function for the end of the block combinations.
      Check the SSA values are coming from the right place in tests.
      Document the case when the store to array_deref is reused.

  Reviewed-by: Jason Ekstrand <[email protected]>
* ac: use the raw tbuffer version for 16-bit SSBO loads | Samuel Pitoiset | 2019-03-13 | 3 | -6/+3

  vindex is always 0.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_{struct,raw}_tbuffer_load() helpers | Samuel Pitoiset | 2019-03-13 | 3 | -23/+75

  The struct version sets IDXEN=1, while the raw version sets IDXEN=0.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
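A simplified model of why `vindex == 0` makes the struct form redundant: with IDXEN=1 the hardware adds `vindex * stride` to the address, and with IDXEN=0 there is no index term. This hypothetical sketch covers unswizzled addressing only; it ignores range checking, which the hardware performs differently for the two forms, and the function is not one of the ac helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of unswizzled buffer addressing. */
static uint64_t
buffer_address(uint64_t base, uint32_t stride, bool idxen,
               uint32_t vindex, uint32_t voffset, uint32_t soffset)
{
   uint64_t addr = base + soffset + voffset;
   if (idxen)
      addr += (uint64_t)vindex * stride; /* struct form only */
   return addr;
}
```

With `vindex` fixed at 0, both forms compute the same address, which is what allows the 16-bit SSBO path to switch to the raw helper.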
* radv: use typed buffer loads for vertex input fetches | Samuel Pitoiset | 2019-03-13 | 4 | -53/+57

  This drastically reduces the number of SGPRs because the driver now uses descriptors per vertex binding, instead of per vertex attribute format.

      29077 shaders in 15096 tests
      Totals:
      SGPRS: 1354285 -> 1282109 (-5.33 %)
      VGPRS: 909896 -> 908800 (-0.12 %)
      Spilled SGPRs: 24840 -> 24811 (-0.12 %)
      Code Size: 49221144 -> 48986628 (-0.48 %) bytes
      Max Waves: 243930 -> 244229 (0.12 %)

      Totals from affected shaders:
      SGPRS: 390648 -> 318472 (-18.48 %)
      VGPRS: 288432 -> 287336 (-0.38 %)
      Spilled SGPRs: 94 -> 65 (-30.85 %)
      Code Size: 11548412 -> 11313896 (-2.03 %) bytes
      Max Waves: 86460 -> 86759 (0.35 %)

  This gives a really tiny boost.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
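The SGPR saving comes from emitting one buffer descriptor per distinct binding rather than one per attribute; each buffer descriptor occupies four SGPRs. A hypothetical counting sketch, assuming bindings below 32:

```c
#include <assert.h>

/* Hypothetical vertex attribute: the vertex buffer binding it reads from. */
struct vertex_attrib {
   unsigned binding;
};

/* Count the descriptors needed when one is emitted per distinct vertex
 * binding instead of one per attribute. */
static unsigned
count_binding_descriptors(const struct vertex_attrib *attribs, unsigned n)
{
   unsigned mask = 0, count = 0;
   for (unsigned i = 0; i < n; i++) {
      assert(attribs[i].binding < 32);
      mask |= 1u << attribs[i].binding;
   }
   for (; mask; mask &= mask - 1)
      count++; /* clears the lowest set bit each iteration */
   return count;
}
```

For example, three attributes interleaved in binding 0 plus one in binding 1 need two descriptors instead of four.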
* radv: store more vertex attribute infos as pipeline keys | Samuel Pitoiset | 2019-03-13 | 3 | -0/+37

  They are required for using typed buffer loads.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: rework typed buffers loads for LLVM 7 | Samuel Pitoiset | 2019-03-13 | 3 | -57/+83

  Make it more generic; this will be used by an upcoming series.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* panfrost: Set bo->gem_handle when creating a linear BO | Tomeu Vizoso | 2019-03-13 | 1 | -1/+3

  So we can free it later.

  Signed-off-by: Tomeu Vizoso <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Set bo->size[0] in the DRM backend | Tomeu Vizoso | 2019-03-13 | 2 | -6/+5

  So we can unmap it later.

  Signed-off-by: Tomeu Vizoso <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
* intel/fs: Fix opt_peephole_csel to not throw away saturates. | Kenneth Graunke | 2019-03-12 | 1 | -0/+1

  We were not copying the saturate bit from the original instruction to the new replacement instruction. This caused major misrendering in DiRT Rally on iris, where comparisons leading to discards failed due to the missing saturate, causing lots of extra garbage pixels to be drawn in text rendering, trees, and so on.

  This did not show up on i965 because st/nir performs a more aggressive version of nir_opt_peephole_select, yielding more b32csel operations.

  Fixes: 52c7df1643e i965/fs: Merge CMP and SEL into CSEL on Gen8+
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
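The bug class is easy to reproduce in miniature: when a peephole pass builds a replacement instruction, every destination modifier of the instruction it replaces has to be carried over. The instruction record and opcode names below are hypothetical and far simpler than the backend's real instruction class:

```c
#include <assert.h>
#include <stdbool.h>

enum opcode { OP_CMP, OP_SEL, OP_CSEL };

/* Hypothetical, heavily simplified instruction: opcode plus the saturate
 * destination modifier (clamp the result to [0, 1]). */
struct inst {
   enum opcode op;
   bool saturate;
};

/* Build the CSEL that replaces a SEL. Copying `saturate` is the step the
 * original pass was missing. */
static struct inst
make_csel_from_sel(const struct inst *sel)
{
   assert(sel->op == OP_SEL);
   struct inst csel = { OP_CSEL, sel->saturate };
   return csel;
}
```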
* glsl/lower_vector_derefs: Don't use a temporary for TCS outputs | Jason Ekstrand | 2019-03-13 | 1 | -10/+64

  Tessellation control shader outputs act as if they have memory backing them and you can have multiple writes to different components of the same vector in-flight at the same time. When this happens, the load vec store pattern that gets used by ir_triop_vector_insert doesn't yield the correct results. Instead, just emit a sequence of conditional assignments.

  Reviewed-by: Ian Romanick <[email protected]>
  Cc: [email protected]
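Why a whole-vector load, insert, and store breaks here: if two TCS invocations each load the same output vector, change different components, and store the whole vector back, the later store can overwrite the other invocation's component with a stale value. Writing only the selected component avoids that. The C sketch below is a hypothetical illustration of the idea, not the GLSL IR lowering itself; it expresses a dynamic-index insert as one conditional assignment per component:

```c
#include <assert.h>

/* Set component `idx` of a vec4 to `value` using one conditional assignment
 * per component. Components other than `idx` are never written, so no stale
 * copy of them is ever stored back. */
static void
vec4_insert_conditional(float v[4], unsigned idx, float value)
{
   assert(idx < 4);
   if (idx == 0) v[0] = value;
   if (idx == 1) v[1] = value;
   if (idx == 2) v[2] = value;
   if (idx == 3) v[3] = value;
}
```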
* glsl/list: Add a list variant of insert_after | Jason Ekstrand | 2019-03-13 | 1 | -0/+26

  Reviewed-by: Ian Romanick <[email protected]>
               Caio Marcelo de Oliveira Filho <[email protected]>
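A list-splicing variant of insert_after can be sketched on a minimal circular doubly linked list with a sentinel head. The types and function names here are hypothetical and are not Mesa's exec_list API:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node; a list is a sentinel node. */
struct node {
   struct node *prev, *next;
   int value;
};

static void
list_init(struct node *head)
{
   head->prev = head->next = head;
}

static void
list_push_tail(struct node *head, struct node *n)
{
   n->prev = head->prev;
   n->next = head;
   head->prev->next = n;
   head->prev = n;
}

/* Move every node of list `src` so that it follows node `after`,
 * preserving order and leaving `src` empty. */
static void
insert_list_after(struct node *after, struct node *src)
{
   if (src->next == src)
      return; /* nothing to splice */
   struct node *first = src->next, *last = src->prev;
   struct node *next = after->next;
   after->next = first;
   first->prev = after;
   last->next = next;
   next->prev = last;
   list_init(src);
}
```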
* nir/loop_unroll: Fix out-of-bounds access handling | Jason Ekstrand | 2019-03-12 | 1 | -12/+2

  The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic.

  Fixes: e8a8937a04 "nir: add partial loop unrolling support"
  Reviewed-by: Timothy Arceri <[email protected]>
* anv: Ignore VkRenderPassInputAttachementAspectCreateInfo | Jason Ekstrand | 2019-03-12 | 1 | -0/+4

  We don't care about the information, but there's no sense in throwing a debug warning about it. It's harmless but annoying to users.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984
  Reviewed-by: Sagar Ghuge <[email protected]>
* v3d: Fix leak of the renderonly struct on screen destruction. | Eric Anholt | 2019-03-12 | 1 | -0/+1

  This makes v3d match vc4's destroy path.

  Fixes: e113b21cb779 ("v3d: Add renderonly support.")
* v3d: Fix leak of the mem_ctx after the DAG refactor. | Eric Anholt | 2019-03-12 | 1 | -2/+2

  Noticed while trying to get a CTS run again.

  Fixes: 33886474d646 ("v3d: Use the DAG datastructure for QPU instruction scheduling.")
* glx: add support for GLX_ARB_create_context_no_error (v3) | Grigori Goronzy | 2019-03-12 | 8 | -1/+96

  v2: Only reject no-error contexts for too-old GL if we're actually trying to create a no-error context (Adam Jackson)
  v3: Fix share contexts (Adam Jackson)

  Reviewed-by: Adam Jackson <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* radv: set the maximum number of IBs per submit to 192 | Samuel Pitoiset | 2019-03-12 | 2 | -1/+8

  This fixes random SteamVR corruption; see https://github.com/ValveSoftware/SteamVR-for-Linux/issues/181

  Fixes: 4d30f2c6f42 ("radv/winsys: remove the max IBs per submit limit for the fallback path")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
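Assuming the submission path splits a command stream with more IBs than the limit into several kernel submits, the batching arithmetic looks like this. The helper names are hypothetical; only the value 192 comes from the commit:

```c
#include <assert.h>

#define MAX_IBS_PER_SUBMIT 192u /* limit chosen by the commit above */

/* Number of kernel submissions needed for `ib_count` indirect buffers. */
static unsigned
submit_count(unsigned ib_count)
{
   return (ib_count + MAX_IBS_PER_SUBMIT - 1) / MAX_IBS_PER_SUBMIT;
}

/* Number of IBs carried by submission number `batch` (0-based). */
static unsigned
batch_size(unsigned ib_count, unsigned batch)
{
   unsigned start = batch * MAX_IBS_PER_SUBMIT;
   assert(start < ib_count);
   unsigned left = ib_count - start;
   return left < MAX_IBS_PER_SUBMIT ? left : MAX_IBS_PER_SUBMIT;
}
```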
* anv: Fix destroying descriptor sets when pool gets reset | Danylo Piliaiev | 2019-03-12 | 1 | -6/+5

  pool->next and pool->free_list were reset before being used in anv_descriptor_pool_free_set.

  Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset"
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
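The ordering problem can be shown with a toy bump allocator that keeps a free list: freeing the most recent allocation rolls the cursor back, and anything else is parked on the free list. That policy and all names are assumptions for illustration, not anv's code. Freeing after the state has been reset leaves a stale free-list entry behind; freeing first and then resetting leaves the pool clean:

```c
#include <assert.h>

/* Toy descriptor pool: a bump-allocated region plus a free list. */
struct pool {
   unsigned next;       /* end of the bump-allocated region, in bytes */
   unsigned free_bytes; /* bytes currently parked on the free list */
};

/* Free one set. Rolling back only works if `next` still describes the
 * allocations, i.e. if the pool has not been reset yet. */
static void
pool_free_set(struct pool *p, unsigned offset, unsigned size)
{
   if (offset + size == p->next)
      p->next = offset;      /* most recent allocation: roll the cursor back */
   else
      p->free_bytes += size; /* otherwise park it on the free list */
}

/* Correct order: free every live set (newest first), then reset the state. */
static void
pool_reset(struct pool *p, const unsigned *offsets, const unsigned *sizes,
           unsigned count)
{
   for (unsigned i = count; i-- > 0;)
      pool_free_set(p, offsets[i], sizes[i]);
   p->next = 0;
   p->free_bytes = 0;
}
```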
* v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER. | Eric Anholt | 2019-03-12 | 1 | -0/+3

  This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.
* intel/nir: Vectorize all IO | Jason Ekstrand | 2019-03-12 | 1 | -0/+17

  The IO scalarization pass that we run to help with linking ends up turning some shader I/O, such as that for tessellation and geometry shaders, into many scalar URB operations rather than one vector one. To alleviate this, we now vectorize the I/O once again. This fixes a 10% performance regression in the GfxBench tessellation test that was caused by scalarizing.

  Shader-db results on Kaby Lake:

      total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
      instructions in affected programs: 342009 -> 338857 (-0.92%)
      helped: 1236
      HURT: 443

      total spills in shared programs: 23471 -> 23465 (-0.03%)
      spills in affected programs: 6 -> 0
      helped: 1
      HURT: 0

      total fills in shared programs: 31770 -> 31766 (-0.01%)
      fills in affected programs: 4 -> 0
      helped: 1
      HURT: 0

  Cycles was just a lot of churn due to moves being in different places. Most of the pure churn in instructions was +/- one or two instructions in fragment shaders.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
  Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
  Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
  Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass for lowering IO back to vector when possible | Jason Ekstrand | 2019-03-12 | 5 | -1/+392

  This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible.

  Reviewed-by: Connor Abbott <[email protected]>
  Cc: "19.0" <[email protected]>
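One ingredient of such a pass is deciding which scalar components can share a vector variable. This sketch, with hypothetical types, gathers a component mask per location and accepts only a single contiguous run of components; the real pass checks more than this (for example, matching types), and its exact merging rules may differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar I/O variable: one component at a given location. */
struct scalar_io {
   unsigned location, component;
};

/* Component mask of every scalar variable at `location`. */
static unsigned
components_at(const struct scalar_io *vars, unsigned n, unsigned location)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < n; i++) {
      assert(vars[i].component < 4);
      if (vars[i].location == location)
         mask |= 1u << vars[i].component;
   }
   return mask;
}

/* A single vector variable can replace them only if the set bits form one
 * contiguous run. */
static bool
mask_is_contiguous(unsigned mask)
{
   if (mask == 0)
      return false;
   while ((mask & 1u) == 0)
      mask >>= 1;                  /* drop trailing zeros */
   return (mask & (mask + 1)) == 0; /* remaining bits look like 0b0..01..1 */
}
```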
* ac/nir: fix 16-bit ssbo stores | Rhys Perry | 2019-03-12 | 1 | -0/+2

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* scons: Compatibility with Scons development version string | pal1000 | 2019-03-12 | 2 | -2/+14

  This ensures the Mesa3D build doesn't fail in this case, as encountered when bisecting SCons source code while regression testing https://bugs.freedesktop.org/show_bug.cgi?id=109443 and when testing 3.0.5.a.2.

  Technical details:
  The SCons version string has consistently been in this format:

      MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]

  so these formulas should strip alpha/beta flags and return the SCons version:
  - as string: `'.'.join(SCons.__version__.split('.')[:3])`
  - as tuple of integers: `tuple(map(int, SCons.__version__.split('.')[:3]))`

  - v2: Fixed SCons version retrieval formulas as string and tuple of integers.
  - v3: Fixed SCons version string format description.

  Cc: "19.0" <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
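For comparison with the Python formulas above, the same "keep only Major.Minor.Patch" parsing can be sketched in C with `sscanf`, which stops converting after the third integer and therefore ignores any alpha/beta suffix. The function name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading Major.Minor.Patch integers of a version string such as
 * "3.0.5.a.2". Returns 1 on success, 0 if the string does not start with
 * three dot-separated integers. Anything after the third integer is ignored. */
static int
parse_version3(const char *s, int out[3])
{
   return sscanf(s, "%d.%d.%d", &out[0], &out[1], &out[2]) == 3;
}
```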
* anv: revert "anv: release memory allocated by glsl types during spirv_to_nir" | Tapani Pälli | 2019-03-12 | 1 | -2/+0

  This reverts commit 47fc359822494935852de1e70e4d840b2fe6a25c.

  The reason is that the patch did not take into account the situation where we might have both OpenGL and Vulkan using glsl_types at the same time.

  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* radeonsi/nir: Use nir stripping pass | Connor Abbott | 2019-03-12 | 1 | -0/+5

  This reduces compilation time for my shader-db collection from around 40 seconds to 30, vs. 19 seconds for TGSI. There are still some shaders that TGSI caches but NIR doesn't, partly because of more aggressive cross-stage optimizations with NIR.

  Reviewed-by: Timothy Arceri <[email protected]>
* nir: Add a stripping pass for improved cacheability | Connor Abbott | 2019-03-12 | 4 | -0/+111

  Oftentimes various NIR shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable.

  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
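The cacheability argument can be illustrated with a cache key that leaves debug-only metadata out of the hash. In this hypothetical sketch the `strip` argument stands in for what the NIR_STRIP setting controls in the real pass, and FNV-1a is an arbitrary choice of hash:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shader record: `name` is debug-only metadata. */
struct shader {
   const char *name;
   uint32_t code[4];
};

/* 64-bit FNV-1a over a byte range, continuing from hash state `h`. */
static uint64_t
fnv1a(uint64_t h, const void *data, size_t len)
{
   const unsigned char *p = data;
   for (size_t i = 0; i < len; i++) {
      h ^= p[i];
      h *= 0x100000001b3ull; /* 64-bit FNV prime */
   }
   return h;
}

/* Cache key; with `strip` set the name is left out, so shaders that differ
 * only in metadata produce the same key. */
static uint64_t
shader_cache_key(const struct shader *s, int strip)
{
   uint64_t h = 0xcbf29ce484222325ull; /* 64-bit FNV offset basis */
   h = fnv1a(h, s->code, sizeof(s->code));
   if (!strip && s->name)
      h = fnv1a(h, s->name, strlen(s->name));
   return h;
}
```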
* radv: fix pointSizeRange limits | Samuel Pitoiset | 2019-03-12 | 1 | -1/+1

  The values should match the ones that are emitted. This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*.

  Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: Flag fewer dirty bits in BLORP | Sagar Ghuge | 2019-03-11 | 1 | -3/+27

  v2:
    1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke)
    2) Add missing flags (Kenneth Graunke)

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* st/glsl_to_nir: fix incorrect arrary access | Timothy Arceri | 2019-03-12 | 1 | -2/+5

  This fixes a segfault when we try to access the array with an index of -1 when the array wasn't allocated in the first place. Before 7536af670b75 we would just access a pre-allocated array that was also loaded from and stored to the shader cache. But now the cache will no longer allocate these arrays if they are empty.

  The change resulted in tests such as the following segfaulting when run with a warm shader cache:

      tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
* nir: silence a couple new compiler warnings | Brian Paul | 2019-03-12 | 2 | -2/+2

      [33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
      ../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
      ../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
         if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
                             ^
      [85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
      ../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
      ../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
         nir_cf_node *unroll_loc =
                      ^

  Reviewed-by: Timothy Arceri <[email protected]>
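The first warning is about readability only: `&&` binds tighter than `||`, so the flagged condition already parses as `A || (B && C) || ...`, and adding parentheses silences the warning without changing behavior. A reduced sketch with hypothetical stand-in types, keeping only the first two terms because the rest of the original condition is cut off in the log:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the induction-variable record. */
enum ind_type { basic_induction, other_induction };
struct ind_var { enum ind_type type; };

/* First two terms of the flagged condition, with the grouping the
 * compiler already applied made explicit. */
static bool
not_basic_induction(struct ind_var **ind)
{
   return *ind == NULL || (*ind && (*ind)->type != basic_induction);
}
```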
* panfrost: Identify fragment_extra flags | Alyssa Rosenzweig | 2019-03-12 | 3 | -10/+30

  The fragment_extra structure contains additional fields extending the MRT framebuffer descriptor, snuck in between the main framebuffer descriptor and the render targets. Its fields include those related to transaction elimination and depth/stencil buffers.

  This patch identifies the flags field (previously just "unk" with some magic values) as well as identifying some (but not all) flags set by the driver.

  The process of identifying flags brought a bug to light where transaction elimination (checksumming) could not be enabled unless AFBC was in use. This issue is now resolved.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Document "depth-buffer writeback" bit | Alyssa Rosenzweig | 2019-03-12 | 2 | -1/+9

  This bit, if set, causes the depth buffer to be copied from GPU tile memory to the provided depth buffer in main memory. If not set, the GPU will not access the main memory (saving considerable memory bandwidth if depth results are not actually used).

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Support linear depth textures | Alyssa Rosenzweig | 2019-03-12 | 1 | -2/+4

  This combination has not yet been seen "in the wild" in traces, but to support linear depth FBOs, ~bruteforce reveals this bit pattern is necessary. It's not yet clear why the meanings of 0x1 and 0x2 are essentially flipped (tiled vs linear for colour, linear vs some sort of tiled for depth).

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Allocate dedicated slab for linear BOs | Alyssa Rosenzweig | 2019-03-12 | 2 | -15/+22

  Previously, linear BOs shared memory with each other to minimize kernel round-trips / latency, as well as to work around a bug in the free_slab function. These concerns are invalid now, but continuing to use the slab allocator for BOs resulted in memory allocation errors. This issue was aggravated, though not introduced (so not a real regression), in the previous commit.

  v2 (unreviewed): Fix bug in v1 preventing munmaps from working

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Determine framebuffer format bits late | Alyssa Rosenzweig | 2019-03-12 | 1 | -17/+42

  Again, these formats are only properly known at the time of fragment job emit. Rather than hardcoding the format, at least for MFBD we begin to construct the format bits on-demand. This cleans up the code, futureproofs for ES3 framebuffer formats, and should fix bugs regarding FBO colour swizzles.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Delay color buffer setup | Alyssa Rosenzweig | 2019-03-12 | 1 | -43/+50

  In an effort to clean up framebuffer management code, we delay colour buffer setup until the FRAGMENT job is actually emitted, allowing the AFBC and linear codepaths to be unified.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Combine has_afbc/tiled in layout enum | Alyssa Rosenzweig | 2019-03-12 | 3 | -24/+64

  AFBC, tiled, and linear BO layouts are mutually exclusive; they should be coupled via a single enum rather than ad hoc checks of booleans.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
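Folding the two booleans into one enum makes the invalid "AFBC and tiled" state unrepresentable. A sketch with hypothetical names, including a conversion from the old boolean pair that asserts the exclusivity the commit relies on:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout enum: exactly one layout per BO. */
enum bo_layout { LAYOUT_LINEAR, LAYOUT_TILED, LAYOUT_AFBC };

/* The old representation, in which both flags could be set by mistake. */
struct old_layout_flags {
   bool has_afbc;
   bool tiled;
};

static enum bo_layout
layout_from_flags(struct old_layout_flags f)
{
   assert(!(f.has_afbc && f.tiled)); /* the layouts are mutually exclusive */
   if (f.has_afbc)
      return LAYOUT_AFBC;
   if (f.tiled)
      return LAYOUT_TILED;
   return LAYOUT_LINEAR;
}
```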
* panfrost: Cleanup needless if in create_bo | Alyssa Rosenzweig | 2019-03-12 | 1 | -30/+26

  I'm not sure why we were checking for these additional criteria (likely inherited from some other driver); remove the needless checks to clean up the code and perhaps fix some bugs down the line.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* i965: Reimplement all the PIPE_CONTROL rules. | Kenneth Graunke | 2019-03-11 | 1 | -136/+403

  This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system.

  The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable.

  Mark Janes noted that this patch helps with some GPU hangs on Icelake.

  This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work.

  v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware.

  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use genxml for emitting PIPE_CONTROL. | Kenneth Graunke | 2019-03-11 | 7 | -230/+362

  While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finicky as PIPE_CONTROL, the extra safety seems worth it.

  We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then pack them appropriately.

  Reviewed-by: Topi Pohjolainen <[email protected]>
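The flags-then-pack approach can be sketched in two steps: the driver works with its own bitfield of operations, and a single function translates that bitfield into named hardware fields, which generated pack code would then place at the right bit positions. The flag values and the struct below are hypothetical; the field names only mimic genxml's naming style:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side flags; their bit positions are arbitrary and
 * unrelated to the hardware encoding. */
enum pc_flags {
   PC_CS_STALL          = 1u << 0,
   PC_RT_FLUSH          = 1u << 1,
   PC_DEPTH_CACHE_FLUSH = 1u << 2,
};

/* Stand-in for a generated command struct with named fields; generated
 * pack code (not shown) would place each field at its hardware bit. */
struct pipe_control_fields {
   bool CommandStreamerStallEnable;
   bool RenderTargetCacheFlushEnable;
   bool DepthCacheFlushEnable;
};

static struct pipe_control_fields
fields_from_flags(uint32_t flags)
{
   struct pipe_control_fields pc = {
      .CommandStreamerStallEnable   = (flags & PC_CS_STALL) != 0,
      .RenderTargetCacheFlushEnable = (flags & PC_RT_FLUSH) != 0,
      .DepthCacheFlushEnable        = (flags & PC_DEPTH_CACHE_FLUSH) != 0,
   };
   return pc;
}
```

Keeping the driver flags separate from the hardware layout means a bit that moves between generations only changes the generated pack code, not every caller.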