summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: Disable internal CCS for shadows of multi-sampled windowsJason Ekstrand2018-06-041-1/+10
| | | | | | | | | | | | | | If window system supports Y-tiling but not CCS_E, we currently create an internal CCS for any window system buffers and then resolve right before handing it off to X or Wayland. In the case of the single-sampled shadow of a multi-sampled window system buffer, this is pointless because the only thing we do with it is use it as a MSAA resolve target so we do MSAA resolve -> CCS resolve -> hand to the window system. Instead, just disable CCS for the shadow and then the MSAA resolve will write uncompressed directly into it. If the window system supports CCS_E, we will still use CCS_E, we just won't do internal CCS. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/miptree: Rename a parameter to create_for_dri_imageJason Ekstrand2018-06-042-4/+4
| | | | | | | Instead of having it be a general "is this a winsys image" boolean, make it more specific to the actual purpose. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Switch to a logical state stackJason Ekstrand2018-06-043-126/+72
| | | | | | | | Instead of the state stack that's based on copying a dummy instruction around, we start using a logical stack of brw_insn_states. This uses a bit less memory and is way less conceptually bogus. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Set flag [sub]register number differently for 3srcJason Ekstrand2018-06-041-3/+10
| | | | | | | | | | | Prior to gen8, the flag [sub]register number is in a different spot on 3src instructions than on other instructions. Starting with Broadwell, they made it consistent. This commit fixes bugs that occur when a conditional modifier gets propagated into a 3src instruction such as a MAD. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Copy fields manually in brw_next_insnJason Ekstrand2018-06-041-1/+94
| | | | | | | | Instead of doing a memcpy, this moves us to start with a blank instruction (memset to zero) and copy the fields over one at a time. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Add some brw_get_default_ helpersJason Ekstrand2018-06-044-55/+79
| | | | | | | | This is much cleaner than everything that wants a default value poking at the bits of p->current directly. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
* trace: Fix parsing of recent traces.Jose Fonseca2018-06-041-5/+26
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* trace: Fix trace_context_transfer_unmap methods.Jose Fonseca2018-06-041-18/+42
| | | | | | | | | The emitted buffer_subdata/texture_subdata call didn't match the respective signatures. v2: Actually emit buffer_subdata call. Reviewed-by: Roland Scheidegger <[email protected]>
* amd/common: use the dimension-aware image intrinsics on LLVM 7+Nicolai Hähnle2018-06-041-24/+165
| | | | | | Requires LLVM trunk r329166. Acked-by: Marek Olšák <[email protected]>
* i965: Fix batch-last mode to properly swap BOs.Kenneth Graunke2018-06-041-0/+5
| | | | | | | | | | | | | | | On pre-4.13 kernels, which don't support I915_EXEC_BATCH_FIRST, we move the validation list entry to the end...but incorrectly left the exec_bo array alone, causing a mismatch where exec_bos[0] no longer corresponded with validation_list[0] (and similarly for the last entry). One example of resulting breakage is that we'd update bo->gtt_offset based on the wrong buffer. This wreaked total havoc when trying to use softpin, and likely caused unnecessary relocations in the normal case. Fixes: 29ba502a4e28471f67e4e904ae503157087efd20 (i965: Use I915_EXEC_BATCH_FIRST when available.) Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* radv: fix a GPU hang when MRTs are sparseSamuel Pitoiset2018-06-041-0/+10
| | | | | | | | | | | | | | When the i-th target format is set, all previous target formats must be non-zero to avoid hangs. In other words, without this if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs because the target format of mrt1 is zero. This fixes DXVK GPU hangs with "Seven: The Days Long Gone", "GTA V" and probably more games. Cc: "18.0" 18.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.Bas Nieuwenhuizen2018-06-041-0/+2
| | | | | | | | | | | | | | | | Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the GEOMETRY shader for the TESS_EVAL shader. This would cause the flush_constants code to emit the GEOMETRY constants to the TESS_EVAL registers and then conclude that it did not need to set the GEOMETRY shader registers. Fixes: dfff9fb6f8d "radv: Handle GFX9 merged shaders in radv_flush_constants()" CC: 18.1 <[email protected]> Reviewed-by: Alex Smith <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* nir: implement the GLSL equivalent of if simplication in nir_opt_ifSamuel Pitoiset2018-06-041-5/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass turns: if (cond) { } else { do_work(); } into: if (!cond) { do_work(); } else { } Here's the vkpipeline-db stats (from affected shaders) on Polaris10: Totals from affected shaders: SGPRS: 17272 -> 17296 (0.14 %) VGPRS: 18712 -> 18740 (0.15 %) Spilled SGPRs: 1179 -> 1142 (-3.14 %) Code Size: 1503364 -> 1515176 (0.79 %) bytes Max Waves: 916 -> 911 (-0.55 %) This pass only affects Serious Sam 2017 (Vulkan) on my side. The stats are not really good for now. Some shaders look quite dumb but this will be improved with further NIR passes, like ifs combination. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: make is_comparison() a non-static helper functionSamuel Pitoiset2018-06-042-25/+25
| | | | | | | | | Rename and change the prototype for consistency regarding nir_tex_instr_is_query(). This function will be used in the following patch. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: use num_components wrappers in print/validate.Dave Airlie2018-06-042-15/+5
| | | | | | These wrappers were introduces, so start using them. Reviewed-by: Jason Ekstrand <[email protected]>
* doc: update calendar, add news and link release notes for 18.0.5Juan A. Suarez Romero2018-06-033-7/+11
| | | | Signed-off-by: Juan A. Suarez Romero <[email protected]>
* docs: add sha256 checksums for 18.0.5Juan A. Suarez Romero2018-06-031-1/+2
| | | | | Signed-off-by: Juan A. Suarez Romero <[email protected]> (cherry picked from commit aba161e63a25a07c3c24fec01b6c63c43874b805)
* docs: add release notes for 18.0.5Juan A. Suarez Romero2018-06-031-0/+161
| | | | | Signed-off-by: Juan A. Suarez Romero <[email protected]> (cherry picked from commit ca0037aaefcb06ff8e1eb6fbde8f313c45789921)
* scons: Fix MinGW cross compilation with LLVM 5.0.Jose Fonseca2018-06-021-1/+8
| | | | | | LLVM 5.0 requires additional Win32 libraries, and MinGW with pthreads. Reviewed-by: Roland Scheidegger <[email protected]>
* anv: Don't even bother processing relocs if we have softpinJason Ekstrand2018-06-011-3/+15
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* anv: Refactor reloc handling in execbuf_add_boJason Ekstrand2018-06-011-36/+42
| | | | | | | This just separates the reloc list vs. BO set cases and lets us avoid an allocation if relocs->deps->entries == 0. Reviewed-by: Scott D Phillips <[email protected]>
* anv: Assert that the kernel leaves pinned BO addresses aloneJason Ekstrand2018-06-011-1/+4
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* anv: Soft-pin everything elseScott D Phillips2018-06-013-1/+21
| | | | | | | | v2 (Jason Ekstrand): - Break up Scott's mega-patch Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* anv: Soft-pin batch buffersScott D Phillips2018-06-014-11/+30
| | | | | | Co-authored-by: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* anv/batch_chain: Simplify secondary batch return chainingJason Ekstrand2018-06-011-40/+36
| | | | | | | | | | | | | | | Previously, we did this weird thing where we left space and an empty relocation for use in a hypothetical MI_BATCH_BUFFER_START that would be added to the secondary later. Then, when it came time to chain it into the primary, we would back that out and emit an MI_BATCH_BUFFER_START. This worked well but it was always a bit hacky, fragile and ugly. This commit instead adds a helper for rewriting the MI_BATCH_BUFFER_START at the end of an anv_batch_bo and we use that helper for both batch bo list cloning and handling returns from secondaries. The new helper doesn't actually modify the batch in any way but instead just adjusts the relocation as needed. Reviewed-by: Scott D Phillips <[email protected]>
* anv/batch_chain: Call batch_bo_finish at the end of end_batch_bufferJason Ekstrand2018-06-011-6/+6
| | | | | | | | | | The only reason we were calling it in the middle was that one of the cases for figuring out the secondary command buffer execution type wanted batch_bo->length which gets set by batch_bo_finish. It's easy enough to recalculate and now batch_bo_finish is called in a sensible location. Reviewed-by: Scott D Phillips <[email protected]>
* anv: Soft-pin client-allocated memoryJason Ekstrand2018-06-011-0/+3
| | | | | | | | Now that we've done all that refactoring, addresses are now being directly written into surface states by ISL and BLORP whenever a BO is pinned so there's really nothing to do besides enable it. Reviewed-by: Scott D Phillips <[email protected]>
* anv/allocator: Support softpin in the BO cacheJason Ekstrand2018-06-011-1/+50
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* anv/allocator: Set the BO flags in bo_cache_alloc/importJason Ekstrand2018-06-015-28/+60
| | | | | | | It's safer to set them there because we have the opportunity to properly handle combining flags if a BO is imported more than once. Reviewed-by: Scott D Phillips <[email protected]>
* anv: For pinned BOs, skip relocations, but track bo usageScott D Phillips2018-06-012-0/+66
| | | | | | | | | | | | | | | | | References to pinned BOs won't need to be relocated at a later point, so just write the final value of the reference into the bo directly. Add a `set` to the relocation lists for tracking dependencies that were previously tracked by relocations. When a batch is executed, we add the referenced pinned BOs to the exec list. v2: - visit bos from the dependency set in a deterministic order (Jason) v3: - compar => compare, drat (Jason) - Reworded commit message, provided by (Jordan) Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Use a separate pool for binding tables when soft pinningScott D Phillips2018-06-013-11/+53
| | | | | | | | | | | | | | | | | | | | | | Soft pinning lets us satisfy the binding table address requirements without using both sides of a growing state_pool. If you do use both sides of a state pool, then you need to read the state pool's center_bo_offset (with the device mutex held) to know the final offset of relocations that target the state pool bo. By having a separate pool for binding tables that only grows in the forward direction, the center_bo_offset is always 0 and relocations don't need an update pass to adjust relocations with the mutex held. v2: - don't introduce a separate state flag for separate binding tables (Jason) - replace bo and map accessors with a single binding_table_pool accessor (Jason) v3: - assert bt_block->offset >= 0 for the separate binding table (Jason) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* anv: Soft-pin state poolsScott D Phillips2018-06-017-10/+34
| | | | | | | | | | | | | The state_pools reserve virtual address space of the full BLOCK_POOL_MEMFD_SIZE, but maintain the current behavior of growing from the middle. v2: - rename block_pool::offset to block_pool::start_address (Jason) - assign state pool start_address statically (Jason) v3: - remove unnecessary bo_flags tampering for the dynamic pool (Jason) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nir: Lower !f2b(x) to x == 0.0Ian Romanick2018-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | Some trivial help now, but it also prevents ~40 regressions caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. All Gen4+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14369557 -> 14369555 (<.01%) instructions in affected programs: 442 -> 440 (-0.45%) helped: 2 HURT: 0 total cycles in shared programs: 532425772 -> 532425743 (<.01%) cycles in affected programs: 6086 -> 6057 (-0.48%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Add some missing "optimization undo" patternsIan Romanick2018-06-011-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d8d18516b0a and 03fb13f6467 added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimization cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* docs/meson: mention how to use array optionsEric Engestrom2018-06-011-0/+8
| | | | Signed-off-by: Eric Engestrom <[email protected]>
* meson: drop unused empty string array elementEric Engestrom2018-06-011-3/+3
| | | | Signed-off-by: Eric Engestrom <[email protected]>
* meson: fix platforms=[]Eric Engestrom2018-06-011-15/+13
| | | | | Fixes: 5608d0a2cee47c7d037f ("meson: use array type options") Signed-off-by: Eric Engestrom <[email protected]>
* meson: fix vulkan-drivers=[]Eric Engestrom2018-06-011-8/+4
| | | | | Fixes: 5608d0a2cee47c7d037f ("meson: use array type options") Signed-off-by: Eric Engestrom <[email protected]>
* meson: fix gallium-drivers=[]Eric Engestrom2018-06-011-41/+24
| | | | | Fixes: 5608d0a2cee47c7d037f ("meson: use array type options") Signed-off-by: Eric Engestrom <[email protected]>
* meson: fix dri-drivers=[]Eric Engestrom2018-06-011-16/+9
| | | | | Fixes: 5608d0a2cee47c7d037f ("meson: use array type options") Signed-off-by: Eric Engestrom <[email protected]>
* REVIEWERS: add root meson.build to the Meson reviewers groupEric Engestrom2018-06-011-0/+1
| | | | | Reviewed-by: Dylan Baker <[email protected]> Signed-off-by: Eric Engestrom <[email protected]>
* glsl: Add ir_binop_vector_extract in NIRJuan A. Suarez Romero2018-06-011-0/+9
| | | | | | | | | | | | | | | Implement ir_binop_vector_extract using NIR operations. Based on SPIR-V to NIR approach. This fixes: dEQP-GLES3.functional.shaders.indexing.moredynamic.with_value_from_indexing_expression_fragment Piglit's glsl-fs-vec4-indexing-8.shader_test CC: [email protected] Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Iago Toral <[email protected]>
* doc: update calendar, add news and link release notes for 18.1.1Dylan Baker2018-06-013-7/+8
|
* docs/relnotes: Add sha256 sums for mesa 18.1.1Dylan Baker2018-06-011-1/+2
|
* docs: Add release notes for 18.1.1Dylan Baker2018-06-011-0/+167
|
* i965: Add ARB_fragment_shader_interlock support.Plamena Manolova2018-06-0110-8/+37
| | | | | | | | | Adds suppport for ARB_fragment_shader_interlock. We achieve the interlock and fragment ordering by issuing a memory fence via sendc. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* mesa: Add GL/GLSL plumbing for ARB_fragment_shader_interlock.Plamena Manolova2018-06-0114-1/+197
| | | | | | | | | | | | | This extension provides new GLSL built-in functions beginInvocationInterlockARB() and endInvocationInterlockARB() that delimit a critical section of fragment shader code. For pairs of shader invocations with "overlapping" coverage in a given pixel, the OpenGL implementation will guarantee that the critical section of the fragment shader will be executed for only one fragment at a time. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* compiler/spirv: reject invalid shader code properlyMartin Pelikán2018-06-012-5/+38
| | | | | | | | | | After bebe3d626e5, b->fail_jump is prepared after vtn_create_builder which can longjmp(3) to it through its vtx_assert()s. This corrupts the stack and creates confusing core dumps, so we need to avoid it. While there, I decided to print the offending values for debugability. Reviewed-by: Jason Ekstrand <[email protected]>
* docs: change release manager for 18.1Juan A. Suarez Romero2018-06-011-7/+7
| | | | | | | | Dylan will replace Emil as the release manager for 18.1.x series. CC: Emil Velikov <[email protected]> CC: Dylan Baker <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* virgl: Always assume that ORIGIN_UPPER_LEFT and PIXEL_CENTER* are supportedGert Wollny2018-06-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver must support at least one of PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT and one of PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER otherwise glsl_to_tgsi will fire an assert. ORIGIN_UPPER_LEFT is the default convention, and is supported by all mesa drivers, hence it seems reasonable to always report the caps to be enabled. On gles ORIGIN_LOWER_LEFT is generally not supported, so we rely on the caps reported by the host that depend on whether we run on an GL or an EGL host. For PIXEL_CENTER it is completely host driver dependend on what is supported, and since we do not report the actual host driver capabilities it is best to mark both as supported, this is how it works for a GL host too. Fixes: dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_xyz dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1 dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2 Reviewed-by: Gurchetan Singh <[email protected]> Signed-off-by: Gert Wollny <[email protected]> Signed-off-by: Jakob Bornecrantz <[email protected]>