Commit message | Author | Age | Files | Lines
* radv: Implement VK_ANDROID_native_buffer.Bas Nieuwenhuizen2018-01-197-4/+407
    Passes dEQP-VK.api.smoke.* and dEQP-VK.wsi.android.* with android-cts-7.1_r12.

    Unlike the initial anv implementation, this uses syncobjs instead of waiting on the CPU.

    This is missing meson build coverage for now.

    One possible TODO: Linux 4.15 now has a syscall that allows us to export an amdgpu fence to a sync_file, which would let us avoid forcing all fences and semaphores to use syncobjs. However, I had trouble with my kernel crashing regularly with NULL pointers, and I'm not sure how beneficial it is in the first place, given that intel uses syncobjs for all fences if available.

    Reviewed-by: Dave Airlie <[email protected]>
* radv: Add create image flag to not use DCC/CMASK.Bas Nieuwenhuizen2018-01-192-19/+25
| | | | | | | If we import an image, we might not have space in the buffer for CMASK, even though it is compatible. Reviewed-by: Dave Airlie <[email protected]>
* radv: Generate VK_ANDROID_native_buffer.Bas Nieuwenhuizen2018-01-193-2/+9
| | | | Reviewed-by: Dave Airlie <[email protected]>
* radv: Replace an assert with unreachable.Bas Nieuwenhuizen2018-01-191-1/+1
| | | | | | Otherwise we get uninitialized variable warnings for es_vgpr_comp_cnt. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Remove DCC check on CS resolve dst image.Bas Nieuwenhuizen2018-01-191-3/+0
    The check gives a warning when asserts are disabled, and the asserted condition is not even necessarily true.

    Reviewed-by: Samuel Pitoiset <[email protected]>
* gallivm: support avx512 (16x32) in interleave2_halfGeorge Kyriazis2018-01-181-2/+38
    lp_build_interleave2_half was not doing the right thing for avx512-style 16-wide loads.

    This path is hit in the swr driver with a 16-wide vertex shader. It is called from lp_build_transpose_aos, when doing texel fetches and the fetched data needs to be transposed to one component per output register.

    Special-case the post-load swizzle operations for avx512 16x32 (16-wide 32-bit values) so that we move the xyzw components correctly to the outputs.

    Reviewed-by: Roland Scheidegger <[email protected]>
* vbo: fix VBO optimization regressionBrian Paul2018-01-182-4/+7
    The optimization in change 8e4efdc895ea ("vbo: optimize some display list drawing") missed the loopback case. This is used when the glBegin/End primitive doesn't have a uniform set of vertex attributes. The new Piglit gl-1.0-dlist-materials test hits this.

    So check the aligned_vertex_buffer_offset(list) value and adjust the buffer offset accordingly. We also need to remove the 'start == 0' assertion in the loopback code since it no longer applies.

    Reviewed-by: Roland Scheidegger <[email protected]>
* meson: ensure that xmlpool_options.h is generated for targets that need itDylan Baker2018-01-183-12/+12
    Currently a couple of gallium targets race with the generation of xmlpool_options.h; don't do that.

    Signed-off-by: Dylan Baker <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* ac: fix visit_ssa_undef() for doublesTimothy Arceri2018-01-191-2/+3
| | | | | | | | V2: use LLVMIntTypeInContext() Fixes: f4e499ec7914 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: account for view index in the user sgpr allocation.Dave Airlie2018-01-181-8/+34
    The view index user sgpr wasn't being accounted for properly. This refactors out the code that decides whether it's required and then uses that info to account for it.

    Fixes: 180c1b924e (ac/nir: Add shader support for multiviews.)
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600: enable ARB_enhanced_layoutsDave Airlie2018-01-193-3/+4
    Only one piglit test fails: sso-vs-gs-fs-array-interleave.

    Three tests that use SSBOs without checking sizes also fail, but those are test bugs.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* intel: Future-proof ring names for aubinator_error_decodeChris Wilson2018-01-181-24/+98
    The kernel is moving to a $class$instance naming scheme in preparation for accommodating more rings in the future in a consistent manner. It is already using the naming scheme internally, and now we are looking at updating some soft-ABI such as the error state to use the new naming scheme. This of course means we need to teach aubinator_error_decode how to map both sets of ring names onto its register maps.

    Signed-off-by: Chris Wilson <[email protected]>
    Cc: Michel Thierry <[email protected]>
    Cc: Michal Wajdeczko <[email protected]>
    Cc: Tvrtko Ursulin <[email protected]>
    Cc: Lionel Landwerlin <[email protected]>
    Cc: Kenneth Graunke <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Michel Thierry <[email protected]>
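    A rough illustration of mapping both naming schemes onto one (class, instance) key follows. This is a hedged Python sketch, not the code from the commit: the function name `parse_ring_name`, the table contents, and the specific legacy and class names are assumptions chosen for illustration.

```python
import re

# Assumed legacy ring names and their (class, instance) equivalents.
LEGACY_NAMES = {
    "render": ("rcs", 0),
    "blt": ("bcs", 0),
    "bsd": ("vcs", 0),
    "bsd2": ("vcs", 1),
    "vebox": ("vecs", 0),
}

# Assumed engine class prefixes used by the $class$instance scheme.
CLASS_NAMES = ("rcs", "vcs", "bcs", "vecs")

_CLASS_INSTANCE = re.compile(r"^([a-z]+)(\d+)$")


def parse_ring_name(name):
    """Return (class, instance) for either naming scheme, or None if unknown."""
    # Try the legacy table first so that "bsd2" is not misread as class "bsd".
    if name in LEGACY_NAMES:
        return LEGACY_NAMES[name]
    match = _CLASS_INSTANCE.match(name)
    if match and match.group(1) in CLASS_NAMES:
        return match.group(1), int(match.group(2))
    return None
```

    Once both spellings resolve to the same key, a single register-map lookup keyed on (class, instance) can serve old and new error states alike.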
* i965: Bind null render targets for shadow sampling + color.Kenneth Graunke2018-01-181-1/+32
    Portal 2 appears to bind RGBA8888_UNORM textures to a sampler2DShadow, and calls shadow2D() on it. This causes undefined behavior in OpenGL.

    Unfortunately, our sampler appears to hang in this scenario, which is not acceptable. Just give them a null surface instead, which returns all zeroes.

    Fixes GPU hangs in Portal 2 on Kabylake.

    Huge thanks to Jason Ekstrand for noticing this crazy behavior while sifting through crash dumps.

    Cc: [email protected]
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104487
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/query: implement multiview interactionsIago Toral Quiroga2018-01-181-0/+54
    From the Vulkan spec with KHX extensions:

      "If queries are used while executing a render pass instance that has multiview enabled, the query uses N consecutive query indices in the query pool (starting at query) where N is the number of bits set in the view mask in the subpass the query is used in. How the numerical results of the query are distributed among the queries is implementation-dependent. For example, some implementations may write each view's results to a distinct query, while other implementations may write the total result to the first query and write zero to the other queries. However, the sum of the results in all the queries must accurately reflect the total result of the query summed over all views. Applications can sum the results from all the queries to compute the total result."

    In our case we only really emit a single query (in the first query index) that stores the aggregated result for all views, but we still need to manage availability for all the other query indices involved, even if we don't actually use them.

    This is relevant when clients call vkGetQueryPoolResults and pass all N queries to retrieve the results. In that scenario, without this patch, we will never see queries other than the first being available since we never emit them.

    v2: we need the same treatment for timestamp queries.

    v3 (Jason):
     - Better an if instead of an early return.
     - We can't write to this memory in the CPU, we should use MI_STORE_DATA_IMM and emit_query_availability (Jason).

    v4 (Jason):
     - No need to take the value to write as parameter, just hard code it to 0.

    Fixes test failures in some work-in-progress CTS multiview+query tests.

    Reviewed-by: Jason Ekstrand <[email protected]>
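    The "total in the first query, zero in the rest" strategy described in this message can be modelled in a few lines. This is a hedged Python sketch, not driver code: `write_query_results`, the dictionary-based query pool, and the example view mask are hypothetical and exist only to show why every one of the N slots must be marked available.

```python
def popcount(mask):
    """Number of bits set in the view mask, i.e. N in the spec text."""
    return bin(mask).count("1")


def write_query_results(pool, first_query, view_mask, total):
    """Model one multiview query: aggregate in slot 0, zero in the others.

    Every one of the N consecutive slots is marked available, even though
    only the first one carries a non-zero value.
    """
    n = popcount(view_mask)
    for i in range(n):
        value = total if i == 0 else 0
        pool[first_query + i] = {"available": True, "value": value}
    return n


# Hypothetical pool of 8 query slots, none of them available yet.
pool = [{"available": False, "value": 0} for _ in range(8)]

# Three views enabled (bits 0, 1 and 3), so three consecutive slots are used.
n = write_query_results(pool, first_query=2, view_mask=0b1011, total=120)
results = pool[2:2 + n]

# An application summing all N slots recovers the total over all views.
summed = sum(r["value"] for r in results)
```

    If the trailing slots were never marked available, a client waiting on all N queries in this model would wait forever, which is the situation the patch addresses.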
* vc5: add missing files to the tarballEmil Velikov2018-01-181-0/+5
| | | | Signed-off-by: Emil Velikov <[email protected]>
* broadcom: add missing headers to the tarballEmil Velikov2018-01-181-2/+5
| | | | Signed-off-by: Emil Velikov <[email protected]>
* i965/screen: Allow drirc to set 'allow_rgb10_configs' again.Mario Kleiner2018-01-181-1/+6
    Since setup of ALLOW_RGB10_CONFIGS was moved to i965's own brw_config_options.xml, this was hard-coded to false and could not be overridden by drirc.

    Add some parsing into i965's private screen->optionCache to enable drirc again.

    Fixes: b391fb26df9f1b ("dri_util: remove ALLOW_RGB10_CONFIGS option (v2)")
    Signed-off-by: Mario Kleiner <[email protected]>
    Cc: Marek Olšák <[email protected]>
    Cc: Tapani Pälli <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* anv: return VK_ERROR_OUT_OF_DEVICE_MEMORY when surface size is out of HW limitsSamuel Iglesias Gonsálvez2018-01-181-4/+2
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* ac: tidy up array indexing logicTimothy Arceri2018-01-181-5/+1
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa/st: translate SO info in glsl_to_nir() caseRob Clark2018-01-181-4/+43
    This was handled for VS, but not for GS.

    Fixes for gallium drivers using nir:
      spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams-without-invocations
      spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams*
      spec@arb_transform_feedback3@arb_transform_feedback3-ext_interleaved_two_bufs_gs*
      spec@ext_transform_feedback@geometry-shaders-basic
      spec@ext_transform_feedback@* use_gs
      [email protected]@execution@geometry@primitive-id*
      [email protected]@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip *
      [email protected]@transform-feedback-builtins
      [email protected]@transform-feedback-type-and-size

    v2: don't call st_translate_program_stream_output) for TCS
    v3: drop scanning patch outputs as TCS can't output xfb

    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
    Tested-by: Karol Herbst <[email protected]>
* r600/sb: add lds related peepholes.Dave Airlie2018-01-181-1/+8
    If there is no destination:
     a) convert _RET instructions to their non-_RET variants
     b) set src0 to undefined if it's a READ; this should then get removed by DCE.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: use different stacks for tracking lds and queue usage.Dave Airlie2018-01-182-3/+24
    The normal ssa renumbering isn't sufficient for LDS queue access. This uses two stacks, one for the lds queue and one for the lds r/w ordering.

    The LDS oq values are incremented at their uses in a linear fashion. The LDS rw values are incremented at their definitions and used in the next lds operation to ensure reordering doesn't occur.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: schedule LDS ops in appropriate places.Dave Airlie2018-01-182-0/+7
    LDS ops have to be SLOT_X, and LDS OQ reads have read port restrictions, so we try to force those into only having one per slot and avoiding bank swizzles.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: hit the scheduler with a big hammer to avoid lds splits.Dave Airlie2018-01-181-0/+3
    This tries to avoid an lds queue read getting scheduled separately from an lds ret read. The non-sb code uses the same style of hammer; this isn't foolproof.

    We can do better, but it's a bit tricky, as you have to scan ahead and either schedule more lds oq moves and more lds reads and that could lead to you running out of space anyway.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: adding lds oq tracking to the schedulerDave Airlie2018-01-182-3/+15
    This adds support for tracking the lds oq reads/writes so we can avoid scheduling other things in between.

    This patch just adds the tracking and an assert to show problems.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add gcm support to avoid clause between lds read/queue readDave Airlie2018-01-182-2/+17
    You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP in the same basic block/clause. This makes sure that once we've issued a MOV we don't add another block until we balance it with an LDS read.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle lds special dest registers.Dave Airlie2018-01-182-2/+2
| | | | | | | This adds lds to the geom emit handling Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: handle LDS operations in folding.Dave Airlie2018-01-181-0/+11
| | | | | | | Don't try and fold LDS using expressions. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add finalising for lds output queue special values.Dave Airlie2018-01-181-0/+12
| | | | | | | We need to convert these to the hw special registers. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add initial support for parsing lds operations.Dave Airlie2018-01-181-2/+50
    This handles parsing the LDS ops and queue accesses.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: disable if conversion for hsDave Airlie2018-01-181-1/+1
| | | | | | | This fixes bad interactions with the LDS special values. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: lds ops have no dst register.Dave Airlie2018-01-181-1/+1
| | | | | | | Although these are op3s they don't have a dst reg. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: introduce special register values for lds support.Dave Airlie2018-01-183-1/+33
    For LDS read/write ordering we use the LDS_RW value; reads will wait on previous writes.

    For LDS read/read from LDS queue ordering we use the LDS_OQ values. We define two for now, though initially we'll just support OQA.

    Also add the check for the lds oq values.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: update last_cf if alu is the last clauseDave Airlie2018-01-181-0/+1
    It's rare to have a final alu clause on normal shaders (exports), but tess shaders write to LDS as their output, so we see some alu clauses, and the CF_END gets put in the wrong place.

    This makes sure to update last_cf correctly.

    Acked-By: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: start adding GDS supportDave Airlie2018-01-1813-13/+123
| | | | | | | | | This adds support for GDS ops to sb backend. This seems to work for atomics and tess factor writes. Acked-By: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: add tess/compute initial state registers.Dave Airlie2018-01-181-1/+4
| | | | | | | This stops them being optimised out. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600/sb: fix a bug emitting ar load from a constant.Dave Airlie2018-01-181-0/+3
    Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then hitting an assert in sb_bc_finalize.cpp:translate_kcache.

    This makes sure the toplevel kcache tracker gets updated, and the clause gets fixed up.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* r600/shader: only emit add instruction if param has a value.Dave Airlie2018-01-181-6/+8
| | | | | | | Just saves a pointless a = a + 0; Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: emit 0 gds_op for tf write.Dave Airlie2018-01-181-2/+3
| | | | | | | This field is ignored for tf writes so should be 0. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for ARB_shader_clock.Dave Airlie2018-01-184-6/+30
| | | | | Reviewed-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ws: get rid of useless return valueDave Airlie2018-01-181-3/+2
| | | | | | | This also used boolean, so nice to kill that. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: Initialize DCC on transition from preinitialized.Bas Nieuwenhuizen2018-01-181-1/+3
    Looks like the decompress does not handle invalid encodings well, which happens with random memory. Of course apps should not use it with random memory, but they are allowed to.

    Fixes: 44fcf58744 "radv: Disable DCC for GENERAL layout and compute transfer dest."
    Reviewed-by: Dave Airlie <[email protected]>
* ac: fix buffer overflow bug in 64bit SSBO loadsTimothy Arceri2018-01-181-1/+4
| | | | | | Fixes: 441ee1e65b04 "radv/ac: Implement Float64 SSBO loads" Reviewed-by: Marek Olšák <[email protected]>
* ac: fix nir_intrinsic_get_buffer_size for radeonsiTimothy Arceri2018-01-181-2/+2
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Pass brw_growing_bo to grow_buffer().Kenneth Graunke2018-01-171-11/+9
| | | | | | Cleaner. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Make a helper for recreating growing buffers.Kenneth Graunke2018-01-171-13/+17
| | | | | | | | | | | Now that we have two of these, we're duplicating a bunch of this logic. The next commit will add more logic, which would make the duplication seem worse. This ends up setting EXEC_OBJECT_CAPTURE on the batch, which isn't necessary (it's already captured), but it should be harmless. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Replace cpu_map pointers with a "use_shadow_copy" boolean.Kenneth Graunke2018-01-172-21/+20
| | | | | | | | Having a boolean for "we're using malloc'd shadow copies for all buffers" is cleaner than having a cpu_map pointer for each. It was okay when we had one buffer, but this is more obvious. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Optimize and simplify the copy propagation dataflow logic.Francisco Jerez2018-01-171-24/+11
    Previously the dataflow propagation algorithm would calculate the ACP live-in and -out sets in a two-pass fixed-point algorithm. The first pass would update the live-out sets of all basic blocks of the program based on their live-in sets, while the second pass would update the live-in sets based on the live-out sets. This is incredibly inefficient in the typical case where the CFG of the program is approximately acyclic, because it can take up to 2*n passes for an ACP entry introduced at the top of the program to reach the bottom (where n is the number of basic blocks in the program), until which point the algorithm won't be able to reach a fixed point.

    The same effect can be achieved in a single pass by computing the live-in and -out sets in lock-step, because that makes sure that processing of any basic block will pick up the updated live-out sets of the lexically preceding blocks. This gives the dataflow propagation algorithm effectively O(n) run-time instead of O(n^2) in the acyclic case.

    The time spent in dataflow propagation is reduced by 30x in the GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP test-case on my CHV system (the improvement is likely to be of the same order of magnitude on other platforms). This more than reverses an apparent run-time regression in this test-case from my previous copy-propagation undefined-value handling patch, which was ultimately caused by the additional work introduced in that commit to account for undefined values being multiplied by a huge quadratic factor.

    According to Chad this test was failing on CHV due to a 30s time-out imposed by the Android CTS (this was the case regardless of my undefined-value handling patch, even though my patch substantially exacerbated the issue). On my CHV system this patch reduces the overall run-time of the test by approximately 12x, getting us to around 13s, well below the time-out.

    v2: Initialize live-out set to the universal set to avoid rather pessimistic dataflow estimation in shaders with cycles (Addresses performance regression reported by Eero in GpuTest Piano). Performance numbers given above still apply. No shader-db changes with respect to master.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
    Reported-by: Chad Versace <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
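    The difference between the two-pass and lock-step schemes can be reproduced on a toy CFG. This is a hedged Python sketch, not the i965 code: the function names, the set-based representation, and the straight-line test CFG are assumptions made for illustration, and the sets start empty rather than using the universal-set initialization mentioned in v2.

```python
def meet(preds, liveout, b):
    """Copies available on entry to b: available at the end of every predecessor."""
    if not preds[b]:
        return frozenset()
    result = liveout[preds[b][0]]
    for p in preds[b][1:]:
        result = result & liveout[p]
    return result


def transfer(gen, kill, b, livein):
    """Copies available on exit: generated in b, or live-in and not killed."""
    return gen[b] | (livein - kill[b])


def two_pass(preds, gen, kill):
    """Old scheme: update all live-out sets, then all live-in sets, per sweep."""
    n = len(preds)
    livein = [frozenset()] * n
    liveout = [frozenset()] * n
    sweeps, changed = 0, True
    while changed:
        changed = False
        sweeps += 1
        for b in range(n):
            new_out = transfer(gen, kill, b, livein[b])
            if new_out != liveout[b]:
                liveout[b], changed = new_out, True
        for b in range(n):
            new_in = meet(preds, liveout, b)
            if new_in != livein[b]:
                livein[b], changed = new_in, True
    return liveout, sweeps


def lock_step(preds, gen, kill):
    """New scheme: compute live-in and live-out of each block together, in order."""
    n = len(preds)
    liveout = [frozenset()] * n
    sweeps, changed = 0, True
    while changed:
        changed = False
        sweeps += 1
        for b in range(n):
            new_out = transfer(gen, kill, b, meet(preds, liveout, b))
            if new_out != liveout[b]:
                liveout[b], changed = new_out, True
    return liveout, sweeps


# Straight-line CFG of 8 blocks; a single copy is introduced in block 0.
N = 8
preds = [[]] + [[i - 1] for i in range(1, N)]
gen = [frozenset({"a = b"})] + [frozenset()] * (N - 1)
kill = [frozenset()] * N

old_out, old_sweeps = two_pass(preds, gen, kill)
new_out, new_sweeps = lock_step(preds, gen, kill)
```

    In this toy case both schemes converge to the same live-out sets, but the two-pass version needs a number of sweeps that grows with the block count, while the lock-step version needs one sweep to propagate and one to confirm the fixed point.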
* gallium: remove PIPE_CAP_USER_CONSTANT_BUFFERSMarek Olšák2018-01-1718-26/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* st/mesa: assume that user constant buffers are always supportedMarek Olšák2018-01-174-34/+6
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]>