Commit message | Author | Age | Files | Lines
* lima/ppir: clone ld_{uni,tex,var} into each block | Vasily Khoruzhick | 2019-08-23 | 4 | -5/+103

  ppir_lower_load() and ppir_lower_load_texture() assume that a node is in the same block as its successors. Fix this by cloning each ld_uni and ld_tex into every block. This also reduces register pressure, since values never cross block boundaries and thus never appear in live_in or live_out of any block, so do it for varyings as well.

  Tested-by: Andreas Baierl <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Reviewed-by: Erico Nunes <[email protected]>
  Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: refactor const lowering | Vasily Khoruzhick | 2019-08-23 | 6 | -128/+99

  Const nodes are now cloned for each user, i.e. a const is guaranteed to have exactly one successor, so we can use ppir_do_one_node_to_instr() and drop insert_to_each_succ_instr().

  Tested-by: Andreas Baierl <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Reviewed-by: Erico Nunes <[email protected]>
  Signed-off-by: Vasily Khoruzhick <[email protected]>
* anv: Only re-emit non-dynamic state that has changed. | Rafael Antognolli | 2019-08-23 | 2 | -24/+50

  In commit f6e7de41d7b, we started emitting 3DSTATE_LINE_STIPPLE as part of the non-dynamic state. That state gets re-emitted every time we bind a new VkPipeline. But that instruction is non-pipelined, and it caused a perf regression of about 9-10% on Dota2.

  This commit makes anv_dynamic_state_copy() return a mask containing only the state that changed during the copy. 3DSTATE_LINE_STIPPLE is no longer emitted unless it has changed, fixing the problem above.

  v2: Improve commit message and add documentation about skipped checks (Jason)

  Fixes: f6e7de41d7b ("anv: Implement VK_EXT_line_rasterization")
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* pan/decode: Validate and quiet helper invocation flag | Alyssa Rosenzweig | 2019-08-23 | 1 | -1/+8

  We can statically determine from the disassembly if helper invocations will be needed, so we can validate the corresponding bit in the cmdstream and thus avoid printing the bit itself in the decode.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Analyze helper invocations | Alyssa Rosenzweig | 2019-08-23 | 2 | -0/+22

  We check for texture ops which calculate derivatives (either explicitly via dFd* or implicitly) and mark the shader as requiring helper invocations.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
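A rough sketch of such an analysis pass, assuming a flat list of texture-pipeline ops. The `tex_op` enum and both function names are invented for illustration; the real compiler walks its own instruction representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op kinds, not the Midgard compiler's own enums. */
enum tex_op {
   TEX_OP_SAMPLE_IMPLICIT_LOD, /* LOD derived from screen-space derivatives */
   TEX_OP_SAMPLE_EXPLICIT_LOD, /* LOD supplied by the shader */
   TEX_OP_DFDX,                /* explicit derivative */
   TEX_OP_DFDY,                /* explicit derivative */
};

/* True if the op needs neighbouring pixels in the quad to be executed. */
static bool
op_computes_derivatives(enum tex_op op)
{
   switch (op) {
   case TEX_OP_SAMPLE_IMPLICIT_LOD:
   case TEX_OP_DFDX:
   case TEX_OP_DFDY:
      return true;
   default:
      return false;
   }
}

/* Scan the ops once; a single derivative-computing op is enough to mark
 * the whole shader as requiring helper invocations. */
static bool
shader_needs_helper_invocations(const enum tex_op *ops, size_t count)
{
   assert(ops != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (op_computes_derivatives(ops[i]))
         return true;
   }
   return false;
}
```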
* util: fix compilation on macos | Lionel Landwerlin | 2019-08-23 | 1 | -1/+1

  timespec_get() is not available on macOS, so we need to pull in the include/c11/threads_posix.h helper.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103674
  Fixes: e2d761de03 ("util: drop final reference to p_compiler.h")
  Reviewed-by: Eric Engestrom <[email protected]>
* i965: Silence brw_blorp uninitialized warning | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+1

  The variables level and start_layer are declared uninitialized and only assigned if BUFFER_BIT_DEPTH is set. We assert on them later under the same check. This should be enough, but GCC 9.1.1 is not convinced, so let's initialize the variables.

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* tgsi: Remove unused local | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+0

  Code that used it was removed in 4ebe6b2e72e ("tgsi: Drop the SSE2 constants setup that's been dead code since 2011.")

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Guard GEN9-only function in Iris state to avoid warning | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -0/+2

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/decoders: Avoid uninitialized variable warnings | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -2/+2

  Initialize `next_batch_addr` and `second_level`. If the batch is well formed, those values will be overridden; if not, they are as good as uninitialized garbage.

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/glsl: Fix warning about unused function | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+3

  The helper check_node_type() is only used when DEBUG is set (in the function below), but the ASSERTED macro uses NDEBUG. So just guard the helper with #ifdef. If we see more such cases we might consider an ASSERTED-like macro for the DEBUG case.

  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Drop unused local variable | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+0

  Leftover from 021fa28163a ("xintel/nir: Add a helper for getting BRW_AOP from an intrinsic").

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence maybe-uninitialized warning in GCC 9.1.1 | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+3

  Compiler can't see that d is initialized.

    ../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
    ../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      984 | d = MAX2(-d, d);

  Assert that we expect at least one component, hence d is going to be set. That by itself is not enough, so also zero initialize the variable.

  Acked-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* radv: additional query fixes | Andres Rodriguez | 2019-08-17 | 1 | -7/+8

  Make sure we read the updated data from the GPU in cases where WAIT_BIT is not set.

  Cc: 19.1 19.2 <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: Fix large timeout handling in rel2abs() | Kenneth Graunke | 2019-08-23 | 1 | -13/+14

  ...by copying the implementation of anv_get_absolute_timeout().

  Appears to fix a CTS test with 32-bit builds: GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush

  Fixes: f459c56be6b ("iris: Add fence support using drm_syncobj")
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
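The usual pitfall when converting a relative timeout to an absolute one is overflow when the caller passes a very large value. The sketch below shows one way to clamp; it is written from scratch for illustration, takes the current time as a parameter so it can be tested, and is not a copy of either driver's function. It assumes the kernel interface wants a signed 64-bit absolute time.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a relative timeout (ns) into an absolute deadline (ns) that
 * never exceeds INT64_MAX. now_ns is passed in rather than read from a
 * clock; that parameter is an assumption made for this sketch. */
static uint64_t
absolute_timeout(uint64_t now_ns, uint64_t timeout_ns)
{
   assert(now_ns <= (uint64_t)INT64_MAX);

   /* A zero timeout means "poll", so there is no deadline to compute. */
   if (timeout_ns == 0)
      return 0;

   /* Clamp before adding so the sum cannot wrap or go negative when
    * reinterpreted as int64_t. */
   uint64_t max_timeout = (uint64_t)INT64_MAX - now_ns;
   if (timeout_ns > max_timeout)
      timeout_ns = max_timeout;

   return now_ns + timeout_ns;
}
```

With this shape, an "infinite" timeout of UINT64_MAX simply saturates at INT64_MAX instead of wrapping around to a deadline in the past.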
* iris: Set MOCS in all STATE_BASE_ADDRESS commands | Kenneth Graunke | 2019-08-23 | 1 | -1/+14

  Rafael Antognolli tracked down a performance gap between i965 and iris in Synmark2's OglCSDof microbenchmark, noting that iris was performing substantially more memory reads and writes, with substantially fewer L3 hits. He suggested that something might be wrong with MOCS, or L3 configs, at which point I came up with a theory...

  It would appear that the STATE_BASE_ADDRESS command updates the MOCS settings for various base addresses even if you don't specify the "Modify Enable" bit for that address. Until now, we had been setting only the MOCS for bases we intended to change, leaving the others "blank" which is MOCS table entry 0, which is uncached.

  Most data access has a more specific MOCS (e.g. in SURFACE_STATE), but scratch access uses the Stateless Data Port Access MOCS from STATE_BASE_ADDRESS. So this meant all scratch access was uncached.

  Improves performance in Synmark2's OglCSDof by 2x, bringing iris on par with the existing i965 driver.

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* glx: Fix up glXQueryGLXPbufferSGIX on macOS. | Vinson Lee | 2019-08-23 | 1 | -1/+0

  Fix this build error on macOS.

    ../src/glx/apple/glx_empty.c:158:4: error: void function 'glXQueryGLXPbufferSGIX' should not return a value [-Wreturn-type]
       return 0;
       ^      ~

  Fixes: 3dd299c3d5b8 ("glx: Sync <GL/glxext.h> with Khronos")
  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Adam Jackson <[email protected]>
* docs: update calendar, add news item and link release notes for 19.1.5 | Juan A. Suarez Romero | 2019-08-23 | 3 | -7/+8

  Signed-off-by: Juan A. Suarez Romero <[email protected]>
* docs: add sha256 checksums for 19.1.5 | Juan A. Suarez Romero | 2019-08-23 | 1 | -1/+1

  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  (cherry picked from commit ae2a676cd1748c850f579863003c92f2b137f44a)
* docs: add release notes for 19.1.5 | Juan A. Suarez Romero | 2019-08-23 | 1 | -0/+119

  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  (cherry picked from commit a384fe0cebf1fcd6671c51c749fcc981e01b5505)
* radeonsi/nir: Rewrite output scanning | Connor Abbott | 2019-08-23 | 1 | -126/+150

  Similarly to before, this didn't properly handle varying structs with doubles in them. This doesn't fix any tests, but was noticed while looking at the code.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Rewrite store intrinsic gathering | Connor Abbott | 2019-08-23 | 1 | -59/+84

  The old version wasn't as accurate as it could be, and didn't handle double variables inside structs correctly. Walk the path to compute the actual components affected.

  In combination with the previous commit, this fixes KHR-GL45.enhanced_layouts.varying_structure_locations.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Add const_index when loading GS inputs | Connor Abbott | 2019-08-23 | 1 | -1/+1

  This fixes loading GS inputs in structures or arrays.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Don't add const offset to indirect | Connor Abbott | 2019-08-23 | 1 | -19/+6

  This is already done in get_deref_offset() in the common code. We were adding it twice accidentally.

  Fixes KHR-GL45.enhanced_layouts.varying_array_locations.

  Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: Assert GS input index is constant | Connor Abbott | 2019-08-23 | 1 | -0/+1

  If it's not, we silently ignore indir_index, which is definitely a bug.

  Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: Handle const array offsets in get_deref_offset() | Connor Abbott | 2019-08-23 | 1 | -6/+11

  Some users of this function (e.g. GS inputs) currently only work with constant offsets. We got lucky since all the tests used an array index of 0, so the non-constant part was always 0. But we still need to handle this.

  This doesn't fix any CTS test, but was noticed while debugging one.

  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Don't recompute num_inputs and num_outputs | Connor Abbott | 2019-08-23 | 1 | -24/+3

  Don't repeat what mesa/st already does.

  Reviewed-by: Marek Olšák <[email protected]>
* st/nir: Fix num_inputs for VS inputs | Connor Abbott | 2019-08-23 | 1 | -3/+2

  Reviewed-by: Marek Olšák <[email protected]>
* radv/gfx10: do not use NGG with NAVI14 | Samuel Pitoiset | 2019-08-23 | 1 | -0/+1

  Cc: 19.2 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0 | Samuel Pitoiset | 2019-08-23 | 1 | -1/+2

  Only gfx9 and older use it to get InstanceID in VGPR1. Ported from RadeonSI.

  Cc: 19.2 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gitlab-ci: bump LLVM to 8 for meson-vulkan and meson-clover | Samuel Pitoiset | 2019-08-23 | 1 | -2/+3

  To fix pipeline builds.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* ac,radv,radeonsi: remove LLVM 7 support | Samuel Pitoiset | 2019-08-23 | 14 | -321/+66

  Now that LLVM 9 will be released soon, we will only support LLVM 8, 9 and master (10).

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* egl: reset blob cache set/get functions on terminate | Tapani Pälli | 2019-08-23 | 1 | -0/+4

  Fixes errors seen with eglSetBlobCacheFuncsANDROID on Android when running dEQP that terminates and reinitializes a display.

  Fixes: 6f5b57093b3 "egl: add support for EGL_ANDROID_blob_cache"
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
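To illustrate why stale callbacks break re-initialization, here is a small model. It assumes the setter rejects a second registration on the same display, which is how I read the EGL_ANDROID_blob_cache extension; the `display` struct, the function names and the no-op callbacks are all hypothetical and do not mirror Mesa's internal types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*blob_set_fn)(const void *key, long key_size,
                            const void *value, long value_size);
typedef long (*blob_get_fn)(const void *key, long key_size,
                            void *value, long value_size);

/* Hypothetical display object holding the registered callbacks. */
struct display {
   bool        initialized;
   blob_set_fn blob_set;
   blob_get_fn blob_get;
};

/* Placeholder callbacks, only here so the sketch can be exercised. */
static void
noop_set(const void *key, long key_size, const void *value, long value_size)
{
   (void)key; (void)key_size; (void)value; (void)value_size;
}

static long
noop_get(const void *key, long key_size, void *value, long value_size)
{
   (void)key; (void)key_size; (void)value; (void)value_size;
   return 0;
}

/* Register callbacks; fails if either is NULL or a pair is already set. */
static bool
display_set_blob_funcs(struct display *dpy, blob_set_fn set, blob_get_fn get)
{
   assert(dpy != NULL);
   if (set == NULL || get == NULL)
      return false;
   if (dpy->blob_set != NULL || dpy->blob_get != NULL)
      return false;
   dpy->blob_set = set;
   dpy->blob_get = get;
   return true;
}

/* Terminate the display and forget the callbacks, so that a later
 * re-initialization can register them again. */
static void
display_terminate(struct display *dpy)
{
   assert(dpy != NULL);
   dpy->initialized = false;
   dpy->blob_set = NULL;
   dpy->blob_get = NULL;
}
```

Without the two resets in `display_terminate()`, the second registration after a terminate/initialize cycle would be rejected, which matches the kind of failure the commit message describes.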
* iris: Avoid unnecessary resolves on transfer maps | Kenneth Graunke | 2019-08-22 | 2 | -16/+31

  We were always resolving the buffer as if we were accessing it via CPU maps, which don't understand any auxiliary surfaces. But we often copy to a temporary using BLORP, which understands compression just fine. So we can avoid the resolve, and accelerate the copy as well.

  Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Drop copy format hacks from copy region based transfer path. | Kenneth Graunke | 2019-08-22 | 1 | -16/+5

  This doesn't work for compressed formats, as the source texture and temporary texture would have different block sizes. (Forcing the driver to always take the GPU path would expose the bug.)

  Instead, just use the source format for the temporary, and let blorp_copy deal with overrides.

  The one case where we can't do this is ASTC, because isl won't let us create a linear ASTC surface. Fall back to the CPU paths there for now.

  Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Update fast clear colors on Gen9 with direct immediate writes. | Kenneth Graunke | 2019-08-22 | 2 | -9/+26

  Gen11 stores the fast clear color in an "indirect clear buffer", as a packed pixel value. Gen9 hardware stores it as a float or integer value, which is interpreted via the format. We were trying to store that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM it from there to the actual SURFACE_STATE bytes where it's stored.

  This unfortunately doesn't work for blorp_copy(), which does bit-for-bit copies, and overrides the format to a CCS-compatible UINT format. This causes the clear color to be interpreted in the overridden format.

  Normally, we provide the clear color on the CPU, and blorp_blit.c:2611 converts it to a packed pixel value in the original format, then unpacks it in the overridden format, so the clear color we use expands to the bits we originally desired. However, BLORP doesn't support this pack/unpack with an indirect clear buffer, as it would need to do the math on the GPU. On Gen11+, it isn't necessary, as the hardware does the right thing.

  This patch changes Gen9 to stop using an indirect clear buffer and simply do PIPE_CONTROLs with post-sync write immediate operations to store the new color over the surface states for regular drawing. BLORP continues streaming out surface states, and handles fast clear colors on the CPU.

  Fixes: 53c484ba8ac ("iris: blorp using resolve hooks")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Fix broken aux.possible/sampler_usages bitmask handling | Kenneth Graunke | 2019-08-22 | 1 | -5/+6

  For renderable surfaces, we allocate SURFACE_STATEs for each bit in res->aux.possible_usages. Sampler views use res->aux.sampler_usages. When pinning buffers, we call surf_state_offset_for_aux() to calculate the offset to the desired surface state.

  surf_state_offset_for_aux() took an aux_modes parameter, which should be one of those two fields. However...it was not using that parameter. It always used the broader res->aux.possible_usages field directly.

  One of the callers, update_clear_value(), was passing incorrect masks for this parameter. It iterated through the bits in order, using u_bit_scan(), which destructively modifies the mask. So each time we called it, the count of bits before our selected mode was 0, which would cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE, rather than updating each in turn. This was hidden by the earlier bug where surf_state_offset_for_aux() ignored the parameter.

  Fixes: 7339660e803 ("iris: Add aux.sampler_usages.")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Replace devinfo->gen with GEN_GEN | Kenneth Graunke | 2019-08-22 | 1 | -22/+18

  This is genxml, we can compile out this code.

  Fixes: 26606672847 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
  Reviewed-by: Rafael Antognolli <[email protected]>
* pan/midgard: Fix writeout combining | Alyssa Rosenzweig | 2019-08-22 | 1 | -1/+4

  shader-db regression in the scheduler.

  Fixes: dff4986b1aa ("pan/midgard: Emit store_output branch just-in-time")

    total bundles in shared programs: 2055 -> 2019 (-1.75%)
    bundles in affected programs: 1055 -> 1019 (-3.41%)
    helped: 36
    HURT: 0
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 0.35% max: 20.00% x̄: 6.71% x̃: 5.16%
    95% mean confidence interval for bundles value: -1.00 -1.00
    95% mean confidence interval for bundles %-change: -8.45% -4.97%
    Bundles are helped.

    total quadwords in shared programs: 3444 -> 3408 (-1.05%)
    quadwords in affected programs: 1897 -> 1861 (-1.90%)
    helped: 36
    HURT: 0
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 0.19% max: 14.29% x̄: 3.97% x̃: 2.99%
    95% mean confidence interval for quadwords value: -1.00 -1.00
    95% mean confidence interval for quadwords %-change: -5.08% -2.86%
    Quadwords are helped.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Implement gl_FragCoord correctly | Alyssa Rosenzweig | 2019-08-22 | 5 | -25/+32

  Rather than passing through the transformed gl_Position, we can use the hardware-level varying for this, which will correctly handle gl_FragCoord.w.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove vertex buffer offset from its size | Alyssa Rosenzweig | 2019-08-22 | 1 | -2/+5

  The offset is added to the base address, so we need to subtract it from the size to maintain the same end address and thus prevent a buffer overflow:

    end_address = start_address + size

    start_address' = start_address + offset
    size' = size - offset

    end_address' = start_address' + size'
                 = (start_address + offset) + (size - offset)
                 = (start_address + size) + (offset - offset)
                 = start_address + size
                 = end_address

  QED.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
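The same invariant can be checked in code. This is a minimal sketch with a made-up `vertex_buffer_range` struct; it assumes the offset never exceeds the buffer size, which the subtraction needs in order not to wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the memory range the hardware may read. */
struct vertex_buffer_range {
   uint64_t address; /* start address */
   uint64_t size;    /* bytes readable from address */
};

/* Advance the start by offset and shrink the size by the same amount,
 * so that address + size (the end address) is unchanged. */
static struct vertex_buffer_range
apply_offset(struct vertex_buffer_range range, uint64_t offset)
{
   assert(offset <= range.size); /* assumption: offset lies inside the buffer */
   range.address += offset;
   range.size -= offset;
   return range;
}
```

For example, a 256-byte buffer at 0x1000 with an offset of 64 becomes a 192-byte range at 0x1040, and both describe an end address of 0x1100.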
* pan/decode: Handle special varyings | Alyssa Rosenzweig | 2019-08-22 | 1 | -5/+41

  We need a special path for special varyings so we parse them correctly instead of throwing an error when they inevitably point to bad memory.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Remove size/stride divisibility check | Alyssa Rosenzweig | 2019-08-22 | 1 | -7/+3

  The hardware doesn't care, and a lot of Panfrost code relies on an oversized buffer. The important part is that (stride * padded_num_vertices) is no greater than size, which we'll need to check once we validate instancing.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Decouple attribute/meta printing | Alyssa Rosenzweig | 2019-08-22 | 1 | -4/+8

  They are independent fields, so the parser should reflect that.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Print stub for uniforms | Alyssa Rosenzweig | 2019-08-22 | 1 | -1/+11

  We don't need to dump the contents necessarily, but having the stub with the address is useful.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Decode actual varying_meta address | Alyssa Rosenzweig | 2019-08-22 | 1 | -1/+1

  I don't know who thought this mask was a good idea but unfortunately it must have been me.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Downgrade shader property mismatch to warning | Alyssa Rosenzweig | 2019-08-22 | 1 | -1/+2

  If we permit more $whatever through than the shader needs, that's a bit of a waste, but it isn't an error.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Validate, but do not print, index buffer | Alyssa Rosenzweig | 2019-08-22 | 2 | -30/+29

  We don't actually care about the *contents* of the index buffer, but we would rather like to ensure it is present and of the correct size.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Validate mali_shader_meta stats | Alyssa Rosenzweig | 2019-08-22 | 1 | -35/+78

  We can infer these stats in many cases from the disassembly, so we should try to sanity check where we can. We may need to be fuzzy about analysis, since analysis gives us a bound but we don't mind if it's not used fully by the shader.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Disassemble before printing shader descriptor | Alyssa Rosenzweig | 2019-08-22 | 1 | -10/+8

  This allows the shader descriptor to access the disassembled stats.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>