Commit message | Author | Age | Files | Lines
* glsl: Make interlock builtins follow same compiler rules as barriers | Caio Marcelo de Oliveira Filho | 2019-06-10 | 1 | -5/+10
  Generalize the barrier code to provide correct error messages for other builtins. Fixes most of the piglit compilation tests for ARB_fragment_shader_interlock.
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Plamena Manolova <[email protected]>
* nir/opt_algebraic: Fix rules for imadsh_mix16 | Eduardo Lima Mitev | 2019-06-10 | 1 | -2/+2
  The rules added in patch 3addd7c are inverted. It should be:
      (al * bh) << 16 + c
  instead of:
      (ah * bl) << 16 + c
  Fixes a number of regressions under dEQP-GLES31.functional.draw_indirect.compute_interop.large.* on Freedreno.
  Reviewed-by: Rob Clark <[email protected]>
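The corrected formula can be checked in plain C. This is a minimal sketch, not Mesa's NIR opcode definition: it assumes `imadsh_mix16(a, b, c)` computes `((low16(a) * high16(b)) << 16) + c` modulo 2^32, exactly as the corrected rule reads, and uses it twice (with the hypothetical helpers `umul_low16` and `imul_via_mix16`) to rebuild a full 32x32 -> 32-bit multiply from 16-bit pieces. Note that C needs extra parentheses: written literally, `x << 16 + c` parses as `x << (16 + c)`.

```c
#include <stdint.h>

/* Assumed semantics, per the corrected rule: ((al * bh) << 16) + c, mod 2^32.
 * The shift must be parenthesized; in C "x << 16 + c" means x << (16 + c). */
static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t al = a & 0xffffu; /* low 16 bits of a  */
    uint32_t bh = b >> 16;     /* high 16 bits of b */
    return ((al * bh) << 16) + c;
}

/* Hypothetical helper: product of the two low halves. */
static uint32_t umul_low16(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) * (b & 0xffffu);
}

/* a * b mod 2^32 == al*bl + ((al*bh + ah*bl) << 16), because the ah*bh
 * term is shifted left by 32 and vanishes. The inner call adds
 * (al*bh) << 16; the outer call, with operands swapped, adds (bl*ah) << 16. */
static uint32_t imul_via_mix16(uint32_t a, uint32_t b)
{
    return imadsh_mix16(b, a, imadsh_mix16(a, b, umul_low16(a, b)));
}
```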
* panfrost: Ignore discards in dead branch analysis | Alyssa Rosenzweig | 2019-06-10 | 1 | -0/+5
  Fixes regressions in dEQP-GLES2.functional.shaders.discard.dynamic_loop_*
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radv: fix setting CB_SHADER_MASK for dual source blending | Samuel Pitoiset | 2019-06-10 | 1 | -2/+5
  CB_SHADER_MASK was computed without the second color buffer format, which looks totally wrong to me. While we are at it, copy a comment from RadeonSI.
  Cc: 19.0 19.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* panfrost/midgard: Disambiguate register mode | Alyssa Rosenzweig | 2019-06-10 | 1 | -1/+11
  We postfix instructions with their size if a destination override is in place (à la AT&T assembly), disambiguating instruction sizes. Previously, "16-bit instruction, 16-bit dest, 16-bit sources" disassembled identically to "32-bit instruction, 16-bit dest, 16-bit sources", even though the two are semantically distinct: the latter offers less opportunity for parallelism but (potentially) greater precision. Adding a postfix removes the ambiguity and spares the reader mental gymnastics when reading weird disassemblies, even in some cases that are not ambiguous.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Expose vec8/vec16 modes | Alyssa Rosenzweig | 2019-06-10 | 1 | -236/+273
  Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit, vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal circumstances. Nevertheless, the other modes do exist and are easily accessible through OpenCL; they also come up in cases like blend shaders.
  While we have had minimal support for decoding 8-bit/64-bit modes, we did so pretending they were vec4 in each case; 16-bit registers had a synthetically duplicated register file to separate lo/hi halves, etc. This works for GL, but it doesn't map to what the hardware is -actually- doing, which can cause some headscratchingly bizarre disassemblies from OpenCL.
  So, we dive into the deep end and support these other modes natively in the disassembler, using absurdly long masks/swizzles, since the hardware is considerably more flexible than what was exposed before.
  Outside of some fixed routines for blending, none of the above is supported in the compiler yet. But it's better to have it in the ISA definitions and disassembler than not, for future use if nothing else.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
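All four modes listed above fill the same 128 bits (2 x 64, 4 x 32, 8 x 16, 16 x 8), which is why masks and swizzles grow as the element size shrinks. The arithmetic can be sketched as follows; the helper names and `MIDGARD_VEC_BITS` are hypothetical, and the sketch assumes one mask bit per lane and a swizzle field wide enough to select any source lane, which may not match the real encoding.

```c
#include <stdint.h>

/* Vector width implied by the four modes: vec2 x 64-bit, vec4 x 32-bit,
 * vec8 x 16-bit, vec16 x 8-bit all total 128 bits. */
#define MIDGARD_VEC_BITS 128u

/* Number of lanes for an element size in bits (8, 16, 32 or 64). */
static unsigned midgard_lanes(unsigned elem_bits)
{
    return MIDGARD_VEC_BITS / elem_bits;
}

/* Full write mask, assuming one bit per lane: vec16 needs all 16 bits. */
static uint16_t midgard_full_mask(unsigned elem_bits)
{
    unsigned lanes = midgard_lanes(elem_bits);
    return (uint16_t)((1u << lanes) - 1u);
}

/* Bits needed per output lane to pick any one source lane in a swizzle. */
static unsigned midgard_swizzle_bits_per_lane(unsigned elem_bits)
{
    unsigned lanes = midgard_lanes(elem_bits), bits = 0;
    while ((1u << bits) < lanes)
        bits++;
    return bits;
}
```

Under these assumptions a vec16 swizzle needs 16 x 4 = 64 bits, against 4 x 2 = 8 bits for vec4, which gives a sense of the "absurdly long" swizzles mentioned above.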
* panfrost/midgard: Add shifting int modifiers | Alyssa Rosenzweig | 2019-06-10 | 2 | -18/+14
  As a source modifier, shift allows shifting a value left by the bit size, useful in conjunction with a greater register mode, for instance to implement `upsample`. As a concrete example, the following OpenCL:
      ushort hr0 = /* ... */;
      uint b = /* ... */;
      uint r = (convert_uint(hr0) << 16) ^ b;
  compiles to the following Midgard assembly:
      ixor r, (hr0) << 16, b
  In reverse, the ".hi" output modifier shifts the value right by the bit size, leaving just the carry/overflow at the bottom. To implement *_hi functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher mode with the .hi modifier. (For 64-bit, things are hairier, since there is no 128-bit int mode.)
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
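In portable C, the two idioms described above look like this: `upsample` widens the high half and shifts it left by the source bit size (the job of the shift source modifier), and a 16-bit `mul_hi` multiplies in the 2x wider mode and keeps the top half (the job of the `.hi` output modifier). This sketch shows only the semantics; the function names are hypothetical and it says nothing about how the Midgard compiler emits these operations.

```c
#include <stdint.h>

/* OpenCL-style upsample(hi, lo) for 16-bit halves: widen, shift left by the
 * source bit size (16), then combine with the low half. */
static uint32_t upsample_u16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* OpenCL-style mul_hi for 16-bit operands: do the arithmetic in the 2x wider
 * (32-bit) mode, then shift right by 16 to keep only the high half. */
static uint16_t mul_hi_u16(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(wide >> 16);
}
```

For 64-bit operands the same trick would need a 128-bit intermediate, which is the gap the commit message points out.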
* panfrost/midgard: Add integer outmods | Alyssa Rosenzweig | 2019-06-10 | 3 | -25/+60
  For floats, output modifiers determine clamping behaviour. For integers, they determine wrapping/saturation behaviour (or shifting; see next commit). These are very different: they are conceptually two unrelated enums union'ed together, and the distinction is responsible for many a bug.
  While clamping behaviour for floats was clear from GL, the int behaviour is only known from OpenCL contortions with the convert_*_sat() functions. With the underlying functions known, clean up the codebase, likely fixing outmod type related bugs in the process.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
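The difference between wrapping and saturating integer output is the same one OpenCL exposes through `convert_uchar` versus `convert_uchar_sat` when narrowing an `int`. The sketch below models the two behaviours in plain C with a hypothetical enum; it does not reproduce Midgard's actual outmod names or encodings.

```c
#include <stdint.h>

/* Hypothetical integer output modifiers (names are illustrative only). */
enum int_outmod {
    INT_OUTMOD_WRAP,     /* keep the low bits, like convert_uchar()       */
    INT_OUTMOD_SATURATE, /* clamp to the range, like convert_uchar_sat()  */
};

/* Narrow a 32-bit signed result to 8 unsigned bits under the given mode. */
static uint8_t narrow_to_u8(int32_t v, enum int_outmod mode)
{
    if (mode == INT_OUTMOD_SATURATE) {
        if (v < 0)
            return 0;
        if (v > UINT8_MAX)
            return UINT8_MAX;
        return (uint8_t)v;
    }
    /* Conversion to an unsigned type is reduction modulo 2^8. */
    return (uint8_t)v;
}
```

A float clamp such as [0, 1] saturation has nothing in common with either branch, which is why treating both as one enum invites the bugs described above.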
* panfrost/midgard: Note floating compares type convert | Alyssa Rosenzweig | 2019-06-10 | 1 | -4/+4
  OP_TYPE_CONVERTS denotes an opcode that returns a different type than its source (going from int-domain to float-domain or vice versa), named after the f2i/i2f family of opcodes it covers.
  We care because source mods are determined by the source type (i/f) but output modifiers are determined by the output type (equal to the source type, unless the op type-converts, in which case it's the opposite).
  The upshot is that floating-point compares (feq/fne/etc) actually do type-convert. That is, they take in floating-point values and output in integer space (a boolean), so we mark them this way to ensure the correct output modifiers are used.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
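The rule above fits in a small function: source modifiers follow the source domain, and output modifiers follow the source domain unless the opcode type-converts, in which case they follow the opposite one. The names below are hypothetical stand-ins, not Mesa's actual flags or enums; `type_converts` plays the role of OP_TYPE_CONVERTS.

```c
#include <stdbool.h>

enum value_domain { DOMAIN_INT, DOMAIN_FLOAT };

/* Hypothetical opcode description: source domain plus a flag standing in
 * for OP_TYPE_CONVERTS. */
struct op_info {
    enum value_domain src;
    bool type_converts;
};

/* Source modifiers are interpreted in the source domain. */
static enum value_domain srcmod_domain(struct op_info op)
{
    return op.src;
}

/* Output modifiers follow the output domain, which flips when the op
 * type-converts (f2i/i2f, and float compares producing an int boolean). */
static enum value_domain outmod_domain(struct op_info op)
{
    if (!op.type_converts)
        return op.src;
    return op.src == DOMAIN_FLOAT ? DOMAIN_INT : DOMAIN_FLOAT;
}
```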
* panfrost: Align linear renderable resources | Alyssa Rosenzweig | 2019-06-10 | 1 | -3/+10
  It's just -easier- to render to aligned framebuffers. For winsys targets, we already align, but even for an internal linear FBO we ought to align everything nicely.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Fix stride check when mipmapping | Alyssa Rosenzweig | 2019-06-10 | 1 | -7/+15
  Now that we support custom strides on mipmapped textures (theoretically, at least), extend the stride check to support mipmaps. Fixes incorrect strides of linear windows in Weston.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Refactor texture/sampler upload | Alyssa Rosenzweig | 2019-06-10 | 3 | -100/+124
  We move some code packing the texture/sampler descriptors into dedicated functions (out of the terrifyingly long emit_for_draw monolith), cleaning them up as we go.
  The discovery triggering the cleanup is the format for including manual strides in the presence of mipmaps/cubemaps. Rather than being placed at the end as previously assumed, they are interleaved after each address. This difference is relevant when handling NPOT linear mipmaps.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
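To make the layout difference concrete, the sketch below fills a descriptor payload both ways for a texture with several levels (or faces): the previously assumed layout, with all strides appended after the addresses, and the interleaved layout described above, where each address is immediately followed by its stride. The function names and the flat `uint64_t` payload are assumptions for illustration only; the real descriptor field widths and packing are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Previously assumed layout: addr0 .. addrN-1, then stride0 .. strideN-1. */
static size_t pack_appended(uint64_t *out, const uint64_t *addrs,
                            const uint64_t *strides, size_t n)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++)
        out[w++] = addrs[i];
    for (size_t i = 0; i < n; i++)
        out[w++] = strides[i];
    return w; /* number of payload words written */
}

/* Layout described in the commit: addr0, stride0, addr1, stride1, ... */
static size_t pack_interleaved(uint64_t *out, const uint64_t *addrs,
                               const uint64_t *strides, size_t n)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        out[w++] = addrs[i];
        out[w++] = strides[i];
    }
    return w;
}
```

With a single level both functions produce the same two words; they only diverge once there is more than one address, as with mipmaps or cubemaps.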
* panfrost: Refactor blitting code | Alyssa Rosenzweig | 2019-06-10 | 5 | -79/+170
  We refactor the wallpaper rendering code to separate the wallpaper-specific bits from the general blitting capabilities. In the (hopefully near) future, we'll turn this on to implement real Gallium blits, e.g. for automatic mipmap generation.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Refactor AFBC code | Alyssa Rosenzweig | 2019-06-10 | 4 | -64/+154
  This patch does a substantial cleanup of the code for handling AFBC, moving various disparate misplaced functions into a new central pan_afbc.c file.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Move pan_screen() to pan_screen.h | Alyssa Rosenzweig | 2019-06-10 | 2 | -6/+6
  Trivial.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Always align strides to cache line (64) | Alyssa Rosenzweig | 2019-06-10 | 1 | -1/+7
  (Performance tweak.)
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
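Aligning a stride to the 64-byte cache line is the usual power-of-two round-up. A minimal sketch, with hypothetical helper names (Mesa has its own alignment utilities, which are not reproduced here):

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u

/* Round x up to the next multiple of align, which must be a power of two. */
static uint32_t align_pot(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Row stride of a linear surface, padded to a whole number of cache lines. */
static uint32_t linear_stride(uint32_t width, uint32_t bytes_per_pixel)
{
    return align_pot(width * bytes_per_pixel, CACHE_LINE_BYTES);
}
```

For example, a 1366-pixel RGBA8 row (5464 bytes) is padded to 5504 bytes, while a 1920-pixel row (7680 bytes) is already a multiple of 64 and is left unchanged.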
* docs: fixup 19.0.5 <> 19.0.6 confusion | Emil Velikov | 2019-06-10 | 1 | -1/+1
  The title of the release notes says 19.0.5 while the rest of the file (correctly) says 19.0.6.
  Fixes: fe79d75ccf9 ("docs: Add relnotes for 19.0.6")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dylan Baker <dylan at pnwbakers.com>
* mapi: correctly handle the full offset table | Emil Velikov | 2019-06-10 | 2 | -3/+7
  An earlier commit converted ES1 and ES2 to a new, much simpler, dispatch generator. At the same time, GL/glapi and the driver side are still using the old code.
  There is a hidden ABI between GL*.so and glapi.so, the former referencing entry-points by offset in the _glapi_table. Hence the earlier commit added the full table of entry-points, alongside a marker for other cases like indirect GL(X) and driver-side remapping.
  Yet the patches did not handle things fully, so it was possible to get different interpretations of the dispatch table after the marker.
  This commit fixes that, adding an indicative error message to catch future bugs. While here, correct the marker (MAX_OFFSETS) comment.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
  Fixes: cf317bf0937 ("mapi: add all _glapi_table entrypoints to static_data.py")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* mapi: add static_date offset to EXT_dsa | Emil Velikov | 2019-06-10 | 1 | -0/+19
  As elaborated in the next patch, there is some hidden ABI that effectively requires most entrypoints to be listed in the file.
  Cc: Marek Olšák <[email protected]>
  Fixes: d2906293c43 ("mesa: EXT_dsa add selectorless matrix stack functions")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* mapi: add static_date offset to MaxShaderCompilerThreadsKHR | Emil Velikov | 2019-06-10 | 1 | -0/+1
  As elaborated in the next patch, there is some hidden ABI that effectively requires most entrypoints to be listed in the file.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
  Cc: Marek Olšák <[email protected]>
  Fixes: c5c38e831ee ("mesa: implement ARB/KHR_parallel_shader_compile")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* egl: Let the caller of dri2_create_drawable decide about loaderPrivate. | Mathias Fröhlich | 2019-06-10 | 8 | -15/+10
  In the call arguments to dri2_create_drawable, decouple loaderPrivate from dri2_surf. For all callers of dri2_create_drawable the two pointers are the same, with the exception of the gbm-backed platform. Let the calling code of dri2_create_drawable decide what loaderPrivate shall be.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Mathias Fröhlich <[email protected]>
* radv: fix alpha-to-coverage when there is unused color attachments | Samuel Pitoiset | 2019-06-10 | 1 | -1/+1
  When alphaToCoverage is enabled, we should always write the alpha channel of MRT0 if it's unused. This now matches RadeonSI.
  This fixes the new CTS tests:
      dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible
  Cc: 19.0 19.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* panfrost: ci: Switch from direct Docker use to buildah | Tomeu Vizoso | 2019-06-10 | 3 | -160/+166
  Use the infrastructure in wayland/ci-templates to build the container images. This prevents getting into some situations in which the images wouldn't be rebuilt, and allows us to share some infrastructure with other projects in freedesktop.org.
  Signed-off-by: Tomeu Vizoso <[email protected]>
  Suggested-by: Michel Dänzer <[email protected]>
  Acked-by: Alyssa Rosenzweig <[email protected]>
* gallium/u_transfer_helper: Free the staging buffer on unmap. | Kenneth Graunke | 2019-06-09 | 1 | -0/+1
  u_transfer_helper sometimes mallocs a staging buffer, and was leaking it.
  Reviewed-by: Eric Engestrom <[email protected]>
* intel/gpu_dump: fix argument passing | Lionel Landwerlin | 2019-06-09 | 2 | -3/+3
  We were dropping the quotes (" or ') around arguments grouped together. This was triggering failures with:
      $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* util/os_file: suppress sign comparison warning | Eric Engestrom | 2019-06-09 | 1 | -1/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* util/os_file: fix error being sign-cast back and forth | Eric Engestrom | 2019-06-09 | 1 | -1/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* util/os_file: avoid shadowing read() with a local variable | Eric Engestrom | 2019-06-09 | 1 | -5/+5
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* util/os_file: actually return the error read() gave us | Eric Engestrom | 2019-06-09 | 1 | -1/+3
  Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
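The os_file fixes above revolve around a common pitfall: `read()` returns `ssize_t`, and casting it to `size_t` before checking for -1 turns an error into a huge length, while freeing buffers on the error path can clobber `errno`. The following is a self-contained sketch of a read-everything loop that checks the sign first and hands the original error back through `errno`. It is a hypothetical helper written for illustration, not the actual `os_read_file()` implementation.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read all of fd into a NUL-terminated malloc'd buffer.
 * On failure, returns NULL with errno describing the error. */
static char *read_all_fd(int fd, size_t *out_len)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        if (cap - len < 2) { /* keep room for more data plus the NUL */
            char *grown = realloc(buf, cap * 2);
            if (!grown) {
                free(buf);
                errno = ENOMEM;
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }

        ssize_t n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data: retry */
            int err = errno;   /* preserve the error read() gave us */
            free(buf);
            errno = err;
            return NULL;
        }
        if (n == 0)
            break;             /* end of file */
        len += (size_t)n;      /* safe: n is known to be positive here */
    }

    buf[len] = '\0';
    if (out_len)
        *out_len = len;
    return buf;
}
```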
* virgl: Work around possible memory exhaustion | Alexandros Frantzis | 2019-06-07 | 3 | -3/+22
  Since we don't normally flush before performing copy transfers, it's possible in some scenarios to use too much memory for staging resources and start failing. This can happen either because we exhaust the total available memory (including system memory virtio-gpu swaps out to), or, more commonly, because the total size of resources in a command buffer doesn't fit in virtio-gpu video memory.
  To reduce the chances of this happening, force a flush before a copy transfer if the total size of queued staging resources exceeds a certain limit. Since after a flush any queued staging resources will be eventually released, this ensures both that each command buffer doesn't require too much video memory, and that we don't end up consuming too much memory for staging resources in total.
  Fixes kernel errors reported when running texture_upload tests in glbench.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
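The heuristic amounts to a running byte counter. The sketch below uses hypothetical names and an arbitrary 64 MiB limit (the commit does not state the value virgl uses): before each copy transfer it checks whether the staging bytes already queued in the current command buffer exceed the limit, flushes if so, and then accounts for the new transfer.

```c
#include <stdint.h>

/* Arbitrary example limit; not the value virgl uses. */
#define STAGING_FLUSH_LIMIT (64ull * 1024 * 1024)

struct staging_tracker {
    uint64_t queued_bytes; /* staging bytes referenced by the current cmdbuf */
    unsigned flushes;      /* number of forced flushes (for illustration) */
};

/* Stand-in for submitting the command buffer. After a flush, the queued
 * staging resources will eventually be released, so the count restarts. */
static void tracker_flush(struct staging_tracker *t)
{
    t->flushes++;
    t->queued_bytes = 0;
}

/* Call before emitting a copy transfer that uses `size` bytes of staging. */
static void tracker_before_copy_transfer(struct staging_tracker *t,
                                         uint64_t size)
{
    if (t->queued_bytes > STAGING_FLUSH_LIMIT)
        tracker_flush(t);
    t->queued_bytes += size;
}
```

Checking before adding, as the commit describes, bounds each command buffer to roughly the limit plus one transfer, rather than to the limit exactly.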
* virgl: Remove incorrect resource wait condition | Alexandros Frantzis | 2019-06-07 | 1 | -13/+0
  Now that we have copy transfers in place, we can remove the incorrect resource wait condition. Copy transfers and other optimizations minimize the performance impact of this removal, while providing the correct behavior.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Use copy transfers for textures | Alexandros Frantzis | 2019-06-07 | 2 | -9/+87
  Extend copy transfers to also be used for busy textures.
  Performance results:
      Unigine Valley, qemu
      before: 22.7 FPS
      after: 23.1 FPS
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Use buffer copy transfers to avoid waiting when mapping | Alexandros Frantzis | 2019-06-07 | 6 | -6/+137
  We typically need to wait for a buffer to become ready before mapping, so that we don't write new contents while the host is still using the old contents. However, if we are allowed to discard the contents of the mapped buffer range, then we can avoid waiting by using a staging buffer range which we guarantee to never be busy, copying from the staging buffer range to the target buffer in the host.
  This commit implements this optimization by utilizing a dedicated u_upload_mgr for the staging buffer.
  Performance results:
      Twilight Struggle (Steam/Proton), qemu
      before: 7 FPS
      after: 25 FPS
      glmark2 ubo, qemu
      before: 38 FPS
      after: 331 FPS
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Suggested-by: Gurchetan Singh <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
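The decision described above can be summarized as a three-way choice when mapping a buffer range for writing. This is a simplified sketch with hypothetical names: it ignores readbacks, unsynchronized maps and the other transfer flags a real driver has to handle, and only shows why being allowed to discard the range removes the need to wait.

```c
#include <stdbool.h>

enum map_strategy {
    MAP_DIRECT,        /* resource idle: map it in place                   */
    MAP_WAIT,          /* busy and old contents matter: wait for the host  */
    MAP_COPY_TRANSFER, /* busy but range may be discarded: write into a    */
                       /* never-busy staging range, host copies it later   */
};

static enum map_strategy choose_map_strategy(bool resource_busy,
                                             bool discard_range_allowed)
{
    if (!resource_busy)
        return MAP_DIRECT;
    if (discard_range_allowed)
        return MAP_COPY_TRANSFER;
    return MAP_WAIT;
}
```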
* virgl: Support copy transfers | Alexandros Frantzis | 2019-06-07 | 5 | -5/+70
  Support transfers that use a different resource as the source of data to transfer. This will be used in upcoming commits to send data to host buffers through a transfer upload buffer, in order to avoid waiting when the buffer resource is busy.
  Note that we don't support queueing copy transfers in the transfer queue. Copy transfers should be emitted directly in the command queue, which allows us to avoid flushes before them and leads to better performance.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Add copy_transfer3d definitions | Alexandros Frantzis | 2019-06-07 | 2 | -0/+9
  Introduce definitions for the copy_transfer3d protocol command and virgl capability. This command transfers data to the host by copying through another resource, and will be used in upcoming commits to avoid waiting when transferring data for busy resources.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Make VIRGL_BIND_STAGING resources cacheable | Alexandros Frantzis | 2019-06-07 | 2 | -2/+4
  This could help performance when trying to recreate such resources for copy transfers.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Support VIRGL_BIND_STAGING | Alexandros Frantzis | 2019-06-07 | 3 | -4/+16
  Support a new virgl bind type for staging buffers which don't require dedicated host-side storage. These will be used to implement copy transfers.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK | Alexandros Frantzis | 2019-06-07 | 1 | -9/+12
  If we are not allowed to block, and we know that we will have to wait, either because the resource is busy, or because it will become busy due to a readback, return early to avoid performing an incomplete transfer_get. Such an incomplete transfer_get may finish at any time, during which another unsynchronized map could write to the resource contents, leaving the contents in an undefined state.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Suggested-by: Chia-I Wu <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Deduplicate checks for resource caching | Alexandros Frantzis | 2019-06-07 | 4 | -20/+14
  Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate code snippets. Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they don't go through the cache in the previous commit.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Don't try to use cached resources for legacy fences | Alexandros Frantzis | 2019-06-07 | 2 | -6/+12
  Resources for fences should not be from the cache, since we are basing the fence status on the resource creation busy status.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: More info about chosen alignment value | Alexandros Frantzis | 2019-06-07 | 1 | -0/+5
  Add more info about why VIRGL_MAP_BUFFER_ALIGNMENT has the value it does.
  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: store all info about atomic buffers | Chia-I Wu | 2019-06-07 | 2 | -16/+23
  We will need the full info. This also speeds up virgl_attach_res_atomic_buffers and fixes resource leaks when the context is destroyed.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add shader images to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -14/+27
  It replaces virgl_context::images.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add SSBOs to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -14/+26
  It replaces virgl_context::ssbos.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add UBOs to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -20/+37
  It replaces virgl_context::ubos.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -43/+44
  virgl_shader_binding_state will be used to manage all per-stage shader bindings. For now, it manages only sampler views.
  This replaces virgl_textures_info and fixes some issues:
    - start_slot is now honored
    - views outside of [start_slot, start_slot+count) are unmodified
    - views are released when the context is destroyed
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
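The three fixes listed above are properties of a "set a range of slots" operation. Below is a minimal sketch with a hypothetical refcounted view type and an assumed slot count of 32: only slots in [start_slot, start_slot + count) are touched, each replaced view is released, and releasing the state (as on context destruction) drops whatever remains. None of these names are virgl's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VIEWS 32 /* assumed slot count for the sketch */

struct view {
    int refcount;
};

static struct view *view_ref(struct view *v)
{
    if (v)
        v->refcount++;
    return v;
}

static void view_unref(struct view *v)
{
    if (v && --v->refcount == 0)
        free(v);
}

struct binding_state {
    struct view *views[MAX_VIEWS];
    uint32_t enabled_mask; /* bit i set when views[i] is bound */
};

/* Bind views[0..count) to slots [start_slot, start_slot + count); slots
 * outside that range are left untouched. A NULL array unbinds the range. */
static void set_views(struct binding_state *b, unsigned start_slot,
                      unsigned count, struct view **views)
{
    assert(start_slot + count <= MAX_VIEWS);
    for (unsigned i = 0; i < count; i++) {
        unsigned slot = start_slot + i;
        struct view *v = views ? views[i] : NULL;
        struct view *old = b->views[slot];
        b->views[slot] = view_ref(v); /* take the new ref first ...      */
        view_unref(old);              /* ... so rebinding the same view is safe */
        if (v)
            b->enabled_mask |= 1u << slot;
        else
            b->enabled_mask &= ~(1u << slot);
    }
}

/* On context destruction: drop every remaining reference. */
static void binding_state_release(struct binding_state *b)
{
    for (unsigned slot = 0; slot < MAX_VIEWS; slot++) {
        view_unref(b->views[slot]);
        b->views[slot] = NULL;
    }
    b->enabled_mask = 0;
}
```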
* iris: Zero shs->cbuf0 when binding a passthrough TCS | Kenneth Graunke | 2019-06-07 | 1 | -0/+16
  Fixes valgrind errors when running two CTS tests back to back:
    - KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
  (The first test has an actual TCS, the second uses passthrough.)
* intel/blorp: Only double the fast-clear rect alignment on HSW | Jason Ekstrand | 2019-06-07 | 1 | -10/+15
  This restriction was accidentally added to the BSpec/PRM as an unconditional restriction starting with the HSW docs, and it was never removed. However, it only ever applied to HSW, and it actually potentially causes problems on BDW and above, where we have mipmapped fast-clears.
  Reviewed-by: Nanley Chery <[email protected]>
* freedreno/a6xx: re-arrange program stageobj/group | Rob Clark | 2019-06-07 | 4 | -30/+58
  Split out a separate program config state group to run early, before the other groups.
  This seems to help with intermittent "missed tiles" (although I had assumed that was a mem2gmem issue), or at least I can't reproduce that issue with this patch, but can without.
  It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix hangs with newer sqe fw | Rob Clark | 2019-06-07 | 1 | -32/+81
  With the newer (v1.76) fw, we were getting hangs (compared to the older v1.66 fw). Re-work the GMEM code to structure things a bit closer to the blob. This moves some PKT7 packets from IB2 to IB1, which I think is what was confusing SQE and causing it to get stuck in an infinite loop. But in general, structuring things at least closer to the way the blob does makes it easier to compare cmdstream.
  Note: this is a bit on the large side for what I'd normally consider for stable, but right now it is looking like the newer fw is what is headed for linux-firmware. This should definitely have some soak time on master, but it is probably a good idea for this patch to end up in distro mesa builds by the time a630_sqe.fw hits linux-firmware.
  Cc: [email protected]
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>