summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Revert "panfrost/midgard: Extend copy propagation pass"Alyssa Rosenzweig2019-04-281-48/+8
| | | | | | | | | Fixes: commit b53b4573c3f0571253672e44ce7d6310d9f987bf. Optimization gone wrong. In the future, we should try this again (it's a net win if implemented right), but at the moment this just regresses. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radv: add missing VEGA20 chip in radv_get_device_name()Samuel Pitoiset2019-04-271-0/+1
| | | | | | | | Otherwise it returns "AMD RADV unknown". Cc: 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: Fix zeroing of transform feedback offsets in strange cases.Kenneth Graunke2019-04-272-4/+18
| | | | | | | | | | | | | | | | | | | | Some of the dEQP.functional.transform_feedback tests end up doing the following sequence of operations: 1. BeginTransformFeedback 2. PauseTransformFeedback 3. Draw 4. ResumeTransformFeedback At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the SO_WRITE_OFFSET registers. At step 2, we disable streamout, so step 3 doesn't bother emitting those commands. Then, step 4 re-packs new 3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue appending at the existing offset. This loads the value from the BO as the offsets - but we never actually zeroed it. So, just maintain a flag saying "we actually emitted the commands", and stomp offset back to zero until we emit some.
* vc4: Fall back to renderonly if the vc4 driver doesn't have v3d.Eric Anholt2019-04-263-4/+35
| | | | | | | I have a platform with vc4 display but V3D 4.x. We can fall back on kmsro's probing to bring up the v3d gallium driver. Acked-by: Rob Clark <[email protected]>
* kmsro: Add support for V3D.Eric Anholt2019-04-263-1/+17
| | | | | | | | | Like vc4, we expect to have SOCs with various displays that have a single V3D instance for rendering. v2: Add v3d to the list of drivers that make enabling kmsro valid. Acked-by: Rob Clark <[email protected]>
* radeonsi: don't ignore PIPE_FLUSH_ASYNCMarek Olšák2019-04-261-1/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* v3d: Fix detection of TMU write sequences in register spilling.Eric Anholt2019-04-261-2/+9
| | | | | | | | We can't use the QPU functions to detect this until register allocation is done and we've moved inst->dst into inst->qpu. Fixes bad TMU sequences from register spilling in KHR-GLES31.core.compute_shader.shared-max.
* v3d: Fix detection of the last ldtmu before a new TMU op.Eric Anholt2019-04-261-3/+3
| | | | | We were looking at the start instruction, instead of scanning through the list of following instructions to find any more ldtmus.
* v3d: Re-add support for memory_barrier_shared.Eric Anholt2019-04-261-0/+1
| | | | | | | | Looks like I lost it in a rebase conflict resolution. We'd hit the unknown intrinsic assertion in KHR-GLES31.core.compute_shader.shared-struct. Fixes: 6b1c65982509 ("v3d: Add Compute Shader compilation support.")
* Revert "v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER."Eric Anholt2019-04-261-1/+9
| | | | | | This reverts commit ccce9409470c1053c40c822d759b9bd417062bc0, leaving a note as to why we had to (corruption in chromium, breaking some GLES3.1 tests).
* v3d: Don't try to update the shadow texture for separate stencil.Eric Anholt2019-04-261-1/+2
| | | | | | | | | | | There are two cases where v3d's sampler view's resource doesn't match the base's: shadow textures for sampling from raster, and pointing at the separate depth texture for z32f_s8x24. We only want to update shadow for the first case. Fixes dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw when run after the previous testcase.
* v3d: Add a note about i/o indirection for future performance work.Eric Anholt2019-04-261-0/+7
|
* vc4: Use _mesa_hash_table_remove_key() where appropriate.Eric Anholt2019-04-261-12/+9
|
* v3d: Use _mesa_hash_table_remove_key() where appropriate.Eric Anholt2019-04-261-13/+8
|
* v3d: Assert that we do request the normal texturing return data.Eric Anholt2019-04-261-0/+2
| | | | | An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.
* v3d: Apply the GFXH-930 workaround to the case where the VS loads attrs.Eric Anholt2019-04-261-0/+15
| | | | | | | We were emitting a dummy load for when the VS doesn't load any attributes, but we also need to emit a dummy load for when the render VS loads attributes but the binner VS doesn't. Fixes simulator assertion failures and GPU hangs on KHR-GLES31.core.texture_gather.\*
* v3d: Fill in the ignored segment size fields to appease new simulator.Eric Anholt2019-04-261-2/+4
| | | | | | We are assured that the input segment size field is ignored for !separate_segs mode, and now the simulator wants an in-range value set regardless of whether it's functionally ignored or not.
* glsl: use empty brace initializerTapani Pälli2019-04-261-2/+2
| | | | | | | | | fixes following warning with clang: warning: suggest braces around initialization of subobject Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* gbm: don't return voidcoypu2019-04-261-1/+1
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* nir: use braces around subobject in initializerTapani Pälli2019-04-262-6/+6
| | | | | | | | | | | | | | Used same syntax as elsewhere with Mesa sources, verified result against MSVC with godbolt.org. fixes following warning with clang: warning: suggest braces around initialization of subobject v2: empty braces -> braces around subobject (Caio, Kristian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/drm: Quiet pointer to u64 conversion warningKristian H. Kristensen2019-04-261-1/+1
|
* swr/rast: enforce use of tile offsetsAlok Hota2019-04-264-0/+5
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: AVX512 support compiled in by defaultAlok Hota2019-04-2612-560/+333
| | | | | | | | | - Emulation of AVX512 built into SIMDLIB - Remove associated macros - Remove knobs controlling AVX512 and let emulation handle it - Refactor variable names for SIMD16 Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Remove deprecated 4x2 backend codeAlok Hota2019-04-268-834/+19
| | | | | | | | | - Use 8x2 tiling by default - Remove associated macros - Use SIMDLIB emulation for SIMD16 on SIMD8 hardware - Remove code rot in Load/StoreTile Reviewed-by: Bruce Cherniak <[email protected]>
* llvmpipe: Always return some fence in flush (v2)Tomasz Figa2019-04-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is no last fence, due to no rendering happening yet, just create a new signaled fence and return it, to match the expectations of the EGL sync fence API. Fixes random "Could not create sync fence 0x3003" assertion failures from Skia on Android, coming from the following code: https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427 Reproducible especially with thread count >= 4. One could make the driver always keep the reference to the last fence, but: - the driver seems to explicitly destroy the fence whenever a rendering pass completes and changing that would require a significant functional change to the code. (Specifically, in lp_scene_end_rasterization().) - it still wouldn't solve the problem of an EGL sync fence being created and waited on without any rendering happening at all, which is also likely to happen with Android code pointed to in the commit. Therefore, the simple approach of always creating a fence is taken, similarly to other drivers, such as radeonsi. Tested with piglit llvmpipe suite with no regressions and following tests fixed: egl_khr_fence_sync conformance eglclientwaitsynckhr_flag_sync_flush eglclientwaitsynckhr_nonzero_timeout eglclientwaitsynckhr_zero_timeout eglcreatesynckhr_default_attributes eglgetsyncattribkhr_invalid_attrib eglgetsyncattribkhr_sync_status v2: - remove the useless lp_fence_reference() dance (Nicolai), - explain why creating the dummy fence is the right approach. Signed-off-by: Tomasz Figa <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: correctly handle waiting in llvmpipe_fence_finishEmil Velikov2019-04-261-1/+6
| | | | | | | | | | | Currently if the timeout differs from 0, we'll end up with infinite wait... even if the user is perfectly clear they don't want that. Use the new lp_fence_timedwait() helper guarding both waits in an !lp_fence_signalled block like the rest of llvmpipe. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add lp_fence_timedwait() helperEmil Velikov2019-04-262-0/+32
| | | | | | | | | | | | | | | | The function is analogous to lp_fence_wait() while taking at timeout (ns) parameter, as needed for EGL fence/sync. v2: - use absolute UTC time, as per spec (Gustaw) - bail out on cnd_timedwait() failure (Gustaw) v3: - check count/rank under mutex (Gustaw) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v1) Reviewed-by: Gustaw Smolarczyk <[email protected]>
* vulkan/wsi: don't use DUMB_CLOSE for normal GEM handlesEmil Velikov2019-04-261-2/+2
| | | | | | | | | | | Currently we get normal GEM handles from PrimeFDToHandle, yet we close then with DUMB_CLOSE. Use GEM_CLOSE instead. Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") Cc: Jason Ekstrand <[email protected]> Cc: Keith Packard <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* vulkan/wsi: check if the display_fd given is masterEmil Velikov2019-04-261-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | As effectively required by the extension, we need to ensure we're master Currently drivers employ vendor specific solutions, which check if the device behind the fd is capable*, yet none of them do the master check. *In the radv case, if acceleration is available. Instead of duplicating the check in each driver, keep it where it's needed and used. Note this copies libdrm's drmIsMaster() to avoid depending on bleeding edge version of the library. v2: set the fd to -1 if not master (Bas) Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") Cc: Andres Rodriguez <[email protected]> Cc: Jason Ekstrand <[email protected]> Cc: Keith Packard <[email protected]> Reported-by: Andres Rodriguez <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* turnip: drop dead close(master_fd)Emil Velikov2019-04-261-2/+0
| | | | | | | | The fd is -1, thus the block of if (fd != -1) close(fd) is dead code. Cc: Chad Versace <[email protected]> Reviewed-by: Chia-I Wu <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* nir/algebraic: Optimize integer cast-of-castJason Ekstrand2019-04-261-0/+42
| | | | | | | These have been popping up more and more with the OpenCL work and other bits causing extra conversions to/from 64-bit. Reviewed-by: Karol Herbst <[email protected]>
* anv/descriptor_set: Don't fully destroy sets in pool destroy/resetJason Ekstrand2019-04-261-2/+3
| | | | | | | | | | | | | | | | | | | In 105002bd2d617, we fixed a memory leak bug where we weren't properly destroying descriptor when destroying/resetting a descriptor pool. However, the only real leak that happened was that we we take a reference to the descriptor set layout in the descriptor set and we weren't dropping our reference. Everything else in the descriptor set is tied to the pool itself and doesn't need to be freed on a per-set basis. This commit changes the destroy/reset functions to only bother walking the list of sets to unref the layouts and otherwise we just assume that the whole-pool destroy/reset takes care of the rest. Now that we're doing more non-trivial things with descriptor sets such as allocating things with util_vma_heap, per-set destruction is starting to show up on perf traces. This takes reset back to where it's supposed to be as a cheap whole-pool operation. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Better handle 32-byte alignment of descriptor set buffersJason Ekstrand2019-04-261-3/+3
| | | | | | | | | | | | | | | | | | | In c520f4dec9c, we chose to align the sizes of descriptor set buffers to 32 bytes. We have to align the descriptor set buffer to 32B so that it's valid for using with push constants. We align the size as well so we don't leave lots of holes with util_vma_heap_alloc. Unfortunately, we were only aligning it for alloc and not for free so we were still creating piles of holes when we delete descriptor sets. This causes terrible perf for the allocator once we've deleted piles of descriptor sets. This commit reworks the code so that we align the descriptor set buffer size to 32B for both alloc and free. The result is that it takes the new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds. Fixes: c520f4dec9c "anv: Add a concept of a descriptor buffer" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497 Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: fix bit_size in lower indirect derefs.Dave Airlie2019-04-261-1/+1
| | | | | | | | This fixes a case where we are expecting 64-bit but generate 32-bit consts and validate gets angry. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* iris: Silence unused function warningKenneth Graunke2019-04-251-1/+1
|
* glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)Marek Olšák2019-04-251-3/+19
| | | | | | | | | | This fixes KHR-GL45.compute_shader.resources-max on radeonsi. Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it to gallium" v2: use is_interface_array, protect again assertion failures in u_bit_consecutive Reviewed-by: Dave Airlie <[email protected]>
* docs/features: update GL tooRob Clark2019-04-251-3/+3
| | | | | | | Forgot to update corresponding entries for desktop GL.. kinda wish we didn't have to update both GLES and GL tables. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: sample-shading supportRob Clark2019-04-255-24/+70
| | | | | | | | | | Enables: OES_sample_shading OES_sample_variables OES_shader_multisample_interpolation Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: sample-shading supportRob Clark2019-04-255-8/+113
| | | | | | | | | | The compiler support for: OES_sample_shading OES_sample_variables OES_shader_multisample_interpolation Signed-off-by: Rob Clark <[email protected]>
* freedreno: wire up core sample-shading supportRob Clark2019-04-252-0/+11
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix load_interpolated_input slotRob Clark2019-04-251-1/+1
| | | | | | | The so->inputs[] table is in units of vec4 Fixes: 7ff6705b8d8 freedreno/ir3: convert to "new style" frag inputs Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: add VALIDREG/CONDREG helper macrosRob Clark2019-04-251-7/+8
| | | | | | | | There are a few places that we check if a shader stage input reg is used/valid (ie. not r63.x).. and there are about to be a bunch more. So add some helper macros for less open-coding. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rename frag_vcoord -> ij_pixelRob Clark2019-04-253-10/+16
| | | | | | | Since this is what the value actually is. Cleanup the name before adding more different i,j related values for sample-shading. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove bogus assertRob Clark2019-04-251-2/+0
| | | | | | tex instruction can actually return 16b values. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower load_barycentric_at_offsetRob Clark2019-04-254-0/+129
| | | | | | | | | Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower load_barycentric_at_sampleRob Clark2019-04-254-0/+139
| | | | | | | This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2019-04-2511-102/+140
| | | | | | Pull in updates for sample shading. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: cleanup instruction builder macrosRob Clark2019-04-251-46/+27
| | | | | | | | De-duplicate the "normal" and "flags" versions of the macros, and while at it go ahead and add "flags" versions for all the remaining macros, since we'll at least need INSTR1F in a following commit. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: more emit-cat5 fixesRob Clark2019-04-251-0/+2
| | | | | | Couple more opcodes which don't take a sampler id as first arg. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix rgetpos decodingRob Clark2019-04-251-1/+1
| | | | | | It takes an argument. Signed-off-by: Rob Clark <[email protected]>