summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radv: Support VK_EXT_queue_family_foreign.Bas Nieuwenhuizen2019-07-033-3/+7
| | | | | | | | | Basically same as external for now. Reviewed-by: Samuel Pitoiset <[email protected]> Only case we might need to handle differently in the near future is Raven's case of displayable DCC which is not renderable. But we don't support that yet.
* radv: Fix interactions between variable descriptor count and inline uniform ↵Bas Nieuwenhuizen2019-07-031-1/+5
| | | | | | | blocks. Fixes: d7e6541cc72 "radv: Only allocate supplied number of descriptors when variable." Reviewed-by: Samuel Pitoiset <[email protected]>
* winsys/amdgpu: Make KMS handles valid for original DRM file descriptorMichel Dänzer2019-07-036-11/+24
| | | | | | | | | | | | | | | | | | Getting a DMA-buf fd and converting that to a handle using our duplicate of that file descriptor (getting at which requires passing a radeon_winsys pointer to the buffer_get_handle hook) makes sure of this, since duplicated file descriptors reference the same file description and therefore the same GEM handle namespace. This is necessary because libdrm_amdgpu may use a different DRM file descriptor with a separate handle namespace internally, e.g. because it always reuses any existing amdgpu_device_handle for the same device. amdgpu_bo_export returns a handle which is valid for that internal file descriptor. Bugzilla: https://bugs.freedesktop.org/110903 Reviewed-by: Marek Olšák <[email protected]> Tested-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* winsys/amdgpu: Add amdgpu_screen_winsysMichel Dänzer2019-07-037-142/+183
| | | | | | | | | | | | | | | | | It extends pipe_screen / radeon_winsys and references amdgpu_winsys. Multiple amdgpu_screen_winsys instances may reference the same amdgpu_winsys instance, which corresponds to an amdgpu_device_handle. The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM file descriptor passed to amdgpu_winsys_create, which will be needed in the next change. v2: * Add comment in amdgpu_winsys_unref explaining why it always returns true (Marek Olšák) Reviewed-by: Marek Olšák <[email protected]> Tested-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* winsys/amdgpu: Use amdgpu_winsys helper instead of open-coded castsMichel Dänzer2019-07-033-8/+8
| | | | | | | | Cleanup to prevent breakage with the next change, no functional change intended in this one. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* intel: fix wrong format usageJuan A. Suarez Romero2019-07-031-1/+1
| | | | | | | | | | | Do not use the view format when filling the surface state. Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.* Fixes: fb1350c76f1 ("intel: Add and use helpers for level0 extent") Reviewed-by: Nanley Chery <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* radv: only allocate a 32-bit value for the TC-compat range metadataSamuel Pitoiset2019-07-031-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused code in radv_update_tc_compat_zrange_metadata()Samuel Pitoiset2019-07-031-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_get_depth_pipeline() helperSamuel Pitoiset2019-07-031-25/+41
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: assert isl_surf_init success in resource_from_handleMike Blumenkrantz2019-07-021-14/+15
| | | | | | | | this can fail unexpectedly due to bugs, so it's good to provide feedback when this occurs Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* anv: Advertise a more accurate minTexelBufferOffsetAlignmentJason Ekstrand2019-07-021-1/+4
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement VK_EXT_texel_buffer_alignmentJason Ekstrand2019-07-022-0/+38
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* vulkan: Update the XML and headers to 1.1.113Jason Ekstrand2019-07-021-8/+63
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Ignore ArrayStride in OpPtrAccessChain for WorkgroupCaio Marcelo de Oliveira Filho2019-07-021-4/+6
| | | | | | | | | | | | | | | | | From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1): For objects in the Uniform, StorageBuffer, or PushConstant storage classes, the element’s address or location is calculated using a stride, which will be the Base-type’s Array Stride when the Base type is decorated with ArrayStride. For all other objects, the implementation will calculate the element’s address or location. For non-CL shaders the driver should layout the Workgroup storage class, so override any explicitly set ArrayStride in the shader. This currently fixes only the lower_workgroup_access_to_offsets case, which is used by anv. Reviewed-by: Juan A. Suarez <[email protected]>
* nouveau: handle new CAPSKarol Herbst2019-07-022-0/+26
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* intel/fs: Use nir_lower_interpolation on gen11+Jason Ekstrand2019-07-024-48/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On gen11, the removed the PLN instruction so we have to emit a pile of MAD to emulate it. We may as well do that in NIR so we can optimize and later schedule it. Shader-db results on Ice Lake: total instructions in shared programs: 17145644 -> 16556440 (-3.44%) instructions in affected programs: 11507454 -> 10918250 (-5.12%) helped: 35763 HURT: 42085 helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18 helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49% HURT stats (abs) min: 1 max: 248 x̄: 2.22 x̃: 2 HURT stats (rel) min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47% 95% mean confidence interval for instructions value: -7.67 -7.47 95% mean confidence interval for instructions %-change: -4.46% -4.29% Instructions are helped. total loops in shared programs: 4370 -> 4370 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 360624645 -> 368220857 (2.11%) cycles in affected programs: 269631244 -> 277227456 (2.82%) helped: 15583 HURT: 65874 helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32 helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44% HURT stats (abs) min: 1 max: 238638 x̄: 133.87 x̃: 20 HURT stats (rel) min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97% 95% mean confidence interval for cycles value: 67.42 119.09 95% mean confidence interval for cycles %-change: 3.61% 3.73% Cycles are HURT. total spills in shared programs: 8943 -> 8981 (0.42%) spills in affected programs: 1925 -> 1963 (1.97%) helped: 44 HURT: 14 total fills in shared programs: 21815 -> 21925 (0.50%) fills in affected programs: 3511 -> 3621 (3.13%) helped: 41 HURT: 18 LOST: 70 GAINED: 14 Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltasJason Ekstrand2019-07-022-1/+14
| | | | Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Actually implement the load_barycentric intrinsicsJason Ekstrand2019-07-022-12/+93
| | | | | | | | | | If they never get used, dead code should clean them up. Also, we rework the at_offset and at_sample intrinsics so they return a proper vec2 instead of returning things in PLN layout. Fortunately, copy-prop is pretty good at cleaning this up and it doesn't result in any actual extra MOVs. Reviewed-by: Matt Turner <[email protected]>
* nir: add pass to lower load_interpolated_inputRob Clark2019-07-026-0/+193
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* panfrost: Pass referenced BOs to the SUBMIT ioctlsBoris Brezillon2019-07-021-19/+27
| | | | | | | | | | | Instead of manually adding the BOs from the various SLAB pools plus the one backing the color FB, we insert them in the BO set attached to the job and let panfrost_drm_submit_job() pass all BOs from this set to the SUBMIT ioctl. This means we are now passing all referenced BOs and let the scheduler wait on referenced BO fences if needed. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Make SLAB pool creation rely on BO helpersBoris Brezillon2019-07-027-110/+56
| | | | | | | There's no point duplicating the code, and it will help us simplify the bo_handles[] filling logic in panfrost_drm_submit_job(). Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Add the panfrost_drm_{create,release}_bo() helpersBoris Brezillon2019-07-023-29/+70
| | | | | | | To avoid the panfrost_memory <-> panfrost_bo dance done in panfrost_resource_create_bo() and panfrost_bo_unreference(). Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Move the mmap BO logic out of panfrost_drm_import_bo()Boris Brezillon2019-07-021-21/+30
| | | | | | | So we can re-use it for the panfrost_drm_create_bo() function we are about to introduce. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Avoid passing winsys handles to import/export BO funcsBoris Brezillon2019-07-023-19/+20
| | | | | | | | | | Let's keep a clear split between ioctl wrappers and the rest of the driver. All the import BO function need is a dmabuf FD and the screen object, and the export one should only take care of generating a dmabuf FD out of a BO object. Winsys handle manipulation should stay in the resource.c file. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Move BO meta-data out of panfrost_boBoris Brezillon2019-07-026-94/+98
| | | | | | | | | | | | That's what most (all?) implementation seem to do, and my understanding is that a BO is just a bunch of memory that can be used for anything GPU related, not only texture/FB resources. Let's move those meta data in panfrost_resource so we can use panfrost_bo for all kind of memory allocation and make BO allocation more consistent. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Stop exposing internal panfrost_drm_*() functionsBoris Brezillon2019-07-022-7/+2
| | | | | | | panfrost_drm_submit_job() and panfrost_fence_create() are not used outside of pan_drm.c. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Get rid of the "free imported BO" logicBoris Brezillon2019-07-024-37/+8
| | | | | | | | | | bo->imported was never set to true which means this path was never taken. Moreover, panfrost_drm_free_imported_bo() is doing missing the munmap() call which seems wrong because the import BO function calls mmap(). Let's just kill this function along with the ->imported field. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Get rid of the panfrost_driver abstraction leftoversBoris Brezillon2019-07-023-35/+0
| | | | | | | Commit 5f81669d880b ("panfrost: Remove the panfrost_driver abstraction") left a few things behind, remove them now. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Move scanout res creation out of panfrost_resource_create()Boris Brezillon2019-07-021-32/+41
| | | | | | Which improves readability and help us avoid a memory leak. Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Add the sampled texture BO to the jobBoris Brezillon2019-07-021-0/+4
| | | | | | | | | Otherwise we get random use-after-{free,unmap} errors. Signed-off-by: Boris Brezillon <[email protected]> --- Changes in v2: - Move the panfrost_job_add_bo() call out of the loop
* radv: enable DCC for layers on GFX8Samuel Pitoiset2019-07-021-9/+23
| | | | | | | | | | | | It's currently only enabled if dcc_slice_size is equal to dcc_slice_fast_clear_size because the driver assumes that portions of multiple layers are contiguous but it's not always true. Still not supported on GFX9. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not enable DCC for mipmapped arrays because performance is worseSamuel Pitoiset2019-07-021-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement clearing DCC layers on GFX8Samuel Pitoiset2019-07-022-4/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: merge radv_dcc_clear_level() into radv_clear_dcc()Samuel Pitoiset2019-07-021-30/+22
| | | | | | | | This will help for clearing DCC arrays because we need to know the subresource range. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for decompressing DCC layers with computeSamuel Pitoiset2019-07-021-51/+53
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: compute the DCC fast clear size per slice on GFX8Samuel Pitoiset2019-07-022-0/+28
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: compute the size of one DCC slice on GFX8Samuel Pitoiset2019-07-022-0/+7
| | | | | | | | Addrlib doesn't provide this info. Because DCC is linear, at least on GFX8, it's easy to compute the size of one slice. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: Defer closing and freeing VMA until buffers are idle.Kenneth Graunke2019-07-021-10/+51
| | | | | | | | | | | | There will unfortunately be circumstances where we cannot re-use a virtual memory address until it's no longer active on the GPU. To facilitate this, we instead move BOs to a "dead" list, and defer closing them and returning their VMA until they are idle. We periodically sweep these away in cleanup_bo_cache, which triggers every time a new object's refcount hits zero. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Tested-by: Jordan Justen <[email protected]>
* iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().Kenneth Graunke2019-07-023-12/+19
| | | | | | | | | | | | In the future, some images will need to be aligned to a larger value than 4096. Most buffers, however, don't have any such requirement, so for now we only add the parameter to iris_bo_alloc_tiled() and leave the others with the simpler interface. v2: Fix missing alignment in vma_alloc, caught by Caio! Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Tested-by: Jordan Justen <[email protected]>
* v3d: do not flush jobs that are synced with 'Wait for transform feedback'Iago Toral Quiroga2019-07-025-20/+61
| | | | | | | | | | | | | Generally, we achieve this by skipping the flush on calls to v3d_flush_jobs_writing_resource() when we detect that the resource is written in the current job from a transform feedback write. The exception to this is the case where the caller is about to map the resource, in which case we need to flush immediately since we can only emit 'Wait for transform feedback' commands on rendering jobs. We add a parameter to the function so the caller can identify that scenario. Reviewed-by: Eric Anholt <[email protected]>
* v3d: emit 'Wait for transform feedback' commands when neededIago Toral Quiroga2019-07-021-0/+120
| | | | | | | | | | | | | | | | | | | | | | The hardware can flush transform feedback writes before reads in the same job by inserting this command. This patch detects when the rendering state for the current draw call reads resources that had been previously written by transform feedback in the same job and inserts the 'Wait for transform feedback' command before emitting the new draw. v2 (Eric): - this was intended to look at job->tf_write_prscs for TF jobs. - clear job->tf_write_prscs after we emit the TF flush. - can skip flushes for fragment shader reads from TF. v3 (Eric): - all resources in job->tf_write_prscs are resources written by TF so we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT. - documented optimization opportunity for geometry stages. Reviewed-by: Eric Anholt <[email protected]>
* v3d: keep track of resources written by transform feedbackIago Toral Quiroga2019-07-023-2/+15
| | | | | | | | | | | | | The hardware provides a feature to sync reads from previous transform feedback writes in the same job so if we use this mechanism we no longer have to flush the job. In order to identify this scenario we need a mechanism to identify resources that are written by transform feedback. v2: use _mesa_pointer_set_create (Eric) Reviewed-by: Eric Anholt <[email protected]>
* st/dri: fix typo in format table for GR1616 formatMike Blumenkrantz2019-07-011-1/+1
| | | | | | | the dri image format here should match the fourcc format Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* st/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsysMike Blumenkrantz2019-07-011-4/+5
| | | | | | | this makes the entire struct available for use here Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/st: simplify format usage in st_bind_egl_imageMike Blumenkrantz2019-07-011-15/+13
| | | | | | | | | | | the formats handled in the switch statement will always return an unknown mesa format, so process them directly and leave the default case for other/unknown formats no functional changes Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.Kenneth Graunke2019-07-011-0/+27
| | | | | | | | | | | | | | | If our resource_copy_region size is a small number of DWords, then instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after a CS stall). We also try and select the optimal batch. Improves performance in Shadow of Mordor on Low settings at 1920x1080 on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy 4 bytes of data to a buffer which was most recently used as a writable compute shader SSBO. Previously we were switching from compute to the render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes. I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
* radv: Only allocate supplied number of descriptors when variable.Bas Nieuwenhuizen2019-07-011-1/+7
| | | | | | Fixes: b5e04e9217b "radv: Support allocating variable size descriptor sets." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019 Reviewed-by: Samuel Pitoiset <[email protected]>
* egl: simplify loopEric Engestrom2019-07-011-3/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Sagar Ghuge<[email protected]>
* sparc: Reuse m_vector_asm.h.Eric Anholt2019-07-013-34/+14
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* mesa: Enable asm unconditionally, now that gen_matypes is gone.Eric Anholt2019-07-012-4/+0
| | | | | Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>