summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matchingJason Ekstrand2018-10-1811-70/+61
| | | | | | | | This lets us avoid passing the DRM fd around all over the place and gets us closer to layer utopia. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* loader/dri3: Also wait for front buffer fence if we triggered itMichel Dänzer2018-10-181-2/+5
| | | | | | | | | | | | | In that case, we have to wait for the fence to synchronize with the corresponding drawing we triggered in the X server. Fixes incorrect display with the i965 driver and some applications, e.g. solvespace. Bugzilla: https://bugs.freedesktop.org/108097 Fixes: aefac10fecc9 "loader/dri3: Only wait for back buffer fences in dri3_get_buffer" Tested-by: Sergii Romantsov <[email protected]>
* anv: Stop generating weak references for instance entrypointsJason Ekstrand2018-10-181-13/+0
| | | | | | | | | We don't need weak references to instance entrypoints because we never have more than one of each so we don't need the NULL fall-back. This also helps us avoid forgetting things because we now get link errors for missing instance entrypoints. Reviewed-by: Lionel Landwerlin <[email protected]>
* vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHRJason Ekstrand2018-10-188-0/+177
| | | | | | | | | | | | | | | | | | This got missed during 1.1 enabling because it was defined as an interaction between device groups and WSI and it wasn't obvious it was in the delta. The idea behind it is that it's supposed to provide a hint to the application in a multi-GPU setup to indicate which regions of the screen are being scanned out by which GPU so a multi-device split-screen rendering application can render each part of the screen on the GPU that will be presenting it and avoid extra bus traffic between GPUs. On a single-GPU setup or one which doesn't support this present mode, we need to do something. We choose to return the window size (or a max-size rect) if the compositor, X server, or crtc is associated with the given physical device and zero rectangles otherwise. Reviewed-by: Lionel Landwerlin <[email protected]>
* vulkan/wsi: Store the instance allocator in wsi_deviceJason Ekstrand2018-10-1811-27/+16
| | | | | | | | | | | | We already have wsi_device and we know the instance allocator at wsi_device_init time so there's no need to pass it into the physical device queries. This also fixes a memory allocation domain bug that can occur if CreateSwapchain gets called prior to any queries (not likely) in which case the cached connection gets allocated off the device instead of the instance. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* st/xlib: Use more appropriate include guardMichał Janiszewski2018-10-181-2/+2
| | | | | Signed-off-by: Michał Janiszewski <[email protected]> Reviewed-by: Emil Velikov <[email protected]
* gallium: Fix mismatched ifdef-guardsMichał Janiszewski2018-10-181-2/+2
| | | | | Signed-off-by: Michał Janiszewski <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* softpipe: dynamically allocate space for immediate constantsGert Wollny2018-10-182-5/+15
| | | | | | | | | | | | | | | The number of immediate constants was fixed and the size check was only done by means of an assertion. Given this a shader that emits more immediate constants would result in a memory corruption when mesa is build in release mode. Instead of using this fixed limit allocate the space dynamically, let it grow as needed, and also remove the unused ImmArray. Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* radv: use nir_shrink_vec_array_vars()Timothy Arceri2018-10-181-0/+1
| | | | | | | | | | | | | | | | | | Totals from affected shaders: SGPRS: 1096 -> 1096 (0.00 %) VGPRS: 1192 -> 1056 (-11.41 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 100940 -> 94384 (-6.49 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 100 -> 112 (12.00 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use nir_split_array_vars()Timothy Arceri2018-10-181-0/+2
| | | | | | | | | | | | | | | | | | | | | We call in the opt loop in case another pass results in an array with indirect access being turned into direct access. Totals from affected shaders: SGPRS: 512 -> 496 (-3.12 %) VGPRS: 456 -> 452 (-0.88 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 40040 -> 39664 (-0.94 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 41 -> 43 (4.88 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use nir_opt_find_array_copies()Timothy Arceri2018-10-183-8/+23
| | | | | | | | | | | | | | | | | | | | | | | Totals from affected shaders: SGPRS: 1112 -> 1112 (0.00 %) VGPRS: 1492 -> 1196 (-19.84 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 112172 -> 101316 (-9.68 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 93 -> 98 (5.38 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from "Batman: Arkham City" over DXVK. The pass detects that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and allows us to avoid copying all of the input data and then indirecting on it with if-ladders, instead we just do indirect indexing. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_varsTimothy Arceri2018-10-181-0/+4
| | | | | | | | | | | | | | | | | | | Totals from affected shaders: SGPRS: 2856 -> 2856 (0.00 %) VGPRS: 3236 -> 3248 (0.37 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 236560 -> 233548 (-1.27 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 277 -> 283 (2.17 %) Wait states: 0 -> 0 (0.00 %) Even in the cases were we have increased VGPR use it appears the NIR is improved significantly. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]Keith Packard2018-10-177-0/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Offers three clocks, device, clock monotonic and clock monotonic raw. Could use some kernel support to reduce the deviation between clock values. v2: Ensure deviation is at least as big as the GPU time interval. v3: Set device->lost when returning DEVICE_LOST. Use MAX2 and DIV_ROUND_UP instead of open coding these. Delete spurious TIMESTAMP in radv version. Suggested-by: Jason Ekstrand <[email protected]> Suggested-by: Lionel Landwerlin <[email protected]> v4: Add anv_gem_reg_read to anv_gem_stubs.c Suggested-by: Jason Ekstrand <[email protected]> v5: Adjust maxDeviation computation to max(sampled_clock_period) + sample_interval. Suggested-by: Bas Nieuwenhuizen <[email protected]> Suggested-by: Jason Ekstrand <[email protected]> Signed-off-by: Keith Packard <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17Topi Pohjolainen2018-10-171-2/+6
| | | | | | | | | | | | | | Identifier bits in the dispatch header have changed. See Bspec: SINGLE_PATCH Payload: 3D Pipeline Stages - 3D Pipeline Geometry - Hull Shader (HS) Stage IVB+ - Payloads IVB+ Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* Fix setting indent-tabs-mode in the Emacs .dir-locals.el filesNeil Roberts2018-10-174-4/+4
| | | | | | | Some of the .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/a6xx: don't allocate binning rbRob Clark2018-10-171-3/+7
| | | | | | | Now that a single cmdstream is used for both binning and draw passes, we can skip allocation of cmdstream buffer for binning. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: single cmdstream for draw+binningRob Clark2018-10-173-15/+3
| | | | | | | | | | | Now that state which is different for draw vs binning pass is split out into different state-groups with appropriate enable_mask (so the appropriate one is chosen for draw vs binning), switch over to using a single cmdstream for both passes. This should significantly lower draw overhead for CPU bound benchmarks. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: split binning vs draw program stateobj'sRob Clark2018-10-172-4/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: split VBO state into binning/draw variantsRob Clark2018-10-172-1/+8
| | | | | | | | Blob seems to manage to use same input registers for BS (binning pass) vs VS (draw pass) shaders, so it can use the same VBO state for both. We can't quite do that yet, so split them. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move VBO state to stateobjRob Clark2018-10-173-8/+19
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move ZSA state to stateobjRob Clark2018-10-172-19/+39
| | | | | | | Step towards single cmdstream, where we need different state-group-id's for binning vs draw ZSA state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: remove vismode paramRob Clark2018-10-171-14/+2
| | | | | | | We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work for using single cmdstream for both draw and binning passes. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move binning-pass fixup for a6xx+Rob Clark2018-10-171-20/+37
| | | | | | | | | Move this to after ir3_cp (which can add lowered immediates to the const state) for a6xx+, to ensure the uniform state matches between binning and vertex shaders. This way we can emit just a single VS_CONST state- group when we re-use single cmdstream for both binning and draw passes. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: a bit more state emit cleanupRob Clark2018-10-174-37/+27
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move framebuffer state emit to emit_mrt()Rob Clark2018-10-172-29/+24
| | | | | | | No point in checking this per-draw, since framebuffer change means new batch. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: small emit_mrt() cleanupRob Clark2018-10-171-14/+7
| | | | | | | On a6xx, this is only used for pfb->cbufs so we can just directly pass the pfb state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: use program cacheRob Clark2018-10-177-130/+247
| | | | | | | Use the in-memory cache to construct shader program state and re-use it on subsequent draws, to lower driver overhead. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: shader variant cacheRob Clark2018-10-175-0/+214
| | | | | | | | | | Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus shader variant key to a generation specific state object. This could eventually replace the linked list of shader variants, but for now it lets us re-use the work currently done in fdN_program_emit() Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move binning_pass out of shader variant keyRob Clark2018-10-1721-75/+109
| | | | | | | | | | | Prep work for a following patch, that introduces a cache to map from program state (all shader stages) plus variant key to pre-baked hw state (which could be emit'd via CP_SET_DRAW_STATE, for example). To do that, we really want the variant key to be immutable, and to treat the binning pass shader as an extra shader stage, rather than as a VS variant. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: track # of samplers used by shaderRob Clark2018-10-178-25/+19
| | | | | | | This is useful for a6xx to avoid program state from depending on bound tex/samp state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: texture state objRob Clark2018-10-176-33/+251
| | | | | | | | | | | Unfortunately gallium doesn't match what the hw wants perfectly here, in using a separate CSO for each texture/sampler. So we have to use a hash table to map the collection of texture/samplers to hw state object. We probably could use separate hw state objects for texture and sampler state, but mesa/st tends to update the tex and samp state together. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add resource seqnoRob Clark2018-10-174-3/+11
| | | | | | | Intended to be something more compact than a 64b pointer, which could be used as a key into hashtables. Prep work for texture state objects. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move const emit to state groupRob Clark2018-10-174-15/+70
| | | | | | | | | | | | | | Eventually we want to move nearly everything, but no other state depends on const state, so this is the easiest one to move first. For webgl aquarium, this reduces GPU load by about 10%, since for each fish it does a uniform upload plus draw.. fish frequently are visible in only a single tile, so this skips the uniform uploads for other tiles. The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems to be work an additional 10% gain for aquarium. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: add infrastructure for CP_DRAW_STATERob Clark2018-10-174-2/+48
| | | | | | | Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE packet if we have any state-groups. Signed-off-by: Rob Clark <[email protected]>
* freedreno: reduce resource dependency tracking overheadRob Clark2018-10-171-42/+67
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: Remove the Emacs mode linesNeil Roberts2018-10-17113-226/+0
| | | | | | | | | | | | | | | These are not necessary because the corresponding settings are set via the .dir-locals.el file anyway. Most of them were missing a ‘:’ after “tab-width” which was making Emacs display an annoying warning whenever you open the file. This patch was made with: sed -ri '/-\*- mode:/,/^$/d' \ $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \ -exec grep -l -- '-\*- mode:' {} \+) Signed-off-by: Rob Clark <[email protected]>
* freedreno: Fix the Emacs indentation configuration fileNeil Roberts2018-10-171-1/+1
| | | | | | | The .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. Signed-off-by: Rob Clark <[email protected]>
* freedreno: allocate batches from the cache in launch_gridHyunjun Ko2018-10-171-1/+2
| | | | | | | | | | Needs to allocate batches from the cache so that it could get a valid index and make resource dependancy tracking right. In addition this fixes assertion on debug build since the commit 1a40faa8 landed. Signed-off-by: Rob Clark <[email protected]>
* freedreno: adds nondraw param to fd_bc_alloc_batchHyunjun Ko2018-10-174-6/+6
| | | | | | | | Needs to specify nondraw when creating a batch through fd_bc_alloc_batch since it'd better create a batch through it rather than fd_batch_create. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: remove fd6_emit_render_cntl()Rob Clark2018-10-172-34/+0
| | | | | | It was dead code carried over from a5xx Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix broken texcoord inputsRob Clark2018-10-171-21/+1
| | | | | | | | TODO not sure if this is best solution, but current logic is broken for texcoord inputs. It is definitely the simplest solution. Fixes: 1a24f519663 freedreno/ir3: ignore unused inputs Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix off-by-one error in BEGIN_RING()Rob Clark2018-10-171-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* util: document a limitation of util_fast_udiv32Marek Olšák2018-10-171-1/+7
| | | | trivial
* i965/fs: Add 64-bit int immediate support to dump_instructions()Matt Turner2018-10-162-0/+8
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* radeonsi: track context rolls better for the Vega scissor bug workaroundMarek Olšák2018-10-167-34/+80
| | | | | | We should get fewer context rolls with the SET_CONTEXT_REG optimization, but it would have been for nothing if the scissor state rolled the context anyway. Don't emit the scissor state if there is no context roll.
* radeonsi: emit sample locations for 1xAA only when the hw bug is presentMarek Olšák2018-10-161-4/+2
|
* radeonsi: use compute shaders for clear_buffer & copy_bufferMarek Olšák2018-10-168-203/+350
| | | | | Fast color clears should be much faster. Also, fast color clears on evicted buffers should be 200x faster on GFX8 and older.
* radeonsi: use copy_buffer in buffer_do_flush_region directlyMarek Olšák2018-10-161-11/+4
|
* radeonsi: use faster integer division for instance divisorsMarek Olšák2018-10-163-36/+83
| | | | | | | | | | We know the divisors when we upload them, so instead we can precompute and upload division factors derived from each divisor. This fast division consists of add, mul_hi, and two shifts, and we have to load 4 dwords intead of 1. This probably won't affect any apps.
* ac: add helpers for fast integer division by a constantMarek Olšák2018-10-162-0/+78
|