aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* intel/fs/ra: Split building the interference graph into a helperJason Ekstrand2019-05-141-23/+42
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Initialize grf_used with first_non_payload_grfJason Ekstrand2019-05-141-1/+1
| | | | | | | | | There's no reason why we need to use the calculated payload_node_count value which is just first_non_payload_grf aligned up. The grf_used value will be aligned up to 16 anyway (which is a much bigger alignment) before being handed off to hardware. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Stop adding RA interference to too many SENDS nodesJason Ekstrand2019-05-141-8/+3
| | | | | | | | | | | | | | | | | | | | We only have one node per VGRF so this was adding way too much interference. No idea how we didn't catch this before. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355543197 (0.02%) cycles in affected programs: 2472492 -> 2547639 (3.04%) helped: 17 HURT: 20 Fixes: 014edff0d20d "intel/fs: Add interference between SENDS sources" Reviewed-by: Kenneth Graunke <[email protected]>
* util/ra: Assert nodes are in-bounds in add_node_interferenceJason Ekstrand2019-05-141-0/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Only add dest interference to sources that existJason Ekstrand2019-05-141-1/+1
| | | | | Fixes: 83dedb6354d "i965: Add src/dst interference for certain" Reviewed-by: Kenneth Graunke <[email protected]>
* util/ra: Don't destroy the graph in ra_allocate()Jason Ekstrand2019-05-141-76/+102
| | | | | | | | | | | | We want to be able to call ra_allocate() and, when it fails, mutate the graph and try again rather than re-building the graph from scratch. This commit moves all the scratch bits except the final register allocation (which is really an out value not scratch) into sub-structs named "tmp" to make it clear which things are scratch. It also adds bits to the ra_select() initialization loop to initialize things (since we can't trust rzalloc anymore) and copy q_test and forced_reg over. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Add a helper for resetting a node's interferenceJason Ekstrand2019-05-142-0/+37
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Add helpers for adding nodes to an interference graphJason Ekstrand2019-05-142-20/+72
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ralloc: Add helpers for growing zero-initialized memoryJason Ekstrand2019-05-142-0/+87
| | | | | | | Unfortunately, we can't quite follow the standard C conventions for these because ralloc doesn't know the sizes of pointers. Reviewed-by: Eric Anholt <[email protected]>
* intel/fs: Stop doing extra RA callsJason Ekstrand2019-05-141-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop. There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem. Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%) Reviewed-by: Kenneth Graunke <[email protected]>
* util/ra: Improve the performance of ra_simplifyJason Ekstrand2019-05-141-30/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The most expensive part of register allocation is the ra_simplify step which is a fixed-point algorithm with a worst-case complexity of O(n^2) which adds the registers to a stack which we then use later to do the actual allocation. This commit uses bit sets and changes the core loop of ra_simplify to first walk 32-node chunks and then walk each chunk. This lets us skip whole 32-node chunks in one go based on bit operations and compute the minimum q value potentially 32x as fast. Of course, the algorithm still has the same fundamental O(n^2) run-time but the constant is now much lower. In the nasty Aztec Ruins compute shader, this shaves a full four seconds off the 30s compile time for a release build of mesa. In a debug build (needed for accurate stack traces), perf says that ra_select takes 20% of runtime before this patch and only 5-6% of runtime after this patch. It also makes shader-db runs faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%) Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Only update q_total if the reg is not assignedJason Ekstrand2019-05-141-1/+1
| | | | | | | | We only use q_total if the reg is not assigned so there's no point in updating it if the reg is not assigned. This has no known perf benefit but it will reduce churn in a future commit. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Only update best_optimistic_node if !progressJason Ekstrand2019-05-141-1/+5
| | | | | | | This shaves about half a second off the 30 second compile time of one of the compute shaders in Aztec ruins. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Make in_stack a bitset in the graphJason Ekstrand2019-05-141-18/+15
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Get rid of tabsJason Ekstrand2019-05-142-28/+28
| | | | Reviewed-by: Eric Anholt <[email protected]>
* virgl: clean up virgl_res_needs_flushChia-I Wu2019-05-141-2/+34
| | | | | | | | | | | Add comments and some minor cleanups. v2: document the function Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> (v1) Reviewed-by: Gurchetan Singh <[email protected]> Signed-off-by: Chia-I Wu <[email protected]>
* virgl: comment on a sync issue in transfersChia-I Wu2019-05-142-0/+20
| | | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: PIPE_TRANSFER_READ does not imply flushChia-I Wu2019-05-141-4/+1
| | | | | | | | virgl_res_needs_flush should suffice. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: do not skip readback because of explicit flushChia-I Wu2019-05-141-3/+0
| | | | | | | | | | Both apps and we (see virgl_buffer_transfer_flush_region) might flush regions that are unmodified. We have to read back for those flushes. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused virgl_transfer_inline_writeChia-I Wu2019-05-142-42/+0
| | | | | | | | | | It currently has no user and is probably incorrect (resource_wait is required in some more cases). Remove it so that we can focus on transfers first. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Alexandros Frantzis <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* iris/resource: Drop redundant checks for aux supportNanley Chery2019-05-141-15/+0
| | | | | | Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <[email protected]>
* iris/resource: Fall back to no aux if creation failsNanley Chery2019-05-141-4/+6
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Refactor intel_miptree_supports_ccs_e()Nanley Chery2019-05-141-10/+5
| | | | | | | Update and rename this function to format_supports_ccs_e() to better match its behavior. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_*_supports_hiz()Nanley Chery2019-05-141-35/+2
| | | | | | | | intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate much the work done by isl_surf_get_hiz_surf(). Replace them with simple expressions. Reviewed-by: Rafael Antognolli <[email protected]>
* isl: Add restrictions to isl_surf_get_hiz_surf()Nanley Chery2019-05-141-0/+25
| | | | | | | Import some restrictions from intel_tiling_supports_hiz() and intel_miptree_supports_hiz(). Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_*_supports_ccs()Nanley Chery2019-05-141-124/+6
| | | | | | | | intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate much the work done by isl_surf_get_ccs_surf(). Drop them both and index a boolean array to choose CCS_D in intel_miptree_choose_aux_usage(). Reviewed-by: Rafael Antognolli <[email protected]>
* isl: Add restriction and comments to isl_surf_get_ccs_surf()Nanley Chery2019-05-141-1/+17
| | | | | | Import some restrictions and comments from intel_miptree_supports_ccs(). Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Drop intel_miptree_supports_mcs()Nanley Chery2019-05-141-46/+1
| | | | | | | This function duplicates much the work done by isl_surf_get_mcs_surf(). Replace it with a simple expression. Reviewed-by: Rafael Antognolli <[email protected]>
* isl: Modify restrictions in isl_surf_get_mcs_surf()Nanley Chery2019-05-141-5/+19
| | | | | | | Import some restrictions from intel_miptree_supports_mcs() and don't assume that the caller knows which device generations are supported. Reviewed-by: Rafael Antognolli <[email protected]>
* i965/miptree: Fall back to no aux if creation failsNanley Chery2019-05-141-5/+6
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update i965. Reviewed-by: Rafael Antognolli <[email protected]>
* mesa: Set _NEW_VARYING_VP_INPUTS iff varying_vp_inputs are set.Mathias Fröhlich2019-05-141-7/+6
| | | | | Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Avoid setting _NEW_VARYING_VP_INPUTS in non fixed function mode.Mathias Fröhlich2019-05-143-2/+16
| | | | | | | | | | | | | | | | | | | | | | Instead of checking the API variant on entry of set_varying_vp_inputs to check if we can ever be interrested in fixed function processing or not, we can check if we are actually fixed function processing. To check this we can use the immediately updated gl_context::VertexProgram._VPMode value that tells us if we have a user provided shader program or if we are in fixed function processing either through an internal TNL shader of directly through hardware. When doing so, we also need to recheck the varying_vp_inputs variable at the time gl_context::VertexProgram._VPMode is set to VP_MODE_FF. Put asserts at the consumers of gl_context::varying_vp_inputs to make sure gl_context::VertexProgram._VPMode is set to VP_MODE_FF. By that gl_context::varying_vp_inputs should be up to date then. By not looking at the opengl api for this decision we should actually catch more cases where we can avoid setting a state change flag, including the ones where we cannot get into VP_MODE_FF by the choice of the api. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Fix test for setting the _NEW_VARYING_VP_INPUTS flag.Mathias Fröhlich2019-05-141-8/+3
| | | | | | | | | | The precondition stated in the comment is not true. The values mentioned are only set from _mesa_update_state which in turn may not yet be called. For now set the _NEW_VARYING_VP_INPUTS flag a bit more often, we will narrow that down to a minimum again in a later patch. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Make _mesa_set_varying_vp_inputs static in state.c.Mathias Fröhlich2019-05-142-8/+3
| | | | | | | Is no longer used outside that file. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Fix old outdated variable name in a comment.Mathias Fröhlich2019-05-141-1/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa/vbo: Update Comment to what is actually happening.Mathias Fröhlich2019-05-141-3/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* wayland/egl: Ensure correct buffer size when allocatingJonas Ådahl2019-05-141-2/+11
| | | | | | | | | | Whenever a buffer is allocated, e.g. by the first draw call or EGL call after a buffer swap, make sure the size is up to date. Prior to this commit, we failed to do so when querying the buffer age, or swapping buffers without any prior EGL call or draw call. Signed-off-by: Jonas Ådahl <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* egl: check if a window/pixmap is already used on surface creationPaulo Zanoni2019-05-141-0/+32
| | | | | | | | | | | The spec says we can't create another surface if we already created a surface with the given window or pixmap. Implement this check. This behavior is exercised by piglit/egl-create-surface. Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* egl: store the native surface pointer in struct _egl_surfacePaulo Zanoni2019-05-1411-12/+24
| | | | | | | | | | | | | | | | | | | | | | | Each platform stores this in a different place: - platform_drm uses dri2_surf->gbm_surf->base - platform_android uses dri2_surf->window - platform_wayland uses dri2_surf->wl_win - platform_x11 uses dri2_surf->drawable - platform_x11_dri3 uses dri3_surf->loader_drawable.drawable - haiku doesn't even store it! We need access to the native surface since the specification asks us to refuse creating a new surface if there's already an EGLSurface associated with native_surface. An alternative to this patch would be to create a new API.GetNativeWindow callback that each platform would have to implement. While that's something we can definitely do, I prefer this approach. Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* radv: add support for VK_KHR_uniform_buffer_standard_layoutSamuel Pitoiset2019-05-142-0/+7
| | | | | | | Nothing to do. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* softpipe/buffer: load only as many components as the the buffer resource ↵Gert Wollny2019-05-141-2/+5
| | | | | | | | | | | | | | | type provides Otherwise we risk to read past the end of the buffer. In addition, change the loop counters to unsigned to be consistent with the types. Fixes: afa8707ba93a7d226a76319acda2a8dd89524db7 softpipe: add SSBO/shader atomics support. Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* panfrost: ci: Reduce batch size to 3000Tomeu Vizoso2019-05-141-1/+1
| | | | | | | | As with the previous value of 5000 we seemed to be reaching OOM in some circumstances. Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>
* panfrost: ci: Update expectationsTomeu Vizoso2019-05-141-2/+0
| | | | | | | | | | Since last Friday, these two tests have been fixed: dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment dEQP-GLES2.functional.shaders.linkage.varying_7 Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>
* freedreno: Fix warning on printing a uint64_t using %llx.Eric Anholt2019-05-131-1/+1
| | | | Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Silence compiler warnings about "*" in boolean context.Eric Anholt2019-05-132-2/+2
| | | | | | | It sure looks like we just want both of them to be nonzero, and && is probably going to be cheaper than * anyway. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Silence compiler warnings about uninit 'layers'Eric Anholt2019-05-133-3/+3
| | | | | | | My gcc can't see that the uninitialized value from the PIPE_BUFFER case isn't used from the !PIPE_BUFFER cases later. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Quiet compiler warnings on 64-bit.Eric Anholt2019-05-131-1/+1
| | | | | | | __u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining about the wrong type being passed in. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make emacs indent the way robclark's eclipse does.Eric Anholt2019-05-132-0/+6
| | | | | | | | | The .editorconfig helps with the tabs, but we've got this two-tabs-from-previous-indentation line continuation style that requires whacking the c-file-offsets. This will throw emacs warnings when first opening a file in the directory, press '!' to shut it up for the future. Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: Make .editorconfig match .dir-locals.el.Eric Anholt2019-05-132-0/+8
| | | | | | | | The editorconfig takes precedence over dir-locals in emacs26 with editorconfig enabled, so the /.editorconfig was affecting these directories. Reviewed-by: Kristian H. Kristensen <[email protected]>
* anv: Implement VK_KHR_uniform_buffer_standard_layoutJason Ekstrand2019-05-132-0/+8
| | | | | | | | There's no real work to do here since we already support scalar block layout which is a direct superset of what this extension allows. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>