path: root/src/gallium/drivers/iris/iris_resource.c
Commit message | Author | Age | Files | Lines
* iris: Disable CCS_E for 32-bit floating point textures. | Kenneth Graunke | 2019-09-30 | 1 | -1/+23
  A while back, Michael Larabel noticed that Paraview's Wavelet Volume case runs significantly slower on iris than i965. It turns out this is because we enable CCS_E for 32-bit floating point formats, while i965 disables it, with an oblique comment saying that we benchmarked it (on what exactly?) and determined that it was a loss.

  Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed large framerate drops when enabling CCS_E for either format. However, several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit floating point formats, with no apparent ill effects. So, disable compression for 32-bit float formats for now, but leave it enabled for 16-bit float formats as they seem to be working fine.

  Improves performance in Paraview's Wavelet Volume test by 62% on a Skylake GT4e.

  Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
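The policy in this commit can be sketched as a small format predicate. This is only an illustration under stated assumptions: the `sketch_*` types and names are hypothetical, and the real driver inspects isl format layouts rather than a struct like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified format description (not the isl API). */
enum sketch_base_type { SKETCH_UNORM, SKETCH_UINT, SKETCH_FLOAT };

struct sketch_format {
   enum sketch_base_type type;
   unsigned bits_per_channel;
};

/* Returns true if CCS_E should stay enabled for this format under the
 * policy described in the commit message: 32-bit float formats are
 * excluded, 16-bit float formats (and everything else) are left alone. */
static bool
sketch_want_ccs_e(const struct sketch_format *fmt)
{
   assert(fmt != NULL);

   if (fmt->type == SKETCH_FLOAT && fmt->bits_per_channel == 32)
      return false;

   return true;
}
```

The check keys on the channel width, not on the number of channels, so an R32_FLOAT-like and an R32G32B32A32_FLOAT-like format are treated the same way.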
* iris: disable aux on first get_param if not created with aux | Tapani Pälli | 2019-09-25 | 1 | -9/+22
  This moves the fix from commit 361f3d19f1f to happen in get_param (used now instead of get_handle by st/dri). This fixes artifacts seen with Xorg and CCS_E.

  Fixes: fc12fd05f56 "iris: Implement pipe_screen::resource_get_param"
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Track per-stage bind history, reduce work accordingly | Kenneth Graunke | 2019-09-18 | 1 | -6/+1
  We now track per-stage bind history for constant and shader buffers, shader images, and sampler views by adding an extra res->bind_stages field to go with res->bind_history. This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages involved, and also skip some CPU overhead in iris_rebind_buffer.

  Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace on Icelake.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
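A minimal sketch of the per-stage tracking idea, assuming a hypothetical resource struct and dirty-bit layout. Only the field names `bind_history` and `bind_stages` come from the commit message; the stage enum, the bind flag, and the bit position are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stage numbering, for illustration only. */
enum sketch_stage {
   SKETCH_STAGE_VS, SKETCH_STAGE_TCS, SKETCH_STAGE_TES,
   SKETCH_STAGE_GS, SKETCH_STAGE_FS, SKETCH_STAGE_CS,
   SKETCH_STAGE_COUNT
};

#define SKETCH_BIND_CONSTANT_BUFFER  (1u << 0)
#define SKETCH_DIRTY_CONSTANTS_SHIFT 8 /* assumed position of per-stage bits */

struct sketch_resource {
   uint32_t bind_history; /* kinds of binding this resource has ever had */
   uint32_t bind_stages;  /* one bit per stage that has bound it */
};

/* Record that the resource was bound in a given way at a given stage. */
static void
sketch_note_bind(struct sketch_resource *res, uint32_t bind,
                 enum sketch_stage stage)
{
   assert(stage < SKETCH_STAGE_COUNT);
   res->bind_history |= bind;
   res->bind_stages |= 1u << stage;
}

/* After a write, flag constants dirty only for stages that used the buffer. */
static uint64_t
sketch_dirty_constants_for(const struct sketch_resource *res)
{
   if (!(res->bind_history & SKETCH_BIND_CONSTANT_BUFFER))
      return 0;

   return (uint64_t)res->bind_stages << SKETCH_DIRTY_CONSTANTS_SHIFT;
}
```

A buffer bound as a constant buffer only in the fragment stage then dirties one stage's constants instead of all of them.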
* iris: Don't flag IRIS_DIRTY_BINDINGS for constant usage history | Kenneth Graunke | 2019-09-18 | 1 | -2/+1
  The underlying buffer isn't changing - so we don't need to update any SURFACE_STATE descriptors - we just might have new constants, meaning we need to re-emit 3DSTATE_CONSTANT_XS. On Gen9, this means we need to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled by the explicit check in the previous patch.

  On Gen9, this should cause us to re-emit the binding table /pointer/ on writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting a whole new /table/. On Gen8 and Gen11, this avoids binding table churn altogether.

  Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of Mordor trace on Icelake.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* gallium: extend resource_get_param to be as capable as resource_get_handle | Marek Olšák | 2019-09-18 | 1 | -1/+4
  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Avoid flushing for cache history on transfer range flushes | Kenneth Graunke | 2019-09-09 | 1 | -2/+11
  The VBO module maps a buffer with GL_MAP_FLUSH_EXPLICIT, keeps appending data, and keeps calling glFlushMappedBufferRange(). We were invalidating the VF cache each time it flushed a new range, which results in a ton of VF flushes.

  If the contents of the destination in the target range are undefined (never even possibly written), this patch makes us assume that it's likely not in the cache and so cache invalidations are not required. If the destination range is defined, we continue cache flushing as we may need to expunge stale data.

  This eliminates 88% of the VF cache invalidates on Manhattan 3.0.

  Improves performance in Manhattan 3.0 on my Icelake 8x8 with the GPU frequency locked to 700 MHz by 0.376724% +/- 0.0989183% (n=10).
* iris: Report correct number of planes for planar images | Kenneth Graunke | 2019-09-03 | 1 | -1/+8
  We were only handling the modifiers case and not counting the number of planes in actual planar images.

  Fixes Piglit's ext_image_dma_buf_import-export.

  Fixes: fc12fd05f56 ("iris: Implement pipe_screen::resource_get_param")
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111509
  Reviewed-by: Jordan Justen <[email protected]>
* iris: Don't auto-flush/dirty on transfer unmap for coherent buffers | Kenneth Graunke | 2019-08-28 | 1 | -1/+2
  When u_upload_mgr fills up a buffer, it unmaps and destroys it. Our unmap function was automatically performing the equivalent of a FlushMappedBufferRange call in this case.

  Because the buffer mapping is persistent and coherent, we don't actually do any flushing when we do the rest of the writes to the buffer - we were just doing one final one at the end. But we would be using the uploaded contents on the GPU the whole time. This certainly shouldn't be necessary for streaming buffers, and if such flushing and dirtying is necessary for coherent buffers, this is wildly insufficient.

  Drops a small number of constant packets and PIPE_CONTROL flushes from most benchmarks that I've looked at. Doesn't seem to make much of an impact on performance, however.

  Thanks to Felix Degrood for noticing that we were emitting more 3DSTATE_CONSTANT_* packets than we needed to.
* iris: Drop swizzling parameter from s8_offset. | Kenneth Graunke | 2019-08-27 | 1 | -19/+3
  This is always false on Gen8+, no need for dead code and parameters.
* iris: Avoid unnecessary resolves on transfer maps | Kenneth Graunke | 2019-08-22 | 1 | -16/+28
  We were always resolving the buffer as if we were accessing it via CPU maps, which don't understand any auxiliary surfaces. But we often copy to a temporary using BLORP, which understands compression just fine. So we can avoid the resolve, and accelerate the copy as well.

  Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Drop copy format hacks from copy region based transfer path. | Kenneth Graunke | 2019-08-22 | 1 | -16/+5
  This doesn't work for compressed formats, as the source texture and temporary texture would have different block sizes. (Forcing the driver to always take the GPU path would expose the bug.)

  Instead, just use the source format for the temporary, and let blorp_copy deal with overrides.

  The one case where we can't do this is ASTC, because isl won't let us create a linear ASTC surface. Fall back to the CPU paths there for now.

  Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Update fast clear colors on Gen9 with direct immediate writes. | Kenneth Graunke | 2019-08-22 | 1 | -3/+1
  Gen11 stores the fast clear color in an "indirect clear buffer", as a packed pixel value.

  Gen9 hardware stores it as a float or integer value, which is interpreted via the format. We were trying to store that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM it from there to the actual SURFACE_STATE bytes where it's stored.

  This unfortunately doesn't work for blorp_copy(), which does bit-for-bit copies, and overrides the format to a CCS-compatible UINT format. This causes the clear color to be interpreted in the overridden format.

  Normally, we provide the clear color on the CPU, and blorp_blit.c:2611 converts it to a packed pixel value in the original format, then unpacks it in the overridden format, so the clear color we use expands to the bits we originally desired. However, BLORP doesn't support this pack/unpack with an indirect clear buffer, as it would need to do the math on the GPU. On Gen11+, it isn't necessary, as the hardware does the right thing.

  This patch changes Gen9 to stop using an indirect clear buffer and simply do PIPE_CONTROLs with post-sync write immediate operations to store the new color over the surface states for regular drawing. BLORP continues streaming out surface states, and handles fast clear colors on the CPU.

  Fixes: 53c484ba8ac ("iris: blorp using resolve hooks")
  Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Add infrastructure to support non coherent framebuffer fetch | Sagar Ghuge | 2019-08-20 | 1 | -1/+1
  Create a separate SURFACE_STATE for render target reads in order to support non-coherent framebuffer fetch on Broadwell.

  We also need to resolve the framebuffer in order to support CCS_D.

  v2: Add outputs_read check (Kenneth Graunke)
  v3: 1) Import Curro's comment from get_isl_surf
      2) Rename get_isl_surf method
      3) Clean up allocation in case of failure

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add helper functions to get tile offset | Sagar Ghuge | 2019-08-20 | 1 | -0/+103
  All helper functions are ported from i965 driver.

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add helper function to get isl dim layout | Sagar Ghuge | 2019-08-20 | 1 | -0/+29
  v2: Add missing space (Caio)

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Expose aux buffer as 2nd plane w/modifiers | Jordan Justen | 2019-08-13 | 1 | -10/+23
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Export and import surfaces with modifiers that have aux data | Jordan Justen | 2019-08-13 | 1 | -25/+98
  The DRI interface for modifiers with aux data treats the aux data as a separate plane of the main surface. When the dri layer requests the plane associated with the aux data, we save the required information into the dri aux plane image.

  Later when the image is used, the dri plane image will be available in the pipe_resource structure's `next` field. Therefore in iris, we reconstruct the aux setup from this separate dri plane image when the image is used.

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Do proper format checks for Y+CCS modifier support | Kenneth Graunke | 2019-08-13 | 1 | -6/+20
  We need to ensure that the DRI image format supports CCS.

  Reviewed-by: Jordan Justen <[email protected]>
* iris: Create single bo for surfaces with modifiers and aux data | Jordan Justen | 2019-08-13 | 1 | -3/+40
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Split iris_resource_alloc_aux to enable aux modifiers | Jordan Justen | 2019-08-13 | 1 | -40/+85
  Reworks:
   * If the aux-state is not ISL_AUX_STATE_AUX_INVALID, then use memset even when memset_value is zero. The hiz buffer initial aux-state will be set to invalid, and therefore we can skip the memset. But, for CCS it will be set to ISL_AUX_STATE_PASS_THROUGH, and therefore the aux data must be cleared to 0 with the memset. Previously we would use BO_ALLOC_ZEROED with the CCS aux data, so this memset wasn't required. Now, the CCS aux data may be part of the main surface. We prefer to not use BO_ALLOC_ZEROED excessively, so the memset is needed for the CCS case. (Nanley)

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
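The memset rule from the rework note can be written as a single predicate. This is a sketch under assumptions: the enum below is a hypothetical stand-in for the isl aux states named above, and the function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the aux states mentioned in the commit. */
enum sketch_aux_state {
   SKETCH_AUX_STATE_INVALID,     /* e.g. the initial state of a hiz buffer */
   SKETCH_AUX_STATE_PASS_THROUGH /* e.g. the initial state of CCS */
};

/* Decide whether freshly allocated aux data must be memset.
 * A non-zero fill value always needs it. A zero fill value needs it too
 * unless the initial state is "invalid", because the memory is no longer
 * assumed to come from a zeroed allocation. */
static bool
sketch_aux_needs_memset(enum sketch_aux_state initial_state,
                        unsigned memset_value)
{
   assert(memset_value <= 0xff); /* memset fills with a single byte value */

   return memset_value != 0 ||
          initial_state != SKETCH_AUX_STATE_INVALID;
}
```

Under this rule a pass-through CCS surface with a zero fill value is still cleared, while an invalid-state hiz buffer with a zero fill value is skipped.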
* iris: Implement pipe_screen::resource_get_param | Jordan Justen | 2019-08-13 | 1 | -0/+48
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* gallium: switch boolean -> bool at the interface definitions | Ilia Mirkin | 2019-07-22 | 1 | -1/+1
  This is a relatively minimal change to adjust all the gallium interfaces to use bool instead of boolean. I tried to avoid making unrelated changes inside of drivers to flip boolean -> bool to reduce the risk of regressions (the compiler will much more easily allow "dirty" values inside a char-based boolean than a C99 _Bool).

  This has been build-tested on amd64 with:

  Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d vc4 i915 svga virgl swr panfrost iris lima kmsro
  Gallium st: mesa xa xvmc xvmc vdpau va

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Acked-by: Alyssa Rosenzweig <[email protected]>
* iris: Minor tidying | Kenneth Graunke | 2019-07-03 | 1 | -1/+0
* iris: assert isl_surf_init success in resource_from_handle | Mike Blumenkrantz | 2019-07-02 | 1 | -14/+15
  This can fail unexpectedly due to bugs, so it's good to provide feedback when this occurs.

  Reviewed-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* iris: Add an explicit alignment parameter to iris_bo_alloc_tiled(). | Kenneth Graunke | 2019-07-02 | 1 | -2/+2
  In the future, some images will need to be aligned to a larger value than 4096. Most buffers, however, don't have any such requirement, so for now we only add the parameter to iris_bo_alloc_tiled() and leave the others with the simpler interface.

  v2: Fix missing alignment in vma_alloc, caught by Caio!

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Tested-by: Jordan Justen <[email protected]>
* iris: Drop RT flushes from depth stencil clearing flushes. | Kenneth Graunke | 2019-06-20 | 1 | -7/+2
  These write depth and stencil, not color, so there's no need to flush the render target.
* iris: Don't bother with PIPE_CONTROLs for CPU writes and no history | Kenneth Graunke | 2019-06-20 | 1 | -6/+9
  If a buffer has no usage history, we don't have any read only cache invalidates to do. If we've written it with the CPU, we don't need to flush the render cache. The only bit remaining is the CS stall from iris_flush_bits_for_history. We can just skip the PIPE_CONTROL in this case.

  This is pretty common - an app creates a buffer, fills it with data, and then binds it for some purpose.

  Cuts 36% of the flushes in Manhattan 3.0 on Kabylake GT2.
* iris: Only do an RT flush for transfer maps if using copy_region. | Kenneth Graunke | 2019-06-20 | 1 | -1/+1
  If we wrote the data via the CPU, there's no point in doing a render target flush. If using BLORP, we do want a render target flush so the data lands.
* iris: Use iris_flush_bits_for_history in iris_transfer_flush_region | Kenneth Graunke | 2019-06-20 | 1 | -5/+12
  Instead of using the combined iris_flush_and_dirty_for_history, use iris_flush_bits_for_history directly - we were already using the split out iris_dirty_for_history. There's no need to dirty twice, and we can avoid the looping altogether for non-buffers.
* iris: Fix iris_flush_and_dirty_history to actually dirty history. | Kenneth Graunke | 2019-06-20 | 1 | -0/+2
  When I split iris_flush_and_dirty_history into two helper functions, I accidentally made it stop dirtying. Which was...sort of the point.

  Fixes: 21688a306b2 iris: Split iris_flush_and_dirty_for_history into two helpers.
* iris: Implement INTEL_DEBUG=pc for pipe control logging. | Kenneth Graunke | 2019-06-20 | 1 | -3/+6
  This prints a log of every PIPE_CONTROL flush we emit, noting which bits were set, and also the reason for the flush. That way we can see which are caused by hardware workarounds, render-to-texture, buffer updates, and so on. It should make it easier to determine whether we're doing too many flushes and why.
* iris: Check if resource has stencil before returning it | Andrii Kryvytskyi | 2019-05-14 | 1 | -1/+5
  Signed-off-by: Andrii Kryvytskyi <[email protected]>
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris/resource: Drop redundant checks for aux support | Nanley Chery | 2019-05-14 | 1 | -15/+0
  Drop some checks that are already done by ISL.

  Reviewed-by: Rafael Antognolli <[email protected]>
* iris/resource: Fall back to no aux if creation fails | Nanley Chery | 2019-05-14 | 1 | -4/+6
  No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris.

  Reviewed-by: Rafael Antognolli <[email protected]>
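The fallback described here can be sketched as follows. Everything in the block is hypothetical (the resource struct, the aux usage enum, and the allocator callbacks); it only illustrates treating aux allocation failure as "continue uncompressed" instead of as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_aux_usage { SKETCH_AUX_NONE, SKETCH_AUX_CCS_E };

struct sketch_resource {
   enum sketch_aux_usage aux_usage;
};

/* Hypothetical allocator hook: returns false if the aux surface
 * could not be created or allocated. */
typedef bool (*sketch_alloc_aux_fn)(struct sketch_resource *res);

/* Two trivial allocators used to exercise both outcomes. */
static bool sketch_alloc_aux_ok(struct sketch_resource *res)   { (void)res; return true; }
static bool sketch_alloc_aux_fail(struct sketch_resource *res) { (void)res; return false; }

/* Try to set up the requested aux usage; on failure, keep the resource
 * usable by falling back to no auxiliary surface at all. */
static void
sketch_configure_aux(struct sketch_resource *res,
                     enum sketch_aux_usage wanted,
                     sketch_alloc_aux_fn alloc_aux)
{
   assert(res != NULL && alloc_aux != NULL);

   res->aux_usage = wanted;
   if (wanted != SKETCH_AUX_NONE && !alloc_aux(res))
      res->aux_usage = SKETCH_AUX_NONE;
}
```

Because the caller never sees a failure, tightening the conditions under which aux creation succeeds does not require changes at the call site.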
* iris: support dmabuf imports with offsets | Mike Blumenkrantz | 2019-05-07 | 1 | -6/+2
  This adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264.

  Fixes kwg/mesa#47

  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Split iris_flush_and_dirty_for_history into two helpers. | Kenneth Graunke | 2019-04-24 | 1 | -20/+42
  We create two new helpers, iris_flush_bits_for_history, and iris_dirty_for_history, then use them in the existing function.

  The first accumulates flush bits based on res->bind_history, but doesn't actually perform a flush. This allows us to accumulate flush bits by looping over multiple resources, but ultimately emit a single flush for all of them.

  The latter flags dirty bits without flushing, which again allows us to handle multiple resources, but also is more convenient when writing from the CPU where we don't need a flush (as in commit 4d12236072).
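The "accumulate bits, flush once" pattern can be sketched like this. The bind and flush flag names and the mapping between them are assumptions made for illustration; they are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bind-history flags. */
#define SKETCH_BIND_CONSTANT_BUFFER (1u << 0)
#define SKETCH_BIND_VERTEX_BUFFER   (1u << 1)
#define SKETCH_BIND_SAMPLER_VIEW    (1u << 2)

/* Hypothetical flush bits. */
#define SKETCH_FLUSH_CONST_CACHE_INVALIDATE   (1u << 0)
#define SKETCH_FLUSH_VF_CACHE_INVALIDATE      (1u << 1)
#define SKETCH_FLUSH_TEXTURE_CACHE_INVALIDATE (1u << 2)

struct sketch_resource {
   uint32_t bind_history;
};

/* Compute the flush bits one resource needs; performs no flush. */
static uint32_t
sketch_flush_bits_for_history(const struct sketch_resource *res)
{
   uint32_t bits = 0;

   if (res->bind_history & SKETCH_BIND_CONSTANT_BUFFER)
      bits |= SKETCH_FLUSH_CONST_CACHE_INVALIDATE;
   if (res->bind_history & SKETCH_BIND_VERTEX_BUFFER)
      bits |= SKETCH_FLUSH_VF_CACHE_INVALIDATE;
   if (res->bind_history & SKETCH_BIND_SAMPLER_VIEW)
      bits |= SKETCH_FLUSH_TEXTURE_CACHE_INVALIDATE;

   return bits;
}

/* OR the bits for several resources so the caller can emit one flush. */
static uint32_t
sketch_accumulate_flush_bits(const struct sketch_resource *res, size_t count)
{
   uint32_t bits = 0;

   assert(res != NULL || count == 0);
   for (size_t i = 0; i < count; i++)
      bits |= sketch_flush_bits_for_history(&res[i]);

   return bits;
}
```

A caller would pass the combined mask to whatever emits the flush, and could skip the flush entirely when the mask is zero.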
* iris: Prefer staging blits when destination supports CCS_E. | Kenneth Graunke | 2019-04-23 | 1 | -1/+1
  Otherwise our textures don't get color compression. Thanks to Eero Tamminen for noticing this was missing!

  Improves performance of GLB27_FillTestC24Z16 on my Apollolake laptop with single channel RAM by 2.3x.

  Reported-by: Eero Tamminen <[email protected]>
* iris: Make some offset math helpers take a const isl_surf pointer | Kenneth Graunke | 2019-04-23 | 1 | -2/+2
* iris: Track valid data range and infer unsynchronized mappings. | Kenneth Graunke | 2019-04-23 | 1 | -0/+49
  Applications frequently call glBufferSubData() to consecutive regions of a VBO to append new vertex data. If no data exists there yet, we can promote these to unsynchronized writes, even if the buffer is busy, since the GPU can't be doing anything useful with undefined content. This can avoid a bunch of unnecessary blitting on the GPU.

  u_threaded_context would do this for us, and in fact prohibits us from doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't hooked that up yet, and it may be useful to disable u_threaded_context when debugging...at which point we'd still want this optimization. At the very least, it would let us measure the benefit of threading independently from this optimization. And it's not a lot of code.

  Removes most stall avoidance blits in "Total War: WARHAMMER."

  On my Skylake GT4e at 1920x1080, this appears to improve performance in games by the following (but I did not do many runs for proper statistics gathering):

  ----------------------------------------------
  | DiRT Rally        | +2% (avg) | + 2% (max) |
  | Bioshock Infinite | +3% (avg) | + 9% (max) |
  | Shadow of Mordor  | +7% (avg) | +20% (max) |
  ----------------------------------------------
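A minimal sketch of valid-range tracking and the unsynchronized promotion, assuming a single contiguous valid range per buffer and hypothetical flag names. The point at which the real driver extends the range is not stated in the commit message; extending it at map time is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical valid range [start, end); start >= end means "empty". */
struct sketch_range {
   unsigned start;
   unsigned end;
};

#define SKETCH_MAP_WRITE          (1u << 0)
#define SKETCH_MAP_UNSYNCHRONIZED (1u << 1)

static bool
sketch_range_intersects(const struct sketch_range *r,
                        unsigned start, unsigned end)
{
   return r->start < r->end && start < r->end && r->start < end;
}

/* Grow the range so that it also covers [start, end). */
static void
sketch_range_add(struct sketch_range *r, unsigned start, unsigned end)
{
   if (r->start >= r->end) {
      r->start = start;
      r->end = end;
      return;
   }
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* Return the map flags for a write to [start, end), promoting the map to
 * unsynchronized when the target bytes have never held valid data. */
static unsigned
sketch_map_for_write(struct sketch_range *valid, unsigned flags,
                     unsigned start, unsigned end)
{
   assert(start < end);

   if (!(flags & SKETCH_MAP_WRITE))
      return flags;

   if (!sketch_range_intersects(valid, start, end))
      flags |= SKETCH_MAP_UNSYNCHRONIZED;

   sketch_range_add(valid, start, end);
   return flags;
}
```

Appending writes to consecutive regions never overlap the valid range, so each one is promoted; a write that overlaps earlier data keeps its original synchronization.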
* iris: Make a resource_is_busy() helper | Kenneth Graunke | 2019-04-23 | 1 | -4/+13
  This checks both "is it busy" and "do we have work queued up for it"?
* iris: Replace buffer backing storage and rebind to update addresses. | Kenneth Graunke | 2019-04-23 | 1 | -5/+42
  This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(), as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either of these happen, we swap out the backing storage of the buffer for a new idle BO, allowing us to write to it immediately without stalling or queueing a blit.

  On my Skylake GT4e at 1920x1080, this improves performance in games:

  -----------------------------------------------
  | DiRT Rally        | +25% (avg) | +17% (max) |
  | Bioshock Infinite | +22% (avg) | +11% (max) |
  | Shadow of Mordor  | +27% (avg) | +83% (max) |
  -----------------------------------------------
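The storage swap can be sketched with a reference-counted stand-in for a buffer object. All types and names here are hypothetical, the rebind step is reduced to a counter, and skipping the swap for an idle buffer is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer object: reference counted, possibly busy on the GPU. */
struct sketch_bo {
   int refcount;
   bool busy;
};

struct sketch_buffer {
   struct sketch_bo *bo;
   unsigned rebind_count; /* stand-in for re-emitting state with new addresses */
};

static void
sketch_bo_unref(struct sketch_bo *bo)
{
   assert(bo->refcount > 0);
   if (--bo->refcount == 0)
      free(bo);
}

/* Discard the buffer's contents. If the current storage is busy, swap in a
 * fresh idle BO so the caller can write immediately; in-flight work keeps
 * the old BO alive through its own reference. Returns false if allocation
 * fails, in which case the buffer is left unchanged. */
static bool
sketch_invalidate_buffer(struct sketch_buffer *buf)
{
   if (!buf->bo->busy)
      return true; /* idle storage can simply be reused */

   struct sketch_bo *fresh = calloc(1, sizeof(*fresh));
   if (fresh == NULL)
      return false;
   fresh->refcount = 1;

   struct sketch_bo *old = buf->bo;
   buf->bo = fresh;
   sketch_bo_unref(old);

   buf->rebind_count++; /* every binding must now point at the new storage */
   return true;
}
```

The rebind step is what makes the swap safe: anything that recorded the old address has to be updated before the next draw uses the buffer.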
* iris: Mark constants dirty on transfer unmap even if no flushes occur | Kenneth Graunke | 2019-04-23 | 1 | -2/+8
  I have various conditions in place to try and avoid unnecessary PIPE_CONTROL flushes, especially to batches which may have never used the buffer being mapped. But if we do a CPU map to a bound constant buffer, we still need to mark push constants dirty, even if there's nothing happening in batches that would warrant a flush.

  Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus (lots of rainbow colored triangles). Fixes lots of blinking elements in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".
* iris: Fix FLUSH_EXPLICIT handling with staging buffers. | Kenneth Graunke | 2019-04-15 | 1 | -23/+41
  I neglected to blit the staging buffer back to the real one at transfer_flush_region (FlushMappedBufferRange) time.
* iris: Preserve all PIPE_TRANSFER flags in xfer->usage | Kenneth Graunke | 2019-04-15 | 1 | -13/+9
  We need to preserve PIPE_TRANSFER_FLUSH_EXPLICIT, DISCARD_RANGE, and so on, but don't want to pass them to iris_bo_map(). So, keep them all, but mask them off when calling map.

  Chris Wilson told me to do this a long time ago and he was right.
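The "keep everything, mask at the call site" approach can be sketched as below. The flag names and the set of flags the mapper understands are assumptions for illustration, not the Gallium or iris definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical transfer usage flags. */
#define SKETCH_TRANSFER_READ           (1u << 0)
#define SKETCH_TRANSFER_WRITE          (1u << 1)
#define SKETCH_TRANSFER_FLUSH_EXPLICIT (1u << 2)
#define SKETCH_TRANSFER_DISCARD_RANGE  (1u << 3)

/* Flags the hypothetical low-level BO mapper is allowed to see. */
#define SKETCH_BO_MAP_FLAGS (SKETCH_TRANSFER_READ | SKETCH_TRANSFER_WRITE)

struct sketch_transfer {
   uint32_t usage; /* everything the caller asked for */
};

/* Store the full usage so later steps (flush_region, unmap) can see it. */
static void
sketch_transfer_init(struct sketch_transfer *xfer, uint32_t usage)
{
   assert(xfer != NULL);
   xfer->usage = usage;
}

/* Compute the reduced flag set to hand to the BO mapper. */
static uint32_t
sketch_bo_map_flags(const struct sketch_transfer *xfer)
{
   return xfer->usage & SKETCH_BO_MAP_FLAGS;
}
```

Masking at the point of use instead of at storage time means a later step can still check for a flag such as the explicit-flush one on the stored usage.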
* intel/common: move gen_debug to intel/dev | Mark Janes | 2019-04-10 | 1 | -1/+1
  libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev.

  Suggested-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: handle aux properly in iris_resource_get_handle | Tapani Pälli | 2019-04-04 | 1 | -0/+8
  Disable aux when the resource is seen for the first time and EXPLICIT_FLUSH is not set. This fixes issues seen when launching Xorg with CCS_E in use.

  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: move iris_flush_resource so we can call it from get_handle | Tapani Pälli | 2019-04-04 | 1 | -15/+15
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Add aux.sampler_usages. | Rafael Antognolli | 2019-04-02 | 1 | -0/+11
  We want to skip some types of aux usages (for instance, ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have multisampling) when sampling from the surface.

  Instead of checking for those cases while filling the surface state and leaving it blank, let's have a version of aux.possible_usages for sampling. This way we can also avoid allocating surface state for the cases we don't use.

  Fixes: a8b5ea8ef015ed4a "iris: Add function to update clear color in surface state."
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Do not allocate clear_color_bo for gen8. | Rafael Antognolli | 2019-04-02 | 1 | -4/+5
  Since we are not using it for the clear color, there's no need to allocate it.

  Fixes: a8b5ea8ef015ed4a "iris: Add function to update clear color in surface state."
  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Actually advertise some modifiers | Kenneth Graunke | 2019-03-27 | 1 | -0/+39
  I neglected to fill out this driver function, causing us to advertise 0 modifiers. Now we advertise the various tilings and let the driver pick them.

  I've verified that X tiling works with Weston (by hacking the list to skip Y tiling). Y+CCS doesn't work yet because it's multiplane and the Gallium dri state tracker isn't really prepared for that. Leave it off for now.