summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/iris
Commit message (Collapse)AuthorAgeFilesLines
* iris: Dodge more GLSL IR loweringKenneth Graunke2019-05-151-2/+3
| | | | This avoids some lower_instructions bits in st.
* iris: Check if resource has stencil before returning itAndrii Kryvytskyi2019-05-141-1/+5
| | | | | | Signed-off-by: Andrii Kryvytskyi <[email protected]> Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Enable fragment shader interlock on Gen9+.Kenneth Graunke2019-05-141-0/+1
| | | | | | | | There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <[email protected]>
* intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8Kenneth Graunke2019-05-142-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <[email protected]>
* iris/resource: Drop redundant checks for aux supportNanley Chery2019-05-141-15/+0
| | | | | | Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <[email protected]>
* iris/resource: Fall back to no aux if creation failsNanley Chery2019-05-141-4/+6
| | | | | | | | | No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <[email protected]>
* gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE.Eric Anholt2019-05-131-1/+2
| | | | | | | | The _LEVELS assumes that the max is always power of two. For V3D 4.2, we can support up to 7680 non-power-of-two MSAA textures, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <[email protected]>
* iris: Implement ARB_indirect_parametersIllia Iorin2019-05-115-23/+156
| | | | | | | | | | | | | | | | | | | | iris_draw_vbo is divided into two functions to remove unnecessary operations from the loop. This implementation of ARB_indirect_parameters takes into account NV_conditional_render by saving MI_PREDICATE_RESULT at the start of a draw call and restoring it at the end also the result of NV_conditional_render is taken into account when computing predicates that limit draw calls for ARB_indirect_parameters in a similar way to 1952fd8d in ANV. v2: Optimize indirect draws (suggested by Kenneth Graunke) v3: (by Kenneth Graunke) - Fix an issue where indirect draws wouldn't set patch information before updating the compiled TCS. - Move some code back to iris_draw_vbo to avoid duplicating it. - Fix minor indentation issues. Signed-off-by: Illia Iorin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Split iris_update_draw_info into two functions.Kenneth Graunke2019-05-111-0/+12
| | | | | | | | Shader draw parameters need updating on each iteration of a multidraw loop, but the primitive based information only needs to be updated once. Also, patch information needs to be recorded before filling out the TCS program key, as it determines the number of HS instances.
* iris: Use full ways for L3 cache setup on Icelake.Kenneth Graunke2019-05-101-0/+1
| | | | | | | | | Anuj fixed this in i965 and anv, but the fix never landed in iris. Fixes tessellation corruption on Icelake. Thanks to Rafael for bisecting this and tracking it down. Fixes: d0996d5fab6 iris: Emit default L3 config for the render pipeline Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Expose PIPE_CAP_DEVICE_RESET_STATUS_QUERYKenneth Graunke2019-05-094-0/+72
| | | | | | This provides a way for the application to query whether any resets have happened, which lets us expose "robust" contexts. This also enables the KHR_robust_buffer_access_behavior tests.
* iris: Hook up device reset callbacksKenneth Graunke2019-05-094-1/+26
| | | | | | This mechanism lets the driver inform the state tracker about GPU resets, say for destroying a robust API context and reporting a "device lost" error to the application, making it take action to deal with this.
* iris: Try to recover from GPU hangs.Kenneth Graunke2019-05-093-0/+71
| | | | | | | | | The iris batch module now tries to detect that the kernel has banned our GEM context, creates a new non-banned context, and informs the iris context module that all assumptions about state are now invalid and it needs to reinitialize the relevant state. Based on Chris Wilson's work, but significantly rewritten by me.
* iris: Add helpers to clone a hardware context.Chris Wilson2019-05-092-0/+25
| | | | | (Chris Wilson wrote this code in a patch titled "i965: Be resilient in the face of GPU hangs"; Ken fixed a bug and copied it to iris.)
* iris: Mark render batches as non-recoverable.Kenneth Graunke2019-05-091-0/+22
| | | | | | | | | | | | | | | | | | Adapted from Chris Wilson's patch. The comment is largely his. Currently, when iris hangs the GPU, it will continue sending batches which incrementally update the state, assuming it's preserved across batches. However, the kernel's GPU reset support reinitializes the guilty context to the default GPU state (reasonably not wanting to trust the current state). This ends up resetting critical things like STATE_BASE_ADDRESS, causing memory accesses in all subsequent batches to be garbage, and almost certainly result in more hangs until we're banned or we kill the machine. We now ask the kernel to ban our render context immediately, so we notice we've gone off the rails as fast as possible. Eventually, we'll attempt to recover and continue. For now, we just avoid torching the GPU over and over.
* iris: Reorganise execbuf to have a single point of failureChris Wilson2019-05-081-27/+20
| | | | | | | | | | | Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure if need be. We retain the current behaviour to abort() at the first sign of trouble -- for a non-robustness context, arguably this is the right thing to do as the client cannot recover, and the system state is lost. How to properly integrate with KHR_robustness and reset-strategy is left as a future exercise. Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Report the same video memory settings as i965.Kenneth Graunke2019-05-082-2/+34
| | | | This just copy and pastes Ian's code from i965.
* iris: Also handle res->offset for buffer sampler/image viewsKenneth Graunke2019-05-071-8/+9
|
* iris: support dmabuf imports with offsetsMike Blumenkrantz2019-05-074-12/+12
| | | | | | | | | this adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264 fixes kwg/mesa#47 Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKSKenneth Graunke2019-05-062-6/+95
| | | | | | | | | | | | | | | | This makes CompressedTexSubImage from a PBO source do proper GPU rendering to upload instead of stalling to map the PBO source on the CPU (then copying it on the CPU). Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this functionality, and to Jason Ekstrand for writing the code I adapted. Vulkan only supports a single layer, however, and this code tries to support multiple layers as long as it's miplevel 0. Improves performance in Sid Meier's Civilization VI: Average frame time (ms): -3.67423% +/- 1.46201% (n=5) 99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)
* iris: Delete bucketing allocatorsKenneth Graunke2019-05-031-167/+3
| | | | | | | | | These add a lot of complexity, and I currently can't measure any performance benefit from having them. In the past, I seem to recall seeing a benefit in drawoverhead scores, but currently it looks like dropping them is either a wash or 1-2% faster. Drop them to simplify allocations.
* iris: Force VMA alignment to be a multiple of the page size.Kenneth Graunke2019-05-031-0/+3
| | | | This should happen regardless, but let's be paranoid.
* iris: leave the top 4Gb of the high heap VMA unusedKenneth Graunke2019-05-031-1/+5
| | | | | This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv to iris. Thanks to Lionel for noticing that it was missing!
* iris: Fix 4GB memory zone heap sizes.Kenneth Graunke2019-05-031-3/+6
| | | | | | | The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages, and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we can't use the top page.
* iris: Resolve textures used by the program, not merely bound texturesKenneth Graunke2019-05-031-2/+5
| | | | | | | | | st/mesa's PBO upload path binds a vertex shader that doesn't use any textures, but leaves the existing sampler views bound in place. This was tricking us into thinking the PBO destination might be bound for texturing in some cases. In Civilization VI, this fixes a false self- dependency issue that was preventing CCS_E compression on upload. Fixing this slightly improves frame times.
* iris: Disable dual source blending when shader doesn't handle itKenneth Graunke2019-05-021-4/+15
| | | | | | | | | | | | | This is a port of Danylo's eca4a6548d07bbbb02a7768edb397bad7b72cfc2 which fixed the hang on i965. It fixes GPU hangs in his new Piglit test, arb_blend_func_extended-dual-src-blending-discard-without-src1. I avoided my own review feedback here, and decided to simply adjust 3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been clear to me which the hardware uses in every case. However, whacking the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang, and that packet is already dynamic, so it's easy to handle. I'd rather avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.
* iris: Fix imageBuffer and PBO download.Kenneth Graunke2019-05-011-2/+2
| | | | | | | | | Recently we added checks to try and deny multisampled shader images. Unfortunately, this messed up imageBuffers, which have sample_count = 0, which are also used in PBO download, causing us hit CPU map fallbacks. Fixes: b15f5cfd20c iris: Do not advertise multisampled image load/store. Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Enable fast clear colors on gen11.Rafael Antognolli2019-04-301-3/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Update the surface state clear color address when available.Rafael Antognolli2019-04-301-1/+9
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Use the linear version of the surface format during fast clears.Rafael Antognolli2019-04-301-1/+7
| | | | | | | | | | | | | | Newer gens (> 9) will start doing the linear -> sRGB conversion of the clear color for us, if we use a sRGB surface format. So let's make sure that doesn't happen and keep the same semantics as before. Even though the hardware could convert the clear color for us during fast clear, that converted color is only used for sampling. For resolve, the original color would be used (without the conversion). So we convert it ourselves and the same converted color gets used for both sampling and resolving, simplifying the whole logic. Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Support sRGB fast clears even if the colorspaces differ.Rafael Antognolli2019-04-302-4/+8
| | | | | | | | | | | | | | We were disabling fast clears if the view format had a different colorspace than the resource format (sRGB vs linear or vice-versa). But we actually support them if we use the view format to decide if we should encode the clear color into sRGB colorspace. Also add a missing linear -> sRGB surface format conversion (we don't want the clear color to be encoded to sRGB again during resolve). v2: Do not track sRGB colorspace during fast clears (Nanley). Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Do not advertise multisampled image load/store.Rafael Antognolli2019-04-291-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Only enable GL_AMD_depth_clamp_separate on Gen9+Kenneth Graunke2019-04-291-1/+1
| | | | | The hardware feature is new as of Gen9+. I accidentally enabled it on Gen8.
* iris: Set XY Clipping correctly.Kenneth Graunke2019-04-294-2/+67
| | | | | | | | | | I was setting it based off a pipe_rasterizer_state field that appears to be entirely dead outside of the draw module respecting it. I should be setting it when the primitive type reaching the SF is neither points nor lines. This is, unfortunately, rather dirty, as we have to look at the rasterizer state, the geometry shader state, the tessellation evaluation shader state, and the primitive type...
* iris: Fix zeroing of transform feedback offsets in strange cases.Kenneth Graunke2019-04-272-4/+18
| | | | | | | | | | | | | | | | | | | | Some of the dEQP.functional.transform_feedback tests end up doing the following sequence of operations: 1. BeginTransformFeedback 2. PauseTransformFeedback 3. Draw 4. ResumeTransformFeedback At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the SO_WRITE_OFFSET registers. At step 2, we disable streamout, so step 3 doesn't bother emitting those commands. Then, step 4 re-packs new 3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue appending at the existing offset. This loads the value from the BO as the offsets - but we never actually zeroed it. So, just maintain a flag saying "we actually emitted the commands", and stomp offset back to zero until we emit some.
* iris: Silence unused function warningKenneth Graunke2019-04-251-1/+1
|
* iris: make the TFB result visible to othersAndrii Simiklit2019-04-251-10/+15
| | | | | | | | | | | | | | | | | | | | | | OpenGL 4.6 Spec: "5.3.3 Rules ....... Note: “Updates” via rendering or transform feedback are treated consistently with updates via GL commands. Once EndTransformFeedback has been issued, any subsequent command in the same context that uses the results of the transform feedback operation will see the results." v2: removed a wrong comment ( Kenneth Graunke <[email protected]> ) v3: - flush+dirty depends on buffers usage history - removed an old hack ( Kenneth Graunke <[email protected]> ) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404 Signed-off-by: Andrii Simiklit <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Some tidying for preemption supportKenneth Graunke2019-04-254-98/+102
| | | | | | | | Just enable it during init_render_context on Gen10+, and move the Gen9 state tracking into iris_genx_state so it only exists on Gen9. Reviewed-by: Mike Blumenkrantz <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Advertise EXT_texture_sRGB_R8 supportKenneth Graunke2019-04-241-0/+1
| | | | Using the luminance format, like both brw and anv do.
* iris: Enable GL_AMD_depth_clamp_separateKenneth Graunke2019-04-241-0/+1
| | | | We support this, we just forgot to turn it on.
* iris: enable preemption support for gen10Mike Blumenkrantz2019-04-241-0/+2
| | | | | | | | this automatically enables preemption on gen10 where it is disabled by default but still available Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* iris: add preemption support on gen9Mike Blumenkrantz2019-04-243-0/+99
| | | | | | | | | | | this is basically just porting the following two commits to gallium: d8b50e152a0d5df0971c05b8db132fa688794001 5c454661c66fa2624cf4bba1071175070724869a resolves kwg/mesa#49 Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* iris: Split iris_flush_and_dirty_for_history into two helpers.Kenneth Graunke2019-04-242-20/+46
| | | | | | | | | | | | | | We create two new helpers, iris_flush_bits_for_history, and iris_dirty_for_history, then use them in the existing function. The first accumulates flush bits based on res->bind_history, but doesn't actually perform a flush. This allows us to accumulate flush bits by looping over multiple resources, but ultimately emit a single flush for all of them. The latter flags dirty bits without flushing, which again allows us to handle multiple resources, but also is more convenient when writing from the CPU where we don't need a flush (as in commit 4d12236072).
* iris: Actually put Mesa in GL_RENDERER stringKenneth Graunke2019-04-241-1/+1
| | | | I constructed the right thing and then returned the other one.
* android/iris: fix driinfo header filenameTapani Pälli2019-04-231-1/+1
| | | | | | | | Fixes iris driver Android build. Fixes: faa52e328e3 "iris: Add mechanism for iris-specific driconf options" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Prefer staging blits when destination supports CCS_E.Kenneth Graunke2019-04-231-1/+1
| | | | | | | | | | Otherwise our textures don't get color compression. Thanks to Eero Tamminen for noticing this was missing! Improves performance of GLB27_FillTestC24Z16 on my Apollolake laptop with single channel RAM by 2.3x. Reported-by: Eero Tamminen <[email protected]>
* iris: add support for INTEL_conservative_rasterizationMike Blumenkrantz2019-04-232-11/+34
| | | | | | | | | this hooks up the iris gallium driver to existing mesa bits which handle the implementation resolves kwg/mesa#8 Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix DrawTransformFeedback math when there's a buffer offsetKenneth Graunke2019-04-233-0/+14
| | | | | | | We need to subtract the starting offset from the final offset before dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142. Not known to fix anything.
* iris: Make some offset math helpers take a const isl_surf pointerKenneth Graunke2019-04-231-2/+2
|
* iris: Track valid data range and infer unsynchronized mappings.Kenneth Graunke2019-04-235-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | Applications frequently call glBufferSubData() to consecutive regions of a VBO to append new vertex data. If no data exists there yet, we can promote these to unsynchronized writes, even if the buffer is busy, since the GPU can't be doing anything useful with undefined content. This can avoid a bunch of unnecessary blitting on the GPU. u_threaded_context would do this for us, and in fact prohibits us from doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't hooked that up yet, and it may be useful to disable u_threaded_context when debugging...at which point we'd still want this optimization. At the very least, it would let us measure the benefit of threading independently from this optimization. And it's not a lot of code. Removes most stall avoidance blits in "Total War: WARHAMMER." On my Skylake GT4e at 1920x1080, this appears to improve performance in games by the following (but I did not do many runs for proper statistics gathering): ---------------------------------------------- | DiRT Rally | +2% (avg) | + 2% (max) | | Bioshock Infinite | +3% (avg) | + 9% (max) | | Shadow of Mordor | +7% (avg) | +20% (max) | ----------------------------------------------