path: root/src/intel
Commit message | Author | Age | Files | Lines
* anv: bo_cache: allow importing a BO larger than needed | Lionel Landwerlin | 2017-10-17 | 1 | -1/+1

    It's not a problem if a BO has been allocated larger than we need it
    to be.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102940
    Fixes: 818b857914 ("anv: Use the BO cache for DeviceMemory allocations")
    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Daniel Stone <[email protected]>
    Cc: [email protected]
    (cherry picked from commit c0a4f56fb9945e36a004bc38852fbb801aae3bd5)
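
The relaxed check described above can be sketched as follows. This is an illustration only, not the anv code: `import_size_ok`, `bo_size` and `needed_size` are hypothetical names, and the only assumption is that an imported BO is acceptable whenever it is at least as large as the size being requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of anv: decide whether an imported
 * buffer object is big enough for the allocation being requested. */
static bool
import_size_ok(uint64_t bo_size, uint64_t needed_size)
{
   /* A strict equality test would reject a BO that is larger than
    * needed; only a BO that is too small is an actual problem. */
   return bo_size >= needed_size;
}
```

With this rule an 8192-byte BO satisfies a 4096-byte request, while a 2048-byte BO does not.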
* anv: Do not assert() on VK_ATTACHMENT_UNUSED | Józef Kucia | 2017-10-17 | 1 | -1/+2

    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Nanley Chery <[email protected]>
    Cc: [email protected]
    (cherry picked from commit 91ba331ef4fee34085628af12a188924a28cd808)
* anv/cmd_buffer: Reset state in cmd_buffer_destroy | Lionel Landwerlin | 2017-10-17 | 1 | -9/+3

    This ensures that everything gets cleaned up properly. In particular,
    it fixes a memory leak where we were leaking the push constants
    structs.

    Valgrind stats on
    dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128:

    Before:

      HEAP SUMMARY:
        in use at exit: 2,467,513 bytes in 1,305 blocks
        total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated

      LEAK SUMMARY:
        definitely lost: 1,068 bytes in 11 blocks
        indirectly lost: 24,669 bytes in 412 blocks
        possibly lost: 0 bytes in 0 blocks
        still reachable: 2,441,776 bytes in 882 blocks
        suppressed: 0 bytes in 0 blocks

    After:

      HEAP SUMMARY:
        in use at exit: 2,467,381 bytes in 1,304 blocks
        total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated

      LEAK SUMMARY:
        definitely lost: 936 bytes in 10 blocks
        indirectly lost: 24,669 bytes in 412 blocks
        possibly lost: 0 bytes in 0 blocks
        still reachable: 2,441,776 bytes in 882 blocks
        suppressed: 0 bytes in 0 blocks

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Cc: "17.2 17.1" <[email protected]>
    (cherry picked from commit 0763f814d7b5cb4da945d9211faab47e8523fdad)
* anv/cmd_buffer: fix push descriptors with set > 0 | Lionel Landwerlin | 2017-10-17 | 2 | -13/+49

    When writing to set > 0, we were just wrongly writing to set 0. This
    commit fixes this by lazily allocating each set as we write to them.

    We didn't go for having them directly into the command buffer as this
    would require an additional ~45Kb per command buffer.

    v2: Allocate push descriptors from system memory rather than in BO
        streams. (Lionel)

    Cc: "17.2 17.1" <[email protected]>
    Fixes: 9f60ed98e501 ("anv: add VK_KHR_push_descriptor support")
    Reported-by: Daniel Ribeiro Maciel <[email protected]>
    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    (cherry picked from commit d296dea54e243c41c2b3fbe631f7bcb87db6ae8c)
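
The lazy per-set allocation described above might look roughly like the sketch below. Every name here (`example_cmd_state`, `example_push_set`, `example_get_push_set`, `EXAMPLE_MAX_SETS`) is hypothetical and chosen for illustration; the real anv structures and limits differ. The point is only that a slot stays NULL until its set index is first written, and that storage comes from system memory via `calloc`.

```c
#include <stdlib.h>

#define EXAMPLE_MAX_SETS 8   /* hypothetical limit, for illustration only */

/* Hypothetical stand-in for one push descriptor set's storage. */
struct example_push_set {
   unsigned num_writes;
};

struct example_cmd_state {
   /* One slot per set index; NULL until that set is first written. */
   struct example_push_set *push_sets[EXAMPLE_MAX_SETS];
};

/* Return the push set for `set`, allocating it from system memory on
 * first use. Returns NULL for an out-of-range index or on allocation
 * failure. */
static struct example_push_set *
example_get_push_set(struct example_cmd_state *state, unsigned set)
{
   if (set >= EXAMPLE_MAX_SETS)
      return NULL;
   if (state->push_sets[set] == NULL)
      state->push_sets[set] = calloc(1, sizeof(*state->push_sets[set]));
   return state->push_sets[set];
}

/* Free every set that was actually allocated and clear the slots. */
static void
example_free_push_sets(struct example_cmd_state *state)
{
   for (unsigned i = 0; i < EXAMPLE_MAX_SETS; i++) {
      free(state->push_sets[i]);
      state->push_sets[i] = NULL;
   }
}
```

Indexing by the requested set, rather than always using slot 0, is what keeps a write to set 2 from landing in set 0; command buffers that never use push descriptors pay only for the pointer array.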
* intel/compiler: Don't propagate cmod into integer multiplies | Jason Ekstrand | 2017-10-17 | 2 | -0/+34

    No shader-db change on Sky Lake.

    Reviewed-by: Matt Turner <[email protected]>
    Cc: [email protected]
    (cherry picked from commit 7463d5058009d1e9d9d01292f894271a26a062b8)

* intel/compiler: Don't cmod propagate into a saturated operation | Jason Ekstrand | 2017-10-17 | 2 | -0/+16

    Shader-db results on Sky Lake:

        total instructions in shared programs: 12954445 -> 12955125 (0.01%)
        instructions in affected programs: 141862 -> 142542 (0.48%)
        helped: 0
        HURT: 626

    Reviewed-by: Matt Turner <[email protected]>
    Cc: [email protected]
    (cherry picked from commit b91ecee04ab2a5b23ff2418b4d709252878cf894)
* intel: compiler: vec4: add missing default 0 lod | Lionel Landwerlin | 2017-10-17 | 1 | -0/+9

    We set a similar default value for LOD in the fs backend for TXS/TXL.
    Without this we end up generating invalid MOV with a null src.

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Cc: "17.2 17.1" <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    (cherry picked from commit d3acc240d0ced4efefd18e8c26510bd6a638c9df)
* anv: Fix vkCmdFillBuffer() | Józef Kucia | 2017-10-11 | 1 | -2/+2

    The vkCmdFillBuffer() command fills a buffer with a uint32_t value.

    Reviewed-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Cc: "17.1 17.2" <[email protected]>
    (cherry picked from commit 15fdbf9c39105aaaae05c59a1a315b1775ac5d79)
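
As a CPU-side illustration of the semantics stated above: vkCmdFillBuffer() repeats one 32-bit word across the range, and its size is given in bytes. The helper below is a hypothetical sketch (`example_fill_u32` is not an anv or Vulkan function) and says nothing about what the two changed lines in the driver actually were.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU model of a 32-bit fill: write `data` into every
 * whole uint32_t slot covered by `size_bytes`. */
static void
example_fill_u32(uint32_t *dst, size_t size_bytes, uint32_t data)
{
   for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++)
      dst[i] = data;
}
```

Filling 12 bytes writes three words and leaves a fourth untouched, which is the distinction between a byte count and a word count that such code has to get right.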
* i965/vec4: Fix swizzles on atomic sources. | Kenneth Graunke | 2017-09-28 | 1 | -4/+9

    Atomic operation sources are scalar values, but we were failing to
    select the .x component of the second operand. For example,

        atomicCounterCompSwapARB(counter, 5u, 10u)

    would generate

        mov(8) vgrf4.x:D, 5D
        mov(8) vgrf5.x:D, 10D
        mov(8) vgrf9.x:UD, vgrf4.xyzw:D
        mov(8) vgrf9.y:UD, vgrf5.xyzw:D

    which wrongly selects the .y component of vgrf5, so the actual 10u
    value would get dead code eliminated. The swizzle works for the other
    source, but both of them ought to be .xxxx.

    Fixes the compare and swap CTS tests in:
    KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase

    Cc: "17.2 17.1 17.0 13.0" <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    (cherry picked from commit 66342c997fb7892034e537d1456c37e6981c2fdb)
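
The interaction between the source swizzle and the destination writemask can be modelled in a few lines. This is a simplified sketch, not the vec4 backend: `example_mov` and the channel enum are hypothetical, and the model assumes each enabled destination channel `c` reads source channel `swz[c]`.

```c
enum { X = 0, Y = 1, Z = 2, W = 3 };

/* Simplified model of a masked, swizzled move: for each channel c
 * enabled in `writemask`, dst.c = src.swz[c]. */
static void
example_mov(unsigned dst[4], const unsigned src[4],
            const int swz[4], unsigned writemask)
{
   for (int c = 0; c < 4; c++) {
      if (writemask & (1u << c))
         dst[c] = src[swz[c]];
   }
}
```

With the value 10 stored only in `src.x`, a move to `dst.y` using the `.xyzw` swizzle reads `src.y` and misses it, whereas `.xxxx` reads `src.x` regardless of which destination channel is written.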
* i965/vec4: Actually handle atomic op intrinsics. | Kenneth Graunke | 2017-09-28 | 1 | -2/+10

    Embarrassingly, someone enabled the ARB_shader_atomic_counter_ops
    extension for Gen7+ but never added the intrinsics to the switch
    statement in the vec4 backend, so they just hit an unreachable() call
    and died.

    Fixes: 40dd45d0c6aa4a9d (i965: Enable ARB_shader_atomic_counter_ops)
    Cc: "17.2 17.1 17.0 13.0" <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    (cherry picked from commit a62fe340981b56fe54e49d3a6791e568f7f87554)
* anv: fix viewport transformation for z component | Samuel Iglesias Gonsálvez | 2017-09-28 | 1 | -2/+2

    In Vulkan, for the 'z' (depth) component, the scale and translate
    values for the viewport transformation are:

        pz = maxDepth - minDepth
        oz = minDepth

        zf = pz × zd + oz

    where zd is the third component of the vertex's normalized device
    coordinates.

    Fixes: dEQP-VK.draw.inverted_depth_ranges.*

    Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Cc: [email protected]
    (cherry picked from commit d2cd9deeb8db5941e99b149fa637f711198ca741)
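
The depth formulas above translate directly into code. The sketch below uses hypothetical names (`example_depth_xform`, `example_depth_viewport`, `example_depth_to_framebuffer`) and only restates pz, oz and zf from the commit message; it is not the anv implementation.

```c
/* Scale (pz) and translate (oz) for the depth component. */
struct example_depth_xform {
   float scale;      /* pz = maxDepth - minDepth */
   float translate;  /* oz = minDepth */
};

static struct example_depth_xform
example_depth_viewport(float min_depth, float max_depth)
{
   struct example_depth_xform x;
   x.scale = max_depth - min_depth;
   x.translate = min_depth;
   return x;
}

/* zf = pz * zd + oz, with zd the NDC depth of the vertex. */
static float
example_depth_to_framebuffer(struct example_depth_xform x, float zd)
{
   return x.scale * zd + x.translate;
}
```

For an inverted range (minDepth = 1, maxDepth = 0) the scale becomes -1 and the translate 1, so zd = 0 maps to 1 and zd = 1 maps to 0, which is the case the inverted_depth_ranges tests exercise.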
* anv: Fix descriptors copying | Józef Kucia | 2017-09-28 | 1 | -1/+1

    Trivial.

    Cc: [email protected]
    Reviewed-by: Jason Ekstrand <[email protected]>
    (cherry picked from commit 65a09f98ad1411982ce68de7a2d05942d608e674)
* anv/formats: Nicely handle unknown VkFormat enums | Jason Ekstrand | 2017-09-13 | 1 | -5/+14

    This fixes some crashes in the dEQP-VK.memory.requirements.core.*
    tests. I'm not sure whether or not passing out-of-bound formats into
    the query is supposed to be allowed but there's no harm in protecting
    ourselves from it.

    Reviewed-by: Lionel Landwerlin <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/101956
    Cc: [email protected]
    (cherry picked from commit 242211933a06826961709c2689a1d30f735ab7b9)

    Squashed with:

    anv: fix off by one in array check

    `anv_formats[ARRAY_SIZE(anv_formats)]` is already one too far.

    Spotted by Coverity.

    CovID: 1417259
    Fixes: 242211933a0682696170 "anv/formats: Nicely handle unknown VkFormat enums"
    Cc: Jason Ekstrand <[email protected]>
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
    (cherry picked from commit 0c7272a66c633b0b11c0b81c0f3552201d083b3a)
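
The off-by-one mentioned in the squashed commit is the usual `>` versus `>=` bounds test. A minimal sketch, with a hypothetical table `example_formats` and lookup `example_lookup_format` standing in for the real format table:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical format table, for illustration only. */
static const int example_formats[] = { 10, 20, 30 };

/* Return a pointer to the entry for `idx`, or NULL if it is unknown.
 * Valid indices are 0 .. ARRAY_SIZE - 1, so the index ARRAY_SIZE
 * itself must be rejected: the test is `>=`, not `>`. */
static const int *
example_lookup_format(size_t idx)
{
   if (idx >= ARRAY_SIZE(example_formats))
      return NULL;
   return &example_formats[idx];
}
```

Returning NULL for an unknown enum lets the caller report "unsupported" instead of reading past the end of the table.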
* intel/blorp: Adjust intra-tile x when faking rgb with red-only | Topi Pohjolainen | 2017-08-29 | 1 | -0/+1

    v2 (Jason): Adjust directly in surf_fake_rgb_with_red()

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101910
    CC: [email protected]
    Reviewed-by: Jason Ekstrand <[email protected]>
    Signed-off-by: Topi Pohjolainen <[email protected]>
    (cherry picked from commit 393ec1a5071263d300e91f43058ed3b594d41418)
* intel/genxml: Fix gen10 BLEND_STATE variable length packing | Scott D Phillips | 2017-08-19 | 1 | -2/+2

    BLEND_STATE packing was modified to be variable-length in:

        9670124e31 genxml: Make BLEND_STATE command support variable length array.

    The initial gen10.xml still had the old, fixed-length style definition
    for BLEND_STATE. So gen10_upload_blend_state would overwrite the
    packed BLEND_STATE_ENTRYs with its own fixed array of all-zero entries
    when packing BLEND_STATE. This caused BLEND_STATE upload to not work
    at all.

    Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
    Reviewed-by: Rafael Antognolli <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
    (cherry picked from commit d6539608a440f0102f88b375d20143f5f2d0c798)
* isl: Validate row pitch of stencil surfaces. | Kenneth Graunke | 2017-08-11 | 1 | -2/+7

    Also, silence an obnoxious finishme that started occurring for all
    GL applications which use stencil after the i965 ISL conversion.

    v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
        separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
        combined depth-stencil.

    Cc: "17.2" <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    (cherry picked from commit 5563872dbfbf733ed56e1b367bc8944ca59b1c3e)
* intel/isl: Don't align the height of the last array slice | Jason Ekstrand | 2017-08-11 | 1 | -1/+2

    We were calculating the total height of 2D surfaces by multiplying the
    row pitch by the number of slices. This means that we actually request
    slightly more space than actually needed since the padding on the last
    slice is unnecessary. For tiled surfaces this is not likely to make a
    difference. For linear surfaces, on the other hand, this means we may
    require additional memory. In particular, this makes the i965 driver
    reject EGL imports of buffers which do not have this extra padding.

    Reviewed-by: Jordan Justen <[email protected]>
    Cc: "17.2" <[email protected]>
    (cherry picked from commit 4d27c6095e8385cccd225993452baad4d2e35420)
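
The arithmetic behind this change can be sketched as below. The function and parameter names are hypothetical, and "pitch" here is assumed to mean the distance in rows between the starts of two consecutive slices, which is at least the height of one slice. Only the first N-1 slices need the full pitch; the last one needs just its own rows.

```c
#include <assert.h>
#include <stdint.h>

/* Total rows needed for `num_slices` array slices when the trailing
 * padding of the last slice is not counted. */
static uint32_t
example_total_rows(uint32_t slice_pitch_rows, uint32_t slice_height_rows,
                   uint32_t num_slices)
{
   assert(num_slices > 0);
   assert(slice_height_rows <= slice_pitch_rows);
   return slice_pitch_rows * (num_slices - 1) + slice_height_rows;
}
```

For a 16-row pitch, 13-row slices and 4 slices this gives 61 rows rather than the 64 that pitch times slice count would request, so a buffer sized for exactly 61 rows is no longer rejected.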
* intel/isl: Stop padding surfaces | Jason Ekstrand | 2017-08-11 | 1 | -117/+2

    The docs contain a bunch of commentary about the need to pad various
    surfaces out to multiples of something or other. However, all of those
    requirements are about avoiding GTT errors due to missing pages when
    the data port or sampler accesses slightly out-of-bounds. However,
    because the kernel already fills all the empty space in our GTT with
    the scratch page, we never have to worry about faulting due to OOB
    reads. There are two caveats to this:

     1) There is some potential for issues with caches here if extra data
        ends up in a cache we don't expect due to OOB reads. However,
        because we always trash the entire cache whenever we need to move
        anything between cache domains, this shouldn't be an issue.

     2) There is a potential issue if a surface gets placed at the very
        top of the GTT by the kernel. In this case, the hardware could
        potentially end up trying to read past the top of the GTT. If it
        nicely wraps around at the 48-bit (or 32-bit) boundary, then this
        shouldn't be an issue thanks to the scratch page. If it doesn't,
        then we need to come up with something to handle it.

    Up until some of the GL move to ISL, having the padding code in there
    just caused us to harmlessly use a bit more memory in Vulkan. However,
    now that we're using ISL sizes to validate external dma-buf images,
    these padding requirements are causing us to reject otherwise valid
    images due to the size of the BO being too small.

    Acked-by: Kenneth Graunke <[email protected]>
    Tested-by: Tapani Pälli <[email protected]>
    Tested-by: Tomasz Figa <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Cc: "17.2" <[email protected]>
    (cherry picked from commit c15b92ce1160d742ea431062bbe4b3e818bb2aaf)
* anv/formats: Allow sampling on depth-only formats on gen7 | Jason Ekstrand | 2017-08-11 | 1 | -1/+2

    We can't sample from depth-stencil formats on gen7, but we can sample
    from depth-only formats.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102024
    Reviewed-by: Juan A. Suarez Romero <[email protected]>
    Cc: [email protected]
    (cherry picked from commit 06d3115bb97740a4c8f36c645944a8bd0bde3f68)
* anv: Stop advertising VK_KHX_multiview | Jason Ekstrand | 2017-08-05 | 1 | -4/+0

    We don't want to advertise experimental extensions in actual releases.
    However, there's no harm in leaving the code lying around in the tree.

* intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed. | Dave Airlie | 2017-08-05 | 1 | -0/+1

    If dual object compile fails (as seems to happen with virgl a fair
    bit, and does piglit even have any tests for it?), we end up not
    restarting the pull params, so we call
    vec4_visitor::move_uniform_array_access_to_pull_constant a second time
    and it runs over the ends of the alloc.

    Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
    running inside virgl on ivybridge.

    Reviewed-by: Kenneth Graunke <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
    (cherry picked from commit 271fa3a684ef0eefe99087c13d1abb099784163f)
* anv: only expose up to 28 vertex attributes | Iago Toral Quiroga | 2017-07-27 | 1 | -1/+1

    The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
    However, the maximum allowed value of "Vertex URB Entry Read Length"
    in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex
    elements. Because we also need to reserve a vertex buffer to upload
    VertexIndex/InstanceIndex and another to upload DrawID when needed,
    we can only expose 28.

    Cc: "17.2" <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
    (cherry picked from commit 31f1863ace73d31a579e5c36252a957818ad09cf)
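
The arithmetic in this message can be written out step by step. The constant names below are hypothetical labels for the numbers quoted in the commit message (15, 8, 4 GRFs per element, 2 reserved elements); the sketch does not claim anything about how the driver stores them.

```c
enum {
   EXAMPLE_MAX_URB_READ_LENGTH = 15, /* max "Vertex URB Entry Read Length" in SIMD8 */
   EXAMPLE_READ_LENGTH_FACTOR  = 8,  /* the "* 8" from the commit message */
   EXAMPLE_GRFS_PER_ELEMENT    = 4,  /* one vertex element occupies 4 GRFs */
   EXAMPLE_RESERVED_ELEMENTS   = 2,  /* VertexIndex/InstanceIndex and DrawID */
};

/* 15 * 8 = 120 GRFs, 120 / 4 = 30 elements, 30 - 2 = 28 exposed. */
static int
example_max_vertex_attribs(void)
{
   int grfs = EXAMPLE_MAX_URB_READ_LENGTH * EXAMPLE_READ_LENGTH_FACTOR;
   int elements = grfs / EXAMPLE_GRFS_PER_ELEMENT;
   return elements - EXAMPLE_RESERVED_ELEMENTS;
}
```

Keeping the two reserved slots as an explicit term makes it clear why the exposed limit is 28 rather than the 30 elements the read length alone would allow.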
* anv/cmd_buffer: fix off by one error in assertion | Iago Toral Quiroga | 2017-07-27 | 1 | -1/+1

    Cc: "17.2" <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
    (cherry picked from commit a848e693efc8e2a1d355dc1076409968b374153f)

* intel/blorp: ship blorp_genX_exec.h within the tarball | Emil Velikov | 2017-07-24 | 1 | -0/+1

    Fixes: c9cb37b2a6c ("intel/blorp: Add a partial resolve pass for MCS")
    Signed-off-by: Emil Velikov <[email protected]>
    (cherry picked from commit 5d47dd9c2a7b3a64429aaf7863b5cdad68b683ec)
* anv/image: zalloc image views | Jason Ekstrand | 2017-07-22 | 1 | -7/+1

    This allows us to avoid some extra zeroing.

    Reviewed-by: Lionel Landwerlin <[email protected]>

* anv/image: Use vk_zalloc instead of an explicit memset | Jason Ekstrand | 2017-07-22 | 1 | -3/+2

    Reviewed-by: Lionel Landwerlin <[email protected]>

* anv: Separate surface states by layout instead of aux_usage | Jason Ekstrand | 2017-07-22 | 4 | -53/+58

    Reviewed-by: Lionel Landwerlin <[email protected]>

* intel/isl: Add some sanity checks for compressed surfaces | Jason Ekstrand | 2017-07-22 | 1 | -0/+18

    Reviewed-by: Lionel Landwerlin <[email protected]>

* intel/isl: Add a helper to get a subimage surface | Jason Ekstrand | 2017-07-22 | 3 | -30/+76

    We already have a helper for doing this in BLORP, this just moves the
    logic into ISL where we can share it with other components.

    Reviewed-by: Lionel Landwerlin <[email protected]>

* anv: Get rid of some unused function declarations | Jason Ekstrand | 2017-07-22 | 1 | -7/+0

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Add a helper for determining if a color is 0/1 | Jason Ekstrand | 2017-07-22 | 2 | -0/+30

    Reviewed-by: Topi Pohjolainen <[email protected]>

* intel/blorp: Allow blorp_copy on sRGB formats | Jason Ekstrand | 2017-07-22 | 1 | -2/+16

    Reviewed-by: Topi Pohjolainen <[email protected]>

* intel/isl/format: Add an srgb_to_linear helper | Jason Ekstrand | 2017-07-22 | 2 | -1/+53

    Reviewed-by: Topi Pohjolainen <[email protected]>

* intel/isl/format: Dedent the template in gen_format_layout.py | Jason Ekstrand | 2017-07-22 | 1 | -58/+57

    This makes it much easier to edit the template and doesn't really
    dirty the python all that much.

    Reviewed-by: Topi Pohjolainen <[email protected]>

* intel/isl: Add an aux state for "partial clear" | Jason Ekstrand | 2017-07-22 | 1 | -35/+53

    Reviewed-by: Topi Pohjolainen <[email protected]>

* intel/blorp: Add a partial resolve pass for MCS | Jason Ekstrand | 2017-07-22 | 4 | -1/+213

    Reviewed-by: Topi Pohjolainen <[email protected]>
* anv: Predicate fast-clear resolves | Nanley Chery | 2017-07-22 | 3 | -16/+120

    Image layouts only let us know that an image *may* be fast-cleared.
    For this reason we can end up with redundant resolves. Testing has
    shown that such resolves can measurably hurt performance and that
    predicating them can avoid the penalty.

    v2:
    - Introduce additional resolve state management function (Jason Ekstrand).
    - Enable easy retrieval of fast clear state fields.

    v3: Use more descriptive field enums (Jason)

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp: Allow BLORP calls to be predicated | Nanley Chery | 2017-07-22 | 2 | -0/+6

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Skip some input attachment transitions | Nanley Chery | 2017-07-22 | 1 | -5/+26

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv: Stop resolving CCS implicitly | Nanley Chery | 2017-07-22 | 3 | -169/+5

    With an earlier patch from this series, resolves are additionally
    performed on layout transitions. Remove the now unnecessary implicit
    resolves within render passes.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Transition more color buffer layouts | Nanley Chery | 2017-07-22 | 2 | -28/+169

    v2: Expound on comment for the pipe controls (Jason Ekstrand).
    v3:
    - Cast base_layer to uint64_t to avoid overflow.
    - Remove "seems" from the pipe control comment.
    - Fix clamp of layer_count (Jason Ekstrand).

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Warn about not enabling CCS_E | Nanley Chery | 2017-07-22 | 1 | -5/+7

    Use the performance warning infrastructure to provide helpful
    information when testing applications.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Move aux_usage assignment up | Nanley Chery | 2017-07-22 | 1 | -32/+30

    For readability, bring the assignment of CCS closer to the assignment
    of NONE and MCS.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Always enable CCS_D in render passes | Nanley Chery | 2017-07-22 | 2 | -11/+20

    The lifespan of the fast-clear data will surpass the render pass
    scope. We need CCS_D to be enabled in order to invalidate blocks
    previously marked as cleared and to sample cleared data correctly.

    v2: Avoid refactoring.
    v3: Allow CCS_D for subpass resolves.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Disable CCS on gen7 color attachments upfront | Nanley Chery | 2017-07-22 | 1 | -11/+5

    The next patch enables the use of CCS_D even when the color attachment
    will not be fast-cleared. Catch the gen7 case early to simplify the
    changes required.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Ensure fast-clear values are current | Nanley Chery | 2017-07-22 | 1 | -0/+114

    v2: Rewrite functions, change location of synchronization.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/gpu_memcpy: Add a lighter-weight GPU memcpy function | Nanley Chery | 2017-07-22 | 2 | -0/+45

    We'll be performing a GPU memcpy in more places to copy small amounts
    of data. Add an alternate function that thrashes less state.

    v2:
    - Make a new function (Jason Ekstrand).
    - Move the #define into the function.
    v3:
    - Update the function name (Jason).
    - Update comments.
    v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Restrict fast clears in the GENERAL layout | Nanley Chery | 2017-07-22 | 3 | -0/+40

    v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
    v3: Allow some fast clears in the GENERAL layout.
    v4: Remove extra '||' and adjust line break (Jason Ekstrand).

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Don't partially fast clear image layers | Nanley Chery | 2017-07-22 | 1 | -8/+23

    v2: Don't pass in the command buffer (Jason Ekstrand).
    v3: Remove an incorrect assertion and an if condition for gen7.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* anv/cmd_buffer: Initialize the clear values buffer | Nanley Chery | 2017-07-22 | 1 | -1/+78

    v2: Rewrite functions.
    v3 (Jason Ekstrand):
    - Don't set ResourceMinLOD.
    - Fix clamp of level_count.

    Signed-off-by: Nanley Chery <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>