path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel/tools: Add ROL/ROR support in assemblerSagar Ghuge2019-07-012-0/+10
| | | | | Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Emit ROR and ROL instructionSagar Ghuge2019-07-012-0/+9
| | | | | | | v2: Reorder patch (Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Enable the emission of ROR/ROL instructionsSagar Ghuge2019-07-016-2/+26
| | | | | | | | v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support align16 mode (Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
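For readers unfamiliar with the operations these three commits add, the following is a minimal C sketch of what a 32-bit rotate left (ROL) and rotate right (ROR) compute. It is an illustration only: the function names `rol32` and `ror32` are hypothetical and are not taken from the Mesa sources, and the commits do not state how the hardware treats rotate counts of 32 or more, so the modulo-32 behaviour here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate x left by n bits.
 * Assumption: the count is taken modulo 32. */
uint32_t rol32(uint32_t x, unsigned n)
{
   n &= 31;
   /* The "& 31" on the right shift avoids an undefined shift by 32 when n == 0. */
   return (x << n) | (x >> ((32 - n) & 31));
}

/* Hypothetical helper: rotate x right by n bits (count modulo 32). */
uint32_t ror32(uint32_t x, unsigned n)
{
   n &= 31;
   return (x >> n) | (x << ((32 - n) & 31));
}
```

Bits shifted out of one end re-enter at the other, so `ror32(rol32(x, n), n)` gives back `x` for any count.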
* anv: fix indentationEric Engestrom2019-06-291-15/+14
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: fix typoEric Engestrom2019-06-291-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: replace hard-coded platform list with vk.xml parseEric Engestrom2019-06-291-5/+11
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: don't use byte operands for src1 on ICLLionel Landwerlin2019-06-294-20/+192
| | | | | | | | | | | | | | | | | | | | | | The simulator complains about using byte operands, and the documentation says the same. Note that add operations on bytes (like ADD) seem to work fine on HW. Using dword operands with CMP & SEL fixes the following tests: dEQP-VK.spirv_assembly.type.vec*.i8.* v2: Drop the GLK changes (Matt) Add validator tests (Matt) v3: Drop GLK ref (Matt) Don't mix float/integer in MAD (Matt) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> (v1) Reviewed-by: Matt Turner <[email protected]> BSpec: 3017 Cc: <[email protected]>
* intel/vec4: Try both sources as candidates for being immediatesIan Romanick2019-06-281-41/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some reason, when I first wrote try_immediate_source, I thought the sources had already been ordered so that the immediate value was the second source. That's rubbish. The generator assumes *neither* source is immediate, and it relies on later copy/constant propagation passes to do the reordering. For this reason, the changes to try_immediate_source have to go to some efforts to reorder the operands and tell the caller when it reordered them. The generator for comparison instructions uses this to determine when the comparison needs to change (e.g., from GT to LT). No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13484431 -> 13480500 (-0.03%) instructions in affected programs: 441138 -> 437207 (-0.89%) helped: 1883 HURT: 0 helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1 helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90% 95% mean confidence interval for instructions value: -2.19 -1.98 95% mean confidence interval for instructions %-change: -1.14% -1.06% Instructions are helped. total cycles in shared programs: 376420286 -> 376406400 (<.01%) cycles in affected programs: 15995668 -> 15981782 (-0.09%) helped: 1692 HURT: 219 helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4 helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35% HURT stats (abs) min: 2 max: 516 x̄: 43.09 x̃: 22 HURT stats (rel) min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13% 95% mean confidence interval for cycles value: -9.70 -4.83 95% mean confidence interval for cycles %-change: -0.42% -0.28% Cycles are helped. 
total spills in shared programs: 23166 -> 23158 (-0.03%) spills in affected programs: 66 -> 58 (-12.12%) helped: 2 HURT: 0 total fills in shared programs: 34592 -> 34580 (-0.03%) fills in affected programs: 75 -> 63 (-16.00%) helped: 2 HURT: 0 Ivy Bridge total instructions in shared programs: 12051590 -> 12048513 (-0.03%) instructions in affected programs: 355911 -> 352834 (-0.86%) helped: 1481 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1 helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90% 95% mean confidence interval for instructions value: -2.17 -1.98 95% mean confidence interval for instructions %-change: -1.12% -1.04% Instructions are helped. total cycles in shared programs: 180319624 -> 180307642 (<.01%) cycles in affected programs: 15591028 -> 15579046 (-0.08%) helped: 1340 HURT: 174 helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2 helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32% HURT stats (abs) min: 2 max: 518 x̄: 40.41 x̃: 14 HURT stats (rel) min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67% 95% mean confidence interval for cycles value: -10.85 -4.97 95% mean confidence interval for cycles %-change: -0.45% -0.31% Cycles are helped. All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10863159 -> 10861462 (-0.02%) instructions in affected programs: 157839 -> 156142 (-1.08%) helped: 715 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2 helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85% 95% mean confidence interval for instructions value: -2.53 -2.21 95% mean confidence interval for instructions %-change: -1.13% -1.02% Instructions are helped. 
total cycles in shared programs: 153957782 -> 153948778 (<.01%) cycles in affected programs: 3171648 -> 3162644 (-0.28%) helped: 696 HURT: 62 helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4 helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12% HURT stats (abs) min: 2 max: 300 x̄: 31.29 x̃: 2 HURT stats (rel) min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34% 95% mean confidence interval for cycles value: -15.65 -8.11 95% mean confidence interval for cycles %-change: -0.56% -0.36% Cycles are helped. Reviewed-by: Matt Turner <[email protected]>
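The message above notes that when the operands of a comparison are reordered so the immediate ends up in a legal position, the comparison itself has to change (e.g., from GT to LT). A minimal sketch of that rule follows. The enum and both function names are hypothetical and are not the vec4 backend's own types; the sketch only shows why the condition must be mirrored when the sources trade places.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison conditions, for illustration only. */
enum cmp { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Return the condition that yields the same result after the two
 * operands are swapped: (a > b) is equivalent to (b < a), and so on.
 * EQ and NE are symmetric and stay unchanged. */
enum cmp swap_cmp(enum cmp c)
{
   switch (c) {
   case CMP_LT: return CMP_GT;
   case CMP_LE: return CMP_GE;
   case CMP_GT: return CMP_LT;
   case CMP_GE: return CMP_LE;
   case CMP_EQ: return CMP_EQ;
   case CMP_NE: return CMP_NE;
   }
   return c;
}

/* Evaluate "a <c> b" on plain integers so the equivalence can be checked. */
bool eval_cmp(enum cmp c, int a, int b)
{
   switch (c) {
   case CMP_LT: return a < b;
   case CMP_LE: return a <= b;
   case CMP_GT: return a > b;
   case CMP_GE: return a >= b;
   case CMP_EQ: return a == b;
   case CMP_NE: return a != b;
   }
   return false;
}
```

For any condition `c`, `eval_cmp(c, a, b)` and `eval_cmp(swap_cmp(c), b, a)` agree, which is the property a caller relies on once it is told that the sources were reordered.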
* intel/vec4: Try immediate sources for dot products tooIan Romanick2019-06-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. All Haswell and earlier platforms have similar results. (Haswell shown) total instructions in shared programs: 13484467 -> 13484431 (<.01%) instructions in affected programs: 8540 -> 8504 (-0.42%) helped: 33 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35% 95% mean confidence interval for instructions value: -1.19 -0.99 95% mean confidence interval for instructions %-change: -0.60% -0.38% Instructions are helped. total cycles in shared programs: 376420572 -> 376420286 (<.01%) cycles in affected programs: 56260 -> 55974 (-0.51%) helped: 26 HURT: 5 helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2 helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13% HURT stats (abs) min: 2 max: 6 x̄: 4.40 x̃: 6 HURT stats (rel) min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35% 95% mean confidence interval for cycles value: -22.91 4.45 95% mean confidence interval for cycles %-change: -0.56% -0.02% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <[email protected]>
* intel/vec4: Try emitting non-scalar immediatesIan Romanick2019-06-281-4/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes an instruction has a vector as a source, but all of the components have the same value. For example, vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0) ... vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13487811 -> 13484467 (-0.02%) instructions in affected programs: 421981 -> 418637 (-0.79%) helped: 1859 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84% 95% mean confidence interval for instructions value: -1.85 -1.74 95% mean confidence interval for instructions %-change: -1.07% -1.00% Instructions are helped. total cycles in shared programs: 376423252 -> 376420572 (<.01%) cycles in affected programs: 14800970 -> 14798290 (-0.02%) helped: 1519 HURT: 329 helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4 helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36% HURT stats (abs) min: 2 max: 598 x̄: 40.74 x̃: 16 HURT stats (rel) min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98% 95% mean confidence interval for cycles value: -3.53 0.63 95% mean confidence interval for cycles %-change: -0.30% -0.09% Inconclusive result (value mean confidence interval includes 0). 
total fills in shared programs: 34601 -> 34592 (-0.03%) fills in affected programs: 91 -> 82 (-9.89%) helped: 9 HURT: 0 Ivy Bridge total instructions in shared programs: 12053565 -> 12051626 (-0.02%) instructions in affected programs: 298103 -> 296164 (-0.65%) helped: 1228 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1 helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81% 95% mean confidence interval for instructions value: -1.63 -1.53 95% mean confidence interval for instructions %-change: -0.95% -0.88% Instructions are helped. total cycles in shared programs: 180322270 -> 180319922 (<.01%) cycles in affected programs: 14123840 -> 14121492 (-0.02%) helped: 1036 HURT: 195 helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2 helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35% HURT stats (abs) min: 2 max: 598 x̄: 51.33 x̃: 16 HURT stats (rel) min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72% 95% mean confidence interval for cycles value: -4.92 1.10 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10864286 -> 10863189 (-0.01%) instructions in affected programs: 159722 -> 158625 (-0.69%) helped: 724 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1 helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62% 95% mean confidence interval for instructions value: -1.58 -1.46 95% mean confidence interval for instructions %-change: -0.82% -0.75% Instructions are helped. 
total cycles in shared programs: 153967938 -> 153957926 (<.01%) cycles in affected programs: 1923186 -> 1913174 (-0.52%) helped: 654 HURT: 56 helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4 helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18% HURT stats (abs) min: 2 max: 390 x̄: 54.75 x̃: 32 HURT stats (rel) min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92% 95% mean confidence interval for cycles value: -17.42 -10.78 95% mean confidence interval for cycles %-change: -0.76% -0.40% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8142677 -> 8141721 (-0.01%) instructions in affected programs: 139511 -> 138555 (-0.69%) helped: 588 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46% 95% mean confidence interval for instructions value: -1.70 -1.55 95% mean confidence interval for instructions %-change: -0.89% -0.78% Instructions are helped. total cycles in shared programs: 188549394 -> 188547676 (<.01%) cycles in affected programs: 3171960 -> 3170242 (-0.05%) helped: 527 HURT: 0 helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2 helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06% 95% mean confidence interval for cycles value: -3.49 -3.03 95% mean confidence interval for cycles %-change: -0.09% -0.07% Cycles are helped. Reviewed-by: Matt Turner <[email protected]>
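The case described in this message, a vector constant whose components all hold the same value, such as `(1.0, 1.0, 1.0)`, can be detected with a simple scan. The sketch below is an assumption-laden illustration, not the backend's code: `splat_value` is a hypothetical name, and comparing raw bit patterns (so that `0.0` and `-0.0` are treated as different and NaNs compare by payload) is a choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: if every component of a constant vector has the
 * same bit pattern, store that value in *scalar and return true.
 * Otherwise return false and leave *scalar untouched. */
bool splat_value(const float *comps, unsigned count, float *scalar)
{
   assert(count > 0);

   uint32_t first;
   memcpy(&first, &comps[0], sizeof(first));

   for (unsigned i = 1; i < count; i++) {
      uint32_t bits;
      memcpy(&bits, &comps[i], sizeof(bits));
      if (bits != first)
         return false;
   }

   *scalar = comps[0];
   return true;
}
```

When the check succeeds, the single value can stand in for the whole vector as an immediate operand, which is where the instruction savings reported above would come from.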
* Revert "anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch"Anuj Phogat2019-06-281-12/+0
| | | | | | | | | | | | | | SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace. This patch silences a simulator warning about it. We don't need to add this workaround in the Linux kernel as the WA description says it's fixed on the latest stepping. This reverts commit 2be60e0c73ed1555a919c5725cc0cab119a2b6de. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* isl: Don't align phys_level0_sa by block dimensionNanley Chery2019-06-272-31/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aligning phys_level0_sa by the compression block dimension prior to mipmap layout causes the layout of compressed surfaces to differ from the sampler's expectations in certain cases. The hardware docs agree: From the BDW PRM, Vol. 5, Compressed Mipmap Layout, The compressed mipmaps are stored in a similar fashion to uncompressed mipmaps [...] The following exceptions apply to the layout of compressed (vs. uncompressed) mipmaps: * [...] * The dimensions of the mip maps are first determined by applying the sizing algorithm presented in Non-Power-of-Two Mipmaps above. Then, if necessary, they are padded out to compression block boundaries. The last bullet indicates that alignment should not be done for calculating a miplevel's dimensions, but rather for determining miplevel placement/padding. Comply with this text by removing the extra alignment. Fixes some fbo-generatemipmap-formats piglit failures on all tested platforms (SNB-KBL). v2: - Note fixed platforms. - Update some consumers via a helper function. Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
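The ordering issue in this message can be shown with a little arithmetic. The sketch below assumes the usual sizing rule (halve per level, round down, never below 1) for the "Non-Power-of-Two Mipmaps" algorithm; that rule is not quoted in the commit, so treat it as an assumption. All function names are hypothetical and are not ISL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizing rule: halve per level, round down, clamp to 1. */
uint32_t minify(uint32_t base, uint32_t level)
{
   assert(level < 32);
   uint32_t v = base >> level;
   return v ? v : 1;
}

/* Round v up to the next multiple of a (a > 0). */
uint32_t align_up(uint32_t v, uint32_t a)
{
   assert(a > 0);
   return (v + a - 1) / a * a;
}

/* Order described by the PRM text: size the level first, then pad it
 * out to the compression block boundary. */
uint32_t mip_extent(uint32_t level0, uint32_t level, uint32_t block)
{
   return align_up(minify(level0, level), block);
}

/* The order the commit removes: level 0 is aligned to the block size
 * before the per-level sizing is applied. */
uint32_t mip_extent_prealigned(uint32_t level0, uint32_t level, uint32_t block)
{
   return align_up(minify(align_up(level0, block), level), block);
}
```

With a 9-texel-wide level 0 and 4-wide blocks, level 1 comes out as 4 texels under the first ordering but 8 under the second, which is the kind of mismatch with the sampler's expectations that the message describes.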
* intel: Add and use helpers for level0 extentNanley Chery2019-06-273-15/+37
| | | | | | | | | | | | | Prepare for a bug fix by adding and using helpers which convert isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of surface elements. v2: - Update iris (Ken). - Update anv. Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: fix derivative on y axis implementationLionel Landwerlin2019-06-271-21/+5
| | | | | | | | | | | | | | | This rewrites the ddy in EXECUTE_4 mode with a loop to make it more obvious what is going on and also sets the group each of the 4 threads in the groups are supposed to execute. Fixes the following CTS tests : dEQP-VK.glsl.derivate.dfdyfine.dynamic_* Signed-off-by: Lionel Landwerlin <[email protected]> Co-Authored-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Fixes: 2134ea380033d5 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
* intel/blorp: Disable sampler state prefetching on Gen11Kenneth Graunke2019-06-251-0/+4
| | | | | | | | | | | | | | | Sampler state prefetching is broken on Gen11, and WA_160668216 says to disable it. Apparently sampler state prefetching also has basically zero impact on performance, so we don't need to worry there. i965, anv, and iris already handle this correctly, but we missed BLORP. Ideally the kernel should globally disable this by writing SARCHKMD, at which point we wouldn't have to worry about it. But let's be defensive and handle it ourselves too. v2: separate out from BTP workaround in case we change that eventually Reviewed-by: Anuj Phogat <[email protected]> [v1]
* anv/descriptor_set: Only write texture swizzles if we have an image viewJason Ekstrand2019-06-251-1/+1
| | | | | | | | | When immutable samplers are set we call write_image_view with a NULL image view. This causes issues on IVB where we have to fake texture swizzling. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999 Fixes: d2aa65eb18 "anv: Emulate texture swizzle in the shader when..."
* intel/compiler: silence a warning of using different enum typeTapani Pälli2019-06-251-1/+1
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv: Add HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED in vk_formatNataraj Deshpande2019-06-243-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform gralloc module will select a format based on the usage flags provided by the camera device and the other endpoint of the stream. The patch fixes crash in vulkan when the test is run with camera stream set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED. Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering on chromebook with camera HAL3. v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take AHARDWAREBUFFER_USAGE_CAMERA_MASK in to account (Gurchetan) Fixes: f1654fa7e31 "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <[email protected]> Signed-off-by: Gurchetan Singh <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* anv: Implement "pop-free" clippingJason Ekstrand2019-06-212-4/+86
| | | | | | | | | | This is the preferred clipping mode since it means that points don't disappear the moment part of the point crosses over the edge of the viewport and that lines don't get weird endpoints at viewport edges. We've just never bothered to hook it up until now. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Enable the guardband clip testJason Ekstrand2019-06-212-3/+21
| | | | | | | | | | In workloads where there is a lot of geometry drawn that crosses over the edge of the viewport, this should substantially improve clipper performance. Not really sure why it's taken 3 years to turn it on but we never got around to it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965,iris: Move guardband calculations to a common locationJason Ekstrand2019-06-213-0/+119
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* iris: Implement INTEL_DEBUG=pc for pipe control logging.Kenneth Graunke2019-06-202-0/+2
| | | | | | | | This prints a log of every PIPE_CONTROL flush we emit, noting which bits were set, and also the reason for the flush. That way we can see which are caused by hardware workarounds, render-to-texture, buffer updates, and so on. It should make it easier to determine whether we're doing too many flushes and why.
* anv: only resort to sync fds internally with no syncobj supportLionel Landwerlin2019-06-202-8/+45
| | | | | | | | | | | | | | | | We can rely on only one kind of synchronization object (drm-syncobj) when it is available. This reduces the number of file descriptors we use in our implementation. This will be required later for timeline semaphores implementation, at this point we won't ever want to use anything else but syncobjs. v2: Only use has_syncobj for semaphores (Jason) v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* isl: tag unreachable path as suchEric Engestrom2019-06-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC should be able to figure out that all the possible enum values are exhausted in the switch() and all the branches return from the function, but apparently it doesn't, so let's tell the compiler explicitly. This gets rid of the following warnings in GCC 9: [1/24] Compiling C object 'src/intel/isl/60d23f8@@isl@sta/isl.c.o'. ../src/intel/isl/isl.c: In function ‘isl_surf_init_s’: ../src/intel/isl/isl.c:1569:10: warning: ‘array_pitch_el_rows’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1569 | *surf = (struct isl_surf) { | ~~~~~~^~~~~~~~~~~~~~~~~~~~~ 1570 | .dim = info->dim, | ~~~~~~~~~~~~~~~~~ 1571 | .dim_layout = dim_layout, | ~~~~~~~~~~~~~~~~~~~~~~~~~ 1572 | .msaa_layout = msaa_layout, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1573 | .tiling = tiling, | ~~~~~~~~~~~~~~~~~ 1574 | .format = info->format, | ~~~~~~~~~~~~~~~~~~~~~~~ 1575 | | 1576 | .levels = info->levels, | ~~~~~~~~~~~~~~~~~~~~~~~ 1577 | .samples = info->samples, | ~~~~~~~~~~~~~~~~~~~~~~~~~ 1578 | | 1579 | .image_alignment_el = image_align_el, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1580 | .logical_level0_px = logical_level0_px, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1581 | .phys_level0_sa = phys_level0_sa, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1582 | | 1583 | .size_B = size_B, | ~~~~~~~~~~~~~~~~~ 1584 | .alignment_B = base_alignment_B, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1585 | .row_pitch_B = row_pitch_B, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1586 | .array_pitch_el_rows = array_pitch_el_rows, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1587 | .array_pitch_span = array_pitch_span, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1588 | | 1589 | .usage = info->usage, | ~~~~~~~~~~~~~~~~~~~~~ 1590 | }; | ~ ../src/intel/isl/isl.c:1488:24: warning: ‘*((void *)&phys_total_el+4)’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1488 | struct isl_extent2d 
phys_total_el; | ^~~~~~~~~~~~~ ../src/intel/isl/isl.c:1335:38: warning: ‘phys_total_el’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1335 | isl_align_div(phys_total_el->w * tile_el_scale, | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ ../src/intel/isl/isl.c:1488:24: note: ‘phys_total_el’ was declared here 1488 | struct isl_extent2d phys_total_el; | ^~~~~~~~~~~~~ Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
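The pattern behind this fix can be sketched as follows: a `switch` covers every enumerator and every case returns, yet the compiler may still assume control can fall out of the bottom, so the path after the switch is marked unreachable explicitly. The `UNREACHABLE` macro below is a stand-in written for this example (it relies on the GCC/Clang builtin `__builtin_unreachable()` where available); it is not Mesa's own macro, and the enum and function are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in macro for this example only: trap in debug builds, and tell
 * GCC/Clang that control never reaches this point. */
#if defined(__GNUC__)
#define UNREACHABLE(msg) do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define UNREACHABLE(msg) do { assert(!msg); abort(); } while (0)
#endif

/* Hypothetical layout enum, for illustration. */
enum layout { LAYOUT_LINEAR, LAYOUT_TILED };

/* Every enumerator is handled and every case returns. */
unsigned pitch_rows(enum layout l, unsigned rows)
{
   switch (l) {
   case LAYOUT_LINEAR:
      return rows;
   case LAYOUT_TILED:
      return (rows + 7) / 8 * 8; /* round up to a multiple of 8 */
   }

   /* Without this, the compiler may warn about a missing return value
    * or about variables that "may be used uninitialized" in callers. */
   UNREACHABLE("invalid layout");
}
```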
* anv: Fix vulkan build in meson.Bas Nieuwenhuizen2019-06-191-1/+7
| | | | | | | Apparently the android part was never ported to meson. CC: <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv/image: Set different usage flags for shadow surfacesJason Ekstrand2019-06-191-1/+6
| | | | | | | | | | | For the BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter because the flags were already more-or-less what we wanted. However, for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT so we were getting W-tiled which isn't what we want for the shadow. By passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now get something that's actually texturable. Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
* anv: Flush caches in anv_image_copy_to_shadowJason Ekstrand2019-06-191-0/+13
| | | | | | | | | Copies to a shadow image happen during a vkCmdPipelineBarrier or at subpass transitions. We could potentially be a bit more conservative but these transitions shouldn't happen often and it's better to have our bases covered. Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
* anv: Fix wrong printf formatterKenneth Graunke2019-06-191-1/+1
| | | | %lu is for unsigned long, %zu is for size_t. Just cast the data.
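The two options the message alludes to can be sketched like this: either print a `size_t` with `%zu`, or keep `%lu` and cast the argument to `unsigned long`. The commit does not show which value was cast or to what type, so the function names and the choice of `unsigned long` below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a size_t using the matching length modifier. */
int format_size(char *buf, size_t len, size_t value)
{
   return snprintf(buf, len, "%zu", value);
}

/* Keep "%lu" and make the argument match it with an explicit cast.
 * This truncates on platforms where unsigned long is narrower than
 * size_t, which is the trade-off of this approach. */
int format_size_cast(char *buf, size_t len, size_t value)
{
   return snprintf(buf, len, "%lu", (unsigned long)value);
}
```

Passing a `size_t` to `%lu` without the cast is a format/argument mismatch, which is what compilers flag with `-Wformat`.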
* anv: write spirv-nir logs back to the applicationLionel Landwerlin2019-06-191-0/+35
| | | | | | | Using the existing VK_EXT_debug_report extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Make border colors the right size and alignment on HSWJason Ekstrand2019-06-182-12/+48
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Set STATE_BASE_ADDRESS upper bounds on gen7Jason Ekstrand2019-06-171-0/+17
| | | | | | | | | This should fix floating-point border color on all gen7 HW. Integer is still thoroughly busted on gen7 because it doesn't exist on IVB and it's crazy on HSW. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Use VK_EXT_separate_stencil_usage to avoid stencil shadows on gen7Jason Ekstrand2019-06-174-2/+16
| | | | | | | | | | Whenever stencil texturing is not required (most of the time), we can use VK_EXT_separate_stencil_usage to only create the shadow image when VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil. Of course, this depends on applications to use the extension but hopefully DXVK and similar translators are doing so and that covers most of the apps. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add stencil texturing support for gen7Jason Ekstrand2019-06-173-7/+96
| | | | | | | | | | | | Intel hardware didn't get support for sampling from W-tiled (required for stencil) images until Broadwell so we can't directly sample from stencil. Instead, if we want to support stencil texturing on gen7 hardware, we have to keep a texture-capable shadow copy around and use BLORP to update when stencil changes. The one thing this commit does not implement is self-dependencies with stencil input attachments. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493 Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/blorp: Update shadow images when clearing or uploadingJason Ekstrand2019-06-171-11/+104
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/cmd_buffer: Add a stencil transition helperJason Ekstrand2019-06-171-35/+75
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/blorp: Take an aspect in anv_image_copy_to_shadowJason Ekstrand2019-06-173-3/+4
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/formats: Re-arrange the way we set some flag bitsJason Ekstrand2019-06-171-6/+5
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macrosNicolai Hähnle2019-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The main motivation for this change is API ergonomics: most operations on dynarrays are really on elements, not on bytes, so it's weird to have grow and resize as the odd operations out. The secondary motivation is memory safety. Users of the old byte-oriented functions would often multiply a number of elements with the element size, which could overflow, and checking for overflow is tedious. With this change, we only need to implement the overflow checks once. The checks are cheap: since eltsize is a compile-time constant and the functions should be inlined, they only add a single comparison and an unlikely branch. v2: - ensure operations are no-op when allocation fails - in util_dynarray_clone, call resize_bytes with a compile-time constant element size v3: - fix iris, lima, panfrost Reviewed-by: Marek Olšák <[email protected]>
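The memory-safety argument in this message rests on doing the element-count-times-element-size multiplication in one place, with an overflow check. A minimal sketch of such a check is below. `elems_to_bytes` is a hypothetical helper written for this example and is not the `u_dynarray` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert an element count to a byte count.
 * Returns false (leaving *bytes untouched) if count * eltsize would
 * not fit in a size_t. */
bool elems_to_bytes(size_t count, size_t eltsize, size_t *bytes)
{
   assert(eltsize > 0);

   /* Dividing instead of multiplying keeps the check itself from
    * overflowing. When eltsize is a compile-time constant, the
    * division folds to a constant and one comparison remains. */
   if (count > SIZE_MAX / eltsize)
      return false;

   *bytes = count * eltsize;
   return true;
}
```

A caller that grows an array by `n` elements would run this check first and treat a `false` result like an allocation failure, which matches the v2 note about operations being no-ops when allocation fails.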
* anv: do not parse genxml data without INTEL_DEBUG=batLionel Landwerlin2019-06-121-10/+13
| | | | | | | | This significantly slows down the CTS runs. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 32ffd90002b04b ("anv: add support for INTEL_DEBUG=bat") Reviewed-by: Jordan Justen <[email protected]>
* intel/dump: fix segfault when the app hasn't accessed the deviceLionel Landwerlin2019-06-121-3/+5
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7Ville Syrjälä2019-06-113-140/+107
| | | | | | | | | | | | | | Modern DXVK requires event support [1], but looks like it only uses vkCmdSetEvent() + vkGetEventStatus(). So we can just borrow the relevant code from gen8, leaving CmdWaitEvents still unimplemented. [1] https://github.com/doitsujin/dxvk/commit/8c3900c533d83d12c970b905183d17a1d3e8df1f v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason) Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Mark source 0 of bcsel as needing Boolean resolveIan Romanick2019-06-111-0/+6
| | | | | | | | | | | | The other sources of the bcsel behave like the sources of an and or other logical operation. However, source zero behaves differently. It is evaluated as a Boolean, so it needs to be resolved. No shader-db changes, but the tests mentioned in the bug get a couple instructions added back. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857 Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR()Samuel Iglesias Gonsálvez2019-06-111-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | According to the Vulkan spec, inline uniform blocks are not allowed to be updated through vkCmdPushDescriptorSetKHR(). These are the spec quotes from "13.2.1. Descriptor Set Layout" that are relevant for this case: "VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies that descriptor sets must not be allocated using this layout, and descriptors are instead pushed by vkCmdPushDescriptorSetKHR." "If flags contains VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all elements of pBindings must not have a descriptorType of VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT". There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid this case but it is implied in the creation of the descriptor set layout as aforementioned. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/gpu_dump: fix argument passingLionel Landwerlin2019-06-092-3/+3
| | | | | | | | | | | We were dropping "/' around arguments grouped together. This was triggering failures with : $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/blorp: Only double the fast-clear rect alignment on HSWJason Ekstrand2019-06-071-10/+15
| | | | | | | | | This restriction was accidentally added to the BSpec/PRM as an unrestricted restriction starting with the HSW docs and it was never removed. However, it only ever applied to HSW and actually potentially causes problems on BDW and above where we have mipmapped fast-clears. Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Initialize the clear color struct for CNL+Nanley Chery2019-06-071-13/+7
| | | | | | | | | | | | | | On CNL+, the clear color struct is composed of RGBA channel values and fields which are either reserved by the HW or used to control fast-clears. Currently anv initializes the channel values to zero and allows the other fields to be undefined. Satisfy the MBZ field requirements by removing an optimization that doesn't hold true for CNL+ and pulling in the number of dwords to initialize from ISL. Cc: <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* isl: Mark enum isl_channel_select packed so it becomes 1 byte.Kenneth Graunke2019-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I recently discovered that the following code led to valgrind errors: struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY; VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle)); which is surprising, because struct isl_swizzle is simply: struct isl_swizzle { enum isl_channel_select r:4; enum isl_channel_select g:4; enum isl_channel_select b:4; enum isl_channel_select a:4; }; and the above code initializes all of them with a C99 initializer. Iván Briano reminded me that C99 initializers don't necessarily zero padding. A quick inspection revealed that sizeof(struct isl_swizzle) was 4 (rather than the expected 2). Ian Romanick suggested changing it to uint16_t, since this is essentially dicing up an unsigned, and that worked. This patch marks enum isl_channel_select packed, changing its size from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes, with no bogus padding fields. This eliminates valgrind undefined memory warnings. These isl_swizzle values become part of our BLORP blit program keys, which are then hashed. This undefined padding was being included in the hashing, possibly leading to issues. I originally saw this error when running KHR-GL45.texture_size_promotion.functional in iris under valgrind. Reviewed-by: Jason Ekstrand <[email protected]>
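The size change described in this message can be reproduced with a small stand-alone example. The sketch assumes GCC or Clang on a typical Linux target: `__attribute__((packed))` on an enum and enum-typed bit-fields are compiler extensions, so other compilers may lay the structs out differently. The type names are hypothetical and only mirror the shape of the struct quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Default enum: the compiler picks an int-sized representation. */
enum wide_select { WIDE_ZERO, WIDE_ONE, WIDE_RED = 4, WIDE_GREEN, WIDE_BLUE, WIDE_ALPHA };

/* Packed enum (GCC/Clang extension): the smallest integer type that
 * holds every enumerator, here a single byte. */
enum __attribute__((packed)) narrow_select {
   NARROW_ZERO, NARROW_ONE, NARROW_RED = 4, NARROW_GREEN, NARROW_BLUE, NARROW_ALPHA
};

/* Four 4-bit fields inside a 4-byte storage unit: 16 bits used, 16 bits
 * of padding that a designated initializer is not required to zero. */
struct wide_swizzle {
   enum wide_select r:4, g:4, b:4, a:4;
};

/* The same fields in 1-byte storage units: two bytes, no padding. */
struct narrow_swizzle {
   enum narrow_select r:4, g:4, b:4, a:4;
};

size_t wide_swizzle_size(void)   { return sizeof(struct wide_swizzle); }
size_t narrow_swizzle_size(void) { return sizeof(struct narrow_swizzle); }
```

Because the narrow struct has no padding bytes, hashing or comparing it byte-by-byte only ever touches initialized data, which is the property the BLORP program keys need.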
* anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-opGurchetan Singh2019-06-061-0/+5
| | | | | | | | | | | | | | | AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation-defined flexible YUV format. Most of the time, it's NV12 or YV12. On Intel, NV12 is preferred since it can be used by the display engine. This API adds a dependency between gralloc and buffer consumers, unfortunately. Right now, the code seems to work for i915 gralloc, but not cros_gralloc. Add a preprocessor flag to fix this. TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering Reviewed-by: Tapani Pälli <[email protected]>
* anv: Fix check for isl_fmt in assertNataraj Deshpande2019-06-061-1/+1
| | | | | | | | | | Checking isl_fmt returned value in assert seems appropriate instead of format variable. Fixes: f1654fa7e31 "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]>
* intel/compiler: Treat b32csel as potentially producing a Boolean result for ↵Ian Romanick2019-06-051-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | resolve analysis If the 2nd and 3rd source are both Boolean values, we can potentially avoid a resolve by only resolving the result of the b32csel. No changes on any Gen6+ Intel platform. v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio. Iron Lake total instructions in shared programs: 8142729 -> 8142677 (<.01%) instructions in affected programs: 12890 -> 12838 (-0.40%) helped: 26 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.39% Instructions are helped. total cycles in shared programs: 188549632 -> 188549394 (<.01%) cycles in affected programs: 60754 -> 60516 (-0.39%) helped: 25 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8 helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for cycles value: -12.91 -5.40 95% mean confidence interval for cycles %-change: -0.84% -0.23% Cycles are helped. GM45 total instructions in shared programs: 5013119 -> 5013093 (<.01%) instructions in affected programs: 6764 -> 6738 (-0.38%) helped: 13 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.34% Instructions are helped. 
total cycles in shared programs: 128977804 -> 128977700 (<.01%) cycles in affected programs: 37738 -> 37634 (-0.28%) helped: 13 HURT: 0 helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -8.00 -8.00 95% mean confidence interval for cycles %-change: -0.36% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>