path: root/src/gallium
Commit message | Author | Age | Files | Lines
* iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED | Caio Marcelo de Oliveira Filho | 2019-06-11 | 1 | -0/+1
  This avoids lowering of CS system values by GLSL (configured by the state tracker). In i965 we don't use that lowering, and we also shouldn't need it in Iris. Using it causes some unnecessary round trips between values, e.g. the shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of gl_LocalInvocationID, then the driver rewrites those in terms of gl_LocalInvocationIndex again. Copy propagation can make some of those go away, but not all, as seen below.
  Intel SKL shader-db results:
  total instructions in shared programs: 15595189 -> 15594556 (<.01%)
  instructions in affected programs: 74880 -> 74247 (-0.85%)
  helped: 81
  HURT: 4
  helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
  helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
  HURT stats (abs) min: 1 max: 2 x̄: 1.25 x̃: 1
  HURT stats (rel) min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
  95% mean confidence interval for instructions value: -11.56 -3.34
  95% mean confidence interval for instructions %-change: -1.91% -1.28%
  Instructions are helped.
  total loops in shared programs: 4831 -> 4831 (0.00%)
  loops in affected programs: 0 -> 0
  helped: 0
  HURT: 0
  total cycles in shared programs: 372136618 -> 372145628 (<.01%)
  cycles in affected programs: 9218230 -> 9227240 (0.10%)
  helped: 131
  HURT: 86
  helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
  helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
  HURT stats (abs) min: 2 max: 2442 x̄: 165.38 x̃: 6
  HURT stats (rel) min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
  95% mean confidence interval for cycles value: -2.07 85.11
  95% mean confidence interval for cycles %-change: -0.22% 0.30%
  Inconclusive result (value mean confidence interval includes 0).
  total spills in shared programs: 11956 -> 11950 (-0.05%)
  spills in affected programs: 77 -> 71 (-7.79%)
  helped: 3
  HURT: 0
  total fills in shared programs: 25619 -> 25549 (-0.27%)
  fills in affected programs: 593 -> 523 (-11.80%)
  helped: 4
  HURT: 0
  LOST: 0
  GAINED: 0
  Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED | Caio Marcelo de Oliveira Filho | 2019-06-11 | 3 | -0/+5
  Tells whether or not the driver can handle gl_LocalInvocationIndex and gl_GlobalInvocationID. If not supported (the default), the state tracker will lower those on behalf of the driver.
  v2: Add case to u_screen.c. (Anholt)
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
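For reference, the GLSL specification defines both derived values in terms of gl_LocalInvocationID, gl_WorkGroupID and gl_WorkGroupSize. The following is a minimal plain-C sketch of that arithmetic, i.e. roughly what a lowering pass has to express when the cap is not supported; it is not Mesa's GLSL IR or NIR lowering code, and `struct uvec3` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GLSL uvec3. */
struct uvec3 {
   uint32_t x, y, z;
};

/* gl_LocalInvocationIndex, per the GLSL specification:
 * id.z * size.x * size.y + id.y * size.x + id.x */
static uint32_t
local_invocation_index(struct uvec3 local_id, struct uvec3 wg_size)
{
   assert(local_id.x < wg_size.x && local_id.y < wg_size.y &&
          local_id.z < wg_size.z);
   return local_id.z * wg_size.x * wg_size.y +
          local_id.y * wg_size.x +
          local_id.x;
}

/* gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
static struct uvec3
global_invocation_id(struct uvec3 wg_id, struct uvec3 wg_size,
                     struct uvec3 local_id)
{
   struct uvec3 id = {
      wg_id.x * wg_size.x + local_id.x,
      wg_id.y * wg_size.y + local_id.y,
      wg_id.z * wg_size.z + local_id.z,
   };
   return id;
}
```

The round trip described in the iris commit above comes from going back and forth between the flat index and the 3-component ID using exactly these formulas.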
* freedreno/a5xx: enable a540 | Rob Clark | 2019-06-11 | 2 | -2/+14
  Tested-by: Jeffrey Hugo <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: enable UBWC by default | Rob Clark | 2019-06-11 | 3 | -18/+3
  Flip the FD_MESA_DEBUG flag to a disable rather than an enable, and drop the obsolete comment (bonus: drop the unused softpin debug flag).
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: disallow UBWC for z24s8 | Rob Clark | 2019-06-11 | 1 | -1/+0
  This is slightly annoying because it *mostly* works, but we have some issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before we can enable UBWC by default. For now it is a step forward to at least enable it for non-z/s formats while we figure out how to blit z24s8+UBWC. (The basic issue is that pretending z24s8 is an equivalently sized rgba format for the purpose of blitting falls apart when UBWC is in the picture.)
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: use correct UBWC reg builders | Rob Clark | 2019-06-11 | 2 | -11/+11
  No functional change; the registers have the same layout as the MRT flags pitch register.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: disable UBWC for some formats | Rob Clark | 2019-06-11 | 1 | -2/+0
  An older blob claims to support UBWC with r32ui and r32i, but not r32f. Results from deqp indicate that it doesn't work with r32ui and r32i. This *could* also just mean that use as an "IBO" (image) is more limited than use as a texture, although the blob also doesn't seem to bother trying to use UBWC with images at all, so it is hard to know for sure.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle non-UBWC-compatible image views | Rob Clark | 2019-06-11 | 5 | -1/+45
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: handle non-UBWC-compatible texture views | Rob Clark | 2019-06-11 | 3 | -0/+23
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: add helper to uncompress UBWC resource | Rob Clark | 2019-06-11 | 2 | -0/+37
  We'll need this for a few edge cases, like an image/sampler view that uses a format UBWC does not support, with a resource originally created in a format UBWC does support.
  NOTE: we *could* in some cases do an in-place uncompress, but that has a couple of potential sharp edges:
  1) the uncompressed buffer could have a different layout, i.e. a5xx with meta and pixel data of layers/levels interleaved.
  2) if it comes mid-batch, it would force a flush, or require somehow fixing up the cmdstream for draws already emitted.
  But with the resource shadowing approach we can rely on batch re-ordering to avoid splitting things: older draws see the older compressed version, newer draws see the new uncompressed version of the rsc.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: handle images in rebind_resource() | Rob Clark | 2019-06-11 | 1 | -0/+9
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: allow null discard box in shadow path | Rob Clark | 2019-06-11 | 1 | -4/+10
  When uncompressing a UBWC buffer, we don't want to discard anything.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: swap UBWC state in shadow path | Rob Clark | 2019-06-11 | 1 | -0/+4
  It doesn't come up yet, as so far we only hit this path with linear buffers. But it will when we start re-using the shadow path for uncompressing UBWC buffers.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: add modifier param to fd_try_shadow_resource() | Rob Clark | 2019-06-11 | 1 | -3/+5
  To uncompress UBWC, I want to re-use the shadow path, but we'll need a way to request that the new buffer is not compressed.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno: correct modifier for UBWC buffers | Rob Clark | 2019-06-11 | 1 | -0/+3
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* virgl: consider newly created resources idle | Chia-I Wu | 2019-06-11 | 1 | -6/+8
  A newly created resource can be regarded as idle. We don't care if the RESOURCE_CREATE command has been retired, unless it is used for fencing.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: make resource_wait/resource_is_busy cheaper | Chia-I Wu | 2019-06-11 | 2 | -0/+27
  The round trip to the kernel is expensive. Add a local cache to avoid it when possible. There is a race condition when two contexts access the same resource at the same time (e.g., ctx1 submits a cmdbuf that accesses a resource while ctx2 maps the resource). But that is probably an app bug in the first place.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add virgl_drm_{alloc,free,clear}_res_list | Chia-I Wu | 2019-06-11 | 1 | -17/+42
  Helpers to work with resource lists. virgl_drm_release_all_res is removed.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: do not cache external resources | Chia-I Wu | 2019-06-11 | 2 | -1/+10
  We should not reuse a resource for other purposes when it can still be accessed by another process or device.
  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* panfrost: Enable AFBC on depth/stencil | Alyssa Rosenzweig | 2019-06-11 | 3 | -11/+10
  This seems to be a performance win, but more rigorous testing is necessary to figure out the exact circumstances when this is good/bad. Incidentally, this fixes non-aligned ZS.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Linear depth/stencil should be aligned | Alyssa Rosenzweig | 2019-06-11 | 1 | -1/+2
  We might render to it.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Decode LOD/bias registers | Alyssa Rosenzweig | 2019-06-11 | 2 | -5/+58
  For constant LODs/biases, we can use an immediate embedded in the texture (already decoded); for non-constant, we have to use a register squeezed into the usual immediate field, which is decoded here.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Decode texture offset register swizzle | Alyssa Rosenzweig | 2019-06-11 | 2 | -11/+23
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: include textureGather() | Alyssa Rosenzweig | 2019-06-11 | 2 | -5/+16
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Support negative immediate offsets | Alyssa Rosenzweig | 2019-06-11 | 2 | -16/+20
  It's not at all clear why this works for texelFetch but not texture. Maybe the top bits are dual-purpose on other texturing ops...?
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Fix redundant mask redundancy | Alyssa Rosenzweig | 2019-06-11 | 1 | -0/+5
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Print LOD for texelFetch | Alyssa Rosenzweig | 2019-06-11 | 1 | -0/+9
  Its encoding differs slightly from the LOD used in normal texture calls.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Identify the in_reg_full field | Alyssa Rosenzweig | 2019-06-11 | 3 | -10/+3
  This is clear for texelFetch, hence the confusion with Bifrost's filter field, but it's much more general in reality.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Correctly dump bias/LOD | Alyssa Rosenzweig | 2019-06-11 | 2 | -16/+20
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Cleanup texture op code | Alyssa Rosenzweig | 2019-06-11 | 1 | -3/+3
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Add missing space | Alyssa Rosenzweig | 2019-06-11 | 1 | -2/+2
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: LOD immediate/register select | Alyssa Rosenzweig | 2019-06-11 | 2 | -3/+13
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Use texture op name bare | Alyssa Rosenzweig | 2019-06-11 | 2 | -9/+7
  This allows us to show a call to textureLod in a reasonable way.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Varying perspective divides | Alyssa Rosenzweig | 2019-06-11 | 2 | -4/+28
  With an extra flag, we're able to do a perspective division "for free" while loading a varying.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add perspective division opcodes | Alyssa Rosenzweig | 2019-06-11 | 2 | -0/+9
  ...on the load/store unit, not the ALUs. Looks goofy but hey.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Print texture offsets | Alyssa Rosenzweig | 2019-06-11 | 2 | -36/+56
  This patch identifies the two modes of offsets in a texture instruction (immediate and register, disambiguated by the bit-once-known-as "has_offset") and implements disassembly for both.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Expand texture to 4-channel swizzle | Alyssa Rosenzweig | 2019-06-11 | 3 | -24/+7
  This eliminates some unknowns, clarifies 3D textures, and will maybe help with array/shadow textures?
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* freedreno/a5xx: Fix indirect draw max_indices calculation | Eduardo Lima Mitev | 2019-06-11 | 1 | -2/+1
  The number of elements to draw should not be affected by the offset. A similar fix was submitted for a6xx at 79180a05.
  Fixes these dEQP tests on a5xx:
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8
  dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500
  Reviewed-by: Rob Clark <[email protected]>
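A minimal sketch of the corrected bound, assuming the old code subtracted the indirect-buffer offset from the index-buffer size. That offset only locates the draw parameters inside the indirect buffer, so it has no bearing on how many indices the index buffer can supply. The function and parameter names are hypothetical, not freedreno's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the number of indices the bound index buffer can
 * supply. The indirect offset (where the draw parameters live in the
 * indirect buffer) deliberately plays no part in this calculation. */
static uint32_t
max_indices(uint32_t index_buffer_bytes, uint32_t index_size)
{
   assert(index_size == 1 || index_size == 2 || index_size == 4);
   return index_buffer_bytes / index_size;
}
```

With a large indirect offset, subtracting it from the index-buffer size would shrink (or, in unsigned arithmetic, wrap) the bound, which is consistent with the failures appearing only in the "large" dEQP cases listed above.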
* iris: Bypass half-float pack/unpack lowering. | Kenneth Graunke | 2019-06-10 | 1 | -0/+1
  This skips GLSL IR lowering of pack/unpackHalf operations, allowing the NIR optimizer to see them.
  Improves performance in Synmark2's OglCSDof by about 2x, by cutting about 90% of the cycles from one of the compute shaders.
  shader-db statistics on Skylake:
  4 compute shaders went from SIMD8 to SIMD16.
  total instructions in shared programs: 15598871 -> 15542568 (-0.36%)
  instructions in affected programs: 143016 -> 86713 (-39.37%)
  helped: 144
  HURT: 0
  helped stats (abs) min: 17 max: 4669 x̄: 390.99 x̃: 164
  helped stats (rel) min: 7.48% max: 85.28% x̄: 30.17% x̃: 24.22%
  95% mean confidence interval for instructions value: -510.50 -271.49
  95% mean confidence interval for instructions %-change: -32.70% -27.65%
  Instructions are helped.
  total cycles in shared programs: 371973958 -> 368902103 (-0.83%)
  cycles in affected programs: 5557722 -> 2485867 (-55.27%)
  helped: 144
  HURT: 0
  helped stats (abs) min: 106 max: 1026600 x̄: 21332.33 x̃: 1697
  helped stats (rel) min: 0.53% max: 88.98% x̄: 36.12% x̃: 34.67%
  95% mean confidence interval for cycles value: -41570.02 -1094.64
  95% mean confidence interval for cycles %-change: -38.44% -33.80%
  Cycles are helped.
  total spills in shared programs: 11936 -> 11903 (-0.28%)
  spills in affected programs: 110 -> 77 (-30.00%)
  helped: 3
  HURT: 2
  total fills in shared programs: 25644 -> 25178 (-1.82%)
  fills in affected programs: 677 -> 211 (-68.83%)
  helped: 5
  HURT: 0
  total loops in shared programs: 4830 -> 4829 (-0.02%)
  loops in affected programs: 1 -> 0
  helped: 1
  HURT: 0
* panfrost: Ignore discards in dead branch analysis | Alyssa Rosenzweig | 2019-06-10 | 1 | -0/+5
  Fixes regressions in dEQP-GLES2.functional.shaders.discard.dynamic_loop_*
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Disambiguate register mode | Alyssa Rosenzweig | 2019-06-10 | 1 | -1/+11
  We postfix instructions by their size if a destination override is in place (à la AT&T assembly), disambiguating instruction sizes. Previously, "16-bit instruction, 16-bit dest, 16-bit sources" disassembled identically to "32-bit instruction, 16-bit dest, 16-bit sources", which is semantically distinct due to the lessened opportunity for parallelism but (potentially) greater precision. Adding a postfix removes the ambiguity and spares some mental gymnastics when reading weird disassemblies, even in some cases that are not ambiguous.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Expose vec8/vec16 modes | Alyssa Rosenzweig | 2019-06-10 | 1 | -236/+273
  Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit, vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal circumstances. Nevertheless, the other modes do exist and are easily accessible through OpenCL; they also come up in cases like blend shaders.
  While we have had minimal support for decoding 8-bit/64-bit modes, we did so pretending they were vec4 in each case; 16-bit registers had a synthetically duplicated register file to separate lo/hi halves, etc. This works for GL, but it doesn't map to what the hardware is -actually- doing, which can cause some headscratchingly bizarre disassemblies from OpenCL.
  So, we dive in the deep end and support these other modes natively in the disassembler, using absurdly long masks/swizzles, since the hardware is considerably more flexible than what was exposed before.
  Outside of some fixed routines for blending, none of the above is supported in the compiler yet. But it's better to have it in the ISA definitions and disassembler than not, for future use if nothing else.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
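All four modes listed above multiply out to 128 bits per register, so the lane count, and therefore the width of a per-lane mask, follows directly from the bit size. The sketch below models only that relationship; the per-lane mask is illustrative and is not claimed to match Midgard's actual mask or swizzle encoding, and the names are hypothetical.

```c
#include <assert.h>

/* Every mode listed above fills a 128-bit register. */
#define REG_BITS 128u

/* Lanes per register: 64 -> 2, 32 -> 4, 16 -> 8, 8 -> 16. */
static unsigned
lanes_for_bit_size(unsigned bits)
{
   assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
   return REG_BITS / bits;
}

/* Illustrative per-lane write mask with every lane enabled. */
static unsigned
full_lane_mask(unsigned bits)
{
   return (1u << lanes_for_bit_size(bits)) - 1u;
}
```

This is why treating every mode as vec4 hid information: in vec16 8-bit mode a full mask needs 16 lane bits rather than 4.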
* panfrost/midgard: Add shifting int modifiers | Alyssa Rosenzweig | 2019-06-10 | 2 | -18/+14
  As a source modifier, shift allows shifting a value left by the bit size, useful in conjunction with a greater register mode, for instance to implement `upsample`. As a concrete example, the following OpenCL:
    ushort hr0 = /* ... */;
    uint b = /* ... */;
    uint r2 = (convert_uint(hr0) << 16) ^ b;
  compiles to the following Midgard assembly:
    ixor r, (hr0) << 16, b
  In reverse, the ".hi" output modifier shifts the value right by the bit size, leaving just the carry/overflow at the bottom. To implement *_hi functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher mode with the .hi modifier. (For 64-bit, things are hairier, since there is no 128-bit int mode.)
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
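The arithmetic behind both modifiers can be modelled in plain C: the `<< 16` source modifier widens a 16-bit value and shifts it left by its bit size (as in OpenCL's `upsample`), and the `.hi` output modifier keeps the upper half of a result computed in the 2x wider mode (as in `mul_hi`). This sketch models the arithmetic only; the function names are hypothetical and it is not the compiler's lowering.

```c
#include <assert.h>
#include <stdint.h>

/* upsample(hi, lo) for 16-bit halves: widen, shift left by the bit size, combine. */
static uint32_t
upsample_u16(uint16_t hi, uint16_t lo)
{
   return ((uint32_t)hi << 16) | lo;
}

/* mul_hi for 16-bit operands: multiply in the 32-bit mode, then keep the
 * top 16 bits, which is what the ".hi" output modifier is described as doing. */
static uint16_t
mul_hi_u16(uint16_t a, uint16_t b)
{
   uint32_t wide = (uint32_t)a * (uint32_t)b;   /* cannot overflow 32 bits */
   return (uint16_t)(wide >> 16);
}
```

The same trick does not extend to 64-bit operands, because the 2x wider product would need the 128-bit integer mode the hardware lacks.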
* panfrost/midgard: Add integer outmods | Alyssa Rosenzweig | 2019-06-10 | 3 | -25/+60
  For floats, output modifiers determine clamping behaviour. For integers, they determine wrapping/saturation behaviour (or shifting; see next commit). These are very different: they are conceptually two unrelated enums union'ed together, and confusing the two has been responsible for many a bug.
  While the clamping behaviour for floats was clear from GL, the int behaviour is only known from OpenCL contortions with convert_*_sat() functions. With the underlying functions known, clean up the codebase, likely fixing outmod-type-related bugs in the process.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
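To illustrate why wrapping and saturation are distinct behaviours, here is a minimal C model of narrowing a 32-bit signed result to 8 unsigned bits: wrapping keeps the low bits (what OpenCL's `convert_uchar()` does under C's unsigned conversion rules), while saturation clamps (what `convert_uchar_sat()` does). The helper names are hypothetical, and the actual Midgard outmod enum is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping: keep the low 8 bits (conversion to an unsigned type is modulo 2^8). */
static uint8_t
narrow_wrap_u8(int32_t v)
{
   return (uint8_t)v;
}

/* Generic clamp used to model saturation. */
static int32_t
clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
   assert(lo <= hi);
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Saturation: clamp into [0, 255] before narrowing. */
static uint8_t
narrow_sat_u8(int32_t v)
{
   return (uint8_t)clamp_i32(v, 0, UINT8_MAX);
}
```

For in-range inputs both agree; they only diverge on overflow, which is exactly where mixing up float clamp and int wrap/saturate semantics produces subtle bugs.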
* panfrost/midgard: Note floating compares type convert | Alyssa Rosenzweig | 2019-06-10 | 1 | -4/+4
  OP_TYPE_CONVERTS denotes an opcode that returns a different type than its source (going from int-domain to float-domain or vice versa), named after the f2i/i2f family of opcodes it covers. We care because source mods are determined by the source type (i/f), but output modifiers are determined by the output type (equal to the source type, unless the op type-converts, in which case it's the opposite).
  The upshot is that floating-point compares (feq/fne/etc.) actually do type-convert. That is, they take in floating-point values and output in integer space (a boolean), so we mark them this way to ensure the correct output modifiers are used.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
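The rule described above fits in a few lines. In this sketch a hypothetical `converts` flag plays the role of OP_TYPE_CONVERTS; the enum, struct and function names are illustrative, not the Midgard compiler's definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum value_type { TYPE_INT, TYPE_FLOAT };

/* Hypothetical opcode properties; `converts` models OP_TYPE_CONVERTS. */
struct op_props {
   enum value_type src_type;
   bool converts;
};

/* Source modifiers are chosen by the source type. */
static enum value_type
src_mod_type(struct op_props op)
{
   return op.src_type;
}

/* Output modifiers are chosen by the output type: the source type, or the
 * opposite domain when the op converts between int and float. */
static enum value_type
out_mod_type(struct op_props op)
{
   assert(op.src_type == TYPE_INT || op.src_type == TYPE_FLOAT);
   if (!op.converts)
      return op.src_type;
   return op.src_type == TYPE_FLOAT ? TYPE_INT : TYPE_FLOAT;
}
```

Marking feq as converting means its boolean result gets integer (wrap/saturate) output modifiers instead of float clamping, while its operands still take float source modifiers.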
* panfrost: Align linear renderable resources | Alyssa Rosenzweig | 2019-06-10 | 1 | -3/+10
  It's just -easier- to render to aligned framebuffers. For winsys targets, we already align, but even for an internal linear FBO we ought to align everything nicely.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Fix stride check when mipmapping | Alyssa Rosenzweig | 2019-06-10 | 1 | -7/+15
  Now that we support custom strides on mipmapped textures (theoretically, at least), extend the stride check to support mipmaps. Fixes incorrect strides of linear windows in Weston.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Refactor texture/sampler upload | Alyssa Rosenzweig | 2019-06-10 | 3 | -100/+124
  We move some code packing the texture/sampler descriptors into dedicated functions (out of the terrifyingly long emit_for_draw monolith), cleaning them up as we go.
  The discovery triggering the cleanup is the format for including manual strides in the presence of mipmaps/cubemaps. Rather than being placed at the end as previously assumed, they are interleaved after each address. This difference is relevant when handling NPOT linear mipmaps.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
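A minimal sketch of the interleaved layout described above: each level/face address is followed directly by its stride, rather than all addresses being followed by all strides. Using a 64-bit slot for the stride, and all names here, are assumptions for illustration; the commit message does not give the real descriptor field widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: write one (address, stride) pair per level/face, interleaved.
 * Returns the number of 64-bit slots written. */
static size_t
emit_texture_payload(uint64_t *out, size_t out_slots,
                     const uint64_t *addrs, const uint32_t *strides,
                     size_t count)
{
   assert(out_slots >= 2 * count);
   for (size_t i = 0; i < count; ++i) {
      out[2 * i] = addrs[i];         /* address of level/face i */
      out[2 * i + 1] = strides[i];   /* its manual stride, immediately after */
   }
   return 2 * count;
}
```

With the previously assumed "strides at the end" layout, the stride for level 0 would land where the hardware expects the level 1 address, which is the kind of mismatch that only shows up once levels have non-trivial (e.g. NPOT linear) strides.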
* panfrost: Refactor blitting code | Alyssa Rosenzweig | 2019-06-10 | 5 | -79/+170
  We refactor the wallpaper rendering code to separate the wallpaper-specific bits from the general blitting capabilities. In the (hopefully near) future, we'll turn this on to implement real Gallium blits, e.g. for automatic mipmap generation.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Refactor AFBC code | Alyssa Rosenzweig | 2019-06-10 | 4 | -64/+154
  This patch does a substantial cleanup of the code for handling AFBC, moving various disparate misplaced functions into a new central pan_afbc.c file.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>