path: root/src/intel/compiler
Commit message | Author | Age | Files | Lines
...
* i965: Add a brw_hw_type_to_reg_type() function | Matt Turner | 2017-08-21 | 2 | -0/+29
| | | | | | Will be used in later commits. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Use a common table to translate logical to hardware types | Matt Turner | 2017-08-21 | 1 | -36/+29
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* i965: Extract functions dealing with register types to separate file | Matt Turner | 2017-08-21 | 4 | -140/+207
| | | | | | | | | | I'm going to encapsulate all of the logic dealing with register types in this file. Rename the parameters for the hardware encodings from type -> hw_type at the same time. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Reverse file/type arguments to register type functions | Matt Turner | 2017-08-21 | 4 | -13/+15
| | | | | | | I think of the initial arguments as "state" and the last as the actual subject. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Add support for disassembling 64-bit integer immediates | Matt Turner | 2017-08-21 | 2 | -0/+13
| | | | | | | After the last patch converted things into enums, I helpfully got a compiler warning about these missing from the switch statement. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Use separate enums for register vs immediate types | Matt Turner | 2017-08-21 | 6 | -129/+144
| | | | | | | The hardware encodings often mean different things depending on whether the source is an immediate. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Reorder brw_reg_type enum values | Matt Turner | 2017-08-21 | 5 | -26/+21
| | | | | | | | | | | These vaguely corresponded to the hardware encodings, but that is purely historical at this point. Reorder them so we stop making things "almost work" when mixing enums. The ordering has been chosen so that no enum value is the same as a compatible hardware encoding. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Validate destination restrictions with vector immediates | Matt Turner | 2017-08-21 | 3 | -12/+141
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* i965: Don't let raw-move check be tricked by immediate vector types | Matt Turner | 2017-08-21 | 1 | -3/+10
| | | | | | | UB and B type encodings are the same as UV and VF. Noticed when writing the following patch. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Only change type of 0.0f to VF if destination stride == 1 | Matt Turner | 2017-08-21 | 1 | -1/+2
| | | | | | | | | | The destination stride must be equivalent to a dword if VF is used. Also, since the only compaction table entries with "i:vf" have the destination as "r:f", specifically check that the destination is of type float. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Remove CONT/BREAK from instruction compaction test | Matt Turner | 2017-08-21 | 1 | -4/+0
| | | | | | | These cannot be compacted. A similar mistake was fixed in commit 90eaf01616a8 Reviewed-by: Scott D Phillips <[email protected]>
* i965: Test instruction compaction on all supported Gens | Matt Turner | 2017-08-21 | 1 | -8/+42
| | | | | | | | Note that there's no point in testing on G45, since its compaction is the same as Gen5. Same logic applies to Gen7 variants and low-power parts. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Silence signed/unsigned comparison warning | Matt Turner | 2017-08-21 | 1 | -1/+1
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* i965: Move compaction "prepass" into brw_eu_compact.c | Matt Turner | 2017-08-21 | 2 | -72/+82
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* i965: Mark src inst pointer const in compaction code | Matt Turner | 2017-08-21 | 2 | -12/+13
| | | | Reviewed-by: Scott D Phillips <[email protected]>
* intel/compiler: properly size attribute wa_flags array for Vulkan | Iago Toral Quiroga | 2017-08-11 | 1 | -1/+17
| | | | | | | | | | | | | | | | | | Mesa will map user defined vertex input attributes to slots starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16 slots (up to GL_VERT_ATTRIB_MAX). This is sufficient for GL, where we expose exactly 16 vertex attributes for user defined inputs, but in Vulkan we can expose up to 28 (which are also mapped from VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when we scope the size of the array of attribute workaround flags that is used during the brw_vertex_workarounds NIR pass. This prevents out-of-bounds accesses in that array for NIR shaders that use more than 16 vertex input attributes. Fixes: dEQP-VK.pipeline.vertex_input.max_attributes.* Acked-by: Lionel Landwerlin <[email protected]>
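The sizing rule from the commit message above can be sketched as follows. This is only an illustration under stated assumptions, not the Mesa code: `FIRST_GENERIC_SLOT`, `GL_GENERIC_ATTRIBS`, `VK_GENERIC_ATTRIBS`, `wa_flags_array_size()` and `generic_attrib_slot()` are hypothetical names, and the value 16 used for the first generic slot is a placeholder. Only the counts 16 (GL) and 28 (Vulkan) come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, for illustration only. */
enum {
   FIRST_GENERIC_SLOT = 16, /* placeholder index of the first user-defined slot */
   GL_GENERIC_ATTRIBS = 16, /* user-defined attributes exposed in GL */
   VK_GENERIC_ATTRIBS = 28  /* user-defined attributes exposed in Vulkan */
};

/* Number of flag entries needed so that every user-defined attribute
 * slot has a valid index, whichever API produced the shader. */
static size_t
wa_flags_array_size(void)
{
   size_t max_generic = GL_GENERIC_ATTRIBS > VK_GENERIC_ATTRIBS
                      ? GL_GENERIC_ATTRIBS : VK_GENERIC_ATTRIBS;
   return FIRST_GENERIC_SLOT + max_generic;
}

/* Slot used by the n-th user-defined attribute (0-based). */
static size_t
generic_attrib_slot(size_t n)
{
   assert(n < (size_t) VK_GENERIC_ATTRIBS);
   return FIRST_GENERIC_SLOT + n;
}
```

With an array sized for only the GL count, the slot of the 17th user-defined attribute would already fall past the end; sizing by the larger of the two counts keeps every slot in range.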
* intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed. | Dave Airlie | 2017-08-03 | 1 | -0/+1
| | | | | | | | | | | | | | | If dual object compile fails (as seems to happen with virgl a fair bit, and does piglit even have any tests for it?), we end up not restarting the pull params, so we call vec4_visitor::move_uniform_array_access_to_pull_constant a second time and it runs over the ends of the alloc. Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test running inside virgl on ivybridge. Reviewed-by: Kenneth Graunke <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965: Fix indentation | Matt Turner | 2017-08-02 | 2 | -8/+8
* i965: Set lower_vote_trivial in vector_nir_options_gen6 too. | Kenneth Graunke | 2017-07-21 | 1 | -0/+1
| | | | | | There's a second struct for Gen6+. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Match destination type to size for ballot | Matt Turner | 2017-07-20 | 2 | -2/+6
| | | | No use in taking a 64-bit value when we know the high 32-bits are zero.
* nir: Reduce destination size of ballot intrinsic when possible | Matt Turner | 2017-07-20 | 1 | -0/+1
| | | | | | | | | Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
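Why a narrower destination is safe can be shown with a small host-side model. This is a sketch under assumptions, not driver or NIR code: `ballot32()` and `widen_ballot()` are hypothetical helpers, and the subgroup is modelled as a plain array of booleans with at most 32 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a ballot mask: bit i is set when invocation i voted true.
 * The subgroup size is assumed to be at most 32. */
static uint32_t
ballot32(const bool *votes, unsigned subgroup_size)
{
   assert(subgroup_size <= 32);

   uint32_t mask = 0;
   for (unsigned i = 0; i < subgroup_size; i++) {
      if (votes[i])
         mask |= UINT32_C(1) << i;
   }
   return mask;
}

/* Widening to the 64-bit result type only adds zero high bits. */
static uint64_t
widen_ballot(uint32_t low)
{
   return (uint64_t) low;
}
```

Because no invocation index reaches bit 32, the upper half of the 64-bit result carries no information, so the value can be computed in 32 bits and widened afterwards.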
* i965/fs: Implement ARB_shader_ballot operations | Matt Turner | 2017-07-20 | 3 | -0/+48
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do not move MOVs writing the flag outside of control flow | Matt Turner | 2017-07-20 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | | The implementation of ballotARB() will start by zeroing the flags register. So, doing something like if (gl_SubGroupInvocationARB % 2u == 0u) { ... = ballotARB(true); [...] } else { ... = ballotARB(true); [...] } (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag sources in flags_read() | Francisco Jerez | 2017-07-20 | 1 | -4/+5
| | | | | | | The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register. Reviewed-by: Matt Turner <[email protected]>
* nir: Add system values from ARB_shader_ballot | Matt Turner | 2017-07-20 | 2 | -3/+3
| | | | | | | | | | | | | We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values. Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement ARB_shader_group_vote operations | Matt Turner | 2017-07-20 | 1 | -0/+50
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag destinations in flags_written() | Francisco Jerez | 2017-07-20 | 1 | -4/+19
| | | | | | | The implementations of the ARB_shader_group_vote intrinsics will explicitly write the flag as the destination register. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Lower ARB_shader_group_vote intrinsics | Matt Turner | 2017-07-20 | 1 | -0/+1
| | | | | | | | I don't expect anyone is going to care about using this in vec4 programs (vertex/tessellation/geometry on Gen6/7), and no one has come up with a good way to implement it, much less test it. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add pass to optimize intrinsics | Matt Turner | 2017-07-20 | 1 | -0/+1
| | | | | | | Specifically, constant fold intrinsics from ARB_shader_group_vote, but I suspect it'll be useful for other things in the future. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use pushed UBO data in the scalar backend. | Kenneth Graunke | 2017-07-13 | 3 | -1/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This actually takes advantage of the newly pushed UBO data, avoiding pull loads. Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers) shader-db results on Skylake, ignoring programs with spill/fill changes: total instructions in shared programs: 13963994 -> 13651893 (-2.24%) instructions in affected programs: 4250328 -> 3938227 (-7.34%) helped: 28527 HURT: 0 total cycles in shared programs: 179808608 -> 172535170 (-4.05%) cycles in affected programs: 79720410 -> 72446972 (-9.12%) helped: 26951 HURT: 1248 LOST: 46 GAINED: 21 Many "Deus Ex: Mankind Divided" shaders which already spilled end up spilling a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts) presumably because we trade pull loads for fills. v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt). Reviewed-by: Matt Turner <[email protected]>
* i965: Factor out push locations. | Kenneth Graunke | 2017-07-13 | 2 | -16/+25
| | | | | | | | With UBOs, the answer of "have we decided to push this uniform" gets a bit more complicated - for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place. Reviewed-by: Matt Turner <[email protected]>
* i965: Push UBO data, but don't use it just yet. | Kenneth Graunke | 2017-07-13 | 2 | -1/+11
| | | | | | | | | | | This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <[email protected]>
* i965: Select ranges of UBO data to be uploaded as push constants. | Kenneth Graunke | 2017-07-13 | 3 | -0/+311
| | | | | | | | | | | | | | | This adds a NIR pass that decides which portions of UBOs we should upload as push constants, rather than pull constants. v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt). v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful. Reviewed-by: Matt Turner <[email protected]>
* i965: Switch to absolute addressing for constant buffer 0. | Kenneth Graunke | 2017-07-13 | 1 | -0/+6
| | | | | | | | | | | | | | | | | By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: no need to check unsigned is >= 0 | Lionel Landwerlin | 2017-07-13 | 1 | -1/+1
| | | | | | CID: 1338342 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: don't check unsigned is >= 0 | Lionel Landwerlin | 2017-07-13 | 1 | -1/+1
| | | | | | CID: 1224468 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: remove check unsigned is >= 0 | Lionel Landwerlin | 2017-07-13 | 1 | -1/+1
| | | | | | | | By definition, unsigned values are always >= 0. CID: 742212 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
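The kind of redundant comparison these three commits remove can be illustrated with a minimal sketch. The helpers `index_in_range()` and `element_at()` are hypothetical and do not appear in the Mesa sources; they only show why the lower-bound test carries no information for an unsigned type.

```c
#include <assert.h>
#include <stdbool.h>

/* Range check on an unsigned index. An `idx >= 0` clause would always be
 * true for an unsigned type, so only the upper bound is tested. */
static bool
index_in_range(unsigned idx, unsigned count)
{
   return idx < count;
}

/* Hypothetical accessor that relies on the check above. */
static int
element_at(const int *values, unsigned count, unsigned idx)
{
   assert(index_in_range(idx, count));
   return values[idx];
}
```

A negative value converted to `unsigned` wraps around to a large number, so it is rejected by the upper-bound test rather than by any comparison against zero.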
* intel/compiler: Don't use opt_sampler_eot() optimization on gen10+ | Anuj Phogat | 2017-07-12 | 1 | -1/+1
| | | | | | | This optimization has been removed on gen10+. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/i915: Add UYVY as the supported format | Johnson Lin | 2017-06-30 | 2 | -0/+2
| | | | | | Trigger the correct sampler options for it, similar to YUYV. Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel: compiler/i965: fix is_broxton checks | Lionel Landwerlin | 2017-06-20 | 3 | -4/+4
| | | | | | | | | | In 5f2fe9302c, is_geminilake was introduced to differentiate Broxton from Geminilake. Unfortunately, I failed to check that is_broxton is used throughout the code base to mean Gen9lp. Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3 | Anuj Phogat | 2017-06-09 | 4 | -4/+33
| | | | | | | | | | v1: By Ben Widawsky <[email protected]> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not a multiple of 3. v3: Move the entry_size checks into compiler code (Ken) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
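One way to satisfy the restriction above is sketched below. The commit message only states that the allocated sizes must not be a multiple of 3; bumping the size by one is an assumption made for this example, and `avoid_multiple_of_3()` is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>

/* Hypothetical helper: adjust an entry size so that it is never a
 * multiple of 3. Rounding up by one is an assumption of this sketch. */
static unsigned
avoid_multiple_of_3(unsigned entry_size)
{
   if (entry_size % 3 == 0)
      entry_size++;

   assert(entry_size % 3 != 0);
   return entry_size;
}
```

Sizes that already satisfy the rule pass through unchanged, so the helper can be applied uniformly to each of the VS, GS, HS and DS sizes.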
* i965/cnl: Handle gen10 in switch cases across the driver | Anuj Phogat | 2017-06-09 | 2 | -0/+3
| | | | | | | | | V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/cnl: Update few assertions | Anuj Phogat | 2017-06-09 | 1 | -1/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* tree-wide: remove trailing backslash | Eric Engestrom | 2017-06-07 | 1 | -1/+1
| | | | | | | | | Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency. | Kenneth Graunke | 2017-06-05 | 1 | -1/+1
| | | | | | | | | We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <[email protected]>
* i965: Drop duplicate shadow variable. | Kenneth Graunke | 2017-06-01 | 1 | -1/+0
| | | | | | We already initialized this at the top of the function. Trivial.
* i965: Move SOL PSIZ hacks from draw time to link time. | Kenneth Graunke | 2017-06-01 | 1 | -12/+1
| | | | | | | | | We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <[email protected]>
* i965: Ignore INTEL_SCALAR_* debug variables on Gen10+. | Kenneth Graunke | 2017-05-29 | 1 | -10/+16
| | | | | | | | | | | Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move clip program compilation to the compiler | Jason Ekstrand | 2017-05-26 | 8 | -0/+2340
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move SF compilation to the compiler | Jason Ekstrand | 2017-05-26 | 3 | -0/+931
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>