path: root/src/intel/compiler
Commit message | Author | Age | Files | Lines
* meson: Add build Intel "anv" vulkan driver | Dylan Baker | 2017-09-27 | 1 | -0/+155
  This allows building and installing the Intel "anv" Vulkan driver using meson and ninja. The driver has been tested against the CTS and seems to pass the same series of tests (they both segfault when the CTS tries to run wayland wsi tests).

  There are still a mess of TODO, XXX, and FIXME comments in here. Those are mostly for meson bugs I'm trying to fix, or for additional things to implement for other drivers/features.

  I have configured all intermediate libraries and optional tools to not build by default, meaning they will only be built if they're pulled in as a dependency of a target that will actually be installed. This allows us to avoid massive if chains, while ensuring that only the bits that need to be built are.

  v2: - enable anv, x11, and wayland by default
      - add configure option to disable valgrind
  v3: - fix typo in meson_options (Nicholas)
  v4: - Remove dead code (Eric)
      - Remove change to generator that was from v0 (Eric)
      - replace if chain with loop (Eric)
      - Fix typos (Eric)
      - define HAVE_DLOPEN for both libdl and builtin dl cases (Eric)
  v5: - rebase on util string buffer implementation

  Signed-off-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]> (v4)
* intel: use a flag instead of setting PYTHONPATH | Dylan Baker | 2017-09-27 | 1 | -8/+25
  Meson doesn't allow setting environment variables for custom targets, so we either need to not pass this as an environment variable or use a shell script to wrap the invocation. The chosen solution has the advantage of working for both autotools and meson.

  v2: - put rules back in top scope (Ken)

  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Dylan Baker <[email protected]>
* i965: Support copy propagating of untyped atomic surface indexes. | Kenneth Graunke | 2017-09-26 | 1 | -0/+7
  In the vec4 backend, SHADER_OPCODE_UNTYPED_ATOMIC's src[1] is the surface index. We want to copy propagate so we can use an immediate message descriptor, rather than an indirect send.

  Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Fix swizzles on atomic sources. | Kenneth Graunke | 2017-09-26 | 1 | -4/+9
  Atomic operation sources are scalar values, but we were failing to select the .x component of the second operand. For example,

      atomicCounterCompSwapARB(counter, 5u, 10u)

  would generate

      mov(8) vgrf4.x:D, 5D
      mov(8) vgrf5.x:D, 10D
      mov(8) vgrf9.x:UD, vgrf4.xyzw:D
      mov(8) vgrf9.y:UD, vgrf5.xyzw:D

  which wrongly selects the .y component of vgrf5, so the actual 10u value would get dead code eliminated. The swizzle works for the other source, but both of them ought to be .xxxx.

  Fixes the compare and swap CTS tests in:
  KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase

  Cc: "17.2 17.1 17.0 13.0" <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
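The swizzle problem described in this commit can be illustrated with a small sketch. It assumes a swizzle is packed as four 2-bit channel selectors, one per destination channel; the names `pack_swizzle` and `source_channel` are hypothetical and are not the driver's helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Channel selectors; each one occupies two bits of a packed swizzle. */
enum { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3 };

/* Pack four channel selectors into one byte (hypothetical helper). */
static uint8_t pack_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
    assert(x < 4 && y < 4 && z < 4 && w < 4);
    return (uint8_t)(x | (y << 2) | (z << 4) | (w << 6));
}

/* Return the source channel that feeds destination channel dst_chan. */
static unsigned source_channel(uint8_t swizzle, unsigned dst_chan)
{
    assert(dst_chan < 4);
    return (swizzle >> (dst_chan * 2)) & 0x3u;
}
```

With an `.xyzw` source swizzle, a write to destination channel `.y` reads source channel `.y`, which was never written in the example above. With `.xxxx`, every destination channel reads `.x`, where the scalar actually lives.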
* i965/vec4: Actually handle atomic op intrinsics. | Kenneth Graunke | 2017-09-26 | 1 | -2/+10
  Embarrassingly, someone enabled the ARB_shader_atomic_counter_ops extension for Gen7+ but never added the intrinsics to the switch statement in the vec4 backend, so they just hit an unreachable() call and died.

  Fixes: 40dd45d0c6aa4a9d (i965: Enable ARB_shader_atomic_counter_ops)
  Cc: "17.2 17.1 17.0 13.0" <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/nir: export nir_optimize | Timothy Arceri | 2017-09-26 | 2 | -7/+11
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Handle unwritten PSIZ/VIEWPORT/LAYER outputs in vec4 shaders. | Kenneth Graunke | 2017-09-21 | 1 | -3/+3
  This can occur if the shader is capturing some of the values from the VUE header for transform feedback, but the shader hasn't written all of them.

  Reviewed-by: Juan A. Suarez Romero <[email protected]>
* intel/eu/validate: Look up types on demand in execution_type() | Jason Ekstrand | 2017-09-12 | 1 | -4/+2
  We are looking up the execution type prior to checking how many sources we have. This leads to looking for a type for src1 on MOV instructions which is bogus. On BDW+, the src1 register type overlaps with the 64-bit immediate and causes us problems.

  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
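A minimal sketch of the on-demand lookup, assuming a toy instruction layout: `struct toy_inst`, `toy_src1_type()` and `toy_execution_type()` are hypothetical names, and the rule "float wins" is a simplification, not the validator's real logic. The point is only that the src1 field is decoded after the source count has been checked.

```c
#include <assert.h>

enum toy_type { TOY_TYPE_D, TOY_TYPE_F };

struct toy_inst {
    unsigned num_sources;      /* 1 for a MOV-like op, 2 for an ADD-like op */
    enum toy_type src0_type;
    unsigned src1_bits;        /* src1 type field; may hold immediate bits
                                  when the instruction has only one source */
};

/* Counts decodes so the laziness is observable in this sketch. */
static int toy_src1_lookups;

static enum toy_type toy_src1_type(const struct toy_inst *inst)
{
    toy_src1_lookups++;
    return inst->src1_bits == 2 ? TOY_TYPE_F : TOY_TYPE_D;
}

static enum toy_type toy_execution_type(const struct toy_inst *inst)
{
    assert(inst->num_sources == 1 || inst->num_sources == 2);

    /* Check the source count first: with one source, the src1 field is
     * not a type at all and must not be decoded. */
    if (inst->num_sources == 1)
        return inst->src0_type;

    enum toy_type src1 = toy_src1_type(inst);
    return (inst->src0_type == TOY_TYPE_F || src1 == TOY_TYPE_F)
               ? TOY_TYPE_F : TOY_TYPE_D;
}
```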
* i965: Drop unnecessary conditional | Matt Turner | 2017-08-29 | 1 | -4/+4
  Clang doesn't realize that 0 and 1 are the only possibilities, and thinks lots of variables might be uninitialized.

  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* intel/compiler: Cast reg types explicitly | Topi Pohjolainen | 2017-08-28 | 1 | -2/+2
  Makes coverity happier.

  CID: 1416799
  Fixes: c1ac1a3d25 (i965: Add a brw_hw_type_to_reg_type() function)
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* anv,i965: Move CS shared lowering into anv | Jason Ekstrand | 2017-08-24 | 1 | -2/+0
  Right now, OpenGL uses the GLSL lowering for shared variables and anv uses NIR to lower them. For a long time, we've done this weird thing where we do the NIR lowering unconditionally and then add the SLM sizes from the two together. This works because one of them will always be 0 but it's a bit sketchy. Let's just move the NIR-based lowering into anv_pipeline and get rid of the sketch.

  Reviewed-by: Jordan Justen <[email protected]>
* i965: Stop using wm_prog_data->binding_table.render_target_start. | Kenneth Graunke | 2017-08-23 | 1 | -2/+7
  Render target surfaces always start at binding table index 0. This is required for us to use headerless FB writes, which we really want to do. So, we'll never change that.

  Given that, it's not necessary to look up a wm_prog_data field which we already know contains 0. We can drop the dependency in brw_renderbuffer_surfaces (Gen4-5)...which was already confusingly missing from gen6_renderbuffer_surfaces.

  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add a brw_wm_prog_data::has_render_target_reads field. | Kenneth Graunke | 2017-08-23 | 2 | -0/+3
  State upload code should use prog_data rather than poking at shader_info directly.

  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Mark functions static | Matt Turner | 2017-08-21 | 3 | -20/+21
  Cuts 300 bytes of .text

  Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Use 'class' src_reg, rather than 'struct' src_reg | Matt Turner | 2017-08-21 | 1 | -1/+1
  Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Return float from spill_cost_for_type() | Matt Turner | 2017-08-21 | 1 | -1/+1
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Optimize reading the destination type | Matt Turner | 2017-08-21 | 1 | -1/+3
  brw_hw_type_to_reg_type() needs to know only whether the file is BRW_IMMEDIATE_VALUE or not, which is not a valid file for the destination. gcc and clang will evaluate __builtin_strcmp() at compile time, so we can use it to pass a constant file for the destination.

         text    data     bss      dec     hex  filename
      7816214  346248  420496  8582958  82f72e  i965_dri.so before
      7816070  346248  420496  8582814  82f69e  i965_dri.so after

  Reviewed-by: Scott D Phillips <[email protected]>
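The compile-time string comparison can be sketched as a macro that stringifies the operand name. This is an illustration under assumptions, not the driver's macro: `TOY_OPERAND_FILE`, the `toy_*_file()` accessors and the enum values are hypothetical, and `__builtin_strcmp` is a GCC/Clang builtin, so the sketch is limited to those compilers.

```c
#include <assert.h>

enum toy_file { TOY_FILE_GRF = 0, TOY_FILE_IMM = 3 };

struct toy_inst {
    enum toy_file src0_file;
    enum toy_file src1_file;
};

static enum toy_file toy_src0_file(const struct toy_inst *inst)
{
    assert(inst);
    return inst->src0_file;
}

static enum toy_file toy_src1_file(const struct toy_inst *inst)
{
    assert(inst);
    return inst->src1_file;
}

/* Exists only so the macro compiles for `dst`; its branch is never taken. */
static enum toy_file toy_dst_file(const struct toy_inst *inst)
{
    assert(inst);
    return TOY_FILE_GRF;
}

/* For `dst`, the comparison folds to a constant at compile time, so the
 * file is a known non-immediate value and no field has to be read. */
#define TOY_OPERAND_FILE(inst, operand)                  \
    (__builtin_strcmp(#operand, "dst") == 0              \
         ? TOY_FILE_GRF                                  \
         : toy_##operand##_file(inst))
```

Because an immediate is never a valid destination file, any non-immediate constant works for the `dst` case; the choice of `TOY_FILE_GRF` here is arbitrary.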
* i965: Mark brw_hw_type_to_reg_type() as a pure function | Matt Turner | 2017-08-21 | 1 | -1/+7
         text    data     bss      dec     hex  filename
      7816886  346248  420496  8583630  82f9ce  i965_dri.so before
      7816214  346248  420496  8582958  82f72e  i965_dri.so after

  Reviewed-by: Scott D Phillips <[email protected]>
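As a rough illustration of what marking a lookup function pure means, the sketch below uses the GCC/Clang `pure` attribute on a table-driven conversion. The table contents and the `toy_*` names are assumptions made for the example. A pure function has no side effects and depends only on its arguments and readable memory, so the compiler is allowed to merge repeated calls with the same argument.

```c
#include <assert.h>

#if defined(__GNUC__)
#define TOY_PURE __attribute__((pure))
#else
#define TOY_PURE
#endif

enum toy_type { TOY_TYPE_UD, TOY_TYPE_D, TOY_TYPE_F, TOY_TYPE_INVALID };

/* Hypothetical hardware-encoding -> logical-type table. */
static const enum toy_type toy_hw_table[] = {
    TOY_TYPE_UD, TOY_TYPE_D, TOY_TYPE_F,
};

static enum toy_type toy_hw_type_to_reg_type(unsigned hw_type) TOY_PURE;

static enum toy_type toy_hw_type_to_reg_type(unsigned hw_type)
{
    /* No side effects: out-of-range input maps to a sentinel. */
    if (hw_type >= sizeof(toy_hw_table) / sizeof(toy_hw_table[0]))
        return TOY_TYPE_INVALID;
    return toy_hw_table[hw_type];
}

/* A caller that converts twice; with the attribute, the compiler may
 * reuse the first result when both arguments are the same. */
static int toy_same_type(unsigned hw_a, unsigned hw_b)
{
    assert(hw_a < 3 && hw_b < 3);
    return toy_hw_type_to_reg_type(hw_a) == toy_hw_type_to_reg_type(hw_b);
}
```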
* i965: Hide the register type hardware encodings | Matt Turner | 2017-08-21 | 2 | -31/+31
  So we stop mixing them with the logical enum.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Stop using hardware register types directly | Matt Turner | 2017-08-21 | 4 | -158/+113
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Add brw_hw_reg_type_to_letters() and use it in brw_disasm.c | Matt Turner | 2017-08-21 | 3 | -39/+45
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Move brw_reg_type_letters() as well | Matt Turner | 2017-08-21 | 6 | -33/+37
  And add "to_" to the name for consistency with the other functions in this file.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Switch to using the logical register types | Matt Turner | 2017-08-21 | 2 | -21/+19
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Add functions to abstract access to register types | Matt Turner | 2017-08-21 | 2 | -51/+79
  Previously the brw_inst{,_set}_{dst,src0,src1}_reg_type() functions provided access to the hardware encodings for the register types. We often mixed these with the logical BRW_REGISTER_TYPE_* enums (which themselves used to be the hardware format!) with bad results.

  With that functionality now available with the hw_ versions (see previous commit), we now add functions that take the logical BRW_REGISTER_TYPE_* enums and convert into the hardware format and vice versa. To do the conversion we also have to provide the file.

  Note the asymmetry between the two functions: the new getter reads the file from the instruction word, and to ensure that is always set the setter writes both the file and the type.

  Reviewed-by: Scott D Phillips <[email protected]>
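The getter/setter asymmetry can be sketched on a toy instruction word. Everything here is an assumption made for illustration: the bit layout (bit 0 for the file, bits 1 to 3 for the hardware type), the numeric encodings, and the `toy_*` names. The only detail taken from this log is that UB shares an encoding with UV and B shares one with VF, which is why the file is needed to decode a hardware type.

```c
#include <assert.h>
#include <stdint.h>

enum toy_file { TOY_FILE_GRF = 0, TOY_FILE_IMM = 1 };
enum toy_type { TOY_TYPE_UB, TOY_TYPE_B, TOY_TYPE_UV, TOY_TYPE_VF,
                TOY_TYPE_INVALID };

/* Assumed encodings: the same bits mean different types per file. */
#define TOY_HW_UB_OR_UV 4u
#define TOY_HW_B_OR_VF  5u

/* Logical type -> hardware encoding; "state" (file) first, subject last. */
static unsigned toy_reg_type_to_hw_type(enum toy_file file, enum toy_type type)
{
    if (file == TOY_FILE_IMM) {
        assert(type == TOY_TYPE_UV || type == TOY_TYPE_VF);
        return type == TOY_TYPE_UV ? TOY_HW_UB_OR_UV : TOY_HW_B_OR_VF;
    }
    assert(type == TOY_TYPE_UB || type == TOY_TYPE_B);
    return type == TOY_TYPE_UB ? TOY_HW_UB_OR_UV : TOY_HW_B_OR_VF;
}

/* Hardware encoding -> logical type; ambiguous without the file. */
static enum toy_type toy_hw_type_to_reg_type(enum toy_file file, unsigned hw)
{
    if (hw == TOY_HW_UB_OR_UV)
        return file == TOY_FILE_IMM ? TOY_TYPE_UV : TOY_TYPE_UB;
    if (hw == TOY_HW_B_OR_VF)
        return file == TOY_FILE_IMM ? TOY_TYPE_VF : TOY_TYPE_B;
    return TOY_TYPE_INVALID;
}

/* Setter writes both file and type, so the getter can rely on the file. */
static void toy_inst_set_src0_file_type(uint32_t *inst, enum toy_file file,
                                        enum toy_type type)
{
    unsigned hw = toy_reg_type_to_hw_type(file, type);
    *inst = (*inst & ~0xFu) | (uint32_t)file | ((uint32_t)hw << 1);
}

/* Getter reads the file back out of the instruction word. */
static enum toy_type toy_inst_src0_type(uint32_t inst)
{
    enum toy_file file = (enum toy_file)(inst & 1u);
    return toy_hw_type_to_reg_type(file, (inst >> 1) & 0x7u);
}
```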
* i965: Rename brw_inst's functions that access the register type | Matt Turner | 2017-08-21 | 7 | -99/+99
  Put hw_ in the name so that it's clear these are the hardware encodings.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Index brw_hw_reg_type_to_size()'s table by logical type | Matt Turner | 2017-08-21 | 1 | -39/+19
  I'll be transitioning everything to use the logical types.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Add a brw_hw_type_to_reg_type() function | Matt Turner | 2017-08-21 | 2 | -0/+29
  Will be used in later commits.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Use a common table to translate logical to hardware types | Matt Turner | 2017-08-21 | 1 | -36/+29
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Extract functions dealing with register types to separate file | Matt Turner | 2017-08-21 | 4 | -140/+207
  I'm going to encapsulate all of the logic dealing with register types in this file.

  Rename the parameters for the hardware encodings from type -> hw_type at the same time.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Reverse file/type arguments to register type functions | Matt Turner | 2017-08-21 | 4 | -13/+15
  I think of the initial arguments as "state" and the last as the actual subject.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Add support for disassembling 64-bit integer immediates | Matt Turner | 2017-08-21 | 2 | -0/+13
  After the last patch converted things into enums, I helpfully got a compiler warning about these missing from the switch statement.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Use separate enums for register vs immediate types | Matt Turner | 2017-08-21 | 6 | -129/+144
  The hardware encodings often mean different things depending on whether the source is an immediate.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Reorder brw_reg_type enum values | Matt Turner | 2017-08-21 | 5 | -26/+21
  These vaguely corresponded to the hardware encodings, but that is purely historical at this point. Reorder them so we stop making things "almost work" when mixing enums.

  The ordering has been chosen so that no enum value is the same as a compatible hardware encoding.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Validate destination restrictions with vector immediates | Matt Turner | 2017-08-21 | 3 | -12/+141
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Don't let raw-move check be tricked by immediate vector types | Matt Turner | 2017-08-21 | 1 | -3/+10
  UB and B type encodings are the same as UV and VF. Noticed when writing the following patch.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Only change type of 0.0f to VF if destination stride == 1 | Matt Turner | 2017-08-21 | 1 | -1/+2
  The destination stride must be equivalent to a dword if VF is used.

  Also, since the only compaction table entries with "i:vf" have the destination as "r:f", specifically check that the destination is of type float.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Remove CONT/BREAK from instruction compaction test | Matt Turner | 2017-08-21 | 1 | -4/+0
  These cannot be compacted. A similar mistake was fixed in commit 90eaf01616a8.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Test instruction compaction on all supported Gens | Matt Turner | 2017-08-21 | 1 | -8/+42
  Note that there's no point in testing on G45, since its compaction is the same as Gen5. Same logic applies to Gen7 variants and low-power parts.

  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Silence signed/unsigned comparison warning | Matt Turner | 2017-08-21 | 1 | -1/+1
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Move compaction "prepass" into brw_eu_compact.c | Matt Turner | 2017-08-21 | 2 | -72/+82
  Reviewed-by: Scott D Phillips <[email protected]>
* i965: Mark src inst pointer const in compaction code | Matt Turner | 2017-08-21 | 2 | -12/+13
  Reviewed-by: Scott D Phillips <[email protected]>
* intel/compiler: properly size attribute wa_flags array for Vulkan | Iago Toral Quiroga | 2017-08-11 | 1 | -1/+17
  Mesa will map user defined vertex input attributes to slots starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16 slots (up to GL_VERT_ATTRIB_MAX). This is sufficient for GL, where we expose exactly 16 vertex attributes for user defined inputs, but in Vulkan we can expose up to 28 (which are also mapped from VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when we scope the size of the array of attribute workaround flags that is used during the brw_vertex_workarounds NIR pass. This prevents out-of-bounds accesses in that array for NIR shaders that use more than 16 vertex input attributes.

  Fixes: dEQP-VK.pipeline.vertex_input.max_attributes.*

  Acked-by: Lionel Landwerlin <[email protected]>
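The sizing arithmetic can be sketched as follows. The numeric values of `TOY_VERT_ATTRIB_GENERIC0` and `TOY_VERT_ATTRIB_MAX` are assumptions chosen so that they leave 16 slots, as the message states; the limit of 28 Vulkan attributes comes from the message. The `toy_*` names are hypothetical and do not mirror the driver's structures.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VERT_ATTRIB_GENERIC0   16  /* assumed slot of first user attribute */
#define TOY_VERT_ATTRIB_MAX        32  /* assumed; leaves 16 generic slots */
#define TOY_MAX_VK_VERTEX_ATTRIBS  28  /* Vulkan limit from the message */

#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))

/* The array must cover the larger of the GL range and the Vulkan range. */
#define TOY_WA_FLAGS_SIZE \
    TOY_MAX(TOY_VERT_ATTRIB_MAX, \
            TOY_VERT_ATTRIB_GENERIC0 + TOY_MAX_VK_VERTEX_ATTRIBS)

struct toy_vs_key {
    uint8_t wa_flags[TOY_WA_FLAGS_SIZE];
};

/* Record workaround flags for the n-th user-defined attribute. */
static void toy_set_wa_flags(struct toy_vs_key *key, unsigned user_attrib,
                             uint8_t flags)
{
    unsigned slot = TOY_VERT_ATTRIB_GENERIC0 + user_attrib;
    assert(slot < TOY_WA_FLAGS_SIZE);   /* would fail with a 32-entry array */
    key->wa_flags[slot] = flags;
}
```

With these assumed values the array needs 16 + 28 = 44 entries, so the last Vulkan attribute (index 27) lands in slot 43.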
* intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed. | Dave Airlie | 2017-08-03 | 1 | -0/+1
  If dual object compile fails (as seems to happen with virgl a fair bit, and does piglit even have any tests for it?), we end up not restarting the pull params, so we call vec4_visitor::move_uniform_array_access_to_pull_constant a second time and it runs over the ends of the alloc.

  Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test running inside virgl on ivybridge.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* i965: Fix indentation | Matt Turner | 2017-08-02 | 2 | -8/+8
* i965: Set lower_vote_trivial in vector_nir_options_gen6 too. | Kenneth Graunke | 2017-07-21 | 1 | -0/+1
  There's a second struct for Gen6+.

  Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Match destination type to size for ballot | Matt Turner | 2017-07-20 | 2 | -2/+6
  No use in taking a 64-bit value when we know the high 32-bits are zero.
* nir: Reduce destination size of ballot intrinsic when possible | Matt Turner | 2017-07-20 | 1 | -0/+1
  Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation.

  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
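To see why a 32-bit destination is enough, here is a small CPU-side sketch of a ballot: one bit per invocation, set when that invocation's vote is true. The functions `toy_ballot_bit_size()` and `toy_ballot()` are hypothetical and only model the arithmetic; they are not NIR or backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Destination width needed for a given maximum subgroup size. */
static unsigned toy_ballot_bit_size(unsigned max_subgroup_size)
{
    assert(max_subgroup_size >= 1 && max_subgroup_size <= 64);
    return max_subgroup_size <= 32 ? 32 : 64;
}

/* Build the ballot mask: bit i is set when invocation i voted true. */
static uint64_t toy_ballot(const bool *votes, unsigned subgroup_size)
{
    assert(votes && subgroup_size <= 64);
    uint64_t mask = 0;
    for (unsigned i = 0; i < subgroup_size; i++) {
        if (votes[i])
            mask |= UINT64_C(1) << i;
    }
    return mask;
}
```

For any subgroup of at most 32 invocations, the upper 32 bits of the mask are always zero, so a backend limited to such subgroups loses nothing by producing only the low half.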
* i965/fs: Implement ARB_shader_ballot operations | Matt Turner | 2017-07-20 | 3 | -0/+48
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do not move MOVs writing the flag outside of control flow | Matt Turner | 2017-07-20 | 1 | -2/+4
  The implementation of ballotARB() will start by zeroing the flags register. So, doing something like

      if (gl_SubGroupInvocationARB % 2u == 0u) {
         ... = ballotARB(true);
         [...]
      } else {
         ... = ballotARB(true);
         [...]
      }

  (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag sources in flags_read() | Francisco Jerez | 2017-07-20 | 1 | -4/+5
  The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register.

  Reviewed-by: Matt Turner <[email protected]>