summaryrefslogtreecommitdiffstats
path: root/src/intel/compiler
Commit message (Collapse)AuthorAgeFilesLines
* i965: Fix indentationMatt Turner2017-08-022-8/+8
|
* i965: Set lower_vote_trivial in vector_nir_options_gen6 too.Kenneth Graunke2017-07-211-0/+1
| | | | | | There's a second struct for Gen6+. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Match destination type to size for ballotMatt Turner2017-07-202-2/+6
| | | | No use in taking a 64-bit value when we know the high 32-bits are zero.
* nir: Reduce destination size of ballot intrinsic when possibleMatt Turner2017-07-201-0/+1
| | | | | | | | | Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement ARB_shader_ballot operationsMatt Turner2017-07-203-0/+48
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do not move MOVs writing the flag outside of control flowMatt Turner2017-07-201-2/+4
| | | | | | | | | | | | | | | | | | | The implementation of ballotARB() will start by zeroing the flags register. So, a doing something like if (gl_SubGroupInvocationARB % 2u == 0u) { ... = ballotARB(true); [...] } else { ... = ballotARB(true); [...] } (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag sources in flags_read()Francisco Jerez2017-07-201-4/+5
| | | | | | | The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register. Reviewed-by: Matt Turner <[email protected]>
* nir: Add system values from ARB_shader_ballotMatt Turner2017-07-202-3/+3
| | | | | | | | | | | | | We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values. Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement ARB_shader_group_vote operationsMatt Turner2017-07-201-0/+50
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag destinations in flags_written()Francisco Jerez2017-07-201-4/+19
| | | | | | | The implementations of the ARB_shader_group_vote intrinsics will explicitly write the flag as the destination register. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Lower ARB_shader_group_vote intrinsicsMatt Turner2017-07-201-0/+1
| | | | | | | | I don't expect anyone is going to care about using this in vec4 programs (vertex/tessellation/geometry on Gen6/7), no one has come up with a good way to implement it much less test it. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add pass to optimize intrinsicsMatt Turner2017-07-201-0/+1
| | | | | | | Specifically, constant fold intrinsics from ARB_shader_group_vote, but I suspect it'll be useful for other things in the future. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use pushed UBO data in the scalar backend.Kenneth Graunke2017-07-133-1/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This actually takes advantage of the newly pushed UBO data, avoiding pull loads. Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers) shader-db results on Skylake, ignoring programs with spill/fill changes: total instructions in shared programs: 13963994 -> 13651893 (-2.24%) instructions in affected programs: 4250328 -> 3938227 (-7.34%) helped: 28527 HURT: 0 total cycles in shared programs: 179808608 -> 172535170 (-4.05%) cycles in affected programs: 79720410 -> 72446972 (-9.12%) helped: 26951 HURT: 1248 LOST: 46 GAINED: 21 Many "Deus Ex: Mankind Divided" shaders which already spilled end up spill a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts) presumably because we trade pull loads for fills. v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt). Reviewed-by: Matt Turner <[email protected]>
* i965: Factor out push locations.Kenneth Graunke2017-07-132-16/+25
| | | | | | | | With UBOs, the answer of "have we decided to push this uniform" gets a bit more complicated - for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place. Reviewed-by: Matt Turner <[email protected]>
* i965: Push UBO data, but don't use it just yet.Kenneth Graunke2017-07-132-1/+11
| | | | | | | | | | | This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <[email protected]>
* i965: Select ranges of UBO data to be uploaded as push constants.Kenneth Graunke2017-07-133-0/+311
| | | | | | | | | | | | | | | This adds a NIR pass that decides which portions of UBOS we should upload as push constants, rather than pull constants. v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt). v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful. Reviewed-by: Matt Turner <[email protected]>
* i965: Switch to absolute addressing for constant buffer 0.Kenneth Graunke2017-07-131-0/+6
| | | | | | | | | | | | | | | | | By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: no need to check unsigned is >= 0Lionel Landwerlin2017-07-131-1/+1
| | | | | | CID: 1338342 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: don't check unsigned is >= 0Lionel Landwerlin2017-07-131-1/+1
| | | | | | CID: 1224468 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: remove check unsigned is >= 0Lionel Landwerlin2017-07-131-1/+1
| | | | | | | | By definition unsigned are always >= 0. CID: 742212 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: Don't use opt_sampler_eot() optimization on gen10+Anuj Phogat2017-07-121-1/+1
| | | | | | | This optimization has been removed on gen10+. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/i915: Add UYVY as the supported formatJohnson Lin2017-06-302-0/+2
| | | | | | Trigger the correct sampler options for it. Similar with YUYV Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel: compiler/i965: fix is_broxton checksLionel Landwerlin2017-06-203-4/+4
| | | | | | | | | | In 5f2fe9302c is_geminilake was introduced for the differenciate broxton from geminilake. Unfortunately I failed as verifying that is_broxton is throughout the code base to mean Gen9lp. Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3Anuj Phogat2017-06-094-4/+33
| | | | | | | | | | v1: By Ben Widawsky <[email protected]> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not multiple of 3. v3: Move the entry_size checks in to compiler code (Ken) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cnl: Handle gen10 in switch cases across the driverAnuj Phogat2017-06-092-0/+3
| | | | | | | | | V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec() gen10_init_atoms() (Jason) Remove Vulkan changes. Do them later in a separate patch. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/cnl: Update few assertionsAnuj Phogat2017-06-091-1/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* tree-wide: remove trailing backslashEric Engestrom2017-06-071-1/+1
| | | | | | | | | Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Change INTEL_DEBUG=vec4 to INTEL_SCALAR_VS for consistency.Kenneth Graunke2017-06-051-1/+1
| | | | | | | | | We moved to INTEL_SCALAR_* when we added more than a single stage, but never went back and converted the VS to work that way. Be consistent. Also update the documentation to actually mention these debug variables. Acked-by: Jason Ekstrand <[email protected]>
* i965: Drop duplicate shadow variable.Kenneth Graunke2017-06-011-1/+0
| | | | | | We already initialized this at the top of the function. Trivial.
* i965: Move SOL PSIZ hacks from draw time to link time.Kenneth Graunke2017-06-011-12/+1
| | | | | | | | | We can just update the gl_transform_feedback_info fields at link time to make the VUE header fields have the right location and component. Then we don't need to handle them specially at draw time, which is expensive. Reviewed-by: Rafael Antognolli <[email protected]>
* i965: Ignore INTEL_SCALAR_* debug variables on Gen10+.Kenneth Graunke2017-05-291-10/+16
| | | | | | | | | | | Scalar mode has been default since Broadwell, and vector mode is getting increasingly unmaintained. There are a few things that don't even fully work in vector mode on Skylake, but we've never cared because nobody uses it. There's no point in porting it forward to new platforms. So, just ignore the debug options to force it on. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move clip program compilation to the compilerJason Ekstrand2017-05-268-0/+2340
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move SF compilation to the compilerJason Ekstrand2017-05-263-0/+931
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/compiler: Make brw_disasm take const assemblyJason Ekstrand2017-05-263-15/+15
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: load dvec3/4 uniforms first in the push constant bufferSamuel Iglesias Gonsálvez2017-05-181-27/+80
| | | | | | | | | | | | | | | | | | | | | | | | Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965/vec4: fix swizzle and writemask when loading an uniform with constant ↵Samuel Iglesias Gonsálvez2017-05-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/gs: restore the uniform values which was overwritten by failed ↵Samuel Iglesias Gonsálvez2017-05-181-0/+26
| | | | | | | | | | | | | | | | | vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965: Fix test_eu_validate.cppMatt Turner2017-05-161-1/+1
| | | | | | | Broken by commit a7217e909ce6 ("i965: Pass pointer and end of assembly to brw_validate_instructions"). Reported-by: Aaron Watry <[email protected]>
* i965: Add a weak no-op nir_print_instr() symbolMatt Turner2017-05-151-0/+2
| | | | | | | | | | | | | intel_asm_annotation.c is part of libintel_compiler.la, which contains code for disassembling and validating shaders that we want to call in aubinator_error_decode. dump_assembly() calls nir_print_instr() to print annotations, and although dump_assembly() is not called by aubinator_error_decode (nor is any function in intel_asm_annotation.c) it causes undefined references to nir_print_instr(). To work around, provide a no-op weak symbol to resolve against.
* i965: Allow brw_eu_validate to handle compact instructionsMatt Turner2017-05-151-2/+15
| | | | | | | | This will allow the validator to run on shader programs we find in the GPU hang error state. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Pass pointer and end of assembly to brw_validate_instructionsMatt Turner2017-05-155-11/+22
| | | | | | | This will allow us to more easily run brw_validate_instructions() on shader programs we find in GPU hang error states. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Delete the system value infastructureJason Ekstrand2017-05-0911-137/+5
| | | | | | | | | The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use NIR to do GS input remappingJason Ekstrand2017-05-099-101/+59
| | | | | | | | We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move remapping of gl_PointSize to the NIR levelJason Ekstrand2017-05-092-26/+21
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Inline remap_inputs_with_vue_mapJason Ekstrand2017-05-091-27/+22
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use NIR remapping for VS attributesJason Ekstrand2017-05-096-121/+34
| | | | | | | | | | | | | The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/vs: Move inputs_read handling to generic codeJason Ekstrand2017-05-091-0/+3
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE mapJason Ekstrand2017-05-091-0/+11
| | | | | | | We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Lower gl_VertexID and friends to inputs at the NIR levelJason Ekstrand2017-05-094-70/+74
| | | | | | | | | NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Set uses_vertexid and friends from brw_compile_vsJason Ekstrand2017-05-093-11/+17
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>