path: root/src/intel/compiler
Commit message | Author | Age | Files | Lines
* intel/compiler: Account for built-in uniforms in analyze_ubo_ranges | Jason Ekstrand | 2018-07-23 | 2 | -3/+39
| The original pass only looked for load_uniform intrinsics but there are a number of other places that could end up loading a push constant. One obvious omission was images, which always implicitly use a push constant. Legacy VS clip planes also get pushed into the shader.
| This fixes some new Vulkan CTS tests that test random combinations of bindings and, in particular, test lots of UBOs and images together.
| Cc: [email protected]
| Cc: Kenneth Graunke <[email protected]>
* intel/compiler: fix -Wsign-compare warning | Caio Marcelo de Oliveira Filho | 2018-07-18 | 1 | -1/+1
| Explicitly convert to signed integer. The conversion is valid since it is the same one (implicitly) used to initialize the loop.
| Avoids the warning:
| ../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’:
| ../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
|     split_inst.eot = inst->eot && i == n - 1;
|                                   ~~^~~~~~~~
| Reviewed-by: Anuj Phogat <[email protected]>
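The kind of fix described above can be sketched in C as follows. This is a minimal illustration, not the code in brw_fs.cpp: the function name `is_last_iteration` and its parameters are hypothetical stand-ins for the loop counter `i` and the unsigned count `n` from the warning.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the comparison flagged by -Wsign-compare.
 * Converting n to int makes both operands of == signed, so the compiler
 * no longer has to convert i to unsigned behind our back. */
static bool
is_last_iteration(int i, unsigned n)
{
   /* The explicit conversion is only meaningful if n fits in an int. */
   assert(n <= (unsigned)INT_MAX);
   return i == (int)n - 1;
}
```

The assertion documents the assumption that makes the conversion valid; in the commit, that assumption is justified by the loop counter having been initialized from the same value.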
* intel/compiler: silence -Wclass-memaccess warnings | Caio Marcelo de Oliveira Filho | 2018-07-18 | 2 | -5/+5
| Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: unspills shoudn't use grf127 as dest since Gen8+ | Jose Maria Casanova Crespo | 2018-07-12 | 2 | -5/+34
| At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator shoudn't use grf127 for sends dest" we didn't take into account the case of SEND instructions that are not send_from_grf. On Gen7+, although the backend still uses MRFs internally for sends, they are finally assigned to GRFs.
| In the case of unspills, the backend directly uses the destination as the source because it is supposed to be available, so we always have a source-destination overlap. If the register allocator assigns registers that include grf127, we fail the validation rule that affects Gen8+: "r127 must not be used for return address when there is a src and dest overlap in send instruction."
| So this patch activates the grf127_send_hack_node for Gen8+, and if any register is spilled we add interferences to the destination of the unspill operations.
| We also need to make sure that the opt_bank_conflicts() optimization, which runs after register allocation, doesn't move things around and cause grf127 to be used in the condition we were avoiding.
| Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test and some shader-db crashes caused by the grf127 validation rule.
| v2: make sure that opt_bank_conflicts() optimization doesn't change the use of grf127. (Caio)
| Found by Caio Marcelo de Oliveira Filho
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
| Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
| Cc: 18.1 <[email protected]>
| Cc: Caio Marcelo de Oliveira Filho <[email protected]>
| Cc: Jason Ekstrand <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/ir: Uncomment definition of several unused hardware opcodes. | Francisco Jerez | 2018-07-09 | 1 | -14/+14
| There are a number of opcode_desc table entries for many of these unused opcodes. A symbolic opcode enum will be required in a future commit in order to keep them in the opcode description tables. The alternative would be to remove the unused opcodes from the opcode description tables.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Initialize mlen for gen7 varying pull constant load messages. | Francisco Jerez | 2018-07-09 | 2 | -7/+5
| This makes the message length available at the IR level, which should save some guesswork in a future commit.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Assert that the instruction is send-like in brw_set_desc_ex(). | Francisco Jerez | 2018-07-09 | 1 | -2/+3
| Constructing a descriptor in-place as part of the immediate of an ALU instruction is no longer supported.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Get rid of the return value of brw_send_indirect_message(). | Francisco Jerez | 2018-07-09 | 2 | -20/+5
| The return value is not used anymore. This allows simplifying the code slightly, and in addition it should frustrate anybody's attempts to continue using the obsolete piecemeal approach to construct a message descriptor in combination with brw_send_indirect_message().
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Get rid of the return value of brw_send_indirect_surface_message(). | Francisco Jerez | 2018-07-09 | 1 | -10/+6
| All users of brw_send_indirect_surface_message() should be providing a full descriptor immediate up front by now, so this isn't necessary anymore.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for dataport typed surface messages. | Francisco Jerez | 2018-07-09 | 1 | -47/+35
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for dataport scattered byte surface messages. | Francisco Jerez | 2018-07-09 | 1 | -33/+27
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for dataport untyped surface messages. | Francisco Jerez | 2018-07-09 | 2 | -50/+52
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Provide single descriptor argument to brw_send_indirect_surface_message(). | Francisco Jerez | 2018-07-09 | 1 | -29/+36
| Instead of the current message_len, response_len and header_present arguments.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for pixel interpolator messages. | Francisco Jerez | 2018-07-09 | 2 | -14/+29
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for dataport write messages. | Francisco Jerez | 2018-07-09 | 3 | -98/+65
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for dataport read messages. | Francisco Jerez | 2018-07-09 | 4 | -95/+85
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use descriptor constructors for sampler messages. | Francisco Jerez | 2018-07-09 | 4 | -122/+91
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Provide desc immediate argument up front to brw_send_indirect_message(). | Francisco Jerez | 2018-07-09 | 4 | -11/+13
| The current approach of returning a setup instruction where additional descriptor fields can be specified is still supported in order to keep things working, but it will be removed later in this series.
| Reviewed-by: Kenneth Graunke <[email protected]>
* TRIVIAL: intel/eu: Use a local devinfo variable in brw_shader_time_add(). | Francisco Jerez | 2018-07-09 | 1 | -5/+6
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Use brw_set_desc() along with a helper to set common descriptor controls. | Francisco Jerez | 2018-07-09 | 3 | -86/+68
| This replaces brw_set_message_descriptor() with the composition of brw_set_desc() and a new inline helper function that packs the common message descriptor controls into an integer. The goal is to represent all message descriptors as a 32-bit integer which is written at once into the instruction, which is more flexible (SENDS anyone?), robust (see d2eecf0b0b24d203d0f171807681dffd830d54de fixing an issue ultimately caused by some bits of the extended message descriptor being left undefined) and future-proof than the current approach of specifying the individual descriptor fields directly into the instruction.
| This approach also seems more self-documenting, since it will allow removing calls to functions with way too many arguments like brw_set_*_message() and brw_send_indirect_message(), and instead provide a single descriptor argument constructed from an appropriate combination of brw_*_desc() helpers.
| Note that because brw_set_message_descriptor() was (conditionally?) overriding fields of the instruction which strictly speaking weren't part of the message descriptor, this involves calling brw_inst_set_sfid() and brw_inst_set_eot() in some cases in addition to brw_set_desc().
| v2: Use SET_BITS macro instead of left shift (Ken).
| Reviewed-by: Kenneth Graunke <[email protected]>
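The idea of packing the common controls into one integer and writing it in a single step can be sketched roughly as below. Everything here is hypothetical: `struct fake_inst`, `pack_common_desc()` and `set_desc()` are illustrative names, and the bit positions (message length in 28:25, response length in 24:20, header-present in bit 19) are assumptions chosen for the example rather than a statement of the hardware encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instruction: only the 32-bit descriptor immediate matters. */
struct fake_inst {
   uint32_t desc;
};

/* Pack the controls shared by every message into one integer.
 * Field positions are assumed for illustration only. */
static inline uint32_t
pack_common_desc(unsigned msg_length, unsigned response_length,
                 bool header_present)
{
   assert(msg_length < 16);       /* assumed 4-bit field */
   assert(response_length < 32);  /* assumed 5-bit field */
   return (uint32_t)msg_length << 25 |
          (uint32_t)response_length << 20 |
          (uint32_t)header_present << 19;
}

/* Write the whole descriptor at once, so no stale bits survive from
 * whatever the instruction held before. */
static inline void
set_desc(struct fake_inst *inst, uint32_t desc)
{
   inst->desc = desc;
}
```

A message-specific helper would OR its own fields into the result of `pack_common_desc()` before the single `set_desc()` call. Because every bit is assigned in one store, nothing is left undefined, which is the robustness argument the commit message makes.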
* intel/eu: Define SET_BITS helper more easily reusable than SET_FIELD. | Francisco Jerez | 2018-07-09 | 1 | -0/+7
| Allows specifying a bitfield based on its upper and lower bounds instead of a symbolic field definition, much as the current GET_BITS macro relates to GET_FIELD.
| Reviewed-by: Kenneth Graunke <[email protected]>
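A bounds-based bitfield helper of this kind might look like the sketch below. The functions `bits_mask()`, `set_bits()` and `get_bits()` are hypothetical equivalents written for this example; they are not the macros from the tree, only an illustration of addressing a field by its high and low bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits high..low (inclusive) set. */
static inline uint32_t
bits_mask(unsigned high, unsigned low)
{
   assert(low <= high && high < 32);
   return (UINT32_MAX >> (31 - high)) & (UINT32_MAX << low);
}

/* Position value in bits high..low; the value must fit in the field. */
static inline uint32_t
set_bits(uint32_t value, unsigned high, unsigned low)
{
   assert(value <= (bits_mask(high, low) >> low));
   return value << low;
}

/* Extract bits high..low of word as an unshifted value. */
static inline uint32_t
get_bits(uint32_t word, unsigned high, unsigned low)
{
   return (word & bits_mask(high, low)) >> low;
}
```

Compared with a plain left shift, the range check in `set_bits()` catches a value that would spill into neighbouring fields, which is presumably why the later commits in the series note "Use SET_BITS macro instead of left shift".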
* intel/eu: Define helper to specify the descriptor immediates of a SEND instruction. | Francisco Jerez | 2018-07-09 | 2 | -0/+26
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Add brw_inst.h helpers for the SEND(C) descriptor and extended descriptor. | Francisco Jerez | 2018-07-09 | 1 | -0/+78
| This introduces helpers that can be used to specify or extract the whole descriptor of a SEND message instruction at once. Because the instruction encoding of these is rather awkward on some generations, using the generic brw_inst.h macros doesn't seem like an option.
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: emit actual barriers for working-group level barriers | Iago Toral Quiroga | 2018-07-10 | 1 | -23/+2
| Until now we have assumed that we could skip emitting these barriers in the general case, based on empirical testing and a few assumptions detailed in a comment in the driver code. However, recent CTS tests have shown that we actually need them to produce correct behavior.
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Enable store_ssbo for 8-bit types. | Jose Maria Casanova Crespo | 2018-07-10 | 1 | -7/+8
| v2: Update comment according to this patch. (Jason Ekstrand)
| Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: relax brw_eu_validate for byte raw movs | Jose Maria Casanova Crespo | 2018-07-10 | 1 | -3/+5
| When the destination is a BYTE type, allow raw movs even if the stride is not an exact multiple of the destination type and execution type, where the execution type is Word and its size is 2. Previously, this restriction only allowed stride==2 destinations for 8-bit types.
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Enable conversions to 8-bit integers | Jose Maria Casanova Crespo | 2018-07-10 | 1 | -0/+2
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Support for 8-bit base types in helper functions | Jose Maria Casanova Crespo | 2018-07-10 | 2 | -1/+14
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Register allocator shoudn't use grf127 for sends dest | Jose Maria Casanova Crespo | 2018-07-10 | 1 | -0/+25
| Since Gen8, the Intel PRM states that "r127 must not be used for return address when there is a src and dest overlap in send instruction."
| This patch implements this restriction by creating a new grf127_send_hack_node in the register allocator. This node has a fixed assignment to grf127.
| For vgrfs that are used as the destination of send messages, we create node interferences with the grf127_send_hack_node, so the register allocator will never assign these vgrfs a register that involves grf127.
| If dispatch_width > 8 we don't create these interferences, because all instructions already have node interferences between sources and destination. That is enough to avoid the r127 restriction.
| This fixes CTS tests that raised this issue as they were executed as SIMD8:
| dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
|
| Shader-db results on Skylake:
| total instructions in shared programs: 7686798 -> 7686797 (<.01%)
| instructions in affected programs: 301 -> 300 (-0.33%)
| helped: 1
| HURT: 0
| total cycles in shared programs: 337092322 -> 337091919 (<.01%)
| cycles in affected programs: 22420415 -> 22420012 (<.01%)
| helped: 712
| HURT: 588
|
| Shader-db results on Broadwell:
| total instructions in shared programs: 7658574 -> 7658625 (<.01%)
| instructions in affected programs: 19610 -> 19661 (0.26%)
| helped: 3
| HURT: 4
| total cycles in shared programs: 340694553 -> 340676378 (<.01%)
| cycles in affected programs: 24724915 -> 24706740 (-0.07%)
| helped: 998
| HURT: 916
| total spills in shared programs: 4300 -> 4311 (0.26%)
| spills in affected programs: 333 -> 344 (3.30%)
| helped: 1
| HURT: 3
| total fills in shared programs: 5370 -> 5378 (0.15%)
| fills in affected programs: 274 -> 282 (2.92%)
| helped: 1
| HURT: 3
|
| v2: Avoid duplicating register classes without grf127. Let's use a node with a fixed assignment to grf127 and create interferences to send message vgrf destinations. (Eric Anholt)
| v3: Update reference to CTS VK_KHR_8bit_storage failing tests. (Jose Maria Casanova)
| Reviewed-by: Jason Ekstrand <[email protected]>
| Cc: 18.1 <[email protected]>
* intel/compiler: grf127 can not be dest when src and dest overlap in send | Jose Maria Casanova Crespo | 2018-07-10 | 1 | -0/+11
| Implement in brw_eu_validate the restriction from Intel Broadwell PRM, vol 07, section "Instruction Set Reference", subsection "EUISA Instructions", Send Message (page 990):
| "r127 must not be used for return address when there is a src and dest overlap in send instruction."
| v2: Style fixes (Matt Turner)
| Reviewed-by: Matt Turner <[email protected]>
| Cc: 18.1 <[email protected]>
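The quoted restriction can be read as a predicate over two register ranges. The sketch below is one possible reading, not the check added to brw_eu_validate: `struct reg_range` and all three functions are hypothetical, and it assumes a send's source and destination can each be described as a contiguous run of GRFs.

```c
#include <stdbool.h>

/* Hypothetical contiguous run of GRFs: `count` registers from `start`. */
struct reg_range {
   unsigned start;
   unsigned count;
};

/* True if the two half-open ranges share at least one register. */
static bool
ranges_overlap(struct reg_range a, struct reg_range b)
{
   return a.start < b.start + b.count && b.start < a.start + a.count;
}

/* True if the range includes r127. */
static bool
covers_r127(struct reg_range r)
{
   return r.start <= 127 && 127 < r.start + r.count;
}

/* The rule as read here: a send is invalid when its destination touches
 * r127 and the source and destination ranges overlap. */
static bool
send_violates_r127_rule(struct reg_range src, struct reg_range dst)
{
   return ranges_overlap(src, dst) && covers_r127(dst);
}
```

Under this reading, a destination ending at r127 is acceptable as long as it does not overlap the message payload, which matches the register allocator commits above that only add interferences for send destinations.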
* intel/fs: use uint type for per_slot_offset at GS | Jose Maria Casanova Crespo | 2018-07-09 | 1 | -1/+1
| This helps us to compact the original instruction:
|     mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };
| So now we emit:
|     mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/compiler: remove unused function | Iago Toral Quiroga | 2018-07-09 | 2 | -31/+0
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Relax mixed type restriction for saturating immediates | Ian Romanick | 2018-07-06 | 2 | -4/+22
| At the time of commit 7bc6e455e23 (i965: Add support for saturating immediates.) we thought mixed type saturates would be impossible. We were only thinking about type converting moves from D to F, for example. However, type converting moves w/saturate from F to DF are definitely possible. This change minimally relaxes the restriction to allow cases that I have been able to trigger via piglit tests.
| Fixes new piglit tests:
| - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
| - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
| Signed-off-by: Ian Romanick <[email protected]>
| Cc: [email protected]
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/vec4: Properly handle sign(-abs(x)) | Ian Romanick | 2018-07-06 | 1 | -1/+17
| This is achieved by copying the sign(abs(x)) optimization from the FS backend.
| On Gen7 and earlier platforms, this fixes new piglit tests:
| - glsl-1.10/execution/vs-sign-neg-abs.shader_test
| - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
| Signed-off-by: Ian Romanick <[email protected]>
| Cc: [email protected]
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: Properly handle sign(-abs(x)) | Ian Romanick | 2018-07-06 | 1 | -3/+12
| Fixes new piglit tests:
| - glsl-1.10/execution/fs-sign-neg-abs.shader_test
| - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
| - glsl-1.10/execution/vs-sign-neg-abs.shader_test
| - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
| Signed-off-by: Ian Romanick <[email protected]>
| Cc: [email protected]
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* python: Use the print function | Mathieu Bridon | 2018-07-06 | 1 | -3/+5
| In Python 2, `print` was a statement, but it became a function in Python 3.
| Using print functions everywhere makes the script compatible with Python versions >= 2.6, including Python 3.
| Signed-off-by: Mathieu Bridon <[email protected]>
| Acked-by: Eric Engestrom <[email protected]>
| Acked-by: Dylan Baker <[email protected]>
* i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable | Ian Romanick | 2018-07-05 | 1 | -2/+1
| The bug fixed by the previous commit went undetected because extra stderr messages are not flagged by the CI. Copy the solution from fs_visitor::nir_emit_instr and mark the default case unreachable.
| An alternate solution is to delete the default case so that the compiler will issue a warning. That may require more work since there are other (impossible) cases that exist.
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
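The difference between a default case that merely prints a message and one that is marked unreachable can be sketched as follows. The enum, the `UNREACHABLE` macro and `emit_instr()` are hypothetical names for this example; the macro is only an approximation of the general technique (fail loudly instead of logging and continuing), not the definition used in the tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instruction kinds, standing in for the real IR types. */
enum instr_type {
   INSTR_ALU,
   INSTR_INTRINSIC,
   INSTR_DEREF,
};

/* Fail hard when supposedly impossible code runs: the assert fires in
 * debug builds and abort() stops release builds as well. */
#define UNREACHABLE(msg) do { assert(!msg); abort(); } while (0)

static const char *
emit_instr(enum instr_type type)
{
   switch (type) {
   case INSTR_ALU:
      return "alu";
   case INSTR_INTRINSIC:
      return "intrinsic";
   default:
      /* A message on stderr here would be easy for CI to miss. */
      UNREACHABLE("instruction type not handled");
   }
}
```

With this shape, passing `INSTR_DEREF` terminates the process instead of emitting a warning that automated runs would not flag, which is the failure mode the commit message describes.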
* intel/compiler: More DCE after lowering | Ian Romanick | 2018-07-05 | 1 | -0/+2
| Some of the lowering passes, nir_lower_locals_to_regs for example, can cause some previously live code to be dead. This pass in particular leaves a bunch of nir_instr_type_deref instructions floating around. This causes shader-db runs on Gen5 through Haswell to spew tons of messages like:
|     VS instruction not yet implemented by NIR->vec4
| UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that generates these messages. Cleaning up the dead code fixes that.
| To verify, I did a shader-db before and after. Even though all the messages are gone, the results make my brain hurt. :(
|
| Haswell
| total cycles in shared programs: 411890163 -> 411891145 (<.01%)
| cycles in affected programs: 57016 -> 57998 (1.72%)
| helped: 3
| HURT: 11
| helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
| helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
| HURT stats (abs) min: 18 max: 686 x̄: 115.64 x̃: 20
| HURT stats (rel) min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
| 95% mean confidence interval for cycles value: -51.39 191.67
| 95% mean confidence interval for cycles %-change: -0.14% 2.46%
| Inconclusive result (value mean confidence interval includes 0).
|
| Ivy Bridge
| total cycles in shared programs: 259114802 -> 259115032 (<.01%)
| cycles in affected programs: 24034 -> 24264 (0.96%)
| helped: 1
| HURT: 9
| helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
| helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
| HURT stats (abs) min: 18 max: 48 x̄: 25.78 x̃: 20
| HURT stats (rel) min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
| 95% mean confidence interval for cycles value: 12.42 33.58
| 95% mean confidence interval for cycles %-change: 0.54% 1.38%
| Cycles are HURT.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Fixes: 5a02ffb733e nir: Rework lower_locals_to_regs to use deref instructions
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix output register sizes when variable ranges are interleaved | Neil Roberts | 2018-07-04 | 1 | -7/+18
| In 6f5abf31466aed this code was fixed to calculate the maximum size of an attribute in a separate pass and then allocate the registers to that size. However this wasn’t taking into account ranges that overlap but don’t have the same starting location. For example:
|     layout(location = 0, component = 0) out float a[4];
|     layout(location = 2, component = 1) out float b[4];
| Previously, if ‘a’ was processed first then it would allocate a register of size 4 for location 0 and it wouldn’t allocate another register for location 2 because it would already be covered by the range of 0. Then if something tries to write to b[2] it would try to write past the end of the register allocated for ‘a’ and it would hit an assert.
| This patch changes it to scan for any overlapping ranges that start within each range to calculate the maximum extent and allocate that instead.
| Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Fixes: 6f5abf31466 "i965: Fix output register sizes when multiple variables share a slot."
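The scan for overlapping ranges described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: `merged_extent()` is a hypothetical function, and `sizes[loc]` is assumed to hold the largest number of locations covered by any variable that starts at `loc` (0 if none starts there).

```c
/* Hypothetical helper: number of locations a register starting at `loc`
 * must span so that it also covers every range beginning inside it.
 * The extent can grow while scanning, so ranges that start inside a
 * newly covered area are picked up too. */
static unsigned
merged_extent(const unsigned *sizes, unsigned num_locations, unsigned loc)
{
   unsigned extent = sizes[loc];

   for (unsigned i = 1; i < extent && loc + i < num_locations; i++) {
      /* A range starting i locations in reaches sizes[loc + i] + i. */
      if (sizes[loc + i] + i > extent)
         extent = sizes[loc + i] + i;
   }

   return extent;
}
```

For the example in the commit message, `a[4]` starts at location 0 and `b[4]` at location 2, so `sizes` would be `{4, 0, 4, 0, ...}` and the merged extent at location 0 comes out as 6, enough to hold a write to `b[2]` or `b[3]`.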
* i965/vec4: Don't cmod propagate from CMP to ADD if the writemask isn't compatible | Ian Romanick | 2018-07-02 | 2 | -5/+87
| Otherwise we can incorrectly cmod propagate in situations like
|     add(8) g10<1>.xD g2<0>.xD -16D
|     ...
|     cmp.ge.f0(8) null<1>D g2<0>.xD 16D
|     ...
|     (+f0) sel(8) g21<1>.xyUD g14<4>.xyyyUD g18<4>.xyyyUD
| Sadly, this change hurts quite a few shaders.
| v2: Refactor writemask compatibility check into a separate function. Suggested by Caio.
|
| Ivy Bridge and Haswell had similar results. (Haswell shown)
| total instructions in shared programs: 12968489 -> 12968738 (<.01%)
| instructions in affected programs: 60679 -> 60928 (0.41%)
| helped: 0
| HURT: 249
| HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
| HURT stats (rel) min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44%
| 95% mean confidence interval for instructions value: 1.00 1.00
| 95% mean confidence interval for instructions %-change: 0.44% 0.48%
| Instructions are HURT.
| total cycles in shared programs: 409171965 -> 409172317 (<.01%)
| cycles in affected programs: 260056 -> 260408 (0.14%)
| helped: 0
| HURT: 176
| HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
| HURT stats (rel) min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17%
| 95% mean confidence interval for cycles value: 2.00 2.00
| 95% mean confidence interval for cycles %-change: 0.16% 0.18%
| Cycles are HURT.
|
| Sandy Bridge
| total instructions in shared programs: 10423577 -> 10423753 (<.01%)
| instructions in affected programs: 40667 -> 40843 (0.43%)
| helped: 0
| HURT: 176
| HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
| HURT stats (rel) min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42%
| 95% mean confidence interval for instructions value: 1.00 1.00
| 95% mean confidence interval for instructions %-change: 0.46% 0.51%
| Instructions are HURT.
| total cycles in shared programs: 146097503 -> 146097855 (<.01%)
| cycles in affected programs: 503990 -> 504342 (0.07%)
| helped: 0
| HURT: 176
| HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
| HURT stats (rel) min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11%
| 95% mean confidence interval for cycles value: 2.00 2.00
| 95% mean confidence interval for cycles %-change: 0.11% 0.13%
| Cycles are HURT.
|
| No changes on any other platforms.
| Signed-off-by: Ian Romanick <[email protected]>
| Fixes: cd635d149b2 i965/vec4: Propagate conditional modifiers from compares to adds
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Silence unused parameter warnings brw_nir.c | Ian Romanick | 2018-07-02 | 5 | -7/+6
| src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
| src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
|     bool is_scalar)
|          ^~~~~~~~~
| src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
| src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
|     lower_bit_size_callback(const nir_alu_instr *alu, void *data)
|                                                             ^~~~
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv,intel: Enable nir_opt_large_constants for Vulkan | Jason Ekstrand | 2018-07-02 | 2 | -0/+13
| According to RenderDoc, this shaves 99.6% of the run time off of the ambient occlusion pass in Skyrim Special Edition when running under DXVK and shaves 92% off the runtime for a reasonably representative frame. When running the actual game, Skyrim goes from being a slide-show to a very stable and playable framerate on my SKL GT4e machine.
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Iago Toral Quiroga <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Build 32-wide FS shaders. | Francisco Jerez | 2018-06-28 | 1 | -11/+43
| Co-authored-by: Jason Ekstrand <[email protected]>
* intel/fs: Add fields to wm_prog_data for SIMD32 dispatch | Jason Ekstrand | 2018-06-28 | 2 | -0/+8
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32. | Francisco Jerez | 2018-06-28 | 1 | -5/+9
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch. | Francisco Jerez | 2018-06-28 | 1 | -3/+3
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Fix Gen6+ interpolation setup for SIMD32 | Francisco Jerez | 2018-06-28 | 1 | -56/+60
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS | Jason Ekstrand | 2018-06-28 | 5 | -35/+8
| We can just emit the MOV in the two places where we use this.
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround | Jason Ekstrand | 2018-06-28 | 2 | -50/+16
| There's no reason for us to emit it a pile of times and then have a whole pass to clean it up. Just emit it once like we really want.
| Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Generalize the unlit centroid workaround | Francisco Jerez | 2018-06-28 | 1 | -14/+8
| This generalizes the unlit centroid workaround so it's less code and now supports SIMD32.
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>