aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler: Set "Null Render Target" ex_desc bit on Gen11Kenneth Graunke2019-09-171-0/+3
| | | | | | | | | | | | | | | | When there are no color regions (i.e. a depth only pass), we can set the "Null Render Target" bit in the Gen11 RT write extended message descriptor to indicate that it should behave as if it's writing to a null render target, without the need for a binding table entry. This lets drivers avoid setting up that null RT binding table entry, but more importantly means the HW doesn't actually have to bother looking up the surface state. Together with the next patch, this improves performance in Car Chase on an Icelake 8x8 (locked to 700Mhz) by 0.0445526% +/- 0.0132736% (n=832). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controlsSamuel Iglesias Gonsálvez2019-09-173-0/+33
| | | | | | | | | | | | | | This adds support for VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and enables de Vulkan and SPIR-V extensions. Also, notice that this includes the updates applied to the VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension VK_KHR_shader_float_controls v4 and Vulkan 1.1.116. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: add support for shader float control to remove_extra_rounding_modes()Samuel Iglesias Gonsálvez2019-09-171-1/+14
| | | | | | | | | | | | | | | | | | | | | The remove_extra_rounding_modes() optimization will remove duplicated rounding mode changes. v2: - Fix bug in the rounding mode change (Alejandro). v3: - Fix rounding modes. v4: - Updated to renamed shader info member and enum values (Andres). v5: - Simplify flags logic operations (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16Samuel Iglesias Gonsálvez2019-09-171-5/+27
| | | | | | | | | v2: - Consider nir_op_f2f16 case too (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: set rounding mode when emitting fadd, fmul and ffma instructionsSamuel Iglesias Gonsálvez2019-09-171-1/+34
| | | | | | | | | v2: - Updated to renamed shader info member (Andres). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: add emit_shader_float_controls_execution_mode() and aux functionsSamuel Iglesias Gonsálvez2019-09-173-0/+61
| | | | | | | | | | | | | | | | | | | | We need this function to emit code that setups the control register later with the defined execution mode for the shader. Therefore, we emit it as the first instruction. v2: - Fix bug in setting the default mode mask in brw_rnd_mode_from_nir(). - Fix support for rounding modes in brw_rnd_mode_from_nir(). v3: - Updated to renamed shader info member and enum values (Andres). v4: - Add actual emission as first instruction of emit_nir_code (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/generator: add new opcode to set float controls modes in control ↵Samuel Iglesias Gonsálvez2019-09-173-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/generator: refactor rounding mode helper in preparation for float ↵Samuel Iglesias Gonsálvez2019-09-173-35/+32
| | | | | | | | | | | | | | | | | controls v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper (this one) and the new opcode (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zeroSamuel Iglesias Gonsálvez2019-09-171-0/+4
| | | | | | | | | | | | | The denorm mode is set in the control register, no need to do something else. v2: - Add an assert to make sure that we realize if this assumption is broken in the future (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: do not apply the fsin and fcos trig workarounds for constsSamuel Iglesias Gonsálvez2019-09-171-2/+2
| | | | | | | | | | | | | | | | | | | If we have fsin or fcos trigonometric operations with constant values as inputs, we will multiply the result by 0.99997 in brw_nir_apply_trig_workarounds, making the result wrong. Adjusting the rules so they do not apply to const values we let a later constant fold to deal with it. v2: - Do not early constant fold but only apply the trig workaround for non constants (Caio). - Add fixes tag to commit log (Caio). Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR." Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/large_constants: pass after lowering copy_derefSergii Romantsov2019-09-161-7/+7
| | | | | | | | | | | v2: by J.Ekstrand suggestion moved lowering of large constants after lowering of copy_deref is done. CC: Jason Ekstrand <[email protected]> CC: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450 Signed-off-by: Sergii Romantsov <[email protected]>
* vulkan: add vk_x11_strict_image_count optionLionel Landwerlin2019-09-151-0/+1
| | | | | | | | | | | | | | | | | | This option strictly allocate the minImageCount given by the application at swapchain creation. This works around application that do not deal with the fact that the implementation allocates more images than the minimum specified. v2: Add values in default drirc (Bas) v3: specify engine name/version (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Cc: 19.2 <[email protected]>
* driconfig: add a new engine name/version parameterLionel Landwerlin2019-09-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vulkan applications can register with the following structure : typedef struct VkApplicationInfo { VkStructureType sType; const void* pNext; const char* pApplicationName; uint32_t applicationVersion; const char* pEngineName; uint32_t engineVersion; uint32_t apiVersion; } VkApplicationInfo; This enables the Vulkan implementations to apply workarounds based off matching this description. Here we add a new parameter for matching the driconfig options with the following : <device driver="anv"> <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42"> <option name="blaaah" value="true" /> </application> </device> v2: switch engine name match to use regexps v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric) v4: Add missing bit that went to the following commit (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Cc: 19.2 <[email protected]>
* intel/fs: Handle UNDEF in split_virtual_grfsJason Ekstrand2019-09-131-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the UNDEF instruction was added, we didn't do anything special in split_virtual_grfs. This mean that anything with an UNDEF wasn't getting split which causes problems for the compiler. Among other things, it makes RA harder because things are in bigger chunks. It also meant that dvec4s weren't getting split which means that they are larger than the maximum register size. Shader-db results on Kaby Lake: total instructions in shared programs: 14959202 -> 14960035 (<.01%) instructions in affected programs: 96197 -> 97030 (0.87%) helped: 140 HURT: 128 helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45% HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1 HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50% 95% mean confidence interval for instructions value: -2.96 9.18 95% mean confidence interval for instructions %-change: -0.56% 1.51% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4372 -> 4372 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 352646771 -> 352840997 (0.06%) cycles in affected programs: 218600800 -> 218795026 (0.09%) helped: 21167 HURT: 21411 helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10 helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98% HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10 HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06% 95% mean confidence interval for cycles value: 2.87 6.26 95% mean confidence interval for cycles %-change: 0.40% 0.55% Cycles are HURT. total spills in shared programs: 8840 -> 8953 (1.28%) spills in affected programs: 126 -> 239 (89.68%) helped: 1 HURT: 2 total fills in shared programs: 21782 -> 21914 (0.61%) fills in affected programs: 431 -> 563 (30.63%) helped: 1 HURT: 3 LOST: 0 GAINED: 5 Shader-db results on Haswell: total instructions in shared programs: 13320918 -> 13320769 (<.01%) instructions in affected programs: 40998 -> 40849 (-0.36%) helped: 146 HURT: 56 helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2 helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22% HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4 HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26% 95% mean confidence interval for instructions value: -1.26 -0.21 95% mean confidence interval for instructions %-change: -0.62% 0.77% Inconclusive result (%-change mean confidence interval includes 0). total loops in shared programs: 4373 -> 4373 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 374518258 -> 374384193 (-0.04%) cycles in affected programs: 231101954 -> 230967889 (-0.06%) helped: 21427 HURT: 19438 helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8 helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86% HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8 HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80% 95% mean confidence interval for cycles value: -4.49 -2.07 95% mean confidence interval for cycles %-change: -0.14% -0.04% Cycles are helped. total spills in shared programs: 23406 -> 23411 (0.02%) spills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 total fills in shared programs: 34845 -> 34850 (0.01%) fills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 LOST: 0 GAINED: 0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566 Fixes: f4ef34f207d1 "intel/fs: Add an UNDEF instruction to avoid..." Reviewed-by: Francisco Jerez <[email protected]>
* intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WMAnuj Phogat2019-09-111-0/+11
| | | | | | | Initial benchmarking didn't show any performance benefits. But it might eventually. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* genxml/gen11+: Add COMMON_SLICE_CHICKEN4 registerAnuj Phogat2019-09-112-0/+10
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* android: anv: libmesa_vulkan_common: add libmesa_util static dependencyMauro Rossi2019-09-081-1/+2
| | | | | | | | | | | | | Change needed to fix the following building error: In file included from external/mesa/src/intel/vulkan/anv_device.c:43: external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 4dcb1ff ("anv: add support for driconf") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/blorp: Use wide formats for nicely aligned stencil clearsJason Ekstrand2019-09-062-0/+122
| | | | | | | | | | | | | | | | In the case where the stencil clear is nicely aligned, we can clear stencil much more efficiently by mapping it as a wide format (say RGBA32_UINT) and blasting out the stencil clear value with a repclear. On Unigine Heaven, this makes one stencil clear go from non-trivial to unnoticeable when looking at per-draw timings. In order for this change to work properly, ANV needs to do a bit more flushing around depth and stencil clears. i965 and iris already have the cache tracking logic to handle this so no changes are required there. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/blorp: Expose surf_fake_interleaved_msaa internallyJason Ekstrand2019-09-062-5/+8
|
* intel/blorp: Expose surf_retile_w_to_y internallyJason Ekstrand2019-09-062-5/+8
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* blorp: Memset surface info to zero when initializing itJason Ekstrand2019-09-061-0/+1
| | | | | | | | This isn't known to fix any current bugs but it does prevent a regression in a subsequent commit. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Decode PS kernels on SNBJason Ekstrand2019-09-061-1/+4
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Decode 3DSTATE_BINDING_TABLE_POINTERS on SNBJason Ekstrand2019-09-061-0/+15
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add support for vk_x11_override_min_image_countEric Engestrom2019-09-061-0/+3
| | | | | | | Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add support for driconfEric Engestrom2019-09-064-3/+19
| | | | | | | | | No option is supported yet, this is just the boilerplate. Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv,iris: L3ALLOC register replaces L3CNTLREG for gen12Jordan Justen2019-09-062-7/+16
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/gen12: Add L3 configurationsAnuj Phogat2019-09-061-1/+12
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Bump maxComputeWorkgroupSizeJason Ekstrand2019-09-061-4/+6
| | | | | | Fixes: 9a129510f56f "anv: Bump maxComputeWorkgroupInvocations" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111552 Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Stop redirecting state cache to command streamer cache sectionKenneth Graunke2019-09-061-12/+0
| | | | | | | | | | | | | | | | | | This bit redirects the state cache from the unified/RO sections of the L3 cache to the "CS command buffer" section of the cache, which would be set up via TCCNTLREG. The documentation says: "Additionaly, this redirection should be enabled only if there is a non-zero allocation for the CS command buffer section." We don't allocate any cache to the CS command buffer section, so enabling this redirection effectively disabled the state cache. The Windows driver only sets up that section when using POSH, which we do not currently use. So, leave it unallocated and disable the redirection to get a functional state cache again. Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%, and Car Chase by 2%.
* Revert "intel/fs: Move the scalar-region conversion to the generator."Jason Ekstrand2019-09-064-5/+5
| | | | | | | | | | This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that we're doing interpolation lowering in NIR, we can continue to stride the FS input registers directly in the brw_fs_nir code like we did before. This fixes SIMD32 fragment shaders which broke because lower_simd_width depended on the 0 stride to split PLN instructions correctly. Reviewed-by: Francisco Jerez <[email protected]>
* intel/fs: Fix FB write inst groupsJason Ekstrand2019-09-061-1/+1
| | | | | | | | | | | | This commit does two things. First, it simplifies the way we compute the FB write group bit. There's no reason to use a ternary because inst->group / 16 can only be 0 or 1. Second, it fixes an order-of- operations bug where the ternary wasn't selecting between (1 << 11) and 0 but between (1 << 11) and 0 | brw_dp_write_desc(...). Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes" Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalarVasily Khoruzhick2019-09-061-3/+3
| | | | | | | | | | | | | Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* anv: fix format string in error messageEric Engestrom2019-09-041-1/+1
| | | | | | Fixes: 9775894f102535a79186 ("anv: Move size check from anv_bo_cache_import() to caller (v2)") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: build libanv for gen12 in android buildTapani Pälli2019-08-281-0/+23
| | | | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Build for gen12Jordan Justen2019-08-286-1/+29
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/l3: Don't assert on gen12 (use gen11 config temporarily)Jordan Justen2019-08-281-0/+1
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Disable compaction on gen12 for nowJordan Justen2019-08-281-1/+7
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/isl: build android libmesa_isl for gen12Tapani Pälli2019-08-281-0/+20
| | | | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Build gen12 using gen11 code pathsJordan Justen2019-08-284-1/+11
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: generate pack files for gen12 on android buildsTapani Pälli2019-08-281-0/+5
| | | | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Build gen12 genxmlJordan Justen2019-08-285-2/+11
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add gen12.xml as a copy of gen11.xmlJordan Justen2019-08-281-0/+7171
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Run sort_xml.sh to tidy gen9.xml and gen11.xmlJordan Justen2019-08-282-38/+36
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml/gen11: Add spaces in EnableUnormPathInColorPipeJordan Justen2019-08-281-1/+1
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Handle field names with different spacing/hyphenJordan Justen2019-08-281-3/+4
| | | | | | | | | | | | | If a field name differs slightly between two generations then this change will still add the fields into the same group. For example, these will be treated as equal: * "Software Exception" and "Software Exception" * "Per Thread" and "Per-Thread" Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Request bitfield_reverse lowering on pre-Gen7 hardwareIan Romanick2019-08-281-0/+1
| | | | | | | | | | See the previous commit for the explanation of the Fixes tag. Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal Engine 4 tech demos. Reviewed-by: Matt Turner <[email protected]> Fixes: 7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
* intel/compiler: Use new Gen11 headerless RT writes for MRT casesKenneth Graunke2019-08-271-2/+13
| | | | | | | | | | | | | | | Gen11 adds support for specifying the render target index and src0 alpha present bits in the extended message descriptor. Previously, we had to use a message header for this, requiring extra instructions to write the fields, and two registers of extra payload. Improves performance on my ICL 8x8 frequency locked to 700Mhz, on iris: GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5) GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5) Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Use generic SEND for Gen7+ FB writesKenneth Graunke2019-08-272-6/+28
| | | | | | | | This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff earlier in the visitor. It will also make it easier to generate SENDSC messages with indirect extended descriptors in a few patches. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Refactor FB write message control setup into a helper.Kenneth Graunke2019-08-273-26/+37
| | | | | | This will be used by visitor code to convert directly to SEND in a bit. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Handle bits 15:12 in brw_send_indirect_split_message()Kenneth Graunke2019-08-271-2/+12
| | | | | | | | | | | | Annoyingly, these bits exist in some extended message descriptors (in particular render target writes), but they don't have any corresponding bits in the ISA encoding. So we can't use an immediate and have to fall back to an indirect extended descriptor. Thanks to Jason Ekstrand for reminding me that you can still set these bits via an indirect descriptor, even if they don't exist in the ISA. Reviewed-by: Jason Ekstrand <[email protected]>