summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler: avoid truncating int64_t to intMaya Rashish2019-09-261-1/+1
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Maya Rashish <[email protected]>
* anv: gem-stubs: return a valid fd got anv_gem_userptr()Lionel Landwerlin2019-09-251-1/+7
| | | | | | | | Fixes invalid close(-1) in the unit tests. Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: set rounding mode when emitting the flrp instructionAndres Gomez2019-09-241-0/+7
| | | | | | | | | | | flrp was forgotten when already adding the rounding mode for other instructions. Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Ian Romanick <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965/fs: add a comment about how the rounding mode in fmul is setAndres Gomez2019-09-241-0/+4
| | | | | | | | | | | | | | | | | After 1711bf6cf2d ("intel/fs: Generate better code for fsign multiplied by a value"), the conflicts resolution for setting the rounding mode after the fused fmul and fsign optimization is non obvious. Basically, the optimization doesn't really result in a MUL, or any other operation which would need to have the rounding mode set. Hence, we set it just before the actual MUL in the treatment of fmul. Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel: Increase Gen11 compute shader scratch IDs to 64.Kenneth Graunke2019-09-231-1/+13
| | | | | | | | | | | | | | | | | | | | | | | From the MEDIA_VFE_STATE docs: "Starting with this configuration, the Maximum Number of Threads must be set to (#EU * 8) for GPGPU dispatches. Although there are only 7 threads per EU in the configuration, the FFTID is calculated as if there are 8 threads per EU, which in turn requires a larger amount of Scratch Space to be allocated by the driver." It's pretty clear that we need to increase this for scratch address calculations, because the FFTID has a certain bit-pattern. The quote above seems to indicate that we should increase the actual thread count programmed in MEDIA_VFE_STATE as well, but we think the intention is to only bump the scratch space. Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8. Fixes: 5ac804bd9ac ("intel: Add a preliminary device for Ice Lake") Reviewed-by: Matt Turner <[email protected]>
* Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM"Kenneth Graunke2019-09-231-11/+0
| | | | | | | | | | | | | | | This reverts commit 729de1488f49033bc181b8123af5658228a51bf1. It turns out that, although the register is in the logical context, it isn't whitelisted, so we can't actually write it from userspace batch buffers. The write just becomes a noop, which is why we saw no performance changes. I manually whitelisted it, and still observed no performance gains, but it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments on the iris driver. So we might need to fix something before enabling this. To prevent it randomly getting turned on should the kernel ever whitelist this register, we revert the patch for now.
* intel/genxml: Stop manually scrubbing 'α' -> "alpha"Kenneth Graunke2019-09-232-2/+1
| | | | | | | 'α' has never appeared in any genxml files, so there's no need to replace it with the word "alpha". Reviewed-by: Jordan Justen <[email protected]>
* isl: Drop WaDisableSamplerL2BypassForTextureCompressedFormats on Gen11Kenneth Graunke2019-09-201-1/+1
| | | | | | | | | | | Gen11 doesn't require us to bypass the L2 cache for BC* images anymore. The documentation is a bit hard to follow on this point, but the Windows driver clearly only applies this workaround on Gen9, and their commit history indicates that this was an intentional change to drop the workaround for Gen11+. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Advertise VK_KHR_shader_subgroup_extended_typesJason Ekstrand2019-09-202-0/+8
| | | | Reviewed-by: Paulo Zanoni <[email protected]>
* intel/fs: Do 8-bit subgroup scan operations in 16 bitsJason Ekstrand2019-09-201-3/+39
| | | | Reviewed-by: Paulo Zanoni <[email protected]>
* intel/fs: Allow CLUSTER_BROADCAST to do type conversionJason Ekstrand2019-09-201-1/+1
| | | | | | | | We can't really handle it in the little-core 64-bit case but it's not really needed there. Where we really want this is for when we need to do 16 -> 8-bit conversions. Reviewed-by: Paulo Zanoni <[email protected]>
* intel/fs: Allow UB, B, and HF types in brw_nir_reduction_op_identityJason Ekstrand2019-09-201-1/+7
| | | | | | | Because byte immediates aren't a thing on GEN hardware, we return a signed or unsigned word immediate in the byte case. Reviewed-by: Paulo Zanoni <[email protected]>
* intel/fs: don't forget the stride at generate_shufflePaulo Zanoni2019-09-201-1/+2
| | | | | | | | | | | | | | During generate_shuffle(), when we use byte sized registers we end up with a destination stride of 2. We don't take the stride into consideration when selecting the group offset for the last MOV operation, which means we end up moving things to the wrong place, leaving the last few channels untouched. Take the destination stride in consideration so we don't miss the last channels. v2: Assert this is not necessary for the IVB special case (Jason). Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* util/rb_tree: Reverse the order of comparison functionsJason Ekstrand2019-09-201-2/+2
| | | | | | | | | | | The new order matches that of the comparison functions accepted by the C standard library qsort() functions. Being consistent with qsort will hopefully help avoid developer confusion. The only current user of the red-black tree is aub_mem.c which is pretty easy to fix up. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: implement ICD interface v4Eric Engestrom2019-09-201-1/+29
| | | | | | Signed-off-by: Eric Engestrom <[email protected]> Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: split instance dispatch tableEric Engestrom2019-09-203-1/+198
| | | | | | | | | | | | | This effectively breaks the instance dispatch table in 2 with entry points using a physical device as first argument getting their own dispatch table. As a result we now have to check instance & physical device dispatch table instead of just the instance dispatch table before. Signed-off-by: Eric Engestrom <[email protected]> Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* Move blob from compiler/ to util/Jason Ekstrand2019-09-191-1/+1
| | | | | | | | There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: Add Fall-through commentCaio Marcelo de Oliveira Filho2019-09-191-0/+3
| | | | Reviewed-by: Andres Gomez <[email protected]>
* anv: fix descriptor limits on gen8Arcady Goldmints-Orlov2019-09-192-4/+6
| | | | | | | | | | | | | | Later generations support bindless for samplers, images, and buffers and thus per-stage descriptors are not limited by the binding table size. However, gen8 doesn't support bindless images and thus needs to report a lower per-stage limit so that all combinations of descriptors that fit within the advertised limits are reported as supported by vkGetDescriptorSetLayoutSupport. Fixes test dEQP-VK.api.maintenance3_check.descriptor_set Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO") Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32Paulo Zanoni2019-09-191-1/+10
| | | | | | | | | | | | | The current code can create functions with a width of 32, which is not supported by our hardware. Add some code to simplify how we express what we want and prevent such cases. For some unknown reason, all the tests I could run seem to work even with these unsupported MOVs. Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes" Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/fs: the maximum supported stride width is 16Paulo Zanoni2019-09-191-1/+3
| | | | | | | | | | | | | | | | There are cases where we try to generate registers with a stride of 32, while the hardware maximum is just 16. This happens, for example, when using 8 bit integers on SIMD32. This results in a crash because the variable 'width' has a value of 32: ../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid register width"' failed. This change prevents the crash and makes the tests pass. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/fs: roll the loop with the <0,1,0> additions in emit_scan()Paulo Zanoni2019-09-191-32/+14
| | | | | | | | | | | | IMHO the code is easier to understand this way, being explicit that we're doing exactly the same thing every time. No functional changes. v2: Adjust the loop breaking condition (Jason). Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/fs: make scan/reduce work with SIMD32 when it fits 2 registersPaulo Zanoni2019-09-191-0/+23
| | | | | | | | | | | | When dealing with uint16_t and uint8_t on SIMD32 we can do all the operations using just 2 registers, so we don't hit the recursion at the beginning of emit_scan(). Because of that, we need to actually compute scan/reduce for channels 31:16. v2: Still missed instructions (Jason). Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/compiler: Record whether any pull constant loads occurKenneth Graunke2019-09-183-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | I would like for iris to be able to avoid setting up SURFACE_STATE for UBOs in the common case where all constants are pushed. Unfortunately, we don't know up front whether everything will be pushed: the backend is allowed to demote pushed UBOs to pull loads fairly late in the process. This is probably desirable though, as we'd like the backend to be able to re-pull pushed data to break up long live ranges in response to register pressure. Here we simply add a "are there any pull loads at all" boolean to prog_data, which is a bit crude but at least allows us to skip work in the common "everything pushed" case. We could skip more work by tracking exactly which UBO surfaces are pulled in a bitmask, but I wanted to avoid bringing back the old mark_surface_used() mechanism. Finer-grained tracking could allow us to skip a bit more work when multiple UBOs are in use and /some/ are 100% pushed, but others are accessed via pulls. However, I'm not sure how common this is and it would save at most 4 pull descriptors, so we defer that for now. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Set "Null Render Target" ex_desc bit on Gen11Kenneth Graunke2019-09-171-0/+3
| | | | | | | | | | | | | | | | When there are no color regions (i.e. a depth only pass), we can set the "Null Render Target" bit in the Gen11 RT write extended message descriptor to indicate that it should behave as if it's writing to a null render target, without the need for a binding table entry. This lets drivers avoid setting up that null RT binding table entry, but more importantly means the HW doesn't actually have to bother looking up the surface state. Together with the next patch, this improves performance in Car Chase on an Icelake 8x8 (locked to 700Mhz) by 0.0445526% +/- 0.0132736% (n=832). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controlsSamuel Iglesias Gonsálvez2019-09-173-0/+33
| | | | | | | | | | | | | | This adds support for VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and enables de Vulkan and SPIR-V extensions. Also, notice that this includes the updates applied to the VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension VK_KHR_shader_float_controls v4 and Vulkan 1.1.116. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: add support for shader float control to remove_extra_rounding_modes()Samuel Iglesias Gonsálvez2019-09-171-1/+14
| | | | | | | | | | | | | | | | | | | | | The remove_extra_rounding_modes() optimization will remove duplicated rounding mode changes. v2: - Fix bug in the rounding mode change (Alejandro). v3: - Fix rounding modes. v4: - Updated to renamed shader info member and enum values (Andres). v5: - Simplify flags logic operations (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16Samuel Iglesias Gonsálvez2019-09-171-5/+27
| | | | | | | | | v2: - Consider nir_op_f2f16 case too (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: set rounding mode when emitting fadd, fmul and ffma instructionsSamuel Iglesias Gonsálvez2019-09-171-1/+34
| | | | | | | | | v2: - Updated to renamed shader info member (Andres). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: add emit_shader_float_controls_execution_mode() and aux functionsSamuel Iglesias Gonsálvez2019-09-173-0/+61
| | | | | | | | | | | | | | | | | | | | We need this function to emit code that setups the control register later with the defined execution mode for the shader. Therefore, we emit it as the first instruction. v2: - Fix bug in setting the default mode mask in brw_rnd_mode_from_nir(). - Fix support for rounding modes in brw_rnd_mode_from_nir(). v3: - Updated to renamed shader info member and enum values (Andres). v4: - Add actual emission as first instruction of emit_nir_code (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/generator: add new opcode to set float controls modes in control ↵Samuel Iglesias Gonsálvez2019-09-173-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/generator: refactor rounding mode helper in preparation for float ↵Samuel Iglesias Gonsálvez2019-09-173-35/+32
| | | | | | | | | | | | | | | | | controls v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper (this one) and the new opcode (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zeroSamuel Iglesias Gonsálvez2019-09-171-0/+4
| | | | | | | | | | | | | The denorm mode is set in the control register, no need to do something else. v2: - Add an assert to make sure that we realize if this assumption is broken in the future (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: do not apply the fsin and fcos trig workarounds for constsSamuel Iglesias Gonsálvez2019-09-171-2/+2
| | | | | | | | | | | | | | | | | | | If we have fsin or fcos trigonometric operations with constant values as inputs, we will multiply the result by 0.99997 in brw_nir_apply_trig_workarounds, making the result wrong. Adjusting the rules so they do not apply to const values we let a later constant fold to deal with it. v2: - Do not early constant fold but only apply the trig workaround for non constants (Caio). - Add fixes tag to commit log (Caio). Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR." Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/large_constants: pass after lowering copy_derefSergii Romantsov2019-09-161-7/+7
| | | | | | | | | | | v2: by J.Ekstrand suggestion moved lowering of large constants after lowering of copy_deref is done. CC: Jason Ekstrand <[email protected]> CC: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450 Signed-off-by: Sergii Romantsov <[email protected]>
* vulkan: add vk_x11_strict_image_count optionLionel Landwerlin2019-09-151-0/+1
| | | | | | | | | | | | | | | | | | This option strictly allocate the minImageCount given by the application at swapchain creation. This works around application that do not deal with the fact that the implementation allocates more images than the minimum specified. v2: Add values in default drirc (Bas) v3: specify engine name/version (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Cc: 19.2 <[email protected]>
* driconfig: add a new engine name/version parameterLionel Landwerlin2019-09-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vulkan applications can register with the following structure : typedef struct VkApplicationInfo { VkStructureType sType; const void* pNext; const char* pApplicationName; uint32_t applicationVersion; const char* pEngineName; uint32_t engineVersion; uint32_t apiVersion; } VkApplicationInfo; This enables the Vulkan implementations to apply workarounds based off matching this description. Here we add a new parameter for matching the driconfig options with the following : <device driver="anv"> <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42"> <option name="blaaah" value="true" /> </application> </device> v2: switch engine name match to use regexps v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric) v4: Add missing bit that went to the following commit (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Cc: 19.2 <[email protected]>
* intel/fs: Handle UNDEF in split_virtual_grfsJason Ekstrand2019-09-131-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the UNDEF instruction was added, we didn't do anything special in split_virtual_grfs. This mean that anything with an UNDEF wasn't getting split which causes problems for the compiler. Among other things, it makes RA harder because things are in bigger chunks. It also meant that dvec4s weren't getting split which means that they are larger than the maximum register size. Shader-db results on Kaby Lake: total instructions in shared programs: 14959202 -> 14960035 (<.01%) instructions in affected programs: 96197 -> 97030 (0.87%) helped: 140 HURT: 128 helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45% HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1 HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50% 95% mean confidence interval for instructions value: -2.96 9.18 95% mean confidence interval for instructions %-change: -0.56% 1.51% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4372 -> 4372 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 352646771 -> 352840997 (0.06%) cycles in affected programs: 218600800 -> 218795026 (0.09%) helped: 21167 HURT: 21411 helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10 helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98% HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10 HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06% 95% mean confidence interval for cycles value: 2.87 6.26 95% mean confidence interval for cycles %-change: 0.40% 0.55% Cycles are HURT. total spills in shared programs: 8840 -> 8953 (1.28%) spills in affected programs: 126 -> 239 (89.68%) helped: 1 HURT: 2 total fills in shared programs: 21782 -> 21914 (0.61%) fills in affected programs: 431 -> 563 (30.63%) helped: 1 HURT: 3 LOST: 0 GAINED: 5 Shader-db results on Haswell: total instructions in shared programs: 13320918 -> 13320769 (<.01%) instructions in affected programs: 40998 -> 40849 (-0.36%) helped: 146 HURT: 56 helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2 helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22% HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4 HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26% 95% mean confidence interval for instructions value: -1.26 -0.21 95% mean confidence interval for instructions %-change: -0.62% 0.77% Inconclusive result (%-change mean confidence interval includes 0). total loops in shared programs: 4373 -> 4373 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 374518258 -> 374384193 (-0.04%) cycles in affected programs: 231101954 -> 230967889 (-0.06%) helped: 21427 HURT: 19438 helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8 helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86% HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8 HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80% 95% mean confidence interval for cycles value: -4.49 -2.07 95% mean confidence interval for cycles %-change: -0.14% -0.04% Cycles are helped. total spills in shared programs: 23406 -> 23411 (0.02%) spills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 total fills in shared programs: 34845 -> 34850 (0.01%) fills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 LOST: 0 GAINED: 0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566 Fixes: f4ef34f207d1 "intel/fs: Add an UNDEF instruction to avoid..." Reviewed-by: Francisco Jerez <[email protected]>
* intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WMAnuj Phogat2019-09-111-0/+11
| | | | | | | Initial benchmarking didn't show any performance benefits. But it might eventually. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* genxml/gen11+: Add COMMON_SLICE_CHICKEN4 registerAnuj Phogat2019-09-112-0/+10
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* android: anv: libmesa_vulkan_common: add libmesa_util static dependencyMauro Rossi2019-09-081-1/+2
| | | | | | | | | | | | | Change needed to fix the following building error: In file included from external/mesa/src/intel/vulkan/anv_device.c:43: external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 4dcb1ff ("anv: add support for driconf") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/blorp: Use wide formats for nicely aligned stencil clearsJason Ekstrand2019-09-062-0/+122
| | | | | | | | | | | | | | | | In the case where the stencil clear is nicely aligned, we can clear stencil much more efficiently by mapping it as a wide format (say RGBA32_UINT) and blasting out the stencil clear value with a repclear. On Unigine Heaven, this makes one stencil clear go from non-trivial to unnoticeable when looking at per-draw timings. In order for this change to work properly, ANV needs to do a bit more flushing around depth and stencil clears. i965 and iris already have the cache tracking logic to handle this so no changes are required there. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/blorp: Expose surf_fake_interleaved_msaa internallyJason Ekstrand2019-09-062-5/+8
|
* intel/blorp: Expose surf_retile_w_to_y internallyJason Ekstrand2019-09-062-5/+8
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* blorp: Memset surface info to zero when initializing itJason Ekstrand2019-09-061-0/+1
| | | | | | | | This isn't known to fix any current bugs but it does prevent a regression in a subsequent commit. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Decode PS kernels on SNBJason Ekstrand2019-09-061-1/+4
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Decode 3DSTATE_BINDING_TABLE_POINTERS on SNBJason Ekstrand2019-09-061-0/+15
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add support for vk_x11_override_min_image_countEric Engestrom2019-09-061-0/+3
| | | | | | | Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add support for driconfEric Engestrom2019-09-064-3/+19
| | | | | | | | | No option is supported yet, this is just the boilerplate. Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv,iris: L3ALLOC register replaces L3CNTLREG for gen12Jordan Justen2019-09-062-7/+16
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>