summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* vulkan/wsi: Add X11 adaptive sync support based on dri options.Bas Nieuwenhuizen2019-04-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | The dri options are optional. When the dri options are not provided the WSI will not use adaptive sync. FWIW I think for xf86-video-amdgpu this still requires an X11 config option, so only people who opt in can get possible regressions from this. So then the remaining question is: why do this in the WSI? It has been suggested in another MR that the application sets this. However, I disagree with that as I don't think we'll ever get a reasonable set of applications setting it. The next questions is whether this can be a layer. It definitely can be as implemented now. However, I think this generally fits well with the function of the WSI. Furthemore, for e.g. the DISPLAY WSI this is much harder to do in a layer. Of course, most of the WSI could almost be a layer, but I think this still fits best in the WSI. Acked-by: Jason Ekstrand <[email protected]>
* anv: fix argument name for vkCmdEndQueryLionel Landwerlin2019-04-241-2/+2
| | | | | | | | | | Doesn't fix anything but it's not the right function prototype. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 673f33c77dd765 ("anv: Implement CmdBegin/EndQueryIndexed") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]>
* intel: workaround VS fixed function issue on Gen9 GT1 partsLionel Landwerlin2019-04-231-0/+12
| | | | | | | | | | | | The issue is noticeable in the dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d test where a triangle goes missing when we use the maximum number of URB entries as specified by the documentation. Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505 Reviewed-by: Kristian H. Kristensen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* intel/compiler: Improve fix_3src_operand()Matt Turner2019-04-221-5/+18
| | | | | | | | | | | | | | | | | | Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will be handled by opt_combine_constants(). Both are already allowed by opt_copy_propagation(). Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think those occur. This is sufficient to allow a pass added in a future commit (fs_visitor::lower_linterp) to avoid emitting extra MOV instructions. I removed the 'src.stride > 1' case because it seems wrong: 3-src instructions on Gen6-9 are align16-only and can only do stride=1 or stride=0. A run through Jenkins with an assert(src.stride <= 1) never triggers, so it seems that it was dead code. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: Add unit tests for sat prop for different exec sizesMatt Turner2019-04-221-0/+68
| | | | | | | The two new unit tests verify that propagating a saturate between instructions of different exec sizes does not happen. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: Use SIMD16 instructions in fs saturate prop unit testMatt Turner2019-04-221-59/+59
| | | | | | | | | | | | Will allow us to test that propagation between instructions of different exec sizes does not happen (in the next commit). The stray-looking change in intervening_dest_write is to adjust the size of the texture result to keep the test functioning identically when the instructions' exec sizes are doubled. Without the change, the texture does not overwrite the destination fully as the unit test intends. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/fs: Remove fs_generator::generate_linterp from gen11+.Rafael Antognolli2019-04-221-44/+6
| | | | | | | | | We now have a lowering pass that will do this at the fs_visitor level, so we can remove this code from gen11+. v2: Reduce size of the "i" array from 4 to 2 (Matt). Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add a lowering pass for linear interpolation.Rafael Antognolli2019-04-222-0/+47
| | | | | | | | | | | | | | | | On gen11, instead of using a PLN instruction, we convert FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the fs_generator code. This patch adds a lowering pass that does the same thing at the fs_visitor. It also drops the usage of NF types, since we don't need the extra precision and it lets us skip the accumulator. With all that, some optimizations will still be run on the generated code, and we should get better scheduling. v2: Update comment about saturation and conditional mod (Matt) Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Move the scalar-region conversion to the generator.Rafael Antognolli2019-04-224-5/+5
| | | | | | | | | | Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Only propagate saturation if exec_size is the same.Rafael Antognolli2019-04-221-1/+2
| | | | | | | | | Otherwise it could propagate the saturation from a SIMD16 instruction into a SIMD8 instruction. With that, only part of the destination register, which is the source of the move with saturation, would have been updated. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add support for float16 to the fsign optimizationsIan Romanick2019-04-201-6/+24
| | | | | | | | | | | | | | | Commit ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function") criss-crossed with c2b8fb9a810 ("anv/device: expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying enough attention when I rebased. This adds back the float16 changes and enables the optimization. v2: Incorporate more changes from 19cd2f5debd and a8d8b1a1391 that I missed in the previous version. Fixes: ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474 Reviewed-by: Matt Turner <[email protected]> [v1]
* anv: Rework the descriptor set layout create loopJason Ekstrand2019-04-191-14/+13
| | | | | | | | | | | Previously, we were storing the per-binding create info pointer in the immutable_samplers field temporarily so that we can switch the order in which we walk the loop. However, now that we have multiple arrays of structs to walk, it makes more sense to store an index of some sort. Because we want to leave immutable_samplers as NULL for undefined bindings, we store index + 1 and then subtract one later. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Ignore descriptor binding flags if bindingCount == 0Jason Ekstrand2019-04-191-3/+2
| | | | | | | | | I missed this on the first go round. The bindingCount field of VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero which means the flags array is ignored. Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv/nir: Add a central helper for figuring out SSBO address formatsJason Ekstrand2019-04-193-57/+98
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement VK_EXT_descriptor_indexingJason Ekstrand2019-04-195-2/+93
| | | | | | | | | Now that everything is in place to do bindless for all resource types except input attachments and UBOs, VK_EXT_descriptor_indexing is "trivial". Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Put binding flags in descriptor set layoutsJason Ekstrand2019-04-192-0/+19
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use bindless handles for imagesJason Ekstrand2019-04-195-4/+63
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add support for bindless image load/store/atomicJason Ekstrand2019-04-193-8/+72
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use bindless textures and samplersJason Ekstrand2019-04-196-31/+228
| | | | | | | | | | This commit changes anv to put bindless handles and sampler pointers into the descriptor buffer and use those instead of bindful when we run out of binding table space. This "spilling" of descriptors allows to to advertise an almost unbounded number of images and samplers. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Pass the plane into lower_tex_derefJason Ekstrand2019-04-191-7/+5
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use write_image_view to initialize immutable samplersJason Ekstrand2019-04-191-5/+13
| | | | | | | | | Instead of setting it manually, call the helper. When setting descriptor sets becomes more complicated than just setting some struct values, this will keep immutable sampler handling correct. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Count the number of planes in each descriptor bindingJason Ekstrand2019-04-192-3/+19
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add support for bindless texture opsJason Ekstrand2019-04-195-10/+86
| | | | | | | | | | | | | | We add two new texture sources for bindless surface and sampler handles. Bindless surface handles are expected to be pre-shifted so that the 20-bit surface state table index is in the top 20 bits of the 32-bit handle. This lets us avoid any extra shifts in the shader. Bindless sampler handles are 32-byte aligned byte offsets from general state base address. We use 32-byte aligned instead of 16-byte aligned to avoid having to use more indirect messages than needed. It means we can't tightly pack samplers but that's probably not a big deal. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel,nir: Lower TXD with a bindless samplerJason Ekstrand2019-04-191-0/+1
| | | | | | | | | When we have a bindless sampler, we need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement VK_KHR_shader_atomic_int64Jason Ekstrand2019-04-198-5/+51
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement SSBOs bindings with GPU addresses in the descriptor BOJason Ekstrand2019-04-196-35/+347
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds a new way for ANV to do SSBO bindings by just passing a GPU address in through the descriptor buffer and using the A64 messages to access the GPU address directly. This means that our variable pointers are now "real" pointers instead of a vec2(BTI, offset) pair. This carries a few of advantages: 1. It lets us support a virtually unbounded number of SSBO bindings. 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't implement before because those atomic messages are only available in the bindless A64 form. 3. It's way better than messing around with bindless handles for SSBOs which is the only other option for VK_EXT_descriptor_indexing. 4. It's more future looking, maybe? At the least, this is what NVIDIA does (they don't have binding based SSBOs at all). This doesn't a priori mean it's better, it just means it's probably not terrible. The big disadvantage, of course, is that we have to start doing our own bounds checking for robustBufferAccess again have to push in dynamic offsets. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Lower some SSBO operations in apply_pipeline_layoutJason Ekstrand2019-04-191-2/+212
| | | | | | | | | | | | In order to avoid the potential overhead of A64 operations on all SSBO ops, we look for those SSBO ops where we can get to the descriptor set from the SSBO access operation and lower those to a binding-table approach. When robustBufferAccess is enabled, this lets the hardware do the bounds checking for us. It also avoids some potentially expensive 64-bit integer calculations. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Add a has_a64_buffer_access to anv_physical_deviceJason Ekstrand2019-04-194-6/+11
| | | | | | | | This is more descriptive and a bit nicer than checking for gen >= 8 && use_softpin everywhere. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Re-run int64 lowering in postprocess_nirJason Ekstrand2019-04-191-0/+1
| | | | | | | | | | | We're about to start doing 64-bit pointer calculations in ANV. They will get applied after brw_preprocess_nir which is where we currently do 64-bit integer arithmetic lowering. Because we're adding 64-bit integer arithmetic after the initial lowering has happened, we need to lower again. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv/pipeline: Add skeleton support for spilling to bindlessJason Ekstrand2019-04-194-27/+122
| | | | | | | | | | If the number of surfaces or samplers exceeds what we can put in a table, we will want to spill out to bindless. There is no bindless support yet but this gets us the basic framework that will be used by later commits. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv/pipeline: Sort bindings by most used firstJason Ekstrand2019-04-191-40/+95
| | | | | | | | | This commit just sorts the bindings by how often they're used vs the array size of the binding. This will let us make more nuanced decisions about what goes in the binding table vs. what to make bindless. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Add a #define for the max binding table sizeJason Ekstrand2019-04-193-4/+16
| | | | | | | | | This also fixes a bug where we mis-calculate maximum binding table sizes and may return true in vkGetDescriptorSetLayoutSupport even for sets too large to fit in a binding table. Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Put image params in the descriptor set buffer on gen8 and earlierJason Ekstrand2019-04-196-124/+109
| | | | | | | | | | | | This is really where they belong; not push constants. The one downside here is that we can't push them anymore for compute shaders. However, that's a general problem and we should figure out how to push descriptor sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and earlier platforms because we no longer have to worry about push constant overhead limits. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Make all VkDeviceMemory BOs resident permanentlyJason Ekstrand2019-04-194-46/+48
| | | | | | | | | | | | | | | | | | | | | We spend a lot of time in the driver adding things to hash sets to track residency. The reality is that a properly built Vulkan app uses large memory objects and sub-allocates from them. In a typical frame, most of if not all of those allocations are going to be resident for the entire frame so we're really not saving ourselves much by tracking fine-grained residency. Just throwing everything in the validation list does make it a little bit more expensive inside the kernel to walk the list and ensure that all our VA is in order. However, without relocations, the overhead of that is pretty small. If we ever do run into a memory pressure situation where the fine- grained residency could even potentially help, we would likely be swapping one page out to make room for another within the draw call and performance is totally lost at that point. We're better off swapping out other apps and just letting ours run a whole frame. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: limit URB reconfigurations when using blorpLionel Landwerlin2019-04-193-3/+11
| | | | | | | | | | | If the last graphics pipeline bound to the command buffer has enough space in its VS URB entries for Blorp then avoid reconfiguring the URB partitions. v2: s/0/MESA_SHADER_VERTEX/ (Caio) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/devinfo: add basic sanity tests on device databaseLionel Landwerlin2019-04-192-0/+45
| | | | | | | | | v2: #undef NDEBUG (Eric) Use inc_include & inc_src (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Anuj Phogat [email protected]
* intel/devinfo: fix missing num_thread_per_eu on ICLLionel Landwerlin2019-04-191-6/+2
| | | | | | | | | | | There was an assumption that num_thread_per_eu would be set in the Gen8 features. Since this is mostly the same of all gen8->11 (except GEN9_LP that overwrites it) let's just factor it out. Signed-off-by: Lionel Landwerlin <[email protected]> Cc: [email protected] Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Anuj Phogat [email protected]
* intel/fs: Account for live range lengths in spill costsJason Ekstrand2019-04-181-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current register allocator has a concept of "spill benefit" which is based on the number of nodes with which a given node interferes. The idea is that you want to spill stuff with high interference because those are the most likely registers to help when spilling. However, this fails to take into account the length of the live range so the allocator frequently picks "cheap" (not many uses) registers which are actually very short lived and so spilling them doesn't help with the pressure situation. This commit takes into account the length of the live range to make long-lived registers more likely to get spilled than short-lived ones. This encourages the spill chooser to choose slightly larger registers which will affect a larger area of the program and hopefully we have to spill fewer of them to get the same reduction in over-all register pressure. Shader-db results on Kaby Lake: total spills in shared programs: 23664 -> 12050 (-49.08%) spills in affected programs: 19243 -> 7629 (-60.35%) helped: 296 HURT: 8 total fills in shared programs: 32028 -> 25139 (-21.51%) fills in affected programs: 20378 -> 13489 (-33.81%) helped: 295 HURT: 16 Of course, most of that is in Deus Ex... Shader-db results on Kaby Lake (without Deus Ex): total spills in shared programs: 6479 -> 5834 (-9.96%) spills in affected programs: 3231 -> 2586 (-19.96%) helped: 40 HURT: 4 total fills in shared programs: 17165 -> 17099 (-0.38%) fills in affected programs: 6951 -> 6885 (-0.95%) helped: 40 HURT: 7 Even without Deus Ex, the spill help is pretty respectable. The worst hurt shaders were one compute shader in Aztec Ruins and one fragment shader in KSP that were each hurt by around 13% fill 9% spill. VkPipeline-db results on Kaby Lake: total spills in shared programs: 9149 -> 8069 (-11.80%) spills in affected programs: 5197 -> 4117 (-20.78%) helped: 27 HURT: 16 total fills in shared programs: 26390 -> 25477 (-3.46%) fills in affected programs: 12662 -> 11749 (-7.21%) helped: 24 HURT: 22 The Vulkan results were decidedly more mixed but we don't have nearly as many apps in that database yet. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv: fix uninitialized pthread cond clock domainLionel Landwerlin2019-04-181-1/+1
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 843775bab78a6b ("anv: Rework fences") Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Drop some unneeded ANV_FROM_HANDLE for physical devicesJason Ekstrand2019-04-181-6/+0
| | | | | | Ever since 48ed2a7bb009618ed, we've had one at the top of the function. Reviewed-by: Caio Marcelo de Oliveira Filho [email protected]
* anv: Re-sort the GetPhysicalDeviceFeatures2 switch statementJason Ekstrand2019-04-181-17/+17
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Generate better code for fsign multiplied by a valueIan Romanick2019-04-181-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Rebase on v2 changes in previous two commits. v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take a pointer"). shader-db results: Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15297100 -> 15282141 (-0.10%) instructions in affected programs: 956685 -> 941726 (-1.56%) helped: 4527 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2 helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37% 95% mean confidence interval for instructions value: -3.48 -3.12 95% mean confidence interval for instructions %-change: -1.88% -1.81% Instructions are helped. total cycles in shared programs: 372809551 -> 372597886 (-0.06%) cycles in affected programs: 13645512 -> 13433847 (-1.55%) helped: 4362 HURT: 125 helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28 helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39% HURT stats (abs) min: 1 max: 1836 x̄: 76.90 x̃: 28 HURT stats (rel) min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42% 95% mean confidence interval for cycles value: -50.98 -43.37 95% mean confidence interval for cycles %-change: -2.67% -2.55% Cycles are helped. total spills in shared programs: 23465 -> 23463 (<.01%) spills in affected programs: 42 -> 40 (-4.76%) helped: 1 HURT: 0 total fills in shared programs: 31766 -> 31763 (<.01%) fills in affected programs: 69 -> 66 (-4.35%) helped: 1 HURT: 0 Haswell total instructions in shared programs: 13839992 -> 13828311 (-0.08%) instructions in affected programs: 712503 -> 700822 (-1.64%) helped: 3477 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52% 95% mean confidence interval for instructions value: -3.58 -3.14 95% mean confidence interval for instructions %-change: -2.01% -1.92% Instructions are helped. total cycles in shared programs: 387026330 -> 386872483 (-0.04%) cycles in affected programs: 11329966 -> 11176119 (-1.36%) helped: 3307 HURT: 139 helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18 helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79% HURT stats (abs) min: 1 max: 2314 x̄: 72.68 x̃: 20 HURT stats (rel) min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96% 95% mean confidence interval for cycles value: -49.31 -39.98 95% mean confidence interval for cycles %-change: -2.15% -2.01% Cycles are helped. LOST: 1 GAINED: 0 Ivy Bridge total instructions in shared programs: 12045602 -> 12038463 (-0.06%) instructions in affected programs: 623837 -> 616698 (-1.14%) helped: 2498 HURT: 0 helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05% 95% mean confidence interval for instructions value: -2.96 -2.75 95% mean confidence interval for instructions %-change: -1.34% -1.26% Instructions are helped. total cycles in shared programs: 181025675 -> 180891323 (-0.07%) cycles in affected programs: 11329329 -> 11194977 (-1.19%) helped: 2439 HURT: 47 helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26 helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64% HURT stats (abs) min: 1 max: 1269 x̄: 102.51 x̃: 43 HURT stats (rel) min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34% 95% mean confidence interval for cycles value: -59.91 -48.17 95% mean confidence interval for cycles %-change: -1.99% -1.82% Cycles are helped. Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown) total instructions in shared programs: 10896368 -> 10896339 (<.01%) instructions in affected programs: 3767 -> 3738 (-0.77%) helped: 17 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1 helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73% 95% mean confidence interval for instructions value: -2.27 -1.14 95% mean confidence interval for instructions %-change: -5.14% -2.03% Instructions are helped. total cycles in shared programs: 155091109 -> 155091021 (<.01%) cycles in affected programs: 47241 -> 47153 (-0.19%) helped: 15 HURT: 8 helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4 helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71% HURT stats (abs) min: 14 max: 32 x̄: 18.50 x̃: 17 HURT stats (rel) min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71% 95% mean confidence interval for cycles value: -14.59 6.93 95% mean confidence interval for cycles %-change: -1.41% 1.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <[email protected]> [v2]
* intel/fs: Add a scale factor to emit_fsignIan Romanick2019-04-182-12/+77
| | | | | | | | | | | | Normally fsign generates -1, 0, or +1. The new scale factor, S, causes fsign to generate -S, 0, or +S. v2: Rebase on v2 changes in previous commit. v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take a pointer"). Reviewed-by: Matt Turner <[email protected]> [v2]
* intel/fs: Refactor code generation for nir_op_fsign to its own functionIan Romanick2019-04-182-65/+65
| | | | | | | v2: Call emit_fsign from inside the existing switch statement. Suggested by Matt. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Eliminate dead code firstIan Romanick2019-04-181-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies the later patch "i965/fs: Generate better code for fsign multiplied by a value". shader-db results: Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 372808735 -> 372809551 (<.01%) cycles in affected programs: 1519520 -> 1520336 (0.05%) helped: 243 HURT: 277 helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5 helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27% HURT stats (abs) min: 1 max: 1810 x̄: 32.82 x̃: 5 HURT stats (rel) min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29% 95% mean confidence interval for cycles value: -7.18 10.32 95% mean confidence interval for cycles %-change: -0.17% 0.46% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown) total cycles in shared programs: 155091458 -> 155091109 (<.01%) cycles in affected programs: 370797 -> 370448 (-0.09%) helped: 24 HURT: 36 helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41 helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56% HURT stats (abs) min: 1 max: 291 x̄: 59.08 x̃: 10 HURT stats (rel) min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15% 95% mean confidence interval for cycles value: -37.92 26.28 95% mean confidence interval for cycles %-change: -0.88% 0.45% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (GM45 shown) total cycles in shared programs: 129133970 -> 129133978 (<.01%) cycles in affected programs: 111966 -> 111974 (<.01%) helped: 3 HURT: 1 helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 HURT stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% 95% mean confidence interval for cycles value: -12.93 16.93 95% mean confidence interval for cycles %-change: -0.05% 0.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <[email protected]>
* nir: Add a nir_src_as_intrinsic() helperJason Ekstrand2019-04-181-11/+4
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Rework nir_src_as_alu_instr to not take a pointerJason Ekstrand2019-04-181-6/+4
| | | | | | | | | Other nir_src_as_* functions just take a nir_src. It's not that much more memory copying and the constness preserving really isn't worth the cognitive dissonance. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: implement WaEnableStateCacheRedirectToCSLionel Landwerlin2019-04-181-0/+11
| | | | | | | | | | | | This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* iris: implement WaEnableStateCacheRedirectToCSLionel Landwerlin2019-04-181-0/+5
| | | | | | | | | | | | This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv/device: expose VK_KHR_shader_float16_int8 in gen8+Iago Toral Quiroga2019-04-182-0/+10
| | | | | | | | v2 (Jason): - Merge shaderFloat16 and shaderInt8 enablement into a single patch. - Merge extension enable. Reviewed-by: Jason Ekstrand <[email protected]> (v1)