summaryrefslogtreecommitdiffstats
path: root/src/intel/compiler
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17Topi Pohjolainen2018-10-171-2/+6
| | | | | | | | | | | | | | Identifier bits in the dispatch header have changed. See Bspec: SINGLE_PATCH Payload: 3D Pipeline Stages - 3D Pipeline Geometry - Hull Shader (HS) Stage IVB+ - Payloads IVB+ Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/fs: Add 64-bit int immediate support to dump_instructions()Matt Turner2018-10-162-0/+8
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel: disable FS IR validation in release mode.Kenneth Graunke2018-10-151-0/+2
| | | | | | We probably don't need to iterate, fprintf, and abort in release mode. Reviewed-by: Matt Turner <[email protected]>
* intel/nir, freedreno/ir3: Use the separated dead write vars passCaio Marcelo de Oliveira Filho2018-10-151-0/+1
| | | | | | | No changes to shader-db for intel. No changes to shader-db expected for freedreno. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/vec4: Fix nir_op_b2[fi] with 64-bit resultJason Ekstrand2018-10-111-1/+6
| | | | | | | | | This is valid NIR but you can't actually hit this case today. GLSL IR doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we don't have any to/from bool conversion opcodes at all. However, the next commit will make us start generating it so we should be ready. Reviewed-by: Ian Romanick <[email protected]>
* intel/fs: Fix nir_op_b2[fi] with 64-bit result on Gen8 LP and Gen9 LPJason Ekstrand2018-10-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | Several of the Atom GPUs have additional restrictions on alignment when moving < 64-bit source to a 64-bit destination. All of the nir_op_*2*64 code generation paths respected this, but nir_op_b2[fi] did not. Previous to commit a68dd47b911 it was not possible to generate such an instruction from the GLSL path. It may have been possible from SPIR-V, but it's not clear. The aforementioned patch converts a 64-bit nir_op_fsign into a sequence of operations including a nir_op_b2f with a 64-bit result. This "just works" everywhere except these Atom parts. This problem was not detected during normal CI testing because the Atom parts are not included in developer builds. v2 (idr): Make the patch compile, and make some cosmetic changes. Add a commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319 Fixes: a68dd47b911 "nir/algebraic: Simplify fsat of fsign" Reviewed-by: Ian Romanick <[email protected]>
* intel: Introducing Whiskey Lake platformRodrigo Vivi2018-10-111-0/+1
| | | | | | | | | | | | | | | | | | | Whiskey Lake uses the same gen graphics as Coffe Lake, including some ids that were previously marked as reserved on Coffe Lake, but that now are moved to WHL page. This follows the ids and approach used on kernel's commit b9be78531d27 ("drm/i915/whl: Introducing Whiskey Lake platform") and commit c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs") v2: Lionel noticed that GT{1,2,3} on kernel wasn't following spec when looking to number of EUs, so kernel has been updated. Cc: Lionel Landwerlin <[email protected]> Cc: José Roberto de Souza <[email protected]> Cc: Anuj Phogat <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Don't propagate conditional modifiers if a UD source is negatedJason Ekstrand2018-10-105-0/+50
| | | | | | | | | This fixes a bug uncovered by my NIR integer division by constant optimization series. Fixes: 19f9cb72c8b "i965/fs: Add pass to propagate conditional..." Fixes: 627f94b72e0 "i965/vec4: adding vec4_cmod_propagation..." Reviewed-by: Ian Romanick <[email protected]>
* intel/compiler: Don't handle fsign.satIan Romanick2018-10-092-23/+3
| | | | | | | No shader-db or CI changes on any Intel platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* intel/fs: Fix a typo in need_matching_subreg_offsetJason Ekstrand2018-10-021-1/+1
| | | | | | | | | This fixes a bunch of Vulkan subgroup tests on little core platforms. Fixes: 4150920b95 "intel/fs: Add a helper for emitting scan operations" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Tested-by: Mark Janes <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Export TCS passthrough creationCaio Marcelo de Oliveira Filho2018-09-252-0/+86
| | | | | | | Move create_passthrough_tcs() from i965 so can be used in other contexts. Acked-by: Jason Ekstrand <[email protected]>
* intel/compiler/icl: Use barrier id bits 24:30 instead of 24:27,31Topi Pohjolainen2018-09-251-3/+13
| | | | | | | Fixes gpu hangs with Carchase and Manhattan. Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/fs: Don't propagate conditional modifiers from integer compares to addsIan Romanick2018-09-171-1/+9
| | | | | | | | | | | | No shader-db changes on any Intel platform... which probably explains why no bugs have been bisected to this problem since it landed in Mesa 18.1. :( The commit mentioned below is in 18.2, so 18.1 would need a slightly different fix (due to code refactoring). Signed-off-by: Ian Romanick <[email protected]> Fixes: 77f269bb560 "i965/fs: Refactor propagation of conditional modifiers from compares to adds" Reviewed-by: Alejandro Piñeiro <[email protected]> (reviewed the original patch) Cc: Matt Turner <[email protected]> (reviewed the original patch)
* Replace uses of _mesa_bitcount with util_bitcountDylan Baker2018-09-075-6/+11
| | | | | | | | | | | | | and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <[email protected]> (v1) Acked-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Drop the vs_inputs_dual_locations optionJason Ekstrand2018-09-061-3/+0
| | | | | | | | | | | | | It was very inconsistently handled; the only things that made use of it were glsl_to_nir, glspirv, and nir_gather_info. In particular, nir_lower_io completely ignored it so anyone using nir_lower_io on 64-bit vertex attributes was going to be in for a shock. Also, as of the previous commit, it's set by every driver that supports 64-bit vertex attributes. There's no longer any reason to have it be an option so let's just delete it. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* intel/compiler: remove unused get_image_base_type()Eric Engestrom2018-09-061-19/+0
| | | | | | | | | Unused since 09f1de97a76a4990fd7c "anv,i965: Lower away image derefs in the driver". Cc: Jason Ekstrand <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* i965/fs: include multisamplers on image_intrinsic_coord_componentsAlejandro Piñeiro2018-09-051-0/+2
| | | | | | | | | | | | | | | | | | This is the second patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test Although in this case it doesn't affect so many borrowed tests, as there aren't too many tests using multisamplers on Intel. It is worth to note that this patch is also needed when those tests are run on GLSL mode (using the --glsl option). Although most Intel drivers would not be able to run/execute tests using multisamplers, as GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to link correctly, so linking tests should pass. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: rename brw_nir_lower_glsl_imagesAlejandro Piñeiro2018-09-051-2/+2
| | | | | | | | | | To brw_nir_lower_gl_images, as it will be also used on the ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the lowering is about images following the OpenGL semantics. In any case "brw_nir_lower_opengl_images" seemed too long to me, so I just used gl. That shortening is already used on other parts of the code. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Remove redundant nir_remove_dead_variables callJason Ekstrand2018-09-041-2/+0
| | | | | | | As of 07a2098a708a2, brw_nir_optimize calls nir_remove_dead_variables as the last optimization. Doing it again is just pointless. Reviewed-by: Tapani Pälli <[email protected]>
* intel: compiler: remove dead local variables at optimization passLionel Landwerlin2018-09-031-0/+5
| | | | | | | | | | | | | | | | | | | | We're hitting an assert in gfxbench because one of the local variable is a sampler (according to Jason this isn't valid) : testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type*, unsigned int*, unsigned int*): Assertion `!"type does not have a natural size"' failed. Since this particular variable isn't used, it can be eliminated by removing unused local variables at the end of the optimization loop. This makes sense also for valid local variables. v2: Move additional local variable removal out of optimization loop, but before large constant removal (Jason/Lionel) v3: Move the removal at the end of brw_nir_optimize() Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Clamp indirect tes input array reads with 0x0fffffffIan Romanick2018-09-011-1/+11
| | | | | | | | | Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid range of the offset is [0, 0FFFFFFFh]. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* i965/vec4: Correctly handle uniform sources in ↵Ian Romanick2018-09-011-1/+14
| | | | | | | | | | | generate_tes_add_indirect_urb_offset Fixes failure in the new piglit test tes-patch-input-array-vec2-index-invalid-rd.shader_test. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* intel: Introducing Amber Lake platformRodrigo Vivi2018-08-311-0/+1
| | | | | | | | | | | | | | | Amber Lake uses the same gen graphics as Kaby Lake, including a id that were previously marked as reserved on Kaby Lake, but that now is moved to AML page. This follows the ids and approach used on kernel's commit e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform") Reported-by: Timo Aaltonen <[email protected]> Cc: José Roberto de Souza <[email protected]> Cc: Anuj Phogat <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/nir: Lowering image loads and stores trashes all metadataJason Ekstrand2018-08-301-2/+2
| | | | | | | | | | | This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8 platforms where the lack of metadata dirtying was causing another pass to accidentally delete a much needed loop. https://bugs.freedesktop.org/show_bug.cgi?id=107745 Fixes: 37f7983bcca1 "intel/compiler: Do image load/store lowering..." Jason Ekstrand <[email protected]> writes: Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Remove surface_idx from brw_image_paramJason Ekstrand2018-08-292-13/+6
| | | | | | | Now that the drivers are lowering to surface indices themselves, we no longer need to push the surface index into the shader. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Use TXS for image_size when we have a typed surfaceJason Ekstrand2018-08-295-4/+74
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* anv,i965: Lower away image derefs in the driverJason Ekstrand2018-08-295-118/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Use a bitfield for image access qualifiersJason Ekstrand2018-08-291-1/+1
| | | | | | | | | | This commit expands the current memory access enum to contain the extra two bits provided for images. We choose to follow the SPIR-V convention of NonReadable and NonWriteable because readonly implies that you *can* read so readonly + writeonly doesn't make as much sense as NonReadable + NonWriteable. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Use two components for 1D array image sizesJason Ekstrand2018-08-291-23/+14
| | | | | | | | | | Having the array length component stored in .z was a small convenience for the ISL image param filling code and an annoyance in the NIR lowering code. The only convenience of treating 1D arrays like 2D arrays in the lowering code is in the address calculation code so let's put all the complexity there as well. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Do image load/store lowering to NIRJason Ekstrand2018-08-296-1120/+882
| | | | | | | | | | | | | | | | | | | | | | | This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1Ian Romanick2018-08-281-3/+16
| | | | | | | No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1Ian Romanick2018-08-281-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Refactor image atomics to be a bit more like other atomicsIan Romanick2018-08-281-40/+44
| | | | | | | This greatly simplifies the next patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1Ian Romanick2018-08-281-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> [v1] Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence unused parameter warnings in brw_eu.hIan Romanick2018-08-281-1/+1
| | | | | | | | | | | | | | | | | All of the other brw_*_desc functions take a devinfo parameter, and all of the others at least have an assert that uses it. Keep the parameter, but mark it as unused. Silences 37 warnings like: In file included from src/intel/common/gen_disasm.c:27:0: src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’: src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_pixel_interp_desc(const struct gen_device_info *devinfo, ^~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: Add INTEL_fragment_shader_ordering support.Kevin Rogovin2018-08-281-0/+1
| | | | | | | | | | Adds suppport for INTEL_fragment_shader_ordering. We achieve the fragment ordering by using the same instruction as for beginInvocationInterlockARB() which is by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Plamena Manolova <[email protected]>
* intel/eu: print bytes instead of 32 bit hex valueSagar Ghuge2018-08-271-17/+30
| | | | | | | | | | | | | | | INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU byte order is reversed. In order to disassemble binary files, print each byte instead of 32 bit hex value. v2: Print blank spaces in order to vertically align output of compacted instructions hex value with uncompacted instructions hex value. (Matt Turner) v3: Fix line wrap at correct length Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/nir: Enable nir_opt_find_array_copiesJason Ekstrand2018-08-232-13/+28
| | | | | | | | | | | | | | | | | | | | | | | We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more. Shader-db results on Kaby Lake: total instructions in shared programs: 15176942 -> 15176942 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use nir_shrink_vec_array_varsJason Ekstrand2018-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15176765 (<.01%) instructions in affected programs: 4259 -> 3419 (-19.72%) helped: 1 HURT: 0 total spills in shared programs: 10954 -> 10855 (-0.90%) spills in affected programs: 295 -> 196 (-33.56%) helped: 1 HURT: 0 total fills in shared programs: 22222 -> 22117 (-0.47%) fills in affected programs: 417 -> 312 (-25.18%) helped: 1 HURT: 0 The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use the new structure and array splitting passesJason Ekstrand2018-08-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | We call structure splitting once because it is guaranteed to split all the structures in the entire shader in one go. We call array splitting in the loop in case future optimizations turn indirects into direct dereferences and we can split more arrays. Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15177605 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This is unsurprising because nir_lower_vars_to_ssa already effectively does structure and array splitting internally. It doesn't actually split the variables but it's ability to reason about aliasing in the presence of arrays and structures and pick out scalars or vectors to be lowered to SSA values is fairly advanced. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Implement untyped atomic float min, max, and compare-swap ↵Ian Romanick2018-08-2214-1/+261
| | | | | | | | | | dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Expand untyped atomic message type field by a bitIan Romanick2018-08-223-4/+9
| | | | | | | | | | | This is necessary for a new Gen9 message type that will be added in the next patch. There are also Gen8 message types that need the extra bit (mostly for bindless). v2: Split off from the next patch. Suggested by Caio. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Silence unused parameter warningsIan Romanick2018-08-225-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’: src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {} ^~~~~ src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {} ^~ src/intel/compiler/brw_disasm.c: In function ‘src_ia1’: src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter] unsigned _reg_file, ^~~~~~~~~ src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’: src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter] unsigned dims, unsigned size, ^~~~ v2: Update commit message. brw_fs_generator.cpp warnings were already fixed by another patch. Noticed by Caio. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* Revert "intel/nir: Call nir_lower_io_to_scalar_early"Jason Ekstrand2018-08-151-12/+5
| | | | | | | | | | | | | Commit 4434591bf56a6b0 caused substantially more URB messages in geometry and tessellation shaders. Before we can really enable this sort of optimization, We either need some way of combining them back together into vectors or we need to do cross-stage vector element elimination without splitting everything into scalars. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: 4434591bf56a6 "intel/nir: Call nir_lower_io_to_scalar_early" Acked-by: Kenneth Graunke <[email protected]> Tested-by: Mark Janes <[email protected]>
* meson: Build with Python 3Mathieu Bridon2018-08-101-1/+1
| | | | | | | | | | | | Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.Kenneth Graunke2018-08-091-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the SIMD16 Gen4-5 fragment shader payload contains source depth (g2-3), destination stencil (g4), and destination depth (g5-6), the single register of stencil makes the destination depth unaligned. We were generating this instruction in the RT write payload setup: mov(16) m14<1>F g5<8,8,1>F { align1 compr }; which is illegal, instructions with a source region spanning more than one register need to be aligned to even registers. This is because the hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the compressed instruction into two mov(8)'s. I believe this would cause the hardware to load g5 twice, replicating subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2 artifact blocks in both TIS-100 and Reicast. Normally, we rely on the register allocator to even-align our virtual GRFs. But we don't control the payload, so we need to lower SIMD widths to make it work. To fix this, we teach lower_simd_width about the restriction, and then call it again after lower_load_payload (which is what generates the offending MOV). Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728 Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Diego Viola <[email protected]>
* intel/compiler: Add brw_get_compiler_config_value for disk cacheJordan Justen2018-08-012-0/+39
| | | | | | | | | | | | | | | | | | | During code review, Jason pointed out that: 2b3064c0731 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags" Didn't account for INTEL_SCALER_* environment variables. To fix this, let the compiler return the disk_cache driver flags. Another possible fix would be to pull the INTEL_SCALER_* into INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I didn't think it was a good use of 4 more of these bits. (5 since INTEL_PRECISE_TRIG needs to be accounted for as well.) Cc: Jason Ekstrand <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/pipeline: More aggressively optimize away color attachmentsJason Ekstrand2018-08-011-0/+1
| | | | | | | | | | Instead of just looking at the number of color attachments, look at which ones are actually used by the subpass. This lets us potentially throw away chunks of the fragment shader. In DXVK, for example, all subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this is very helpful in that case. Reviewed-by: Timothy Arceri <[email protected]>
* intel/nir: Call nir_lower_io_to_scalar_earlyJason Ekstrand2018-08-011-5/+12
| | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 15166953 -> 15073611 (-0.62%) instructions in affected programs: 2390284 -> 2296942 (-3.91%) helped: 16469 HURT: 505 total loops in shared programs: 4954 -> 4951 (-0.06%) loops in affected programs: 3 -> 0 helped: 3 HURT: 0 Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Split IO arrays into elementsJason Ekstrand2018-08-011-0/+4
| | | | | | | | | | | | | | | | | The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O variables which are arrays or matrices into a sequence of separate variables. This can help link-time optimization by allowing us to remove varyings at a more granular level. Shader-db results on Kaby Lake: total instructions in shared programs: 15177645 -> 15168494 (-0.06%) instructions in affected programs: 79857 -> 70706 (-11.46%) helped: 392 HURT: 0 Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>