aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel/compiler/brw_fs_visitor.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965: Move down genX_upload_sbe in profiles.Mathias Fröhlich2020-03-101-0/+1
| | | | | | | | | | | | | | | | | Avoid looping over all VARYING_SLOT_MAX urb_setup array entries from genX_upload_sbe. Prepare an array indirection to the active entries of urb_setup already in the compile step. On upload only walk the active arrays. v2: Use uint8_t to store the attribute numbers. v3: Change loop to build up the array indirection. v4: Rebase. v5: Style fix. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
* intel/compiler: Move register pressure calculation into IR analysis objectFrancisco Jerez2020-03-061-4/+2
| | | | | | | | | | This defines a new BRW_ANALYSIS object which wraps the register pressure computation code along with its result. For the rationale see the previous commits converting the liveness and dominance analysis passes to the IR analysis framework. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/compiler/fs: Switch liveness analysis to IR analysis frameworkFrancisco Jerez2020-03-061-1/+2
| | | | | | | | | | | This involves wrapping fs_live_variables in a BRW_ANALYSIS object and hooking it up to invalidate_analysis() so it's properly invalidated. Seems like a lot of churn but it's fairly straightforward. The fs_visitor invalidate_ and calculate_live_intervals() methods are no longer necessary after this change. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/compiler: Move all live interval analysis results into fs_live_variablesFrancisco Jerez2020-03-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | This moves the following methods that are currently defined in fs_visitor (even though they are side products of the liveness analysis computation) and are already implemented in brw_fs_live_variables.cpp: > bool virtual_grf_interferes(int a, int b) const; > int *virtual_grf_start; > int *virtual_grf_end; It makes sense for them to be part of the fs_live_variables object, because they have the same lifetime as other liveness analysis results and because this will allow some extra validation to happen wherever they are accessed in order to make sure that we only ever use up-to-date liveness analysis results. This shortens the virtual_grf prefix in order to compensate for the slightly increased lexical overhead from the live_intervals pointer dereference. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/fs: Use helper for discard sample mask flag subregister number.Francisco Jerez2020-02-141-1/+1
| | | | | | | | | Use it instead of hard-coding f0.1 for the sample mask of programs that use discard. This will make the task easier when we replace f0.1 with another flag register location in order to support discard with SIMD32 shaders. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen11: Work around dual-source blending hangs in combination with ↵Francisco Jerez2020-02-141-0/+12
| | | | | | | | | | | | | | SIMD32. The SIMD8 dual-source blending framebuffer write messages seem to have trouble releasing the pixel scoreboard dependency in SIMD32 dispatch mode, which leads to hangs. I have a better workaround for this which doesn't involve disabling SIMD32 when dual-source blending is enabled, but I'm still investigating some issues with it. Limit the dispatch width to SIMD16 in such cases for the moment in order to make the CI happy on ICL with SIMD32 fragment shaders enabled. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Set src0 alpha present bit in header when provided in message payload.Francisco Jerez2020-02-141-2/+2
| | | | | | | | | | | | | | | | | | Currently the "Source0 Alpha Present to RenderTarget" bit of the RT write message header is derived from brw_wm_prog_data::replicate_alpha. However the src0_alpha payload is provided anytime it's specified to the logical message. This could theoretically lead to an inconsistency if somebody provided a src0_alpha value while brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in a future commit in order to implement a hardware workaround. Instead calculate the header bit based on whether a src0_alpha value was provided to the logical message, which guarantees the same behavior on pre-ICL and ICL+ (the latter used an extended descriptor bit for this which didn't suffer from the same issue). Remove the brw_wm_prog_data::replicate_alpha flag. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Switch to standard vector layout for barycentrics at optimization ↵Francisco Jerez2020-01-171-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | time. This involves permuting the registers of barycentric vectors to have the standard X[0-n] Y[0-n] layout at NIR translation time. Barycentrics are converted to the format expected by the PLN instruction in the lower_barycentrics() pass run after the optimization loop. Main reason is correctness of SIMD32 fragment shaders. The shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used during NIR translation are busted for SIMD32. This leads to serious corruption at present with INTEL_DEBUG=do32, especially on Gen11+ where these helpers are hit more frequently due to the lack of a hardware PLN instruction. Of course one could have chosen to fix those helpers instead, but there is another far more subtle issue that was reported during review of the SIMD32 fragment shader codegen changes: The SIMD splitting pass currently handles SIMD32 barycentric vectors as if they had the standard X[0-n] Y[0-n] layout, even though they are interleaved for the PLN instruction, which causes incorrect execution masks to be applied to the MOVs unzipping barycentric vectors in cases where a LINTERP instruction occurs under non-uniform control flow. I'm not aware of any conformance regressions due to the latter issue at present, but for our peace of mind let's move the conversion to the PLN layout into the lower_barycentrics() pass run after lower_simd_width(). This leads to the following shader-db improvements (including SIMD32 shaders) in combination with the previous back-end preparation changes -- Without them (especially the copy propagation changes) this would lead to a massive number of regressions. On ICL: total instructions in shared programs: 20662316 -> 20466903 (-0.95%) instructions in affected programs: 10538474 -> 10343061 (-1.85%) helped: 68775 HURT: 6 total spills in shared programs: 8938 -> 8748 (-2.13%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8965 -> 8663 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 146 GAINED: 43 On SKL: total instructions in shared programs: 18725867 -> 18614912 (-0.59%) instructions in affected programs: 3876590 -> 3765635 (-2.86%) helped: 27492 HURT: 2 LOST: 191 GAINED: 417 On SNB: total instructions in shared programs: 14573613 -> 13980646 (-4.07%) instructions in affected programs: 5199074 -> 4606107 (-11.41%) helped: 29998 HURT: 0 LOST: 21 GAINED: 30 Results are somewhat less impressive but still significant without SIMD32 fragment shaders enabled. On ICL: total instructions in shared programs: 16148728 -> 16061659 (-0.54%) instructions in affected programs: 6114788 -> 6027719 (-1.42%) helped: 42046 HURT: 6 total spills in shared programs: 8218 -> 8028 (-2.31%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8953 -> 8651 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 0 GAINED: 3 On SKL: total instructions in shared programs: 14927994 -> 14926738 (-0.01%) instructions in affected programs: 168850 -> 167594 (-0.74%) helped: 711 HURT: 2 On SNB: total instructions in shared programs: 10770538 -> 10734403 (-0.34%) instructions in affected programs: 2702172 -> 2666037 (-1.34%) helped: 17818 HURT: 0 All of the hurt shaders are either spilling slightly more or emitting additional NOP instructions due to the SIMD16 POW workaround for Gen8-9 combined with differences in scheduling. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Split fetch_payload_reg() into separate helper for barycentrics.Francisco Jerez2020-01-171-2/+2
| | | | | | | | | | | | | | | | We're about to change the layout of barycentric vectors, which will involve permuting the GRFs of barycentrics fetched from the thread payload. Make room for this in a function separate from the generic fetch_payload_reg(), since the permutation will only be applicable to barycentric vectors. This allows simplifying fetch_payload_reg(), since there was no need for handling multiple-component payload registers except for barycentrics. This causes some minor shader-db noise due to the new helper emitting a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up shortly. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid ↵Francisco Jerez2020-01-171-5/+8
| | | | | | | | | | | | | | | | | | | | | | | workaround. This prevents regressions on SNB due to the redundant MOVs lying around in cases where fetch_payload_reg() returns a VGRF (currently only in SIMD32 but soon in pretty much all cases). The MOVs can't be register-coalesced due to their source being a FIXED_GRF, and they can't be copy-propagated either due to the unlit centroid workaround partial writes. They can be copy-propagated just fine into a SEL instruction though. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13996898 -> 14001982 (0.04%) instructions in affected programs: 197461 -> 202545 (2.57%) helped: 0 HURT: 1251 Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Check for NULL key in fs_visitor constructorMichel Dänzer2019-10-241-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Flagged by UBSan: ../src/intel/compiler/brw_fs_visitor.cpp:986:20: runtime error: member access within null pointer of type 'const struct brw_base_prog_key' #0 0x559fadb48556 in fs_visitor::init() ../src/intel/compiler/brw_fs_visitor.cpp:986 #1 0x559fadb46db3 in fs_visitor::fs_visitor(brw_compiler const*, void*, void*, brw_base_prog_key const*, brw_stage_prog_data*, nir_shader const*, unsigned int, int, brw_vue_map const*) ../src/intel/compiler/brw_fs_visitor.cpp:962 #2 0x559fad9c7cd8 in saturate_propagation_fs_visitor::saturate_propagation_fs_visitor(brw_compiler*, brw_wm_prog_data*, nir_shader*) (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x61bcd8) #3 0x559fad9960a1 in saturate_propagation_test::SetUp() ../src/intel/compiler/test_fs_saturate_propagation.cpp:65 #4 0x559fadd7a32d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #5 0x559fadd65c3b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #6 0x559fadd0af75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2470 #7 0x559fadd0d8a4 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656 #8 0x559fadd10032 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774 #9 0x559fadd2ba0c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649 #10 0x559fadd7df46 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #11 0x559fadd69613 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #12 0x559fadd2302e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257 #13 0x559fadda2d61 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233 #14 0x559fadda2c21 in main ../src/gtest/src/gtest_main.cc:37 #15 0x7fe8f6748bba in __libc_start_main ../csu/libc-start.c:308 #16 0x559fad9950f9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x5e90f9) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Adam Jackson <[email protected]>
* intel/compiler: Remove emit_alpha_to_coverage workaround from backendSagar Ghuge2019-10-211-84/+0
| | | | | | | | | | | | | Remove emit_alpha_to_coverage workaround from backend compiler and start using ported workaround from NIR. v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho) Fixes piglit test on HSW: - arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs/gen12: Fix barrier codegen.Francisco Jerez2019-10-111-0/+1
| | | | | | | | The WAIT instruction has been removed, but SYNC.bar can be used instead to wait for a notification on n0.0. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add emit_shader_float_controls_execution_mode() and aux functionsSamuel Iglesias Gonsálvez2019-09-171-0/+58
| | | | | | | | | | | | | | | | | | | | We need this function to emit code that setups the control register later with the defined execution mode for the shader. Therefore, we emit it as the first instruction. v2: - Fix bug in setting the default mode mask in brw_rnd_mode_from_nir(). - Fix support for rounding modes in brw_rnd_mode_from_nir(). v3: - Updated to renamed shader info member and enum values (Andres). v4: - Add actual emission as first instruction of emit_nir_code (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* Revert "intel/fs: Move the scalar-region conversion to the generator."Jason Ekstrand2019-09-061-1/+1
| | | | | | | | | | This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that we're doing interpolation lowering in NIR, we can continue to stride the FS input registers directly in the brw_fs_nir code like we did before. This fixes SIMD32 fragment shaders which broke because lower_simd_width depended on the 0 stride to split PLN instructions correctly. Reviewed-by: Francisco Jerez <[email protected]>
* intel/fs: Drop the gl_program from fs_visitorJason Ekstrand2019-08-251-3/+2
| | | | | | | | | It's not used by anything anymore now that so much lowering has been moved into NIR. Sadly, we still need on in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Print the scheduler mode.Matt Turner2019-07-301-0/+1
| | | | | | | | Line wrap some awfully long lines while we are here. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Add a shader_stats struct.Matt Turner2019-07-301-1/+1
| | | | | | | | | | It'll grow further, and we'd like to avoid adding an additional parameter to fs_generator() for each new piece of data. v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use NIR to lower legacy userclipping.Kenneth Graunke2019-07-241-81/+0
| | | | | | | | | | | | | | | This allows us to drop legacy userclip plane handling in both the vec4 and FS backends, and simplifies a few interfaces. v2 (Jason Ekstrand): - Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because it's i965-specific. - Handle adding the params in brw_nir_lower_legacy_clipping - Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog Co-authored-by: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add a "base class" for program keysJason Ekstrand2019-07-101-24/+3
| | | | | | | | | Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs/icl: Use dummy masked urb write for tess evalTopi Pohjolainen2019-04-251-1/+50
| | | | | | | | | | | One cannot write the URB arbitrarily and therefore the message has to be carefully constructed. The clever tricks originate from Kenneth and Jason, I'm just writing the patch. Fixes GPU hangs on ICL with Vulkan CTS. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/fs: Move the scalar-region conversion to the generator.Rafael Antognolli2019-04-221-1/+1
| | | | | | | | | | Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add support for bindless texture opsJason Ekstrand2019-04-191-1/+3
| | | | | | | | | | | | | | We add two new texture sources for bindless surface and sampler handles. Bindless surface handles are expected to be pre-shifted so that the 20-bit surface state table index is in the top 20 bits of the 32-bit handle. This lets us avoid any extra shifts in the shader. Bindless sampler handles are 32-byte aligned byte offsets from general state base address. We use 32-byte aligned instead of 16-byte aligned to avoid having to use more indirect messages than needed. It means we can't tightly pack samplers but that's probably not a big deal. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965,iris,anv: Make alpha to coverage work with sample maskDanylo Piliaiev2019-03-251-1/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording could be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage" Thus we need to compute alpha to coverage dithering manually in shader and replace sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) | 0x0808 * (m & 2) | 0x0100 * (m & 1) sample_mask = sample_mask & dither_mask Credits to Francisco Jerez <[email protected]> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have issue with simultaneous usage of sample mask and alpha to coverage however due to the wrong sending order of oMask and src0_alpha it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* intel/compiler: use 0 as sampler in emit_mcs_fetchCaio Marcelo de Oliveira Filho2019-02-081-1/+1
| | | | | | | | | | | | The sampler will be ignored since the underlying 'ld_mcs' operation won't use it, so just fill the field with 0 instead of the texture to make it clearer that's the case. This will also avoid is_high_sampler() to kick in unnecessarily, in case we are using the operation for a texture with index >= 16. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler/icl: Use barrier id bits 24:30 instead of 24:27,31Topi Pohjolainen2018-09-251-3/+13
| | | | | | | Fixes gpu hangs with Carchase and Manhattan. Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/fs: use uint type for per_slot_offset at GSJose Maria Casanova Crespo2018-07-091-1/+1
| | | | | | | | | | | | This helps us to compact original instruction: mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q }; So now we emit: mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted }; Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/fs: Add fields to wm_prog_data for SIMD32 dispatchJason Ekstrand2018-06-281-0/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Fix Gen6+ interpolation setup for SIMD32Francisco Jerez2018-06-281-56/+60
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Get rid of MOV_DISPATCH_TO_FLAGSJason Ekstrand2018-06-281-1/+3
| | | | | | We can just emit the MOV in the two places where we use this. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaroundJason Ekstrand2018-06-281-11/+16
| | | | | | | There's no reason for us to emit it a pile of times and then have a whole pass to clean it up. Just emit it once like we really want. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Generalize the unlit centroid workaroundFrancisco Jerez2018-06-281-14/+8
| | | | | | | | This generalizes the unlit centroid workaround so it's less code and now supports SIMD32. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Wrap FS payload register look-up in a helper function.Francisco Jerez2018-06-281-7/+5
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaroundFrancisco Jerez2018-06-281-12/+12
| | | | | | | | | | While we're here, we change to using horiz_offset() instead of abusing half(). v2 (Jason Ekstrand): - Use horiz_offset() instead of half() Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Don't enable dual source blend if no outputs are writtenFrancisco Jerez2018-06-281-1/+2
| | | | | | | | This prevents a crash in some arb_enhanced_layouts tests that would be caused by the next commit. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Rework KSP data to be SIMD width-basedJason Ekstrand2018-06-281-1/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.Francisco Jerez2018-05-291-0/+2
| | | | | | | | | | | | | | When using multiple RT write messages to the same RT such as for dual-source blending or all RT writes in SIMD32, we have to set the "Last Render Target Select" bit on all write messages that target the last RT but only set EOT on the last RT write in the shader. Special-casing for dual-source blend works today because that is the only case which requires multiple RT write messages per RT. When we start doing SIMD32, this will become much more common so we add a dedicated bit for it. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Use the ATTR file for FS inputsFrancisco Jerez2018-05-291-6/+4
| | | | | | | | | | | | This replaces the special magic opcodes which implicitly read inputs with explicit use of the ATTR file. v2 (Jason Ekstrand): - Break into multiple patches - Change the units of the FS ATTR to be in logical scalars Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Initialize fs_visitor::grf_used on construction.Francisco Jerez2017-12-211-0/+1
| | | | | | | | | | | | | | | This should shut up some Valgrind errors during pre-regalloc scheduling. The errors were harmless since they could only have led to the estimation of the bank conflict penalty of an instruction pre-regalloc, which is inaccurate at that point of the program compilation, but no less accurate than the intended "return 0" fall-back path. The scheduling pass is normally re-run after regalloc with a well-defined grf_used value and accurate bank conflict information. Fixes: acf98ff933d "intel/fs: Teach instruction scheduler about GRF bank conflict cycles." Reported-by: Eero Tamminen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* intel/fs: Explicitly set EXECUTE_1 where neededJason Ekstrand2017-11-071-4/+3
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Rework zero-length URB write handlingJason Ekstrand2017-11-071-29/+31
| | | | | | | | | | | | | | Originally we tried to handle this case based on slots_valid. However, there are a number of ways that this can go wrong. For one, we throw away any trailing slots which either aren't written or are set to VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not actually write anything there. Between the lot of these, it was possible to end up in a case where we tried to do a regular URB write but ended up with a length of 1 which is invalid. This commit moves it to the end and makes it based on a new boolean flag urb_written. Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: [email protected]
* intel/fs: Remove min_dispatch_width from fs_visitorJason Ekstrand2017-11-071-11/+0
| | | | | | | | It's 8 for everything except compute shaders. For compute shaders, there's no need to duplicate the computation and it's just a possible source of error. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/vs: Grow the param array for clip planesJason Ekstrand2017-10-121-0/+7
| | | | | | | | Instead of requiring the caller of brw_compile_vs to figure it out, just grow the param array on-demand. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Rewrite the world of push/pull paramsJason Ekstrand2017-10-121-4/+4
| | | | | | | | | | | | | | | | | This moves us away to the array of pointers model and onto a model where each param is represented by a generic uint32_t handle. We reserve 2^16 of these handles for builtins that get generated by somewhere inside the compiler and have well-defined meanings. Generic params have handles whose meanings are defined by the driver. The primary downside to this new approach is that it moves a little bit of the work that we would normally do at compile time to draw time. On my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to have any measurable affect on OglBatch7. So, while this may come back to bite us, it doesn't look too bad. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Lower gl_VertexID and friends to inputs at the NIR levelJason Ekstrand2017-05-091-38/+0
| | | | | | | | | NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Set uses_vertexid and friends from brw_compile_vsJason Ekstrand2017-05-091-6/+0
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Embed the shader_info in the nir_shader againJason Ekstrand2017-05-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since 00620782c92100d77c660f9783504c6d80fa1d58, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move the back-end compiler to src/intel/compilerJason Ekstrand2017-03-131-0/+953
Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>