path: root/src/intel
Commit message | Author | Age | Files | Lines
* intel/fs: Extend thread payload layout to SIMD32 | Francisco Jerez | 2018-06-28 | 3 | -22/+45
  And handle 32-wide payload register reads in fetch_payload_reg().
  v2 (Jason Ekstrand):
   - Fix some whitespace and brace placement
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Wrap FS payload register look-up in a helper function. | Francisco Jerez | 2018-06-28 | 3 | -12/+23
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround | Francisco Jerez | 2018-06-28 | 1 | -12/+12
  While we're here, we change to using horiz_offset() instead of abusing half().
  v2 (Jason Ekstrand):
   - Use horiz_offset() instead of half()
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Simplify fs_visitor::emit_samplepos_setup | Francisco Jerez | 2018-06-28 | 1 | -21/+7
  The original code manually split the MOVs to 8-wide to handle various regioning restrictions. Now that we have a SIMD width splitting pass that handles these things, we can just emit everything at the full width and let the SIMD splitting pass handle it. We also now have a useful "subscript" helper which is designed exactly for the case where you want to take a W type and read it as a vector of Bs, so we may as well use that too.
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
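For intuition about reading a W-typed value as a vector of Bs, here is a hypothetical plain-C sketch; it is not the backend's subscript() helper, and the function name is made up. It copies byte `i` of every 16-bit element, which is the data a byte view with a 2-byte stride starting at byte offset `i` would see on little-endian storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not Mesa's subscript() helper: copy byte `i`
 * (0 = low byte, 1 = high byte) of each 16-bit element into dst. This is
 * what a B-typed view with a 2-byte stride starting at byte offset i
 * would read on little-endian storage. */
static void subscript_bytes(const uint16_t *src, size_t n, unsigned i,
                            uint8_t *dst)
{
   assert(i < 2); /* a 16-bit element only has two bytes */
   for (size_t k = 0; k < n; k++)
      dst[k] = (uint8_t)(src[k] >> (8 * i));
}
```

For example, the elements {0x1234, 0xabcd} viewed at byte 0 give {0x34, 0xcd}, and at byte 1 give {0x12, 0xab}.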
* i965: Add plumbing for shader time in 32-wide FS dispatch mode. | Francisco Jerez | 2018-06-28 | 4 | -3/+4
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Disable opt_sampler_eot() in 32-wide dispatch. | Francisco Jerez | 2018-06-28 | 2 | -1/+6
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates | Jason Ekstrand | 2018-06-28 | 2 | -10/+56
  On g4x through Sandy Bridge, src1 (the coordinates) of the PLN instruction is required to be an even register number. When it's odd (which can happen with SIMD32), we have to emit a LINE+MAC combination instead. Unfortunately, we can't just fall through to the gen4 case because the input registers are still set up for PLN, which lays out the four src1 registers differently in SIMD16 than LINE does.
  v2 (Jason Ekstrand):
   - Take advantage of both accumulators and emit LINE LINE MAC MAC (based on a patch from Francisco Jerez)
   - Unify the gen4 and gen4x-6 cases using a loop
  v3 (Jason Ekstrand):
   - Don't unify gen4 with gen4x-6, as this turns out to be more fragile than first thought without reworking the gen4 barycentric coordinate layout.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
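To make the PLN versus LINE+MAC equivalence concrete, here is a hypothetical scalar model of a single channel; it is not Mesa code and the names are made up. It assumes LINTERP evaluates the plane equation a*x + b*y + c with coefficients laid out as {a, b, unused, c}, so that LINE (dst = src0.0 * src1 + src0.3) leaves a*x + c in the accumulator and the MAC adds b*y. The parity check only models the even-register rule for g4x through Sandy Bridge described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one channel's interpolation; not Mesa code. */
typedef struct {
   float coef[4]; /* {a, b, unused, c} */
} plane_eq;

/* PLN: dst = a*x + b*y + c in a single instruction. */
static float emulate_pln(const plane_eq *p, float x, float y)
{
   return p->coef[0] * x + p->coef[1] * y + p->coef[3];
}

/* LINE writes acc = a*x + c; MAC then computes dst = acc + b*y. */
static float emulate_line_mac(const plane_eq *p, float x, float y)
{
   float acc = p->coef[0] * x + p->coef[3]; /* LINE, result in accumulator */
   return acc + p->coef[1] * y;             /* MAC reads the accumulator */
}

/* g4x-SNB rule: PLN needs its coordinate source (src1) to start at an even
 * GRF number; an odd one forces the LINE+MAC fallback. */
static bool can_use_pln(unsigned coord_grf)
{
   return coord_grf % 2 == 0;
}
```

Both paths compute the same value; the difference is that LINE+MAC goes through the accumulator, which is exactly why the scheduler has to know about it.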
* intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN | Jason Ekstrand | 2018-06-28 | 1 | -1/+2
  When we don't have PLN (gen4 and gen11+), we implement LINTERP as either LINE+MAC or a pair of MADs. In both cases, the accumulator is written by the first of the two instructions and read by the second. Even though the accumulator value is never actually used from a logical instruction perspective, it is trashed, so we need to make the scheduler aware. Otherwise, the scheduler could end up reordering instructions and putting a LINTERP between an instruction which writes the accumulator and another which tries to use that result.
  Cc: mesa-stable@lists.freedesktop.org
  Reviewed-by: Matt Turner <mattst88@gmail.com>
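A minimal sketch of why the scheduler needs to see the implicit accumulator write, assuming a hypothetical per-instruction summary with explicit reads/writes flags (this is not how Mesa's scheduler represents instructions): two instructions must keep their relative order whenever one writes the accumulator and the other reads or writes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-instruction summary, not Mesa's fs_inst. */
typedef struct {
   const char *name;
   bool reads_acc;
   bool writes_acc;
} inst_info;

/* True if a and b must keep their relative order because of the
 * accumulator: read-after-write, write-after-read or write-after-write. */
static bool acc_dependency(const inst_info *a, const inst_info *b)
{
   return (a->writes_acc && b->reads_acc) ||
          (a->reads_acc && b->writes_acc) ||
          (a->writes_acc && b->writes_acc);
}
```

With the flag unset, a LINTERP looks independent of both the accumulator writer and its reader, so nothing stops it from being scheduled between them; once it is marked as writing the accumulator, both orderings are pinned.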
* intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET | Francisco Jerez | 2018-06-28 | 3 | -19/+9
  This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU operation and less like a send. This is less code overall and, as a side effect, it now properly handles execution groups and lowering, so SIMD32 support just falls out.
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Add the group to the flag subreg number on SNB and older | Jason Ekstrand | 2018-06-28 | 1 | -1/+7
  We want consistent behavior in the meaning of the flag_subreg field between SNB and IVB+.
  v2 (Jason Ekstrand):
   - Add some extra commentary
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix FB read header setup for SIMD32. | Francisco Jerez | 2018-06-28 | 1 | -4/+13
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix logical FB write lowering for SIMD32 | Francisco Jerez | 2018-06-28 | 1 | -5/+20
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix FB write message control codegen for SIMD32. | Francisco Jerez | 2018-06-28 | 1 | -18/+34
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Don't enable dual source blend if no outputs are written | Francisco Jerez | 2018-06-28 | 1 | -1/+2
  This prevents a crash in some arb_enhanced_layouts tests that would be caused by the next commit.
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32. | Francisco Jerez | 2018-06-28 | 1 | -11/+13
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/eu: Fix pixel interpolator queries for SIMD32. | Francisco Jerez | 2018-06-28 | 1 | -1/+2
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Disable SIMD32 dispatch for fragment shaders with discard. | Francisco Jerez | 2018-06-28 | 1 | -0/+2
  Current discard handling requires dedicating the second flag register to discard. However, control flow in SIMD32 requires both flag registers, so it's incompatible with the current discard handling. Just don't support SIMD32+discard for now.
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow | Francisco Jerez | 2018-06-28 | 1 | -0/+8
  The hardware's control flow logic is 16-wide, so we're out of luck here. We could, in theory, support SIMD32 if we knew the control flow was uniform, but we don't have that information at this point.
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
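The two SIMD32 restrictions in the commits above (discard, and control flow on Gen4-6) can be summarized as a single eligibility check. This is a hypothetical sketch: the struct and field names are illustrative, and the real compiler decides whether to build a SIMD32 variant based on other conditions as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a fragment shader; not Mesa's data structures. */
typedef struct {
   unsigned gen;          /* hardware generation, e.g. 6 for Sandy Bridge */
   bool uses_discard;
   bool has_control_flow;
} fs_info;

static bool simd32_allowed(const fs_info *fs)
{
   /* Discard handling reserves the second flag register, but SIMD32
    * control flow needs both flag registers. */
   if (fs->uses_discard)
      return false;

   /* Gen4-6 control-flow logic is only 16 wide, and we can't tell here
    * whether the control flow is uniform. */
   if (fs->gen <= 6 && fs->has_control_flow)
      return false;

   return true;
}
```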
* intel/fs: Split instructions low to high in lower_simd_width | Jason Ekstrand | 2018-06-28 | 1 | -2/+35
  Commit 0d905597f fixed an issue with the placement of the zip and unzip instructions. However, as a side effect, it reversed the order in which we were emitting the split instructions so that they went from high group to low instead of low to high. This is fine for most things, such as texture instructions, but certain render target writes really want to be emitted low to high. This commit just switches the order back to low to high.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
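A minimal model of the emission order this commit restores, using a hypothetical helper (not lower_simd_width() itself) that records the first channel, or group, of each split piece in the order the pieces are emitted. A SIMD32 instruction split to SIMD16 yields groups 0 then 16; the regressed order would have been 16 then 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the split order, not lower_simd_width() itself:
 * record the first channel (group) of each piece an instruction of width
 * exec_size is split into, in emission order. Returns the piece count. */
static size_t split_groups(unsigned exec_size, unsigned lower_width,
                           unsigned base_group, unsigned *groups,
                           size_t max_groups)
{
   assert(lower_width > 0 && exec_size % lower_width == 0);
   size_t n = exec_size / lower_width;
   assert(n <= max_groups);
   /* Low to high: piece 0 covers the lowest channels. */
   for (size_t i = 0; i < n; i++)
      groups[i] = base_group + (unsigned)i * lower_width;
   return n;
}
```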
* intel/fs: Rework KSP data to be SIMD width-based | Jason Ekstrand | 2018-06-28 | 3 | -47/+43
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/compiler: Add and use helpers for working with KSP indices | Jason Ekstrand | 2018-06-28 | 3 | -32/+136
  The pixel shader dispatch table is kind of a confusing mess. This adds some helpers for dealing with it and for easily extracting the correct data from wm_prog_data.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Remove program key argument from generator. | Francisco Jerez | 2018-06-28 | 7 | -10/+7
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Set up FB write message headers in the visitor | Jason Ekstrand | 2018-06-28 | 2 | -83/+86
  Doing instruction header setup in the generator is awful for a number of reasons. For one, we can't schedule the header setup at all. For another, it means lots of implied writes which the instruction scheduler and other passes can't properly reason about. The second isn't a huge problem for FB writes since they always happen at the end. We made a similar change to sampler handling in ff4726077d86.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix implied_mrf_writes() for headerless FB writes. | Francisco Jerez | 2018-06-28 | 1 | -1/+2
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes. | Francisco Jerez | 2018-06-28 | 1 | -1/+2
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/eu: Return new instruction to caller from brw_fb_WRITE(). | Francisco Jerez | 2018-06-28 | 2 | -21/+23
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Pull FB write implied headers from src[0] | Jason Ekstrand | 2018-06-28 | 1 | -9/+6
  Now that we have the implied header in src[0] for tracking purposes, we may as well use it in the generator. This makes things a tiny bit more general.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/fs: Properly track implied header regs read by FB writes | Jason Ekstrand | 2018-06-28 | 1 | -1/+16
  The FB write opcode on gen4-5 does implied copies from g0 and g1 to the message payload. With this commit, we start tracking that as part of the IR by having the FB write read from g0-1.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
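As a hypothetical illustration of what tracking the implied reads buys (the struct and function are made up, not Mesa's fs_inst::regs_read() machinery): a pass asking whether the FB write reads g0 or g1 must get "yes" when the hardware copies them implicitly, otherwise it could treat a write to g0 as safe to move past the FB write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a gen4-5 FB write, not Mesa's fs_inst. */
typedef struct {
   unsigned src_start;  /* first GRF of the explicit payload */
   unsigned src_count;  /* number of explicit payload GRFs */
   bool implied_header; /* hardware copies g0-g1 into the message */
} fb_write_info;

/* True if the write reads `grf`, counting the implied g0-g1 header. */
static bool fb_write_reads_grf(const fb_write_info *w, unsigned grf)
{
   if (w->implied_header && grf <= 1)
      return true;
   return grf >= w->src_start && grf < w->src_start + w->src_count;
}
```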
* intel/fs: FS_OPCODE_REP_FB_WRITE has side effects | Jason Ekstrand | 2018-06-28 | 1 | -0/+1
  It doesn't matter since we never run replicated write shaders through the optimizer, but it's good to be complete.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* Revert "anv: Print the actual enum for ignored structure types" | Jason Ekstrand | 2018-06-27 | 1 | -3/+1
  This reverts commit fda7014c35e5f5dfa26f078ad0512d13ead8b717. It was hitting an unreachable when the sType was unknown.
* anv: Print the actual enum for ignored structure types | Jason Ekstrand | 2018-06-27 | 1 | -1/+3
  Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
* i965/gen6/gs: Handle case where a GS doesn't allocate VUE | Andrii Simiklit | 2018-06-26 | 1 | -21/+21
  We cannot use the VUE Dereference flags combination for the EOT message on ILK and SNB because, unlike pre-ILK hardware, the threads there are not initialized with an initial VUE handle. So to avoid GPU hangs on SNB and ILK, we need to avoid using the VUE Dereference flags combination. (This was tested only on SNB, but according to the specification, SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6, ILK should behave the same way.)
  v2: The approach to fixing this issue was changed. Instead of using different EOT flags at the end of the program, we now create a VUE every time, even if the GS produces no output.
  v3: Clean up the patch.
  Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
  CC: <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
  Tested-by: Mark Janes <mark.a.janes@intel.com>
* anv: add VK_EXT_display_control to anv driver [v5] | Keith Packard | 2018-06-23 | 4 | -0/+108
  This extension provides fences and frame count information to direct display contexts. It uses new kernel ioctls to provide 64 bits of vblank sequence and nanosecond resolution.
  v2: Adopt Jason Ekstrand's coding conventions
      Declare variables at first use, eliminate extra whitespace between types and names. Wrap lines to 80 columns.
      Add extension to list in alphabetical order
      Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
  v3: Adapt to WSI fence API change. It now returns VkResult and no longer has an option for relative timeouts.
  v4: wsi_register_display_event and wsi_register_device_event now use the default allocator when NULL is provided, so remove the computation of 'alloc' here.
  v5: use zalloc2 instead of alloc2 for the WSI fence.
      Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Signed-off-by: Keith Packard <keithp@keithp.com>
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* anv: Support wait for heterogeneous list of fences [v3] | Keith Packard | 2018-06-23 | 1 | -18/+90
  Handle the case where the set of fences to wait for is not all of the same type by either waiting for them sequentially (waitAll) or polling them until the timer has expired (!waitAll). We hope the latter case is not common.
  While the current code makes sure that it always has fences of only one type, that will not be true when we add WSI fences. Split out this refactoring to make merging that clearer.
  v2: Adopt Jason Ekstrand's coding conventions
      Declare variables at first use, eliminate extra whitespace between types and names. Wrap lines to 80 columns.
      Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
  v3: Cast INT64_MAX to uint64_t to make its use as the maximum possible timeout clearly unsigned to the reader.
      Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
      Make anv_wait_for_fences with !waitAll check all fences at least once, even if the requested timeout has already passed.
  Signed-off-by: Keith Packard <keithp@keithp.com>
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
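The two waiting strategies can be sketched on a simulated clock. This is a hypothetical model, not the anv implementation: the fence type, function names and the fixed polling step are all assumptions. waitAll waits for each fence in turn against one absolute deadline; !waitAll polls every fence, checking each at least once even when the deadline has already passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence model on a simulated clock; not the anv API. */
typedef struct {
   uint64_t signal_time; /* the fence signals once the clock reaches this */
} sim_fence;

static bool fence_signaled(const sim_fence *f, uint64_t now)
{
   return now >= f->signal_time;
}

/* waitAll: wait for the fences one after another against a shared
 * absolute deadline. Returns false on timeout. */
static bool wait_all(const sim_fence *fences, size_t n, uint64_t *now,
                     uint64_t deadline)
{
   for (size_t i = 0; i < n; i++) {
      if (fence_signaled(&fences[i], *now))
         continue;
      if (fences[i].signal_time > deadline) {
         if (*now < deadline)
            *now = deadline; /* blocked until the timeout expired */
         return false;
      }
      *now = fences[i].signal_time; /* blocked until this fence signaled */
   }
   return true;
}

/* !waitAll: poll all fences until one signals or the deadline passes.
 * Every fence is checked at least once, even if the deadline has already
 * passed on entry. */
static bool wait_any(const sim_fence *fences, size_t n, uint64_t *now,
                     uint64_t deadline, uint64_t poll_step)
{
   assert(poll_step > 0); /* otherwise the simulated clock never advances */
   for (;;) {
      for (size_t i = 0; i < n; i++) {
         if (fence_signaled(&fences[i], *now))
            return true;
      }
      if (*now >= deadline)
         return false;
      *now += poll_step; /* stands in for a short sleep between polls */
   }
}
```

In this model, an effectively infinite wait can be expressed by passing (uint64_t)INT64_MAX as the deadline, in the spirit of the INT64_MAX note in the commit message.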
* nir: Rework lower_locals_to_regs to use deref instructions | Jason Ekstrand | 2018-06-22 | 1 | -2/+0
  This completely reworks the pass to support deref instructions and deletes support for the old deref chains.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel,ir3: Re-enable nir_opt_copy_prop_vars | Jason Ekstrand | 2018-06-22 | 1 | -1/+1
  Now that it's rewritten for deref instructions, we can turn it back on.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/blorp: Stop setting tex->texture/sampler | Jason Ekstrand | 2018-06-22 | 1 | -2/+0
  nir_tex_instr_create uses rzalloc, so these fields are already NULL.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/nir: Only lower load/store derefs | Jason Ekstrand | 2018-06-22 | 1 | -1/+1
  Everything else should already be handled.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/fs: Use image_deref intrinsics instead of image_var | Jason Ekstrand | 2018-06-22 | 3 | -75/+86
  Since we had to rewrite the deref walking loop anyway, I took the opportunity to make it a bit clearer and more efficient. In particular, in the AoA case, we will now emit one minmax instead of one per array level.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
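To illustrate the array-of-arrays point, here is a hypothetical plain-C sketch (not the NIR builder code in this commit) that flattens a multi-level image array index and clamps it once at the end. It assumes row-major layout and unsigned indices, so only the upper bound needs clamping; an index that is out of range at an inner level still lands on some valid element rather than outside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the NIR code: flatten an array-of-arrays index
 * (row-major, outermost level first) and clamp once at the end. For
 * image2D img[3][4], dims = {3, 4}. */
static unsigned flat_index_clamped(const unsigned *idx, const unsigned *dims,
                                   size_t levels)
{
   assert(levels > 0);
   unsigned flat = 0, total = 1;
   for (size_t i = 0; i < levels; i++) {
      flat = flat * dims[i] + idx[i];
      total *= dims[i];
   }
   /* A single min() instead of one per array level. */
   return flat < total ? flat : total - 1;
}
```

For img[3][4], the index [1][2] maps to element 6, and the out-of-range [5][0] is clamped to the last element, 11.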
* anv/pipeline: Convert apply_pipeline_layout to deref instructions | Jason Ekstrand | 2018-06-22 | 2 | -74/+78
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv/apply_pipeline_layout: Simplify extract_tex_src_plane | Jason Ekstrand | 2018-06-22 | 1 | -34/+12
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv/pipeline: Convert lower_multiview to deref instructions | Jason Ekstrand | 2018-06-22 | 1 | -12/+5
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv/pipeline: Convert YCbCr lowering to deref instructions | Jason Ekstrand | 2018-06-22 | 2 | -18/+22
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv/pipeline: Convert lower_input_attachments to deref instructions | Jason Ekstrand | 2018-06-22 | 2 | -18/+19
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv/pipeline: Do less deref instruction lowering | Jason Ekstrand | 2018-06-22 | 1 | -2/+3
  This commit removes most of the deref instruction lowering. Instead of lowering early, we only lower textures and images and we only do so right before any of the anv image lowering passes.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir,spirv: Rework function calls | Jason Ekstrand | 2018-06-22 | 1 | -2/+3
  This commit completely reworks function calls in NIR. Instead of having a set of variables for the parameters and return value, nir_call_instr now simply has a number of sources which get mapped to load_param intrinsics inside the functions. It's up to the client API to build an ABI on top of that. In SPIR-V, out parameters are handled by passing the result of a deref through as an SSA value and storing to it.
  The virtue of this approach can be seen in how much it allows us to delete from core NIR. In particular, nir_inline_functions gets halved and goes from a fairly difficult pass to understand in detail to almost trivial. It also simplifies spirv_to_nir somewhat because NIR functions never were a good fit for SPIR-V.
  Unfortunately, there is no good way to do this without a mega-commit. Core NIR and SPIR-V have to be changed at the same time. This also requires changes to anv and radv because nir_inline_functions couldn't handle deref instructions before this change and can't work without them after this change.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
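As a loose C analogy of the new calling model (hypothetical types, not NIR's data structures): a call carries a flat list of source values, the callee reads them back by index the way a load_param intrinsic would, and an out parameter is modeled as a pointer source, standing in for the deref result, that the callee stores through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a call's sources; not NIR types. */
typedef union {
   float f;
   float *ptr; /* models the SSA value of a deref, used for out params */
} call_src;

typedef struct {
   size_t num_params;
   const call_src *params;
} call_frame;

/* Analogue of load_param: fetch the call's source by index. */
static call_src load_param(const call_frame *frame, size_t index)
{
   assert(index < frame->num_params);
   return frame->params[index];
}

/* Callee computing out = a * b, with "out" passed as a pointer source. */
static void mul_callee(const call_frame *frame)
{
   float a = load_param(frame, 0).f;
   float b = load_param(frame, 1).f;
   float *out = load_param(frame, 2).ptr;
   *out = a * b;
}
```

Since there are no parameter or return variables to remap, inlining such a call mostly amounts to substituting each load_param with the corresponding call source, which is consistent with the commit's point about nir_inline_functions becoming almost trivial.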
* spirv: Use NIR per-member splitting | Jason Ekstrand | 2018-06-22 | 1 | -0/+6
  Before, we were doing structure splitting in spirv_to_nir. Unfortunately, this doesn't really work when you think about passing struct pointers into functions. Doing it later in NIR is a much better plan.
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move nir_lower_deref_instrs to right before locals_to_regs | Jason Ekstrand | 2018-06-22 | 2 | -2/+2
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel,ir3: Disable nir_opt_copy_prop_vars | Jason Ekstrand | 2018-06-22 | 1 | -1/+1
  This pass doesn't handle deref instructions yet. Making it handle both legacy derefs and deref instructions would be painful. Since it's not important for correctness, just disable it for now.
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* anv,i965,radv,st,ir3: Call nir_lower_deref_instrs | Jason Ekstrand | 2018-06-22 | 2 | -0/+4
  This inserts a call to nir_lower_deref_instrs at every call site of glsl_to_nir, spirv_to_nir, and prog_to_nir.
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Acked-by: Rob Clark <robdclark@gmail.com>
  Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Acked-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>