path: root/src/broadcom
Commit message (Collapse) | Author | Age | Files | Lines
* nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo | Rhys Perry | 2019-08-12 | 1 | -1/+1
| v2: add to series
| v3: update Makefile.sources
| v4: don't remove a comment and break statement
| v4: use nir_can_move_instr
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: use the GPU to record primitives written to transform feedback | Iago Toral Quiroga | 2019-08-08 | 1 | -0/+10
| We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive counts to a buffer, including the number of primitives written to transform feedback buffers, which will handle buffer overflow correctly.
|
| There are a couple of caveats with this:
|
| Primitive counters are reset when we emit a 'Tile Binning Mode Configuration' packet, which can happen in the middle of a primitives query, so we need to read the buffer when we submit a job and accumulate the counts in the context so we don't lose them.
|
| We also need to do the same when we switch primitive type during transform feedback so we can compute the correct number of recorded vertices from the number of primitives. This is necessary so we can provide an accurate vertex count for draw from transform feedback.
|
| v2:
| - When computing the number of vertices for a primitive, pass in the base primitive, since that is what the hardware will count.
| - No need to update primitive counts when switching primitive types if the base primitives are the same.
| - Log perf warning when mapping the primitive counts BO for readback (Eric).
| - Only emit the primitive counts packet once at job end (Eric).
| - Use u_upload mechanism for the primitive counts buffer (Eric).
| - Use the XML to generate indices into the primitive counters buffer (Eric).
|
| Fixes piglit tests:
| spec/ext_transform_feedback/overflow-edge-cases
| spec/ext_transform_feedback/query-primitives_written-bufferrange
| spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
| spec/ext_transform_feedback/change-size base-shrink
| spec/ext_transform_feedback/change-size base-grow
| spec/ext_transform_feedback/change-size offset-shrink
| spec/ext_transform_feedback/change-size offset-grow
| spec/ext_transform_feedback/change-size range-shrink
| spec/ext_transform_feedback/change-size range-grow
| spec/ext_transform_feedback/intervening-read prims-written
|
| Reviewed-by: Eric Anholt <[email protected]>
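Editor's sketch for the commit above: a minimal Python model of the accumulation it describes, assuming the hardware counters reset between jobs and that vertices are derived from the base primitive. `TfCounter`, `on_job_submit` and the primitive names are hypothetical illustrations, not v3d driver code.

```python
# Hypothetical model of the bookkeeping; not the v3d driver's data structures.
VERTS_PER_BASE_PRIM = {"points": 1, "lines": 2, "triangles": 3}


class TfCounter:
    def __init__(self):
        self.base_prim = None  # base primitive currently being counted
        self.prims = 0         # primitives accumulated for that base primitive
        self.vertices = 0      # vertices folded in from earlier base primitives

    def on_job_submit(self, base_prim, prims_written):
        """Fold one readback of the (reset-prone) hardware counter into the context."""
        if self.base_prim is not None and base_prim != self.base_prim:
            # Base primitive changed: convert what we have to vertices first.
            self.vertices += self.prims * VERTS_PER_BASE_PRIM[self.base_prim]
            self.prims = 0
        self.base_prim = base_prim
        self.prims += prims_written

    def vertex_count(self):
        """Vertex count usable for a draw from transform feedback."""
        pending = 0
        if self.base_prim is not None:
            pending = self.prims * VERTS_PER_BASE_PRIM[self.base_prim]
        return self.vertices + pending
```

Two jobs with the same base primitive simply add up; a switch to a different base primitive converts the running total to vertices before counting continues.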
* v3d: add header guards in v3d_packet_helpers.h | Iago Toral Quiroga | 2019-08-08 | 1 | -0/+4
| Reviewed-by: Eric Anholt <[email protected]>
* meson: replace libmesa_util with idep_mesautil | Eric Engestrom | 2019-08-03 | 2 | -3/+4
| This automates the include_directories and dependencies tracking so that all users of libmesa_util don't need to add them manually.
|
| Next commit will remove the ones that were only added for that reason.
|
| Signed-off-by: Eric Engestrom <[email protected]>
| Acked-by: Eric Anholt <[email protected]>
| Tested-by: Vinson Lee <[email protected]>
* tree-wide: replace MAYBE_UNUSED with ASSERTED | Eric Engestrom | 2019-07-31 | 4 | -5/+5
| Suggested-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* v3d: Introduce a DRM shim for calling out to the simulator. | Eric Anholt | 2019-07-25 | 7 | -0/+779
| The goal is to enable testing of parts of drivers without depending on any particular kernel version or hardware being present. Simply set LD_PRELOAD=$PREFIX/lib/libv3d_drm_shim.so in your environment, and we'll fake a /dev/dri/renderD128 (or whatever the next available node is) using v3dv3. That node can then be used with the surfaceless or gbm EGL platforms.
|
| Acked-by: Iago Toral Quiroga <[email protected]>
* v3d: Avoid scheduling an instruction that stalls waiting for SFU retval | Jose Maria Casanova Crespo | 2019-07-22 | 1 | -4/+23
| If we detect that a scheduling candidate will stall because one of its register sources is written by the SFU unit in the previous instruction, we reduce its priority so that any non-stalling operation is chosen instead.
|
| The latency of SFU operations is defined as 2, so they are scheduled earlier when other candidates have the same priority.
|
| Finally, we won't merge an instruction that stalls into a previously chosen one, as it would have to wait an extra cycle for the result of the previous instruction.
|
| Although shader-db results show that instructions are hurt, with an increase of 0.35%, the sum of instructions + stalls is reduced by 0.52% and the total of sfu-stalls is reduced by 63.51%.
|
| It also implies a small increase in the max-temps metric, because SFU operations are scheduled earlier.
|
| total instructions in shared programs: 9102719 -> 9117851 (0.17%)
| instructions in affected programs: 4324628 -> 4339760 (0.35%)
| helped: 4162
| HURT: 12128
| helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
| helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
| HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1
| HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
| 95% mean confidence interval for instructions value: 0.90 0.96
| 95% mean confidence interval for instructions %-change: 0.47% 0.50%
| Instructions are HURT.
|
| total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
| max-temps in affected programs: 4730 -> 4814 (1.78%)
| helped: 61
| HURT: 134
| helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
| helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
| HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1
| HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
| 95% mean confidence interval for max-temps value: 0.28 0.58
| 95% mean confidence interval for max-temps %-change: 1.80% 3.52%
| Max-temps are HURT.
|
| total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
| sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
| helped: 25882
| HURT: 0
| helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
| helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
| 95% mean confidence interval for sfu-stalls value: -2.47 -2.42
| 95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
| Sfu-stalls are helped.
|
| total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
| inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
| helped: 22728
| HURT: 855
| helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
| helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
| HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1
| HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
| 95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
| 95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
| Inst-and-stalls are helped.
|
| v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
|
| Reviewed-by: Eric Anholt <[email protected]>
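Editor's sketch for the commit above: one way to picture the priority reduction, assuming a candidate is represented as a dictionary with `srcs`, `dst`, `is_sfu` and `priority` fields. These names and the tuple-based ordering are illustrative assumptions; the real scheduler works on QPU instructions and is more involved.

```python
def reads_prev_sfu_result(candidate, prev):
    """True if the candidate reads the register the previous SFU instruction writes."""
    return prev is not None and prev["is_sfu"] and prev["dst"] in candidate["srcs"]


def choose_instruction(candidates, prev):
    """Pick the next instruction, preferring any candidate that will not stall.

    Non-stalling candidates always sort above stalling ones; ties are broken
    by the candidate's own priority.
    """
    def key(candidate):
        stalls = reads_prev_sfu_result(candidate, prev)
        return (not stalls, candidate["priority"])

    return max(candidates, key=key)
```

A stalling candidate is still chosen when it is the only one available, which matches the commit's description of lowering priority rather than forbidding the instruction.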
* v3d: add shader-db stat to count SFU stalls | Jose Maria Casanova Crespo | 2019-07-22 | 5 | -14/+74
| SFU operations have a latency of 2 cycles, so if their result is used in the cycle immediately following the SFU instruction, the GPU stalls for an extra cycle until the result is available.
|
| This adds the number of stalls, and the sum of instructions + stalls, to the shader-db debug output so that we can evaluate optimizations that schedule instructions to avoid generating sfu-stalls.
|
| v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
|
| Reviewed-by: Eric Anholt <[email protected]>
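Editor's sketch for the commit above: a simplified count of the stalls it describes, assuming a linear instruction list in which each entry records whether it is an SFU operation, its destination and its sources. The dictionary layout and function names are hypothetical, not the driver's.

```python
def count_sfu_stalls(instrs):
    """Count instructions that read an SFU result in the very next cycle."""
    stalls = 0
    for prev, cur in zip(instrs, instrs[1:]):
        if prev["is_sfu"] and prev["dst"] in cur["srcs"]:
            stalls += 1
    return stalls


def inst_and_stalls(instrs):
    """The combined metric: instruction count plus stall cycles."""
    return len(instrs) + count_sfu_stalls(instrs)
```

Moving an independent instruction between the SFU operation and its consumer removes the stall without changing the instruction count, which is why the combined metric is the more useful one for judging scheduling changes.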
* v3d: Use nir_shader_lower_instructions() for txf_ms lowering. | Eric Anholt | 2019-07-18 | 1 | -26/+16
| Cuts out a bunch of boilerplate.
|
| Reviewed-by: Iago Toral Quiroga <[email protected]>
* v3d: Fix assertion failures in debug builds. | Eric Anholt | 2019-07-18 | 1 | -0/+2
| nir_lower_io leaves around deref_var instructions after lowering away deref intrinsics. This ends up breaking validation after v3d_nir_lower_io removes variables not actually being stored by the shader's store_output()s.
|
| Reviewed-by: Iago Toral Quiroga <[email protected]>
* v3d: emit correct lowering for logic operations with MSAA render targets | Iago Toral Quiroga | 2019-07-18 | 1 | -5/+54
| v2:
| - Drop the writemask from the per-sample color intrinsic (Eric)
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: handle nir_intrinsic_store_tlb_sample_color_v3d | Iago Toral Quiroga | 2019-07-18 | 1 | -20/+44
| v2:
| - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric).
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: implement per-sample tlb color writes | Iago Toral Quiroga | 2019-07-18 | 1 | -30/+44
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: refactor the tlb color write code | Iago Toral Quiroga | 2019-07-18 | 1 | -49/+39
| We want to split the tlb specifier setup from the color writes, because when we implement per-sample color writes we want to do the latter for all the samples, but the former only once.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: move tlb color write emission to a helper function | Iago Toral Quiroga | 2019-07-18 | 1 | -95/+99
| We will soon be adding per-sample color writes, which means additional complexity and more indentation (we will need another loop to emit the writes for each individual sample), so this will help keep things simple and a bit more readable.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: implement per-sample tlb color reads | Iago Toral Quiroga | 2019-07-18 | 1 | -39/+52
| Reviewed-by: Eric Anholt <[email protected]>
* broadcom: Move v3d_get_device_info to common | Andreas Bergmeier | 2019-07-17 | 3 | -1/+86
| In common, the implementation can also be used for Vulkan.
* v3d: use inc/dec tmu operation with image atomic sub/add of 1 | Alejandro Piñeiro | 2019-07-12 | 1 | -5/+11
| This allows removing a mov of 1/-1, as it is implicit in the operation.
|
| As with atomic inc/dec/add, the usual shader-db set doesn't include any GLES shader using it. Using the vk-gl-cts shaders as a workaround, we get this:
|
| total instructions in shared programs: 1217013 -> 1217006 (<.01%)
| instructions in affected programs: 53 -> 46 (-13.21%)
| helped: 2
| HURT: 0
|
| One of the helped shaders went from 40 to 34 instructions.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: refactor some code from v3d40_vir_emit_image_load_store | Alejandro Piñeiro | 2019-07-12 | 1 | -33/+29
| The code is moved to a new auxiliary method, v3d40_image_load_store_tmu_op, equivalent to the nir_to_nir v3d_general_tmu_op, to clean up a little.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: use inc/dec tmu operation with atomic sub/add of 1 | Alejandro Piñeiro | 2019-07-12 | 2 | -6/+30
| Among other things, this avoids the need to load 1/-1 constants (so one less operation).
|
| The removed comment suggests the option of adding support in NIR for inc/dec. Intel just uses an auxiliary method to get which hw operation is needed, so no lowering is needed. At the same time, the change being so small, it seems unreasonable to try to add a general one in NIR itself. It is easier to just adapt the method here (that is what the patch does right now).
|
| It is worth noting that we are not getting any change in shader-db stats because all those methods are used in the usual shader-db set with shaders needing GLSL > 4.2. In general there aren't too many GLSL ES 3.1 tests. As an alternative, we captured the GLES3/GLSL31/GLS32 used on vk-gl-cts, even if that is not a real-life usage of shaders. With those we get the following:
|
| total instructions in shared programs: 1217022 -> 1217013 (<.01%)
| instructions in affected programs: 117 -> 108 (-7.69%)
| helped: 6
| HURT: 0
| helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
| helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
| 95% mean confidence interval for instructions value: -2.07 -0.93
| 95% mean confidence interval for instructions %-change: -10.54% -5.64%
| Instructions are helped.
|
| Note that the number of shaders helped is really low because the vk-gl-cts tests using AtomicInc/Dec/Add are mostly compute shaders. Although there is currently a branch with CS support, the usual practice is to gather the stats against master.
|
| Reviewed-by: Eric Anholt <[email protected]>
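Editor's sketch for the commit above: the selection it describes, reduced to a small helper. The function name echoes the get_op_for_atomic_add helper mentioned elsewhere in this log, but the string operation names and the `None` convention for a non-constant operand are assumptions made for illustration.

```python
def op_for_atomic_add(const_operand):
    """Pick an atomic operation for 'atomic add'.

    const_operand is the operand's value when it is a compile-time constant,
    or None when it is not known at compile time.
    """
    if const_operand == 1:
        return "INC"   # no need to load the constant 1
    if const_operand == -1:
        return "DEC"   # no need to load the constant -1
    return "ADD"       # general case: the operand must be supplied
```

Only the two constant cases save an instruction; every other operand, constant or not, still goes through the general add.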
* v3d: remove redefinition of tmu operations on nir_to_vir | Alejandro Piñeiro | 2019-07-12 | 1 | -38/+21
| They are already defined, although in a slightly different format, in the generated packet headers, so it was necessary to change how they are used in nir_to_vir.
|
| In addition to allowing the removal of some duplicated headers, this will allow defining just one get_op_for_atomic_add aux method later to support using inc/dec instead of add of 1/-1.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: tweak initial comment on pack generator script | Alejandro Piñeiro | 2019-07-12 | 1 | -1/+1
| The files it mentions as references have slightly different names.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: remove unused definitions | Iago Toral Quiroga | 2019-07-12 | 1 | -7/+0
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: move implementation of some intrinsics to separate helpers | Iago Toral Quiroga | 2019-07-12 | 1 | -78/+90
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: emit correct lowering for logic ops with RGB10A2 render targets | Iago Toral Quiroga | 2019-07-12 | 1 | -12/+64
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: emit correct lowering for logic ops with integer render targets | Iago Toral Quiroga | 2019-07-12 | 2 | -9/+47
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: add lowering for OpenGL logic operations | Iago Toral Quiroga | 2019-07-12 | 4 | -0/+279
| This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware.
|
| The pass doesn't handle MSAA targets yet.
|
| Fixes the following piglit tests:
| spec/!opengl 1.0/gl-1.0-logicop/*
| spec/!opengl 1.1/gl-1.1-xor
| spec/!opengl 1.1/gl-1.1-xor-copypixels
|
| It also fixes text cursor rendering in LibreOffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation.
|
| v2: fix checks for allowed variable location and maximum render target (Eric)
|
| Reviewed-by: Eric Anholt <[email protected]>
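Editor's sketch for the commit above: the sixteen OpenGL logic operations written as bitwise functions of the source color `s` and the destination color `d`. It assumes an 8-bit channel for simplicity, and the `needs_tlb_read` helper is a hypothetical illustration of the "read from the TLB if needed" idea, not the pass's actual code.

```python
MASK = 0xFF  # assumption: one 8-bit unorm channel, for illustration only

LOGIC_OPS = {
    "GL_CLEAR":         lambda s, d: 0,
    "GL_AND":           lambda s, d: s & d,
    "GL_AND_REVERSE":   lambda s, d: s & ~d,
    "GL_COPY":          lambda s, d: s,
    "GL_AND_INVERTED":  lambda s, d: ~s & d,
    "GL_NOOP":          lambda s, d: d,
    "GL_XOR":           lambda s, d: s ^ d,
    "GL_OR":            lambda s, d: s | d,
    "GL_NOR":           lambda s, d: ~(s | d),
    "GL_EQUIV":         lambda s, d: ~(s ^ d),
    "GL_INVERT":        lambda s, d: ~d,
    "GL_OR_REVERSE":    lambda s, d: s | ~d,
    "GL_COPY_INVERTED": lambda s, d: ~s,
    "GL_OR_INVERTED":   lambda s, d: ~s | d,
    "GL_NAND":          lambda s, d: ~(s & d),
    "GL_SET":           lambda s, d: MASK,
}

# Operations whose result ignores the destination value.
NO_DST_READ = {"GL_CLEAR", "GL_COPY", "GL_COPY_INVERTED", "GL_SET"}


def apply_logic_op(op, src, dst=0):
    """Combine a source and destination channel value with the given logic op."""
    return LOGIC_OPS[op](src, dst) & MASK


def needs_tlb_read(op):
    """Whether the destination color must be read back before blending."""
    return op not in NO_DST_READ
```

XOR is the case the commit calls out for the LibreOffice text cursor: applying it twice with the same source restores the original destination value.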
* v3d: acquire scoreboard lock before first tlb read | Iago Toral Quiroga | 2019-07-12 | 3 | -0/+34
| Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow.
|
| Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow.
|
| To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch.
|
| v2: change the solution so it is expected to work in more scenarios (Eric).
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: implement tile buffer color read intrinsic | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+100
| We will be emitting this intrinsic to signal TLB color loads when we implement OpenGL logic operations, where we need to blend the fragment shader color output with the existing color in the render target.
|
| Per-sample TLB reads are not supported yet.
|
| v2: fix the offset into the color_reads array (Eric).
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix size of color_reads and sample_colors arrays | Iago Toral Quiroga | 2019-07-12 | 1 | -2/+2
| We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.
|
| v2: we want to store each color component separately, so scale by 4 too.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: add color formats and swizzles to the fragment shader key | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+9
| We are going to need these very soon to emit correct reads from the tlb to implement logic operations.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: add helpers to emit ldtlb and ldtlbu signals | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+24
| The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence.
|
| Reviewed-by: Eric Anholt <[email protected]>
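Editor's sketch for the commit above: the signal choice it describes for a sequence of TLB reads. The function name and the `default_config_ok` flag are hypothetical; only the rule itself comes from the commit message.

```python
def tlb_read_signals(num_reads, default_config_ok=False):
    """Return the signal to use for each read in one TLB read sequence.

    The first read carries the read specifier via ldtlbu, unless the default
    configuration is already valid; every later read uses ldtlb.
    """
    signals = []
    for i in range(num_reads):
        is_first = i == 0
        if is_first and not default_config_ok:
            signals.append("ldtlbu")
        else:
            signals.append("ldtlb")
    return signals
```

Because ldtlbu consumes an implicit uniform, at most one read per sequence pays for the specifier.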
* v3d: handle tlb read dependency tracking as if they were writes | Iago Toral Quiroga | 2019-07-12 | 1 | -1/+1
| Tile buffer reads are emitted as ordered sequences and cannot be reordered.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+3
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: tlb loads cannot be removed | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+2
| Loads from the tile buffer are emitted in ordered sequences so we cannot eliminate or reorder any of them.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: the ldtlbu signal reads an implicit uniform | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+1
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: handle ldtlb and ldtlbu signals during disassembly | Iago Toral Quiroga | 2019-07-12 | 1 | -0/+2
| We already have code to print these signals, but the early return in the code that checks whether any signals are present was missing the checks for them, so it would skip printing them unless they were paired with other signals.
|
| Reviewed-by: Eric Anholt <[email protected]>
* nir: Add lower_rotate flag and set to true in all drivers | Sagar Ghuge | 2019-07-01 | 1 | -0/+1
| Signed-off-by: Sagar Ghuge <[email protected]>
| Suggested-by: Matt Turner <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. | Daniel Schürmann | 2019-06-24 | 1 | -1/+0
| That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions.
|
| This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'.
|
| Tested-by: Eric Anholt <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
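Editor's sketch for the commit above: what "the five least significant bits provide the values of 'bits' and 'offset'" means for 32-bit operands. This is a reading of the commit message, not a copy of NIR's opcode definitions; the Python function names are made up and only the unsigned extract is shown.

```python
U32 = 0xFFFFFFFF


def bfm(bits, offset):
    """Bitfield mask: 'bits' ones starting at 'offset', both taken modulo 32."""
    bits &= 0x1F
    offset &= 0x1F
    return (((1 << bits) - 1) << offset) & U32


def ubfe(base, offset, bits):
    """Unsigned bitfield extract using only the low five bits of offset and bits."""
    offset &= 0x1F
    bits &= 0x1F
    return ((base & U32) >> offset) & ((1 << bits) - 1)
```

One consequence of masking to five bits is that a width of 32 behaves like a width of 0, so a full-width field cannot be expressed directly with these operations.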
* v3d: implement simultaneous peripheral access exceptions for V3D 4.1+ | Iago Toral Quiroga | 2019-06-18 | 3 | -6/+43
| Shader-db results:
|
| total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
| instructions in affected programs: 1752873 -> 1738042 (-0.85%)
| helped: 7076
| HURT: 478
| helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
| helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
| HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1
| HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
| 95% mean confidence interval for instructions value: -2.00 -1.92
| 95% mean confidence interval for instructions %-change: -1.58% -1.50%
| Instructions are helped.
|
| total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
| max-temps in affected programs: 1025 -> 979 (-4.49%)
| helped: 47
| HURT: 2
| helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
| helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
| HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
| HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
| 95% mean confidence interval for max-temps value: -1.06 -0.82
| 95% mean confidence interval for max-temps %-change: -8.89% -5.49%
| Max-temps are helped.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: do not setup execute flags for else block in uniform control flow | Iago Toral Quiroga | 2019-06-14 | 1 | -1/+0
| Either all channels executed the 'then' block, in which case all channels will directly jump to the 'endif' block at the end of the 'then' block, or all channels execute the 'else' block (so no execution masking is necessary).
|
| Shader-db results:
|
| total instructions in shared programs: 9119238 -> 9117550 (-0.02%)
| instructions in affected programs: 401252 -> 399564 (-0.42%)
| helped: 855
| HURT: 77
|
| total uniforms in shared programs: 3022622 -> 3022605 (<.01%)
| uniforms in affected programs: 3566 -> 3549 (-0.48%)
| helped: 17
| HURT: 0
|
| total max-temps in shared programs: 1327762 -> 1327774 (<.01%)
| max-temps in affected programs: 619 -> 631 (1.94%)
| helped: 2
| HURT: 15
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix checking twice auf flag | Alejandro Piñeiro | 2019-06-13 | 1 | -1/+1
| This seems to be a copy-and-paste error; the code should check for auf/muf.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902
| Fixes: 8f065596d22ab000c53f "v3d: Add an optimization pass for redundant flags updates."
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: don't emit point coordinates varyings if the FS doesn't read them | Iago Toral Quiroga | 2019-06-07 | 3 | -5/+22
| We still need to emit them in V3D 3.x since there is no mechanism to disable them there.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: add a helper to track variables that need point coordinates | Iago Toral Quiroga | 2019-06-07 | 1 | -5/+10
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: fix scheduling dependency tracking for ALU with small immediates | Iago Toral Quiroga | 2019-06-06 | 1 | -1/+4
| We were not accounting for small immediates in the B mux, so the scheduler was interpreting these as regular register file accesses, which could lead to additional (incorrect) write-read dependencies.
|
| Shader-db changes:
|
| total instructions in shared programs: 9163664 -> 9137263 (-0.29%)
| instructions in affected programs: 3931035 -> 3904634 (-0.67%)
| helped: 12457
| HURT: 2563
|
| total max-temps in shared programs: 1325787 -> 1325597 (-0.01%)
| max-temps in affected programs: 5746 -> 5556 (-3.31%)
| helped: 186
| HURT: 16
| helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1
| helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28%
| HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1
| HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88%
| 95% mean confidence interval for max-temps value: -1.04 -0.84
| 95% mean confidence interval for max-temps %-change: -4.16% -3.07%
| Max-temps are helped.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: Enable NIR's lower_fmod option. | Kenneth Graunke | 2019-06-05 | 1 | -0/+1
| Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers.
|
| Reviewed-by: Marek Olšák <[email protected]>
| Acked-by: Eric Anholt <[email protected]>
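Editor's sketch for the commit above: the floor-based expansion of `mod` that the MOD_TO_FLOOR name refers to, following the GLSL definition mod(x, y) = x - y * floor(x / y). The Python function is illustrative only and does not model how NIR implements the lowering.

```python
import math


def lowered_fmod(x, y):
    """GLSL-style mod expressed with a divide, a floor, a multiply and a subtract."""
    return x - y * math.floor(x / y)
```

With this definition the result takes the sign of `y`, which differs from C's `fmod` for negative `x`.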
* nir: Drop imov/fmov in favor of one mov instruction | Jason Ekstrand | 2019-05-24 | 1 | -2/+1
| The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore.
|
| Reviewed-by: Kristian H. Kristensen <[email protected]>
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Vasily Khoruzhick <[email protected]>
| Acked-by: Rob Clark <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalar | Jonathan Marek | 2019-05-10 | 1 | -1/+1
| This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only.
|
| Signed-off-by: Jonathan Marek <[email protected]>
| Reviewed-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir: Initialize lower_flrp_progress everywhere | Ian Romanick | 2019-05-09 | 1 | -1/+1
| I don't know why I thought NIR_PASS always set the progress variable. Derp.
|
| Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Emil Velikov <[email protected]>
| Coverity CID: 1444996
| Coverity CID: 1444995
| Coverity CID: 1444994
| Coverity CID: 1444993
| Coverity CID: 1444991
| Coverity CID: 1444989
* nir: Use the flrp lowering pass instead of nir_opt_algebraic | Ian Romanick | 2019-05-06 | 1 | -0/+23
| I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :(
|
| i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method.
|
| No changes on any other Intel platforms.
|
| v2: Add panfrost changes.
|
| Iron Lake and GM45 had similar results. (Iron Lake shown)
| total cycles in shared programs: 188647754 -> 188647748 (<.01%)
| cycles in affected programs: 5096 -> 5090 (-0.12%)
| helped: 3
| HURT: 0
| helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
| helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
|
| Reviewed-by: Matt Turner <[email protected]>
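Editor's sketch for the commit above: the a(1-c)+bc expansion of flrp quoted in the message, next to the shorter a + c(b - a) form for comparison. Both function names are hypothetical, and the second form is a common algebraic rearrangement rather than something the commit message states the pass emits.

```python
def flrp_two_products(a, b, c):
    """Linear interpolation as a(1-c) + bc, the form quoted in the commit message."""
    return a * (1.0 - c) + b * c


def flrp_one_product(a, b, c):
    """Algebraically equivalent form with one multiply; rounding can differ."""
    return a + c * (b - a)
```

The two forms agree in exact arithmetic, but in floating point only the first is guaranteed to return exactly `b` when `c` is 1, which is one reason a driver may care which expansion is used.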