path: root/src
Commit message (Author, Age, Files, Lines)
* genxml: Consistently use a numeric "MOCS" field (Kenneth Graunke, 2018-12-14, 16 files, -260/+177)
|
| When we first started using genxml, we decided to represent MOCS as an actual structure, and pack values. However, in many places, it was more convenient to use a numeric value rather than treating it as a struct, so we added secondary setters in a bunch of places as well.
|
| We were not entirely consistent, either. Some places only had one. Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens only had the struct-based setters. The names were sometimes "Constant Buffer Object Control State" instead of "Memory", making it harder to find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer packet...which is a bit redundant.
|
| On modern hardware, MOCS is simply an index into a table, but we were still carrying around the structure with an "Index to MOCS Table" field, in addition to the direct numeric setters. This is clunky - we really just want a number on new hardware.
|
| This patch eliminates the struct-based setters, and makes the numeric setters be consistently called "MOCS". We leave the struct definition around on Gen7-8 for reference purposes, but it is unused.
|
| v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9
|
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: fix opt_if_loop_last_continue() (Timothy Arceri, 2018-12-14, 1 file, -2/+6)
|
| The pass did not correctly handle loops ending in:
|
|     if ssa_7 {
|         block block_8:
|         /* preds: block_7 */
|         continue
|         /* succs: block_1 */
|     } else {
|         block block_9:
|         /* preds: block_7 */
|         break
|         /* succs: block_11 */
|     }
|
| The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break.
|
| Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()")
| Reviewed-by: Dave Airlie <[email protected]>
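The failure mode described above can be modelled in a few lines. This is only a sketch under stated assumptions: the function name and the string encoding of "how a branch ends" are hypothetical, and it is not the logic in the NIR pass itself. It shows the rule the message implies: code after the if may only be moved into the non-continue branch when that branch does not itself end in a jump.

```python
def branch_to_receive_trailing_code(then_last, else_last):
    """Pick the branch that trailing loop code may be moved into.

    then_last / else_last describe how each branch of the final `if` ends:
    "continue", "break", or None when the branch falls through.
    Returns "then", "else", or None when the loop must be left alone.
    (Hypothetical model; not the actual NIR implementation.)
    """
    if then_last == "continue" and else_last is None:
        return "else"
    if else_last == "continue" and then_last is None:
        return "then"
    # Covers the continue/break case from the commit message: anything
    # appended to a branch ending in `break` would land after the jump.
    return None
```

With this model, the `continue`/`break` shape from the message yields `None`, so nothing is inserted after the break.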
* freedreno/a6xx: fix resource_copy_region() (Rob Clark, 2018-12-13, 1 file, -9/+24)
|
| pctx->resource_copy_region() needs to fall back to sw copy for non-renderable formats. But previously, for things that we could not use the blitter for, we would fall back to 3d, which won't work if 3d can't render to the dst format either.
|
| Instead rework things to fall back to fd_resource_copy_region(), which will try the 3d core and then fall back to memcpy().
|
| Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
|
| Signed-off-by: Rob Clark <[email protected]>
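The fallback order described above (blitter, then 3d core, then memcpy) can be sketched as follows. The function name and the format sets are placeholders invented for illustration; which formats each path really supports is not stated in the log.

```python
def pick_copy_path(fmt, blitter_formats, renderable_formats):
    """Return the first copy path able to handle `fmt`.

    blitter_formats / renderable_formats are hypothetical sets standing in
    for "the blitter supports this" and "the 3d core can render to this".
    """
    if fmt in blitter_formats:
        return "blitter"
    if fmt in renderable_formats:
        return "3d"
    # Last resort: a CPU copy works regardless of renderability.
    return "memcpy"
```

The point of the ordering is that a format rejected by both hardware paths still ends up somewhere that works, rather than being handed to a 3d path that cannot render to it.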
* freedreno: move fd_resource_copy_region() (Rob Clark, 2018-12-13, 3 files, -62/+73)
|
| Code-motion prep for next patch.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: more blitter fixes (Rob Clark, 2018-12-13, 1 file, -10/+22)
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers (Rob Clark, 2018-12-13, 8 files, -30/+39)
|
| Signed-off-by: Rob Clark <[email protected]>
* gallium/aux: add is_unorm() helper (Rob Clark, 2018-12-13, 2 files, -0/+24)
|
| We already had one for is_snorm() but not unorm.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix blitter crash (Rob Clark, 2018-12-13, 1 file, -0/+17)
|
| Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
|
| Also fixes gpu hangs with some formats that are supported, but for which we don't know what internal format to use for the blitter, for example dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't remove unused input components (Rob Clark, 2018-12-13, 1 file, -1/+7)
|
| Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix crash (Rob Clark, 2018-12-13, 1 file, -14/+8)
|
| Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z
|
| Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: also set DUMP flag on shaders (Rob Clark, 2018-12-13, 5 files, -20/+22)
|
| If we emit a shader as a pointer to a GEM object, also set the RELOC_DUMP flag as a hint to the kernel that this is a useful buffer to snapshot for debug dumps.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: debug GEM obj names (Rob Clark, 2018-12-13, 13 files, -21/+91)
|
| With a recent enough kernel, set debug names for GEM BOs, which will show up in $debugfs/gem
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/drm: sync uapi and enable softpin (Rob Clark, 2018-12-13, 6 files, -25/+30)
|
| Pull in updated UAPI and use kernel API version to enable softpin. Since the MSM_SUBMIT_BO_DUMP flag was added at the same time, use that to signal to the kernel that cmdstream buffers are useful to dump for debugging/cmdstream-traces.
|
| Signed-off-by: Rob Clark <[email protected]>
* nir: Move intel's half-float image store lowering to nir_format.h. (Eric Anholt, 2018-12-13, 2 files, -8/+15)
|
| I needed the same function for v3d. This was originally in d3e046e76c06 ("nir: Pull some of intel's image load/store format conversion to nir_format.h") before we made a mistake about simplifying the function.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "intel: Simplify the half-float packing in image load/store lowering." (Eric Anholt, 2018-12-13, 1 file, -2/+8)
|
| This reverts commit 06fbcd2cd5cc5702c9039c26d20082a99bc157bf.
|
| nir_pack_half_2x16_split *isn't* vectorizable, it's 1-component only, thus why we had this split-scalar code in the first place.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
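To illustrate why a 1-component pack operation forces "split-scalar" code, here is a small sketch. It assumes the first operand lands in the low 16 bits and the second in the high 16 bits; the helper names are hypothetical Python stand-ins, not the NIR builder API. Packing a four-channel color takes two separate scalar calls, one per channel pair.

```python
import struct

def pack_half_2x16_split(x, y):
    """Scalar pack: two floats -> one 32-bit word of two IEEE half floats.

    Assumption for this sketch: x goes in the low 16 bits, y in the high 16.
    """
    lo, = struct.unpack("<H", struct.pack("<e", x))
    hi, = struct.unpack("<H", struct.pack("<e", y))
    return lo | (hi << 16)

def pack_rgba16f(color):
    """Pack an RGBA color into two words, one scalar pack per channel pair."""
    r, g, b, a = color
    return [pack_half_2x16_split(r, g), pack_half_2x16_split(b, a)]
```

For example, `pack_half_2x16_split(1.0, 2.0)` combines the half encodings 0x3C00 and 0x4000 into a single word.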
* nir: Print the format of image variables. (Eric Anholt, 2018-12-13, 1 file, -0/+47)
|
| This helps a lot when debugging image load/store lowering on large testcases. Unfortunately the Mesa enum name stuff is under src/mesa and we can't get at it from the compiler.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/st: Expose compute shaders when NIR support is advertised. (Eric Anholt, 2018-12-13, 2 files, -8/+14)
|
| We have a NIR path, and V3D doesn't have TGSI input for compute (only what TTN can handle for the various gallium-internal shaders).
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* radv/xfb: fix counter buffer bounds checks. (Dave Airlie, 2018-12-13, 1 file, -2/+2)
|
| If we gave this function 0 counter buffers, we'd still try and access pCounterBuffers[0] as this check was incorrect.
|
| Fixes crash with ext_transform_feedback-pipeline-basic-primgen on zink on radv.
|
| Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
| Signed-off-by: Dave Airlie <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
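The log does not show the corrected check, so the following is only a guess at its general shape: a range test that maps a transform-feedback slot to an index into the counter-buffer array and rejects everything when the count is zero. The function and parameter names are hypothetical.

```python
def counter_buffer_index(slot, first_counter_buffer, counter_buffer_count):
    """Map an xfb slot to an index into the counter-buffer array.

    Returns None when the slot has no counter buffer, which includes every
    slot when counter_buffer_count is 0. (Illustrative model only.)
    """
    idx = slot - first_counter_buffer
    if 0 <= idx < counter_buffer_count:
        return idx
    return None
```

With a count of zero the half-open range is empty, so element 0 of the array is never touched.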
* i965: Enable nir_opt_idiv_const for 32 and 64-bit integers (Jason Ekstrand, 2018-12-13, 1 file, -1/+3)
|
| The pass should work for all bit sizes but it's less clear that the extra instructions are worth it on small integers. Also, the hardware doesn't do mul_high on anything other than 32-bit integers and, absent any decent mechanism for testing the pass on 8 and 16-bit types, it's probably best to just leave it disabled for now.
|
| Shader-db results on Sky Lake:
|
|     total instructions in shared programs: 15105795 -> 15111403 (0.04%)
|     instructions in affected programs: 72774 -> 78382 (7.71%)
|     helped: 0
|     HURT: 265
|
| Note that hurt here actually means helped because we're getting rid of integer quotient operations (which are a send on some platforms!) and replacing them with fairly cheap ALU ops.
|
| Reviewed-by: Ian Romanick [email protected]
* i965/vec4: Implement nir_op_uadd_sat (Jason Ekstrand, 2018-12-13, 1 file, -0/+6)
|
| Reviewed-by: Ian Romanick [email protected]
* i965/fs: Implement nir_op_uadd_sat (Ian Romanick, 2018-12-13, 1 file, -0/+5)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a pass for lowering integer division by constants (Jason Ekstrand, 2018-12-13, 4 files, -0/+219)
|
| It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for the easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize.
|
| Reviewed-by: Ian Romanick [email protected]
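As a worked illustration of the "multiply, add, and shifts" idea for udiv, here is a minimal sketch of one textbook formulation (the round-up multiplier method of Granlund and Montgomery). It is an assumption that this variant is representative; the pass in this commit may compute its constants differently, and the function names below are hypothetical.

```python
def udiv_magic(d, bits=32):
    """Return (multiplier, shift1, shift2) for unsigned division by constant d."""
    assert 1 <= d < (1 << bits)
    l = (d - 1).bit_length()                      # ceil(log2(d))
    m = ((1 << bits) * ((1 << l) - d)) // d + 1   # fits in `bits` bits
    return m, min(l, 1), max(l - 1, 0)

def udiv_by_const(n, d, bits=32):
    """Compute n // d for a `bits`-wide unsigned n without a divide on n."""
    m, sh1, sh2 = udiv_magic(d, bits)
    t = (m * n) >> bits                           # high half of the product (umul_high)
    return (t + ((n - t) >> sh1)) >> sh2
```

The only operation on `n` that is wider than `bits` is the high half of the product, which is why a `umul_high`-style opcode matters for this kind of lowering.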
* nir: Add a saturated unsigned integer add opcode (Ian Romanick, 2018-12-13, 1 file, -0/+2)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
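For readers unfamiliar with the term, a saturated unsigned add clamps to the largest representable value instead of wrapping around. A minimal sketch of that behaviour, with a hypothetical function name and a `bits` parameter added for illustration:

```python
def uadd_sat(a, b, bits=32):
    """Unsigned add that clamps to the maximum value instead of wrapping."""
    mask = (1 << bits) - 1
    wrapped = (a + b) & mask
    # If the wrapped sum is smaller than an operand, the add overflowed.
    return mask if wrapped < a else wrapped
```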
* nir/lower_int64: Add support for [iu]mul_high (Jason Ekstrand, 2018-12-13, 2 files, -0/+67)
|
| Reviewed-by: Ian Romanick [email protected]
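A sketch of what lowering a 64-bit unsigned mul_high to 32-bit pieces can look like: schoolbook multiplication on 32-bit halves with explicit carry handling. This is an assumption about the general technique, not a transcription of the lowering added here, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def umul_high64(a, b):
    """High 64 bits of the 128-bit product of two unsigned 64-bit values."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    lo_lo = a_lo * b_lo
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi
    # Sum of everything that lands in bits 32..63; its upper part carries
    # into the high 64-bit half of the result.
    mid = (lo_lo >> 32) + (hi_lo & MASK32) + (lo_hi & MASK32)
    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32)
```

Each partial product is a 32x32 multiply producing a 64-bit result, which is the building block available once 64-bit integers have been split into pairs of 32-bit values.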
* nir: Allow [iu]mul_high on non-32-bit types (Jason Ekstrand, 2018-12-13, 2 files, -4/+40)
|
| Reviewed-by: Ian Romanick [email protected]
* glx: mandate xf86vidmode only for "drm" dri platforms (Emil Velikov, 2018-12-13, 1 file, -2/+4)
|
| Currently we have the three dri "platforms" - drm, apple and windows.
|
| Since xf86vidmode is a thing only for the drm one, adjust the preprocessor guards and correctly check for the dependency.
|
| v2: terminate the GLX_USE_WINDOWSGL hunk
|
| Cc: Jon TURNEY <[email protected]>
| Fixes: 5bc509363b6 ("glx: make xf86vidmode mandatory for direct rendering")
| Signed-off-by: Emil Velikov <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
| Acked-by: Eric Engestrom <[email protected]>
* nir: remove unused variable (Alejandro Piñeiro, 2018-12-13, 1 file, -1/+0)
|
| To avoid the following warning:
|
|     ./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable]
|         nir_shader *ns = impl->function->shader;
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* virgl: work around bad assumptions in virglrenderer (Erik Faye-Lund, 2018-12-13, 1 file, -1/+32)
|
| Virglrenderer does the wrong thing when given an instance divisor; it tries to use the element-index rather than the binding-index as the argument to glVertexBindingDivisor(). This worked fine as long as there was a 1:1 relationship between elements and bindings, which was the case until 19a91841c34 "st/mesa: Use Array._DrawVAO in st_atom_array.c.".
|
| So let's detect instance divisors, and restore a 1:1 relationship in that case. This will make old versions of virglrenderer behave correctly. For newer versions, we can consider making a better interface, where the instance divisor isn't specified per element, but rather per binding. But let's save that for another day.
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Fixes: 19a91841c34 "st/mesa: Use Array._DrawVAO in st_atom_array.c."
| Reviewed-by: Mathias Fröhlich <[email protected]>
| Tested-By: Gert Wollny <[email protected]>
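The workaround described above, restoring a 1:1 relationship between vertex elements and buffer bindings whenever an instance divisor is present, can be sketched like this. The dictionary layout and function name are illustrative assumptions, not the virgl driver's data structures.

```python
def restore_one_to_one(elements, bindings):
    """Give each vertex element its own binding slot when a divisor is used.

    elements: list of dicts with "vertex_buffer_index" and "instance_divisor".
    bindings: list of buffer bindings indexed by vertex_buffer_index.
    Returns (elements, bindings), unchanged if no element has a divisor.
    """
    if not any(e["instance_divisor"] for e in elements):
        return elements, bindings
    # Element i now refers to binding i, so element-index == binding-index.
    new_bindings = [bindings[e["vertex_buffer_index"]] for e in elements]
    new_elements = [dict(e, vertex_buffer_index=i) for i, e in enumerate(elements)]
    return new_elements, new_bindings
```

After the remap, a consumer that mistakenly passes the element index where a binding index is expected still hits the right buffer, at the cost of duplicating bindings shared by several elements.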
* virgl: wrap vertex element state in a struct (Erik Faye-Lund, 2018-12-13, 2 files, -9/+21)
|
| This just has one member for now; the handle. But this is about to change.
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Mathias Fröhlich <[email protected]>
| Tested-By: Gert Wollny <[email protected]>
* virgl: simplify virgl_hw_set_index_buffer (Erik Faye-Lund, 2018-12-13, 1 file, -3/+2)
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Mathias Fröhlich <[email protected]>
| Tested-By: Gert Wollny <[email protected]>
* virgl: simplify virgl_hw_set_vertex_buffers (Erik Faye-Lund, 2018-12-13, 1 file, -4/+2)
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Mathias Fröhlich <[email protected]>
| Tested-By: Gert Wollny <[email protected]>
* radv: don't check if format is depth in radv_image_can_enable_htile() (Samuel Pitoiset, 2018-12-13, 1 file, -1/+0)
|
| This is always TRUE if htile_size is not 0.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: check if addrlib enabled HTILE in radv_image_can_enable_htile() (Samuel Pitoiset, 2018-12-13, 1 file, -1/+2)
|
| When htile_size is 0, we can't enable HTILE. This doesn't change anything, except not calling radv_image_alloc_htile().
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: switch on EOP when primitive restart is enabled with triangle strips (Samuel Pitoiset, 2018-12-13, 1 file, -2/+1)
|
| Otherwise, Yakuza hangs the GPU with DXVK. We don't know if linestrip and pointlist are affected, so my point is to do that only for triangle strips.
|
| Cc: [email protected]
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow to skip DCC decompressions with the new predicate (Samuel Pitoiset, 2018-12-13, 1 file, -6/+13)
|
| Feral games aren't affected because they don't decompress DCC. F1 2018 has one DCC decompression per frame, but I don't see any performance improvements.
|
| This new predicate will probably be more useful for DCC/MSAA.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a predicate for reflecting DCC decompression state (Samuel Pitoiset, 2018-12-13, 5 files, -1/+44)
|
| It's somehow similar to the FCE predicate.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965/compute: Emit GPGPU_WALKER in genX_state_upload (Jordan Justen, 2018-12-12, 3 files, -130/+105)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/genX_state: Add register access functions (Jordan Justen, 2018-12-12, 1 file, -0/+31)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* intel: Simplify the half-float packing in image load/store lowering. (Eric Anholt, 2018-12-12, 1 file, -8/+2)
|
| This was noted by Jason in review when I tried to make a helper for the old path.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Pull some of intel's image load/store format conversion to nir_format.h (Eric Anholt, 2018-12-12, 2 files, -18/+40)
|
| I needed the same functions for v3d. Note that the color value in the Intel lowering has already been cut down to image.chans num_components.
|
| v2: Drop the half float one, since it was a 1-liner after cleanup.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add some more consts to the nir_format_convert.h helpers. (Eric Anholt, 2018-12-12, 1 file, -7/+6)
|
| Most of the bits were constant, but a few were missed. Avoids warnings from v3d's upcoming static const bits declarations.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: detect more induction variables (Timothy Arceri, 2018-12-13, 1 file, -0/+36)
|
| This allows loop analysis to detect induction variables that are incremented in both branches of an if rather than in a main loop block. For example:
|
|     loop {
|         block block_1:
|         /* preds: block_0 block_7 */
|         vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
|         vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4
|         vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4
|         vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21
|         vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22
|         vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9
|         vec1 32 ssa_14 = ige ssa_8, ssa_5
|         /* succs: block_2 block_3 */
|         if ssa_14 {
|             block block_2:
|             /* preds: block_1 */
|             break
|             /* succs: block_8 */
|         } else {
|             block block_3:
|             /* preds: block_1 */
|             /* succs: block_4 */
|         }
|         block block_4:
|         /* preds: block_3 */
|         vec1 32 ssa_15 = ilt ssa_6, ssa_8
|         /* succs: block_5 block_6 */
|         if ssa_15 {
|             block block_5:
|             /* preds: block_4 */
|             vec1 32 ssa_16 = iadd ssa_8, ssa_7
|             vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000*/)
|             /* succs: block_7 */
|         } else {
|             block block_6:
|             /* preds: block_4 */
|             vec1 32 ssa_18 = iadd ssa_8, ssa_7
|             vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000*/)
|             /* succs: block_7 */
|         }
|         block block_7:
|         /* preds: block_5 block_6 */
|         vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18
|         vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4
|         vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19
|         /* succs: block_1 */
|     }
|
| Unfortunately, GCM could move the addition out of the if for us (making this patch unnecessary), but we still cannot enable the GCM pass without regressions.
|
| This unrolls a loop in Rise of The Tomb Raider.
|
| vkpipeline-db results (VEGA):
|
|     Totals from affected shaders:
|     SGPRS: 88 -> 96 (9.09 %)
|     VGPRS: 56 -> 52 (-7.14 %)
|     Spilled SGPRs: 0 -> 0 (0.00 %)
|     Spilled VGPRs: 0 -> 0 (0.00 %)
|     Private memory VGPRs: 0 -> 0 (0.00 %)
|     Scratch size: 0 -> 0 (0.00 %) dwords per thread
|     Code Size: 2168 -> 4560 (110.33 %) bytes
|     LDS: 0 -> 0 (0.00 %) blocks
|     Max Waves: 4 -> 4 (0.00 %)
|     Wait states: 0 -> 0 (0.00 %)
|
| Reviewed-by: Thomas Helland <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
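The pattern in the example (a phi after an if whose sources are the same increment computed in both branches) can be recognised with a check like the one below. This is a toy model: the SSA definitions are plain tuples, the function name is hypothetical, and the real loop-analysis code is not shown in this log.

```python
def merged_increment(phi_srcs, defs):
    """If every phi source is `iadd base, step` with identical operands,
    return (base, step); otherwise return None.

    defs maps an SSA name to a tuple such as ("iadd", "ssa_8", "ssa_7").
    Names without an entry are treated as unknown (defined elsewhere).
    """
    found = None
    for src in phi_srcs:
        op = defs.get(src)
        if op is None or op[0] != "iadd":
            return None
        if found is None:
            found = op[1:]
        elif op[1:] != found:
            return None
    return found
```

Applied to the dump above, the sources of `ssa_20` (`ssa_16` and `ssa_18`) are both `iadd ssa_8, ssa_7`, so the phi behaves like a single increment of `ssa_8`; the sources of `ssa_21` do not match and are rejected.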
* nir: reword code comment (Timothy Arceri, 2018-12-13, 1 file, -2/+2)
|
| Reviewed-by: Thomas Helland <[email protected]>
* nir: in loop analysis track actual control flow type (Timothy Arceri, 2018-12-13, 1 file, -13/+21)
|
| This will allow us to improve analysis to find more induction variables.
|
| Reviewed-by: Thomas Helland <[email protected]>
* nir: add if opt opt_if_loop_last_continue() (Danylo Piliaiev, 2018-12-13, 1 file, -0/+95)
|
| Removing the last continue can allow more loops to unroll. Also inserting code into the if branch can allow the various if opts to progress further.
|
| The insertion of some loops into the if branch also reduces VGPR use in some shaders.
|
| vkpipeline-db results (VEGA):
|
|     Totals from affected shaders:
|     SGPRS: 6552 -> 6576 (0.37 %)
|     VGPRS: 6544 -> 6532 (-0.18 %)
|     Spilled SGPRs: 0 -> 0 (0.00 %)
|     Spilled VGPRs: 0 -> 0 (0.00 %)
|     Private memory VGPRs: 0 -> 0 (0.00 %)
|     Scratch size: 0 -> 0 (0.00 %) dwords per thread
|     Code Size: 481952 -> 478032 (-0.81 %) bytes
|     LDS: 13 -> 13 (0.00 %) blocks
|     Max Waves: 241 -> 242 (0.41 %)
|     Wait states: 0 -> 0 (0.00 %)
|
| Shader-db results radeonsi (VEGA):
|
|     Totals from affected shaders:
|     SGPRS: 168 -> 168 (0.00 %)
|     VGPRS: 144 -> 140 (-2.78 %)
|     Spilled SGPRs: 157 -> 157 (0.00 %)
|     Spilled VGPRs: 0 -> 0 (0.00 %)
|     Private memory VGPRs: 0 -> 0 (0.00 %)
|     Scratch size: 0 -> 0 (0.00 %) dwords per thread
|     Code Size: 8524 -> 8488 (-0.42 %) bytes
|     LDS: 0 -> 0 (0.00 %) blocks
|     Max Waves: 7 -> 7 (0.00 %)
|     Wait states: 0 -> 0 (0.00 %)
|
| v2: (Timothy Arceri):
|  - allow for continues in either branch
|  - move any trailing loops inside the if as well as blocks.
|  - leave nir_opt_trivial_continues() to actually remove the continue.
|
| Reviewed-by: Thomas Helland <[email protected]>
| Signed-off-by: Timothy Arceri <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
* nir: rework force_unroll_array_access() (Timothy Arceri, 2018-12-13, 1 file, -14/+35)
|
| Here we rework force_unroll_array_access() so that we can reuse the induction variable detection in a following patch.
|
| Reviewed-by: Thomas Helland <[email protected]>
* nir: factor out some of the complex loop unroll code to a helper (Timothy Arceri, 2018-12-13, 1 file, -51/+64)
|
| Acked-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Thomas Helland <[email protected]>
* meson: libfreedreno depends upon libdrm (for fence support) (Rhys Kidd, 2018-12-12, 1 file, -3/+1)
|
| Error message building freedreno Gallium driver with meson:
|
|     ../src/gallium/drivers/freedreno/freedreno_fence.c:27:21: fatal error: libsync.h: No such file or directory
|      #include <libsync.h>
|
| Fixes: 4aa69cc4257 ("meson: build freedreno")
| Signed-off-by: Rhys Kidd <[email protected]>
| Reviewed-by: Emil Velikov <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* nir: Document the function inlining process (Jason Ekstrand, 2018-12-12, 1 file, -0/+68)
|
| This has thrown a few people off recently and it's good to have the process and all the rationale for it documented somewhere. A comment at the top of nir_inline_functions seems as good a place as any.
|
| Acked-by: Karol Herbst <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* intel/blorp: Assert that we don't re-layout a compressed surface (Jason Ekstrand, 2018-12-12, 1 file, -0/+3)
|
| Reviewed-by: Lionel Landwerlin <[email protected]>