aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/lima/lima_program.c
Commit message (Collapse)AuthorAgeFilesLines
* lima: add support for R and RG formatsVasily Khoruzhick2020-03-201-13/+97
| | | | | | | | | | Unfortunately these are not supported natively for sampling so we have to lower them. Reviewed-by: Erico Nunes <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4241> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4241>
* lima/gpir: add better lowering for ftruncVasily Khoruzhick2020-03-161-9/+3
| | | | | | | | | | | | GP doesn't support ftrunc natively and unfortunately one in generic opt_algebraic is not GP-friendly either. Introduce our own lowering that utilizes fsign() that GP supports: ftrunc(a) = fmul(fsign(a), ffloor(fmax(a, -a))) Tested-by: Andreas Baierl <[email protected]> Reviewed-by: Andreas Baierl <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4126>
* lima: rename lima_submit to lima_jobQiang Yu2020-02-171-3/+3
| | | | | | | Reviewed-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
* lima: move pp_max_stack_size to lima_submitQiang Yu2020-02-171-1/+3
| | | | | | | | pp_max_stack_size is preserved across draws. Reviewed-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
* lima: disable early-z if fragment shader uses discardVasily Khoruzhick2020-01-271-0/+2
| | | | | | | | | | | We have to disable early-z if fragment shader uses discard, otherwise we'll get misrendering. Reported-by: Icenowy Zheng <[email protected]> Reviewed-by: Andreas Baierl <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3570> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3570>
* lima/ppir: enable lower_fdphErico Nunes2019-12-111-0/+1
| | | | | | | | | Otherwise we may lower some fdot to fdph which is not implemented in pp. Fixes #2126 Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima: fix nir shader memory leakErico Nunes2019-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Fix memory leak on allocation for nir shader, reported by valgrind. 3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84 at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so) by 0x5750817: ralloc_size (ralloc.c:119) by 0x5750977: rzalloc_size (ralloc.c:151) by 0x575C173: nir_shader_create (nir.c:45) by 0x5763ACB: nir_shader_clone (nir_clone.c:728) by 0x55D5003: st_create_fp_variant (st_program.c:1242) by 0x55D789F: st_get_fp_variant (st_program.c:1522) by 0x55D789F: st_get_fp_variant (st_program.c:1507) by 0x56400C3: st_update_fp (st_atom_shader.c:163) by 0x563D333: st_validate_state (st_atom.c:261) by 0x55D07CB: prepare_draw (st_draw.c:132) by 0x55D08DF: st_draw_vbo (st_draw.c:184) by 0x55576CB: _mesa_draw_arrays (draw.c:374) by 0x55576CB: _mesa_draw_arrays (draw.c:351) Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima: add support for gl_PointSizeVasily Khoruzhick2019-11-051-0/+1
| | | | | | | | | | | | | GP handles gl_PointSize similar to gl_Position, i.e. it needs separate buffer and it has special type in varying descriptors, also for indexed draw we need to emit special PLBU command to pass address of gl_PointSize buffer. Blob also clamps gl_PointSize to 1 .. 100 (as well as line width), so let's do the same. Reviewed-by: Andreas Baierl <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: add NIR pass to split varying loadsVasily Khoruzhick2019-09-261-0/+1
| | | | | | | | | | | | | | | NIR may emit a single instrinsic to load several packed varyings, but that's suboptimal for Utgard PP for several reasons: - varyings that are used as sampler inputs can be passed using pipeline register with increased precision - we have small number of regs, so using a vec4 regs for storing two vec2 varyings increases reg pressure. Add NIR pass to split a single load into several loads and utilize it in lima. Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: implement BO cacheVasily Khoruzhick2019-09-221-2/+2
| | | | | | | | | | | | | | | Allocating BOs is expensive, so we should avoid doing that by caching freed BOs. BO cache is modelled after one in v3d driver and works as follows: - in lima_bo_create() check if we have matching BO in cache and return it if there's one, allocate new BO otherwise. - in lima_bo_unreference() (renamed from lima_bo_free()): put BO in cache instead of freeing it and remove all stale BOs from cache Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: run opt_algebraic between int_to_float and boot_to_float for vsVasily Khoruzhick2019-09-091-4/+5
| | | | | | | | | int_to_float emits ftrunc and ftrunc lowering generates bool ops. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Erico Nunes <[email protected]> Reviewed-by: Andreas Baierl <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: lower fceilVasily Khoruzhick2019-09-091-0/+1
| | | | | | | | | GP doesn't support fceil so we need to lower it. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Erico Nunes <[email protected]> Reviewed-by: Andreas Baierl <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: don't lower phis to scalarVasily Khoruzhick2019-09-051-1/+0
| | | | | | | | | | Utgard PP is vec4 architecture, so lowering phis to scalars increases instruction count and potentially interferes with spilling. Tested-by: Andreas Baierl <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalarVasily Khoruzhick2019-09-061-5/+21
| | | | | | | | | | Utgard PP has vector fcsel operation, but its condition is scalar. Add filtering callback that checks whether {b,f}csel condition is not scalar to lower {b,f}csel to scalar only in this case. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalarVasily Khoruzhick2019-09-061-16/+30
| | | | | | | | | | | | | Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: enable vectorize optimizationErico Nunes2019-08-251-0/+5
| | | | | | | | | | pp has vector units and some operations can be optimized when bundled together. Benchmarking this with piglit shaders shows that the instruction count can be greatly reduced on many examples with vectorize. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima/ppir: lower selects to scalarsErico Nunes2019-08-251-0/+5
| | | | | | | | | nir vec4 fcsel assumes that each component of the condition will be used to select the same component from the options, but pp can't implement that since it only has 1 component for the condition. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima: fix ppir spill stack allocationErico Nunes2019-08-251-0/+2
| | | | | | | | | | | | The previous spill stack was fixed and too small, and caused instability in programs requiring spilling for roughly more than one value. This patch adds a dynamic calculation of the buffer size based on stack utilization and switches it to a separate allocation at flush time that will fit the shader that requires the largest buffer. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima/ppir: move sin/cos input scaling into NIRVasily Khoruzhick2019-08-061-0/+3
| | | | | Reviewed-by: Erico Nunes <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: add summary report for shader-dbErico Nunes2019-08-061-2/+4
| | | | | | | | | | | | | | Very basic summary, loops and gpir spills:fills are not updated yet and are only there to comply with the strings to shader-db report.py regex. For now it can be used to analyze the impact of changes in instruction count in both gpir and ppir. The LIMA_DEBUG=shaderdb setting can be useful to output stats on applications other than shader-db. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima/ppir: enable lower_vector_cmp to lower fall_equalErico Nunes2019-08-051-0/+1
| | | | | Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima: re-run nir_opt_algebraic after int loweringErico Nunes2019-08-051-0/+15
| | | | | | | | | nir_lower_int_to_float is currently only meant to run once, and some ops must be lowered after being converted from int ops to be implementable, so re-run nir_opt_algebraic after lowering ints to floats. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: move alu vec to scalar lowering into NIRVasily Khoruzhick2019-08-041-1/+10
| | | | | | | | | Utgard PP is vec4, but some operations are scalar, utilize NIR vec to scalar lowering pass and indicate operations that we want to lower. Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: enable lower_bitops in ppirErico Nunes2019-07-311-0/+1
| | | | | | | | | The mali pp doesn't support integers and some nir_algebraic optimizations may result in ops that are not easily lowerable to floats, so disable optimizations resulting in bitops. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Jonathan Marek <[email protected]>
* nir/algebraic: rename lower_bitshift to lower_bitopsErico Nunes2019-07-311-1/+1
| | | | | | | | | | | | | Optimizations that insert bitshift or bitwise operations should not be applied on GPUs that don't support integer operations. The .lower_bitshift could be used to control the bitshift related ones, but there was also one bitwise optimization uncovered. Since only lima and freedreno use this option and the use case is that no bit operations are wanted, let's rename it to .lower_bitops and use it to control all bitops related optimizations. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Jonathan Marek <[email protected]>
* lima/ppir: lower fdot in nir_opt_algebraicErico Nunes2019-07-311-0/+1
| | | | | | | | | Now that we have fsum in nir, we can move fdot lowering there. This helps reduce ppir complexity and enables the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima/ppir: lower texture projectionErico Nunes2019-07-311-0/+5
| | | | | | | | | | | | | | | | | | | | | | Lower texture projection in ppir using nir_lower_tex and nir_lower_tex. This will insert a mul with the coordinate division before the load varying. Even though the lima pp supports projection in the load varying instruction while loading the coordinates (from a register or a varying), it requires that both the coordinates and projector be components in a single register. nir currently handles them in separate ssa, and attempting to merge them manually may end up in worse code than just doing the coordinate division manually. So for now let's just lower the projection to add support for it in lima. In the future, an optimization pass may be implemented in lima to ensure that both coords and projector come in the same register, then this lowering may be disabled and in this case lima may use the built-in projection and save the mul instruction from lowering. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima/gp: Clean up lima_program_optimize_vs_nir() a littleConnor Abbott2019-07-281-1/+1
| | | | | | | | | | Remove an unnecessary nir_lower_regs_to_ssa as that should be done by the state tracker, and add a missing DCE pass after running copy propagation in order to remove the dead copies. This shouldn't fix anything but the second part will reduce shader sizes. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* nir: replace lower_sincos with algebraic optJonathan Marek2019-07-241-1/+1
| | | | | | | | This version has less ops for the same precision. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Matt Turner <[email protected]>
* lima: Reintroduce the standalone compilerConnor Abbott2019-07-181-2/+2
| | | | | | I used this to test things without needing to have a device handy. Acked-by: Qiang Yu <[email protected]>
* nir: Add lower_rotate flag and set to true in all driversSagar Ghuge2019-07-011-0/+2
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* lima/ppir: lower ffma in ppirAndreas Baierl2019-06-241-0/+1
| | | | | | | Since we cannot handle ffma in ppir, lower it on nir level already. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima: lower fmod in ppir and gpirErico Nunes2019-06-161-0/+2
| | | | | | | | | | | | Since commit 4f3c82c72c5 fmod is no longer being lowered in nir, and ends up crashing lima programs with "unsupported nir_op: fmod" in both ppir and gpir. There seems to be no mod operation in hardware in utgard and there is an optimization in nir to lower fmod to instructions that lima already implements, so let's use that. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* nir: remove bool lowering from lower_int_to_floatJonathan Marek2019-05-311-0/+2
| | | | | | | | | | | | | | Removes the bool_to_float logic from the int_to_float pass, so that both can be used separately. By having separate passes we have better validation and it makes it possible to use with the lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should probably be reworked). For now we always expect lower_bool to come after lower_int. Also fixes f2i32 to become ftrunc and adds u2f/f2u cases. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add lower_bitshift optionJonathan Marek2019-05-311-0/+1
| | | | | | | | | Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* lima/gpir: switch to use nir_lower_viewport_transformQiang Yu2019-05-201-0/+1
| | | | | | Reviewed-by: Vasily Khoruzhick <[email protected]> Reviewed-by: Erico Nunes <[email protected]> Signed-off-by: Qiang Yu <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-2/+2
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* lima: enable sin and cos lowering for GPVasily Khoruzhick2019-05-071-0/+1
| | | | | | | | GP doesn't support sin/cos natively, so we have to lower them. Reviewed-by: Qiang Yu <[email protected]> Tested-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-2/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* lima/gpir: enable lowering for ftruncVasily Khoruzhick2019-05-071-0/+1
| | | | | Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima: use int_to_float lowering passVasily Khoruzhick2019-05-071-2/+6
| | | | | | | | Neither GP nor PP in Mali4x0 support integers, so utilize new pass and set native_integers to true for now until this flag is dropped. Reviewed-by: Qiang Yu <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: Add gl_FragCoord handlingAndreas Baierl2019-04-291-0/+1
| | | | | | | | | Treat gl_FragCoord variable as a system value and lower the w component with a nir pass. Add the necessary bits for correct codegen. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* lima: enable nir fsign lowering in ppirErico Nunes2019-04-191-0/+1
| | | | | | | | The mali utgard pp doesn't support a sign instruction. Use the nir lowering function for fsign to implement fsign in ppir. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* lima: add bool parameter to type_size functionKarol Herbst2019-04-121-1/+1
| | | | | | | | | Fixes: 035759b61ba1778d5143cdf3a8795a62dd5d8a60 ("nir/i965/freedreno/vc4: add a bindless bool to type size functions") Signed-off-by: Karol Herbst <[email protected]> Tested-by: Icenowy Zheng <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* lima: lower bool to float when building shadersIcenowy Zheng2019-04-121-0/+2
| | | | | | | | | | | | | | | | | Both processors of Mali Utgard are float-only, so bool are not acceptable data type of them. Fortunately the NIR compiler infrastructure has a lower pass to lower bool to float. Call this lower pass to lower bool to float for both GP and PP. This makes Glamor on Xorg server 1.20.3 at least doesn't hang when starting gtk3-demo. The old map of nir op bcsel is changed to fcsel, and the map of b2f32 in PP is dropped because it's not needed now (it's originally only mapped to ppir_op_mov). Signed-off-by: Icenowy Zheng <[email protected]> Reviewed-by: Qiang Yu <[email protected]>
* gallium: add lima driverQiang Yu2019-04-111-0/+317
v2: - use renamed util_dynarray_grow_cap - use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags - remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage - compute min/max index in driver v3: - fix plbu framebuffer state calculation - fix color_16pc assemble - use nir_lower_all_source_mods for lowering neg/abs/sat - use float arrary for static GPU data - add disassemble comment for static shader code - use drm_find_modifier v4: - use lima_nir_lower_uniform_to_scalar v5: - remove nir_opt_global_to_local when rebase Cc: Rob Clark <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Acked-by: Eric Anholt <[email protected]> Signed-off-by: Andreas Baierl <[email protected]> Signed-off-by: Arno Messiaen <[email protected]> Signed-off-by: Connor Abbott <[email protected]> Signed-off-by: Erico Nunes <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: Koen Kooi <[email protected]> Signed-off-by: Marek Vasut <[email protected]> Signed-off-by: marmeladema <[email protected]> Signed-off-by: PaweÅ‚ Chmiel <[email protected]> Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Rohan Garg <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]>