summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Use runtime CPU detection for whether NEON is available.Eric Anholt2017-05-022-14/+16
| | | | | | | | This will allow Raspbian's ARMv6 builds to take advantage of the new NEON code, and could prevent problems if vc4 ends up getting used on a v7 CPU without NEON. v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
* vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.Eric Anholt2017-05-024-8/+31
| | | | | | | | | Android.mk was setting the flag across the entire driver, so we didn't have non-NEON versions getting built. This was going to be a problem with the next commit, when I start auto-detecting NEON support and use the non-NEON version when appropriate. Reviewed-by: Rob Herring <[email protected]>
* vc4: Only build the NEON code on arm32.Eric Anholt2017-05-011-2/+2
| | | | | | | | | | | NEON is sufficiently different on arm64 that we can't just reuse this code. Disable it on arm64 for now. v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build for a v8 CPU. Signed-off-by: Eric Anholt <[email protected]> Cc: <[email protected]>
* gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERSSamuel Pitoiset2017-04-261-0/+1
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: fold u_trim_pipe_prim call from st/mesa to driversMarek Olšák2017-04-201-0/+5
| | | | | | | Most drivers don't need it and shouldn't need it because it can't be used in some cases (indirect draws, primitive restart, count from streamout). Reviewed-by: Brian Paul <[email protected]>
* vc4: Enable V3D 2.6.Eric Anholt2017-04-181-1/+1
| | | | | This version of the chip is present on the Cygnus-based 911360 enterprise phone platform. It appears to be completely backwards compatible.
* gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORTNicolai Hähnle2017-04-141-0/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_CAP_TGSI_BALLOTNicolai Hähnle2017-04-051-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: add sparse buffer interface and capabilityNicolai Hähnle2017-04-051-0/+1
| | | | | | | v2: - explain the resource_commit interface in more detail Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add a cap to check if the driver supports fill_rectangleLyude2017-03-311-0/+1
| | | | | | | | Changes since v1: - Add pipe caps for etnaviv, freedreno, swr and virgl Signed-off-by: Lyude <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: remove support for predicates from TGSI (v2)Marek Olšák2017-04-011-2/+0
| | | | | | | | | | | Neved used. v2: gallivm: rename "pred" -> "exec_mask" etnaviv: remove the cap gallium: fix tgsi_instruction::Padding Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_CAP_TGSI CLOCKNicolai Hähnle2017-03-311-0/+1
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Fix indenting in vc4_screen_get_param()Lyude2017-03-301-3/+3
| | | | | | Signed-off-by: Lyude <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_TGSI_TEX_TXF_LZMarek Olšák2017-03-151-0/+1
|
* nir: Rework conversion opcodesJason Ekstrand2017-03-143-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
* vc4: Fix math with a condition flag set.Eric Anholt2017-03-082-3/+18
| | | | | | | | | | | Math results land in r4, regardless of the condition. To implement them, we just need to ensure that the results are moved out of r4 (as often happens anyway, the values is live across another math instruction), so that we can attach the condition to the MOV. Fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.93 and a couple others, that were assertion failing that their conditions hadn't been handled during the QIR->QPU stage.
* vc4: Fix register pressure cost estimates when a src appears twice.Eric Anholt2017-03-081-3/+13
| | | | | | | | | | This ended up confusing the scheduler for things like fabs (implemented as fmaxabs x, x) or squaring a number, and it would try to avoid scheduling them because it appeared more expensive than other instructions. Fixes failure to register allocate in dEQP-GLES2.functional.uniform_api.random.3 with almost no shader-db effects (+.35% max temps)
* vc4: Report to shader-db how many threads a fragment shader has.Eric Anholt2017-03-081-0/+7
| | | | | Doing instruction count analysis when we emit the thread switches that will save us from tons of stalls is kind of missing the point.
* Revert "vc4: Lazily emit our FS/VS input loads."Eric Anholt2017-03-084-93/+75
| | | | | | This reverts commit 292c24ddac5acc35676424f05291c101fcd47b3e. It broke a lot of GLES2 deqp, and I see at least one problem that will require some serious rework to fix.
* gallium: s/uint/enum pipe_shader_type/ for set_constant_buffer()Brian Paul2017-03-081-1/+2
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: s/unsigned/enum pipe_shader_type/ for get_compiler_options()Brian Paul2017-03-082-2/+4
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: s/unsigned/enum pipe_shader_type/ for pipe_screen::get_shader_param()Brian Paul2017-03-081-2/+3
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/util: replace pipe_mutex_unlock() with mtx_unlock()Timothy Arceri2017-03-072-7/+7
| | | | | | | | | | pipe_mutex_unlock() was made unnecessary with fd33a6bcd7f12. Replaced using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \; Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_mutex_lock() with mtx_lock()Timothy Arceri2017-03-072-6/+6
| | | | | | | | | | replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7f12. Replaced using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \; Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_mutex_init() with mtx_init()Timothy Arceri2017-03-071-1/+1
| | | | | | | | | | pipe_mutex_init() was made unnecessary with fd33a6bcd7f12. Replace was done using: find ./src -type f -exec sed -i -- \ 's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \; Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_mutex with mtx_tTimothy Arceri2017-03-071-2/+2
| | | | | | pipe_mutex was made unnecessary with fd33a6bcd7f12. Reviewed-by: Marek Olšák <[email protected]>
* vc4: Lazily emit our FS/VS input loads.Eric Anholt2017-02-244-75/+93
| | | | | | | | | | | | | | | | | | | This reduces register pressure in both types of shaders, by reordering the input loads from the var->data.driver_location order to whatever order they appear first in the NIR shader. These instructions aren't reorderable at our QIR scheduling level because the FS takes two in lockstep to do an interpolation, and the VS takes multiple read instructions in a row to get a whole vec4-level attribute read. shader-db impact: total instructions in shared programs: 76666 -> 76590 (-0.10%) instructions in affected programs: 42945 -> 42869 (-0.18%) total max temps in shared programs: 9395 -> 9208 (-1.99%) max temps in affected programs: 2951 -> 2764 (-6.34%) Some programs get their max temps hurt, depending on the order that the load_input intrinsics appear, because we end up being unable to copy propagate an older VPM read into its only use.
* vc4: Refactor the load_input code out of the intrinsic code.Eric Anholt2017-02-241-25/+42
| | | | It's going gain most of ntq_setup_inputs(), so simplify it first.
* vc4: Track the last block we emitted at the top level.Eric Anholt2017-02-243-5/+10
| | | | | This will be used for delaying our VPM reads (which must be unconditional) until just before they're used.
* vc4: Emit max number of temps in the shader-db output.Eric Anholt2017-02-241-0/+23
| | | | | | | We need to be paying attention to optimization's impact on this -- even if we reduce instruction count, increasing max temps in general is likely to cause us to fail to register allocate on some shaders, which means that those won't run at all.
* gallium: remove PIPE_CAP_USER_INDEX_BUFFERSMarek Olšák2017-02-251-1/+0
| | | | | | | | all drivers support it Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> (VMware driver only)
* vc4: automake: add the kernel/README to the tarballEmil Velikov2017-02-241-0/+2
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Andreas Boll <[email protected]>
* gallium: set pipe_context uploaders in drivers (v3)Marek Olšák2017-02-141-5/+6
| | | | | | | | | | | | | | | Notes: - make sure the default size is large enough to handle all state trackers - pipe wrappers don't receive transfer calls from stream_uploader, because pipe_context::stream_uploader points directly to the underlying driver's stream_uploader (to keep it simple for now) v2: add error handling to nv50, nvc0, noop v3: set const_uploader Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Edmondo Tommasina <[email protected]> (v1) Tested-by: Charmaine Lee <[email protected]>
* vc4: Enable glSampleMask() even when !rasterizer->multisample.Eric Anholt2017-02-101-2/+1
| | | | | | | | gallium's blitter expects that it can set the sample mask even when the rasterizer doesn't have the flag on. Between this and the previous test, 10 new ext_framebuffer_multisample tests start passing.
* vc4: Respect glSampleMask() even when we're not writing color.Eric Anholt2017-02-101-3/+13
| | | | | | gallium's quad-based blitter for copying MSAA depth textures expects to be able to do 4 passes updating a sample at a time using glSampleMask, and there's no color buffer bound when it's doing that.
* vc4: Use the nir_builder helper for loading sample mask.Eric Anholt2017-02-101-10/+1
|
* vc4: Use accurate 1/w in coordinate shader as well as vert shader.Eric Anholt2017-02-101-1/+1
| | | | | We probably shouldn't be emitting different scaled viewport coordinates between vertex and coord.
* vc4: Drop VS inputs to 8.Eric Anholt2017-02-101-4/+1
| | | | | | | In the hardware we only get to declare 8 vertex elements (GLES2's minimum), so we should be exposing that number here. Fixes an assertion failure in piglit texrect-many, at the expense of various GL 2.0-ish minmax tests now complaining that our count is too low.
* vc4: Avoid emitting small immediates for UBO indirect load address guards.Eric Anholt2017-02-105-4/+20
| | | | | | | | | | | | The kernel will reject our shader if we emit one here, and having 4, 8, or 12 as the top end of our UBO clamp rare is enough that it's not worth making the kernel let us. Fixes piglit fs-const-array-of-struct and fs-const-array-of-struct-of-array since recent GLSL linking changes made us get this as an indirect load of a uniform, instead of a tempoary. Cc: "13.0 17.0" <[email protected]>
* gallium: add separate PIPE_CAP_INT64_DIVMODIlia Mirkin2017-02-091-0/+1
| | | | | | | | | | | Nouveau does not currently have logic to implement this as a library function. Even though such a library could be written, there's no big advantage to do it that way for now given that int64 is a very uncommon use-case. Allow a driver to expose INT64 without supporting division and modulo operations. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: turn PIPE_SHADER_CAP_DOUBLES into a screen capabilityNicolai Hähnle2017-02-021-1/+1
| | | | | | | | | | | | | | | | | | | Make the cap consistent with PIPE_CAP_INT64. Aside from the hypothetical case of using draw for vertex shaders (and actually caring about doubles...), every implementation supports doubles either nowhere or everywhere. Also, st/mesa didn't even check the cap correctly in all supported shader stages. While at it, add a missing LLVM version check for 64-bit integers in radeonsi. This is conservative: judging by the log, LLVM 3.8 might be sufficient, but there are probably bugs that have been fixed since then. v2: fix clover (Marek) Reviewed-by: Marek Olšák <[email protected]>
* vc4: Enable Neon on arm android buildsRob Herring2017-01-311-0/+2
| | | | | Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: fix arm64 build with NeonRob Herring2017-01-311-1/+1
| | | | | | | | The addition of Neon assembly breaks on arm64 builds because the assembly syntax is different. For now, restrict Neon to ARMv7 builds. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Make Neon inline assembly clang compatibleRob Herring2017-01-311-35/+35
| | | | | | | | | | | | clang throws an error on "%r2" and similar. I couldn't find any documentation on what "%r?" is supposed to mean and I've never seen any use like that as far as I remember. The parameter is supposed to be cpu_stride and just %2/%3 should be sufficient. There's no need for trailing ";" either, so remove those, too. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Coalesce into TLB writes as well as VPM/tex.Eric Anholt2017-01-281-1/+5
| | | | | | | | This generally cuts an instruction when blending is enabled and we thus have a single instruction generating the color value. total instructions in shared programs: 91759 -> 91634 (-0.14%) instructions in affected programs: 5338 -> 5213 (-2.34%)
* vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.Eric Anholt2017-01-281-13/+18
| | | | | | | | | | shader-db results: total instructions in shared programs: 92611 -> 91764 (-0.91%) instructions in affected programs: 27417 -> 26570 (-3.09%) The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.
* vc4: Flip the switch to run the GLSL compiler optimization loop once.Eric Anholt2017-01-281-1/+1
| | | | | | | | | | | | | This has almost no effect on shader-db: total instructions in shared programs: 92572 -> 92611 (0.04%) instructions in affected programs: 4486 -> 4525 (0.87%) Looking at 2 of the 7 different shaders that were hurt (all of which were in mupen64), they all appear to be just differences in order of instructions at the NIR level. The advantage is that this should significantly reduce time in the compiler.
* gallium: Add integer 64 capabilityDave Airlie2017-01-271-0/+1
| | | | | | | | | v1.1: move to using a normal CAP. (Marek) v2: fill in the cap everywhere Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Use NEON to speed up utile stores on Pi2+.cros-mesa-17.1.0-r2-vanillacros-mesa-17.1.0-r1-vanillachadv/cros-mesa-17.1.0-r2-vanillachadv/cros-mesa-17.1.0-r1-vanillaEric Anholt2017-01-261-5/+50
| | | | Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
* vc4: Use NEON to speed up utile loads on Pi2.Eric Anholt2017-01-263-18/+115
| | | | | | | | | | | | | | | | | | | We had a lot of memcpy call overhead because gpu_stride wasn't being inlined. But if you split out the stride==8 and stride==16 cases like this code does while still using memcpy, you'd no longer have glibc's NEON memcpy applied at which point we'd be doing 16 uncached reads instead of 64/(NEON memcpy granularity), for about a 30% performance hit. By hand writing the assembly, we can get a whole cacheline loaded at a time. Unfortunately, NEON intrinsics turned out to be unusable -- they didn't have the vldm instruction available. Note that, for now, the NEON code is only enabled when building for ARMv7 (Pi 2+). We may want to do runtime detection for the Raspbian case, in the future. Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).