summaryrefslogtreecommitdiffstats
path: root/src/broadcom
Commit message (Collapse)AuthorAgeFilesLines
* nir/opt_peephole_select: Don't peephole_select expensive math instructionsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loadsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | | | | That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* v3d: Fix the argument type for vir_BRANCH().Eric Anholt2018-12-171-1/+1
| | | | | Apparently this has been spewing warnings for Jason's clang, but not my gcc.
* nir: Add a bool to int32 lowering passJason Ekstrand2018-12-161-0/+2
| | | | | | | | We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* v3d: Use the original bit size when scalarizing uniform loads.Eric Anholt2018-12-161-1/+2
| | | | | | Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Drop in a bunch of notes about performance improvement opportunities.Eric Anholt2018-12-143-1/+61
| | | | | | These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.
* v3d: Do uniform pretty-printing in the QPU dump.Eric Anholt2018-12-143-1/+62
| | | | | If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.
* v3d: Move uniform pretty-printing to its own helper function.Eric Anholt2018-12-142-71/+77
| | | | I want to reuse it in the QPU dump.
* v3d: Avoid assertion failures when removing end-of-shader instructions.Eric Anholt2018-12-141-0/+6
| | | | | | | | | After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.
* v3d: Add support for draw indirect for GLES3.1.Eric Anholt2018-12-141-0/+39
| | | | | | In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.
* v3d: Add missing flagging of SYNCB as a TSY op.Eric Anholt2018-12-141-0/+1
| | | | Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
* v3d: Make sure that a thrsw doesn't split a multop from its umul24.Eric Anholt2018-12-141-0/+1
| | | | | | | The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
* v3d: Fix a leak of the disassembled instruction string during debug dumps.Eric Anholt2018-12-071-0/+1
| | | | Fixes: ade416d02369 ("broadcom: Add VC5 NIR compiler.")
* v3d: Add VIR dumping of TMU config p0/p1.Eric Anholt2018-12-072-0/+27
| | | | I had a bit of it for V3D 3.x, but didn't update it for 4.x.
* v3d: Simplify VIR uniform dumping using a temporary.Eric Anholt2018-12-071-19/+10
|
* v3d: Garbage collect unused uniforms code.Eric Anholt2018-12-071-2/+0
|
* v3d: Return the right gl_SampleMaskIn[] value.Eric Anholt2018-12-072-3/+1
| | | | | It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.
* v3d: Fix a comment typoEric Anholt2018-12-071-1/+1
|
* v3d: Convert to using nir_src_as_uint() from const_value derefs.Eric Anholt2018-12-071-14/+10
| | | | | Follows 16870de8a0aa ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.
* v3d: Re-use the wrap mode uniform on V3D 3.3.Eric Anholt2018-12-071-24/+4
|
* v3d: Use combined input/output segments.Eric Anholt2018-12-073-1/+34
| | | | | | | The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-4/+4
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
* meson: Add tests to suitesDylan Baker2018-11-201-1/+2
| | | | | | | | | | | | | | | | Meson test has a concepts of suites, which allow tests to be grouped together. This allows for a subtest of tests to be run only (say only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). it can be invoked as: `meson test -C builddir --suite $suitename` (meson test has addition options that are pretty useful). Tested-By: Gert Wollny <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* nir: Make nir_lower_clip_vs optionally work with variables.Kenneth Graunke2018-11-191-1/+2
| | | | | | | | | | | | The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <[email protected]>
* v3d: Don't try to set PF flags on a LDTMU operationEric Anholt2018-11-151-0/+6
| | | | | We need an ALU op in order to set PF. Fixes a recent assertion failure in dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex
* v3d: Update the TLB config for depth writes on V3D 4.2.Eric Anholt2018-11-011-8/+22
| | | | Fixes 311 piglit cases on the simulator.
* configure: allow building with python3Emil Velikov2018-10-311-1/+1
| | | | | | | | | | | | | | | Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python2 chosen prior to python3 v2: use python2 by default Cc: Ilia Mirkin <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* v3d: Use nir_remove_unused_io_vars to handle binner shader output DCEEric Anholt2018-10-302-45/+13
| | | | | | We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.
* v3d: Only add output slot tracking for the current varying slot.Eric Anholt2018-10-301-1/+1
| | | | | | | | We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.
* v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components.Eric Anholt2018-10-301-0/+16
| | | | | This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.
* v3d: Don't rely on sorting input vars for VPM read setup.Eric Anholt2018-10-301-28/+20
| | | | | | | For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.
* v3d: Split out NIR input setup between FS and VPM.Eric Anholt2018-10-301-47/+80
| | | | | They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-252-3/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* v3d: Add support for hardware pack/unpack of half floats.Eric Anholt2018-10-151-0/+16
| | | | | Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.
* android: broadcom/cle: export the broadcom top level path headersMauro Rossi2018-09-151-0/+2
| | | | | | | | | | | | | | | | | | | | Fixes the following building error in vc4 build: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27: In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34: In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39: In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'cle/v3d_packet_helpers.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]>
* android: broadcom/cle: add gallium include pathMauro Rossi2018-09-151-0/+2
| | | | | | | | | | | | | | | | | Fixes the following building error: In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38: In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29: external/mesa/src/gallium/auxiliary/util/u_math.h:42:10: fatal error: 'pipe/p_compiler.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]>
* android: broadcom/genxml: fix collision with intel/genxml header-gen macroMauro Rossi2018-09-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the following building error, happening when building both intel and broadcom: Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h /bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \ external/mesa/src/broadcom/cle/v3d_packet_v21.xml \ > gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h" Traceback (most recent call last): File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module> p = Parser(sys.argv[2]) IndexError: list index out of range header-gen macro is already defined by Intel genxml building rules and the existing header-gen does not have the $(PRIVATE_VER) argument, infact the bash command line logged in the building error is missing exactly $(PRIVATE_VER) argument Renaming the macro as pack-header-gen in src/broadcom/Android.genxml.mk solves the building error, another possible way is to keep the gen rules commands expanded and not use the macros. Fixes: 7f80a9ff13 ("vc4: Introduce XML-based packet header generation like Intel's.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]>
* move u_math to src/utilDylan Baker2018-09-071-1/+1
| | | | | | | | | | | | | | | Currently we have two sets of functions for bit counts, one in gallium and one in core mesa. The ones in core mesa are header only in many cases, since they reduce to "#define _mesa_bitcount popcount", but they provide a fallback implementation. This is important because 32bit msvc doesn't have popcountll, just popcount; so when nir (for example) includes the core mesa header it doesn't (and shouldn't) link with core mesa. To fix this we'll promote the version out of gallium util, then replace the core mesa uses with the util version, since nir (and other non-core mesa users) can and do link with mesautils. Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* v3d: Fix setup of the VCM cache size.Eric Anholt2018-09-071-1/+2
| | | | | | | | | | | There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: 1561e4984eb0 ("v3d: Emit the VCM_CACHE_SIZE packet.")
* Revert "configure: allow building with python3"Emil Velikov2018-08-241-1/+1
| | | | | | | | | | | | | | This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0. Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported using get_xmlpool.py with LANG=C produces some weird output - see the link for details. Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and lack of lang handling of the meson build.) https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
* configure: allow building with python3Emil Velikov2018-08-231-1/+1
| | | | | | | | | | | | Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python3 chosen prior to python2 Signed-off-by: Emil Velikov <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* meson: Build with Python 3Mathieu Bridon2018-08-101-2/+2
| | | | | | | | | | | | Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* v3d: Emit the VCM_CACHE_SIZE packet.Eric Anholt2018-08-064-4/+36
| | | | | | | This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <[email protected]>
* v3d: Avoid spilling that breaks the r5 usage after a ldvary.Eric Anholt2018-08-061-0/+9
| | | | | | Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <[email protected]>
* v3d: Make sure that QPU instruction-has-a-dest matches VIR.Eric Anholt2018-08-062-1/+11
| | | | | | | | | Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <[email protected]>
* v3d: Wait for TMU writes to complete before continuing after a spill.Eric Anholt2018-08-061-1/+6
| | | | | | | | The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <[email protected]>
* v3d: Make sure we don't emit a thrsw before the last one finished.Eric Anholt2018-08-061-2/+13
| | | | | | | Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <[email protected]>
* v3d: Add some debug code for forcing register spilling.Eric Anholt2018-08-061-0/+14
| | | | | | This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.
* v3d: Actually put the "%s" in the snprintf.Eric Anholt2018-08-011-1/+1
| | | | | | | | I missed an important part when porting the change over, fixing my compiler warning but breaking -Werror=format-security. Fixes: e6ff5ac4468e ("v3d: use snprintf(..., "%s", ...) instead of strncpy") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107443