path: root/src/broadcom
Commit message | Author | Age | Files | Lines
* v3d: Implement a small immediates optimization, based on VC4's. | Eric Anholt | 2018-07-23 | 8 | -19/+143
    We can do one per instruction, and we have to be careful not to overwrite raddr_b, but this greatly reduces the pressure on uniform loads (particularly around ldvpm/stvpm instructions).

    total instructions in shared programs: 90768 -> 88220 (-2.81%)
    instructions in affected programs: 82711 -> 80163 (-3.08%)
* v3d: Return an invalid src number if asked for a missing implicit uniform. | Eric Anholt | 2018-07-23 | 2 | -3/+3
    Sometimes when iterating over sources, we might want to check if it's the implicit one. We wouldn't want to match on a non-implicit src using this function.
* v3d: Skip emitting texture config parameter 2 if it's just the defaults. | Eric Anholt | 2018-07-23 | 1 | -1/+5
    shader-db:
    total instructions in shared programs: 91275 -> 90768 (-0.56%)
    instructions in affected programs: 20702 -> 20195 (-2.45%)
* v3d: Update an XXX comment for a path we handled in HW on V3D 4.x. | Eric Anholt | 2018-07-23 | 1 | -1/+1
* v3d: Switch to using the new SFU instructions on V3D 4.x. | Eric Anholt | 2018-07-23 | 8 | -24/+118
    These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches.

    There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change.

    total instructions in shared programs: 95669 -> 91275 (-4.59%)
    instructions in affected programs: 82590 -> 78196 (-5.32%)
* v3d: Add QPU pack/unpack for the new SFU instructions. | Eric Anholt | 2018-07-23 | 4 | -0/+32
    These instructions allow writing the result to any register, instead of a special writeback to r4.
* v3d: Fix the name of the "flpop" operation. | Eric Anholt | 2018-07-23 | 6 | -6/+7
    Noticed while trying to sort a new op into the appropriate place to match the documentation.
* v3d: Print the instruction we're testing in the QPU disasm/pack round-trip. | Eric Anholt | 2018-07-23 | 1 | -2/+3
    If we fail initial disassembly, it's good to know what instruction it was that failed.
* v3d: Drop unused vir_SAT() operation. | Eric Anholt | 2018-07-23 | 1 | -8/+0
    We lower saturates in NIR.
* v3d: Rotate through registers to improve post-RA scheduling options. | Eric Anholt | 2018-07-23 | 1 | -0/+45
    Similarly to VC4's implementation, by not picking r0 immediately upon freeing it, we give the scheduler more of a chance to fit later writes in earlier.

    I'm not clear on whether there's any real cost to picking phys over accumulators, so keep that behavior for now.

    shader-db:
    total instructions in shared programs: 96831 -> 95669 (-1.20%)
    instructions in affected programs: 77254 -> 76092 (-1.50%)
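The rotation idea in the commit above can be illustrated with a minimal sketch. This is not the driver's register allocator: `NUM_REGS`, `struct reg_picker` and `pick_reg_round_robin()` are hypothetical names, and the register file is reduced to a bitmask. The point is only that the search for a free register starts after the last one handed out, so a register that was just freed is not reused immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 8 /* hypothetical register file size, for illustration only */

struct reg_picker {
    uint32_t next; /* index where the next search starts */
};

/* Returns the index of a free register, or -1 if every register is in use.
 * Bit i of in_use_mask is set when register i is currently live. */
static int
pick_reg_round_robin(struct reg_picker *p, uint32_t in_use_mask)
{
    assert(p != NULL);

    for (uint32_t i = 0; i < NUM_REGS; i++) {
        uint32_t reg = (p->next + i) % NUM_REGS;

        if (!(in_use_mask & (1u << reg))) {
            /* Start the next search after this register, so that a
             * just-freed register is not handed out again right away. */
            p->next = (reg + 1) % NUM_REGS;
            return (int)reg;
        }
    }

    return -1;
}
```

With an always-lowest-free policy, two back-to-back allocations with nothing live would both return r0 and serialize on it; here they return r0 and then r1, which leaves a post-RA scheduler free to reorder the two writes.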
* v3d: Allow reading from physical regs written in the previous instruction. | Eric Anholt | 2018-07-23 | 1 | -24/+0
    This restriction existed in V3D 2.x, but lifting it was a major change in 3.x.

    shader-db results:
    total instructions in shared programs: 98117 -> 96831 (-1.31%)
    instructions in affected programs: 48520 -> 47234 (-2.65%)
* v3d: Disable shader-db cycle estimates until we sort out TMU estimates. | Eric Anholt | 2018-07-16 | 1 | -1/+4
    I keep having to ignore these shader-db changes since I don't trust them, so just disable the reports entirely.
* v3d: Emit the lowered uniform just before its first use in a block. | Eric Anholt | 2018-07-16 | 1 | -20/+18
    total instructions in shared programs: 98578 -> 98119 (-0.47%)
    instructions in affected programs: 27571 -> 27112 (-1.66%)

    and it also eliminates most spills/fills on the CTS's randomized uniform usage testcases.
* v3d: Add an assert that we don't provide an invalid texture return words. | Eric Anholt | 2018-07-16 | 1 | -0/+8
    The docs had an update noting this restriction, so reflect it in the code.
* v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader. | Eric Anholt | 2018-07-16 | 1 | -0/+4
    This doesn't affect us yet since we're not doing TMUWTs, but I think we will for GLES 3.1.
* v3d: Implement noperspective varyings on V3D 4.x. | Eric Anholt | 2018-07-09 | 4 | -4/+9
    Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.
* v3d: Fix typo in dither mode offset. | Eric Anholt | 2018-07-09 | 1 | -1/+1
    We weren't using the field yet, so it didn't affect anything.

    Fixes: c0476d964abb ("v3d: Express dithering mode in the same way that the CLIF parser does.")
* v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE. | Eric Anholt | 2018-07-05 | 1 | -0/+3
    Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
* v3d: Respect swap_color_rb for the f32_color_rb case. | Eric Anholt | 2018-07-05 | 1 | -5/+7
    We don't actually set the two flags together, but I want to use the r/g/b/a reordered fields in the next commit.
* v3d: Emit a TF flush after each draw using TF. | Eric Anholt | 2018-07-02 | 1 | -0/+2
    This fixes GPU hangs on 7278 in transform feedback tests such as GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic
* v3d: Move GL shader state dumping out of per-version compilation. | Eric Anholt | 2018-06-29 | 3 | -41/+26
    It doesn't depend on V3D_VER, since it's just calling v3d_print_group.
* v3d: Add missing Stream field to transform feedback specs on V3D 4.1. | Eric Anholt | 2018-06-29 | 1 | -1/+8
    Noticed when trying to CLIF parse a transform feedback job that hangs on HW.
* v3d: Add missing "tri trip or fan" flag in Primitive List Format. | Eric Anholt | 2018-06-29 | 1 | -0/+1
* v3d: Fix the shader code address field widths on V3D 4.1+ | Eric Anholt | 2018-06-29 | 1 | -3/+3
    We were overlapping it with the threadable/nan flags, resulting in incorrect relocations (threadable/nan included in the offset) and wrong ordering in the CLIF files.
* v3d: Add missing "no prim pack" field to the V3D4.1+ GL shader state. | Eric Anholt | 2018-06-29 | 1 | -0/+2
    It looks like we don't need this flag for anything (not that I'm clear on what it does), but it makes our struct dumping line up with CLIF parsing.
* v3d: Express dithering mode in the same way that the CLIF parser does. | Eric Anholt | 2018-06-29 | 1 | -4/+8
* v3d: Add missing "number of bin tile lists" field. | Eric Anholt | 2018-06-29 | 1 | -0/+1
    Noticed when trying to feed our dumps through the CLIF parser. Since this is a "minus one" field, we were already filling in the value we wanted (0).
* v3d: Rewrite the color write masks to match CLIF format. | Eric Anholt | 2018-06-29 | 1 | -5/+1
    The render_target_* fields gave us pretty(ish) printing, but meant we were incompatible with CLIF, and had much more verbose code generating them.
* v3d: Merge the V3D 4.1 and 4.2 XML into V3D 3.3'x XML. | Eric Anholt | 2018-06-29 | 7 | -2162/+607
    The XML ends up noisier if you're only looking at one version, but from the diffstat there's obvious wins in terms of deduplication. This will get even more significant if we ever support 3.2 or 4.0.
* v3d: Switch v3d_decoder.c to the XML's top min_ver/max_ver fields. | Eric Anholt | 2018-06-29 | 5 | -6/+14
    The XML zipper wants one XML per version for filling out its tables, but we want to do more than one GPU version per XML now. Assume that the "gen" field will be the same as min_ver and look up our XML text assuming that they're listed in increasing min_ver.
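One plausible reading of "look up our XML text assuming that they're listed in increasing min_ver" is a linear scan that keeps the last entry whose min_ver does not exceed the requested version. The sketch below shows that reading only; `struct ver_xml` and `find_xml_for_version()` are hypothetical names, not taken from v3d_decoder.c, and versions are assumed to be plain integers such as 33 or 41.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one XML blob per starting hardware version. */
struct ver_xml {
    int min_ver;      /* first version this XML describes, e.g. 33 or 41 */
    const char *name; /* stand-in for the XML text itself */
};

/* Returns the last entry whose min_ver is <= ver, or NULL if ver is older
 * than every entry. The table must be sorted by increasing min_ver. */
static const struct ver_xml *
find_xml_for_version(const struct ver_xml *table, size_t count, int ver)
{
    const struct ver_xml *best = NULL;

    for (size_t i = 0; i < count; i++) {
        /* Check the ordering assumption the lookup depends on. */
        assert(i == 0 || table[i - 1].min_ver <= table[i].min_ver);

        if (table[i].min_ver > ver)
            break;
        best = &table[i];
    }

    return best;
}
```

Because the table is sorted, the scan can stop at the first entry that is too new; a version that falls between two entries resolves to the older one.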
* v3d: Create XML fields for min_ver and max_ver of a packet/struct/enum. | Eric Anholt | 2018-06-29 | 2 | -2/+82
    This will be used to merge together the V3D 3.3-4.1 XML with the variants disabled based on the version.
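Disabling a variant based on the version comes down to a range check against the two attributes. A minimal sketch, assuming that an absent attribute is represented as 0 and means "no bound" (that convention and the name `ver_in_range()` are illustrative, not taken from the generator):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a packet/struct/enum tagged with [min_ver, max_ver]
 * should be emitted for hardware version ver.
 * Assumed convention: 0 means the attribute was not given (unbounded). */
static bool
ver_in_range(int ver, int min_ver, int max_ver)
{
    assert(ver > 0);

    if (min_ver != 0 && ver < min_ver)
        return false;
    if (max_ver != 0 && ver > max_ver)
        return false;

    return true;
}
```

Under this convention, a packet with min_ver 33 and max_ver 33 is kept only when generating for version 33, while one with min_ver 41 and no max_ver is kept for 41 and everything newer.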
* v3d: Pass the version being generated to the pack generator script. | Eric Anholt | 2018-06-29 | 4 | -20/+22
    It turns out that most V3D versions change very few packets, so keeping separate copies of the XML per version makes changing the XML a pain as you have to replicate your changes to each one. This is the start of changing it so that one XML can generate headers for multiple versions.
* v3d: Convert a bunch of our "minus one" fields over to the new XML attr. | Eric Anholt | 2018-06-27 | 3 | -24/+24
    This fixes up their formatting for CLIF files and makes the code more legible.
* v3d: Add pack/unpack/decode support for fields with a "- 1" modifier. | Eric Anholt | 2018-06-27 | 3 | -17/+46
    Right now, we name these fields as "field name minus one" so that your C code obviously states what the value should be. However, it's easy enough to handle at the codegen level with another little XML attribute, meaning less C code and easier-to-read values in CLIF dumping and gdb as well. (The actual CLIF format for simulator and FPGA replay takes in pre-minus-one values, so we need it there too).
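What handling the "- 1" modifier at the codegen level buys can be shown with a small sketch. The helpers below are hypothetical stand-ins for generated code, not the real generated pack functions: the caller passes the natural value (for example, a count of 16), the packer stores value - 1 in the bitfield, and the unpacker adds the 1 back.

```c
#include <assert.h>
#include <stdint.h>

/* Packs value into bits [start, end] of a 32-bit word, storing value - 1.
 * Hypothetical helper: fields are limited to fewer than 32 bits here. */
static uint32_t
pack_minus_one(uint32_t value, unsigned start, unsigned end)
{
    assert(start <= end && end < 32);
    unsigned width = end - start + 1;
    assert(width < 32);

    /* A "minus one" field cannot express 0. */
    assert(value >= 1);
    uint32_t stored = value - 1;

    /* The stored value must fit in the field. */
    assert(stored < (1u << width));

    return stored << start;
}

/* Extracts bits [start, end] from word and returns the natural value. */
static uint32_t
unpack_minus_one(uint32_t word, unsigned start, unsigned end)
{
    assert(start <= end && end < 32);
    unsigned width = end - start + 1;
    assert(width < 32);

    uint32_t mask = (1u << width) - 1;
    return ((word >> start) & mask) + 1;
}
```

A 4-bit field can then carry values 1 through 16: packing 16 into bits 4..7 yields 0xF0, and unpacking that word gives 16 back, so driver code and debug dumps both see the natural value rather than the encoded one.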
* v3d, vc4: Disable valgrind checking of CLE inputs when NDEBUG is set. | Eric Anholt | 2018-06-21 | 1 | -0/+2
    For a meson -Db_ndebug=true release build on x86_64, reduces text size of libv3d.a from 53.0k to 51.6k.

    Inspired by 0d5329d626e3 ("anv: Disable __gen_validate_value if NDEBUG is set.")
* v3d: Implement ALPHA_TO_COVERAGE. | Eric Anholt | 2018-06-20 | 2 | -2/+15
    There's a convenient "FTOC" instruction for generating the coverage now, unlike vc4.

    This fixes dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
* v3d: Add missing always_flush debug flag. | Eric Anholt | 2018-06-19 | 1 | -0/+1
    The #define existed and was checked in the driver.
* v3d: Limit shader threading according to our maximum TMU fifo usage. | Eric Anholt | 2018-06-15 | 1 | -10/+24
    Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.
* v3d: Fix shaders using pixel center W but no varyings. | Eric Anholt | 2018-06-15 | 3 | -15/+8
    The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?"

    Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
* v3d: Fix configuration setup of mixed f32 and f16 render targets. | Eric Anholt | 2018-06-14 | 1 | -1/+1
    Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
* v3d: Remove unused QUNIFORM_STENCIL left over from vc4. | Eric Anholt | 2018-06-14 | 1 | -2/+0
* v3d: Fix undefined results for a swap_color_rb RT from a float shader output. | Eric Anholt | 2018-06-14 | 1 | -1/+4
    Fixes segfaults and undefined behavior in dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
* v3d: Enable the new NIR bitfield operation lowering paths. | Eric Anholt | 2018-06-06 | 1 | -2/+19
    These together get the GLSL 3.00 unorm/snorm pack functions and MESA_shader_integer operations working.

    v2: Fix commit message typo.

    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* v3d: Be more explicit about include directory from our generated code. | Eric Anholt | 2018-06-05 | 1 | -1/+1
    You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir. nir was fine at that, but automake didn't have it.

    Bugzilla: https://github.com/anholt/mesa/issues/104
* v3d: Add support for glSampleMask / glSampleCoverage. | Eric Anholt | 2018-05-17 | 2 | -0/+10
* v3d: Enable NaN propagation in the VS and CS as well. | Eric Anholt | 2018-05-17 | 3 | -3/+9
    Fixes piglit vs-isnan-*.shader_test at the expense of gl-1.0-spot-light.
* v3d: Rename the driver files from "vc5" to "v3d". | Eric Anholt | 2018-05-16 | 2 | -1/+1
* v3d: Rename the vc5_dri.so driver to v3d_dri.so. | Eric Anholt | 2018-05-16 | 2 | -8/+8
    This allows the driver to load against the merged kernel DRM driver. In the process, rename most of the build system variables and gallium plumbing functions.
* android: change include "cutils/log.h" to "log/log.h" on Android API >=26 | jenny.q.cao | 2018-05-14 | 1 | -0/+4
    There is a compile warning from Android 8 (API version 26) from "include cutils/log.h":
    warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h" -W#warnings

    Change to include "log/log.h" on Android 8 or later major versions to avoid this warning.

    Signed-off-by: jenny.q.cao <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* broadcom/vc5: Add support for centroid varyings. | Eric Anholt | 2018-04-26 | 3 | -0/+44
    It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway.

    Fixes ext_framebuffer_multisample-interpolation * centroid-*