path: root/src/gallium
Commit message | Author | Age | Files | Lines
* lima/ppir: handle all node types in ppir_node_replace_child | Erico Nunes | 2019-07-19 | 1 | -2/+30
| | | | | | | | | | ppir_node_replace_child is used by the const lowering routine in ppir. All types need to be handled here, otherwise the src node is not updated properly when one of the lowered nodes is a const, which results in, for example, regalloc not assigning registers correctly. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: branch regalloc fixes | Erico Nunes | 2019-07-19 | 1 | -0/+33
| | | | | | | | The branch instruction has sources which must be handled in src handling paths so that regalloc assigns registers to them properly. Signed-off-by: Erico Nunes <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]>
* iris: change last_vue_stage() to look at uncompiled shaders | Timothy Arceri | 2019-07-19 | 1 | -3/+3
| | | | | | | This allows us to find the last vue stage before we have compiled the shaders. Reviewed-by: Kenneth Graunke <[email protected]>
* panfrost: Set rt_count | Alyssa Rosenzweig | 2019-07-18 | 2 | -8/+11
| | | | | | | | | | This doesn't quite work yet, but it illustrates how MRT is implemented in the MFBD: rt_count is set appropriately based on the number of render targets, while additional render target descriptors are appended on with an index variable in them (not quite decoded since there are some aspects we don't understand there, but conceptually this should be right). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Trace invisible BOs | Alyssa Rosenzweig | 2019-07-18 | 1 | -1/+5
| | | | | | | Helps make the decode a little more readable (names instead of addresses). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Zero polygon list body size for clears | Alyssa Rosenzweig | 2019-07-18 | 1 | -0/+4
| | | | | | | There are no polygons, so you can't have any size to the polygon list, although there is a minimal header. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/mfbd: Unify depth-only with masked FBO path | Alyssa Rosenzweig | 2019-07-18 | 1 | -22/+24
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Simplify set_framebuffer_state | Alyssa Rosenzweig | 2019-07-18 | 1 | -35/+9
| | | | | | Most of the ad hoc logic is already in Gallium. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Check for NULL surface in places | Alyssa Rosenzweig | 2019-07-18 | 5 | -5/+14
| | | | | | | | | | Fixes a bunch of NULL dereferences, although it does cause GPU faults of course. This is caused by color buffers masked out in MRT, which we'll eventually have to solve the right way... one thing at a time. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Expose 4 render targets | Alyssa Rosenzweig | 2019-07-18 | 1 | -2/+2
| | | | | | Hidden behind deqp flag as usual. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Shrink tiler heap | Alyssa Rosenzweig | 2019-07-18 | 1 | -1/+1
| | | | | | | 128MB is excessive and 16MB is still plenty. Saves 112MB/context on kernels without growable/heap support. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions(). | Eric Anholt | 2019-07-18 | 1 | -32/+13
| | | | | | Cuts out a bunch of boilerplate. Reviewed-by: Iago Toral Quiroga <[email protected]>
* panfrost: Handle Z24 textures | Alyssa Rosenzweig | 2019-07-18 | 1 | -1/+1
| | | | | | Just use the Z32 code. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/ci: Update expectations | Alyssa Rosenzweig | 2019-07-18 | 1 | -14/+0
| | | | | | We just fixed some stencil tests. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Make scissor test more robust | Alyssa Rosenzweig | 2019-07-18 | 1 | -8/+15
| | | | | | See v3d implementation. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Use correct NO_DITHER field on MFBD | Alyssa Rosenzweig | 2019-07-18 | 2 | -1/+6
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Implement Z32F(_S8) support | Alyssa Rosenzweig | 2019-07-18 | 2 | -0/+16
| | | | | | | Z32F uses a dedicated float path. Z32F_S8 uses separate stencil planes in the hardware, lowered via u_transfer_helper. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Copy stencil front to back if back disabled | Alyssa Rosenzweig | 2019-07-18 | 1 | -5/+14
| | | | | | | When backside stenciling is disabled, backfacing primitives just do the same thing as frontfacing primitives. Signed-off-by: Alyssa Rosenzweig <[email protected]>
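The fallback described in this commit message can be sketched in C as follows. This is a minimal illustration and not the panfrost code: the types and the function name (`stencil_face`, `stencil_state`, `effective_back_face`) are hypothetical stand-ins for per-face stencil state with an enable flag.

```c
#include <stdbool.h>

/* Hypothetical per-face stencil state; not the driver's real layout. */
struct stencil_face {
    bool enabled;
    unsigned func;      /* comparison function */
    unsigned ref;       /* reference value */
    unsigned valuemask; /* mask applied before comparing */
};

struct stencil_state {
    struct stencil_face front;
    struct stencil_face back;
};

/* Return the state to apply to back-facing primitives: the back face's
 * own state if it is enabled, otherwise a copy of the front face. */
static struct stencil_face
effective_back_face(const struct stencil_state *s)
{
    return s->back.enabled ? s->back : s->front;
}
```

With the back face disabled, a caller gets the front-face values back, so both facings behave the same.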
* swr/rast: Refactor memory API between rasterizer core and swr | Jan Zielinski | 2019-07-18 | 30 | -185/+370
| | | | | | | This commit cleans up the API between the core of the rasterizer and swr. Some formatting changes are also done. Reviewed-by: Alok Hota <[email protected]>
* lima/ppir: Add gl_PointCoord handling | Andreas Baierl | 2019-07-18 | 6 | -5/+34
| | | | | | | | | Treat gl_PointCoord as a system value and add the necessary bits for correct codegen. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL | Andreas Baierl | 2019-07-18 | 3 | -0/+4
| | | | | | | | This adds an option to treat gl_PointCoord as a system value. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value. | Andreas Baierl | 2019-07-18 | 1 | -0/+20
| | | | | Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* lima/gp: Fix problem with complex moves | Connor Abbott | 2019-07-18 | 3 | -9/+125
| | | | | | | | | | | | | | | | | | | When writing the scheduler, we forgot that you can't read the complex unit in certain sources because it gets overwritten to 0 or 1. Fixing this turned out to be possible without giving up and reducing GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't expect. There can be at most 4 next-max nodes that can't have moves scheduled in the complex slot, so it actually isn't a problem for getting the number of next-max nodes at 5 or lower. However, it is a problem for stores. If a given node is a next-max node whose move cannot go in the complex slot *and* is used by a store that we decide to schedule, we have to reserve one of the non-complex slots for a move instead of all the slots, or we can wind up in a situation where only the complex slot is free and we fail the move. This means that we have to add another term to the reservation logic, for stores whose children cannot be in the complex slot. Acked-by: Qiang Yu <[email protected]>
* lima/gpir: Rework the scheduler | Connor Abbott | 2019-07-18 | 9 | -560/+1187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, we do scheduling at the same time as value register allocation. The ready list now acts similarly to the array of registers in value_regalloc, keeping us from running out of slots. Before this, the value register allocator wasn't aware of the scheduling constraints of the actual machine, which meant that it sometimes chose the wrong false dependencies to insert. Now, we assign value registers at the same time as we actually schedule instructions, making its choices reflect reality much better. It was also conservative in some cases where the new scheme doesn't have to be. For example, in something like: 1 = ld_att 2 = ld_uni 3 = add 1, 2 It's possible that one of 1 and 2 can't be scheduled in the same instruction as 3, meaning that a move needs to be inserted, so the value register allocator needs to assume that this sequence requires two registers. But when actually scheduling, we could discover that 1, 2, and 3 can all be scheduled together, so that they only require one register. The new scheduler speculatively inserts the instruction under consideration, as well as all of its child load instructions, and then counts the number of live value registers after all is said and done. This lets us be more aggressive with scheduling when we're close to the limit. With the new scheduler, the kmscube vertex shader is now scheduled in 40 instructions, versus 66 before. Acked-by: Qiang Yu <[email protected]>
* lima/gp: Mark more add-only nodes as maybe-two-slot | Connor Abbott | 2019-07-18 | 1 | -0/+8
| | | | Reviewed-by: Qiang Yu <[email protected]>
* lima/gpir: Fix some bugs in instruction handling | Connor Abbott | 2019-07-18 | 1 | -0/+12
| | | | Reviewed-by: Qiang Yu <[email protected]>
* lima: Reintroduce the standalone compiler | Connor Abbott | 2019-07-18 | 6 | -2/+351
| | | | | | I used this to test things without needing to have a device handy. Acked-by: Qiang Yu <[email protected]>
* softpipe: Clamp border colors when needed | Gert Wollny | 2019-07-18 | 2 | -14/+31
| | | | | | | | | | | | | | | | | | | | | | unorm and snorm require that the border color values are clamped, so when picking the sampler view copy/clamp the border color from the sampler and use these adjusted values. Fixes: dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
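The clamping rule stated in this commit message can be illustrated with a small C sketch. It is not the softpipe implementation: the enum `norm_kind` and the helpers `clampf` and `clamp_border_color` are names made up for this example, and the ranges used are the usual ones for normalized formats ([0, 1] for unorm, [-1, 1] for snorm).

```c
/* Hypothetical format classes for this sketch. */
enum norm_kind { NORM_FLOAT, NORM_UNORM, NORM_SNORM };

static float
clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the sampler's border color into `out`, clamping each channel to
 * the range the format can represent. Other formats are copied as-is. */
static void
clamp_border_color(enum norm_kind kind, const float in[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        switch (kind) {
        case NORM_UNORM:
            out[i] = clampf(in[i], 0.0f, 1.0f);
            break;
        case NORM_SNORM:
            out[i] = clampf(in[i], -1.0f, 1.0f);
            break;
        default:
            out[i] = in[i];
            break;
        }
    }
}
```

Keeping the clamped copy separate from the sampler's original color means the same sampler can still be used with a float view, where no clamping applies.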
* softpipe: set a lower minimum clamp value for texture coordinate border clamp | Gert Wollny | 2019-07-18 | 1 | -1/+1
| | | | | | | | | | The value of -0.5f is not small enough to produce negative coordinates, so lower the minimum clamp value to -1.0f. This fixes a number of tests from dEQP-GLES31.functional.texture.border_clamp.* Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* softpipe: Correct repeat-mirror evaluation | Gert Wollny | 2019-07-18 | 1 | -5/+19
| | | | | | | | | | When mirroring the texture coordinates, the indices must be mirrored as well and the half pixel shift must be applied in reverse. Fixes a number of tests from: dEQP-GLES31.functional.texture.gather.offset.* dEQP-GLES31.functional.texture.gather.offsets.* Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
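As a rough illustration of what mirroring an index means, the sketch below folds an arbitrary integer texel index into the valid range using mirrored repeat. It covers only the index folding; the half pixel shift and the linear filter weights that the commit also touches are left out. The function name `mirror_repeat_index` is hypothetical and this is not the softpipe code.

```c
/* Map an arbitrary integer texel index into [0, size) using mirrored
 * repeat. Assumes size > 0. The pattern has period 2*size: the first
 * half runs forward, the second half runs backward. */
static int
mirror_repeat_index(int i, int size)
{
    const int period = 2 * size;
    int m = i % period;

    if (m < 0)
        m += period; /* C's % may yield a negative remainder */

    return m < size ? m : period - 1 - m;
}
```

For a 4-texel row, indices 4 and 5 land on texels 3 and 2, and index -1 lands on texel 0, which is why offsets applied after the wrap have to follow the mirrored direction.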
* softpipe: Also mark textures as dirty when updating the framebuffer state | Gert Wollny | 2019-07-18 | 1 | -1/+1
| | | | | | | | | | | At this point all the draw caches are flushed to the old attached textures, so the read caches of these textures will need to be updated too. Fixes: dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.* Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* etnaviv: set DITHER_MODE | Jonathan Marek | 2019-07-17 | 1 | -0/+1
| | | | | | | | | This fixes a rendering glitch observed in SDL testscale test, where alpha blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead of having no effect. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: update headers from rnndb | Jonathan Marek | 2019-07-17 | 1 | -1/+4
| | | | | | | Update to etna_viv commit a16a418. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: fix blend color on newer GPUs | Jonathan Marek | 2019-07-17 | 4 | -19/+21
| | | | | | | Newer GPUs use the half float ALPHA_COLOR_EXT register. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: fix alpha blending cases | Jonathan Marek | 2019-07-17 | 1 | -6/+9
| | | | | | | | We need to check rgb_func/alpha_func when determining if blend or separate alpha is required. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
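Why the blend function matters can be shown with a hedged C sketch. The enums, the struct and both helpers below are hypothetical and only loosely modeled on Gallium-style blend state; the actual etnaviv checks may differ. The point is that factors of ONE/ZERO only leave the source untouched when the function is ADD, and that separate alpha is needed whenever any of the three alpha fields differs from its RGB counterpart.

```c
#include <stdbool.h>

/* Hypothetical blend enums for this sketch. */
enum blend_func { FUNC_ADD, FUNC_SUBTRACT, FUNC_REVERSE_SUBTRACT };
enum blend_factor { FACTOR_ZERO, FACTOR_ONE, FACTOR_SRC_ALPHA, FACTOR_INV_SRC_ALPHA };

struct blend_rt {
    enum blend_func rgb_func, alpha_func;
    enum blend_factor rgb_src, rgb_dst;
    enum blend_factor alpha_src, alpha_dst;
};

/* Blending can be skipped only if the result equals the source:
 * src * ONE + dst * ZERO, combined with ADD, on both RGB and alpha. */
static bool
blend_required(const struct blend_rt *rt)
{
    bool rgb_passthrough = rt->rgb_func == FUNC_ADD &&
                           rt->rgb_src == FACTOR_ONE &&
                           rt->rgb_dst == FACTOR_ZERO;
    bool alpha_passthrough = rt->alpha_func == FUNC_ADD &&
                             rt->alpha_src == FACTOR_ONE &&
                             rt->alpha_dst == FACTOR_ZERO;
    return !(rgb_passthrough && alpha_passthrough);
}

/* Separate alpha is needed when alpha differs from RGB in any field,
 * including the function. */
static bool
separate_alpha_required(const struct blend_rt *rt)
{
    return rt->rgb_func != rt->alpha_func ||
           rt->rgb_src != rt->alpha_src ||
           rt->rgb_dst != rt->alpha_dst;
}
```

A state with ONE/ZERO factors but REVERSE_SUBTRACT as the function would be treated as a no-op by a check that looks at factors alone, although it does not pass the source through unchanged.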
* etnaviv: fix polygon offset | Jonathan Marek | 2019-07-17 | 1 | -1/+1
| | | | | | | | | | Dividing the fui result by 65535 is obviously wrong, and from testing, on GC7000L at least there is no division by 65535. Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
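One way to see why dividing such a value is meaningless: assuming `fui` yields the raw bit pattern of a float (an assumption about the helper, based on its use here), integer division acts on the encoding rather than on the number. The sketch below uses a hypothetical `float_bits` helper in place of `fui` and assumes IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit unsigned integer.
 * Hypothetical stand-in for a float-to-bits helper such as fui. */
static uint32_t
float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* well-defined way to type-pun in C */
    return u;
}

/* Dividing the encoded bits: operates on sign/exponent/mantissa fields
 * as if they were one integer, so the result is not value / 65535. */
static uint32_t
bits_divided(float f)
{
    return float_bits(f) / 65535u;
}

/* Dividing the value first and encoding afterwards. */
static uint32_t
value_divided_bits(float f)
{
    return float_bits(f / 65535.0f);
}
```

For an input of 1.0f the two helpers return different bit patterns, so the order of the division and the bit cast is not interchangeable.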
* freedreno/a6xx: Drop the WFI in the program update stateobj. | Eric Anholt | 2019-07-17 | 1 | -2/+0
| | | | | | | | | | | | | Rob Clark thinks this was likely a workaround for our const buffer update bugs, and now that it's passing tests, we should be able to drop it. renderdoc-traces results: traces/android/clashofclans.rdc: +6.1% +/- 1.1% traces/android/candycrush.rdc: +5.2% +/- 1.6% Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Drop the WFI in constant uploads. | Eric Anholt | 2019-07-17 | 1 | -2/+0
| | | | | | | | | | | | | | Now that the bin vs render constlen is fixed, we can skip these waits. Improves webgl aquarium performance at 10k fish from 27fps to 33. Some highlights from renderdoc-traces: traces/android/minecraft.rdc: +17.1% +/- 3.4% traces/glmark2/ideas-speed=duration.rdc: +11.6% +/- 2.4% traces/android/candycrush.rdc: +5.4% +/- 1.1% traces/android/clashofclans.rdc: +4.4% +/- 1.3% Reviewed-by: Rob Clark <[email protected]>
* freedreno: Assert that we don't exceed constlen. | Eric Anholt | 2019-07-17 | 1 | -10/+24
| | | | | | | | | We actually could go up to vs->constlen in the binning shader on a6xx, but for sanity let's make sure that we're always under constlen. This would have caught the bug fixed in 572c76fd8826 ("freedreno: Clamp UBO uploads to the constlen decided by the shader.") Reviewed-by: Rob Clark <[email protected]>
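The combination of clamping and asserting can be sketched as below. This is an illustration under assumptions, not the freedreno code: the function name `clamp_const_upload` is hypothetical, and `offset`, `size` and `constlen` are taken to be in the same (unspecified) units.

```c
#include <assert.h>

/* Clamp an upload of `size` units starting at `offset` so that it never
 * goes past `constlen`. Returns the number of units to upload, which is
 * 0 when the whole range lies beyond what the shader uses. */
static unsigned
clamp_const_upload(unsigned offset, unsigned size, unsigned constlen)
{
    if (offset >= constlen)
        return 0; /* nothing the shader reads; skip the upload */

    unsigned room = constlen - offset;
    unsigned clamped = size < room ? size : room;

    /* The invariant the commit wants to enforce everywhere. */
    assert(offset + clamped <= constlen);
    return clamped;
}
```

An assert at the emit point catches any caller that bypasses the clamp, which is the class of bug the commit message refers to.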
* freedreno: Fix more constlen overflows. | Eric Anholt | 2019-07-17 | 1 | -2/+5
| | | | | | | | | Fixes constlen overflow in dEQP-GLES31.functional.shaders.builtin_var.compute.num_work_groups and dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_32 and probably others. Reviewed-by: Rob Clark <[email protected]>
* freedreno: Drop stale comment about skipping uploads. | Eric Anholt | 2019-07-17 | 1 | -1/+0
| | | | | | | We already skip the upload if it's unused, due to the constlen > offset check. Reviewed-by: Rob Clark <[email protected]>
* virgl: Set meta data for textures from handle. | Lepton Wu | 2019-07-17 | 1 | -0/+1
| | | | | | | | | | The set of meta data was removed by commit 8083464. It broke lots of dEQP tests when running with pbuffer surface type. Fixes: 80834640137 ("virgl: remove dead code") Signed-off-by: Lepton Wu <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]> Reviewed-by: Chia-I Wu <[email protected]>
* iris: Enable fast clears on other miplevels and layers than 0. | Rafael Antognolli | 2019-07-17 | 1 | -8/+48
| | | | | | | | | | | | | | | | Until now we only supported fast clear colors on the first miplevel and layer. The main reason for it is that we can't have different fast clear values at different levels/layers, since the surface state only supports one clear value. We can, however, enable it if we make sure we only use the same value for all levels/layers, and if one of them changes, we resolve all the others. We already do that for depth fast clears so hopefully it will be fine for color fast clears too. v2: Add check for partial clear too (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
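The rule described here (one clear value per surface, and a resolve of the other levels/layers when that value changes) can be modeled with a small C sketch. Everything in it is hypothetical: the `surface` struct, the flat slice index standing in for a (level, layer) pair, and the counter that stands in for an actual resolve operation. It is not the iris implementation and it ignores the partial-clear check mentioned in the v2 note.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SLICES 8 /* hypothetical number of level/layer slices */

struct surface {
    float clear_color[4];          /* the single clear value of the surface */
    bool has_clear_color;
    bool fast_cleared[MAX_SLICES]; /* slice still relies on clear_color */
    unsigned resolve_count;        /* stands in for real resolve work */
};

/* Fast clear one slice (slice < MAX_SLICES). If the color differs from
 * the stored one, every other fast-cleared slice is resolved first,
 * because the surface can hold only one clear value. */
static void
fast_clear_slice(struct surface *s, unsigned slice, const float color[4])
{
    bool changed = !s->has_clear_color ||
                   memcmp(s->clear_color, color, sizeof s->clear_color) != 0;

    if (changed) {
        for (unsigned i = 0; i < MAX_SLICES; i++) {
            if (i != slice && s->fast_cleared[i]) {
                s->fast_cleared[i] = false;
                s->resolve_count++;
            }
        }
        memcpy(s->clear_color, color, sizeof s->clear_color);
        s->has_clear_color = true;
    }

    s->fast_cleared[slice] = true;
}
```

Clearing several slices to the same color costs no resolves in this model; only a change of color forces the previously cleared slices to be resolved.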
* iris: Allow resolving clear color of CCS_D surfaces. | Rafael Antognolli | 2019-07-17 | 1 | -6/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Make iris_has_color_unresolved non-static | Kenneth Graunke | 2019-07-17 | 2 | -6/+10
| | | | We want to use this in the transfer code and possibly for fast clears.
* broadcom: Move v3d_get_device_info to common | Andreas Bergmeier | 2019-07-17 | 1 | -51/+2
| | | | Once in common, the implementation can also be used for Vulkan.
* panfrost: Merge varyings_mem into transient buffers | Alyssa Rosenzweig | 2019-07-17 | 3 | -15/+5
| | | | | | | | | | | Theoretically we would like these split since varyings can have specially optimized flags (no map, coherent local). For now, since neither of these flags is particularly meaningful right now, merge them together instead of special casing varyings_mem. Saves upwards of 64MB of RAM per context. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* softpipe: pass stream-out targets to draw-module early | Erik Faye-Lund | 2019-07-17 | 2 | -15/+8
| | | | | | This is essentially a port of ed53e61bec9 from LLVMpipe to softpipe, as it makes things a bit simpler and more performant. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
* softpipe: Remove unused static function | Gert Wollny | 2019-07-17 | 1 | -9/+0
| | | | | | | | | | | Thanks to Eric Engestrom for pointing out that there was something wrong with that function. Fixes: 724a73509e1bc1ce3abf9500e457bb2911b642db softpipe: Prepare handling explicit gradients Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* iris: Drop copy and pasted iris_timebase_scale | Kenneth Graunke | 2019-07-16 | 3 | -12/+3
| | | | | Lionel moved brw_timebase_scale to gen_device_info_timebase_scale a few months ago, so we should just use that, and not our own copy in iris.