Commit message (Author, Age, Files, Lines)
* anv/apply_pipeline_layout: Set the cursor in lower_res_reindex_intrinsic (Jason Ekstrand, 2019-01-08, 1 file, -0/+2)
| The loop through instructions doesn't set the cursor for us, so unless we set it somewhere, we may end up emitting instructions in the wrong place. The only reason we haven't been bitten by this in the past is that it only happens in a few variable-pointers cases, and the CTS tests for those don't use much control flow, so things were getting emitted in the correct order by accident.
|
| Cc: [email protected]
| Reviewed-by: Alejandro Piñeiro <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Handle any bit size in vector_insert/extract (Jason Ekstrand, 2019-01-08, 3 files, -11/+15)
| This crops up both in the actual SPIR-V VectorInsert/Extract opcodes and in various places where we deal with vector derefs.
|
| Cc: [email protected]
| Reviewed-by: Alejandro Piñeiro <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl_type: Support serializing 8 and 16-bit types (Jason Ekstrand, 2019-01-08, 1 file, -2/+12)
| Reviewed-by: Alejandro Piñeiro <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Fix matrix parameters in function calls. (Bas Nieuwenhuizen, 2019-01-08, 1 file, -0/+4)
| They can be handled exactly the same as arrays; we just need to handle the base type correctly in the switches.
|
| Fixes: a45b6fb4524 "spirv: Pass SSA values through functions"
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109204
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Fix rasterization precision bits. (Bas Nieuwenhuizen, 2019-01-07, 1 file, -3/+3)
| Note that these limits are exact, not "precision is at least x", as texel coords also get snapped to a multiple of this step size before filtering.
|
| This fixes the CTS tests
|   dEQP-VK.texture.explicit_lod.2d.sizes.31x55_nearest_linear_mipmap_nearest_repeat
|   dEQP-VK.texture.explicit_lod.2d.sizes.57x35_nearest_linear_mipmap_nearest_repeat
|
| Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109151
| Reviewed-by: Samuel Pitoiset <[email protected]>
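Because the limits are exact, a coordinate's position inside a texel can only take 2^bits distinct values. Vulkan reports such limits in VkPhysicalDeviceLimits fields like subTexelPrecisionBits; which fields this commit changed is not shown in the log. The sketch below illustrates the snapping only, assuming truncation toward zero and a non-negative coordinate (both assumptions, not taken from radv); snap_to_subtexel_grid() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: snap a non-negative texel coordinate to a grid of
 * 2^bits steps per texel by truncating. The rounding mode is an assumption,
 * not something taken from radv or the hardware documentation. */
static float snap_to_subtexel_grid(float coord, unsigned bits)
{
    assert(coord >= 0.0f && bits < 24);
    const float steps = (float)(1u << bits);        /* grid steps per texel */
    const int64_t step = (int64_t)(coord * steps);  /* truncate toward zero */
    return (float)step / steps;
}
```

With 2 bits, 0.3 lands on 0.25; with 8 bits, 0.5 is already on the grid and is unchanged.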
* nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref (Kenneth Graunke, 2019-01-07, 1 file, -47/+28)
| These days, we have two sampler lowering passes.
|
| The newer one, gl_nir_lower_samplers_as_deref, is used by radeonsi. It rewrites variables to drop structures out of sampler deref chains, to make life simpler. It then sets var->data.binding for non-bindless sampler and image variables based on the GL uniform storage's opaque index values.
|
| The older one converts sampler deref chains (nir_tex_src_texture_deref) to a numerical offset (nir_tex_src_texture_offset). It also stores the constant-valued portion of that number in tex->texture_index, making life really simple for drivers that don't support indirects. It too pokes at the GL uniform storage's opaque index values.
|
| Logically, we can do the first pass (simplify derefs, set bindings), then the second (turn derefs into offsets, set texture_index). This patch does exactly that, eliminating some redundancy (only one pass has to poke at GL uniform storage) and gaining proper var->data.binding values for drivers using the full lowering.
|
| Reviewed-by: Ian Romanick <[email protected]>
| Acked-by: Jason Ekstrand <[email protected]>
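The split that the older pass performs, a constant portion plus an indirect offset, can be illustrated for a chain such as samplers[i][2] in a sampler array declared [3][4]. A minimal sketch: struct array_link and split_sampler_index() are hypothetical, and the indirect index is modelled as a known runtime value, whereas inside the compiler it would only be an SSA value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one array level in a sampler deref chain. */
struct array_link {
    unsigned stride;   /* samplers covered by one element at this level */
    int      is_const; /* nonzero if the index is a compile-time constant */
    unsigned index;    /* constant index, or a sample runtime value */
};

/* Flatten base + sum(index * stride), keeping constant and indirect
 * contributions apart. The constant part corresponds to what the pass puts
 * in tex->texture_index; the indirect part to the offset source. */
static void split_sampler_index(unsigned base, const struct array_link *links,
                                size_t n, unsigned *const_part,
                                unsigned *indirect_part)
{
    assert(const_part != NULL && indirect_part != NULL);
    *const_part = base;
    *indirect_part = 0;
    for (size_t k = 0; k < n; k++) {
        unsigned contribution = links[k].index * links[k].stride;
        if (links[k].is_const)
            *const_part += contribution;
        else
            *indirect_part += contribution;
    }
}
```

For an array starting at sampler 5, samplers[i][2] with i = 1 yields a constant part of 7 and an indirect part of 4.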
* nir: Fix gl_nir_lower_samplers_as_deref's structure type handling. (Kenneth Graunke, 2019-01-07, 1 file, -3/+0)
| We recurse to remove structures, and at each step, re-modify the resulting type for our link in the deref chain.
|
| For arrays, the result of recursion is the new underlying type, so we wrap it with the array dimensionality again.
|
| For structs, we want to simply use the new underlying type, skipping the struct altogether. The correct way to do this is to do nothing at all.
|
| Previously, we had reset type to next->type, which is the /old/ field type, not the new field type we obtained by recursing. This undid our recursive work.
|
| Fixes about 338 tests with nested structs, such as:
|   dEQP-GLES2.functional.uniform_api.value.initial.get_uniform.nested_structs_arrays.sampler2D_samplerCube_fragment
|
| Note that currently only radeonsi uses this pass, and NIR support is disabled there by default, so the breakage was likely not seen by most people. The next commit uses this pass for more drivers, so this fix prevents regressions from that change.
|
| Reviewed-by: Ian Romanick <[email protected]>
| Acked-by: Jason Ekstrand <[email protected]>
* amd/common: Add some parentheses to silence warning. (Bas Nieuwenhuizen, 2019-01-07, 1 file, -2/+2)
| [1/59] Compiling C object 'src/amd/common/src@amd@common@@amd_common@sta/ac_nir_to_llvm.c.o'.
| ../mesa/src/amd/common/ac_nir_to_llvm.c: In function ‘get_inst_tessfactor_writemask’:
| ../mesa/src/amd/common/ac_nir_to_llvm.c:4089:32: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses]
|     writemask = ((1 << num_comps + 1) - 1) << first_component;
|                       ~~~~~~~~~~^~~
| ../mesa/src/amd/common/ac_nir_to_llvm.c:4091:33: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses]
|     writemask = (((1 << num_comps + 1) - 1) << first_component) << 4;
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
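The warning is purely stylistic: in C, + binds tighter than <<, so 1 << num_comps + 1 already means 1 << (num_comps + 1), and adding parentheses does not change the result. A minimal sketch of the mask expression with the parentheses spelled out; tess_writemask() is a hypothetical wrapper, and the exact meaning of num_comps in ac_nir_to_llvm.c is not asserted here beyond what the expression itself computes.

```c
#include <assert.h>

/* Mask of (num_comps + 1) consecutive bits starting at first_component,
 * written with explicit parentheses as -Wparentheses suggests. */
static unsigned tess_writemask(unsigned num_comps, unsigned first_component)
{
    assert(num_comps + 1 + first_component < 32);  /* keep shifts in range */
    return ((1u << (num_comps + 1)) - 1) << first_component;
}
```

For example, num_comps = 2 starting at component 1 gives 0xe (components 1 to 3).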
* radv: Remove unused variable. (Bas Nieuwenhuizen, 2019-01-07, 1 file, -1/+0)
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Remove device path. (Bas Nieuwenhuizen, 2019-01-07, 2 files, -3/+0)
| It is unused, and gcc complains about the strncpy (as far as I can see, because strncpy does not leave a 0 byte on truncation). That said, we don't use it, so this does not fix a real bug.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
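The strncpy() pitfall mentioned above is standard C behaviour: when the source is at least n bytes long, strncpy() copies n bytes and writes no terminating NUL. The commit simply deleted the unused field; the sketch below only illustrates the usual way to use strncpy() safely. copy_path() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size - 1 bytes and always terminate, since strncpy()
 * leaves the destination unterminated when src is too long. */
static void copy_path(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

An alternative with the same guarantee is snprintf(dst, dst_size, "%s", src), which always terminates when dst_size is nonzero.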
* ac: remove unused variable from ac_build_ddxy (Marek Olšák, 2019-01-07, 1 file, -1/+1)
| trivial
* glsl: correct typo in GLSL compilation error message (Andres Gomez, 2019-01-07, 1 file, -1/+1)
| v2: Add the "fix" tag (Erik).
|
| Fixes: 037f68d81e1 ("glsl: apply align layout qualifier rules to block offsets")
| Cc: Timothy Arceri <[email protected]>
| Signed-off-by: Andres Gomez <[email protected]>
| Reviewed-by: Erik Faye-Lund <[email protected]>
* vulkan: Update the XML and headers to 1.1.97 (Jason Ekstrand, 2019-01-07, 13 files, -41/+311)
| Acked-by: Samuel Pitoiset <[email protected]>
* docs: update 18.3 and add 19.x cycles for the release calendar (Andres Gomez, 2019-01-07, 1 file, -4/+117)
| v2: replace incorrect "<td/>" with "<td>" (Eric).
|
| Cc: Dylan Baker <[email protected]>
| Cc: Juan A. Suarez <[email protected]>
| Cc: Emil Velikov <[email protected]>
| Signed-off-by: Andres Gomez <[email protected]>
| Acked-by: Emil Velikov <[email protected]>
| Acked-by: Juan A. Suarez <[email protected]>
* anv/android: Do not reject storage images. (Bas Nieuwenhuizen, 2019-01-07, 1 file, -8/+2)
| We already do the ImageFormatProperties check, and rejecting a usage flag when both ImageFormatProperties and the WSI (which is Android) support it is not allowed.
|
| Intel does support storage for some of the supported WSI formats, such as R8G8B8A8_UNORM, and judging by ISL_SURF_USAGE_DISABLE_AUX_BIT, the imported images do not have any form of compression that would prevent this fix.
|
| v2: Also consider the STORAGE bit for Gralloc usage bits. (From Kevin Strasser <[email protected]>)
|
| Fixes: 053d4c328fa "anv: Implement VK_ANDROID_native_buffer (v9)"
| Reviewed-by: Tapani Pälli <[email protected]>
* radv: Implement buffer stores with less than 4 components. (Bas Nieuwenhuizen, 2019-01-07, 1 file, -5/+14)
| We started using it in the btoi paths for r32g32b32, and the LLVM IR checker will complain about it because we end up with intrinsics with the wrong type extension in the name.
|
| Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
| Reviewed-by: Samuel Pitoiset <[email protected]>
* appveyor: Add a Cygwin build script (Jon Turney, 2019-01-07, 2 files, -5/+54)
|
* appveyor: put build steps in a script, rather than inline in appveyor.yml (Jon Turney, 2019-01-07, 2 files, -29/+41)
|
* etnaviv: annotate variables only used in debug build (Lucas Stach, 2019-01-07, 1 file, -7/+4)
| Some of the status variables in the compiler are only used in asserts and thus may be unused in release builds. Annotate them accordingly to avoid 'unused but set' warnings from the compiler.
|
| Signed-off-by: Lucas Stach <[email protected]>
| Reviewed-by: Christian Gmeiner <[email protected]>
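A minimal sketch of this kind of annotation, assuming a GCC-style unused attribute. Mesa's util/macros.h provided a MAYBE_UNUSED macro around this time; the local definition below is only a stand-in so the example compiles on its own, and emit_instruction() and compile_one() are hypothetical.

```c
#include <assert.h>

/* Stand-in for an annotation macro such as Mesa's MAYBE_UNUSED; defined
 * locally so this sketch is self-contained. */
#ifndef MAYBE_UNUSED
#if defined(__GNUC__)
#define MAYBE_UNUSED __attribute__((unused))
#else
#define MAYBE_UNUSED
#endif
#endif

/* Hypothetical emit step that returns 0 on success. */
static int emit_instruction(int opcode)
{
    return opcode >= 0 ? 0 : -1;
}

static int compile_one(int opcode)
{
    /* 'status' is only read by assert(), which compiles away under NDEBUG,
     * so release builds would otherwise warn about an unused variable. */
    MAYBE_UNUSED int status = emit_instruction(opcode);
    assert(status == 0);
    return opcode;
}
```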
* etnaviv: enable full overwrite in a few more cases (Lucas Stach, 2019-01-07, 1 file, -4/+7)
| Take into account the render target format when checking whether the color mask affects all channels of the RT. This allows enabling full overwrite in a few cases where a non-alpha format is used.
|
| Signed-off-by: Lucas Stach <[email protected]>
| Reviewed-by: Christian Gmeiner <[email protected]>
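The format-aware check boils down to asking whether the color mask covers every channel the render target actually stores. A minimal sketch, assuming a 4-bit RGBA channel mask; the CH_* bits and colormask_covers_format() are hypothetical, and the real decision in etnaviv presumably also depends on other state such as blending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel bits. */
enum { CH_R = 1, CH_G = 2, CH_B = 4, CH_A = 8 };

/* True if the color mask writes every channel the format stores. Channels
 * the format lacks (such as alpha in an RGBX format) need not be set. */
static bool colormask_covers_format(unsigned colormask, unsigned format_channels)
{
    assert((format_channels & ~0xfu) == 0);
    return (colormask & format_channels) == format_channels;
}
```

With an RGB-only mask, an RGBX target still counts as fully overwritten, while an RGBA target does not.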
* nir: avoid uninitialized variable warning (Timothy Arceri, 2019-01-07, 1 file, -1/+1)
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231
* st/glsl: refactor st_link_nir() (Timothy Arceri, 2019-01-07, 1 file, -36/+16)
| The functional change here is moving the nir_lower_io_to_scalar_early() calls inside st_nir_link_shaders() and moving the st_nir_opts() call after the call to nir_lower_io_arrays_to_elements().
|
| This fixes a bug with the following piglit test, caused by the current code not cleaning up dead code after we lower arrays. This was causing an assert in the new duplicate-varyings link-time opt introduced in 70be9afccb23.
|
|   tests/spec/glsl-1.10/execution/vsfs-unused-array-member.shader_test
|
| Moving the nir_lower_io_to_scalar_early() calls also allows us to tidy up the code a little and merge some loops.
|
| Reviewed-by: Eric Anholt <[email protected]>
* v3d: Use the core tex lowering. (Eric Anholt, 2019-01-04, 3 files, -123/+10)
| Even without any clever optimization on the unpack operations, this gives us a useful value for the channels-read field, which we can use to avoid ldtmu instructions to the no-op register.
|
| instructions in affected programs: 890712 -> 881974 (-0.98%)
* nir: Add nir_lower_tex options to lower sampler return formats. (Eric Anholt, 2019-01-04, 2 files, -0/+83)
| I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and vc4, but NIR could potentially do some useful stuff for us (like avoiding unpack/repacks) if we give it the information.
|
| v2: Skip lowering for txs/query_levels
| v3: Fix a crash on old-style shadow
| v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack the enum.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allow nir_format_unpack_int/sint to unpack larger values. (Eric Anholt, 2019-01-04, 1 file, -3/+8)
| For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit integer samplers.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
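Unpacking a signed field means extracting its bits and sign-extending from the field's top bit. The NIR helpers build this out of NIR ALU instructions on SSA values; the scalar sketch below shows only the arithmetic. unpack_sint() is a hypothetical name, not the NIR helper's signature.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a 'bits'-wide signed field at bit 'offset' of a 32-bit word and
 * sign-extend it to 32 bits. */
static int32_t unpack_sint(uint32_t packed, unsigned offset, unsigned bits)
{
    assert(bits >= 1 && bits < 32 && offset + bits <= 32);
    const uint32_t mask = (1u << bits) - 1;
    const uint32_t value = (packed >> offset) & mask;
    const uint32_t sign = 1u << (bits - 1);
    /* If the field's sign bit is set, subtract 2^bits to make it negative. */
    return (int32_t)((int64_t)value - ((value & sign) ? ((int64_t)1 << bits) : 0));
}
```

For example, the two 16-bit halves of 0xFFFF0001 unpack to 1 (low) and -1 (high).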
* intel/blorp: Be more conservative about copying clear colors (Jason Ekstrand, 2019-01-04, 1 file, -3/+6)
| In 92eb5bbc68d7324 we attempted to avoid copying clear colors whenever we weren't doing a resolve. However, this broke MSAA resolves because we need the clear color in the source. This patch makes blorp much more conservative, so that it only avoids the clear color copy if either aux_usage == NONE or it's explicitly doing a fast-clear.
|
| Fixes: 92eb5bbc68d7 "intel/blorp: Only copy clear color when doing..."
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
| Reviewed-by: Rafael Antognolli <[email protected]>
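The new rule is a simple predicate: skip the copy only in the two explicitly safe cases. A minimal sketch; the enum values are hypothetical stand-ins loosely modelled on ISL's aux usages, and need_clear_color_copy() is not a blorp function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the surface's auxiliary usage. */
enum aux_usage { AUX_USAGE_NONE, AUX_USAGE_MCS, AUX_USAGE_CCS_D, AUX_USAGE_CCS_E };

/* Copy the clear color unless there is no aux surface at all or the
 * operation is itself a fast clear; everything else, MSAA resolves
 * included, keeps the copy. */
static bool need_clear_color_copy(enum aux_usage usage, bool is_fast_clear)
{
    assert(usage <= AUX_USAGE_CCS_E);
    return !(usage == AUX_USAGE_NONE || is_fast_clear);
}
```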
* v3d: Stop scalarizing our uniform loads. (Eric Anholt, 2019-01-04, 2 files, -102/+57)
| We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, and references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well.
|
| instructions in affected programs: 3086665 -> 2454967 (-20.47%)
| uniforms in affected programs: 919581 -> 721039 (-21.59%)
| threads in affected programs: 1710 -> 3420 (100.00%)
| spills in affected programs: 596 -> 522 (-12.42%)
| fills in affected programs: 680 -> 562 (-17.35%)
|
| Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
* v3d: Do UBO loads a vector at a time. (Eric Anholt, 2019-01-04, 2 files, -35/+99)
| In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and it saves us a bunch of round-trips to the TMU by loading a vector at a time.
* v3d: Remove dead switch cases and comments from v3d_nir_lower_io. (Eric Anholt, 2019-01-04, 1 file, -8/+3)
| Moving things to NIR left this mess around. All we lower now is uniforms.
* v3d: Fix up VS output setup during precompiles. (Eric Anholt, 2019-01-04, 1 file, -6/+10)
| I noticed that a VS I was debugging was missing all of its output stores: outputs_written was for POS, VAR0, VAR3, while the shader's variables were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed to be doing here, but we can just walk the declared variables and avoid both this bug and the emission of extra stvpms for less-than-vec4 varyings.
* v3d: Reinstate the new shader-db output after v3d_compile() refactor. (Eric Anholt, 2019-01-04, 1 file, -1/+18)
| I misplaced it in the rebase conflicts.
* nir: remove dead code from copy_prop_vars (Caio Marcelo de Oliveira Filho, 2019-01-04, 1 file, -1/+1)
| When copy_prop_vars also took care of dead write handling, intrin was used as part of store_to_entry. Now it isn't, so this assignment isn't really used. Add a comment clarifying what happens to intrin.
|
| Fixes: 4dfa7adc100 "nir: Remove handling of dead writes from copy_prop_vars"
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965: add CS stall on VF invalidation workaround (Lionel Landwerlin, 2019-01-04, 2 files, -2/+2)
| Even with the previous commit, hangs are still happening. The problem there is that the VF cache invalidate happens immediately, without waiting for previous rendering to complete. What happens is that we invalidate the cache the moment the PIPE_CONTROL is parsed, but we still have old rendering in the pipe which continues to pull data into the cache with the old high address bits. The later rendering with the new high address bits then doesn't have the clean cache that it expects/needs.
|
| v2: Update commit message/explanation with Jason's
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Fixes: a363bb2cd0e2a1 ("i965: Allocate VMA in userspace for full-PPGTT systems.")
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072
* i965: include draw_params/derived_draw_params for VF cache workaround (Lionel Landwerlin, 2019-01-04, 1 file, -5/+18)
| These buffers use VB slots and should be included in the workaround decision.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Fixes: a363bb2cd0e2a1 ("i965: Allocate VMA in userspace for full-PPGTT systems.")
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072
* intel/blorp: emit VF caching workaround before 3DSTATE_VERTEX_BUFFERS (Lionel Landwerlin, 2019-01-04, 1 file, -2/+2)
| Probably no difference, but it's nice to have i965 and blorp emit things in the same order.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: limit VF caching workaround to gen8/9/10 (Lionel Landwerlin, 2019-01-04, 2 files, -2/+4)
| Documentation of the 3DSTATE_VERTEX_BUFFERS packet says this is only needed before ICL.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
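Taken together, the four VF cache commits describe a decision made per vertex-buffer slot (draw-parameter buffers included): when a buffer's high address bits change, the VF cache must be invalidated with a CS stall, and only on gen8/9/10. The sketch below assumes "high bits" means bits 32 to 47 of a 48-bit address, which the log does not state. struct vf_cache_state, MAX_VB_SLOTS and vf_cache_needs_invalidate() are hypothetical and not i965's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VB_SLOTS 32  /* hypothetical slot count */

/* Hypothetical record of the upper address bits last bound to each slot. */
struct vf_cache_state {
    uint16_t high_bits[MAX_VB_SLOTS];
};

/* Returns true if binding 'addr' to 'slot' changes the assumed high bits,
 * meaning a VF cache invalidate plus CS stall would be needed so that
 * in-flight rendering cannot refill the cache with stale entries. */
static bool vf_cache_needs_invalidate(struct vf_cache_state *s, int gen,
                                      unsigned slot, uint64_t addr)
{
    assert(s != 0 && slot < MAX_VB_SLOTS);
    if (gen < 8 || gen > 10)  /* only needed before ICL (gen11) */
        return false;
    const uint16_t high = (uint16_t)(addr >> 32);
    if (s->high_bits[slot] == high)
        return false;
    s->high_bits[slot] = high;
    return true;
}
```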
* glsl/linker: complete documentation for assign_attribute_or_color_locations (Andres Gomez, 2019-01-04, 1 file, -9/+13)
| Commit 27f1298b9d9 ("glsl/linker: validate attribute aliasing before optimizations") forgot to complete the documentation.
|
| Cc: Tapani Pälli <[email protected]>
| Signed-off-by: Andres Gomez <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* virgl: remove empty file (Gurchetan Singh, 2019-01-03, 1 file, -0/+0)
| Fixes: 174f53 ("virgl: consolidate transfer code")
| Reviewed-by: Erik Faye-Lund <[email protected]>
* virgl: don't flush an empty range (Gurchetan Singh, 2019-01-03, 1 file, -0/+4)
| Otherwise, the gl-1.0-long-dlist Piglit test crashes.
|
| Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT")
| Reported by airlied@
|
| v2: Exit on any invalid range (Erik)
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109190
| Reviewed-by: Dave Airlie <[email protected]>
| Reviewed-by: Erik Faye-Lund <[email protected]>
| Tested-by: Jakob Bornecrantz <[email protected]>
* docs: advertise distro-provided meson cross-files (Eric Engestrom, 2019-01-03, 1 file, -0/+9)
| Hopefully we can kick-start the revolution and other distros will start providing them as well :)
|
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* docs: fix the meson aarch64 cross-file (Eric Engestrom, 2019-01-03, 1 file, -2/+2)
| `gcc-ar` is preferred over the generic `ar`, and the `arm` family is for 32-bit ARM [1].
|
| [1] https://mesonbuild.com/Reference-tables.html#cpu-families
|
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* virgl/vtest: Use default socket name from protocol header (Jakob Bornecrantz, 2019-01-03, 1 file, -3/+1)
| No functional change, as the socket name is the same; this just removes the duplicate definition of the path.
|
| Reviewed-by: Gurchetan Singh <[email protected]>
| Signed-off-by: Jakob Bornecrantz <[email protected]>
* freedreno: fix staging resource size for arrays (Rob Clark, 2019-01-03, 1 file, -2/+10)
| A 2d-array texture, for example, should get the number of array elements from box->depth, rather than from depth0, which is minified.
|
| Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment with tiled textures.
|
| Reported-by: Kristian H. Kristensen <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: remove blit_via_copy_region() (Rob Clark, 2019-01-03, 1 file, -4/+0)
| If we hit the memcpy() path for copy_region(), that will try to do a transfer_map(), which goes badly for blits to/from staging triggered by transfer_map() or transfer_unmap().
|
| We could possibly add fd_blit2(), which has an allow_transfer_map param, and call that for staging blits. But I'm not really sure trying the blit via copy_region() is very useful. At least for newer gens that implement fd_context::blit(), it probably isn't.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: rework blitter API (Rob Clark, 2019-01-03, 1 file, -54/+8)
| Switch over to using fd_context::blit(), in the same way that a5xx does. The previous patch wires fd_resource_copy_region() up to the blitter, so a6xx no longer needs to bypass the core layer to accelerate this.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: try blitter for fd_resource_copy_region() (Rob Clark, 2019-01-03, 1 file, -0/+27)
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: rework blit API (Rob Clark, 2019-01-03, 8 files, -27/+29)
| First step to unify the way the fd5 and fd6 blitters work. Currently a6xx bypasses the blit API in order to also accelerate resource_copy_region(). But this approach can lead to infinite recursion:
|
|   #0  fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
|   #1  0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
|   #2  0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
|   #3  0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
|   #4  0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #5  0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   #6  0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
|   #7  0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
|   #8  0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
|   #9  0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
|   #10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
|   #11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
|   #12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
|   #13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
|   #14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   ...
|
| Instead, rework the API to push the fallback back to core code, so that we can rework resource_copy_region() to have its own fallback path, and then finally convert fd6 over to work in the same way.
|
| This also makes ctx->blit() optional, and cleans up some unnecessary callers.
|
| Signed-off-by: Rob Clark <[email protected]>
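The reworked shape can be sketched as an optional driver hook that reports whether it handled the blit, plus a core-owned fallback that never re-enters the hook, which is the property that breaks a recursion like the one in the backtrace. All names below are hypothetical and only loosely modelled on fd_context::blit(); this is not freedreno's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct blit_info { int src, dst; };  /* placeholder blit description */

struct context {
    /* Optional driver hook; returns false when it cannot handle the blit. */
    bool (*blit)(struct context *ctx, const struct blit_info *info);
    int fallback_count;  /* counts core fallbacks, for the demo */
};

/* Core-owned fallback; it must never call back into ctx->blit(). */
static void fallback_blit(struct context *ctx, const struct blit_info *info)
{
    (void)info;
    ctx->fallback_count++;
}

/* Core entry point: try the driver hook if present, else fall back. */
static void core_blit(struct context *ctx, const struct blit_info *info)
{
    assert(ctx != NULL && info != NULL);
    if (ctx->blit && ctx->blit(ctx, info))
        return;
    fallback_blit(ctx, info);
}

/* Two example driver hooks used by the usage test. */
static bool hook_accepts(struct context *ctx, const struct blit_info *info)
{
    (void)ctx; (void)info;
    return true;
}

static bool hook_declines(struct context *ctx, const struct blit_info *info)
{
    (void)ctx; (void)info;
    return false;
}
```

A context with no hook, or with a hook that declines, ends up in the core fallback; an accepting hook bypasses it.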
* freedreno: skip depth resolve if not written (Rob Clark, 2019-01-03, 3 files, -4/+14)
| For multi-pass rendering, it is common to keep the same depth buffer from the previous pass, to discard geometry that would be hidden by later draws. In the later passes, with depth-test enabled but depth-write disabled, there is no reason to do a gmem2mem resolve.
|
| TODO: probably do something similar for stencil, although the stencil buffer isn't used as commonly these days.
|
| Signed-off-by: Rob Clark <[email protected]>
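One way to express this is per-batch bookkeeping that only marks depth for resolve when a draw can actually write it. A minimal sketch; struct batch, the BUF_* bits and track_draw() are hypothetical, and color is assumed to be written by every draw to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>

enum { BUF_COLOR = 1u, BUF_DEPTH = 2u };  /* hypothetical buffer bits */

struct batch {
    unsigned resolve;  /* buffers that need a gmem2mem resolve */
};

/* In GL, depth is only written while the depth test is enabled, so both
 * flags must be set before depth needs resolving. */
static void track_draw(struct batch *b, bool depth_test, bool depth_write)
{
    assert(b != 0);
    b->resolve |= BUF_COLOR;
    if (depth_test && depth_write)
        b->resolve |= BUF_DEPTH;
}
```

A later pass that only tests against the kept depth buffer leaves BUF_DEPTH clear, so no depth resolve is emitted.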
* nir: merge some basic consecutive ifs (Timothy Arceri, 2019-01-03, 1 file, -0/+93)
| After trying multiple times to merge if-statements with phis between them, I've come to the conclusion that it cannot be done without regressions. The problem is that for some shaders we end up with a whole bunch of phis for the merged ifs, resulting in increased register pressure.
|
| So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does, so for radeonsi we only see a change (although it's a large change) in a single shader.
|
| Shader-db results i965 (SKL):
|
| total instructions in shared programs: 13098176 -> 13098152 (<.01%)
| instructions in affected programs: 1326 -> 1302 (-1.81%)
| helped: 4
| HURT: 0
|
| total cycles in shared programs: 332032989 -> 332037583 (<.01%)
| cycles in affected programs: 60665 -> 65259 (7.57%)
| helped: 0
| HURT: 4
|
| The cycle estimates reported by shader-db for i965 seem inaccurate, as the only difference in the final code is the removal of the redundant condition evaluations and jumps.
|
| Also, the biggest code reduction (~7%) for radeonsi was in a Tomb Raider TressFX shader, but for some reason this does not get merged for i965.
|
| Shader-db results radeonsi (VEGA):
|
| Totals from affected shaders:
| SGPRS: 232 -> 232 (0.00 %)
| VGPRS: 164 -> 164 (0.00 %)
| Spilled SGPRs: 59 -> 59 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 14584 -> 13520 (-7.30 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 13 -> 13 (0.00 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Reviewed-by: Ian Romanick <[email protected]>
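The merging rule can be modelled on a flattened list of sibling ifs: two neighbours merge when they test the same condition and no phis sit between them, and their branches are concatenated. This is a toy model, not NIR's data structures; struct if_stmt and merge_consecutive_ifs() are hypothetical, and the model assumes nothing other than the recorded phis sits between the ifs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened view of a run of sibling if-statements. */
struct if_stmt {
    int cond;         /* SSA index of the condition */
    int phis_after;   /* phis in the block following this if */
    int then_instrs;  /* instructions in the then-branch */
    int else_instrs;  /* instructions in the else-branch */
};

/* Merge neighbours with the same condition and no phis between them,
 * concatenating their branches in place. Returns the new count. */
static size_t merge_consecutive_ifs(struct if_stmt *ifs, size_t n)
{
    assert(ifs != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t out = 0;
    for (size_t k = 1; k < n; k++) {
        struct if_stmt *prev = &ifs[out];
        if (ifs[k].cond == prev->cond && prev->phis_after == 0) {
            prev->then_instrs += ifs[k].then_instrs;
            prev->else_instrs += ifs[k].else_instrs;
            prev->phis_after = ifs[k].phis_after;
        } else {
            ifs[++out] = ifs[k];
        }
    }
    return out + 1;
}
```

Three ifs testing conditions 7, 7 and 9 with no phis collapse to two; if a phi followed the first if, nothing would be merged.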
* nir: add rewrite_phi_predecessor_blocks() helper (Timothy Arceri, 2019-01-03, 1 file, -20/+31)
| This will also be used by the if merge pass in the following commit.
|
| Reviewed-by: Ian Romanick <[email protected]>
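When control flow is restructured, phis have to be told that a value now arrives from a different predecessor block. A toy version of that rewrite follows; the phi layout and rewrite_phi_predecessor() are hypothetical, and the real helper's signature is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PHI_SRCS 4  /* hypothetical limit for the toy model */

/* Hypothetical phi: one (predecessor block, value) pair per source. */
struct phi_src { int pred_block; int value; };
struct phi { struct phi_src srcs[MAX_PHI_SRCS]; size_t num_srcs; };

/* Retarget every source that came from old_pred to new_pred. Returns how
 * many sources were rewritten. */
static size_t rewrite_phi_predecessor(struct phi *phi, int old_pred, int new_pred)
{
    assert(phi != NULL && phi->num_srcs <= MAX_PHI_SRCS);
    size_t rewritten = 0;
    for (size_t k = 0; k < phi->num_srcs; k++) {
        if (phi->srcs[k].pred_block == old_pred) {
            phi->srcs[k].pred_block = new_pred;
            rewritten++;
        }
    }
    return rewritten;
}
```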