Commit message | Author | Age | Files | Lines
* virgl: remove empty file | Gurchetan Singh | 2019-01-03 | 1 | -0/+0
| Fixes: 174f53 ("virgl: consolidate transfer code")
| Reviewed-by: Erik Faye-Lund <[email protected]>
* virgl: don't flush an empty range | Gurchetan Singh | 2019-01-03 | 1 | -0/+4
| Otherwise, the gl-1.0-long-dlist Piglit test crashes.
|
| Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT")
| Reported by airlied@
|
| v2: Exit on any invalid range (Erik)
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109190
| Reviewed-by: Dave Airlie <[email protected]>
| Reviewed-by: Erik Faye-Lund <[email protected]>
| Tested-by: Jakob Bornecrantz <[email protected]>
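The guard this commit describes can be pictured as a small range check. The sketch below is a hypothetical illustration, not the virgl code: `struct flush_range` and `range_is_flushable()` are assumed names, and the range is assumed to be half-open, [start, end).

```c
#include <stdbool.h>

/* Hypothetical half-open byte range [start, end); not a virgl type. */
struct flush_range {
    unsigned start;
    unsigned end;
};

/* Returns true only when the range covers at least one byte.
 * Empty (start == end) and inverted (start > end) ranges are rejected,
 * so a caller can return early instead of issuing a flush. */
static bool range_is_flushable(const struct flush_range *r)
{
    return r->start < r->end;
}
```

Rejecting every invalid range, rather than only the empty one, mirrors the "Exit on any invalid range" note in v2.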
* docs: advertise distro-provided meson cross-files | Eric Engestrom | 2019-01-03 | 1 | -0/+9
| Hopefully we can kick start the revolution and other distros will start
| providing them as well :)
|
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* docs: fix the meson aarch64 cross-file | Eric Engestrom | 2019-01-03 | 1 | -2/+2
| `gcc-ar` is preferred over the generic `ar`, and the `arm` family is for
| 32-bit ARM [1].
|
| [1] https://mesonbuild.com/Reference-tables.html#cpu-families
|
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Dylan Baker <[email protected]>
* virgl/vtest: Use default socket name from protocol header | Jakob Bornecrantz | 2019-01-03 | 1 | -3/+1
| No functional change as the socket name is the same, just removing the
| double definition of the path.
|
| Reviewed-by: Gurchetan Singh <[email protected]>
| Signed-off-by: Jakob Bornecrantz <[email protected]>
* freedreno: fix staging resource size for arrays | Rob Clark | 2019-01-03 | 1 | -2/+10
| A 2d-array texture (for example) should get the number of array elements
| from box->depth, rather than depth0, which is minified.
|
| Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment
| with tiled textures.
|
| Reported-by: Kristian H. Kristensen <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
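To see why the layer count has to come from the transfer box, consider the sketch below. It is a hypothetical illustration rather than the freedreno code: `struct staging_dims`, `minify()` and `staging_dims_for()` are assumed names, and `box_depth` stands in for `box->depth`.

```c
#include <stdbool.h>

/* Hypothetical description of a staging resource; not freedreno's struct. */
struct staging_dims {
    unsigned depth;       /* slices of a 3D texture */
    unsigned array_size;  /* layers of an array texture */
};

/* Size of one dimension at a given mip level, clamped to at least 1. */
static unsigned minify(unsigned size, unsigned level)
{
    unsigned v = size >> level;
    return v ? v : 1;
}

/* box_depth is the number of slices or layers being transferred.
 * Array layers do not shrink with the mip level, so for an array texture
 * the count is taken from the box and never from a minified depth. */
static struct staging_dims staging_dims_for(bool is_array, unsigned box_depth)
{
    struct staging_dims d;
    d.depth = is_array ? 1 : box_depth;
    d.array_size = is_array ? box_depth : 1;
    return d;
}
```

For example, `minify(6, 2)` yields 1, which would drop five of six layers if it were used as the array size of a staging resource.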
* freedreno: remove blit_via_copy_region() | Rob Clark | 2019-01-03 | 1 | -4/+0
| If we hit the memcpy() path for copy_region(), that will try to do a
| transfer_map(), which goes badly for blits to/from staging triggered
| by transfer_map() or transfer_unmap().
|
| We could possibly add fd_blit2(), which has an allow_transfer_map param,
| and call that for staging blits. But I'm not really sure if trying the
| blit via copy_region() is very useful. At least for newer gens that
| implement fd_context::blit(), it probably isn't.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: rework blitter API | Rob Clark | 2019-01-03 | 1 | -54/+8
| Switch over to using fd_context::blit(), in the same way that a5xx does.
| The previous patch wires fd_resource_copy_region() up to the blitter so
| a6xx no longer needs to bypass the core layer to accelerate this.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: try blitter for fd_resource_copy_region() | Rob Clark | 2019-01-03 | 1 | -0/+27
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: rework blit API | Rob Clark | 2019-01-03 | 8 | -27/+29
| First step to unify the way fd5 and fd6 blitter works. Currently a6xx
| bypasses the blit API in order to also accelerate resource_copy_region().
| But this approach can lead to infinite recursion:
|
|   #0  fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
|   #1  0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
|   #2  0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
|   #3  0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
|   #4  0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #5  0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   #6  0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
|   #7  0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
|   #8  0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
|   #9  0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
|   #10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
|   #11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
|   #12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
|   #13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
|   #14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   ...
|
| Instead rework the API to push the fallback back to core code, so that
| we can rework resource_copy_region() to have its own fallback path, and
| then finally convert fd6 over to work in the same way.
|
| This also makes ctx->blit() optional, and cleans up some unnecessary
| callers.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: skip depth resolve if not written | Rob Clark | 2019-01-03 | 3 | -4/+14
| For multi-pass rendering, it is common to keep the same depth buffer
| from the previous pass, to discard geometry that would be hidden by
| later draws. In the later passes with depth-test enabled, but
| depth-write disabled, there is no reason to do gmem2mem resolve.
|
| TODO: probably do something similar for stencil, although the stencil
| buffer isn't used as commonly these days.
|
| Signed-off-by: Rob Clark <[email protected]>
* nir: merge some basic consecutive ifs | Timothy Arceri | 2019-01-03 | 1 | -0/+93
| After trying multiple times to merge if-statements with phis between
| them I've come to the conclusion that it cannot be done without
| regressions. The problem is for some shaders we end up with a whole
| bunch of phis for the merged ifs resulting in increased register
| pressure.
|
| So this patch just merges ifs that have no phis between them. This
| seems to be consistent with what LLVM does, so for radeonsi we only see
| a change (although it's a large change) in a single shader.
|
| Shader-db results i965 (SKL):
|
|   total instructions in shared programs: 13098176 -> 13098152 (<.01%)
|   instructions in affected programs: 1326 -> 1302 (-1.81%)
|   helped: 4
|   HURT: 0
|
|   total cycles in shared programs: 332032989 -> 332037583 (<.01%)
|   cycles in affected programs: 60665 -> 65259 (7.57%)
|   helped: 0
|   HURT: 4
|
| The cycles estimates reported by shader-db for i965 seem inaccurate, as
| the only difference in the final code is the removal of the redundant
| condition evaluations and jumps.
|
| Also the biggest code reduction (~7%) for radeonsi was in a Tomb Raider
| TressFX shader, but for some reason this does not get merged for i965.
|
| Shader-db results radeonsi (VEGA):
|
|   Totals from affected shaders:
|   SGPRS: 232 -> 232 (0.00 %)
|   VGPRS: 164 -> 164 (0.00 %)
|   Spilled SGPRs: 59 -> 59 (0.00 %)
|   Spilled VGPRs: 0 -> 0 (0.00 %)
|   Private memory VGPRs: 0 -> 0 (0.00 %)
|   Scratch size: 0 -> 0 (0.00 %) dwords per thread
|   Code Size: 14584 -> 13520 (-7.30 %) bytes
|   LDS: 0 -> 0 (0.00 %) blocks
|   Max Waves: 13 -> 13 (0.00 %)
|   Wait states: 0 -> 0 (0.00 %)
|
| Reviewed-by: Ian Romanick <[email protected]>
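At source level, the transformation described above can be pictured as follows. This is a hypothetical C analogy, not NIR and not the pass itself: both bodies only store to memory, so no value defined in the first `if` would need a phi before the second one. The function names are assumptions made for the illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Before: the same condition is evaluated and branched on twice. */
static void two_ifs(bool cond, int out[2])
{
    if (cond)
        out[0] = 1;
    if (cond)
        out[1] = 2;
}

/* After: a single branch runs both bodies back to back. */
static void merged_if(bool cond, int out[2])
{
    if (cond) {
        out[0] = 1;
        out[1] = 2;
    }
}

/* Helper: checks that both forms leave the same values in memory. */
static bool same_result(bool cond)
{
    int a[2] = {0, 0};
    int b[2] = {0, 0};
    two_ifs(cond, a);
    merged_if(cond, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The merged form removes one condition evaluation and one jump, which matches the only difference the message reports in the final i965 code.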
* nir: add rewrite_phi_predecessor_blocks() helper | Timothy Arceri | 2019-01-03 | 1 | -20/+31
| This will also be used by the if merge pass in the following commit.
|
| Reviewed-by: Ian Romanick <[email protected]>
* nir: simplify does_varying_match() | Timothy Arceri | 2019-01-03 | 1 | -5/+2
| Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir: make use of does_varying_match() helper | Timothy Arceri | 2019-01-03 | 1 | -2/+1
| Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir: make nir_opt_remove_phis_impl() static | Timothy Arceri | 2019-01-03 | 2 | -2/+1
| Reviewed-by: Alejandro Piñeiro <[email protected]>
* v3d: Refactor compiler entrypoints. | Eric Anholt | 2019-01-02 | 3 | -189/+170
| Before, I had per-stage entrypoints with some helpers shared between
| them. As I extended for compute shaders and shader-db, it turned out
| that the other common code in the middle wanted to be shared too.
* v3d: Handle dynamically uniform IF statements with uniform control flow. | Eric Anholt | 2019-01-02 | 1 | -1/+65
| Loops will be trickier, since we need some analysis to figure out if
| the breaks/continues inside are uniform. Until we get that in NIR,
| this gets us some quick wins.
|
| total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
| instructions in affected programs: 487781 -> 469099 (-3.83%)
* v3d: Fold comparisons for IF conditions into the flags for the IF. | Eric Anholt | 2019-01-02 | 5 | -12/+57
| total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
| instructions in affected programs: 800373 -> 799407 (-0.12%)
* v3d: Don't try to fold non-SSA-src comparisons into bcsels. | Eric Anholt | 2019-01-02 | 1 | -1/+17
| There could have been a write of a src in between the comparison and the
| bcsel that would invalidate the comparison.
* v3d: Move the "Find the ALU instruction generating our bool" out of bcsel. | Eric Anholt | 2019-01-02 | 1 | -6/+9
| This will be reused for if statements.
* v3d: Simplify the emission of comparisons for the bcsel optimization. | Eric Anholt | 2019-01-02 | 1 | -37/+24
| I wanted to reuse the comparison stuff for nir_ifs, but for that I just
| want the flags and no destination value. Splitting the conditions from
| the destinations ended up cleaning the existing code up, anyway.
* v3d: Don't forget to include RT writes in precompiles. | Eric Anholt | 2019-01-02 | 1 | -0/+10
| Looking at some assembly dumps for an optimization, we were clearly
| missing important parts of the shader!
* v3d: Fix segfault when failing to compile a program. | Eric Anholt | 2019-01-02 | 1 | -2/+4
| We'll still fail at draw time, but this avoids a regression in shader-db
| execution once I enable TLB writes in precompiles.
|
| Fixes: b38e4d313fc2 ("v3d: Create a state uploader for packing our shaders together.")
* radeonsi: always unmap texture CPU mappings on 32-bit CPU architectures | Marek Olšák | 2019-01-02 | 1 | -0/+16
| The 32-bit version of Team Fortress 2 runs out of CPU address space.
|
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: remove unused variables in si_insert_input_ptr | Marek Olšák | 2019-01-02 | 1 | -3/+1
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: use u_decomposed_prims_for_vertices instead of u_prims_for_vertices | Marek Olšák | 2019-01-02 | 1 | -1/+3
| It seems to be the same, but this doesn't use integer division with
| a variable divisor.
|
| Tested-by: Dieter Nützel <[email protected]>
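The cost difference mentioned above comes from how the divisor is known to the compiler. The sketch below is a hypothetical illustration and does not reproduce the Gallium helpers: `enum prim`, `prims_generic()` and `prims_switch()` are assumed names covering only three primitive types.

```c
/* Hypothetical primitive types for illustration; not Gallium's enum. */
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

/* Variable divisor: the value is loaded from a table at run time, so the
 * compiler has to emit a real integer division. */
static unsigned prims_generic(enum prim p, unsigned vertices)
{
    static const unsigned verts_per_prim[] = { 1, 2, 3 };
    return vertices / verts_per_prim[p];
}

/* Constant divisor per case: each division can be strength-reduced to a
 * shift or a multiply by the compiler. */
static unsigned prims_switch(enum prim p, unsigned vertices)
{
    switch (p) {
    case PRIM_POINTS:    return vertices;
    case PRIM_LINES:     return vertices / 2;
    case PRIM_TRIANGLES: return vertices / 3;
    }
    return 0;
}
```

Both functions return the same counts for these list-style primitives; only the generated code differs.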
* radeonsi: make si_cp_wait_mem more configurable | Marek Olšák | 2019-01-02 | 5 | -8/+8
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: call si_fix_resource_usage for the GS copy shader as well | Marek Olšák | 2019-01-02 | 1 | -0/+4
| Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets | Marek Olšák | 2019-01-02 | 2 | -2/+10
| Tested-by: Dieter Nützel <[email protected]>
* nir: add a way to print the deref chain | Caio Marcelo de Oliveira Filho | 2019-01-02 | 2 | -4/+14
| Makes debugging easier when we care about the deref chain and not the
| deref instruction itself. To make it take a const pointer, constify
| some of the static functions in nir_print.c.
|
| Reviewed-by: Eric Anholt <[email protected]>
* meson: Error out if building nouveau and using LLVM without rtti | Dylan Baker | 2019-01-02 | 1 | -0/+3
| Nouveau requires rtti. Often LLVM is configured without rtti, and code
| with and without cannot be linked safely. Let's just error out if
| nouveau is requested and LLVM is built without rtti.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202
| Fixes: c5a97d658ec19cc02719d7f86c1b0715e3d9ffc4
|        ("meson: fix builds against LLVM built without rtti")
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* egl/haiku: Fix reference to disp vs dpy | Alexander von Gluck IV | 2019-01-02 | 1 | -1/+2
| Reviewed-by: Eric Engestrom <[email protected]>
| Fixes: 00992700c9a812a54563 "egl: set the EGLDevice when creating a display"
* compiler/spirv: use 32-bit polynomial approximation for 16-bit asin() | Iago Toral Quiroga | 2019-01-02 | 1 | -0/+14
| The 16-bit polynomial execution doesn't meet Khronos precision
| requirements. Also, the half-float denorm range starts at 2^(-14), and
| with asin taking input values in the range [0, 1], polynomial
| approximations can lead to flushing relatively easily.
|
| An alternative is to use the atan2 formula to compute asin, which is the
| reference taken by Khronos to determine precision requirements, but that
| ends up generating too many additional instructions when compared to the
| polynomial approximation. Specifically, for the Intel case, doing this
| adds +41 instructions to the program for each asin/acos call, which
| looks like an undesirable trade-off.
|
| So for now we take the easy way out and fall back to using the 32-bit
| polynomial approximation, which is better (faster) than the 16-bit atan2
| implementation and gives us better precision that matches Khronos
| requirements.
|
| v2:
|  - Fallback to 32-bit using recursion (Jason).
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit frexp | Iago Toral Quiroga | 2019-01-02 | 1 | -2/+46
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit hyperbolic trigonometric functions | Iago Toral Quiroga | 2019-01-02 | 1 | -18/+26
| v2:
|  - use nir_fadd_imm and nir_fmul_imm helpers (Jason)
|
| v3:
|  - since we need to define one for fsub use it for fdiv too (Jason)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit exp and log | Iago Toral Quiroga | 2019-01-02 | 1 | -2/+2
| v2:
|  - use nir_fmul_imm helper (Jason)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit atan2 | Iago Toral Quiroga | 2019-01-02 | 1 | -7/+11
| v2:
|  - fix huge_val for 16-bit, it was meant to be 2^14 not 10^14.
|
| v3:
|  - rebase on top of new bool sized opcodes
|  - use nir_b2f helper
|  - use nir_fmul_imm helper
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit atan | Iago Toral Quiroga | 2019-01-02 | 1 | -12/+11
| v2:
|  - use nir_fadd_imm and nir_fmul_imm helpers (Jason)
|  - rebased on top of new sized boolean opcodes
|  - use nir_b2f helper
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit acos | Iago Toral Quiroga | 2019-01-02 | 1 | -2/+3
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: implement 16-bit asin | Iago Toral Quiroga | 2019-01-02 | 1 | -9/+14
| v2:
|  - use nir_fmul_imm and nir_fadd_imm helpers (Jason)
|
| v3:
|  - missed one case where we need to replace nir_imm_float with
|    nir_imm_floatN_t (Jason)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/spirv: handle 16-bit float in radians() and degrees() | Iago Toral Quiroga | 2019-01-02 | 1 | -2/+2
| v2:
|  - use nir_fmul_imm helper (Jason)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers | Iago Toral Quiroga | 2019-01-02 | 1 | -0/+12
| Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add a nir_b2f() helper | Iago Toral Quiroga | 2019-01-02 | 1 | -0/+12
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: link time opt duplicate varyings | Timothy Arceri | 2019-01-02 | 1 | -0/+88
| If we are outputting the same value to more than one output
| component, rewrite the inputs to read from a single component.
|
| This will allow the duplicate varying components to be optimised
| away by the existing opts.
|
| shader-db results i965 (SKL):
|
|   total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
|   instructions in affected programs: 322601 -> 314257 (-2.59%)
|   helped: 3080
|   HURT: 8
|
|   total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
|   cycles in affected programs: 2584925 -> 2522944 (-2.40%)
|   helped: 2975
|   HURT: 477
|
| shader-db results radeonsi (VEGA):
|
|   SGPRS: 31576 -> 31664 (0.28 %)
|   VGPRS: 17484 -> 17064 (-2.40 %)
|   Spilled SGPRs: 184 -> 167 (-9.24 %)
|   Spilled VGPRs: 0 -> 0 (0.00 %)
|   Private memory VGPRs: 0 -> 0 (0.00 %)
|   Scratch size: 0 -> 0 (0.00 %) dwords per thread
|   Code Size: 583340 -> 569368 (-2.40 %) bytes
|   LDS: 0 -> 0 (0.00 %) blocks
|   Max Waves: 6162 -> 6270 (1.75 %)
|   Wait states: 0 -> 0 (0.00 %)
|
| vkpipeline-db results RADV (VEGA):
|
|   Totals from affected shaders:
|   SGPRS: 14880 -> 15080 (1.34 %)
|   VGPRS: 10872 -> 10888 (0.15 %)
|   Spilled SGPRs: 0 -> 0 (0.00 %)
|   Spilled VGPRs: 0 -> 0 (0.00 %)
|   Private memory VGPRs: 0 -> 0 (0.00 %)
|   Scratch size: 0 -> 0 (0.00 %) dwords per thread
|   Code Size: 674016 -> 668396 (-0.83 %) bytes
|   LDS: 0 -> 0 (0.00 %) blocks
|   Max Waves: 2708 -> 2704 (-0.15 %)
|   Wait states: 0 -> 0 (0.00 %)
|
| V2: bunch of tidy ups suggested by Jason
|
| Reviewed-by: Eric Anholt <[email protected]>
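The core idea, redirecting reads of a duplicated output component to the first component that carries the same value, can be sketched as below. This is a hypothetical illustration and not the NIR pass: `first_slot_with_same_value()` is an assumed name, and values are modelled as plain integer IDs.

```c
/* out_value[i] is a hypothetical ID for the value the producer stage
 * writes to output component i. Returns the lowest component index that
 * carries the same value as component i, so that consumer reads of i can
 * be redirected there. If i is the first occurrence, i itself is returned. */
static int first_slot_with_same_value(const int *out_value, int count, int i)
{
    if (i < 0 || i >= count)
        return -1; /* out of range: nothing to redirect */

    for (int j = 0; j < i; j++) {
        if (out_value[j] == out_value[i])
            return j;
    }
    return i;
}
```

Once every read of a duplicate goes to its first occurrence, the duplicate component has no readers left and becomes a candidate for removal by the existing optimisations the message mentions.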
* nir: rework nir_link_opt_varyings() | Timothy Arceri | 2019-01-02 | 1 | -16/+12
| This just cleans things up a little and makes things safer for
| derefs.
|
| Tested-by: Dieter Nützel <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir: add can_replace_varying() helper | Timothy Arceri | 2019-01-02 | 1 | -2/+14
| This will be reused by the following patch.
|
| Tested-by: Dieter Nützel <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir: rename nir_link_constant_varyings() to nir_link_opt_varyings() | Timothy Arceri | 2019-01-02 | 5 | -6/+6
| The following patches will add support for an additional
| optimisation, so this function will no longer just optimise varying
| constants.
|
| Tested-by: Dieter Nützel <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st | Timothy Arceri | 2019-01-02 | 2 | -3/+3
| This will help the new opt introduced in the following patches,
| allowing us to remove extra duplicate varyings.
|
| Tested-by: Dieter Nützel <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: make use of ac_are_tessfactors_def_in_all_invocs() | Timothy Arceri | 2019-01-02 | 1 | -8/+2
| Tested-by: Dieter Nützel <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>