summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radv: Optimize calling radv_save_descriptors.Bas Nieuwenhuizen2017-11-041-4/+2
| | | | | | | | uint32_t data[MAX_SETS * 2] = {}; was getting executed before the exit and took significant amounts of time. By having the check outside the function, we skip the execution of the clear. Reviewed-by: Dave Airlie <[email protected]>
* radv: Use an array to store descriptor sets.Bas Nieuwenhuizen2017-11-042-26/+50
| | | | | | | | | | | | The vram_list linked list resulted in lots of pointer chasing. Replacing this with an array instead improves descriptor set allocation CPU usage by 3x at least (when also considering the free), because it had to iterate through 300-400 sets on average. Not a huge improvement as the pre-improvement CPU usage was only about 2.3% in the busiest thread. Reviewed-by: Dave Airlie <[email protected]>
* nv50,nvc0: Display shared memory usage in pipe_debug_messagePierre Moreau2017-11-042-6/+8
| | | | Signed-off-by: Pierre Moreau <[email protected]>
* nv50,nvc0: Copy shared memory per block to the program info structure and backPierre Moreau2017-11-042-0/+4
| | | | | | | | In OpenCL/CUDA kernels, shared memory usage can be defined within the kernel code. Those usage will only be picked up while parsing the SPIR-V, during the translation phase of the program. Signed-off-by: Pierre Moreau <[email protected]>
* nv50/ir: Store shared memory per block in nv50_ir_prog_infoPierre Moreau2017-11-041-0/+1
| | | | Signed-off-by: Pierre Moreau <[email protected]>
* i965/gen10: Implement Wa3DStateModeAnuj Phogat2017-11-032-0/+16
| | | | | | | | | | | | | | This workaround doesn't fix any of the piglit hangs we've seen on CNL. But it might be fixing something we haven't tested yet. V2: Remove the bits enabling Float blend optimization. It is enabled through CACHE_MODE_SS register. Update the comment. Move gen10 if block on top of gen9 if block. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* i965/gen10: Enable float blend optimizationAnuj Phogat2017-11-032-0/+9
| | | | | | | | | This optimization is enabled for previous generations too. See Mesa commit c17e214a6b On CNL this bit has been moved to CACHE_MODE_SS register. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* i965/gen10: Implement WaForceRCPFEHangWorkaroundAnuj Phogat2017-11-031-0/+23
| | | | | | | | | | | | | This workaround doesn't fix any of the piglit hangs we've seen on CNL. But it might be fixing something we haven't tested yet. V2: Add the check for Post Sync Operation. Update the workaround comment. Use braces around if-else. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* i965/gen10: Implement WaSampleOffsetIZ workaroundAnuj Phogat2017-11-032-0/+50
| | | | | | | | | | | | | | | | | | | | | | | There are few other (duplicate) workarounds which have similar recommendations: WaFlushHangWhenNonPipelineStateAndMarkerStalled WaCSStallBefore3DSamplePattern WaPipeControlBefore3DStateSamplePattern WaPipeControlBefore3DStateSamplePattern has some extra recommendations if driver is using mid batch context restore. Ignoring it for now because We're not doing mid-batch context restore in Mesa. This workaround doesn't fix any of the piglit hangs we've seen on CNL. But it might be fixing something we haven't tested yet. V2: Use brw_load_register_imm32() to program CACHE_MODE_0. Get rid of brw_flush_gpu_caches(). V3: Make the workaround helper functions static. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by :Nanley Chery <[email protected]>
* i965/gen10: Don't set Antialiasing Enable in 3DSTATE_RASTER if num_samples > 1Anuj Phogat2017-11-031-0/+10
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen10: Don't set Smooth Point Enable in 3DSTATE_SF if num_samples > 1Anuj Phogat2017-11-031-1/+12
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.Andrey Grodzovsky2017-11-034-0/+15
| | | | | | | | | | Fixes reverted patch f03b7c9 by doing VMID reservation per process and not per context. Also updates required amdgpu libdrm version since the change involved interface updates in amdgpu libdrm. Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* i965: perf: list registers to program for queriesLionel Landwerlin2017-11-032-0/+66
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: factorize code for availabilityLionel Landwerlin2017-11-031-12/+16
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: perf: make revision variable availableLionel Landwerlin2017-11-035-8/+10
| | | | | | | This will be used in the next commit to build up register programming. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idxNicolai Hähnle2017-11-031-1/+30
| | | | | | | | | | | | | The dynamic index of a vector (not array!) is lowered to a sequence of conditional assignments. However, the interpolate_at_* expressions require that the interpolant is an l-value of a shader input. So instead of doing conditional assignments of parts of the shader input and then interpolating that (which is nonsensical), we interpolate the entire shader input and then do conditional assignments of the interpolated result. Reviewed-by: Timothy Arceri <[email protected]>
* glsl: allow any l-value of an input variable as interpolant in interpolateAt*Nicolai Hähnle2017-11-032-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | The intended rule has been clarified in GLSL 4.60, Section 8.13.2 (Interpolation Functions): "For all of the interpolation functions, interpolant must be an l-value from an in declaration; this can include a variable, a block or structure member, an array element, or some combination of these. Component selection operators (e.g., .xy) may be used when specifying interpolant." For members of interface blocks, var->data.must_be_shader_input must be determined on-the-fly after lowering interface blocks, since we don't want to disable varying packing for an entire block just because one input in it is used in interpolateAt*. v2: keep setting must_be_shader_input in ast_function (Ian) v3: follow the relaxed rule of GLSL 4.60 v4: only apply the relaxed rules to desktop GL (the ES WG decided that the relaxed rules may apply in a future version but not retroactively; see also dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378 Reviewed-by: Ian Romanick <[email protected]> (v1) Reviewed-by: Timothy Arceri <[email protected]>
* nir/serialize: fix build with gcc 4.4.7Dave Airlie2017-11-031-19/+19
| | | | | | | I had to build on RHEL6 today, and noticed this. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i915g: remove some unknown cap warnings.Dave Airlie2017-11-031-0/+8
|
* i915g: make gears run again.Dave Airlie2017-11-034-4/+24
| | | | | | | We need to validate some structs exist before we dirty the states, and avoid the problem in some other places. Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
* ac: remove the remaining duplicate llvm typesTimothy Arceri2017-11-031-12/+1
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove usused v4f32Timothy Arceri2017-11-031-4/+0
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2f32 to the common code and make use of itTimothy Arceri2017-11-033-10/+7
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f16 llvm typeTimothy Arceri2017-11-031-3/+1
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f32 llvm typeTimothy Arceri2017-11-031-35/+33
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f64 llvm typeTimothy Arceri2017-11-031-3/+1
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v8i32 llvm typeTimothy Arceri2017-11-031-4/+2
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v4i32 llvm typeTimothy Arceri2017-11-031-9/+7
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v3i32 to the common code and make use of itTimothy Arceri2017-11-033-5/+5
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2i32 to the common code and use itTimothy Arceri2017-11-033-11/+11
| | | | | Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i64 llvm typeTimothy Arceri2017-11-031-3/+1
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove unused i16 llvm typeTimothy Arceri2017-11-031-2/+0
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac ivoidt llvm typeTimothy Arceri2017-11-031-4/+2
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i8 llvm typeTimothy Arceri2017-11-031-6/+4
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i1 llvm typeTimothy Arceri2017-11-031-3/+1
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i32 llvm typeTimothy Arceri2017-11-031-181/+179
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected] Acked-by: Nicolai Hähnle <[email protected]>
* ac/radeonsi: add support for tex instr without a derefenceTimothy Arceri2017-11-032-34/+51
| | | | | | | | | | | These are produced by nir_lower_bitmap(), adding the missing derefence would cause other issues that need to be hacked around such as skipping sampler lowering and uniform location assignment, so this change seems the correct way to go. Fixes 194 piglit crashes on radeonsi using NIR. Reviewed-by: Nicolai Hähnle <[email protected]>
* nir: skip lowering sampler if there is no dereferenceTimothy Arceri2017-11-031-1/+3
| | | | | | This avoids a crash on the output of nir_lower_bitmap(). Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: add support for early depth/stencil.Dave Airlie2017-11-031-0/+3
| | | | | | | | This add support for the early depth/stencil property found on image shaders. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for emitting RAT instructions to the assembler.Dave Airlie2017-11-033-0/+35
| | | | | | | | This adds support for emitting RAT instructions to the assembler. RAT instructions are used to implement image accessors. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for mark bit to the assembler.Dave Airlie2017-11-033-0/+7
| | | | | | | | This adds support to the assembler for the mark bit on the export word1. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for valid pixel mode on CF clausesDave Airlie2017-11-032-0/+2
| | | | | | | | This just adds support to the assembler for setting the valid pixel mode on the CF clause. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: add support for some ALU sources.Dave Airlie2017-11-031-0/+9
| | | | | | | | | | These special ALU sources provide the shader engine, simd and hw wave ids. These are required for images support. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: use the optimal packets order for dispatch callsSamuel Pitoiset2017-11-021-8/+53
| | | | | | | | | | This should reduce the time where compute units are idle, mainly for meta operations because they use a bunch of compute shaders. This seems to have a really minor positive effect for Talos, at least. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: add tess patch support to nir_remove_unused_varyings()Timothy Arceri2017-11-031-19/+42
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* es2api/ABI-check: Add es3.x symbolsDylan Baker2017-11-021-8/+125
| | | | | | | | | | | | | | Currently this ABI check only checks for es2 symbols, but es3.x symbols are also exposed. Exposing these symbols is recommended by Khronos, and as such the test should accept that as ABI. see: https://lists.freedesktop.org/archives/mesa-stable/2016-June/004545.html for the discussion about exposing these symbols cc: Ian Romanick <[email protected]> Signed-off-by: Dylan Baker <[email protected]> Tested-by: Eric Engestrom <[email protected]> Tested-by: Michel Dänzer <[email protected]>
* meson: Set c visibility args for wayland-drmDylan Baker2017-11-021-0/+1
| | | | | | | Because otherwise gbm will expose wayland symbols that it shouldn't. Signed-off-by: Dylan Baker <[email protected]> Reviewed-and-Tested-by: Eric Engestrom <[email protected]>
* st/glsl_to_nir: pass gl_shader_program to st_finalize_nir()Timothy Arceri2017-11-033-27/+11
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: Don't expose heaps with 0 memory.Bas Nieuwenhuizen2017-11-023-53/+101
| | | | | | | | | | | | It confuses CTS. This pregenerates the heap info into the physical device, so we can use it for translating contiguous indices into our "standard" ones. This also makes the WSI a bit smarter in case the first preferred heap does not exist. Reviewed-by: Dave Airlie <[email protected]> CC: <[email protected]>
* gbm: Don't traverse backwards for includesDylan Baker2017-11-024-5/+12
| | | | | | | | | | | | | | This is just a bad idea and should be avoided. Instead, make the #include flat and fix the build systems to pass the proper -I flags v2: - add an inc_wayland_drm instead passing a path to include_directories (Emil) - update commit message (Emil) Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]> (v1) Reviewed-by: Eric Engestrom <[email protected]> (v1)