aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add PIPE_CAP_MAX_VARYINGSKarol Herbst2019-02-0716-12/+48
| | | | | | | | | | | | | | | | | Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
* panfrost: Include glue for out-of-tree legacy codeAlyssa Rosenzweig2019-02-075-7/+29
| | | | | | | | | | | | | | | | | In addition to the DRM interface in active development, for legacy kernels Panfrost has a small, optional, out-of-tree glue repository. For various reasons, this legacy code should not be included in Mesa proper, but this commit allows it to coexist peacefully with upstream Panfrost. If the nondrm repo is cloned/symlinked to the directory `src/gallium/drivers/panfrost/nondrm`, legacy functionality will be built. Otherwise, the driver will build normally, though a runtime error message will be printed if a legacy kernel is detected. This workaround is icky, but it allows a nearly-upstream Panfrost to work on real hardware, today. Ideally, this patch will be reverted when the Panfrost kernel module is mature and we drop legacy support. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Check in sources for command streamAlyssa Rosenzweig2019-02-0722-5/+5441
| | | | | | | | This patch includes the command stream portion of the driver, complementing the earlier compiler. It provides a base for future work, though it does not integrate with any particular winsys. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Use u_pipe_screen_get_param_defaultsAlyssa Rosenzweig2019-02-071-151/+6
| | | | | | | | Switching to the defaults function cleans up pan_screen.h markedly and futureproofs for when new PIPE_CAPs are added. Signed-off-by: Alyssa Rosenzweig <[email protected]> Suggested-by: Eric Anholt <[email protected]>
* nvc0: add compute invocation counterRhys Perry2019-02-068-4/+207
| | | | | | | | | | | | | | | | | The strategy is to keep a CPU-side counter of the direct invocations, and a GPU-side counter of the indirect invocations, and then add them together for queries. The specific technique is a macro which multiplies a list of integers together and accumulates the product into SCRATCH registers held inside of the context. Another macro will read those values out and add them to the passed-in cpu-side counter to be stored in a query buffer the same way that all the other statistics are stored. Original implementation by Rhys Perry, redone by Ilia Mirkin to use the SCRATCH temporaries. Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: add fp64 rsqKarol Herbst2019-02-063-3/+128
| | | | | Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gm107/ir: add fp64 rcpKarol Herbst2019-02-063-4/+270
| | | | | Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk104/ir: Use the new rcp/rsq in libraryKarol Herbst2019-02-063-15/+334
| | | | | | [imirkin: add a few more "long" prefixes to safen things up] Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Use the new rcp/rsq in libraryBoyan Ding2019-02-065-0/+42
| | | | | | | | | | v2: (Karol Herbst <[email protected]> * fix Value setup for the builtins Signed-off-by: Boyan Ding <[email protected]> [imirkin: track the fp64 flag when switching ops to calls] Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Add rsq f64 implementationBoyan Ding2019-02-062-2/+109
| | | | | | Signed-off-by: Boyan Ding <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* gk110/ir: Add rcp f64 implementationBoyan Ding2019-02-062-4/+235
| | | | | | Signed-off-by: Boyan Ding <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: stick zero values for the compute invocation countsIlia Mirkin2019-02-061-0/+2
| | | | | | | | | | Not quite perfect, but at least we don't end up with random values in the query buffer. Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nv50,nvc0: use condition for occlusion queries when already completeIlia Mirkin2019-02-066-28/+25
| | | | | | | | | | | | | | | | | | For the NO_WAIT variants, we would jump into the ALWAYS case for both nested and inverted occlusion queries. However if the query had previously completed, the application could reasonably expect that the render condition would follow that result. To resolve this, we remove the nesting distinction which unnecessarily created an imbalance between the regular and inverted cases (since there's no "zero" condition mode). We also use the proper comparison if we know that the query has completed (which could happen as a result of an earlier get_query_result call). Fixes KHR-GL45.conditional_render_inverted.functional Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: fix 3d images on keplerIlia Mirkin2019-02-062-35/+34
| | | | | | | | | | | | | Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d tiling, they just need the correct inputs. Supply them. We also have to deal with the case where a 2d "layer" of a 3d image is bound. In this case, we supply the z coordinate separately to the shader, which has to optionally treat every 2d case as if it could be a slice of a 3d texture. Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0/ir: fix second tex argument after levelZero optimizationIlia Mirkin2019-02-062-25/+24
| | | | | | | | | | | | | | | | | | | | We used to pre-set a bunch of extra arguments to a texture instruction in order to force the RA to allocate a register at the boundary of 4. However with the levelZero optimization, which removes a LOD argument when it's uniformly equal to zero, we undid that logic by removing an extra argument. As a result, we could end up with insufficient alignment on the second wide texture argument. Instead we switch to a different method of achieving the same result. The logic runs during the constraint analysis of the RA, and adds unset sources as necessary right before being merged into a wide argument. Fixes MISALIGNED_REG errors in Hitman when run with bindless textures enabled on a GK208. Fixes: 9145873b152 ("nvc0/ir: use levelZero flag when the lod is set to 0") Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* nvc0/ir: always use CG mode for loads from atomic-only buffersIlia Mirkin2019-02-061-2/+12
| | | | | | | | | | | | | | | | Atomic operations don't update the local cache, which means that we would have to issue CCTL operations in order to get the updated values. When we know that a buffer is primarily used for atomic operations, it's easier to just avoid the caching at that level entirely. The same issue persists for non-atomic buffers, which will have to be fixed separately. Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Cc: 19.0 <[email protected]>
* nvc0: add support for handling indirect draws with attrib conversionIlia Mirkin2019-02-063-1/+82
| | | | | | | | | | | | | | | | The hardware does not natively support FIXED and DOUBLE formats. If those are used in an indirect draw, they have to be converted. Our conversion tries to be clever about only converting the data that's needed. However for indirect, that won't work. Given that DOUBLE or FIXED are highly unlikely to ever be used with indirect draws, read the indirect buffer on the CPU and issue draws directly. Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests. Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* freedreno/a6xx: Use tiling for all resourcesKristian H. Kristensen2019-02-061-1/+0
| | | | | | | | We used to restrict this to just PIPE_BIND_SAMPLER_VIEW resources, but most resources benefit from being tiled. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: Emit blitter dst with OUT_RELOCWKristian H. Kristensen2019-02-061-1/+1
| | | | | | | | | We're writing to the bo and the kernel needs to know for fd_bo_cpu_prep() to work. Fixes: f93e43127252679b ("freedreno/a6xx: Enable blitter") Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* radeonsi: use local ws variable in si_need_dma_spaceMarek Olšák2019-02-061-9/+10
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't leak an index buffer if draw_vbo failsMarek Olšák2019-02-061-3/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make allocator_zeroed_memory unmappable and use bigger buffersMarek Olšák2019-02-061-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clear allocator_zeroed_memory with SDMAMarek Olšák2019-02-064-12/+9
| | | | | | | | so that it can be used in parallel IBs. This also removes the SO_FILLED_SIZE hack. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: initialize textures using DCC to black when possibleMarek Olšák2019-02-063-13/+63
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno: a2xx: fix fast clearJonathan Marek2019-02-061-1/+0
| | | | | | | Fixes: 912a9c8d Signed-off-by: Jonathan Marek <[email protected]> Cc: 19.0 <[email protected]>
* v3d: Store the actual mask of color buffers present in the key.Eric Anholt2019-02-051-9/+10
| | | | | | | If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
* v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs.Eric Anholt2019-02-051-1/+1
| | | | I was just leaving the other MRT targets than DATA0 out, by accident.
* nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR.Eric Anholt2019-02-052-10/+2
| | | | | | | | | | | | | | Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's. This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example). Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nvc0/ir: replace cvt instructions with add to improve shader performanceKarol Herbst2019-02-052-0/+64
| | | | | | | | | | | | | | | | | | | gives me an performance boost of 0.2% in pixmark_piano on my gk106, gm204 and gp107. reduces the amount of generated convert instructions by roughly 30% in shader-db. v2: only for 32 bit operations move some common code out of the switch handle OP_SAT with modifiers v3: only for registers and const memory rework if clauses merge isCvt into this patch v4: merge isCvt into its use Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* swr/rast: update SWR rasterizer shader statsAlok Hota2019-02-0510-38/+204
| | | | | | Primarily refactoring internal stats types Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: release tokens after creating the shader programGert Wollny2019-02-051-0/+2
| | | | | | | | | | | | | | | | | | | | ureg_get_tokens clears the reference to the tokens, and create_compute_state makes a copy, hence the tokens must be explicitely released. Fixes: Direct leak of 256 byte(s) in 1 object(s) allocated from: #0 0x7ff729cf3c60 in realloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdbc60) #1 0x7ff721b1240c in tokens_expand ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:234 #2 0x7ff721b1c9c0 in get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:257 #3 0x7ff721b1c9c0 in copy_instructions ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2040 #4 0x7ff721b1c9c0 in ureg_finalize ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2090 #5 0x7ff721b1e919 in ureg_get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2167 #6 0x7ff721f8b35a in si_create_dma_compute_shader ../../samba/mesa/src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:219 #7 0x7ff722043ed9 in si_compute_do_clear_or_copy ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:156 #8 0x7ff7220448d3 in si_clear_buffer ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:247 #9 0x7ff7220350e8 in vi_dcc_clear_level ../../samba/mesa/src/gallium/drivers/radeonsi/si_clear.c:274 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nv50,nvc0: add explicit settings for recent capsIlia Mirkin2019-02-042-0/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: 19.0 <[email protected]>
* panfrost: Implement Midgard shader toolchainAlyssa Rosenzweig2019-02-0513-2/+6383
| | | | | | | | | | | | | | | This patch implements the free Midgard shader toolchain: the assembler, the disassembler, and the NIR-based compiler. The assembler is a standalone inaccessible Python script for reference purposes. The disassembler and the compiler are implemented in C, accessible via the standalone `midgard_compiler` binary. Later patches will use these interfaces from the driver for online compilation. Signed-off-by: Alyssa Rosenzweig <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Acked-by: Emil Velikov <[email protected]>
* panfrost: Initial stub for Panfrost driverAlyssa Rosenzweig2019-02-0511-0/+2984
| | | | | | | | | | | | | This patch adds an initial stub for the Gallium driver, containing simple screen functions and the majority of the driver headers but no actual functionality. It further adds the winsys glue for linking in this stub driver via kmsro on Rockchip/Amlogic boards. Signed-off-by: Alyssa Rosenzweig <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Acked-by: Emil Velikov <[email protected]>
* radeonsi: fix crashing performance counters (division by zero)Marek Olšák2019-02-041-1/+1
| | | | Fixes: e2b9329f17 "radeonsi: move remaining perfcounter code into si_perfcounter.c"
* radeonsi: handle render_condition_enable in si_compute_clear_render_targetMarek Olšák2019-02-043-3/+8
|
* radeonsi: use compute for clear_render_target when possibleSonny Jiang2019-02-045-0/+184
| | | | | Signed-off-by: Sonny Jiang <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* ac/radv/radeonsi: add ac_get_num_physical_sgprs() helperTimothy Arceri2019-02-011-4/+3
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* freedreno: more fixing release tarballRob Clark2019-01-311-1/+3
| | | | | Fixes: aa0fed10d35 freedreno: move ir3 to common location Signed-off-by: Rob Clark <[email protected]>
* virgl: ARB_query_buffer_object supportDave Airlie2019-01-317-1/+58
| | | | | | v1.1: fix size define. Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: enable elapsed time queriesDave Airlie2019-01-311-1/+1
| | | | | | GL underneath always has GL_TIME_ELAPSED so always enable these. Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi: fix a comment typo in si_fine_fence_setMarek Olšák2019-01-301-1/+1
|
* r600: add -Wstrict-overflow=0 to meson to silence the warningMarek Olšák2019-01-301-1/+1
| | | | same as radeonsi
* radeonsi: unify error paths in si_texture_create_objectMarek Olšák2019-01-301-9/+9
|
* radeonsi: merge & rename texture BO metadata functionsMarek Olšák2019-01-301-64/+53
|
* radeonsi: enable dithered alpha-to-coverage for better qualityMarek Olšák2019-01-301-4/+5
| | | | | | | same as AMDVLK. GL_NV_alpha_to_coverage_dither_control allows controlling this behavior. The default is implementation-dependent.
* v3d: Fix leak in resource setup error pathErnestas Kulik2019-01-291-1/+1
| | | | | | | | | Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource. CID: 1435704 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 45bb8f295710 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
* vc4: Fix leak in HW queries error pathErnestas Kulik2019-01-291-1/+1
| | | | | | | | | | Reported by Coverity: in the case where there exist hardware and non-hardware queries, the code does not jump to err_free_query and leaks the query. CID: 1430194 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 9ea90ffb98fb ("broadcom/vc4: Add support for HW perfmon")
* v3d: Always enable the NEON utile load/store code.Eric Anholt2019-01-291-5/+6
| | | | | | | I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do. Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
* freedreno: fix sysmem rendering being used when clear is usedJonathan Marek2019-01-291-1/+1
| | | | | | | | | | This batch->cleared value is only used to decide to use sysmem rendering or not, so it should include any buffers that are affected by a clear. This is required because the a2xx fast clear doesn't work with sysmem rendering. The a22x "normal" clear path doesn't work with sysmem either. Signed-off-by: Jonathan Marek <[email protected]>