path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: constify a bunch of the perfcounter structs. | Dave Airlie | 2017-05-04 | 3 | -52/+46
  This moves the structs from the data segment to the rodata segment, which seems like the more correct place for them.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radeonsi/gfx9: fix gl_ViewportIndex | Marek Olšák | 2017-05-03 | 2 | -8/+40
  v2: remove unnecessary LLVMBuildAnd calls
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: set VGT_REUSE_OFF = 0 | Marek Olšák | 2017-05-03 | 1 | -3/+7
  same as Vulkan
  Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: add L8A8_UNORM texture format | Christian Gmeiner | 2017-05-03 | 1 | -0/+2
  No piglit regressions.
  CC: <[email protected]>
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Philipp Zabel <[email protected]>
* ac: rename ac_eliminate_const_vs_outputs -> ac_optimize_vs_outputs | Marek Olšák | 2017-05-03 | 1 | -5/+5
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Use runtime CPU detection for whether NEON is available. | Eric Anholt | 2017-05-02 | 2 | -14/+16
  This will allow Raspbian's ARMv6 builds to take advantage of the new NEON code, and could prevent problems if vc4 ends up getting used on a v7 CPU without NEON.
  v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
* vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS. | Eric Anholt | 2017-05-02 | 4 | -8/+31
  Android.mk was setting the flag across the entire driver, so we didn't have non-NEON versions getting built. This was going to be a problem with the next commit, when I start auto-detecting NEON support and use the non-NEON version when appropriate.
  Reviewed-by: Rob Herring <[email protected]>
* gallium: Enable ARM NEON CPU detection. | Eric Anholt | 2017-05-02 | 3 | -0/+46
  I wrote this code with reference to pixman, though I've only decided to cover Linux (what I'm testing) and Android (seems obvious enough). Linux has getauxval() as a cleaner interface to the /proc entry, but it's more glibc-specific and I didn't want to add detection for that.
  This will be used to enable NEON at runtime on ARMv6 builds of vc4.
  v2: Actually initialize the temp vars in the Android path (noticed by daniels)
  v3: Actually pull in the cpufeatures library (change by robher). Use O_CLOEXEC. Break out of the loop when we find our feature.
  v4: Drop VFP code, which was confused about what it was detecting and not actually used yet.
  Reviewed-by: Grazvydas Ignotas <[email protected]>
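The detection approach described in the commit above (read the /proc entry, stop at the first match, open with O_CLOEXEC) might look roughly like the following. This is a minimal sketch and not the code from the commit: it assumes the /proc entry is /proc/self/auxv, that the NEON flag is bit 12 of AT_HWCAP on 32-bit ARM Linux, and every `sketch_` name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Assumption: bit 12 of AT_HWCAP is the NEON flag on 32-bit ARM Linux.
 * It is defined locally so the sketch also compiles on non-ARM hosts. */
#define SKETCH_HWCAP_NEON (1ul << 12)

/* One auxiliary-vector entry: two native words (type, value). */
struct sketch_auxv {
   unsigned long type;
   unsigned long value;
};

/* Scan auxv entries from an already-open fd. Returns true and stores the
 * AT_HWCAP value in *hwcap if that entry is found. */
static bool
sketch_read_hwcap(int fd, unsigned long *hwcap)
{
   struct sketch_auxv aux;

   assert(hwcap != NULL);
   while (read(fd, &aux, sizeof(aux)) == (ssize_t)sizeof(aux)) {
      if (aux.type == AT_HWCAP) {
         *hwcap = aux.value;
         return true;            /* break out once the feature word is found */
      }
      if (aux.type == AT_NULL)   /* end-of-vector marker */
         break;
   }
   return false;
}

/* Hypothetical top-level check; reports false if the entry is unreadable. */
static bool
sketch_has_neon(void)
{
   unsigned long hwcap = 0;
   int fd = open("/proc/self/auxv", O_RDONLY | O_CLOEXEC);

   if (fd < 0)
      return false;
   bool found = sketch_read_hwcap(fd, &hwcap);
   close(fd);
   return found && (hwcap & SKETCH_HWCAP_NEON) != 0;
}
```

Splitting the scan from the open() call keeps the parsing testable with any file descriptor; on a non-ARM host the result of `sketch_has_neon()` is meaningless because bit 12 means something else there.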
* renderonly: use drmIoctl | Philipp Zabel | 2017-05-02 | 1 | -4/+3
  To restart interrupted system calls, use drmIoctl.
  Fixes: 848b49b288f ("gallium: add renderonly library")
  CC: <[email protected]>
  Suggested-by: Emil Velikov <[email protected]>
  Signed-off-by: Philipp Zabel <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
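Restarting an interrupted system call, as the commit above relies on drmIoctl to do, amounts to a retry loop around ioctl(). The helper below is a hypothetical stand-in written for illustration, not libdrm's source; it assumes that EINTR and EAGAIN are the two error codes worth retrying.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical stand-in for a restarting ioctl wrapper: repeat the call
 * while it fails only because it was interrupted or asked to try again.
 * Any other failure is returned to the caller with errno left intact. */
static int
sketch_retry_ioctl(int fd, unsigned long request, void *arg)
{
   int ret;

   do {
      ret = ioctl(fd, request, arg);
   } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

   return ret;
}
```

A caller that used plain ioctl() would instead see a spurious failure whenever a signal arrived during the call.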
* renderonly: drop resources on destroy | Philipp Zabel | 2017-05-02 | 3 | -3/+13
  The renderonly_scanout holds a reference on its prime pipe resource, which should be released when it is destroyed. If it was created by renderonly_create_kms_dumb_buffer_for_resource, the dumb BO also has to be destroyed.
  Fixes: 848b49b288f ("gallium: add renderonly library")
  CC: <[email protected]>
  Signed-off-by: Philipp Zabel <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
* renderonly: close transfer prime_fd | Philipp Zabel | 2017-05-02 | 1 | -0/+2
  prime_fd is only used to transfer the scanout buffer to the GPU inside renderonly_create_kms_dumb_buffer_for_resource. It should be closed immediately to avoid leaking the DMA-BUF file handle.
  Fixes: 848b49b288f ("gallium: add renderonly library")
  CC: <[email protected]>
  Signed-off-by: Philipp Zabel <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
* vc4: Only build the NEON code on arm32. | Eric Anholt | 2017-05-01 | 1 | -2/+2
  NEON is sufficiently different on arm64 that we can't just reuse this code. Disable it on arm64 for now.
  v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build for a v8 CPU.
  Signed-off-by: Eric Anholt <[email protected]>
  Cc: <[email protected]>
* gm107/ir: add a missing assertion in emitISCADD() | Samuel Pitoiset | 2017-05-01 | 1 | -0/+2
  For consistency, similar to the other emitters.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/targets: fix bool setting on BE architectures | Ilia Mirkin | 2017-04-29 | 8 | -11/+11
  val_bool and val_int are in a union. val_bool gets the first byte, which happens to work on LE when setting via the int, but breaks on BE. By setting the value properly, we are able to use DRI3 on BE architectures.
  Tested by running glxgears with a NV34 in a G5 PPC.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  [Emil Velikov: squash the vmwgfx hunk]
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
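The byte-order problem described in the commit above can be reproduced on any host by laying out the integer's bytes explicitly. This is an illustrative sketch only: `union sketch_value` is a hypothetical miniature that borrows the member names from the commit message, and it assumes the boolean member is one byte wide.

```c
#include <string.h>

/* Hypothetical miniature of the option value: a one-byte boolean sharing
 * storage with an int. */
union sketch_value {
   unsigned char val_bool;
   int val_int;
};

/* Store the integer 1 through val_int using an explicit byte order, then
 * read it back through val_bool, which aliases the first byte. */
static unsigned char
sketch_bool_after_int_store(int big_endian)
{
   unsigned char bytes[sizeof(int)] = {0};
   union sketch_value v;

   /* The only non-zero byte is last on big-endian, first on little-endian. */
   bytes[big_endian ? sizeof(int) - 1 : 0] = 1;
   memset(&v, 0, sizeof(v));
   memcpy(&v.val_int, bytes, sizeof(bytes));
   return v.val_bool;
}

/* Setting the boolean member directly is correct for either byte order. */
static unsigned char
sketch_bool_after_bool_store(void)
{
   union sketch_value v;

   memset(&v, 0, sizeof(v));
   v.val_bool = 1;
   return v.val_bool;
}
```

With the little-endian layout the boolean reads back as 1; with the big-endian layout the first byte stays 0, so the option appears unset unless the boolean member is written directly.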
* st/wgl: whitespace, formatting fixes in stw_pixelformat.c | Brian Paul | 2017-04-28 | 1 | -72/+62
  Trivial.
* st/wgl: allow WGL_BIND_TO_TEXTURE_RGB_ARB for RGBA visuals | Charmaine Lee | 2017-04-28 | 1 | -2/+2
  We do not need to restrict WGL_BIND_TO_TEXTURE_RGB_ARB to RGB visuals only. It can be supported with RGBA visuals as well.
  This fixes the early exit of cinebench-r15-test trace.
  Tested with cinebench-r15, piglit, glretrace.
  Reviewed-by: Brian Paul <[email protected]>
* st/wgl: use ARRAY_SIZE() macro in wglChoosePixelFormatARB() | Brian Paul | 2017-04-28 | 1 | -1/+1
  Trivial.
* st/wgl: whitespace/formatting fixes in stw_ext_pixelformat.c | Brian Paul | 2017-04-28 | 1 | -59/+52
  Trivial.
* svga: implement sRGB rendering for imported surfaces | Neha Bhende | 2017-04-28 | 1 | -2/+9
  If the texture is imported and the template format is sRGB, use the sRGB format compatible with the imported texture format when creating the surface view.
  Tested with MTT piglit, glretrace, viewperf and conform.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: add function svga_linear_to_srgb() | Neha Bhende | 2017-04-28 | 2 | -0/+29
  This function returns the compatible svga sRGB format for the corresponding linear format.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: Add a more elaborate format compatibility determination v2 | Thomas Hellstrom | 2017-04-28 | 3 | -41/+93
  dri3 is a bit sloppy about its format compatibility requirements, so add a possibility to import xrgb surfaces as argb textures and vice versa.
  At the same time, make the svga_texture_from_handle() function a bit more readable and fix the error path where we leaked a winsys surface.
  v2: Addressed review comments by Brian.
  Signed-off-by: Thomas Hellstrom <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* swr/rast: add memory api to SwrGetInterface() | Tim Rowley | 2017-04-28 | 6 | -28/+54
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: use gather instruction for odd format fetch | Tim Rowley | 2017-04-28 | 1 | -46/+9
  Small fetch performance optimization - use gather instruction for odd format fetch instead of slow emulated code.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: enable SIMD16 8x2 tile backend | Tim Rowley | 2017-04-28 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: add SwrInit() to init backend/memory tables | Tim Rowley | 2017-04-28 | 5 | -22/+26
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: increment depth/stencil tile pointer in SIMD16 BE | Tim Rowley | 2017-04-28 | 1 | -1/+1
  A misplaced #endif prevented the depth and stencil hot tile pointers from incrementing in the SIMD16 8x2 configuration of BackendPixelRate.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: add SwrGetInterface() function to return api | Tim Rowley | 2017-04-28 | 3 | -44/+151
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: enable per-warp scratch space for CS | Tim Rowley | 2017-04-28 | 8 | -8/+33
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: reduce simd{16}vertex stack for VS output | Tim Rowley | 2017-04-28 | 2 | -16/+54
  Frontend - reduce simdvertex/simd16vertex stack usage for VS output in ProcessDraw, fixes stack overflow in some of the deeper call stacks under SIMD16.
  1. Move the vertex store out of PA_FACTORY, and off the stack
  2. Allocate the vertex store out of the aligned heap (pointer is temporarily stored in TLS, but will be migrated to thread pool along with other frontend temporary buffers).
  3. Grow the vertex store as necessary for the number of verts per primitive, in chunks of 8/4 simdvertex/simd16vertex
  Reviewed-by: Bruce Cherniak <[email protected]>
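Steps 2 and 3 of the commit above (an aligned heap allocation that grows in fixed-size chunks) can be sketched as follows. This is not the swr implementation: the vertex layout, the 64-byte alignment, the choice to copy old contents on growth, and all `sketch_` names are assumptions made for illustration, and C11 `aligned_alloc` stands in for whatever aligned allocator the driver uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CHUNK 8    /* growth granularity, in vertices */
#define SKETCH_ALIGN 64   /* assumed alignment for SIMD data */

/* Placeholder vertex: 64 bytes, so any vertex count is a multiple of the
 * alignment, as aligned_alloc() requires. */
struct sketch_vertex {
   float attrib[16];
};

struct sketch_store {
   struct sketch_vertex *verts;
   size_t capacity;   /* in vertices */
};

/* Make room for `needed` vertices, growing in whole chunks.
 * Returns 0 on success and -1 if the allocation fails. */
static int
sketch_store_reserve(struct sketch_store *s, size_t needed)
{
   assert(s != NULL);
   if (needed <= s->capacity)
      return 0;   /* already large enough, nothing to do */

   /* Round the request up to the next multiple of the chunk size. */
   size_t cap = (needed + SKETCH_CHUNK - 1) / SKETCH_CHUNK * SKETCH_CHUNK;
   struct sketch_vertex *p = aligned_alloc(SKETCH_ALIGN, cap * sizeof(*p));
   if (!p)
      return -1;

   /* Assumption: existing vertices are preserved across a grow. */
   if (s->verts)
      memcpy(p, s->verts, s->capacity * sizeof(*p));
   free(s->verts);
   s->verts = p;
   s->capacity = cap;
   return 0;
}
```

Keeping the store on the heap means its size no longer counts against the stack depth of the draw path, which is the problem the commit set out to fix.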
* swr/rast: remove default argument from SwrSync() | Tim Rowley | 2017-04-28 | 1 | -1/+1
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: remove unused variables in the SIMD16 FE | Tim Rowley | 2017-04-28 | 3 | -14/+2
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: move construction of const above goto | Tim Rowley | 2017-04-28 | 1 | -2/+2
  Fixes gcc error for SIMD16 FE.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: name threads to aid debugging | Tim Rowley | 2017-04-28 | 4 | -2/+126
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: disable buffer overrun warning for Assemble() | Tim Rowley | 2017-04-28 | 1 | -2/+4
  Disable the buffer overrun warning for Assemble(uint32_t slot, simdvector *verts) due to what looks like an MSVC compiler bug when compiling the SIMD16 FE.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: clean up clipper comments | Tim Rowley | 2017-04-28 | 1 | -2/+2
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: add SIMDAPI decorators in binner/clipper | Tim Rowley | 2017-04-28 | 2 | -6/+6
  Fixes MSVC errors with SIMD16 FE.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: add additional jit utility functions | Tim Rowley | 2017-04-28 | 4 | -1/+76
  Not used yet.
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: more flexible max attribute slots | Tim Rowley | 2017-04-28 | 7 | -27/+30
  Adds the ability to allocate space for an arbitrary number (at compile time) of positions in the vertex layout.
  Removes KNOB_NUM_ATTRIBUTES from knobs.h and replaces the VTX slot number #defines with the SWR_VTX_SLOTS enum (which contains the replacement for NUM_ATTRIBUTES: SWR_VTX_NUM_SLOTS).
  Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: don't load unused compute shader input SGPRs and VGPRs | Marek Olšák | 2017-04-28 | 4 | -48/+76
  Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine whether to load BLOCK_ID for each component separately, and set the number of THREAD_ID VGPRs to load.
  Now we should get the maximum CS launch wave rate in most cases.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: record compute shader system value usage | Marek Olšák | 2017-04-28 | 2 | -0/+37
  v2: just do indexing with swizzle[i]
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a HUD query for draw calls with primitive restart | Marek Olšák | 2017-04-28 | 4 | -0/+11
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: tell LLVM not to remove s_barrier instructions | Marek Olšák | 2017-04-28 | 1 | -12/+33
  LLVM 5.0 removes s_barrier instructions if the max-work-group-size attribute is not set. What a surprise.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix tess offchip offset for per-patch attributes | Marek Olšák | 2017-04-28 | 3 | -12/+18
  We need 4 more bits there. I don't know what is fixed by this.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: pass tessellation ring addresses via user SGPRs | Marek Olšák | 2017-04-28 | 7 | -56/+112
  This removes s_load_dword latency for tess rings.
  We need just 1 SGPR for the address if we use 64K alignment. The final asm for recreating the descriptor is:
      // s2 is (address >> 16)
      s_mov_b32 s3, 0
      s_lshl_b64 s[4:5], s[2:3], 16
      s_mov_b32 s6, -1
      s_mov_b32 s7, 0x27fac
  v2: bitcast the descriptor type from v2i64 to v4i32
  Reviewed-by: Nicolai Hähnle <[email protected]>
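The instruction sequence quoted in the commit above can be restated in C to show how four descriptor dwords are rebuilt from one register. This is a sketch for illustration only: `sketch_build_ring_desc` is a hypothetical name, the mapping of s4..s7 to array indices 0..3 is an assumption, and the constants are copied from the commit message without interpreting their bit fields.

```c
#include <stdint.h>

/* Rebuild the four descriptor dwords from a single value that holds
 * (address >> 16), mirroring the quoted instructions one by one. */
static void
sketch_build_ring_desc(uint32_t addr_shifted, uint32_t desc[4])
{
   /* s_mov_b32 s3, 0 ; s_lshl_b64 s[4:5], s[2:3], 16
    * Zero-extend to 64 bits, then shift the address back into place. */
   uint64_t addr = (uint64_t)addr_shifted << 16;

   desc[0] = (uint32_t)addr;           /* s4: low 32 bits of the shifted value */
   desc[1] = (uint32_t)(addr >> 32);   /* s5: high 32 bits of the shifted value */
   desc[2] = 0xffffffffu;              /* s6: s_mov_b32 s6, -1 */
   desc[3] = 0x27fac;                  /* s7: constant taken from the commit */
}
```

For example, a 64K-aligned address of 0x123450000 is passed as 0x12345 and comes back as the dword pair 0x23450000, 0x1. The low 16 bits are always zero, which is why 64K alignment lets a single 32-bit SGPR carry the address.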
* radeonsi: use si_insert_input_ret in si_llvm_emit_tcs_epilogue | Marek Olšák | 2017-04-28 | 1 | -19/+10
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove VS epilog code, compile VS with PrimID export on demand | Marek Olšák | 2017-04-28 | 5 | -210/+31
  The use of PrimID in the pixel shader is too rare to deserve such sizable support code.
  The initial idea of the VS epilog was to move the clipping code there and remove it based on states, but optimized variants are now used to do that and are easier to support, so the VS epilog has turned out to be not so useful.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3 | Marek Olšák | 2017-04-28 | 4 | -13/+33
  VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't load PrimID in TES if it's not used | Marek Olšák | 2017-04-28 | 1 | -3/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: explain (non-)monolithic shaders | Marek Olšák | 2017-04-28 | 1 | -0/+67
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: enable OpenGL 4.5 | Marek Olšák | 2017-04-28 | 1 | -5/+0
  Tentatively enable it, expecting the scratch buffer support to be done before the next Mesa release.
  Reviewed-by: Nicolai Hähnle <[email protected]>