path: root/src/gallium
Commit message | Author | Age | Files | Lines
* nv50/ir: allow to swap sources for OP_SUB | Samuel Pitoiset | 2016-07-22 | 1 | -1/+6
  This allows the load-propagation pass to swap the sources in the presence
  of immediate values.

  Maxwell (GM107):
  total instructions in shared programs : 1928187 -> 1927634 (-0.03%)
  total gprs used in shared programs    : 330741 -> 330154 (-0.18%)
  total local used in shared programs   : 28032 -> 28032 (0.00%)

          local    gpr   inst  bytes
  helped      0    271    425    425
  hurt        0      0    194    194

  Fermi (GF114):
  total instructions in shared programs : 2334474 -> 2333829 (-0.03%)
  total gprs used in shared programs    : 380934 -> 380215 (-0.19%)
  total local used in shared programs   : 33304 -> 33264 (-0.12%)

          local    gpr   inst  bytes
  helped      5    314    521    521
  hurt        0      4    195    195

  No regressions on GM107 and GF114 with full piglit.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
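Swapping the sources of a SUB is not a plain swap. The minimal C sketch below shows why, assuming a model in which each source carries a negate modifier. The struct and function names are hypothetical and are not the nv50 IR API. Because a - b equals (-b) - (-a), a swap only preserves the value if both negate modifiers are toggled as well. Whether the actual patch handles the modifiers in exactly this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an operand with a negate modifier; not nv50 IR types. */
struct operand {
    int32_t value; /* register contents or immediate (INT32_MIN not handled) */
    bool neg;      /* negate the value when the operand is read */
};

struct sub_insn {
    struct operand src[2]; /* computes src[0] - src[1] */
};

static int32_t read_operand(struct operand op)
{
    return op.neg ? -op.value : op.value;
}

static int32_t eval_sub(const struct sub_insn *insn)
{
    return read_operand(insn->src[0]) - read_operand(insn->src[1]);
}

/* a - b == (-b) - (-a): swap the sources and toggle both negate modifiers,
 * e.g. to move an immediate into the slot that can encode it. */
static void swap_sub_sources(struct sub_insn *insn)
{
    struct operand tmp = insn->src[0];
    insn->src[0] = insn->src[1];
    insn->src[1] = tmp;
    insn->src[0].neg = !insn->src[0].neg;
    insn->src[1].neg = !insn->src[1].neg;
}
```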
* gallium/radeon: make deferred flushes asynchronous | Marek Olšák | 2016-07-22 | 1 | -0/+2
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_FLUSH_DEFERRED | Marek Olšák | 2016-07-22 | 2 | -1/+12
  There are 2 uses:
  - Asynchronous flushing for multithreaded drivers.
  - Return a fence without flushing (mid-command-buffer fence). The driver
    can defer flushing until fence_finish is called.

  This is required to make Bioshock Infinite, which creates 1000 fences
  (flushes) per frame, faster.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
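The sketch below illustrates the second use with a toy driver. A "deferred" flush only hands back a fence, and the real submission happens when that fence is waited on. The flag, structs and functions are hypothetical stand-ins, not the Gallium pipe_context interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag and types; not the Gallium API. */
#define SKETCH_FLUSH_DEFERRED (1u << 0)

struct sketch_context {
    unsigned pending_cmds; /* commands recorded but not yet submitted */
    unsigned submissions;  /* number of real submissions to the kernel */
};

struct sketch_fence {
    struct sketch_context *ctx;
    bool submitted;        /* has the work this fence covers been submitted? */
};

static void submit(struct sketch_context *ctx)
{
    if (ctx->pending_cmds) {
        ctx->pending_cmds = 0;
        ctx->submissions++;
    }
}

/* With the deferred flag the call only returns a fence; submission is
 * postponed until somebody waits on it. */
static struct sketch_fence *sketch_flush(struct sketch_context *ctx, unsigned flags)
{
    struct sketch_fence *fence = malloc(sizeof(*fence));
    if (!fence)
        return NULL;
    fence->ctx = ctx;
    fence->submitted = !(flags & SKETCH_FLUSH_DEFERRED);
    if (fence->submitted)
        submit(ctx);
    return fence;
}

static void sketch_fence_finish(struct sketch_fence *fence)
{
    if (!fence->submitted) {
        submit(fence->ctx); /* the deferred flush happens here */
        fence->submitted = true;
    }
    /* a real driver would now block until the GPU signals the fence */
}
```

Suppose a frame creates 1000 fences, as in the Bioshock Infinite case above. This model then submits only once, when a fence is actually waited on.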
* gallium/os: use CLOCK_MONOTONIC for sleeps (v2) | Marek Olšák | 2016-07-22 | 2 | -6/+14
  v2: handle EINTR, remove backslashes

  Reviewed-by: Eric Engestrom <[email protected]>
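Below is one way to sleep against CLOCK_MONOTONIC while handling EINTR. It uses POSIX clock_nanosleep() with an absolute deadline. This is an illustration, not necessarily how the patch implements it, and the helper names are hypothetical. The absolute deadline means a restart after a signal does not stretch the total sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep for usecs microseconds measured on CLOCK_MONOTONIC, so wall-clock
 * adjustments cannot shorten or lengthen the sleep. */
static void sleep_monotonic_usec(int64_t usecs)
{
    struct timespec deadline;

    assert(usecs >= 0);
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += usecs / 1000000;
    deadline.tv_nsec += (usecs % 1000000) * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep() returns the error number directly rather than
     * setting errno; retry only when interrupted by a signal. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) == EINTR)
        ;
}

static int64_t monotonic_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```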
* nvc0/mme: fix offsets used for indirect draws | Samuel Pitoiset | 2016-07-22 | 2 | -8/+8
  This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c:
  the offset has moved from 0x180 to 0x1a0, so the macros have to be
  re-compiled.

  Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix offsets of MP perf counters input parameters | Samuel Pitoiset | 2016-07-22 | 1 | -15/+15
  This fixes a regression introduced in 1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c:
  the offset has moved from 0x600 to 0x620, so the kernels used for reading
  MP perf counters have to be re-assembled.

  This also fixes the amd_performance_monitor_measure piglit test.

  Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Return V3D version details in the GL renderer info. | Eric Anholt | 2016-07-20 | 2 | -1/+12
  This is as close as we get to a name for the 3D blocks.
* vc4: Check the V3D version reported by the kernel. | Eric Anholt | 2016-07-20 | 2 | -0/+62
  We don't want to bring up an old userspace driver on a kernel for newer
  hardware. We'll also want to look at the other ident fields in the
  future.
* vc4: Detect and report kernel support for branching. | Eric Anholt | 2016-07-20 | 1 | -2/+12
* vc4: Switch to using the libdrm-provided vc4_drm.h. | Eric Anholt | 2016-07-20 | 2 | -280/+2
  The required version is set to .69 for the getparam ioctl that will be
  used in the next commit.
* clover: Re-order includes in invocation.cpp to fix build | Tom Stellard | 2016-07-20 | 1 | -7/+17
  The build was failing because the official CL headers have a few
  defines, like:

  # define cl_khr_gl_sharing 1

  which have the same name as some class members of clang's OpenCLOptions
  class. If we include the CL headers first, this breaks the build,
  because the member names of this class are replaced by the literal 1.

  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Vedran Miletić <[email protected]>
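The clash can be reproduced with a small C stand-in. The struct here is a hypothetical substitute for clang's C++ OpenCLOptions class. Once a header has done `#define cl_khr_gl_sharing 1`, the preprocessor rewrites every later occurrence of that identifier. Any code that declares or uses the member must therefore be seen before the macro is defined.

```c
#include <assert.h>

/* Stand-in for a library type whose member shares its name with a CL
 * extension macro (hypothetical; the real class is clang's OpenCLOptions). */
struct opencl_options {
    int cl_khr_gl_sharing;
};

static void enable_gl_sharing(struct opencl_options *opts)
{
    opts->cl_khr_gl_sharing = 1; /* fine: the macro is not defined yet */
}

static int gl_sharing_enabled(const struct opencl_options *opts)
{
    return opts->cl_khr_gl_sharing;
}

/* What a CL header does. From here on, a member declaration such as
 * "int cl_khr_gl_sharing;" would expand to "int 1;" and fail to compile,
 * which is why the CL headers must come after the clang headers. */
#define cl_khr_gl_sharing 1

static int gl_sharing_advertised(void)
{
    return cl_khr_gl_sharing; /* expands to the literal 1 */
}
```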
* clover: Add missing include v2 | Tom Stellard | 2016-07-20 | 1 | -0/+1
  clang commit r275822 removed unnecessary includes from header files, so
  we now need to explicitly include clang/Lex/PreprocessorOptions.h.

  v2:
  - Use <> instead of "" for the include path.

  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Vedran Miletić <[email protected]>
* swr: [rasterizer core] introduce simd16intrin.h | Tim Rowley | 2016-07-20 | 4 | -6/+751
  Refactoring to leave the existing simd_* intrinsics in "simdintrin.h"
  unchanged, adding corresponding simd16_* intrinsics in "simd16intrin.h"
  on the side, with emulation, that we can use piecemeal, rather than the
  all-or-nothing approach to bring up avx512.

  Signed-off-by: Tim Rowley <[email protected]>
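The emulation pattern can be sketched in plain C. The sketch models the 8-lane and 16-lane vector types as float arrays rather than the real AVX/AVX-512 register types. `simd8`, `simd16` and their add functions are hypothetical names. Each 16-wide operation is built from the existing 8-wide one, so callers can adopt the 16-wide API piecemeal.

```c
#include <assert.h>

/* Plain-C stand-ins for the vector types (illustrative, not swr's). */
typedef struct { float lane[8]; } simd8;
typedef struct { simd8 lo, hi; } simd16;

static simd8 simd8_add(simd8 a, simd8 b)
{
    simd8 r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Emulated 16-wide op: apply the 8-wide op to each half. A native AVX-512
 * build could replace this body with one 16-lane instruction without
 * changing any caller. */
static simd16 simd16_add(simd16 a, simd16 b)
{
    simd16 r;
    r.lo = simd8_add(a.lo, b.lo);
    r.hi = simd8_add(a.hi, b.hi);
    return r;
}
```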
* swr: [rasterizer core] fix for possible int32 overflow condition | Tim Rowley | 2016-07-20 | 1 | -1/+1
  Signed-off-by: Tim Rowley <[email protected]>
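The commit does not say which expression overflowed. As a generic illustration of this class of fix, not the actual patch, a product of two int32_t values is only exact if one operand is widened before the multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* a * b in 32-bit arithmetic is undefined on overflow; converting one
 * operand to int64_t first makes the whole multiplication 64-bit. */
static int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```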
* swr: [rasterizer core] rename *_MAX enum values to *_COUNT | Tim Rowley | 2016-07-20 | 5 | -22/+21
  Makes these names semantically correct.

  Signed-off-by: Tim Rowley <[email protected]>
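A hypothetical enum shows why *_COUNT is the accurate name. The final enumerator equals the number of values before it, so it is one past the largest valid value. That is what array sizes and range checks need. The names are illustrative, not the swr enums.

```c
#include <assert.h>

enum sketch_topology {
    SKETCH_TOPOLOGY_POINTS,
    SKETCH_TOPOLOGY_LINES,
    SKETCH_TOPOLOGY_TRIANGLES,
    SKETCH_TOPOLOGY_COUNT /* == 3: number of values, not a valid value */
};

/* The count sizes lookup tables directly. */
static const char *const topology_names[SKETCH_TOPOLOGY_COUNT] = {
    "points", "lines", "triangles",
};

/* Valid values are strictly below the count. */
static int topology_is_valid(int t)
{
    return t >= 0 && t < SKETCH_TOPOLOGY_COUNT;
}
```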
* swr: [rasterizer core] centroid correction | Tim Rowley | 2016-07-20 | 1 | -9/+17
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] support range of values in TemplateArgUnroller | Tim Rowley | 2016-07-20 | 3 | -26/+56
  Fixes Linux warnings.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] ensure adjacent topologies use the cut-aware PA | Tim Rowley | 2016-07-20 | 1 | -5/+2
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer] attribute swizzling and linkage | Tim Rowley | 2016-07-20 | 11 | -171/+218
  Add support for enhanced attribute swizzling. Currently supports constant
  source overrides to handle PrimitiveID support. No support yet for input
  select swizzling or wrap shortest.

  Removes obsoleted linkageMask and associated code.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer common] icc declspec definitions | Tim Rowley | 2016-07-20 | 1 | -1/+17
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] rework vertex/instance ID storage in fetch | Tim Rowley | 2016-07-20 | 2 | -64/+36
  Moved the setting into the existing component control code. Fixes a bad
  interaction between attribute/component setting for vertex/instance ID
  and component packing.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] avx512 simd utility work | Tim Rowley | 2016-07-20 | 4 | -10/+1026
  Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low-level simd
  utils.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] viewport rounding for disabled scissor | Tim Rowley | 2016-07-20 | 1 | -2/+4
  Adjust viewport rounding when the scissor rect is disabled during macro
  tile scissor setup.

  Signed-off-by: Tim Rowley <[email protected]>
* gallium/dri: Add shared glapi to LIBADD on Android | Tomasz Figa | 2016-07-20 | 1 | -0/+7
  An earlier patch fixed the problem for classic drivers; however, Gallium
  was still left broken. This patch applies the same workaround to Gallium
  when it is compiled for Android.

  Following is a quote from the original patch:

  0cbc90c57cfc mesa: dri: Add shared glapi to LIBADD on Android

  /system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
  this, loading the so file fails with:
  cannot locate symbol "__emutls_v._glapi_tls_Context"

  On non-Android (non-bionic) platform, EGL uses the following workflow,
  which works fine:
  dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
  dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);

  However, bionic does not respect the RTLD_GLOBAL flag, and the dri
  library cannot find symbols in libglapi.so, so we need to link to
  libglapi.so explicitly. Android.mk already does this.

  Cc: "12.0" <[email protected]>
  Signed-off-by: Tomasz Figa <[email protected]>
  Signed-off-by: Nicolas Boichat <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* radeonsi: advertise 8 bits subpixel precision for viewport bounds | Józef Kucia | 2016-07-20 | 1 | -1/+2
  Signed-off-by: Józef Kucia <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* r600: advertise 8 bits subpixel precision for viewport bounds | Józef Kucia | 2016-07-20 | 1 | -1/+2
  Signed-off-by: Józef Kucia <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2) | Józef Kucia | 2016-07-20 | 17 | -0/+18
  This allows Gallium drivers to advertise the subpixel precision for
  floating-point viewport bounds.

  v2:
  - Set ViewportSubpixelBits in st_init_limits.

  Signed-off-by: Józef Kucia <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
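To make "subpixel precision" concrete: with 8 bits, viewport bounds land on a grid of 1/256 pixel. The sketch below snaps a float bound to such a grid. The helper name and the round-half-away-from-zero choice are assumptions, and real hardware may round differently.

```c
#include <assert.h>

/* Snap x to a grid with `bits` fractional bits (8 bits -> 1/256 pixel).
 * Rounds half away from zero without needing libm. */
static float snap_to_subpixel(float x, unsigned bits)
{
    assert(bits < 32);
    double scale = (double)(1u << bits);
    double scaled = (double)x * scale;
    long long q = (long long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    return (float)((double)q / scale);
}
```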
* nvc0: disable MS images on GM107+ | Samuel Pitoiset | 2016-07-20 | 1 | -0/+7
  MS images have to be handled explicitly and I don't plan to implement
  them for now.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: print OP_SUREDB subops in debug mode | Samuel Pitoiset | 2016-07-20 | 1 | -0/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add emission for SUREDx | Samuel Pitoiset | 2016-07-20 | 1 | -0/+50
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: add emission for SUSTx and SULDx | Samuel Pitoiset | 2016-07-20 | 1 | -0/+104
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ra: fix constraints for surface operations | Samuel Pitoiset | 2016-07-20 | 1 | -2/+23
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: lower surface operations | Samuel Pitoiset | 2016-07-20 | 2 | -1/+77
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: bind images for 3d/cp shaders on GM107+ | Samuel Pitoiset | 2016-07-20 | 5 | -18/+207
  On Maxwell, image binding is slightly different from (and much better
  than) Fermi and Kepler, because a texture view needs to be uploaded for
  each image, and this is going to simplify things a lot.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: increase the tex handles area size in the driver cb | Samuel Pitoiset | 2016-07-20 | 1 | -11/+11
  Currently, we can store 32 tex handles of one 32-bit integer each, which
  fits perfectly with the underlying hardware except on GM107+, which
  requires uploading a texture view for each image.

  This patch increases the number of storable texture handles in the
  driver constant buffer from 32 to 40 because we expose 8 images.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
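The offsets quoted in the two nvc0 regression fixes above follow from this change, assuming both regions are laid out after the handle area. Eight extra handles of 4 bytes each add 0x20 bytes. The macro and function names below are hypothetical.

```c
#include <assert.h>

#define TEX_HANDLE_SIZE 4u  /* bytes per handle: one 32-bit integer */
#define OLD_TEX_HANDLES 32u
#define NEW_TEX_HANDLES 40u /* 32 + 8, for the 8 exposed images */

/* Anything placed after the handle area moves by the growth of the area:
 * (40 - 32) * 4 = 32 = 0x20 bytes. */
static unsigned shifted_offset(unsigned old_offset)
{
    return old_offset + (NEW_TEX_HANDLES - OLD_TEX_HANDLES) * TEX_HANDLE_SIZE;
}
```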
* winsys/amdgpu: use pb_cache buckets for fewer pb_cache misses | Marek Olšák | 2016-07-19 | 1 | -6/+21
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/radeon: use pb_cache buckets for fewer pb_cache misses | Marek Olšák | 2016-07-19 | 1 | -7/+22
  This makes Bioshock Infinite with deferred flushing 2.2% faster.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: reduce the number of pointer dereferences | Marek Olšák | 2016-07-19 | 1 | -7/+9
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: divide the cache into buckets for reducing cache misses | Marek Olšák | 2016-07-19 | 5 | -26/+47
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: check parameters that are more likely to fail first | Marek Olšák | 2016-07-19 | 1 | -8/+7
  This makes Bioshock Infinite with deferred flushing 2% faster.

  Reviewed-by: Nicolai Hähnle <[email protected]>
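The sketch below combines the two pb_cache ideas above, bucketing and check ordering, in a toy cache of fixed-size arrays. The struct layout, the size-fit rule and the bucket index are hypothetical and are not the pb_cache implementation. A lookup scans only one bucket. The compatibility test evaluates its cheap, selective conditions before the busy check, which in a real driver would be a fence or kernel query rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUCKETS 4
#define BUCKET_CAP  16

struct cached_buf {
    unsigned size;
    unsigned alignment;
    unsigned usage;
    bool busy; /* stands in for an expensive "is the GPU done?" query */
};

struct buf_cache {
    struct cached_buf *bucket[NUM_BUCKETS][BUCKET_CAP];
    unsigned count[NUM_BUCKETS];
};

static bool is_compatible(const struct cached_buf *b, unsigned size,
                          unsigned alignment, unsigned usage)
{
    /* Checks most likely to fail come first, so the costly busy test only
     * runs for buffers that could actually be reused. */
    return b->size >= size && b->size <= 2 * size &&
           b->usage == usage &&
           b->alignment % alignment == 0 &&
           !b->busy;
}

/* Only the bucket matching the request is scanned. */
static struct cached_buf *cache_lookup(struct buf_cache *c, unsigned bucket,
                                       unsigned size, unsigned alignment,
                                       unsigned usage)
{
    assert(bucket < NUM_BUCKETS && alignment != 0);
    for (unsigned i = 0; i < c->count[bucket]; i++) {
        struct cached_buf *b = c->bucket[bucket][i];
        if (is_compatible(b, size, alignment, usage)) {
            /* remove the hit by moving the last entry into slot i */
            c->bucket[bucket][i] = c->bucket[bucket][--c->count[bucket]];
            return b;
        }
    }
    return NULL;
}
```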
* radeonsi: emit PS exports last | Marek Olšák | 2016-07-19 | 1 | -13/+31
  This effectively removes s_waitcnt instructions after FP16 exports.

  Before:
    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
    s_waitcnt expcnt(0) ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04
    v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06
    exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100
    s_waitcnt expcnt(0) ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308
    v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
    exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100
    s_waitcnt expcnt(0) ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
    v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
    exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100
    s_endpgm ; BF810000

  After:
    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
    v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04
    v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
    v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308
    v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
    exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302
    v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
    v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
    exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504
    exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706
    s_endpgm ; BF810000

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITS | Marek Olšák | 2016-07-19 | 1 | -2/+6
  Ported from Vulkan.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: really wait for the second EOP event and not the first one | Marek Olšák | 2016-07-19 | 1 | -1/+5
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag | Marek Olšák | 2016-07-19 | 5 | -16/+4
  It was always set.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* gm107/ir: make use of ADD32I for all immediates | Samuel Pitoiset | 2016-07-19 | 1 | -1/+1
  ADD only allows emitting 19-bit immediates.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: <[email protected]>
* gm107/ir: add missing NEG modifier for IADD32I | Samuel Pitoiset | 2016-07-19 | 1 | -0/+1
  Like FADD32I, the NEG modifier of src0 is at position 56.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
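The hypothetical helper below shows what setting a modifier "at position 56" means for a 64-bit instruction word. The function name is illustrative and is not the gm107 emitter API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set a one-bit flag at bit `pos` of a 64-bit instruction word. */
static uint64_t emit_flag(uint64_t insn, unsigned pos, bool enable)
{
    assert(pos < 64);
    if (enable)
        insn |= (uint64_t)1 << pos; /* shift in 64 bits, not int */
    return insn;
}
```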
* ddebug: Fix trivial typo in stderr message | Andreas Boll | 2016-07-19 | 1 | -1/+1
  Signed-off-by: Andreas Boll <[email protected]>
* vl: fix memory leak | Eric Engestrom | 2016-07-19 | 1 | -7/+1
  CovID: 1363008

  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Nayan Deshmukh <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* vl: add entry point | Boyuan Zhang | 2016-07-19 | 1 | -0/+1
  Add an entrypoint to distinguish H.264 decode and encode. For example, in
  patch 5/11, when calling "VaCreateContext", "pps" and "sps" shouldn't be
  allocated for H.264 encoding. So we need to use the entry_point to
  determine whether this is H.264 decode or H.264 encode.

  We can use the config to determine the entrypoint, since config_id is
  passed to us for the VaCreateContext call. However, for the
  VaDestroyContext call, only context_id is passed to us. So we need to
  know the entrypoint in order not to free the pps/sps in the encoding
  case.

  Signed-off-by: Boyuan Zhang <[email protected]>
  Reviewed-by: Christian König <[email protected]>
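The sketch below shows the bookkeeping described above, using hypothetical context and entrypoint types rather than the VA-API or state-tracker definitions. The entrypoint is stored when the context is created, because destroy only receives the context. Destroy then uses it to decide whether to free the decode-only pps/sps storage. The 64-byte allocation sizes are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum sketch_entrypoint { SKETCH_ENTRYPOINT_DECODE, SKETCH_ENTRYPOINT_ENCODE };

struct sketch_h264_context {
    enum sketch_entrypoint entrypoint; /* recorded at creation time */
    void *pps; /* picture parameter set storage, decode only */
    void *sps; /* sequence parameter set storage, decode only */
};

static struct sketch_h264_context *create_context(enum sketch_entrypoint ep)
{
    struct sketch_h264_context *ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return NULL;
    ctx->entrypoint = ep;
    if (ep == SKETCH_ENTRYPOINT_DECODE) {
        ctx->pps = calloc(1, 64); /* placeholder size */
        ctx->sps = calloc(1, 64); /* placeholder size */
    }
    return ctx;
}

/* Only the context is available here, so the stored entrypoint tells us
 * which fields this context owns. */
static void destroy_context(struct sketch_h264_context *ctx)
{
    if (ctx->entrypoint == SKETCH_ENTRYPOINT_DECODE) {
        free(ctx->pps);
        free(ctx->sps);
    }
    free(ctx);
}
```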
* nv50,nvc0: srgb rendering is only available for rgba/bgra | Ilia Mirkin | 2016-07-18 | 1 | -2/+2
  Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
  didn't have the bind flags). This makes the state tracker pick a
  different format when rendering is required, or mark the fb as
  incomplete.

  This fixes:

  bin/getteximage-formats init-by-clear-and-render -auto -fbo
  bin/getteximage-formats init-by-rendering -auto -fbo

  which previously ran into srgb-encoding differences.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Cc: [email protected]