summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* nvc0/ir: limit max number of regs based on availability in SMIlia Mirkin2016-05-302-2/+4
| | | | | | | | | This effectively limits registers to 32 and 64 for fermi and kepler when 1024 threads are used, but allows the full amount to be used with smaller thread sizes. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: record number of threads in a compute shaderIlia Mirkin2016-05-305-2/+10
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nv50/ir: Add missing handling of U64/S64 in inlinesPierre Moreau2016-05-301-1/+3
| | | | | Signed-off-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Fix doxygen warnings12.0-branchpointRhys Kidd2016-05-302-6/+6
| | | | | | | | Now that vc4 automated code documentation can be generated with doxygen, fix the warnings issued by Doxygen 1.8.11. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nvc0/ir: fix emission of predicate spill to registerIlia Mirkin2016-05-301-1/+2
| | | | | | The lane mask only applies to real mov's, while here we're using PSET. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix some compute texture validation bits on keplerIlia Mirkin2016-05-303-2/+7
| | | | | | | | | (a) Make sure to update the TIC in case of an updated buffer address (b) Mark newly-inactive textures dirty so that we update the handle in set_tex_handles. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* swr: automake: silence the python invocationEmil Velikov2016-05-301-7/+8
| | | | | Cc: Tim Rowley <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* swr: automake: attempt to fix the out-of-tree buildEmil Velikov2016-05-301-0/+7
| | | | | | | | | | Make sure that the output folder is created otherwise the python scripts yells at us. Cc: [email protected] Cc: Tim Rowley <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96238 Signed-off-by: Emil Velikov <[email protected]>
* swr: remove LLVM dependency from source generation rules.Emil Velikov2016-05-301-2/+2
| | | | | | | | | | The dependencies should not mention any files external to the project. If we want to do sanity checks for the LLVM installed on the system we should do that in configure, yet again where is the merit which header gets checked and which doesn't ? Cc: Tim Rowley <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* swr: add all the generators to the release tarball.Emil Velikov2016-05-301-0/+24
| | | | | | Namely the python scripts and the knobs.template. Signed-off-by: Emil Velikov <[email protected]>
* softpipe: add sp_buffer.h to the sources list (release tarball)Emil Velikov2016-05-301-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* freedreno: make sure we pick up ir3_nir_trig.py in the release tarballEmil Velikov2016-05-301-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* gallium: push offset down to driverStanimir Varbanov2016-05-302-0/+13
| | | | | | | | | | | | | Push offset down to drivers when importing dmabuf. This is needed to more fully support EGL_EXT_image_dma_buf_import when a non-zero offset is specified. Tesing has been done for freedreno, and compile tested following gallium drivers: nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo Signed-off-by: Stanimir Varbanov <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.Bas Nieuwenhuizen2016-05-301-2/+6
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239 Reviewed-by: Marek Olšák <[email protected]>
* nv50,nvc0: fix the max_vertices=0 caseIlia Mirkin2016-05-293-2/+4
| | | | | | | This is apparently legal. Drop any emit/restarts, and pass a 1 to the hardware. Signed-off-by: Ilia Mirkin <[email protected]>
* swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.Vinson Lee2016-05-281-1/+1
| | | | | | | | | | | | | | | | Fix build error with icc. CXX libswrAVX_la-swr_clear.lo icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor' In file included from ./rasterizer/jitter/jit_api.h(31), from swr_context.h(30), from swr_clear.cpp(24): ./rasterizer/common/os.h(135): error: expected an identifier void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a) ^ Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* gk110/ir: fix unspilling of predicates from registersIlia Mirkin2016-05-281-0/+28
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96258 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.2 11.1" <[email protected]>
* nvc0: remove outdated surfaces validation code for GK104Samuel Pitoiset2016-05-281-70/+0
| | | | | | | | | This code was used for validating surfaces with compute but now we use pipe_image_view instead. Anyway, surfaces support should be re-introduced properly once OpenCL happens. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: do not always invalidate 3D CBs when using computeSamuel Pitoiset2016-05-281-8/+17
| | | | | | | | | Constant buffers are aliased between 3D and CP on Fermi, but we should only invalidate them when a compute shader actually uses CBs and not all the time after a lauching grid. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: enable OpenGL 4.3Bas Nieuwenhuizen2016-05-271-0/+4
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nouveau: enable GL 4.3 on kepler/fermiDave Airlie2016-05-281-1/+1
| | | | Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: always reserve output space for tess factorsMarek Olšák2016-05-271-1/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Dave Airlie <[email protected]>
* gallium/ddebug: Add passthrough for query_memory_info.Bas Nieuwenhuizen2016-05-271-0/+9
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* svga: remove unneeded casts in get_query_result_vgpu9() callsBrian Paul2016-05-271-2/+2
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: use MAYBE_UNUSED to silence release-build warningsBrian Paul2016-05-271-7/+4
| | | | Signed-off-by: Brian Paul <[email protected]>
* nvc0/ir: handle a load's reg result not being used for locked variantsIlia Mirkin2016-05-263-11/+45
| | | | | | | | | | | | | | For a load locked, we might not use the first result but the second result is the predicate result of the locking. In that case the load splitting logic doesn't apply (which is designed for splitting 128-bit loads). Instead we take the predicate and move it into the first position (as having a dead result in first def's position upsets all sorts of things including RA). Update the emitters to deal with this as well. Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0/ir: avoid generating illegal instructions for compute constbuf loadsIlia Mirkin2016-05-261-1/+2
| | | | | | | | | | | | For user-supplied constbufs, fileIndex is 0. In that case, when we subtract 1, we'll end up loading from constbuf offset -16. This is illegal, and there are asserts to avoid it. Normally we'd just DCE it, but no point in generating the instructions if they're not going to be used. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Hans de Goede <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* util/indices,svga: s/unsigned/enum pipe_prim_type/Brian Paul2016-05-262-2/+4
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* svga: s/unsigned/enum pipe_resource_usage/ for buffer usage variablesBrian Paul2016-05-263-3/+3
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* svga: s/unsigned/enum pipe_prim_type/ for primitive type variablesBrian Paul2016-05-267-14/+15
| | | | | | Proper enum types were only added recently. Reviewed-by: Roland Scheidegger <[email protected]>
* svga: fix test for unfilled triangles fallbackBrian Paul2016-05-263-6/+43
| | | | | | | VGPU10 actually supports line-mode triangles. We failed to make use of that before. Reviewed-by: Charmaine Lee <[email protected]>
* svga: clean up and improve comments in svga_draw_private.hBrian Paul2016-05-261-4/+8
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix primitive mode (point/line/tri) test for unfilled primitivesBrian Paul2016-05-262-2/+2
| | | | | | | | | The original mode test was valid before we had GS support. Regression tested with full piglit run. Though, I don't think we have any piglit tests that exercise drawing unfilled adjacency primitives. Reviewed-by: Charmaine Lee <[email protected]>
* nvc0: invalidate textures/samplers between 3D and CP on FermiSamuel Pitoiset2016-05-262-0/+27
| | | | | | | | | | | Like constant buffers, samplers and textures are aliased on Fermi and we need to invalidate the state when switching from 3D to CP and vice versa. This fixes rendering issues in the UE4 demos. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* compiler: Move glsl_to_nir to libglsl.laJason Ekstrand2016-05-261-1/+1
| | | | | | | | Right now libglsl.la depends on libnir.la so putting it in libnir.la adds a dependency on libglsl.la that goes the wrong direction. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* radeonsi: Allow TES distribution between shader engines.Bas Nieuwenhuizen2016-05-264-15/+40
| | | | | | | | | | | | | The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro. Smaller values in the ACCUM fields seem to decrease the performance advantage from this patch, higher values don't seem to matter. v2: Add distribution mode field enums. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Process multiple patches per threadgroup.Bas Nieuwenhuizen2016-05-261-15/+35
| | | | | | | | | | | | | | | | | | | | | | | Using more than 1 wave per threadgroup does increase performance generally. Not using too many patches per threadgroup also increases performance. Both catalyst and amdgpu-pro seem to use 40 patches as their maximum, but I haven't really seen any performance increase from limiting the number of patches to 40 instead of 64. Note that the trick where we overlap the input and output LDS does not work anymore as the insertion of the tess factors changes the patch stride. v2: - Add comment about LDS assumptions. - Add constant for buffer size. - Fix code style. v3: - Correct limits for not splitting patches between waves. - Set max num_patches to 40 as in the proprietary driver. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add barrier before writing the tess factors.Bas Nieuwenhuizen2016-05-261-0/+6
| | | | | | | | The factors may be stored to LDs by another invocation than the invocation for vertex 0. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable dynamic HS.Bas Nieuwenhuizen2016-05-262-5/+16
| | | | | | | | | | This allows running the TES on different CU's than the TCS which results in performance improvements. v2: Only write the control word from one invocation. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Remove LDS layout user SGPR's from TES.Bas Nieuwenhuizen2016-05-263-13/+10
| | | | | | | | They are unused. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Use buffer loads and stores for passing data from TCS to TES.Bas Nieuwenhuizen2016-05-261-16/+50
| | | | | | | | | | | | | | | | We always try to use 4-component loads, as LLVM does not combine loads and they bypass the L1 cache. We can't use a similar strategy for stores and this is especially notable with the tess factors, as they are often set with separate MOV's per component in the TGSI. We keep storing to LDS and the LDS space, so we can load the outputs later, either due to the shader, of for wrting the tess factors. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Store inputs to memory when not using a TCS.Bas Nieuwenhuizen2016-05-263-0/+49
| | | | | | | | | | | | | | | | | We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwisze many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add offchip buffer address calculation.Bas Nieuwenhuizen2016-05-261-0/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of creating a memory area per patch and per vertex, we put the same attribute of every vertex & patch together. Most loads and stores access the same attribute across all lanes, only for different patches and vertices. For the TCS this results in tightly packed data for 4-component stores. For the TES this is not the case as within a patch the loads often also access the same vertex. However if there are < 4 vertices/patch, this still results in a reduction of the number of cache lines. In the LDS situation we only do better than worst case if the data per patch < 64 bytes, which due to the tessellation factors is pretty much never. We do not use hardware swizzling for this. It would slightly reduce the number of executed VALU instructions, but I had issues with increased wait times that I haven't been able to solve yet. Furthermore, the tbuffer_store intrinsic does not support both VGPR offset and an index, so we have a problem storing indirectly indexed outputs. This can be solved by temporarily storing arrays in LDS and then copying them, but I don't think that is worth the effort. The difference in VALU cycles hardware swizzling gives is about 0.2% of total busy cycles. That is without handling the array case. I chose for attributes instead of components as they are often accessed together, and the software swizzling takes VALU cycles for calculating offsets. v2: - Rename functions to get_tcs_tes_buffer_address. - multiply by 16 as late as possible. - Use tgsi_full_src_register_from_dst. - Remove some bad comments. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add user SGPR for the layout of the offchip buffer.Bas Nieuwenhuizen2016-05-263-4/+20
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Use correct parameter index for LS_OUT_LAYOUT.Bas Nieuwenhuizen2016-05-261-3/+4
| | | | | | | | | This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add buffer load functions.Bas Nieuwenhuizen2016-05-261-0/+114
| | | | | | | | | | v2: - Use llvm.admgcn.buffer.load instrinsics for new LLVM. - Code style fixes. v3: - Code style fix. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Define build_tbuffer_store_dwords earlier to support new users.Bas Nieuwenhuizen2016-05-261-69/+69
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add offchip tessellation parameters.Bas Nieuwenhuizen2016-05-263-6/+34
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add buffer for offchip storage between TCS and TES.Bas Nieuwenhuizen2016-05-264-0/+23
| | | | | | | | | | | The buffer is quite large, but should only be allocated if the application uses tessellation. Most non-games don't. v2: - Use the correct register for SI. - Add define for block size. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0: allow to monitor MP perf counters with compute shadersSamuel Pitoiset2016-05-262-19/+55
| | | | | | | | | | | | | | | | | | To read out MP perf counters we use a compute shader and need to upload input data like a 64-bits addr used to store the values and a sequence ID for synchronization. Currently, this input data is uploaded as user uniforms which means that it's sticked to c0[], but if a compute shader from a real application is used, monitoring those performance counters will just overwrite some data and miserably crash. Instead, sticking the 64-bits addr and the sequence into the driver constant buffer seems like much better and will allow to monitor counters with GL 4.3 apps. Tested on GF119 and GK110, but should not hurt anything on GK104. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>