path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
...
* radeonsi: do compilation from si_create_shader_selector asynchronously | Marek Olšák | 2016-07-05 | 4 | -7/+58
  Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo (through si_shader_select) waits for completion.

  This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take only as much time as the bigger of the two.

  If an app creates more shaders, at most 4 threads will be used to compile them.

  Debug output disables this so that shader stats are printed in the correct order.

  (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties, such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.)

  Reviewed-by: Nicolai Hähnle <[email protected]>
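The split described above (creation only starts compilation; the first draw that needs the shader waits) can be sketched with plain POSIX threads. This is an illustration under simplifying assumptions, not radeonsi code: it spawns one thread per shader instead of using util_queue with its bounded pool of worker threads, and `struct shader_sel`, `compile_main_part()`, `create_shader_selector()` and `shader_select()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shader selector (not radeonsi's struct). */
struct shader_sel {
   int ir_size;        /* pretend compiler input */
   int binary_size;    /* written by the worker thread */
   pthread_t worker;   /* thread doing the compilation */
   bool joined;        /* whether the draw path already waited */
};

/* Runs on the worker thread: the expensive "compile" step. */
static void *compile_main_part(void *arg)
{
   struct shader_sel *sel = arg;
   sel->binary_size = sel->ir_size * 4; /* placeholder for real codegen */
   return NULL;
}

/* Like shader selector creation: start compiling and return at once. */
static int create_shader_selector(struct shader_sel *sel, int ir_size)
{
   assert(ir_size >= 0);
   sel->ir_size = ir_size;
   sel->binary_size = 0;
   sel->joined = false;
   return pthread_create(&sel->worker, NULL, compile_main_part, sel);
}

/* Like shader selection at draw time: block until compilation is done.
 * pthread_join also makes the worker's writes visible to this thread. */
static int shader_select(struct shader_sel *sel)
{
   if (!sel->joined) {
      pthread_join(sel->worker, NULL);
      sel->joined = true;
   }
   return sel->binary_size;
}
```

Only the first `shader_select()` call can block; later draws find the result already available.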
* radeonsi: don't lock shader cache mutex during compilation | Marek Olšák | 2016-07-05 | 1 | -6/+16
  This allows multiple shaders to be compiled simultaneously. Also, shader-db can again use all 4 cores.

  v2: Remove the pipe_mutex_unlock call in the error path.

  Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
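The general pattern is to hold the cache mutex only for lookup and insertion, drop it while compiling, and re-check before inserting in case another thread compiled the same variant in the meantime. Below is a minimal one-slot sketch with pthreads. `struct variant_cache`, `compile_variant()` and `get_variant()` are hypothetical, and the real cache holds many variants. As the v2 note shows, every path that leaves the function has to agree on whether the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-slot variant cache guarded by a mutex. */
struct variant_cache {
   pthread_mutex_t lock;
   bool valid;
   int key;
   int binary;
   int compiles;   /* number of finished compilations, for illustration */
};

/* Expensive work that must not run with the lock held. */
static int compile_variant(int key)
{
   return key * 2 + 1;
}

static int get_variant(struct variant_cache *c, int key)
{
   assert(c != NULL);

   pthread_mutex_lock(&c->lock);
   if (c->valid && c->key == key) {          /* cache hit */
      int hit = c->binary;
      pthread_mutex_unlock(&c->lock);
      return hit;
   }
   pthread_mutex_unlock(&c->lock);           /* release before compiling */

   int binary = compile_variant(key);        /* other threads may run here */

   pthread_mutex_lock(&c->lock);
   c->compiles++;
   if (!(c->valid && c->key == key)) {       /* re-check: another thread may have won */
      c->valid = true;
      c->key = key;
      c->binary = binary;
   }
   binary = c->binary;
   pthread_mutex_unlock(&c->lock);
   return binary;
}
```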
* radeonsi: separate the compilation chunk of si_create_shader_selector | Marek Olšák | 2016-07-05 | 3 | -80/+110
  The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move LLVMTargetMachineRef creation to a separate function | Marek Olšák | 2016-07-05 | 1 | -14/+18
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add and use radeon_info::max_alloc_size (v2) | Marek Olšák | 2016-07-05 | 4 | -10/+10
  v2:
  - squashed the patches
  - use INT_MAX
  - clamp max_const_buffer_size
  - check the DRM version in radeon

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Vedran Miletić <[email protected]>
* radeonsi: print LLVM IRs to ddebug logs | Marek Olšák | 2016-07-05 | 6 | -1/+26
  Getting LLVM IRs of hanging shaders has never been easier.

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable string markers and record apitrace call numbers | Marek Olšák | 2016-07-05 | 3 | -1/+24
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: add an option to dump info about a specific apitrace call | Marek Olšák | 2016-07-05 | 3 | -3/+29
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement pipe_context::generate_mipmap | Marek Olšák | 2016-07-05 | 1 | -1/+52
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: record and dump apitrace call numbers | Marek Olšák | 2016-07-05 | 4 | -1/+31
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement emit_string_marker | Marek Olšák | 2016-07-05 | 1 | -3/+10
  Also remove some obsolete comments.

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove unused code - radeon_llvm_util.* | Marek Olšák | 2016-07-05 | 5 | -169/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: keep using v_rcp_f32 for division in future LLVM (v2) | Marek Olšák | 2016-07-05 | 2 | -2/+30
  This will be needed after some LLVM changes that haven't landed yet.

  v2:
  - use LLVMIsConstant to fix an LLVM assertion failure; LLVMSetMetadata doesn't work with constants
  - don't set float metadata as a string

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove an obsolete comment | Marek Olšák | 2016-07-05 | 1 | -5/+0
  It's not true.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't interpolate colors if flatshading is enabled | Marek Olšák | 2016-07-05 | 3 | -2/+14
  Use v_interp_mov for those.

  Reviewed-by: Nicolai Hähnle <[email protected]>
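Below is a sketch of the per-input decision. The enum is hypothetical and only loosely modelled on Gallium TGSI's interpolation modes. Its "color" mode defers to the rasterizer's flatshade state, and `INTERP_CONSTANT` stands for copying one vertex's attribute value (what v_interp_mov does) instead of computing an (i,j)-weighted value. `resolve_interp()` is an illustrative name, not radeonsi's.

```c
#include <stdbool.h>

/* Hypothetical interpolation modes; INTERP_COLOR means "smooth unless the
 * rasterizer has flatshading enabled". */
enum interp { INTERP_PERSPECTIVE, INTERP_LINEAR, INTERP_CONSTANT, INTERP_COLOR };

/* Resolve the mode actually used for a fragment shader input. */
static enum interp resolve_interp(enum interp declared, bool flatshade)
{
   if (declared == INTERP_COLOR)
      return flatshade ? INTERP_CONSTANT    /* no (i,j) needed */
                       : INTERP_PERSPECTIVE;
   return declared;                         /* explicit modes are kept */
}
```

A side benefit visible in the sketch: a flat-shaded color no longer needs any barycentric (i,j) inputs.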
* radeonsi: enable the barycentric optimization in all cases | Marek Olšák | 2016-07-05 | 3 | -18/+125
  Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA.

  Based on discussion with SPI guys.

  Reviewed-by: Nicolai Hähnle <[email protected]>
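The sketch below shows what handling the bit amounts to in the shader. It rests on an assumption the commit does not spell out: when the hardware sets bc_optimize for a primitive, the centroid (i,j) were not computed separately, so the shader should reuse the center (i,j). `struct ij` and `select_centroid()` are hypothetical.

```c
#include <stdbool.h>

/* Barycentric interpolation weights for one interpolation type. */
struct ij { float i, j; };

/* Pick the (i,j) to use for centroid-interpolated inputs. */
static struct ij select_centroid(struct ij center, struct ij centroid,
                                 bool bc_optimize)
{
   /* Assumed meaning: bit set => centroid coincides with center. */
   return bc_optimize ? center : centroid;
}
```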
* radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled | Marek Olšák | 2016-07-05 | 3 | -3/+88
  This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j), and likewise for shaders using at least 2 pairs of linear (i,j).

  Reviewed-by: Nicolai Hähnle <[email protected]>
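Without MSAA every pixel has a single sample at its center. For a given interpolation type, the sample, center and centroid (i,j) are then identical, so one set is enough. The sketch below computes only the center pair and copies it into the other slots. The `struct ps_inputs` layout and `fixup_single_sample()` are hypothetical and do not reflect the real radeonsi prolog.

```c
#include <stdbool.h>

struct ij { float i, j; };

/* Hypothetical per-pixel barycentric inputs for one interpolation type. */
struct ps_inputs {
   struct ij sample, center, centroid;
};

/* With MSAA off, derive the sample and centroid slots from center. */
static void fixup_single_sample(struct ps_inputs *in, bool msaa_enabled)
{
   if (msaa_enabled)
      return;              /* all three sets were computed separately */
   in->sample = in->center;
   in->centroid = in->center;
}
```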
* radeonsi: split ps.prolog.force_persample_interp into persp and linear bits | Marek Olšák | 2016-07-05 | 3 | -45/+64
  This reduces the number of v_mov's in the prolog.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't dump the shader key for non-monolithic shaders early | Marek Olšák | 2016-07-05 | 1 | -1/+2
  It's always zero.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: Add double precision FMA ops | Jan Vesely | 2016-07-05 | 1 | -0/+2
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
  Fixes: 54c4d525da7c7fc1e103d7a3e6db015abb132d5d ("r600g: Enable FMA on chips that support it")

  Signed-off-by: Jan Vesely <[email protected]>
  Tested-by: James Harvey <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* r600: fix duplicate 'const' declaration | Francesco Ansanelli | 2016-07-04 | 1 | -1/+1
  Signed-off-by: Nicolai Hähnle <[email protected]>
* radeon/uvd: fix overflow error while calculating bit stream buffer size | Indrajit Das | 2016-07-04 | 1 | -1/+1
  Reviewed-by: Christian König <[email protected]>
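The commit message doesn't show the formula, so this is only an illustration of how this class of bug typically arises. It assumes a size of `width * height * 512 / (16 * 16)` computed in 32-bit unsigned arithmetic. There, the intermediate product wraps for large frames, while folding the constants first (512 / 256 == 2) keeps it in range. `bs_size_overflowing()` and `bs_size_fixed()` are hypothetical.

```c
#include <stdint.h>

/* Multiplies first: width * height * 512 can exceed UINT32_MAX and wrap. */
static uint32_t bs_size_overflowing(uint32_t width, uint32_t height)
{
   return width * height * 512 / (16 * 16);
}

/* Folds the constant first: 512 / (16 * 16) == 2, no large intermediate. */
static uint32_t bs_size_fixed(uint32_t width, uint32_t height)
{
   return width * height * (512 / (16 * 16));
}
```

For 4096x4096 the first version computes 2^33 mod 2^32 == 0 before dividing, while the second yields 33554432. For 1920x1088 both agree.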
* freedreno: fix crash on smaller gpus and higher resolutions | Rob Clark | 2016-07-03 | 1 | -1/+1
  Devices with smaller GMEM size need more tiles. On db410c at 2048x1152, glmark2 shadow needed ~330 tiles for fullscreen. Let's bump the limit up to 512. (Maybe with MRT you could end up needing more, but at that point things are probably going to be painfully slow.)

  Signed-off-by: Rob Clark <[email protected]>
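The tile count grows with resolution because each bin has to fit in GMEM, so smaller GMEM means smaller bins and more of them. Below is a rough sketch of the arithmetic and of checking it against the 512 limit from this commit. The bin dimensions are supplied by the caller and are hypothetical, as are `MAX_TILES`, `tile_count()` and `fits_tile_table()`.

```c
#include <stdbool.h>

#define MAX_TILES 512  /* the value this commit bumps the limit to */

static unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

/* Number of bin_w x bin_h bins needed to cover a width x height surface. */
static unsigned tile_count(unsigned width, unsigned height,
                           unsigned bin_w, unsigned bin_h)
{
   return div_round_up(width, bin_w) * div_round_up(height, bin_h);
}

static bool fits_tile_table(unsigned width, unsigned height,
                            unsigned bin_w, unsigned bin_h)
{
   return tile_count(width, height, bin_w, bin_h) <= MAX_TILES;
}
```

For example, 64x64 bins at 2048x1152 would need 32 * 18 = 576 tiles, which exceeds the limit, while 128x64 bins need 288.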
* freedreno/ir3: support glsl linking for cmdline compiler | Rob Clark | 2016-07-02 | 1 | -24/+47
  For .vert/.frag, multiple files can now be specified on the cmdline for purposes of linking. The last one specified is the one that is fed into the ir3 backend (and dumped along the way if --verbose is specified).

  Without this, varyings in frag shaders would appear as undefined.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: update valid_buffer_range for SO buffers | Rob Clark | 2016-07-02 | 1 | -0/+5
  Signed-off-by: Rob Clark <[email protected]>
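Drivers track which byte range of a buffer holds valid data. This lets them, for example, skip synchronization when mapping a range that has never been written. Stream output writes buffer data on the GPU, so the range must grow to cover the SO target as well. The sketch below is a self-contained range tracker: `struct valid_range`, `range_add()` and `so_target_bound()` are hypothetical, though Gallium has a similar `util_range` helper. It assumes the SO target is described by a byte offset and size.

```c
/* Half-open byte range [start, end) known to contain valid data. */
struct valid_range {
   unsigned start, end;
};

static void range_init(struct valid_range *r)
{
   r->start = ~0u;   /* empty range: start > end */
   r->end = 0;
}

/* Grow the range so it also covers [start, end). */
static void range_add(struct valid_range *r, unsigned start, unsigned end)
{
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* Hypothetical hook: a stream-output target covering [offset, offset+size). */
static void so_target_bound(struct valid_range *r,
                            unsigned offset, unsigned size)
{
   range_add(r, offset, offset + size);
}
```

Tracking a single merged interval is conservative: gaps between written regions are also treated as valid. The cost is only occasional extra synchronization.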
* freedreno/ir3: support non-user_buffer consts | Rob Clark | 2016-07-02 | 2 | -3/+5
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: move setup/restore cmds into binning pass | Rob Clark | 2016-07-02 | 4 | -9/+4
  Rather than doing a separate submit at context create, move these cmds to before the first tile, as is done on a3xx/a4xx. Otherwise state can be overwritten by other contexts.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: pass index buffer as a pipe_resource | Rob Clark | 2016-07-02 | 2 | -16/+16
  This will be useful in a following patch.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno: switch emit_const_bo() to take prsc's | Rob Clark | 2016-07-02 | 4 | -17/+18
  We can push the unwrap of pipe_resource down.

  Signed-off-by: Rob Clark <[email protected]>
* nv30: Fix "array subscript is below array bounds" compiler warning | Hans de Goede | 2016-07-02 | 1 | -2/+1
  gcc6 does not like the trick where we point to one entry before the array start and then start a while loop with a pre-increment.

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
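Here is a sketch of the kind of loop involved, assuming a sentinel-terminated lookup table. `struct fmt_entry`, its contents and `find_fmt()` are hypothetical, not the nv30 code. The flagged pattern appears in a comment only: forming a pointer before the start of an array is undefined behaviour in C, independent of the warning. The function below it does the same lookup without ever leaving the array, and returns NULL when the format is not found.

```c
#include <stddef.h>

/* Hypothetical sentinel-terminated lookup table. */
struct fmt_entry {
   int pipe_format;   /* 0 marks the end of the table */
   int hw_format;
};

static const struct fmt_entry fmt_table[] = {
   { 1, 0x10 }, { 2, 0x20 }, { 3, 0x30 }, { 0, 0 },
};

/* The pattern gcc 6 warns about:
 *
 *    const struct fmt_entry *e = &fmt_table[-1];
 *    while ((++e)->pipe_format && e->pipe_format != fmt)
 *       ;
 *
 * A forward loop that starts at element 0 instead: */
static const struct fmt_entry *find_fmt(int fmt)
{
   for (const struct fmt_entry *e = fmt_table; e->pipe_format; e++) {
      if (e->pipe_format == fmt)
         return e;
   }
   return NULL;
}
```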
* nouveau: Fix a couple of "foo may be used uninitialized" compiler warnings | Hans de Goede | 2016-07-02 | 2 | -3/+3
  These are all new false positives with gcc6.

  In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer to a variable into a function initialises that variable.

  In nv50_ir_from_tgsi.cpp: op and mode are not set if there are 0 enabled dst channels. This never happens, but gcc cannot know this.

  Signed-off-by: Hans de Goede <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* nouveau: Fix gcc6 / c++11 auto_ptr deprecation compiler warnings | Hans de Goede | 2016-07-02 | 1 | -0/+4
  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nouveau: Add support for SV_WORK_DIM | Hans de Goede | 2016-07-02 | 8 | -12/+29
  Add support for SV_WORK_DIM for nvc0 and nve4.

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0: Make NVC0_CB_AUX_GRID_INFO take an index argument | Hans de Goede | 2016-07-02 | 3 | -4/+4
  This brings it in line with the other macros like NVC0_CB_AUX_UBO_INFO and NVC0_CB_AUX_TEX_INFO.

  Signed-off-by: Hans de Goede <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
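Below is a sketch of the indexed-macro convention for offsets into a driver-internal constant buffer. It uses hypothetical `EXAMPLE_CB_AUX_*` names and made-up offsets, not nvc0's real layout. The point is that every AUX macro takes an element index, so callers never add their own per-element arithmetic to a base offset.

```c
#include <assert.h>

/* Hypothetical constant buffer layout (byte offsets); not nvc0's values. */
#define EXAMPLE_CB_AUX_UBO_INFO(i)  (0x000 + (i) * 16) /* 4 dwords per UBO */
#define EXAMPLE_CB_AUX_GRID_INFO(i) (0x100 + (i) * 4)  /* 1 dword per entry */

/* Callers name the element instead of computing "base + i * 4". */
static unsigned grid_info_offset(unsigned component)
{
   assert(component < 3);   /* assumed: three grid entries */
   return EXAMPLE_CB_AUX_GRID_INFO(component);
}
```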
* nvc0: fix up image support for allowing multiple samples | Ilia Mirkin | 2016-07-01 | 7 | -49/+108
  Basically we just have to scale up the coordinates and then add the relevant sample offset. The code to handle this was already largely present from Christoph's earlier attempts to pipe images through back in the dark ages; this just hooks it all up.

  Signed-off-by: Ilia Mirkin <[email protected]>
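Suppose a multisampled image is laid out as a larger single-sample surface, with each pixel's samples stored in a small grid. Accessing sample `s` of pixel (x, y) then means scaling the coordinates by the grid size and adding that sample's offset within the grid. This sketch assumes power-of-two grids (2x1 for 2 samples, 2x2 for 4, 4x2 for 8) with samples in row-major order. The real nvc0 layout and its sample offset table may differ, and `sample_coord()` is a hypothetical name.

```c
#include <assert.h>

struct coord { unsigned x, y; };

/* log2 of the per-pixel sample grid, indexed by log2(sample count).
 * Assumed layouts: 1 -> 1x1, 2 -> 2x1, 4 -> 2x2, 8 -> 4x2. */
static const unsigned grid_log2_w[] = { 0, 1, 1, 2 };
static const unsigned grid_log2_h[] = { 0, 0, 1, 1 };

static struct coord sample_coord(struct coord px, unsigned sample,
                                 unsigned log2_samples)
{
   assert(log2_samples <= 3);
   unsigned lw = grid_log2_w[log2_samples];
   unsigned lh = grid_log2_h[log2_samples];
   struct coord c;
   /* Scale up the pixel position, then add the sample's grid offset. */
   c.x = (px.x << lw) + (sample & ((1u << lw) - 1));
   c.y = (px.y << lh) + (sample >> lw);
   return c;
}
```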
* nv30: go back to not using viewport validate function for swtnl | Ilia Mirkin | 2016-07-01 | 2 | -1/+16
  The output of draw requires a null viewport transform, which the regular code is ill-equipped to provide. Reinstate the original settings in the render path, and add setting of the viewport clip polygon based on fb width/height (as that is all taken care of by draw).

  Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix viewport clipping settings to be based on viewport, not rt | Ilia Mirkin | 2016-07-01 | 2 | -17/+11
  This fixes a ton of "*clip*" dEQP GLES2 tests, as well as triangle-guardband-viewport in piglit.

  Signed-off-by: Ilia Mirkin <[email protected]>
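Gallium's `pipe_viewport_state` describes the viewport with `scale` and `translate` vectors. On each axis, the viewport therefore covers translate ± |scale| in window space. The sketch derives clip bounds from the viewport itself rather than from the render target size. The local `struct viewport` mirrors only those two arrays, and `viewport_bounds()` is hypothetical, not the nv30 code.

```c
/* Mirrors the scale/translate fields of Gallium's pipe_viewport_state. */
struct viewport {
   float scale[3];
   float translate[3];
};

struct rect { float x0, y0, x1, y1; };

static float abs_f(float v)
{
   return v < 0.0f ? -v : v;
}

/* Window-space rectangle covered by the viewport. scale may be negative,
 * e.g. for a y-flipped viewport. */
static struct rect viewport_bounds(const struct viewport *vp)
{
   struct rect r;
   r.x0 = vp->translate[0] - abs_f(vp->scale[0]);
   r.x1 = vp->translate[0] + abs_f(vp->scale[0]);
   r.y0 = vp->translate[1] - abs_f(vp->scale[1]);
   r.y1 = vp->translate[1] + abs_f(vp->scale[1]);
   return r;
}
```

For example, a 200x100 viewport at (10, 10) has scale (100, 50) and translate (110, 60), giving the rectangle [10, 210] x [10, 110] whatever the render target size.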
* swr: Refactor checks for compiler feature flags | Chuck Atkins | 2016-06-30 | 1 | -2/+2
  Encapsulate the test for which flags are needed to get a compiler to support certain features. Along with this, give various options to try for AVX and AVX2 support. Ideally we want to use specific instruction set feature flags, like -mavx2 instead of -march=haswell, but the flags required for certain compilers are different. For AVX2, for instance, this allows GCC to use -mavx2 -mfma -mbmi2 -mf16c, while the Intel compiler, which doesn't support those flags, can fall back to using -march=core-avx2.

  This addresses a bug where the Intel compiler will silently ignore the AVX2 instruction feature flags and then potentially fail to build.

  v2: Pass preprocessor-check argument as true-state instead of false-state for clarity.
  v3: Reduce AVX2 define test to just __AVX2__. Additional defines such as __FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined w.r.t. their availability.
  v4: Fix C++11 flags being added globally and add more logic to swr_require_cxx_feature_flags.

  Cc: <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
  Tested-by: Tim Rowley <[email protected]>
  Signed-off-by: Chuck Atkins <[email protected]>
* svga: use SVGA3D_vgpu10_BufferCopy() for buffer copies | Brian Paul | 2016-06-30 | 1 | -4/+28
  This way we do copies host-side rather than in the guest with map/memcpy.

  Tested with the piglit arb_copy_buffer-subdata-sync test and the new arb_copy_buffer-intra-buffer-copy test.

  Reviewed-by: Charmaine Lee <[email protected]>
  Acked-by: Roland Scheidegger <[email protected]>
* svga: add SVGA3D_vgpu10_BufferCopy() | Brian Paul | 2016-06-30 | 2 | -0/+30
  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: flush buffers when mapping for reading | Brian Paul | 2016-06-30 | 1 | -13/+24
  With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have to make sure any pending map-write operations are completed before reading if the buffer is dirty. Otherwise the ReadbackSubResource operation could get stale data from the host buffer.

  This allows the piglit arb_copy_buffer-subdata-sync test to pass when we start using the SVGA3D_vgpu10_BufferCopy command.

  v2: check the sbuf->dirty flag in the outer conditional, per Charmaine.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
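The rule here is generic. If guest-side writes are batched and later sent to the host, a read must first push any pending writes, or it will see stale host data. The toy model below captures only that read-after-write ordering; it does not model host-side copies. `struct toy_buffer`, `flush_pending()`, `toy_map()` and the `MAP_*` flags are hypothetical and only mimic the "check dirty, then flush" logic the commit describes.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE  16
#define MAP_READ  0x1
#define MAP_WRITE 0x2

/* Toy model: 'shadow' holds guest-side writes not yet sent to 'host'. */
struct toy_buffer {
   unsigned char host[BUF_SIZE];
   unsigned char shadow[BUF_SIZE];
   bool dirty;
};

static void flush_pending(struct toy_buffer *b)
{
   memcpy(b->host, b->shadow, BUF_SIZE);
   b->dirty = false;
}

/* Map for CPU access. Reads must observe writes from earlier maps. */
static unsigned char *toy_map(struct toy_buffer *b, unsigned usage)
{
   if ((usage & MAP_READ) && b->dirty)
      flush_pending(b);           /* otherwise the readback would be stale */
   if (usage & MAP_WRITE) {
      b->dirty = true;
      return b->shadow;
   }
   return b->host;
}
```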
* svga: enable ARB_copy_image extension in the driver | Neha Bhende | 2016-06-30 | 1 | -1/+2
  Reviewed-by: Brian Paul <[email protected]>
  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: try blitting with copy region in more cases | Brian Paul | 2016-06-30 | 1 | -1/+7
  We previously could do blits with util_resource_copy_region() when doing 'loose' format checking. Also do blits with util_resource_copy_region() when the blit src/dst formats (not the underlying resources) exactly match. Needed for GL_ARB_copy_image.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: use copy_region_vgpu10() for region copies when possible | Brian Paul | 2016-06-30 | 1 | -4/+37
  v2: remove extra svga_define_texture_level() call, per Charmaine.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: use vgpu10 CopyRegion command when possible | Neha Bhende | 2016-06-30 | 1 | -2/+147
  Do texture->texture copies host-side with this command when possible. Use the previous software fallback otherwise.

  Reviewed-by: Brian Paul <[email protected]>
  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: set render target flag for snorm surfaces | Brian Paul | 2016-06-30 | 1 | -0/+10
  We don't normally support rendering to SNORM surfaces, but with GL_ARB_copy_image we can copy to them if we treat them as typeless and use a UNORM surface view.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: add new svga_format_is_uncompressed_snorm() helper | Brian Paul | 2016-06-30 | 2 | -0/+24
  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: adjust sampler view format for RGBX | Brian Paul | 2016-06-30 | 1 | -1/+5
  We previously handled the case of an RGBX sampler view of an RGBA surface. Add the reverse case too. For GL_ARB_copy_image.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: adjust render target view format for RGBX | Brian Paul | 2016-06-30 | 1 | -1/+13
  For GL_ARB_copy_image we may be asked to create an RGBA view of an RGBX surface. Use an RGBX view format for that case.

  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
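The render-target rule from this commit comes down to a small format reconciliation. In this sketch, a hypothetical two-value enum stands in for the real SVGA3D/pipe formats, and `rt_view_format()` is an illustrative name. When an RGBA view of an RGBX surface is requested, the RGBX view format is used; any other request is passed through unchanged.

```c
/* Hypothetical formats standing in for the real SVGA3D/pipe formats. */
enum fmt { FMT_R8G8B8A8_UNORM, FMT_R8G8B8X8_UNORM };

/* Choose the format for a render target view of a surface. */
static enum fmt rt_view_format(enum fmt requested, enum fmt surface)
{
   /* RGBA view of an RGBX surface: use the RGBX view format instead. */
   if (requested == FMT_R8G8B8A8_UNORM && surface == FMT_R8G8B8X8_UNORM)
      return FMT_R8G8B8X8_UNORM;
   return requested;
}
```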
* svga: don't advertise support for R32G32B32_UINT/SINT surface formats | Neha Bhende | 2016-06-30 | 1 | -2/+2
  We want to be able to copy between different 32-bit, 3-channel surface formats for GL_ARB_copy_image, but since we don't support R32G32B32_FLOAT for textures (it's not blendable and wouldn't work for render to texture), we can't support 32-bit, 3-channel integer formats. The state tracker will choose 4-channel formats instead.

  Fixes the piglit arb_copy_image-format test for several cases.

  Note: This change may need to be revisited if/when the texture_view extension is enabled in the driver.

  Reviewed-by: Brian Paul <[email protected]>
  Acked-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
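The state tracker's fallback can be pictured as walking a list of candidate formats and taking the first one the driver reports as supported. This toy version is only an illustration: the format names, `is_supported()` (which models the driver no longer advertising 3-channel 32-bit integer formats) and `choose_format()` are all hypothetical, not the state tracker's actual tables.

```c
#include <stdbool.h>
#include <stddef.h>

enum fmt {
   FMT_NONE,               /* list terminator / "no format" */
   FMT_R32G32B32_UINT,
   FMT_R32G32B32A32_UINT,
};

/* Models the driver after this change: 3-channel 32-bit ints unsupported. */
static bool is_supported(enum fmt f)
{
   return f == FMT_R32G32B32A32_UINT;
}

/* Return the first supported format from a FMT_NONE-terminated list. */
static enum fmt choose_format(const enum fmt *candidates)
{
   for (size_t i = 0; candidates[i] != FMT_NONE; i++) {
      if (is_supported(candidates[i]))
         return candidates[i];
   }
   return FMT_NONE;
}
```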