summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* st/clover: provide a path for drivers to call through to pfn_notifyIlia Mirkin2015-11-054-4/+36
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> [ Francisco Jerez: Clean up clover::context interface by passing around a function object. ]
* gallium: expose a debug message callback settable by context ownerIlia Mirkin2015-11-056-0/+82
| | | | | | | This will allow gallium drivers to send messages to KHR_debug endpoints Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nouveau: relax fence emit space assertIlia Mirkin2015-11-043-3/+3
| | | | | | | | | We also have the "reserved for kick" space available. Some of my earlier changes can probably be removed, but this is a quick fix for some of the rarer fallout. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* vc4: When the create ioctl fails, free our cache and try again.Eric Anholt2015-11-041-5/+24
| | | | | | | | | This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <[email protected]>
* vc4: Print the rounded shader size in debug output.Eric Anholt2015-11-041-1/+1
| | | | | It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be suprising.
* vc4: Fix dumping the size of BOs allocated/cached.Eric Anholt2015-11-041-2/+2
| | | | 60MB of cached BOs are a lot less scary than 600MB.
* svga: implement 'white_fragments' option for VGPU10 fragment shadersBrian Paul2015-11-041-5/+30
| | | | | | | | | | When we emulate XOR logicop mode with blend-subtract, we need to ensure that the fragment shader always emits white. We had this implemented for VGPU9, but not VGPU10. VMware bug 1545492. Reviewed-by: Charmaine Lee <[email protected]>
* u_vbuf: minor code reformatting / line wrappingBrian Paul2015-11-041-4/+8
| | | | Trivial.
* u_vbuf: add some const qualifiersBrian Paul2015-11-041-2/+2
| | | | Trivial.
* svga: use new enum indices_mode typeBrian Paul2015-11-042-2/+4
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* util/indices: replace #define tokens with enum typeBrian Paul2015-11-043-86/+91
| | | | | | To ease debugging in gdb. Reviewed-by: Charmaine Lee <[email protected]>
* gallivm: fix sampling for s3tc srgb formats when using texture cacheRoland Scheidegger2015-11-041-1/+3
| | | | | | | | | This actually stored the values as 8bit linear values in the cache, then did another srgb->linear conversion... We don't want to do the former (decoding 8bit srgb values to 8bit linear completely defeats the purpose of srgb in the first place), so just decode to 8bit srgb. Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
* llvmpipe: add cache for compressed texturesRoland Scheidegger2015-11-0419-18/+730
| | | | | | | | | | | | | | | | | | | | | | compressed textures are very slow because decoding is rather complex (and because there's no jit code code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: use simple coeffs calc for 128bit vectorsOded Gabbay2015-11-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results. The decision which method to use is determined by the size of the vector that is used by the platform. For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called. For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called. This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX). This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests passes, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms. The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as it can be seen from the following benchmarking results: - glxspheres, on ppc64le: - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec - Additional 0.8% performance boost - glxspheres, on x86-64 without AVX: - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec - Additional 0.74% performance boost - glmark2, on ppc64le: - original code: score of 58 - with my change: score of 57 - glmark2, on x86-64 without AVX: - original code: score of 175 - with the patch: score of 167 - Impact of of -4.5% on performance - OpenArena, on ppc64le: - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms - 29 seconds faster with the patch, which is about 2% - OpenArena, on x86-64 without AVX: - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms - 0.3 fps slower with the patch (about 2%) Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/radeon: allow returning SDMA fences from pipe->flushMarek Olšák2015-11-041-8/+56
| | | | | | | pipe->flush never returned SDMA fences. This fixes it. This is only an issue on amdgpu where fences can signal out of order. Reviewed-by: Michel Dänzer <[email protected]>
* gallium/radeon: always return the last SDMA fence on SDMA flush if neededMarek Olšák2015-11-042-4/+8
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* nvc0: add missing compute parameters required by cloverSamuel Pitoiset2015-11-031-1/+10
| | | | | | | This fixes crashes with some piglit OpenCL tests. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: handle NULL pointer in nvc0_get_compute_param()Samuel Pitoiset2015-11-031-24/+21
| | | | | | | | | To get the size (in bytes) of a compute parameter, clover first calls get_compute_param() with a NULL data pointer. The RET() macro is based on nv50. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: use correct heaps for FP and GP code segmentsSamuel Pitoiset2015-11-011-2/+2
| | | | | | This is just a cosmetic change. Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* nouveau: get rid of tabsIlia Mirkin2015-10-3119-607/+607
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* virgl/vtest: fix extra mallocDave Airlie2015-10-311-1/+1
| | | | | | | | This somehow got added twice, drop the first one. Reported by Coverity. Signed-off-by: Dave Airlie <[email protected]>
* virgl: free sampler view on failure pathDave Airlie2015-10-311-1/+5
| | | | | | Reported by Coverity. Signed-off-by: Dave Airlie <[email protected]>
* gallium/swrast: fixup build breakage and warningsDave Airlie2015-10-316-1/+6
| | | | | | | The front buffer rendering changes broke an interface, I didn't fix up all of them. Signed-off-by: Dave Airlie <[email protected]>
* gallium/swrast: fix front buffer blitting. (v2)Dave Airlie2015-10-317-12/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I've known this was broken before, cogl has a workaround for it from what I know, but with the gallium based swrast drivers BlitFramebuffer from back to front or vice-versa was pretty broken. The legacy swrast driver tracks when a front buffer is used and does the get/put images when it is mapped/unmapped, so this patch attempts to add the same functionality to the gallium drivers. It creates a new context interface to denote when a front buffer is being created, and passes a private pointer to it, this pointer is then used to decide on map/unmap if the contents should be updated from the real frontbuffer using get/put image. This is primarily to make gtk's gl code work, the only thing I've tested so far is the glarea test from https://github.com/ebassi/glarea-example.git v2: bump extension version, check extension version before calling get image. (Ian) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930 Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* winsys/virgl: rework line wrapping/indentEmil Velikov2015-10-304-92/+124
| | | | | | | | Wrap some of the 'omg it's getting out of hand' long lines, and re-indent where things feel off. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: unwrap the includesEmil Velikov2015-10-3019-56/+72
| | | | | | | | Include what you want, rather than relying on a header foo.h N levels down the include chain, to provide something that you need. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl: remove temporary ret variableEmil Velikov2015-10-301-9/+3
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl: always memset prior to ioctlEmil Velikov2015-10-301-4/+8
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl: use MALLOC to match FREEEmil Velikov2015-10-301-1/+1
| | | | | | | The uppercase versions are wrappers which must be matched. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl: remove calloc/malloc castsEmil Velikov2015-10-302-5/+3
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl: throw in some inline wrappersEmil Velikov2015-10-304-10/+40
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: introduce virgl_query() inline wrapperEmil Velikov2015-10-301-5/+10
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: use virgl_screen/surface upcast wrappersEmil Velikov2015-10-305-6/+11
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: introduce and use virgl_transfer/texture/resource inline wrappersEmil Velikov2015-10-307-24/+34
| | | | | | | | | The only two remaining cases of (struct virgl_resource *) require a closer look. Either the error checking is missing or the arguments provided feel wrong. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: add virgl_context/sampler_view/so_target() upcast wrappersEmil Velikov2015-10-307-65/+82
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl/drm: drop unneeded forward declarationEmil Velikov2015-10-301-2/+0
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: remove sw_winsys pointer from virgl_screenEmil Velikov2015-10-303-3/+0
| | | | | | | | | The screen already has a pointer to the (base) winsys object. With the latter of which implemented/sub-classed as either drm or sw based one, depending on the target. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: rename virgl.h to virgl_screen.hEmil Velikov2015-10-306-5/+5
| | | | | | | Provide a more meaningful name considering it's purpose. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: move virgl_hw.h into the driver dirEmil Velikov2015-10-309-7/+5
| | | | | | | | | | | Strictly speaking virgl_hw.h should reside in the driver folder, as it describes the hardware. Moving it allows us to nuke the following strange dependency winsys/vtest > driver > winsys/drm Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: straighten the includes confusionEmil Velikov2015-10-306-11/+8
| | | | | | | | | Use the relevant GALLIUM_foo_CFLAGS which has all the requirements (not to mention VISIBITY_CFLAGS) and keep ../ out of the include directives. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: remove the _FILE_OFFSET_BITS definesEmil Velikov2015-10-303-7/+0
| | | | | | | The build already sets it as needed. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl/drm: add all files to the tarballEmil Velikov2015-10-301-1/+5
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* winsys/virgl/vtest: list all files in Makefile.sourcesEmil Velikov2015-10-301-1/+4
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: move sources list to Makefile.sourcesEmil Velikov2015-10-302-10/+19
| | | | | | | ... and add the missing files while we're at it. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* virgl: fix drm.h include pathEmil Velikov2015-10-301-1/+1
| | | | | | | | | | | | The drm/ prefix is required, if using the kernel provided headers. As most distros don't ship them it and we already depend on libdrm (which adds the relevant -I flag) just drop the drm/ from the include. Once a libdrm release with the virtgpu_drm.h header is released, we can drop our local copy of the file. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* nv50: do not create an invalid HW query typeSamuel Pitoiset2015-10-302-12/+30
| | | | | | | | While we are at it, store the rotate offset for occlusion queries to nv50_hw_query like on nvc0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: move HW queries to nv50_query_hw.c/h filesSamuel Pitoiset2015-10-308-349/+476
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: move nva0_so_target_save_offset() to its correct locationSamuel Pitoiset2015-10-303-21/+18
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: add a header file for nv50_querySamuel Pitoiset2015-10-306-40/+49
| | | | | | | | | Like for nvc0, this will allow to split different types of queries and to prepare the way for both global performance counters and MP counters. While we are at it, make use of nv50_query struct instead of pipe_query. Signed-off-by: Samuel Pitoiset <[email protected]>
* st/va: add support to export a surface as dmabufJulien Isorce2015-10-303-2/+144
| | | | | | | | | | | | | | | | | | | | I.e. implements: VaAcquireBufferHandle VaReleaseBufferHandle for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME And apply relatives change to: vlVaMapBuffer vlVaUnMapBuffer vlVaDestroyBuffer Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver Tested with gstreamer-vaapi with nouveau driver. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>