summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* nv50: correctly calculate the number of vertical blocks during transfer mapEmil Velikov2014-02-251-1/+1
| | | | | | Cc: "10.0 10.1" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add texture gather support to gallium (v3)Dave Airlie2014-02-2517-2/+73
| | | | | | | | | | | | | | | | | | | | | This adds support to gallium for a TG4 instruction, and two CAPs. The first CAP is required for GL_ARB_texture_gather. The second CAP is required to expose GL_ARB_gpu_shader5. However so far we haven't found any hardware that natively exposes the textureGatherOffsets feature from GL, so just lower it for now. If hardware appears for this we can add another CAP to allow TG4 to take 4 offsets. v2: add component selection src and a cap to say hw can do it. (st can use to help control GL_ARB_gpu_shader5/GLSL 4.00). Add docs. v3: rename to SM5, add docs. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* clover: Pass buffer offsets to the driver in set_global_binding() v3Tom Stellard2014-02-244-10/+32
| | | | | | | | | | | | | | | The offsets will be stored in the handles parameter. This makes it possible to use sub-buffers. v2: - Style fixes - Add support for constant sub-buffers - Store handles in device byte order v3: - Use endian helpers Reviewed-by: Francisco Jerez <[email protected]>
* radeonsi: Use SI_BIG_ENDIAN now that it existsTom Stellard2014-02-241-1/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* r600g: Use util_cpu_to_le32() instead of bswap32() on big-endian systemsTom Stellard2014-02-243-3/+3
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: Use util_cpu_to_le32() instead of bswap32() on big-endian systemsTom Stellard2014-02-242-2/+2
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* util: Add util_cpu_to_le* helpersTom Stellard2014-02-241-0/+3
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* util: Add util_bswap64() v3Tom Stellard2014-02-241-0/+16
| | | | | | | | | | | | v2: - Use __builtin_bswap64() - Remove unnecessary mask - Add util_le64_to_cpu() helper v3: - Remove unnecessary AC_SUBST Reviewed-by: Michel Dänzer <[email protected]>
* configure.ac: Use AX_GCC_BUILTIN to check availability of __builtin_bswap32 v2Tom Stellard2014-02-241-1/+2
| | | | | | | v2: - Remove unnecessary AC_SUBST Reviewed-by: Matt Turner <[email protected]>
* targets/opencl: resolve undefined symbols at link timeEmil Velikov2014-02-241-0/+1
| | | | | | | | Current automake build does not try to resolve undefined symbols thus we could end up with a broken library. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* gallium/targets: resolve undefined reference to pipe_loader_sw_probe_driEmil Velikov2014-02-243-0/+15
| | | | | | | | | | | | | With the introduction of the pipe_loader_sw_probe_dri helper we require the sw/dri winsys during linking stage despite it being unused by any of the targets. This will cause a minor increase in the resulting library which will be cleaned up via linker options with upcoming patches. v2: Link with libswdri.la only when available. Reported-and-tested-by: Tom Stellard <[email protected]> (v1) Signed-off-by: Emil Velikov <[email protected]>
* pipe-loader: wrap pipe_loader_sw_probe_xlib within HAVE_PIPE_LOADER_XLIBEmil Velikov2014-02-247-7/+34
| | | | | | | | | | | | | | | | | | | The above function implies using the the xlib winsys, which has additional library dependencies that should not be forced. Make the software xlib pipe loader optional thus avoid all the dependency hell. A user that wishes to use the particular pipe-loader would need to set the following within configure.ac. enable_gallium_xlib_loader=yes v2: - Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems lacking X11 headers. Spotted by Christian Prochaska. Tested-by: Tom Stellard <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356 Signed-off-by: Emil Velikov <[email protected]>
* targets/gbm: exit gracefully if pipe_loader_drm_probe_fd is not availableEmil Velikov2014-02-241-3/+4
| | | | | | | | | | When one builds without gallium_drm_loader, the above function will not be available, thus we'll segfault in gallium_screen_create due to memory access violation. Tested-by: Tom Stellard <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75335 Signed-off-by: Emil Velikov <[email protected]>
* freedreno/a3xx/compiler: half-precision outputRob Clark2014-02-236-10/+130
| | | | | | | | | | | | | | | | | | | | Using generic shaders caused a measurable fps drop, which was isolated to use of full precision (vs half precision) output. This is an attempt to regain that lost performance by using half precision solid/blit shaders (when the output format is not float32). Note: for the built-in shaders, I would not expect them to be register starved. And in fact it is the solid frag shader that seems to have the biggest impact. So I suspect you get double the pixel pipe units (or half the cycles) when the output is half precision. So there may be some gain to using half precision output for application shaders as well, even though the rest of register usage is still full precision. But for half precision to work for more complex shaders, we need to deal with some constraints, like cat2 needing same precision for it's two src registers. So for now it is not enabled by default except for the built-in shaders. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add shader variantsRob Clark2014-02-2310-196/+283
| | | | | | | | | Start putting in place infrastructure to deal with multiple shader variants. Initially we'll use this for two sided color (frag) and binning pass (vert) shaders. Possibly need for others later (such as YUV vs RGB eglImage?). Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: collapse nop's with repeatRob Clark2014-02-232-0/+15
| | | | | | | | | Easier than making more extensive use of rpt, and the more compact shaders seem to bring some bit of performance boost. (Perhaps repeat flag benefits are more than just instruction cache, possibly it saves on instruction decode as well?) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: drop hand-coded blit/solid shadersRob Clark2014-02-2310-287/+181
| | | | | | | | | Instead in the common code, construct these shaders from TGSI. For now we let a2xx keep it's hand coded shaders, as it's compiler isn't quite up to the job yet. All the same it is a net drop in code size and gets rid of special cases. Signed-off-by: Rob Clark <[email protected]>
* freedreno/lowering: cleanup apiRob Clark2014-02-235-24/+138
| | | | | | | | Make things configurable, and tweak the API a bit to avoid an extra tgsi_shader_scan(). Getting closer to something generic which can be moved out of freedreno and shaderd by other drivers. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add float 16 and 32bit formatsRob Clark2014-02-231-0/+22
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-02-234-4/+20
| | | | Signed-off-by: Rob Clark <[email protected]>
* nv50: make sure to clear _all_ layers of all attachmentsIlia Mirkin2014-02-223-3/+21
| | | | | | | | | | | | | | | Unfortunately there's only one RT_ARRAY_MODE setting for all attachments, so clears were previously truncated to the minimum number of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the max number of layers) before doing the clear. This fixes gl-3.2-layered-rendering-clear-color-mismatched-layer-count. Also fix clears of individual layered rt/zeta, in case it ever happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christoph Bumiller <[email protected]> Cc: 10.1 <[email protected]>
* ilo: fix and enable fast depth clearChia-I Wu2014-02-222-9/+38
| | | | | | | | Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs() because the latter may be combined depth/stencil format. hiz_can_clear_zs() is no-op for GEN7+, but move the GEN check so that the assertions are tested. Finally, call the fast depth clear function from ilo_clear().
* ilo: add slice clear valueChia-I Wu2014-02-225-7/+78
| | | | | It is needed for 3DSTATE_CLEAR_PARAMS, and can also be used to track what value the slice has been cleared to.
* ilo: better readability and doc for texture flagsChia-I Wu2014-02-223-36/+58
| | | | | Improve comments for the flags, and explicitly separate their uses in slice flags and resolve flags.
* ilo: fix for stencil only rectlist opsChia-I Wu2014-02-222-2/+8
| | | | | 3DSTATE_STENCIL_BUFFER inherits some states from 3DSTATE_DEPTH_BUFFER. We need to emit both even the surface is stencil only.
* ilo: fix a false assertion failure on GEN6Chia-I Wu2014-02-221-4/+12
| | | | Layer offsetting is possible when it is level 0, layer 0.
* ilo: pipe_texture::usage is not a bitfieldChia-I Wu2014-02-221-1/+1
| | | | It happens to work because PIPE_USAGE_STAGING is 0x100.
* ilo: set ILO_TEXTURE_CPU_WRITE for imported texturesChia-I Wu2014-02-221-3/+10
| | | | | Assume the bo has been written by another process, which will trigger a HiZ resolve.
* nv50/ir/ra: fix SpillCodeInserter::offsetSlot usageChristoph Bumiller2014-02-221-7/+7
| | | | | | We were turning non-memory spill slots into NULL. Cc: 10.1 <[email protected]>
* libgl-xlib: Fix xlib_sw_winsys.h include path.Vinson Lee2014-02-211-1/+1
| | | | | | | | | | | | | This patch fixes this SCons build error introduced with commit 4f37e52f374b8b1d7177634dc09ab71e30e1779d. Compiling src/gallium/targets/libgl-xlib/xlib.c ... src/gallium/targets/libgl-xlib/xlib.c:35:42: fatal error: state_tracker/xlib_sw_winsys.h: No such file or directory #include "state_tracker/xlib_sw_winsys.h" ^ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75347 Signed-off-by: Vinson Lee <[email protected]>
* pipe-loader: introduce pipe_loader_sw_probe_null helper functionEmil Velikov2014-02-222-0/+31
| | | | | | | v2: Handle null_sw_create failure, add missing function return type Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
* pipe-loader: introduce pipe_loader_sw_probe_dri helperEmil Velikov2014-02-223-0/+37
| | | | | | | | | | | Will be used in the following commits. v2: Link gallium tests against the library. v3: Handle dri_create_sw_winsys failure v4: Rebase on top of the targets/xa changes Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> (v2)
* pipe-loader: introduce pipe_loader_sw_probe_xlib helperEmil Velikov2014-02-223-3/+45
| | | | | | | | | Will be used in the upcoming patches. v2: handle xlib_create_sw_winsys failure, drop unneeded header Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
* pipe-loader: use bool type for pipe_loader_drm_probe_fd()Emil Velikov2014-02-222-5/+5
| | | | | | | | v2: Rebase on top of the rendernode changes. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> (v1) Reviewed-by: Francisco Jerez <[email protected]> (v1)
* winsys/xlib: move xlib_create_sw_winsys within the winsysEmil Velikov2014-02-2210-15/+21
| | | | | | v2: Rebase on top of vl_winsys_xsp.c removal Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
* pipe-loader: handle memory allocation failureEmil Velikov2014-02-222-0/+4
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]>
* pipe-loader: build pipe_loader_drm_x_auth whenever HAVE_PIPE_LOADER_XCB is ↵Emil Velikov2014-02-221-1/+1
| | | | | | | | | | defined Currently HAVE_PIPE_LOADER_XCB is defined, rather than being set to 1/0. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* pipe-loader: destroy sw_winsys on sw_releaseEmil Velikov2014-02-221-0/+3
| | | | | | | | | | | | The sw pipe-loader implicitly handles winsys_create, thus we it would make sense to implicitly destroy it upon releasing the loader. Currently we leak the sw_winsys when releasing the pipe-loader. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* vl/winsys_dri: cleanup vl_screen_create error pathEmil Velikov2014-02-221-13/+19
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> Reviewed-by: Christian König <[email protected]>
* targets/pipe-loader: link pipe-nouveau against libdrmEmil Velikov2014-02-221-0/+1
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* clover: Unabbreviate a few data accessor names for consistency.Francisco Jerez2014-02-219-28/+28
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Replace the transfer(new ...) idiom with a safer create(...) helper ↵Francisco Jerez2014-02-216-56/+43
| | | | | | function. Tested-by: Tom Stellard <[email protected]>
* clover: Migrate a bunch of pointers and references in the object tree to ↵Francisco Jerez2014-02-2129-163/+179
| | | | | | smart references. Tested-by: Tom Stellard <[email protected]>
* clover: Allow storing a range into a container of different (but compatible) ↵Francisco Jerez2014-02-211-7/+7
| | | | | | element type. Tested-by: Tom Stellard <[email protected]>
* clover: Define an intrusive smart reference class.Francisco Jerez2014-02-211-0/+66
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Some improvements for the intrusive pointer class.Francisco Jerez2014-02-216-30/+45
| | | | | | | | Define some additional convenience operators, clean up the implementation slightly, and rename it to 'intrusive_ptr' for reasons that will be obvious in the next commit. Tested-by: Tom Stellard <[email protected]>
* clover: Fix up NULL constant pointer arguments.Francisco Jerez2014-02-211-1/+2
| | | | Tested-by: Tom Stellard <[email protected]>
* tgsi_ureg: add property_gs_invocationsJordan Justen2014-02-202-0/+11
| | | | | | | | Fixes a build break in state_tracker/st_program.c Signed-off-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75278 Reviewed-by: Dave Airlie <[email protected]>
* gallivm: add smallfloat to float conversion not relying on cpu denorm handlingRoland Scheidegger2014-02-201-20/+65
| | | | | | | | | | | | | | | | | | | | | | The previous code relied on cpu denorm support for converting small float formats (such r11g11b10_float and r16_float) to floats, otherwise denorms are flushed to zero. We worked around that in llvmpipe blend code by reenabling denorms, but this did nothing for texture sampling. Now it would be possible to reenable it there too but I'm not really a fan of messing with fpu flags (and it seems we can't actually do it reliably with llvm in any case looking at some bug reports). (Not to mention if you actually have a lot of denorms in there, you can expect some order-of-magnitude slowdown with x86 cpus.) So instead use code which adjusts exponents etc. directly hence not relying on cpu denorm support for the rescaling mul. (We still need the fpu flag handling as we can't do float-to-smallfloat without using cpu denorms at least for now - I actually wanted to keep both the old and new code and using one or the other depending on from where it's called but that didn't work out as the parameter would have to be passed through too many layers than I'd like.) Reviewed-by: Zack Rusin <[email protected]> Reviewed-by: Si Chen <[email protected]>
* st/omx/enc: add multi scaling buffers for performance improvementLeo Liu2014-02-202-16/+29
| | | | | Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>