path: root/src
Commit message | Author | Age | Files | Lines
* st/mesa: Fix paths used in Android buildsTomasz Figa2014-10-033-0/+6
With current makefiles the build fails because source and build paths are generated incorrectly. With the Android build system, the top_srcdir and top_builddir variables are undefined and all paths are relative to where Android.mk is located. This ends up with paths like external/mesa/src/mesa/src/mesa/ for both source and build paths, which are obviously wrong. This patch fixes this by overriding the resulting SRCDIR and BUILDDIR variables with an empty string, so that paths end up being relative to the Android.mk file again. Appending the correct build path to generated files is already done in Android.gen.mk. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit b4ffd19e6c9f61dfa4e0eda1f606cd255b27208f)
* st/mesa: Generate format_info.c in Android buildsTomasz Figa2014-10-031-0/+9
| | | | | | | | | | | Current Android makefiles lack generation of format_info.c, which is a dependency of main/format.c. This patch adds necessary code to Android.gen.mk. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit 98445fd25e4f0bd7dc4d7a2a843b7fbe76c9756d)
* util: Include in Android buildsTomasz Figa2014-10-038-4/+114
| | | | | | | | | | | This patch fixes Android build failures by including src/util directory in compilation. Files inside of this directory are compiled into libmesa_util static library and linked with resulting libGLES_mesa. Signed-off-by: Tomasz Figa <[email protected]> CC: <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit d703abf735bc2fe27af893d07e44598b8601b172)
* glx/dri3: Provide error diagnostics when DRI3 allocation failsKeith Packard2014-10-031-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of just segfaulting in the driver when a buffer allocation fails, report error messages indicating what went wrong so that we can debug things. As a simple example, chromium wraps Mesa in a sandbox which doesn't allow access to most syscalls, including the ability to create shared memory segments for fences. Before, you'd get a simple segfault in mesa and your 3D acceleration would fail. Now you get: $ chromium --disable-gpu-blacklist [10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018 libGL: pci id for fd 12: 8086:0a16, driver i965 libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL error: DRI3 Fence object allocation failure Operation not permitted [10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer. This made it pretty easy to diagnose the problem in the referenced bug report. Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681 Signed-off-by: Keith Packard <[email protected]> Cc: [email protected] Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit 3202926746298468805f54ac5b39d62f9585dabf)
* st/xa: Fix regression in xa_yuv_planar_blit()Thomas Hellstrom2014-10-032-0/+12
| | | | | | | | | | | Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx textured video. Fix this by implementing scissors also in the yuv draw path. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Sinclair Yeh <[email protected]> Cc: Rob Clark <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 46537f1d03ba6de83be70ac574f633bb4342a327)
* st/dri: remove GALLIUM_MSAA and __GL_FSAA_MODE environment variablesMarek Olšák2014-09-281-35/+0
Some users don't understand that these variables can break OpenGL. The general rule is that if an app supports MSAA, you mustn't use GALLIUM_MSAA. For example, if an app has an 8xMSAA FBO and GALLIUM_MSAA=4 is set, resolving the FBO to the back buffer will be rejected, which will look like this on all gallium drivers: http://www.phoronix.com/scan.php?page=article&item=amd_radeonsi_msaa The environment variables also have no effect on modern apps like TF2, but there is still a performance hit due to wasted bandwidth and VRAM. In a nutshell, it does more harm than good. Cc: 10.2 10.3 <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (cherry picked from commit 8449121971ce1db03fea19665d314e523fdc10dd)
* glsl: Strip arrayness from ir_type_dereference_variable tooIan Romanick2014-09-271-1/+1
If the thing being dereferenced is a record or an array of records, it should be treated as row-major. The ir_type_dereference_record path already does this, and I think I intended to do the same for this path in b17a4d5d. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83741 Cc: [email protected] (cherry picked from commit c3f17bb18f597d7f606805ae94363dae7fd51582)
* glsl: Round struct size up to at least 16 bytesIan Romanick2014-09-271-1/+1
| | | | | | | | | | | | Per rule #9, the size of the structure is vec4 aligned. The MAX2 in the loop ensures that sizes >= 16 bytes are vec4 aligned. The new MAX2 after the loop ensures that sizes < 16 bytes are vec4 aligned. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932 Cc: [email protected] (cherry picked from commit 2ab71e1486e76722154b48faef8216ff8173fd30)
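The rounding described above can be illustrated with a simplified sketch. This is not the Mesa function the commit touches; max2, align_up and struct_size are hypothetical names, and the layout rules are reduced to "align each field, then pad the total", assuming every alignment is a power of two.

```c
#include <assert.h>

/* Hypothetical helper: larger of two values (stands in for MAX2). */
static unsigned max2(unsigned a, unsigned b)
{
   return a > b ? a : b;
}

/* Round value up to the next multiple of a power-of-two alignment. */
static unsigned align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Simplified struct size: place each field at its alignment, then pad
 * the total to the largest field alignment, but never to less than 16
 * bytes (one vec4). */
static unsigned struct_size(const unsigned *field_sizes,
                            const unsigned *field_aligns,
                            unsigned num_fields)
{
   unsigned offset = 0;
   unsigned max_align = 0;

   for (unsigned i = 0; i < num_fields; i++) {
      offset = align_up(offset, field_aligns[i]) + field_sizes[i];
      max_align = max2(max_align, field_aligns[i]);
   }

   /* Without this step a struct of only small fields stays below 16. */
   max_align = max2(max_align, 16);
   return align_up(offset, max_align);
}
```

In this sketch a struct holding a single 4-byte float occupies 16 bytes, and a 16-byte vector followed by a float occupies 32.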
* glsl: Make sure row-major array-of-structure get correct layoutIan Romanick2014-09-271-1/+8
| | | | | | | | | | | | Whether or not the field is row-major (because it might be a bvec2 or something) does not affect the array itself. We need to know whether an array element in its entirety is row-major. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83506 Cc: [email protected] (cherry picked from commit 5c75270c344815b15ef73e83421192fd7de35972)
* glsl: Make sure fields after small structs have correct paddingIan Romanick2014-09-271-0/+22
| | | | | | | | | | | | Previously the linker would correctly calculate the layout, but the lower_ubo_reference pass would not apply correct alignment to fields following small (less than 16-byte) nested structures. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83533 Cc: [email protected] (cherry picked from commit 8e01c66da6c780601f941aa5b9939962c219fdbd)
* st/mesa: Use PIPE_USAGE_STAGING for GL_STATIC/DYNAMIC/STREAM_READ buffersMichel Dänzer2014-09-271-3/+5
| | | | | | | | | | Such buffers can only be useful by reading from them with the CPU, so we need to make sure CPU reads are fast. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84178 Reviewed-by: Marek Olšák <[email protected]> Cc: [email protected] (cherry picked from commit 7e55c3b352b6616fa2780f683dd6c8e1a3f61815)
* gm107/ir: take relative pfetch offset into accountIlia Mirkin2014-09-271-1/+4
| | | | | | | | | | There is no dedicated instruction for this, so just combine it with the constant offset. Acked-by: Ben Skeggs <[email protected]> Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit a5bbfeda977a62aa3349f0c7d04c5c20156c1faf)
* gm107/ir: add support for indirect const buffer selectionIlia Mirkin2014-09-271-0/+14
| | | | | | | | | This was missed in the commit that enabled it for fermi/kepler as part of ARB_gpu_shader5 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit cdc4de121564a47cbdac760622b6dc7112e548aa)
* gm107/ir: fix texture argument orderIlia Mirkin2014-09-272-5/+34
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit 0532a5fd00cdddda0fd1727fb519cb4312f47e83)
* gm107/ir: fix manual TXD for array targetsIlia Mirkin2014-09-271-2/+3
| | | | | | | | This parallels the fixes in commit afea9bae. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit d3c3bba6d07c97cfc1499a6bda73337584943971)
* nv50/ir: avoid deleting pseudo instructions too earlyIlia Mirkin2014-09-271-1/+10
| | | | | | | | | | | | What happens is that a SPLIT operation is part of the spill node, and as a pseudo op, the instruction gets erased after processing its first def. However the later defs still need to refer to it, so instead delay deleting until after that whole RA node is done processing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 0147c10c5f00b43696ba660aab604d674a75e83c)
* mesa: Set correct array element in vbo_exec_vtx_init.Kenneth Graunke2014-09-271-1/+1
| | | | | | | | | | | | I'm not familiar with this code, but this sure appears to be a typo. It looks like the intent is to set each array element, not arrays[0] each time. Notably, the loop just below uses "array", not "arrays". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Fredrik Höglund <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: [email protected] (cherry picked from commit f81052dc9b99eca765a44decd01af0335350d0b2)
* mesa: Use proper structure for glGet*(GL_TEXTURE_COORD_ARRAY*).Kenneth Graunke2014-09-271-4/+4
| | | | | | | | | | | | | | | | | | | The code in get.c that handles this uses ctx->Array.VAO->VertexAttrib, which is a gl_vertex_attrib_array structure, not a gl_client_array. The offsets of all fields happened to be the same in both structures, at least on x86_64. "Size," "Type," and "Stride" are obviously the same: both structures start with the same fields, in the same order. "Enabled" is dicier: there are different fields before it in both structures, including pointer sized values which might need special alignment. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Fredrik Höglund <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: [email protected] (cherry picked from commit d0ec6e85099af68b8a36f9815f4e3d43d767bb38)
* radeonsi: properly destroy the GS copy shader and scratch_bo for computeMarek Olšák2014-09-272-3/+8
| | | | | | | | Cc: 10.2 10.3 <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (cherry picked from commit dc05a9e4e089d66a2ffe8919857ad9660e108c28) [Emil Velikov: remove unref scratch_bo, s/si_shader/si_pipe_shader/] Signed-off-by: Emil Velikov <[email protected]>
* radeonsi: release GS rings at context destructionMarek Olšák2014-09-271-0/+2
| | | | | | | | Cc: 10.2 10.3 <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (cherry picked from commit 711623f7c8113d43f2d54ebfe5cbed3d406a3c79) [Emil Velikov: s/ring/ring.buffer/] Signed-off-by: Emil Velikov <[email protected]>
* i915: Fix black buffers when importing prime fdsAndreas Pokorny2014-09-271-0/+2
| | | | | | | | | | Width and Height of the imported image was never initialized from the imported bo. Cc: 10.2 10.3 <[email protected]> Signed-off-by: Andreas Pokorny <[email protected]> Reviewed-by: Daniel Stone <[email protected]> (cherry picked from commit df341320c9be34c40b76e42510640120e0ebe0d3)
* egl/drm: expose KHR_image_pixmap extensionAndreas Pokorny2014-09-271-0/+1
This change enables EGL_KHR_image_pixmap in the egl drm platform, which is implemented there but has not been advertised yet. Cc: 10.2 10.3 <[email protected]> Signed-off-by: Andreas Pokorny <[email protected]> Reviewed-by: Daniel Stone <[email protected]> (cherry picked from commit 53b614bfd3c12368347b2953121e815add68d90b)
* gallivm: fix idivRoland Scheidegger2014-09-271-7/+5
| | | | | | | | | | | | ffeb77c7b0552a8624e46e65d6347240ac5ae84d had a typo which turned all signed integer divisions into unsigned ones. Oops. This gets us back the 51 little piglits (all from glsl built-in-functions, fs/vs/gs-op-div-int-ivec2 and similar). Cc: "10.2 10.3" <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> (cherry picked from commit 5e1fcc625824ae962d5f658e151e6bc2665adce8)
* gallivm,tgsi: fix idiv by zero crashrconde2014-09-232-7/+25
| | | | | | | | | | | | While the result of signed integer division by zero is undefined by glsl (and doesn't exist with d3d10), we must not crash, so need to make sure we don't get sigfpe much like udiv already does. Unlike udiv where we return 0xffffffff (as required by d3d10) there is no requirement right now to return anything specific so we use zero. (cherry picked from commit ffeb77c7b0552a8624e46e65d6347240ac5ae84d) Nominated-by: Roland Scheidegger <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83570
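A scalar sketch of the behaviour described above follows. The real gallivm code generates vectorized LLVM IR, so this only shows the chosen results; safe_idiv and safe_udiv are hypothetical names. The INT32_MIN / -1 case is not mentioned in the commit message and is handled here only so that the sketch has no undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division that never traps: division by zero yields 0. */
static int32_t safe_idiv(int32_t a, int32_t b)
{
   if (b == 0)
      return 0;
   /* Assumption, not from the commit: avoid the overflowing case. */
   if (a == INT32_MIN && b == -1)
      return INT32_MIN;
   return a / b;
}

/* Unsigned division: division by zero yields 0xffffffff, the value
 * the commit message says d3d10 requires. */
static uint32_t safe_udiv(uint32_t a, uint32_t b)
{
   return b == 0 ? UINT32_MAX : a / b;
}
```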
* clover: Add support to mem objects for multiple destructor callbacks v2Tom Stellard2014-09-232-5/+8
| | | | | | | | | | | | | The spec says that mem objects should maintain a stack of callbacks not just one. v2: - Remove stray printf. Reviewed-by: Francisco Jerez <[email protected]> CC: "10.3" <[email protected]> (cherry picked from commit c6d980140913307d48648058ec24da42a31fc37c)
* mesa: fix prog_optimize.c assertions triggered by SWZ opcodeBrian Paul2014-09-231-5/+4
| | | | | | | | | | | | | | The SWZ instruction can have swizzle terms >4 (SWIZZLE_ZERO, SWIZZLE_ONE). These swizzle terms caused a few assertions to fail. This started happening after the commit "mesa: Actually use the Mesa IR optimizer for ARB programs." when replaying some apitrace files. A new piglit test (tests/asmparsertest/shaders/ARBfp1.0/swz-08.txt) exercises this. Cc: "10.3" <[email protected]> Reviewed-by: Charmaine Lee <[email protected]> (cherry picked from commit 7b2c7032446da4138dedeee8feaa79d741f1f108)
* swrast: Fix handling of MESA_FORMAT_L8A8_SRGB for big-endianRichard Sandiford2014-09-231-3/+3
| | | | | | | | | | | | | Luminance is the least-significant byte of the uint16, rather than the lowest byte in memory. Other parts of mesa already handle this correctly for big-endian, and swrast already handles other MESA_FORMAT_x8y8 formats correctly. This case was just an odd-one-out. Signed-off-by: Richard Sandiford <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]> (cherry picked from commit ecc48f83c8359e3ef64ea40dfb6074f4a1a38dc1)
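The distinction between "least-significant byte" and "lowest byte in memory" can be sketched as below. The helper names are hypothetical and this is not the swrast code; it assumes the texel has already been loaded as a native uint16, so the extraction gives the same result on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Luminance is the least-significant byte of the 16-bit value. */
static uint8_t l8a8_luminance(uint16_t texel)
{
   return (uint8_t)(texel & 0xff);
}

/* Alpha is the most-significant byte of the 16-bit value. */
static uint8_t l8a8_alpha(uint16_t texel)
{
   return (uint8_t)(texel >> 8);
}
```

Reading byte 0 of the texel's storage instead would return luminance on a little-endian host but alpha on a big-endian one, which is the kind of mistake the commit describes.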
* mesa: Fix alpha component in unpack_R8G8B8X8_SRGB.Richard Sandiford2014-09-231-1/+1
| | | | | | | | | | | | The function was using the "X" component as the alpha channel, rather than setting alpha to 1.0. Signed-off-by: Richard Sandiford <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]> (cherry picked from commit 3ff5c6a6c472288fa5f50d880621f38ea94b9c23)
* r300g: set register classes before interferencesConnor Abbott2014-09-161-2/+4
In commit 567e2769b81863b6dffdac3826a6b729ce6ea37c ("ra: make the p, q test more efficient") I unknowingly introduced a new requirement to the register allocator API: the user must set the register class of all nodes before setting up their interferences, because ra_add_conflict_list() now uses the classes of the two interfering nodes. i965 already did this, but r300g was setting up register classes interleaved with setting up the interference graph. This led to us calculating the wrong q total, and in certain cases e78a01d5e6f77e075fe667a0f0ccb10d89c0dd58 ("ra: optimistically color only one node at a time") made it so that this bug caused a segfault. In particular, the error occurred if the q total was decremented to 1 below 0 for the last node to be pushed onto the stack. Since q_total is an unsigned integer, it overflowed to 0xffffffff, which is what lowest_q_total happens to be initialized to. This means that we would fail the "new_q_total < lowest_q_total" check on line 476 of register_allocate.c, and so the node would never be pushed onto the stack, which led to segfaults in ra_select() when we failed to ever give it a register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82828 Cc: "10.3" <[email protected]> Signed-off-by: Connor Abbott <[email protected]> Tested-by: Pavel Ondračka <[email protected]> Reviewed-by: Tom Stellard <[email protected]> (cherry picked from commit afd82dcad127b64381ca6d80d0e499368074f474)
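The wraparound at the heart of this bug can be reproduced in isolation. The sketch below is not code from register_allocate.c; decrement and is_new_lowest are hypothetical stand-ins, and UINT_MAX is used in place of 0xffffffff so that it does not assume a 32-bit unsigned.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Decrementing an unsigned zero wraps around to UINT_MAX. */
static unsigned decrement(unsigned q_total)
{
   return q_total - 1u;
}

/* Stand-in for the "new_q_total < lowest_q_total" comparison. */
static bool is_new_lowest(unsigned new_q_total, unsigned lowest_q_total)
{
   return new_q_total < lowest_q_total;
}
```

With lowest_q_total starting at UINT_MAX, a wrapped q total equals the sentinel, the strict comparison is false, and the candidate is never selected.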
* i965: add support for RGBA dma_buf imports.Gwenole Beauchesne2014-09-161-0/+6
| | | | | | | | | | This allows for importing foreign buffers in RGB32 native endian byte order, i.e. DRM_FORMAT_XBGR8888, and DRM_FORMAT_ABGR8888. Signed-off-by: Gwenole Beauchesne <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit e1c50abf8a0ca1d541c4e2dbd5ed1805ed958ba7)
* i965: Mark delta_x/y as BAD_FILE if remapped away completely.Kenneth Graunke2014-09-162-5/+15
Commit afe3d1556f6b77031f7025309511a0eea2a3e8df (i965: Stop doing remapping of "special" regs.) stopped remapping delta_x/delta_y, and additionally stopped considering them always-live. We later realized delta_x was used in register allocation, so we actually needed to remap it, which was fixed in commit 23d782067ae834ad53522b46638ea21c62e94ca3 (i965/fs: Keep track of the register that hold delta_x/delta_y.). However, that commit didn't restore the "always consider it live" part. If all the code using delta_x was eliminated, fs_visitor::delta_x would be left pointing at its old register number. Later code in register allocation would handle that register number specially...even though it wasn't actually delta_x. To combat this, set delta_x/y to BAD_FILE if they're eliminated, and check for that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit 78bd12619474e98503965541c61c5d7e9c408110)
* gallivm: Fix uses of 2^24Richard Sandiford2014-09-161-4/+4
| | | | | | | | | | | Fallback cases in lp_bld_arit.c used 2^24 to mean "2 to the power 24", but in C it's "2 xor 24", i.e. 26. Fixed by using 1<< instead. Signed-off-by: Richard Sandiford <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Cc: "10.2 10.3" <[email protected]> Signed-off-by: Dave Airlie <[email protected]> (cherry picked from commit 1a65629ccc590fe04a97b6df63d73e349b793619)
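The pitfall is easy to reproduce in plain C. The sketch below is not taken from lp_bld_arit.c; both function names are hypothetical, and the operands are held in variables only to keep compilers from warning about the deliberate mistake.

```c
#include <assert.h>

/* In C, '^' is bitwise XOR, not exponentiation: 2 XOR 24 is 26. */
static unsigned xor_not_power(void)
{
   unsigned base = 2;
   unsigned exponent = 24;
   return base ^ exponent;
}

/* Shifting 1 left by n gives 2 to the power n (for n below 32). */
static unsigned two_to_the(unsigned n)
{
   assert(n < 32);
   return 1u << n;
}
```

Here xor_not_power() returns 26, while two_to_the(24) returns 16777216.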
* nouveau: change internal variables to avoid conflicts with macro argsIlia Mirkin2014-09-161-10/+10
| | | | | | | | Reported by Coverity Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit b13a4ca3f7f622cbf688eec14d3f4156533af44e)
* mesa: fix _mesa_free_pipeline_data() use-after-free bugBrian Paul2014-09-161-2/+2
| | | | | | | | | | | | | | Unreference the ctx->_Shader object before we delete all the pipeline objects in the hash table. Before, ctx->_Shader could point to freed memory when _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL) was called. Fixes crash when exiting the piglit rendezvous_by_location test on Windows. Cc: [email protected] Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 0d73ac6b02cac46d4a8f3cd1ffa591e071577fa7)
* gallium/util: add missing u_debug includeAndreas Boll2014-09-161-0/+1
| | | | | | | | | | | | | | | | | | | | Needed for assert. Fixes build on BE archs with -Werror=implicit-function-declaration. In file included from ../../../../../src/gallium/auxiliary/draw/draw_fs.c:30:0: ../../../../../src/gallium/auxiliary/util/u_math.h: In function 'util_memcpy_cpu_to_le32': ../../../../../src/gallium/auxiliary/util/u_math.h:810:4: error: implicit declaration of function 'assert' [-Werror=implicit-function-declaration] assert(n % 4 == 0); ^ Cc: "10.3" <[email protected]> Signed-off-by: Andreas Boll <[email protected]> Reviewed-by: Marek Olšák <[email protected]> (cherry picked from commit 2a13ff954d3d8cea73bbcf728edffa867828cb78)
* nouveau: only enable stencil func if the visual has stencil bitsIlia Mirkin2014-09-162-2/+2
| | | | | | | | The _Enabled property already has the relevant information. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 3c81de58512f0615df1d90aa79a22c9a44c7189e)
* nouveau: only enable the depth test if there actually is a depth bufferIlia Mirkin2014-09-165-4/+9
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 79959e5de518c59b327a9df4a6fa80a68213b873)
* nouveau: remove unneeded assertMaarten Lankhorst2014-09-161-1/+0
| | | | | | | | | No idea why it was added, but the code runs fine even on videos where it triggers. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 8ab85bfcd5ddd44c50e5b384222731cb2a1a1496)
* nouveau: rework reference frame handlingMaarten Lankhorst2014-09-163-4/+37
| | | | | | | | | | | | | Fixes a regression from "nouveau/vdec: small fixes to h264 handling" New picking order for frames: 1. Vidbuf pointer matches. 2. Take the first kicked ref. 3. If that fails, take a ref that has a different last_used. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit a41aad843108cec1901c88a76d5ceb4ede2e062b)
* nouveau: fix MPEG4 hw decodingMaarten Lankhorst2014-09-161-3/+3
| | | | | | | | Reorder some fields to make I-frame decoding work correctly. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit 121ceb38f45daacc938349d9d5aa82776b78dbab)
* nouveau: re-allocate bo's on overflowMaarten Lankhorst2014-09-164-11/+87
| | | | | | | | | | The BSP bo might be too small to contain all of the bsp data, bump its size on overflow. Also bump inter_bo when this happens, it might be too small otherwise. Signed-off-by: Maarten Lankhorst <[email protected]> Cc: "10.2 10.3" <[email protected]> (cherry picked from commit f6afed7076a6ef446dbec7cb10c8f8c60efafccd)
* i965/vec4: Only examine virtual_grf_end for GRF sourcesIan Romanick2014-09-161-8/+12
| | | | | | | | | | | | | | If the source is not a GRF, it could have a register >= virtual_grf_count. Accessing virtual_grf_end with such a register would lead to out-of-bounds access. Make sure the source is a GRF before accessing virtual_grf_end. Fixes Valgrind complaints while compiling some shaders. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected] (cherry picked from commit 7aeb853c90c2e84fdd4b6b0af97566562c912861)
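The guard described above might look roughly like the following sketch. The types and the src_live_end function are hypothetical simplifications rather than the i965 vec4 code; the point is only that the array is indexed after the register file has been checked.

```c
#include <assert.h>

/* Hypothetical, reduced set of register files. */
enum reg_file { BAD_FILE, GRF, UNIFORM };

struct src_reg {
   enum reg_file file;
   unsigned reg;
};

/* Return the end of the live range for a GRF source, or -1 for any
 * other file. Non-GRF register numbers may exceed virtual_grf_count,
 * so they must never be used to index virtual_grf_end. */
static int src_live_end(const struct src_reg *src,
                        const int *virtual_grf_end,
                        unsigned virtual_grf_count)
{
   if (src->file != GRF)
      return -1;

   assert(src->reg < virtual_grf_count);
   return virtual_grf_end[src->reg];
}
```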
* i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams.Iago Toral Quiroga2014-09-161-4/+9
So far we have been using CL_INVOCATION_COUNT to resolve this query but this is no good with streams, as only stream 0 reaches the clipping stage. From ARB_transform_feedback3: "When a generated primitive query for a vertex stream is active, the primitives-generated count is incremented every time a primitive emitted to that stream reaches the Discarding Rasterization stage (see Section 3.x) right before rasterization. This counter is incremented whether or not transform feedback is active." Unfortunately, we don't have any registers that provide the number of primitives written to a specific stream other than the ones that track the number of primitives written to transform feedback in the SOL stage, so we can't implement this exactly as specified. In the past we implemented this feature by activating the SOL unit even if transform feedback was disabled, but making it so that all buffers were disabled and it only recorded statistics, which gave us the right semantics (see 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5). Unfortunately, this came with a significant performance impact and had to be reverted. This new take does not intend to implement the exact semantics required by the spec, but improves what we have now, since now we return the primitive count for stream 0 in all cases. With this patch we use GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries for non-zero streams. This would return the number of primitives written to transform feedback for each stream instead. Since non-zero streams are only useful in combination with transform feedback this should not be too bad, and the only case that I think we would not be supporting would be the one in which we want to use both GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to detect buffer overflow.
This patch also fixes the following piglit test: arb_gpu_shader5-xfb-streams-without-invocations This test uses both GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it never hits the overflow case, so both queries are always expected to return the same value. Reviewed-by: Kenneth Graunke <[email protected]> Cc: "10.3" <[email protected]> (cherry picked from commit f976b4c1bf2271cf986be8204147ae986380cc91) Nominated-by: Kenneth Graunke <[email protected]>
* glsl: Speed up constant folding for swizzles.Kenneth Graunke2014-09-121-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ir_rvalue::constant_expression_value() recursively walks down an IR tree, attempting to reduce it to a single constant value. This is useful when you want to know whether a variable has a constant expression value at all, and if so, what it is. The constant folding optimization pass attempts to replace rvalues with their constant expression value from the bottom up. That way, we can optimize subexpressions, and ideally stop as soon as we find a non-constant subexpression. In order to obtain the actual value of an expression, the optimization pass calls constant_expression_value(). But it should only do so if it knows the value can be combined into a constant. Otherwise, at each step of walking back up the tree, it will walk down the tree again, only to discover what it already knew: it isn't constant. We properly avoided this call for ir_expression nodes, but not for ir_swizzle nodes. This patch fixes that, drastically reducing compile times on certain shaders where tree grafting has given us huge expression trees. It also fixes SuperTuxKart. Thanks to Iago and Mike for help in tracking this down. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: [email protected] (cherry picked from commit 84a40ce86b1010873b194eb9bf0b8744234b829c)
* i965/vec4: Make type_size() return 0 for samplers.Kenneth Graunke2014-09-121-3/+3
| | | | | | | | | | | | | | | The FS backend has always used 0, and the VS backend has always used 1. I think 1 is just working around other problems, and is incorrect. Samplers are baked in; nothing uses the UNIFORM register we would create, and we shouldn't upload any constant values for them. Fixes ES3-CTS.shaders.struct.uniform.sampler_array_vertex. Signed-off-by: Kenneth Graunke <[email protected]> Cc: [email protected] Reviewed-by: Ian Romanick <[email protected]> Tested-by: Ian Romanick <[email protected]> (cherry picked from commit 7865026c04f6cc36dc81f993bc32ddda2806ecb5)
* i965: Skip allocating UNIFORM file storage for uniforms of size 0.Kenneth Graunke2014-09-122-6/+6
| | | | | | | | | | | | | | | | | | | | | | Samplers take up zero slots and therefore don't exist in the params array, nor are they included in stage_prog_data->nr_params. There's no need to store their size in param_size, as it's only used for dealing with arrays of "real" uniforms (ones uploaded as shader constants). We run into all kinds of problems trying to refer to the uniform storage for variables that don't have uniform storage. For one, we may use some other variable's index, or access out of bounds in arrays. In the FS backend, our extra 2 * MaxSamplerImageUnits params for texture rectangle rescaling paper over a lot of problems. In the VS backend, we claim samplers take up a slot, which also papers over problems. Instead, just skip allocating storage for variables that don't have any. Signed-off-by: Kenneth Graunke <[email protected]> Cc: [email protected] Reviewed-by: Ian Romanick <[email protected]> Tested-by: Ian Romanick <[email protected]> (cherry picked from commit 2408f166db1d81f2e9cc86b3f413ddba5ba537fa)
* i965: Disable guardband clipping in the smaller-than-viewport case.Kenneth Graunke2014-09-121-0/+31
Apparently guardband clipping doesn't work like we thought: objects entirely outside the guardband are trivially rejected, regardless of their relation to the viewport. Normally, the guardband is larger than the viewport, so this is not a problem. However, when the viewport is larger than the guardband, this means that we would discard primitives which were wholly outside of the guardband, but still visible. We always program the guardband to 8K x 8K to enforce the restriction that the screenspace bounding box of a single triangle must be no more than 8K x 8K. So, if the viewport is larger than that, we need to disable guardband clipping. Fixes ES3 conformance tests: - framebuffer_blit_functionality_negative_height_blit - framebuffer_blit_functionality_negative_width_blit - framebuffer_blit_functionality_negative_dimensions_blit - framebuffer_blit_functionality_magnifying_blit - framebuffer_blit_functionality_multisampled_to_singlesampled_blit v2: Mention the acronym expansion for TA/TR/MC in the comments. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> (cherry picked from commit 0bac2551e40410e2251daf4fd9faf69310ab34ce)
* i965: Separate gl_InstanceID and gl_VertexID uploading.Kenneth Graunke2014-09-125-16/+42
| | | | | | | | | | | | | | | | | We always uploaded them together, mostly out of laziness - both required an additional vertex element. However, gl_VertexID now also requires an additional vertex buffer for storing gl_BaseVertex; for non-indirect draws this also means uploading (a small amount of) data. This is extra overhead we don't need if the shader only uses gl_InstanceID. In particular, our clear shaders currently use gl_InstanceID for doing layered clears, but don't need gl_VertexID. Signed-off-by: Kenneth Graunke <[email protected]> Cc: "10.3" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Ian Romanick <[email protected]> (cherry picked from commit 6b6145204dd4a1112f6e1fe10162636141495b79)
* i965: Fix reference counting in new basevertex upload code.Kenneth Graunke2014-09-121-0/+3
| | | | | | | | | | | | | | | | | | | | | In the non-indirect draw case, we call intel_upload_data to upload gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload buffer, and increments the upload BO reference count. So, we need to unreference it when making brw->draw.draw_params_bo point at something else, or else we'll retain a reference to stale upload buffers and hold on to them forever. This also means that the indirect case should increment the reference count on the indirect draw buffer when making brw->draw.draw_params_bo point at it. That way, both paths increment the reference count, so we can safely unreference it every time. Signed-off-by: Kenneth Graunke <[email protected]> Cc: "10.3" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Ian Romanick <[email protected]> (cherry picked from commit e980fe607155c79ccba56ef78854093b7730bef6)
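The balanced pattern described above, where every path that stores a buffer takes a reference and the previous buffer is always released, can be sketched as follows. struct bo and set_draw_params_bo are hypothetical simplifications, not the i965 or libdrm API, and a real implementation would free the buffer when its count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer object that only tracks its reference count. */
struct bo {
   int refcount;
};

/* Point *slot at new_bo. The new buffer gains a reference before the
 * old one loses its own, so re-assigning the same buffer is safe and
 * no stale buffer stays referenced by the slot. */
static void set_draw_params_bo(struct bo **slot, struct bo *new_bo)
{
   if (new_bo != NULL)
      new_bo->refcount++;

   if (*slot != NULL) {
      assert((*slot)->refcount > 0);
      (*slot)->refcount--;
   }

   *slot = new_bo;
}
```

Because both the upload path and the indirect path would go through the same helper in this sketch, the slot can be unreferenced unconditionally each time it changes.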
* i965: Request lowering gl_VertexIDIan Romanick2014-09-121-0/+1
| | | | | | | | | | | | | | Fixes the (new) piglit tests gles-3.0-drawarrays-vertexid, gl-3.0-multidrawarrays-vertexid, and gl-3.2-basevertex-vertexid. Fixes gles3conform failure in: ES3-CTS.gtf.GL3Tests.transform_feedback.transform_feedback_vertex_id Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80247 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit 927f5db46135b3eb63f401833b1e40a3be9ca4e0)