Commit message | Author | Age | Files | Lines
* draw/llvmpipe: fix transform feedback position + enable other extensions | Dave Airlie | 2012-12-14 | 6 | -8/+27
    This builds on the previous draw/softpipe patch.

    llvmpipe does streamout calls after the clip/viewport stages, but we have the pre-clip position stored for later use. So when we are doing transform feedback and the output is the position, grab it from the stored pre-clip position.

    The perfect fix is probably to add a codegen transform feedback stage in between the shader and clip stages, but this is good enough for now.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* draw: add support for later transform feedback extensions | Dave Airlie | 2012-12-14 | 3 | -6/+17
    This adds support to draw for the new features of transform feedback.

    a) fix count_from_stream_output, using max_index+1 for now, but it looks like it should be valid as it's derived from the vertex elements/vbo.
    b) fix striding and dst offsets in output buffers - was just wrong before.
    c) fix crash if tfb is suspended (so.num_targets == 0)

    This also enables the new features on softpipe. It should be possible to enable them on llvmpipe as well after this commit, but that would need piglit runs to be scheduled.

    Signed-off-by: Dave Airlie <[email protected]>
* clover: Fix build since removal of pipe_surface::usage | Tom Stellard | 2012-12-13 | 1 | -1/+0
    by commit 25409c6da8163d9acb386511aef0c11577c7aadb
* r600g/radeonsi: Silence warnings | Maxence Le Dore | 2012-12-13 | 5 | -30/+49
    Reviewed-by: Tom Stellard <[email protected]>
* clover: Add support for compiler flags | Tom Stellard | 2012-12-13 | 5 | -12/+71
    Reviewed-by: Francisco Jerez <[email protected]>
* clover: Don't erase build info of devices not being built | Tom Stellard | 2012-12-13 | 1 | -2/+2
    Every call to _cl_program::build() was erasing the binaries and logs for every device associated with the program. This is incorrect because it is possible to build a program for only a subset of devices, so any device not being built should not have this information erased.

    Reviewed-by: Francisco Jerez <[email protected]>
* r600g: use load_ar checks with llvm output. | Vincent Lejeune | 2012-12-13 | 1 | -0/+6
    Reviewed-by: Tom Stellard <[email protected]>
* build: Fix AX_PROG_{CC,CXX}_FOR_BUILD macros | Thierry Reding | 2012-12-13 | 2 | -52/+23
    Override the cross_compiling and ac_tool_prefix variables by reassigning to them instead of redefining the macros. Redefining them will actually cause the variable names to be replaced instead of their content.

    Furthermore, push the definition of CPPFLAGS before running the checks for the build tools to keep the host CPPFLAGS from leaking into the build CPPFLAGS.

    While at it, drop the redefinition of AC_TRY_COMPILER, which hasn't been used since autoconf 2.50, and make sure that all definitions are properly popped when done (LDFLAGS, ac_cv_prog_CPP, ac_cv_prog_CXXCPP).

    Acked-by: Matt Turner <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>
* gallivm: fix texel fetch for array textures | Roland Scheidegger | 2012-12-13 | 1 | -17/+38
    Since we don't call lp_build_sample_common() in the texel fetch path, we missed the layer fixup code. If someone had tried to do texelFetch with array textures, it would have crashed for sure.

    Not really tested (llvmpipe can't yet run the piglit test that uses texelFetch with array samplers).

    Reviewed-by: José Fonseca <[email protected]>
* mesa: Fix computation of default vertex attrib stride for 2_10_10_10 formats. | Paul Berry | 2012-12-13 | 3 | -1/+47
    Previously, if the client program didn't specify a stride when setting up a vertex attribute, we used _mesa_sizeof_type() to compute the size of the type, and multiplied it by the number of components. This didn't work for the 2_10_10_10 formats, since _mesa_sizeof_type() returns -1 for those types, resulting in all kinds of havoc, since it was causing the hardware to be programmed with a negative stride value.

    This patch adds a new function _mesa_bytes_per_vertex_attrib(), which is similar to the existing function _mesa_bytes_per_pixel(), but which computes the size of a vertex attribute based on the type and the number of components. For packed formats (currently only the 2_10_10_10 formats), it verifies that the number of components is correct and returns the size of the packed format. For unpacked formats, it returns the size of the type times the number of components.

    In addition, this patch adds an assertion so that if we ever forget to update _mesa_bytes_per_vertex_attrib() when adding a new vertex format, we'll see the problem quickly rather than having to debug a subtle conformance test failure.

    Fixes GLES3 conformance tests vertex_type_2_10_10_10_rev_{conversion,divisor,stride_pointer}.test.

    Reviewed-by: Brian Paul <[email protected]>
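The size computation described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the `attrib_type` enum, `bytes_per_vertex_attrib()` and `default_stride()` are hypothetical names, and only a few types are covered.

```c
#include <assert.h>

/* Hypothetical type tags for illustration; these are not Mesa's GLenum values. */
enum attrib_type {
    ATTRIB_BYTE,
    ATTRIB_SHORT,
    ATTRIB_FLOAT,
    ATTRIB_INT_2_10_10_10_REV,
    ATTRIB_UNSIGNED_INT_2_10_10_10_REV
};

/* Returns the size in bytes of one vertex attribute, or -1 if the
 * type/component-count combination is invalid. */
static int bytes_per_vertex_attrib(int comps, enum attrib_type type)
{
    switch (type) {
    case ATTRIB_BYTE:
        return comps * 1;
    case ATTRIB_SHORT:
        return comps * 2;
    case ATTRIB_FLOAT:
        return comps * 4;
    case ATTRIB_INT_2_10_10_10_REV:
    case ATTRIB_UNSIGNED_INT_2_10_10_10_REV:
        /* Packed: all four components share one 32-bit word. */
        return comps == 4 ? 4 : -1;
    }
    return -1;
}

/* Stride to use when the client did not specify one. */
static int default_stride(int comps, enum attrib_type type)
{
    int size = bytes_per_vertex_attrib(comps, type);
    assert(size > 0); /* trips if the helper does not know the format */
    return size;
}
```

With this shape, a packed attribute yields a 4-byte stride instead of a negative one, and an unhandled format trips the assertion right away.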
* mesa/uniform_query: Don't write to *params if there is an error | Matt Turner | 2012-12-13 | 1 | -1/+5
    The GL 3.1 and ES 3.0 specs say of glGetActiveUniformsiv:

        "If an error occurs, nothing will be written to params."

    So, make a pass through the indices and check that they're valid before the pass that actually writes to params. Checking pname happens on the first iteration of the second loop.

    Fixes es3conform's getactiveuniformsiv_for_nonexistent_uniform_indices test.

    NOTE: This is a candidate for the 9.0 branch.

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
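The validate-then-write pattern from this message can be sketched as below. It is a simplified illustration under assumed names: `get_values()` and its parameters are hypothetical and stand in for the real query, which also validates pname.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical query: copies values[indices[i]] into params[i].
 * Returns false and leaves params untouched if any index is invalid. */
static bool get_values(const int *values, size_t num_values,
                       const unsigned *indices, size_t count,
                       int *params)
{
    /* Pass 1: validate every index before touching the output. */
    for (size_t i = 0; i < count; i++) {
        if (indices[i] >= num_values)
            return false;
    }

    /* Pass 2: all indices are known to be valid, so writing is safe. */
    for (size_t i = 0; i < count; i++)
        params[i] = values[indices[i]];

    return true;
}
```

A single loop that validated and wrote at the same time would already have filled the first few entries of `params` before it reached a bad index, which is what the spec language forbids.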
* mesa: print unsigned values with %u | Matt Turner | 2012-12-13 | 1 | -4/+4
    Otherwise messages say silly things like

        glGetActiveUniformBlockiv(block index -1 >= 0)

    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
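As a small illustration of the fix, the sketch below formats an out-of-range message with `%u`, so an unsigned index is never shown as a negative number. The function name and message text are hypothetical, not the strings used in Mesa.

```c
#include <stdio.h>

/* Formats an "index out of range" message using %u for both unsigned
 * values. Returns what snprintf returns. */
static int format_index_error(char *buf, size_t len,
                              unsigned index, unsigned limit)
{
    return snprintf(buf, len, "block index %u >= %u", index, limit);
}
```

Printing the same arguments with `%d` is what produces output such as "block index -1 >= 0" when the index is a large unsigned value.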
* i965: Fix disassembly of jump targets on Gen7. | Kenneth Graunke | 2012-12-12 | 1 | -4/+9
    Gen7 stores the JIP/UIP bits in different places.

    Reviewed-by: Eric Anholt <[email protected]>
* i965: Make try_rewrite_rhs_to_dst compare VGRF size to regs written. | Kenneth Graunke | 2012-12-12 | 1 | -1/+1
    try_rewrite_rhs_to_dst is a quick optimization to avoid generating new temporaries (and MOVs from those temporaries to the dest) for every expression tree we visit. By generating better code in simple cases, we reduce the burden on later optimization passes like register coalescing.

    Previously, we compared inst->regs_written() to lhs->vector_elements to make sure the instruction generating our value wrote the same number of components as our destination register.

    However, this fails in some cases. One example is texturing (which produces a vec4) into gl_FragData[i]. Technically, gl_FragData[i] is also a vec4. However, the destination VGRF actually has size 4n (where n is the size of the array). split_virtual_grfs() can't split VGRFs that are used by SEND messages which require contiguous destination registers (like texturing), and register allocation needs all VGRFs to have sizes between 1 and 4.

    Amnesia: The Dark Descent hits this case: a texturing instruction (4 components) gets rewritten to the gl_FragData output register (which was 4*3 = 12 components), causing the register allocator to hit the "we rely on split_virtual_grfs" assertion.

    This makes it possible to play Amnesia.

    Reviewed-by: Eric Anholt <[email protected]>
* configure.ac: Disable compiler optimizations when --enable-debug is set | Emil Velikov | 2012-12-12 | 1 | -4/+4
    Signed-off-by: Emil Velikov <[email protected]>
    Reviewed-by: Dan Nicholson <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
* softpipe: remove unused corner0 variable | Brian Paul | 2012-12-12 | 1 | -1/+0
* llvmpipe: remove unneeded draw_flush() call | Brian Paul | 2012-12-12 | 1 | -2/+0
    This is redundant since we're calling draw_bind_fragment_shader() which already does a flush.

    v2: the redundant flush in llvmpipe_set_constant_buffer() has already been removed by commit 3427466e6dbbb8db7c1ecda6b3859ca1cc5827a3

    Reviewed-by: José Fonseca <[email protected]>
* r600g: suballocate memory for fetch shaders from a large buffer | Marek Olšák | 2012-12-12 | 6 | -19/+37
    Fetch shaders are usually destroyed at the context destruction by the state tracker, so we can put them all in a large buffer without wasting memory.

    This reduces the number of relocations sent to the kernel a little bit.

    Tested-by: Aaron Watry <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
* r600g: suballocate memory for the STRMOUT_BUFFER_FILLED_SIZE register | Marek Olšák | 2012-12-12 | 5 | -16/+28
    Instead of having a 4-byte buffer for each streamout target, we suballocate each dword from a 4K buffer.

    This further reduces the overall number of relocations.

    Tested-by: Aaron Watry <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
* gallium/util: add a simple allocator for suballocating from a large buffer | Marek Olšák | 2012-12-12 | 3 | -0/+181
    Tested-by: Aaron Watry <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
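The idea behind suballocating from a large buffer can be sketched as a bump allocator, as below. This is a hedged illustration only: the struct and function names are hypothetical and do not mirror the gallium/util interface, and where this sketch simply fails on a full buffer, a real allocator would likely switch to a fresh one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump suballocator over a single backing buffer. */
struct suballocator {
    unsigned char *buffer; /* backing storage */
    size_t size;           /* total size of the backing storage */
    size_t alignment;      /* power-of-two alignment of each range */
    size_t offset;         /* next free byte */
};

static bool suballoc_init(struct suballocator *a, size_t size, size_t alignment)
{
    a->buffer = malloc(size);
    a->size = size;
    a->alignment = alignment;
    a->offset = 0;
    return a->buffer != NULL;
}

/* Returns the offset of the new range inside a->buffer,
 * or (size_t)-1 if the request does not fit. */
static size_t suballoc_alloc(struct suballocator *a, size_t bytes)
{
    /* Round the current offset up to the requested alignment. */
    size_t start = (a->offset + a->alignment - 1) & ~(a->alignment - 1);

    if (bytes > a->size || start > a->size - bytes)
        return (size_t)-1;

    a->offset = start + bytes;
    return start;
}

static void suballoc_destroy(struct suballocator *a)
{
    free(a->buffer);
    a->buffer = NULL;
}
```

With a 4096-byte buffer and 4-byte alignment, successive 4-byte requests land at offsets 0, 4, 8 and so on, so many small objects share one buffer and therefore one relocation.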
* r600g: use u_upload_mgr for allocating staging transfer buffers | Marek Olšák | 2012-12-12 | 1 | -15/+15
    u_upload_mgr suballocates memory from a large buffer and maps the allocated range (unsynchronized), which is perfect for short-lived staging buffers.

    This reduces the number of relocations sent to the kernel.

    Tested-by: Aaron Watry <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
* winsys/radeon: don't use BIND flags, add a flag for the cache bufmgr instead | Marek Olšák | 2012-12-12 | 12 | -28/+30
* st/dri: add a way to force MSAA on with an environment variable | Marek Olšák | 2012-12-12 | 1 | -4/+39
    There are 2 ways. I prefer the former:

        GALLIUM_MSAA=n
        __GL_FSAA_MODE=n

    Tested with ETQW, which doesn't support MSAA on Linux. This is the only way to get MSAA there.

    Reviewed-by: Brian Paul <[email protected]>
* mesa: don't advertise ARB_texture_buffer_object in legacy contexts | Marek Olšák | 2012-12-12 | 7 | -20/+23
    Reviewed-by: Ian Romanick <[email protected]>
* mesa: disallow creation of GL 3.1 compatibility contexts | Marek Olšák | 2012-12-12 | 3 | -9/+8
    Death to driver-specific hacks!

    Reviewed-by: Ian Romanick <[email protected]>
* gallium: remove pipe_surface::usage | Marek Olšák | 2012-12-12 | 62 | -126/+33
    Not really used by anybody now.

    Reviewed-by: Brian Paul <[email protected]>
* svga: stop using pipe_surface::usage | Marek Olšák | 2012-12-12 | 1 | -15/+7
    There are only 2 possible usages: render target and depth stencil. Both can be derived from the surface format, so the flag is redundant. And it's going away...

    Reviewed-by: Brian Paul <[email protected]>
* gallium/util: move util_try_blit_via_copy_region to u_surface.c | Marek Olšák | 2012-12-12 | 5 | -158/+164
    Reviewed-by: Brian Paul <[email protected]>
* gallium/cso: don't use the pipe_error return type where it's not needed | Marek Olšák | 2012-12-12 | 2 | -41/+24
    Reviewed-by: Brian Paul <[email protected]>
* gallium: manage render condition in cso_context and fix postprocessing w/ it | Marek Olšák | 2012-12-12 | 9 | -33/+43
    Reviewed-by: Brian Paul <[email protected]>
* st/mesa: remove a weird msaa hack | Marek Olšák | 2012-12-12 | 4 | -29/+2
    It doesn't work and it's not clear how it's supposed to work.

    Reviewed-by: Brian Paul <[email protected]>
* softpipe: implement seamless cubemap support. (v1.1) | Dave Airlie | 2012-12-12 | 2 | -9/+139
    This adds seamless sampling for cubemap boundaries if requested.

    The corner case averaging is messy but seems like it should be spec compliant. The face direction stuff is also a bit messy, I've no idea if that could or should be simpler, or even if all my directions are fully correct!

    v1.1: update comments, drop unneeded seamless calls for nearest, fix if statement layout.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* gallium: fix cap warnings for tbo cap. | Dave Airlie | 2012-12-12 | 3 | -0/+3
    Signed-off-by: Dave Airlie <[email protected]>
* glsl_to_tgsi: emit multi-level structs and arrays properly. | Dave Airlie | 2012-12-12 | 1 | -9/+42
    This follows the code from the i965 driver, and emits the structs and arrays recursively.

    This fixes an assert in the two UBO tests fs-struct-copy-complicated and vs-struct-copy-complicated.

    These tests now pass on softpipe, with no regressions.

    Signed-off-by: Dave Airlie <[email protected]>
* llvmpipe: don't use user constant buffers | Brian Paul | 2012-12-11 | 1 | -1/+2
    This fixes some use-after-free issues. I haven't measured any real performance difference with a handful of Mesa demos.

    Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: support pipe_resource-based constant buffers | Brian Paul | 2012-12-11 | 7 | -34/+48
    Before this we only supported user-based constant buffers.

    First, we basically plumb pipe_constant_buffer objects through llvmpipe rather than pipe_resource objects.

    Second, update llvmpipe_set_constant_buffer() and try_update_scene_state() so they understand both resource- and user-based constant buffers.

    The problem with user constant buffers is the potential for use-after-free, as seen in some WebGL tests. The next patch will flip the switch for resource-based const buffers.

    Reviewed-by: Jose Fonseca <[email protected]>
* util: add util_copy_constant_buffer() helper function | Brian Paul | 2012-12-11 | 1 | -0/+20
    Reviewed-by: Jose Fonseca <[email protected]>
* i965/fs: Improve performance of shaders that start out with a discard. | Eric Anholt | 2012-12-11 | 6 | -7/+148
    I had tried this in the past, but ran into trouble with applications that sample from undiscarded pixels in the same subspan. To fix that issue, only jump to the end for an entire subspan at a time.

    Improves GLbenchmark 2.7 (1024x768) performance by 7.9 +/- 1.5% (n=8).

    v2: Drop the br variable in the jump instruction -- if I ever do jumps pre-gen6, it'll be a different code block anyway since we don't have HALT until gen6.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rewrite discards to use a flag subreg to track discarded pixels. | Eric Anholt | 2012-12-11 | 8 | -73/+46
    This makes much more sense on gen6+, and will also prove useful for early exit of shaders on discard.

    v2: fix up a stale comment from before converting gen4-5.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add an instruction flag for choosing the flag subregister. | Eric Anholt | 2012-12-11 | 6 | -13/+42
    We're going to redo discard handling to track discards in the other flag subregister, saving instructions in the discard and allowing predicated jumps out to the end of the shader.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Let brw_flag_reg() choose the flag reg and subreg. | Eric Anholt | 2012-12-11 | 4 | -7/+7
    We're about to start using the f0.1 subregister.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Print the flag reg updated by conditional modifiers. | Eric Anholt | 2012-12-11 | 1 | -1/+15
    This makes our output more consistent with other disasm tools, and will be necessary when we start using f0.1.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add the new flag_reg_nr instruction field from IVB. | Eric Anholt | 2012-12-11 | 2 | -5/+9
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Correct the name and usage of the flag subregister number field. | Eric Anholt | 2012-12-11 | 4 | -15/+15
    We've been calling it a register number, but it's actually the subregister, and things will get confusing once we start using it if it isn't fixed.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove bogus flag_reg_nr field from bits3. | Eric Anholt | 2012-12-11 | 1 | -4/+2
    There's a flag subreg nr field in bits2 next to src0.vertstride, but there shouldn't be anything in bits3 next to src1.vertstride.

    Reviewed-by: Kenneth Graunke <[email protected]>
* st/egl/drm: only unref the udev device if needed | Tobias Droste | 2012-12-11 | 1 | -4/+5
    Fixes compiler warning:

        drm/native_drm.c: In function ‘native_create_display’:
        drm/native_drm.c:180:21: warning: ‘device’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        drm/native_drm.c:157:24: note: ‘device’ was declared here

    Signed-off-by: Tobias Droste <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
* softpipe: Use os_time_get_nano() everywhere. | José Fonseca | 2012-12-11 | 2 | -5/+5
* clover: Install CL headers. | Johannes Obermayr | 2012-12-10 | 1 | -0/+10
    Note: This is a candidate for the stable branches.
* gallivm: Lower TGSI_OPCODE_MUL to fmul by default | Tom Stellard | 2012-12-10 | 1 | -2/+3
    This fixes a number of crashes on r600g due to the fact that lp_build_mul assumes vector types when optimizing mul to bit shifts.

    This bug was uncovered by 0ad1fefd6951aa47ab58a41dc9ee73083cbcf85c
* llvmpipe: fix txq for 1d/2d arrays. (v3) | Dave Airlie | 2012-12-11 | 1 | -2/+15
    Noticed this would fail; we were doing two things wrong:

    a) 1d arrays require the layers in height
    b) minifying the layers field.

    v2: don't change height code, fixup completely inside txq as suggested by Roland.
    v3: just add minify before texture array size

    v1: Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
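The two points in this message match how GLSL's textureSize behaves for array textures: the layer count is returned right after the spatial dimensions and is not reduced by the mip level. A minimal sketch, using hypothetical helper names rather than the llvmpipe code, and assuming the level is small enough for a valid shift:

```c
/* Size of one dimension at a given mip level: halve per level, never below 1.
 * Assumes level is less than the bit width of unsigned. */
static unsigned minify(unsigned size, unsigned level)
{
    unsigned s = size >> level;
    return s ? s : 1;
}

struct size_1d_array { unsigned width, layers; };
struct size_2d_array { unsigned width, height, layers; };

/* 1D array: the second result component carries the layer count. */
static struct size_1d_array query_1d_array_size(unsigned width, unsigned layers,
                                                unsigned level)
{
    struct size_1d_array r = { minify(width, level), layers };
    return r;
}

/* 2D array: width and height shrink with the level, layers do not. */
static struct size_2d_array query_2d_array_size(unsigned width, unsigned height,
                                                unsigned layers, unsigned level)
{
    struct size_2d_array r = { minify(width, level), minify(height, level), layers };
    return r;
}
```

For a 64x32 texture with 6 layers, a query at level 2 reports 16x8 with 6 layers, since only the spatial dimensions are minified.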