summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* glsl: Apply the transformation "(a || a) -> a" in opt_algebraic.Eric Anholt2013-11-151-1/+3
| | | | | | | | | | | total instructions in shared programs: 1732385 -> 1732373 (-0.00%) instructions in affected programs: 416 -> 404 (-2.88%) GAINED: 0 LOST: 0 (That's 4 already-short fragment shaders in dota2) Reviewed-by: Jordan Justen <[email protected]>
* glsl: Move the CSE equality functions to the ir class.Eric Anholt2013-11-154-179/+222
| | | | | | | | I want to reuse them in opt_algebraic. v2: Merge in Chris Forbes's break fix. Reviewed-by: Jordan Justen <[email protected]>
* clover: Remove dead file from Makefile.sources.Matt Turner2013-11-151-1/+0
| | | | | | Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965: Rework brw_new_batch to actually start a new batch.Kenneth Graunke2013-11-151-4/+5
| | | | | | | | | | | | | | | Previously, brw_new_batch was called just after execbuf, but before intel_batchbuffer_reset. Essentially, it prepared for the creation of a new batch, that wasn't yet available, and which it didn't create. This was a bit awkward. This patch makes brw_new_batch call intel_batchbuffer_reset as the very first operation. This means that brw_new_batch actually creates a new batchbuffer, and thus has it available. It brings the creation of the new batchbuffer and BRW_NEW_BATCH flagging together into one place. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move cache_used_by_gpu flag setting to brw_finish_batch.Kenneth Graunke2013-11-151-6/+6
| | | | | | | It really makes more sense here. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i915: Actually enable __DRI2rendererQueryExtensionRecIan Romanick2013-11-151-0/+1
| | | | | | | | | | | | | | | | | | More rebase fail. This code was written long before i915 and i965 were split, so most of the code in i9[16]5/intel_screen.c only needed to exist in one place. It looks like I fixed n-1 of those places after rebasing on the split. I only found this from the defined-but-not-used warning for intelRendererQueryExtension. I noticed this while fixing the other, related warnings. (Note: During review, we decided to *not* pick this back to 10.0.) Signed-off-by: Ian Romanick <[email protected]> Cc: Daniel Vetter <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* radeon/llvm: Free elf_buffer after useAaron Watry2013-11-151-0/+1
| | | | | | | | Prevents a memory leak. v2: Remove null check CC: "10.0" <[email protected]>
* r600/llvm: Free binary.code/binary.config in r600_llvm_compileAaron Watry2013-11-151-0/+3
| | | | | | | | | | | radeon_llvm_compile allocates memory for binary.code, binary.config, or neither depending on what's being done. We need to make sure to free that memory after it's no longer needed. v2: Don't bother checking for null before FREE() CC: "10.0" <[email protected]>
* r600/llvm: initialize radeon_llvm_binaryAaron Watry2013-11-151-0/+1
| | | | | | | | | | | | | use memset to initialize to 0's... otherwise code_size and config_size could be uninitialized when read later in this method. It's also hard to do NULL checks on uninitialized pointers. Reviewed-by: Tom Stellard <[email protected]> v2: Fix indentation CC: "10.0" <[email protected]>
* svga: remove unused vars in svga_hwtnl_simple_draw_range_elements()Brian Paul2013-11-151-12/+2
| | | | | | And simplify the code. Reviewed-by: Jose Fonseca <[email protected]>
* svga: print warning for unsupported indirect dest reg indexingBrian Paul2013-11-151-0/+4
| | | | | | | | | For DX9-level shaders, there's only limited support for indirect indexing of registers (with the loop counter register, not the general address register.) Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* svga: mark dest image as defined in svga_surface_copy()Brian Paul2013-11-151-0/+2
| | | | | | | | | | | | After we blit/copy to a dest texture image we need to mark it as being defined. This fixes broken mipmap generation for quite a few texture formats. Mipgen involves making texture views and svga_texture_view_surface() skips texture images that are undefined. Cc: "10.0" <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* svga: do primitive trimming in translate_indices()Brian Paul2013-11-151-3/+12
| | | | | | | | | | | | | | The index translation code expects the number of indexes to be consistent with the primitive type (ex: a multiple of 3 for PIPE_PRIM_TRIANGLES). If it's not, we can write out of bounds in the destination buffer. Fixes failed assertions in the pipebuffer debug code found with Piglit primitive-restart-draw-mode test. Cc: "10.0" <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* indices: add comments, assertions in u_indices.c fileBrian Paul2013-11-151-0/+26
| | | | | Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* mesa: remove duplicated prototypes in varray.hBrian Paul2013-11-151-6/+0
|
* gallium/pipe_loader: un-reference udev resources when we're done with them.Aaron Watry2013-11-151-0/+3
| | | | | | Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* radeonsi/compute: Dispose of LLVM module after compiling kernelsAaron Watry2013-11-151-0/+1
| | | | | | | | v2: Fix indentation Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* radeonsi/compute: Free program and program.kernels on shutdownAaron Watry2013-11-151-1/+15
| | | | | | | | v2: Fix indentation Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* radeon/llvm: Free created llvm memory bufferAaron Watry2013-11-151-0/+1
| | | | | | | | v2: Fix indentation Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* radeon/llvm: Free libelf resourcesAaron Watry2013-11-151-0/+3
| | | | | | | | v2: Fix indentation Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* radeon/llvm: fix spelling errorAaron Watry2013-11-151-1/+1
| | | | | | Reviewed-by: Tom Stellard <[email protected]> CC: "10.0" <[email protected]>
* clover: Support multiple devices in clCreateContextFromType() v2Tom Stellard2013-11-151-3/+9
| | | | | | | | | v2: - Use clGetDeviceIDs to query devices. Reviewed-by: Francisco Jerez <[email protected]> CC: "10.0" <[email protected]>
* glsl: Rework interface block linking.Paul Berry2013-11-151-20/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, when doing intrastage and interstage interface block linking, we only checked the interface type; this prevented us from catching some link errors. We now check the following additional constraints: - For intrastage linking, the presence/absence of interface names must match. - For shader ins/outs, the interface names themselves must match when doing intrastage linking (note: it's not clear from the spec whether this is necessary, but Mesa's implementation currently relies on it). - Array vs. nonarray must be consistent, taking into account the special rules for vertex-geometry linkage. - Array sizes must be consistent (exception: during intrastage linking, an unsized array matches a sized array). Note: validate_interstage_interface_blocks currently handles both uniforms and in/out variables. As a result, if all three shader types are present (VS, GS, and FS), and a uniform interface block is mentioned in the VS and FS but not the GS, it won't be validated. I plan to address this in later patches. Fixes the following piglit tests in spec/glsl-1.50/linker: - interface-blocks-vs-fs-array-size-mismatch - interface-vs-array-to-fs-unnamed - interface-vs-unnamed-to-fs-array - intrastage-interface-unnamed-array v2: Simplify logic in intrastage_match() for handling array sizes. Make extra_array_level const. Use an unnamed temporary interface_block_definition in validate_interstage_interface_blocks()'s first call to definitions->store(). Cc: "10.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix vertical alignment for multisampled buffers.Paul Berry2013-11-151-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the Sandy Bridge PRM, Vol 1 Part 1 7.18.3.4 (Alignment Unit Size): j [vertical alignment] = 4 for any render target surface is multisampled (4x) From the Ivy Bridge PRM, Vol 4 Part 1 2.12.2.1 (SURFACE_STATE for most messages), under the "Surface Vertical Alignment" heading: This field is intended to be set to VALIGN_4 if the surface was rendered as a depth buffer, for a multisampled (4x) render target, or for a multisampled (8x) render target, since these surfaces support only alignment of 4. Back in 2012 when we added multisampling support to the i965 driver, we forgot to update the logic for computing the vertical alignment, so we were often using a vertical alignment of 2 for multisampled buffers, leading to subtle rendering errors. Note that the specs also require a vertical alignment of 4 for all Y-tiled render target surfaces; I plan to address that in a separate patch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53077 Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* main: Fix MaxUniformComponents for geometry shaders.Paul Berry2013-11-151-1/+1
| | | | | | | | | | | | | | | For both vertex and fragment shaders we default MaxUniformComponents to 4 * MAX_UNIFORMS. It makes sense to do this for geometry shaders too; if back-ends have different limits they can override them as necessary. Fixes piglit test: spec/glsl-1.50/built-in constants/gl_MaxGeometryUniformComponents Cc: "10.0" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* tools/trace: Several bugfixes/improvements to dump_state.pyJosé Fonseca2013-11-151-9/+49
| | | | | | | | | - Don't crash with user memory pointers. - Support old bind_*_sampler_* methods. Useful when comparing dumps from old branches. - Misc.
* trace: Dump user_buffer members.José Fonseca2013-11-151-0/+2
|
* mesa: Fix derived vertex state not being updated in glCallList()Fredrik Höglund2013-11-151-6/+16
| | | | | | | | | | AEcontext::NewState is not always set when the vertex array state is changed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71492 Cc: "10.0" <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* radeonsi: add support for Hawaii asics (v2)Alex Deucher2013-11-155-0/+17
| | | | | | | Update additional register fields. Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* i965: Initialize schedule_node::delay.Vinson Lee2013-11-141-0/+1
| | | | | | | Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* haiku/swrast: Inherit gl_config, fix flushAlexander von Gluck IV2013-11-142-37/+26
| | | | | | | | * Inherit gl_context so we always have access to it * Thanks curro for the idea. * Last Haiku cannidate for 10.0.0 Reviewed-by: Francisco Jerez <[email protected]>
* llvmpipe: (trivial) fix more fallout from the setup cleanup.Roland Scheidegger2013-11-141-2/+4
| | | | Oops... Should have done some more testing.
* llvmpipe: (trivial) fix misplaced bld context assignment.Roland Scheidegger2013-11-141-2/+1
| | | | Should fix polygon offset crashes...
* gallivm: Compile flag to debug TGSI execution through printfs.José Fonseca2013-11-145-47/+222
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag. I had prototyped this for a while while debugging an issue, but finally cleaned this up and added a few more bells and whistles. v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland reviews. Here is a sample output. CONST[0].x = 0.00625000009 0.00625000009 0.00625000009 0.00625000009 CONST[0].y = -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718 CONST[0].z = -1 -1 -1 -1 CONST[0].w = 1 1 1 1 IN[0].x = 143.5 175.5 175.5 143.5 IN[0].y = 123.5 123.5 155.5 155.5 IN[0].z = 0 0 0 0 IN[0].w = 1 1 1 1 $ 1: RCP TEMP[0].w, IN[0].wwww TEMP[0].w = 1 1 1 1 $ 2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww OUT[0].z = 0 0 0 0 $ 5: MOV OUT[0].w, TEMP[0] OUT[0].w = 1 1 1 1 $ 6: END OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 OUT[0].z = 0 0 0 0 OUT[0].w = 1 1 1 1
* softpipe: (trivial) fix debug codeRoland Scheidegger2013-11-141-15/+10
| | | | | | The debug printfs wouldn't actually compile when enabled, so kill them off and insert some new one in another place, and make sure it keeps compiling by enclosing it in a if-0 clause.
* llvmpipe: clean up state setup code a bitRoland Scheidegger2013-11-141-115/+59
| | | | | | | In particular get rid of home-grown vector helpers which didn't add much. And while here fix formatting a bit. No functional change. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm,llvmpipe: fix float->srgb conversion to handle NaNsRoland Scheidegger2013-11-145-28/+45
| | | | | | | | | | | | | | | | | | | | | | | | d3d10 requires us to convert NaNs to zero for any float->int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float->unorm8 path only really passes because it relies on sse2 pack intrinsics which just happen to work by luck for NaNs (float->int conversion in hw gives integer indeterminate value, which just happens to be -0x80000000 hence gets converted to zero in the end after pack intrinsics). However, float->srgb didn't get so lucky, because we need to clamp before blending and clamping resulted in NaN behavior being undefined (and actually got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp with defined nan behavior as we can handle the NaN for free this way. I suspect there's more bugs lurking in this area (e.g. converting floats to snorm) as we don't really use defined NaN behavior everywhere but this seems to be good enough. While here respecify nan behavior modes a bit, in particular the return_second mode didn't really do what we wanted. From the caller's perspective, we really wanted to say we need the non-nan result, but we already know the second arg isn't a NaN. So we use this now instead, which means that cpu architectures which actually implement min/max by always returning non-nan (that is adhering to ieee754-2008 rules) don't need to bend over backwards for nothing. Reviewed-by: Jose Fonseca <[email protected]>
* dri: Change value param to unsignedIan Romanick2013-11-134-4/+4
| | | | | | | | | This silences some compiler warnings in i915 and i965. See also 75982a5. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "10.0" <[email protected]>
* i965: Use drm_intel_get_aperture_sizes instead of hard-coded 2GiBIan Romanick2013-11-131-3/+7
| | | | | | | | | | Systems with little physical memory installed will report less than 2GiB, and some systems may (hypothetically?) have a larger address space for the GPU. My IVB still reports 1534. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Cc: "10.0" <[email protected]>
* i915: Use drm_intel_get_aperture_sizes instead of drmAgpSizeIan Romanick2013-11-131-2/+6
| | | | | | | | Send the zombie back to the grave before it infects the townsfolk. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Cc: "10.0" <[email protected]>
* i965: implement blit path for PBO glDrawPixelsAlexander Monakov2013-11-132-0/+120
| | | | | | | | | | | | This patch implements accelerated path for glDrawPixels from a PBO in i965. The code follows what intel_pixel_read, intel_pixel_copy, intel_pixel_bitmap and intel_tex_image are doing. Piglit quick.tests show no regressions. In my testing on IVB, performance improvement is huge (about 30x, didn't measure exactly) since generic path goes via _mesa_unpack_color_span_float, memcpy, extract_float_rgba. Signed-off-by: Alexander Monakov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* haiku: add swrast driverAlexander von Gluck IV2013-11-135-0/+873
| | | | | | | | * This is pretty small and upkeep should be minimal. * Currently fully working. * Cannidate for 10.0.0 branch Acked-by: Brian Paul <[email protected]>
* dri: Remove redundant createNewContext function from __DRIimageDriverExtensionKristian Høgsberg2013-11-122-46/+11
| | | | | | | | | | createContextAttribs is a superset of what createNewContext provides. Also remove the function typedef, since createNewContext is deprecated and no longer used in multiple interfaces. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "10.0" <[email protected]>
* wayland: Use __DRIimage based getBuffers implementation when availableKristian Høgsberg2013-11-122-47/+96
| | | | | | | | | | | This lets us allocate color buffers as __DRIimages and pass them into the driver instead of having to create a __DRIbuffer with the flink that requires. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Chad Versace <[email protected]> Cc: "10.0" <[email protected]>
* gbm: Add support for __DRIimage based getBuffers when availableKristian Høgsberg2013-11-123-10/+72
| | | | | | | | | | | | | | | | | | This lets us allocate color buffers as __DRIimages and pass them into the driver instead of having to create a __DRIbuffer with the flink that requires. With this patch, we can now run gbm on render-nodes. A render-node is a drm device that doesn't support modesetting and all the legacy DRI ioctls. flink is also not supported, but now that gbm doesn't need flink, we can run piglit on head-less gbm or head-less GPGPU. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Jordan Justen <[email protected]> Cc: "10.0" <[email protected]>
* dri/i915, dri/i965: Fix support for planar imagesAnder Conselvan de Oliveira2013-11-122-2/+4
| | | | | | | | | | | Planar images have format __DRI_IMAGE_FORMAT_NONE, but the patch that moved the conversion from dri_format to the mesa format made it impossible to allocate a image with that format. Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "10.0" <[email protected]>
* i965/fs: Try a different pre-scheduling heuristic if the first spills.Eric Anholt2013-11-125-54/+76
| | | | | | | | | | | | | | | | | | | | | | | Since LIFO fails on some shaders in one particular way, and non-LIFO systematically fails in another way on different kinds of shaders, try them both, and pick whichever one successfully register allocates first. Slightly prefer non-LIFO in case we produce extra dependencies in register allocation, since it should start out with fewer stalls than LIFO. This is madness, but I haven't come up with another way to get unigine tropics to not spill while keeping other programs from not spilling and retaining the non-unigine performance wins from texture-grf. total instructions in shared programs: 1626728 -> 1626288 (-0.03%) instructions in affected programs: 1015 -> 575 (-43.35%) GAINED: 50 LOST: 0 Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 Cc: "10.0" <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Do instruction pre-scheduling just before register allocation.Eric Anholt2013-11-121-2/+2
| | | | | | | | | | | Long ago, the HW_REG usage in assign_curb/urb_setup() were scheduling barriers, so we had to run scheduler before them in order for it to be able to do basically anything. Now that that's fixed, we can delay the scheduling until we go to allocate (which will make the next change less scary). Cc: "10.0" <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Ignore actual latency pre-reg-alloc.Eric Anholt2013-11-121-21/+29
| | | | | | | | | | | | | | | | We care about depth-until-program-end, as a proxy for "make sure I schedule those early instructions that open up the other things that can make progress while keeping register pressure low", not actual latency (since we're relying on the post-register-alloc scheduling to actually schedule for the hardware). total instructions in shared programs: 1609931 -> 1609931 (0.00%) instructions in affected programs: 0 -> 0 GAINED: 55 LOST: 43 Cc: "10.0" <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix message setup for SIMD8 spills.Eric Anholt2013-11-121-1/+1
| | | | | | | | | | | | | | In the SIMD16 spilling changes, I replaced a "1" in the spill path with "mlen", but obviously it wasn't mlen before because spills have the g0 header along with the payload. The interface I was trying to use was asking for how many physical regs we're writing, so we're looking for "1" or "2". I'm guessing this actually passed piglit because the high 8 bits of the execution mask in SIMD8 mode are all 0s. Cc: "10.0" <[email protected]> Reviewed-by: Paul Berry <[email protected]>