path: root/src/mesa
Commit message (author, age, files changed, lines -removed/+added)
* swrast: fix MSVC double->float conversion warnings (Brian Paul, 2013-10-31, 1 file, -2/+2)
* mesa: fix some MSVC signed/unsigned compiler warnings (Brian Paul, 2013-10-31, 3 files, -6/+8)
* meta: fix assorted MSVC int/float conversion warnings (Brian Paul, 2013-10-31, 1 file, -14/+14)
* st/draw: silence Mingw warning in pointer_to_offset() (Brian Paul, 2013-10-31, 1 file, -1/+1)
  Fixes "warning: cast from pointer to integer of different size" for 64-bit builds.
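For context, here is a minimal sketch of the kind of conversion involved. It assumes the offset fits in 32 bits, and `pointer_to_offset_sketch` is a hypothetical name, not the actual Mesa change. When a buffer object is bound, GL passes byte offsets through pointer arguments. Casting through `uintptr_t`, which is as wide as a pointer, turns the narrowing into an ordinary integer conversion, so the pointer-to-integer size warning goes away.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Mesa's pointer_to_offset().
 * (unsigned) ptr on a 64-bit build warns "cast from pointer to integer of
 * different size"; converting to uintptr_t first (same width as a pointer)
 * makes the truncation to 32 bits an explicit integer conversion. */
static inline unsigned
pointer_to_offset_sketch(const void *ptr)
{
   return (unsigned) (uintptr_t) ptr;
}
```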
* i965/fs: Perform CSE on CMP(N) instructions. (Matt Turner, 2013-10-30, 1 file, -10/+29)
  Optimizes

      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
      (+f0) sel(8)  g50<1>F  g40<8,8,1>F  g10<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
      (+f0) sel(8)  g51<1>F  g41<8,8,1>F  g11<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
      (+f0) sel(8)  g52<1>F  g42<8,8,1>F  g12<8,8,1>F
      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
      (+f0) sel(8)  g53<1>F  g43<8,8,1>F  g13<8,8,1>F

  into

      cmp.ge.f0(8)  null     g45<8,8,1>F  0F
      (+f0) sel(8)  g50<1>F  g40<8,8,1>F  g10<8,8,1>F
      (+f0) sel(8)  g51<1>F  g41<8,8,1>F  g11<8,8,1>F
      (+f0) sel(8)  g52<1>F  g42<8,8,1>F  g12<8,8,1>F
      (+f0) sel(8)  g53<1>F  g43<8,8,1>F  g13<8,8,1>F

  total instructions in shared programs: 1644938 -> 1638181 (-0.41%)
  instructions in affected programs:     574955 -> 568198 (-1.18%)

  Two more 16-wide programs (in L4D2). Some large (-9%) decreases in instruction count in some of Valve's Source Engine games. No regressions.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
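The CMP/SEL rewrite can be modelled with a toy IR. The sketch below is hypothetical: the `toy_*` names are not Mesa's `fs_inst` or its CSE pass, and it only handles CMPs with a null destination. A CMP is dropped when an identical earlier CMP's flag result is still live, meaning no instruction in between wrote the flag or overwrote one of that CMP's sources. Predicated SELs only read the flag, so they leave it available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy IR (hypothetical): registers are plain ints, NULL_REG is the null dst. */
#define NULL_REG (-1)

enum toy_op { TOY_CMP, TOY_SEL, TOY_OTHER };

struct toy_inst {
   enum toy_op op;
   int dst;           /* NULL_REG if the result is discarded */
   int src0, src1;
   bool writes_flag;  /* true for CMP and any other flag writer */
   bool removed;      /* set by the pass instead of deleting in place */
};

/* Two null-destination CMPs with identical sources compute the same flag. */
static bool
same_null_cmp(const struct toy_inst *a, const struct toy_inst *b)
{
   return a->op == TOY_CMP && b->op == TOY_CMP &&
          a->dst == NULL_REG && b->dst == NULL_REG &&
          a->src0 == b->src0 && a->src1 == b->src1;
}

/* Local CSE restricted to null-destination CMPs. Since the destination is
 * null, nothing (not even a MOV) replaces a removed CMP.
 * Returns the number of instructions removed. */
static int
toy_cse_null_cmp(struct toy_inst *insts, size_t n)
{
   const struct toy_inst *avail = NULL; /* CMP whose flag result is live */
   int removed = 0;

   for (size_t i = 0; i < n; i++) {
      struct toy_inst *inst = &insts[i];

      if (avail && same_null_cmp(avail, inst)) {
         inst->removed = true;
         removed++;
         continue;
      }
      if (inst->writes_flag) {
         /* A new flag value replaces the cached one. */
         avail = (inst->op == TOY_CMP) ? inst : NULL;
      } else if (avail && inst->dst != NULL_REG &&
                 (inst->dst == avail->src0 || inst->dst == avail->src1)) {
         avail = NULL; /* a source of the cached CMP was overwritten */
      }
   }
   return removed;
}
```

On the four CMP/SEL pairs from the commit message, with the immediate `0F` represented by an arbitrary register number, the pass drops the three repeated CMPs and keeps the first.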
* i965/fs: Don't emit null MOVs in CSE. (Matt Turner, 2013-10-30, 1 file, -17/+25)
  We'd like to CSE some instructions, like CMP, that often have null destinations. Instead of replacing them with MOVs to null, just don't emit the MOV.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use reads_flag and writes_flag methods in the scheduler. (Matt Turner, 2013-10-30, 1 file, -12/+4)
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Add reads_flag() and writes_flag() to fs_inst. (Matt Turner, 2013-10-30, 2 files, -0/+16)
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
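Here is a hypothetical sketch of what flag-access predicates, and a scheduler dependency check built on them, could look like. The `toy_*` types and fields are assumptions, not Mesa's `fs_inst` members. As we understand the Gen ISA, a predicated instruction reads the flag register, and a conditional modifier writes it. The exception is SEL, where the modifier selects min/max instead of updating the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for instruction fields; not Mesa's definitions. */
enum toy_opcode { TOY_OP_MOV, TOY_OP_CMP, TOY_OP_SEL, TOY_OP_ADD };
enum toy_cmod { TOY_CMOD_NONE, TOY_CMOD_GE, TOY_CMOD_L };

struct toy_fs_inst {
   enum toy_opcode opcode;
   bool predicated;          /* executes under (+f0) */
   enum toy_cmod cond_mod;   /* conditional modifier such as .ge */
};

/* A predicated instruction consumes the flag register. */
static bool
toy_reads_flag(const struct toy_fs_inst *inst)
{
   return inst->predicated;
}

/* A conditional modifier updates the flag, except on SEL (min/max). */
static bool
toy_writes_flag(const struct toy_fs_inst *inst)
{
   return inst->cond_mod != TOY_CMOD_NONE && inst->opcode != TOY_OP_SEL;
}

/* Must `later` stay after `earlier` because of the flag register?
 * Covers read-after-write, write-after-write and write-after-read. */
static bool
toy_flag_dependency(const struct toy_fs_inst *earlier,
                    const struct toy_fs_inst *later)
{
   if (toy_writes_flag(earlier))
      return toy_reads_flag(later) || toy_writes_flag(later);
   return toy_reads_flag(earlier) && toy_writes_flag(later);
}
```

Keeping the per-instruction rules in two small predicates means the scheduler only has to express the three hazard cases once.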
* i965/fs: Add is_null() method to fs_reg. (Matt Turner, 2013-10-30, 2 files, -0/+9)
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use the gen7 scratch read opcode when possible. (Eric Anholt, 2013-10-30, 8 files, -3/+91)
  This avoids a lot of message setup we had to do otherwise. Improves GLB2.7 performance with register spilling force enabled by 1.6442% +/- 0.553218% (n=4).
  v2: Use BRW_PREDICATE_NONE, improve a comment (by Paul).
  Reviewed-by: Paul Berry <[email protected]>
* i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE (Eric Anholt, 2013-10-30, 10 files, -36/+33)
  I'm going to be introducing gen7 variants, and the previous naming was going to get confusing.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Fix register unspills from a reg_offset. (Eric Anholt, 2013-10-30, 1 file, -3/+3)
  We were clearing the reg_offset before trying to use it. Oops. Fixes glsl-fs-texture2drect with the reg spilling debug enabled.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Fix register spilling for 16-wide. (Eric Anholt, 2013-10-30, 2 files, -14/+13)
  Things blew up when I enabled the debug register spill code without disabling 16-wide, so I decided to just fix 16-wide spilling. We still don't generate 16-wide when register spilling happens as part of allocation (since we expect it to be slower), but now we can experiment with allowing it in some cases in the future.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Exit the compile if spilling would overwrite in-use MRFs. (Eric Anholt, 2013-10-30, 3 files, -0/+25)
  I believe this will never happen in SIMD8 mode, but it could for SIMD16 when we fix it.
  v2: Fix off-by-one in my register counting comment (caught by Paul).
  Reviewed-by: Paul Berry <[email protected]> (v1)
* i965/fs: Fix broken register spilling debug code. (Eric Anholt, 2013-10-30, 2 files, -7/+11)
  Now that reg spilling generates new vgrfs, we were looping forever if you ever turned it on. Instead, move the debug code into the register allocator right near where we'd be doing spilling anyway, which should more accurately reflect how register spilling occurs in the wild.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Split "find what MRFs were used" to a helper function. (Eric Anholt, 2013-10-30, 2 files, -9/+25)
  I'm going to need to reuse this for fixing register spilling on SIMD16. Note that BRW_MAX_MRF is 16, which is the same as BRW_MAX_GRF - GEN7_MRF_HACK_START.
  Reviewed-by: Paul Berry <[email protected]>
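The arithmetic in this note can be checked at compile time. This sketch assumes BRW_MAX_GRF is 128 and GEN7_MRF_HACK_START is 112. Those are the values we believe the i965 headers use; the commit itself only states that their difference equals BRW_MAX_MRF (16). The sketch uses `SKETCH_*` names so as not to imply Mesa's definitions. The `sketch_get_used_mrfs` helper is a hypothetical illustration of the "find what MRFs were used" step, not the function added by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: 128 GRFs, with Gen7 reserving g112-g127 to stand in
 * for the 16 message registers (MRFs). */
#define SKETCH_MAX_GRF          128
#define SKETCH_MRF_HACK_START   112
#define SKETCH_MAX_MRF          16

_Static_assert(SKETCH_MAX_MRF == SKETCH_MAX_GRF - SKETCH_MRF_HACK_START,
               "the MRF range must fit exactly in the reserved GRFs");

/* One instruction's MRF write: base register and how many registers it
 * covers (a SIMD16 payload typically spans two). Hypothetical type. */
struct sketch_mrf_write {
   int base_mrf;
   int regs_written;
};

/* Mark every MRF written by any instruction, so that a later pass
 * (such as spilling) can tell which MRFs are in use. */
static void
sketch_get_used_mrfs(const struct sketch_mrf_write *writes, size_t n,
                     bool used[SKETCH_MAX_MRF])
{
   for (int i = 0; i < SKETCH_MAX_MRF; i++)
      used[i] = false;

   for (size_t i = 0; i < n; i++) {
      for (int j = 0; j < writes[i].regs_written; j++) {
         int mrf = writes[i].base_mrf + j;
         if (mrf >= 0 && mrf < SKETCH_MAX_MRF)
            used[mrf] = true;
      }
   }
}
```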
* i965/fs: Update an ancient, wrong comment about reg_offset. (Eric Anholt, 2013-10-30, 1 file, -3/+5)
  This hasn't been true since SIMD16 mode was added.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Prefer more-critical instructions of the same age in LIFO scheduling. (Eric Anholt, 2013-10-30, 1 file, -15/+67)
  When faced with a million instructions that all became candidates at the same time (none of which individually reduce register pressure), the ones on the critical path are more likely to be the ones that will free up some candidates soon.
  shader-db:
      total instructions in shared programs: 1681070 -> 1681070 (0.00%)
      instructions in affected programs:     0 -> 0
      GAINED: 40
      LOST:   74
  Fixes indistinguishable-from-hanging behavior in GLES3conform's uniform_buffer_object_max_uniform_block_size test, regressed by c3c9a8c85758796a26b48e484286e6b6f5a5299a. Given that 93bd627d5a6c485948b94488e6cd53a06b7ebdcf was unlocked by that commit, the net effect on 16-wide program count is still quite positive, and I think this should give us more stable scheduling (less dependency on original instruction emit order).
  v2: Comment suggestions by Paul
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70943
  Reviewed-by: Paul Berry <[email protected]>
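Here is a minimal sketch of the tie-breaking rule. It assumes each candidate carries a readiness generation (higher means it became ready more recently) and a precomputed critical-path delay. The names are hypothetical, and the sketch omits the register-pressure checks that the real scheduler applies first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate record. */
struct sketch_candidate {
   int generation;   /* when the node became ready; higher = newer */
   int delay;        /* critical-path length from this node */
};

/* LIFO pick: prefer the newest candidates; among candidates of the same
 * generation, prefer the one with the largest delay (most critical).
 * Returns -1 when there are no candidates. */
static int
sketch_pick_lifo(const struct sketch_candidate *c, size_t n)
{
   int best = -1;

   for (size_t i = 0; i < n; i++) {
      if (best < 0 ||
          c[i].generation > c[best].generation ||
          (c[i].generation == c[best].generation &&
           c[i].delay > c[best].delay))
         best = (int) i;
   }
   return best;
}
```

Without the delay tie-break, the winner among same-generation candidates would depend only on their position in the list, which is the emit-order sensitivity the commit message describes.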
* i965: Compute the node's delay time for scheduling. (Eric Anholt, 2013-10-30, 1 file, -0/+28)
  This is a step in doing scheduling as described in Muchnick (p538). A difference is that our latency function is only specific to one instruction (it doesn't describe, for example, the different latency between WAR of a send's arguments and RAW of a send's destination), but that's changeable later. We also don't separately compute the postorder traversal of the graph, since we can use the setting of the delay field as the "visited" flag.
  Reviewed-by: Paul Berry <[email protected]>
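Here is a hypothetical C sketch of the delay computation described above. It assumes a DAG with at most four children per node and latencies of at least 1, so that a delay of 0 can mean "not yet computed". In this sketch a leaf's delay is simply its own latency; the exact leaf cost Mesa uses may differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical DAG node; a real scheduler node carries far more state. */
struct sketch_node {
   int latency;     /* must be >= 1 so that delay == 0 means "unvisited" */
   int delay;       /* 0 until computed */
   int child_count;
   struct sketch_node *children[SKETCH_MAX_CHILDREN];
};

/* delay(leaf) = latency(leaf)
 * delay(n)    = max over children c of (latency(n) + delay(c))
 * A nonzero delay doubles as the visited flag, so each node is computed
 * once and no separate postorder traversal is needed. */
static void
sketch_compute_delay(struct sketch_node *n)
{
   if (n->child_count == 0) {
      n->delay = n->latency;
      return;
   }
   for (int i = 0; i < n->child_count; i++) {
      struct sketch_node *child = n->children[i];
      if (child->delay == 0)
         sketch_compute_delay(child);
      if (n->latency + child->delay > n->delay)
         n->delay = n->latency + child->delay;
   }
}
```

Consider a chain a (latency 2) -> b (3) -> c (1) with an extra edge a -> c. This gives delays c = 1, b = 4 and a = 6, because the longer path through b wins over the direct edge.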
* mesa: Drop unused return value from use_shader_program (Gregory Hainaut, 2013-10-30, 1 file, -5/+3)
  The return value has been unused since commit d348b0c. This was originally included in another patch, but it was split out by Ian Romanick.
  v2: Drop unnecessary final return. Suggested by Paul.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Cc: Eric Anholt <[email protected]>
* automake: properly handle non-default expat installation (Emil Velikov, 2013-10-29, 1 file, -0/+1)
  Use PKG_CHECK_MODULES instead of requiring the user to set up the option at configure time. Drop unused EXPAT_INCLUDE and update all targets.
  NOTE: This commit removes the --with-expat configure option. Make sure that the expat you wish to use has an expat.pc file accessible to pkg-config.
  v2:
   * Add note about the removal of --with-expat (per Tom Stellard)
   * Drop EXPAT_CFLAGS for targets that do not build DRI_COMMON (spotted by Matt Turner)
  v3:
   * Rebase on top of megadrivers (drop EXPAT_CFLAGS from swrast)
  Acked-by: Matt Turner <[email protected]> (v2)
  Reviewed-by: Tom Stellard <[email protected]> (v2)
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts:
      configure.ac
      src/mesa/drivers/dri/common/Makefile.am
* i965/fs: Drop our dead push constants before overflowing to pull constants. (Eric Anholt, 2013-10-29, 1 file, -2/+1)
  The idea of the original order was that you'd dead code eliminate accesses to push constants. But I've never seen a case of that (nor has shader-db), while we frequently see sparse accesses of large constant arrays that would overflow into pull constants.
  Cuts pull constant use on csgo, serious sam, planeshift, and the cave:
      total instructions in shared programs: 1695103 -> 1688795 (-0.37%)
      instructions in affected programs:     92024 -> 85716 (-6.85%)
      GAINED: 339
      LOST:   0
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Simplify the shader time code by using atomic counter helpers. (Francisco Jerez, 2013-10-29, 4 files, -41/+7)
  Reviewed-by: Paul Berry <[email protected]>
* i965: Add brw_reg constructors taking a dynamically determined vector width. (Francisco Jerez, 2013-10-29, 1 file, -0/+24)
  The MRF variant is going to be used extensively by the atomic counter intrinsics to assemble untyped atomic and surface read messages easily.
  Reviewed-by: Paul Berry <[email protected]>
* i965/gen7: Implement code generation for untyped surface read instructions. (Francisco Jerez, 2013-10-29, 9 files, -0/+112)
* i965/gen7: Implement code generation for untyped atomic instructions. (Francisco Jerez, 2013-10-29, 9 files, -0/+130)
  Reviewed-by: Paul Berry <[email protected]>
* i965: Implement ABO surface state emission. (Francisco Jerez, 2013-10-29, 7 files, -0/+122)
  The maximum number of atomic buffer objects is somewhat arbitrary; we can easily change it in the future if it turns out not to be enough.
  v2: Add comments with the relevant mesa dirty bits. Fix usage of BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
  v3: Update binding table layout diagrams.
  v4: Resolve conflicts with the recent dynamic surface index assignment changes.
  Reviewed-by: Paul Berry <[email protected]>
* i965: Define vtbl method that initializes an untyped R/W surface. (Francisco Jerez, 2013-10-29, 2 files, -5/+37)
  And add Gen7 implementation.
  v2: Fix off-by-one error in buffer size calculation.
  Reviewed-by: Paul Berry <[email protected]>
* glsl: Add new atomic_uint built-in GLSL type. (Francisco Jerez, 2013-10-29, 6 files, -0/+9)
  v2: Fix GLSL version in which the type became available. Add contains_atomic() convenience method. Split off atomic counter comparison error checking to a separate patch that will handle all opaque types. Include new ir_variable fields for atomic types.
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: Add support for ARB_shader_atomic_counters. (Francisco Jerez, 2013-10-29, 10 files, -3/+263)
  This patch implements the common support code required for the ARB_shader_atomic_counters extension. It defines the necessary data structures for tracking atomic counter buffer objects (from now on "ABOs") associated with some specific context or shader program, and it implements support for binding buffers to an ABO binding point and for querying the existing atomic counters and buffers declared by GLSL shaders.
  v2: Fix extension checks. Drop unused MAX_ATOMIC_BUFFERS constant.
  Acked-by: Paul Berry <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* glapi: Add support for ARB_shader_atomic_counters. (Francisco Jerez, 2013-10-29, 3 files, -1/+10)
  Add XML file for the dispatch code generator, update the dispatch_sanity test and add stub definition for the new entry point.
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Handle deallocation of some private ralloc contexts explicitly. (Francisco Jerez, 2013-10-29, 4 files, -4/+4)
  These ralloc contexts belong to a specific object and are being deallocated manually from the class destructor. Now that we've hooked up destructors to ralloc, there's no reason for them to be children of any other context, and doing so might lead to double frees under some circumstances. The class destructor now has sole responsibility for freeing class memory resources.
* mesa: Define introspection macro to determine whether a type is trivially destructible. (Francisco Jerez, 2013-10-29, 1 file, -1/+23)
  Only implemented on GCC and Clang for now. Other compilers use a dummy implementation that always returns false, which should be a safe [but slightly inefficient] assumption in all cases.
  Reviewed-by: Ian Romanick <[email protected]>
* glsl: Generalize MSVC fix for strcasecmp(). (Paul Berry, 2013-10-29, 1 file, -0/+1)
  This will let us use strcasecmp() from anywhere inside Mesa without having to worry about the fact that it doesn't exist in MSVC.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
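One common way to provide this is sketched below, without any claim that it is the exact Mesa change. MSVC's C runtime offers `_stricmp()` with the same semantics, so a macro can map one name onto the other on that compiler. POSIX systems declare `strcasecmp()` in `<strings.h>`.

```c
#include <assert.h>
#include <string.h>

/* Portability shim: MSVC lacks strcasecmp() but provides _stricmp()
 * (declared in <string.h>); elsewhere use the POSIX declaration. */
#ifdef _MSC_VER
#define strcasecmp _stricmp
#else
#include <strings.h>
#endif
```

Defining the mapping in a header that is included everywhere, rather than in one GLSL source file, is what lets every caller use the same spelling.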
* st/mesa: move out of memory check in st_draw_vbo() (Brian Paul, 2013-10-29, 1 file, -3/+4)
  Before, we were only checking the st->vertex_array_out_of_memory flag after updating array state. But if there are two consecutive glDrawArrays calls and the first one is skipped because of OOM, the second one should be skipped too.
  Cc: 9.2 <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
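Here is a simplified, hypothetical model of the control flow after this change; `sketch_st` and `sketch_draw_vbo` are not Mesa's types or functions. The out-of-memory flag is tested on every draw, not only on draws that revalidated array state. As a result, it keeps suppressing draws until a later state update clears it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified state tracker context. */
struct sketch_st {
   bool arrays_dirty;                 /* array state needs revalidating */
   bool vertex_array_out_of_memory;   /* last validation ran out of memory */
   int draws_issued;
};

/* `alloc_fails` simulates an allocation failure during validation. */
static void
sketch_draw_vbo(struct sketch_st *st, bool alloc_fails)
{
   if (st->arrays_dirty) {
      st->vertex_array_out_of_memory = alloc_fails;
      st->arrays_dirty = false;
   }

   /* Checked after the (possibly skipped) update, so a draw with no state
    * change since an OOM failure is skipped as well. */
   if (st->vertex_array_out_of_memory)
      return;

   st->draws_issued++;
}
```

Two draws in a row after a failed validation both return early. The next draw after array state changes and validates successfully goes through.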
* i965/vec4: Reduce working set size of live variables computation. (Eric Anholt, 2013-10-29, 2 files, -23/+28)
  Orbital Explorer was generating a 4000 instruction geometry shader, which was taking 275 trips through dead code elimination and register coalescing, each of which updated live variables to get its work done, and invalidated those live variables afterwards. By using bitfields instead of bools (reducing the working set size by a factor of 8) in live variables analysis, it drops from 88% of the profile to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul says 3+ minutes) to 10.5 seconds.
  Compare to f179f419d1d0a03fad36c2b0a58e8b853bae6118 on the FS side.
  Reviewed-by: Paul Berry <[email protected]>
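Here is a sketch of the bitset representation. It assumes 32-bit words and uses hypothetical `sketch_*` helpers rather than Mesa's own bitset macros. Each variable takes one bit instead of one bool. The standard liveness update, livein = use | (liveout & ~def), then handles 32 variables per word operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WORD_BITS 32

/* Number of 32-bit words needed for nvars one-bit entries. */
static size_t
sketch_words_for(size_t nvars)
{
   return (nvars + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS;
}

static void
sketch_bit_set(uint32_t *set, unsigned i)
{
   set[i / SKETCH_WORD_BITS] |= UINT32_C(1) << (i % SKETCH_WORD_BITS);
}

static bool
sketch_bit_test(const uint32_t *set, unsigned i)
{
   return (set[i / SKETCH_WORD_BITS] >> (i % SKETCH_WORD_BITS)) & 1u;
}

/* One backward dataflow step for a block:
 *    livein = use | (liveout & ~def)
 * Returns true if livein changed, so the caller knows to iterate again. */
static bool
sketch_update_livein(uint32_t *livein, const uint32_t *use,
                     const uint32_t *def, const uint32_t *liveout,
                     size_t nwords)
{
   bool progress = false;

   for (size_t w = 0; w < nwords; w++) {
      uint32_t next = use[w] | (liveout[w] & ~def[w]);
      if (next != livein[w]) {
         livein[w] = next;
         progress = true;
      }
   }
   return progress;
}
```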
* Remove error when calling glGenQueries/glDeleteQueries while a query is active (Carl Worth, 2013-10-28, 1 file, -15/+10)
  There is nothing in the OpenGL specification which prevents the user from calling glGenQueries to generate a new query object while another object is active. Neither is there anything in the Mesa implementation which prevents this. So remove the INVALID_OPERATION errors in this case.
  Similarly, the OpenGL specification explicitly allows deleting an active query, so remove the assertion for that case, replacing it with the necessary state updates to end the query (clear the bindpt pointer and call into the driver's EndQuery hook).
  CC: <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* i965: Also emit HiZ and Stencil packets when disabling depth on Gen6. (Kenneth Graunke, 2013-10-28, 1 file, -0/+12)
  The normal drawing path does this, and it's necessary on Ivybridge, so let's try it on Sandybridge too. It's not explicitly documented as necessary, but might help with hangs.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Also emit HIER_DEPTH and STENCIL packets when disabling depth. (Kenneth Graunke, 2013-10-28, 1 file, -0/+12)
  From the documentation:
      "[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS, 3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
  We normally do this, but BLORP was failing to do so in the case where it disables depth. Not observed to fix anything yet.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Move post-sync non-zero flush for 3DSTATE_MULTISAMPLE. (Kenneth Graunke, 2013-10-28, 1 file, -3/+3)
  For some reason, we put the flush in the caller, rather than just before emitting the packet. This is more than a cosmetic problem: BLORP calls gen6_emit_3dstate_multisample() directly, and so it missed the flush.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Also guard 3DSTATE_DRAWING_RECTANGLE with a flush in blorp. (Kenneth Graunke, 2013-10-28, 1 file, -0/+3)
  Non-pipelined commands need this flush.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Emit post-sync non-zero flush before 3DSTATE_DRAWING_RECTANGLE. (Kenneth Graunke, 2013-10-28, 1 file, -0/+4)
  This is another non-pipelined command that needs a flush on Sandybridge.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Emit post-sync non-zero flush before 3DSTATE_GS_SVB_INDEX. (Kenneth Graunke, 2013-10-28, 1 file, -0/+3)
  From the comments above intel_emit_post_sync_nonzero_flush:
      "[DevSNB-C+{W/A}] Before any depth stall flush (including those produced by non-pipelined state commands), software needs to first send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
  This suggests that every non-pipelined (0x79xx) command needs a post-sync non-zero flush before it.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
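The 0x79xx rule lends itself to a simple check on the command opcode, meaning the top 16 bits of the header DWord. The sketch below is hypothetical. The `SKETCH_*` values reflect our understanding of the Gen encoding: 0x79xx for non-pipelined 3DSTATE commands, 0x78xx for pipelined ones such as 3DSTATE_VS, and 0x7a00 for PIPE_CONTROL. A single PIPE_CONTROL entry stands in for the real post-sync non-zero flush sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed opcodes (top 16 bits of the command header). */
#define SKETCH_3DSTATE_VS                 0x7810  /* pipelined */
#define SKETCH_3DSTATE_DRAWING_RECTANGLE  0x7900  /* non-pipelined */
#define SKETCH_3DSTATE_GS_SVB_INDEX       0x790b  /* non-pipelined */
#define SKETCH_PIPE_CONTROL               0x7a00

#define SKETCH_BATCH_MAX 64

static bool
sketch_is_non_pipelined(uint16_t opcode)
{
   return (opcode & 0xff00) == 0x7900;
}

/* Toy batch that records only command opcodes. */
struct sketch_batch {
   uint16_t ops[SKETCH_BATCH_MAX];
   size_t count;
};

/* On Sandybridge (gen 6), precede every non-pipelined command with a
 * flush, so no individual call site can forget it. */
static void
sketch_emit(struct sketch_batch *b, int gen, uint16_t opcode)
{
   if (gen == 6 && sketch_is_non_pipelined(opcode)) {
      assert(b->count < SKETCH_BATCH_MAX);
      b->ops[b->count++] = SKETCH_PIPE_CONTROL;
   }
   assert(b->count < SKETCH_BATCH_MAX);
   b->ops[b->count++] = opcode;
}
```

Doing the check at emit time, next to the packet, also covers callers such as BLORP that emit packets directly.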
* i965: CS writes/reads should use I915_GEM_INSTRUCTION (Daniel Vetter, 2013-10-28, 1 file, -2/+2)
  Otherwise the gen6 w/a in the kernel won't kick in and the write will land nowhere.
  Inspired by a patch Ken pointed me at which had the same issue (but isn't merged yet and is for a gen7+ feature). An audit of the entire driver didn't reveal any case other than the one in the write_reg helper used by the gen6 queryobj code.
  Acked-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Daniel Vetter <[email protected]>
  Tested-by: Xinkai Chen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: "9.2" <[email protected]>
* i965: Do not set bilinear_filter flag in case of multisample blits (Anuj Phogat, 2013-10-28, 1 file, -1/+1)
  Setting the bilinear_filter flag for multisample blits with GL_LINEAR filtering causes incorrect behavior in the translate_dst_to_src() function. This broke Modern Warfare (1, 2 and 3) on SNB, IVB and HSW.
  Tested on SNB and IVB, no Piglit regressions. A trace file of the game (taken with apitrace) works fine with this patch.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69078
  Cc: [email protected]
  Signed-off-by: Anuj Phogat <[email protected]>
  Reported-by: Armin K <[email protected]>
  Tested-by: Armin K <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* mesa: Remove trailing whitespace in texparam.c (Rico Schüller, 2013-10-28, 1 file, -6/+6)
  Signed-off-by: Rico Schüller <[email protected]>
  Signed-off-by: Brian Paul <[email protected]>
* mesa: use void in _mesa_VDPAUFiniNV() as in the header file (Brian Paul, 2013-10-28, 1 file, -1/+1)
* i965: Make fs gl_PrimitiveID input work even when there's no gs. (Paul Berry, 2013-10-27, 2 files, -5/+26)
  When a geometry shader is present, the fragment shader gl_PrimitiveID input acts like an ordinary varying, receiving data from the gs gl_PrimitiveID output. When there's no geometry shader, we have to ask the fixed function SF hardware to provide the primitive ID to the fragment shader instead.
  Previously, the SF setup code would handle this situation by recognizing that the FS gl_PrimitiveID input didn't match to any VS output; since normally an FS input with no corresponding VS output leads to undefined data, the SF setup code used to just arbitrarily assign it to receive data from attribute 0.
  This patch changes the SF setup code so that instead of arbitrarily using attribute 0, it assigns the unmatched FS input to receive gl_PrimitiveID. In the case where the FS input really is gl_PrimitiveID, this produces the intended result. In all other cases, no harm is done since GL specifies that the behaviour is undefined.
  Fixes piglit test primitive-id-no-gs.
  v2: If an attribute is already being overridden with point coordinates, don't try to also override it with gl_PrimitiveID. This is necessary to avoid regressing piglit tests such as shaders/glsl-fs-pointcoord.
  Reviewed-by: Eric Anholt <[email protected]>
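Here is a hypothetical sketch of the routing decision for a single FS input, with the v2 point-coordinate rule applied first. The enum and function names are illustrative, not Mesa's SF setup code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NO_VS_SLOT (-1)   /* FS input has no matching VS output */

enum sketch_attr_source {
   SKETCH_FROM_VS_OUTPUT,
   SKETCH_FROM_POINT_COORD,
   SKETCH_FROM_PRIMITIVE_ID,
};

struct sketch_attr_route {
   enum sketch_attr_source source;
   int vs_slot;   /* only meaningful for SKETCH_FROM_VS_OUTPUT */
};

static struct sketch_attr_route
sketch_route_fs_input(int vs_slot, bool point_coord_override)
{
   struct sketch_attr_route r = { SKETCH_FROM_VS_OUTPUT, vs_slot };

   if (point_coord_override)
      r.source = SKETCH_FROM_POINT_COORD;   /* keep the existing override */
   else if (vs_slot == SKETCH_NO_VS_SLOT)
      r.source = SKETCH_FROM_PRIMITIVE_ID;  /* unmatched input: primitive ID */
   return r;
}
```

Routing every unmatched input to the primitive ID is harmless for inputs that are not gl_PrimitiveID, because their values are undefined anyway.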
* mesa: Add GL_NV_vdpau_interop functions to dispatch_sanity.cpp. (Vinson Lee, 2013-10-26, 1 file, -0/+12)
  Fixes 'make check' failures introduced with commit 80964226e9b8a05c39157f9305c06c0b2861e080.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70900
  Signed-off-by: Vinson Lee <[email protected]>
* mesa: add vdpau.c and st_vdpau.c to src/mesa/SConscript (Brian Paul, 2013-10-26, 1 file, -0/+2)
  Fixes SCons build.