path: root/src
Commit message | Author | Age | Files | Lines
* radeon/llvm: fix TXQ_LZ handling for cube maps | Vadim Girlin | 2012-12-18 | 1 | -2/+4
  Signed-off-by: Vadim Girlin <[email protected]>
* r600g: initialize inst_mod in r600_tex_from_byte_stream | Vadim Girlin | 2012-12-18 | 1 | -0/+2
  Signed-off-by: Vadim Girlin <[email protected]>
* gallivm: fix conversion for pure integer formats | Roland Scheidegger | 2012-12-18 | 1 | -0/+1
  Since the idea is to just expand or shrink the bit width but not otherwise do conversion, we also need to adjust the sign bit according to src; otherwise the conversion code will incorrectly clamp the values. (Since this only works for casting to ordinary floats, the norm and fixed bits should always be fine.)
  This fixes the remaining piglit attribs GL3 failures.
  Reviewed-by: José Fonseca <[email protected]>
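The widening and narrowing described above can be sketched in plain C. gallivm itself generates LLVM IR, so this is only an illustration under assumptions: the helper names are hypothetical, an 8-bit channel stands in for any pure integer format, and a two's complement target is assumed. The point is that the source's signedness picks sign- or zero-extension, and no clamping happens in either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: widen one 8-bit pure-integer channel to 32 bits.
 * The bits are reinterpreted, never clamped or normalized; only the kind
 * of extension depends on whether the *source* format is signed.
 * Assumes two's complement for the int8_t reinterpretation. */
static int32_t widen_int8_channel(uint8_t raw, bool src_is_signed)
{
   if (src_is_signed)
      return (int32_t)(int8_t)raw; /* sign-extend: 0xFF becomes -1 */
   return (int32_t)raw;            /* zero-extend: 0xFF becomes 255 */
}

/* Hypothetical helper: narrow a 32-bit integer back to 8 bits by keeping
 * the low bits, again without clamping. */
static uint8_t narrow_int32_channel(int32_t value)
{
   return (uint8_t)((uint32_t)value & 0xFFu);
}
```

Treating a signed source as unsigned would turn -1 into 255, which is the kind of value change the commit avoids by following the source's sign.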
* glsl: Fix gl_context vs. ralloc context in check_version again, again. | Kenneth Graunke | 2012-12-17 | 1 | -2/+2
  Dave found some, but there were more.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58039
* vega: fix for object handle leak | Andreas Pokorny | 2012-12-17 | 4 | -1/+8
  Frees the object handle when an OpenVG object is destroyed.
  Signed-off-by: Andreas Pokorny <[email protected]>
  Signed-off-by: Brian Paul <[email protected]>
* wmesa: include version.h to silence warning | Brian Paul | 2012-12-17 | 1 | -0/+1
* xlib: include headers to fix errors/warnings | Brian Paul | 2012-12-17 | 1 | -0/+2
* mesa osmesa/x11: fix build error introduced in 4bea4cb9 | Jordan Justen | 2012-12-17 | 2 | -8/+8
  Fixes https://bugs.freedesktop.org/show_bug.cgi?id=58380
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* gallivm: fix texel fetch for array textures (2) | Roland Scheidegger | 2012-12-17 | 1 | -2/+3
  a460aea3f14222af46f88d1bc686f82180b8a872 wasn't entirely correct: all coords are already ints, so the iround must be skipped.
  Passes piglit texelFetch with sampler1DArray/sampler2DArray.
  Reviewed-by: Dave Airlie <[email protected]>
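A minimal C sketch of the distinction, assuming the usual GL rule for array layers in ordinary sampling (round with floor(r + 0.5), then clamp to [0, layers - 1]). The function names are hypothetical and this is not the gallivm code; it only shows why an integer layer coordinate from texelFetch needs no rounding step.

```c
#include <assert.h>

/* floor() for floats within int range, avoiding a libm dependency. */
static int floor_to_int(float x)
{
   int i = (int)x;     /* truncates toward zero */
   if ((float)i > x)   /* negative non-integers need one more step down */
      i--;
   return i;
}

static int clamp_layer(int layer, int num_layers)
{
   if (layer < 0)
      return 0;
   if (layer > num_layers - 1)
      return num_layers - 1;
   return layer;
}

/* Ordinary sampling: the layer coordinate is a float, so it is rounded
 * to the nearest integer, floor(r + 0.5), before clamping. */
static int layer_for_sample(float r, int num_layers)
{
   return clamp_layer(floor_to_int(r + 0.5f), num_layers);
}

/* texelFetch: the layer coordinate is already an integer, so there is
 * no rounding step. Clamping here is a choice of this sketch only. */
static int layer_for_fetch(int r, int num_layers)
{
   return clamp_layer(r, num_layers);
}
```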
* mesa: assert if driver did not compute the version | Jordan Justen | 2012-12-16 | 3 | -1/+4
  Make sure drivers initialize the version before:
    * _mesa_initialize_exec_table is called
    * _mesa_initialize_exec_table_vbo is called
    * A context is made current
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
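The ordering these asserts enforce can be sketched as follows. struct context, compute_version(), initialize_exec_table() and make_current() are hypothetical stand-ins for the driver and core Mesa code named in this series; the sketch assumes the exec table's contents depend on the version, which is why the version must be known first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced context. A version of 0 means
 * "not computed yet". */
struct context {
   unsigned version;        /* e.g. 30 for OpenGL 3.0 */
   bool exec_initialized;
};

/* Stand-in for a driver computing the version it supports. */
static void compute_version(struct context *ctx, unsigned version)
{
   ctx->version = version;
}

/* Stand-in for _mesa_initialize_exec_table(): in this sketch the
 * installed entry points depend on the version, so it must be known. */
static void initialize_exec_table(struct context *ctx)
{
   assert(ctx->version != 0 && "driver must compute the version first");
   ctx->exec_initialized = true;
}

/* Stand-in for making a context current: both steps must be done. */
static void make_current(struct context *ctx)
{
   assert(ctx->version != 0 && "driver must compute the version first");
   assert(ctx->exec_initialized);
   /* ...binding the context would happen here... */
}
```

A driver following the new order calls compute_version(), then initialize_exec_table(), and only then make_current(); swapping the first two trips the assert in a debug build.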
* mesa: don't initialize VBO vtxfmt in _vbo_CreateContext | Jordan Justen | 2012-12-16 | 4 | -10/+0
  The driver should call _mesa_initialize_vbo_vtxfmt after computing the context version.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: don't initialize exec dispatch tables in _mesa_initialize_context | Jordan Justen | 2012-12-16 | 1 | -3/+0
  Drivers must compute the context version, and then call _mesa_initialize_exec_table themselves.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa dispatch_sanity: call new functions to initialize exec table | Jordan Justen | 2012-12-16 | 1 | -1/+6
  In a future patch the exec functions will no longer be set up by _mesa_initialize_context and _vbo_CreateContext. Therefore we must call _mesa_initialize_exec_table and _mesa_initialize_exec_table_vbo.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* drivers: compute version and then initialize exec table | Jordan Justen | 2012-12-16 | 12 | -2/+81
  This change forces the context version to be computed before initializing the exec dispatch tables.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* vbo: add _mesa_initialize_vbo_vtxfmt | Jordan Justen | 2012-12-16 | 2 | -0/+19
  This function initializes the exec/save dispatch tables for VBO vtxfmt.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: separate exec allocation from initialization | Jordan Justen | 2012-12-16 | 3 | -16/+15
  In glapi/gl_genexec.py:
    * Remove _mesa_alloc_dispatch_table call
  In glapi/gl_genexec.py and api_exec.h:
    * Rename _mesa_create_exec_table to _mesa_initialize_exec_table
  In context.c:
    * Call _mesa_alloc_dispatch_table instead of _mesa_create_exec_table
    * Call _mesa_initialize_exec_table (this is temporary)
  Once all drivers have been modified to call _mesa_initialize_exec_table, then the call to _mesa_initialize_context can be removed from context.c.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* r600g: fixup offset types for printing | Dave Airlie | 2012-12-16 | 2 | -4/+4
  This allows the debug code to at least show the sign properly.
  Signed-off-by: Dave Airlie <[email protected]>
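The commit does not show which fields changed, so this is only a generic C illustration of the problem, with hypothetical function names: an offset kept in an unsigned type prints as a large positive number, while a signed type printed with %d shows the sign.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Offset stored signed: %d keeps the sign in debug output. */
static void format_signed_offset(char *buf, size_t size, int offset)
{
   snprintf(buf, size, "%d", offset);
}

/* The same value stored unsigned and printed with %u: a negative
 * offset shows up as a large positive number instead. */
static void format_unsigned_offset(char *buf, size_t size, unsigned offset)
{
   snprintf(buf, size, "%u", offset);
}
```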
* gallium/u_blitter: Remove the overlapped blit assert from util_blitter_blit_generic(). | Henri Verbeet | 2012-12-16 | 1 | -28/+0
  This is used by st_BlitFramebuffer() / r600_blit(), and ARB_fbo allows overlapped blits, even though the result is undefined.
  No piglit regressions on r600g / CYPRESS.
  Signed-off-by: Henri Verbeet <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
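For reference, a minimal sketch of what "overlapped" means here, assuming half-open, non-mirrored rectangles and using a plain pointer to identify the surface. struct rect and blit_is_overlapped() are hypothetical; u_blitter uses its own types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blit rectangle, half-open: [x0, x1) x [y0, y1), with
 * x0 <= x1 and y0 <= y1 (mirrored blits would need normalizing first). */
struct rect { int x0, y0, x1, y1; };

/* A blit is overlapped when source and destination are the same surface
 * and the two rectangles intersect. The removed assert rejected this
 * case; per the commit, ARB_fbo allows it and leaves the result undefined. */
static bool blit_is_overlapped(const void *src_surface, const struct rect *src,
                               const void *dst_surface, const struct rect *dst)
{
   if (src_surface != dst_surface)
      return false;
   return src->x0 < dst->x1 && dst->x0 < src->x1 &&
          src->y0 < dst->y1 && dst->y0 < src->y1;
}
```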
* glsl_parser_extras.cpp: fixup gl vs mem contexts again. | Dave Airlie | 2012-12-16 | 1 | -4/+4
  This should fix:
  https://bugs.freedesktop.org/show_bug.cgi?id=58039
  Tested-by: Darxus on bug 58039
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* i965: Move BRW_MAX_GRF and similar defines to brw_reg.h. | Kenneth Graunke | 2012-12-15 | 2 | -18/+17
  These don't really belong in brw_structs.h.
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Split struct brw_reg out from brw_eu.h into its own header. | Kenneth Graunke | 2012-12-15 | 2 | -709/+778
  struct brw_instruction and the related instruction emitting code won't be useful on Gen8+, as the instruction encoding changed. However, the struct brw_reg code is still extremely valuable.
  While we're at it, fix up some style points:
    - s/GLuint/unsigned/g
    - s/GLint/int/g
    - s/GLshort/int16_t/g
    - s/GLushort/uint16_t/g
    - s/INLINE/inline/g
    - Replace tabs with spaces
    - Put return types on a separate line from the function name/parameters
    - Remove trailing whitespace
    - Remove extraneous whitespace around function parameters
  Reviewed-by: Eric Anholt <[email protected]>
* st/mesa: add texture buffer object rgb32 support. | Dave Airlie | 2012-12-16 | 1 | -1/+13
  This checks if the pipe driver can support RGB32 formats.
  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* mesa: add support for ARB_texture_buffer_object_rgb32 | Dave Airlie | 2012-12-16 | 3 | -0/+15
  This adds the extensions + the tex buffer support for checking the formats.
  There is a piglit test enhancement sent to that list.
  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* glsl: avoid using gl context as a memory context | Dave Airlie | 2012-12-15 | 1 | -4/+5
  Not sure what was going on here, but running piglit with debug builds might be a good plan :-)
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* i965: Add missing autoconf bits so test_vec4_register_coalesce will build | Ian Romanick | 2012-12-14 | 1 | -0/+3
  Signed-off-by: Ian Romanick <[email protected]>
  Tested-by: Eric Anholt <[email protected]>
* i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too. | Eric Anholt | 2012-12-14 | 3 | -61/+128
  No statistically significant performance difference on glbenchmark 2.7 (n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8% (n=5), but that's only about 0.3% of all cycles spent according to the fixed shader_time.
  Reviewed-by: Kenneth Graunke <[email protected]>
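The compute-to-MRF / compute-to-GRF idea can be sketched on a toy three-address IR. Everything here is hypothetical (the opcodes, struct inst, coalesce_movs()), and the real vec4 pass also has to reason about writemasks, register files and control flow; the sketch only shows the core rewrite of retargeting the producer and dropping the MOV.

```c
#include <assert.h>
#include <stdbool.h>

enum opcode { OP_NOP, OP_ADD, OP_MOV };

/* Toy instruction: one destination register, up to two sources
 * (-1 marks an unused source). */
struct inst {
   enum opcode op;
   int dst;
   int src[2];
};

static bool reads(const struct inst *in, int reg)
{
   return in->op != OP_NOP && (in->src[0] == reg || in->src[1] == reg);
}

static bool writes(const struct inst *in, int reg)
{
   return in->op != OP_NOP && in->dst == reg;
}

/* For each "MOV dst, tmp", make the instruction that computed tmp write
 * dst directly and turn the MOV into a NOP. Only done when tmp is not
 * read after the MOV and nothing in between touches tmp or dst.
 * Returns the number of MOVs removed. */
static int coalesce_movs(struct inst *code, int n)
{
   int removed = 0;
   for (int i = 0; i < n; i++) {
      if (code[i].op != OP_MOV || code[i].src[0] == code[i].dst)
         continue;
      int tmp = code[i].src[0], dst = code[i].dst;

      bool tmp_live_after = false;
      for (int k = i + 1; k < n; k++)
         if (reads(&code[k], tmp))
            tmp_live_after = true;
      if (tmp_live_after)
         continue;

      /* Walk back to the nearest instruction that touches tmp or dst. */
      int j = i - 1;
      while (j >= 0 && !reads(&code[j], tmp) && !writes(&code[j], tmp) &&
             !reads(&code[j], dst) && !writes(&code[j], dst))
         j--;
      /* That instruction must be the one producing tmp. */
      if (j < 0 || !writes(&code[j], tmp))
         continue;

      code[j].dst = dst;
      code[i].op = OP_NOP;
      removed++;
   }
   return removed;
}
```

For example, "ADD r10, r1, r2; MOV r20, r10" becomes "ADD r20, r1, r2", while the MOV is kept if r10 is read again later.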
* i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling" | Eric Anholt | 2012-12-14 | 3 | -9/+113
  The way our visitor works, scalar expression/swizzle results that get stored in channels other than .x will have an intermediate MOV from their result in the .x channel to the real .y (or whatever) channel, and similarly for vec2/vec3 results. By knowing how to adjust DP4-type instructions for optimizing out a swizzled MOV, we can reduce instructions in common matrix multiplication cases.
  Reviewed-by: Kenneth Graunke <[email protected]>
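A toy sketch of the "reswizzling" case, assuming a DP4-style instruction that writes the same scalar to every enabled channel. struct dp4, the WRITEMASK_* values and fold_swizzled_mov() are hypothetical, and the liveness checks a full coalescing pass needs are left out. Because every channel the DP4 writes holds the same value, a swizzled MOV that only reads those channels can be folded by giving the DP4 the MOV's destination and writemask.

```c
#include <assert.h>
#include <stdbool.h>

#define WRITEMASK_X 0x1u
#define WRITEMASK_Y 0x2u
#define WRITEMASK_Z 0x4u
#define WRITEMASK_W 0x8u

/* Toy dot-product instruction: every enabled channel of dst receives
 * the same scalar result. */
struct dp4 {
   int dst;
   unsigned writemask;
};

/* Try to fold "MOV mov_dst.<mov_mask>, dp->dst.<swizzle>" into the DP4.
 * swizzle[c] is the source channel (0..3) read for destination channel c.
 * Returns true if the MOV can be removed. */
static bool fold_swizzled_mov(struct dp4 *dp, int mov_dst, unsigned mov_mask,
                              const unsigned char swizzle[4])
{
   for (int c = 0; c < 4; c++) {
      if (!(mov_mask & (1u << c)))
         continue;
      /* Each channel the MOV writes must read a channel the DP4 wrote. */
      if (!(dp->writemask & (1u << swizzle[c])))
         return false;
   }
   /* The DP4 result is replicated, so writing it straight into the MOV's
    * destination channels produces the same values. */
   dp->dst = mov_dst;
   dp->writemask = mov_mask;
   return true;
}
```

With "DP4 r10.x, a, b; MOV r20.y, r10.xxxx" this yields "DP4 r20.y, a, b".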
* i965/vs: Add a unit test for opt_compute_to_mrf(). | Eric Anholt | 2012-12-14 | 3 | -2/+133
  The compute-to-mrf code is really twitchy, and it's hard to construct GLSL testcases for it. This unit test is also really hard to work with (for example, if your instruction is removed by dead code elimination, you end up inspecting something irrelevant), but I did use it for debugging some of the commits to follow.
  I called it test_vec4_register_coalesce because the compute-to-mrf code is about to morph into that.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Drop an unnecessary _safe on a list walk. | Eric Anholt | 2012-12-14 | 1 | -1/+1
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a note explaining a detail of register_coalesce_2(). | Eric Anholt | 2012-12-14 | 1 | -0/+21
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Also consider HALTs a potential block end. | Eric Anholt | 2012-12-14 | 1 | -0/+1
  The final halt of the fragment shader turns off the remaining channels, then jumps such that everything is turned back on. So, we can have our last ENDIF of the shader point at that directly.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Jump to the end of the next outer conditional block on ENDIFs. | Kenneth Graunke | 2012-12-14 | 1 | -0/+7
  From the Ivybridge PRM, Volume 4, Part 3, section 6.24 (page 172):
    "The endif instruction is also used to hop out of nested conditionals by jumping to the end of the next outer conditional block when all channels are disabled."
  Also:
    "Pseudocode:
     Evaluate(WrEn);
     if ( WrEn == 0 ) { // all channels false
        Jump(IP + JIP);
     }"
  First, ENDIF re-enables any channels that were disabled because they didn't match the conditional. If any channels are active, it proceeds to the next instruction (IP + 16). However, if they're all disabled, there's no point in walking through all of the instructions that have no effect; it can jump to the next instruction that might re-enable some channels (an ELSE, ENDIF, or WHILE).
  Previously, we always set JIP on ENDIF instructions to 2 (which is measured in 8-byte units). This made it do Jump(IP + 16), which just meant it would go to the next instruction even if all channels were off. It turns out that walking over instructions while all the channels are disabled like this is worse than just instruction dispatch overhead: if there are texturing messages, it still costs a couple hundred cycles to not-actually-read from the texture results.
  This patch finds the next instruction that could re-enable channels and sets JIP accordingly.
  Reviewed-by: Eric Anholt <[email protected]>
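A sketch of the JIP computation described above, on a toy opcode array rather than the real brw_instruction encoding. The names are hypothetical; the sketch assumes uncompacted 16-byte instructions and, as its own conservative choice, skips any IF...ENDIF or DO...WHILE block that starts after the ENDIF, so only block ends at the same or an outer level are chosen.

```c
#include <assert.h>

enum opcode { OP_ALU, OP_IF, OP_ELSE, OP_ENDIF, OP_DO, OP_WHILE };

/* Scan forward from the ENDIF at endif_ip for the next instruction that
 * could re-enable channels: an ELSE, ENDIF or WHILE at the same nesting
 * level or above. Blocks that begin after the ENDIF are skipped whole.
 * Returns -1 if nothing is found. */
static int next_block_end(const enum opcode *code, int n, int endif_ip)
{
   int depth = 0;
   for (int ip = endif_ip + 1; ip < n; ip++) {
      switch (code[ip]) {
      case OP_IF:
      case OP_DO:
         depth++;
         break;
      case OP_ELSE:
         if (depth == 0)
            return ip;
         break;
      case OP_ENDIF:
      case OP_WHILE:
         if (depth == 0)
            return ip;
         depth--;
         break;
      default:
         break;
      }
   }
   return -1;
}

/* JIP counts 8-byte units and each uncompacted instruction is 16 bytes,
 * so jumping k instructions ahead needs JIP = 2 * k. With no later block
 * end, fall back to JIP = 2, i.e. the next instruction. */
static int endif_jip(const enum opcode *code, int n, int endif_ip)
{
   int target = next_block_end(code, n, endif_ip);
   if (target < 0)
      return 2;
   return 2 * (target - endif_ip);
}
```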
* i965: expose ARB_texture_cube_map_array | Chris Forbes | 2012-12-14 | 1 | -0/+1
  V3: Put enable in an existing block rather than making a new one for no good reason.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix setup for textureGrad(samplerCubeArray, coord, dPdx, dPdy) | Eric Anholt | 2012-12-14 | 1 | -7/+12
  Caught by tex_grad-01.frag.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move the failure for gen7 16-wide intdiv to emit_math(). | Eric Anholt | 2012-12-14 | 2 | -7/+4
  The cube map array code adds another caller of emit_math(), which needs this check.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fs: Add fixup for textureSize on Gen6/7 | Chris Forbes | 2012-12-14 | 1 | -0/+11
  V2: Moved up into emit(ir_texture *) to avoid duplication and fix ordering for Gen7; Gen6 math quirks moved into previous patches. Tested on Gen6 only; passes all the cube_map_array piglits.
  V3: Fixed weird whitespace
  V4: Use sampler->type; otherwise broken on arrays of samplers.
  v5: Minor style fixes (by anholt)
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fs: fix gen6+ math operands in one place | Chris Forbes | 2012-12-14 | 2 | -28/+33
  V4: Fix various style nits as pointed out by Eric, and expand IMM operands on both Gen6 and Gen7.
  v5: minor style nits (by anholt)
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: vs: Add fixup for textureSize with cube array samplers | Chris Forbes | 2012-12-14 | 1 | -0/+13
  V3: Fixed weird whitespace
  V4: Use sampler's type rather than variable's type; otherwise broken with arrays of samplers. (Thanks Eric)
  v5: Fix a couple more style nits (by anholt)
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
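A minimal sketch of such a fixup, assuming (per the GLSL rule that textureSize() on a samplerCubeArray reports the number of cubes) that the hardware size query reports 2D layer-faces, six per cube, so the fixup amounts to an integer division of the depth by 6. struct tex_size and fixup_cube_array_size() are hypothetical; the commit message itself does not spell out the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of a hardware size query. For a cube map array,
 * depth is assumed to count 2D layer-faces (6 per cube). */
struct tex_size { int width, height, depth; };

/* textureSize() on a samplerCubeArray must report whole cubes, so the
 * layer-face count is divided by 6; other sampler types pass through. */
static struct tex_size fixup_cube_array_size(struct tex_size hw, bool is_cube_array)
{
   if (is_cube_array)
      hw.depth /= 6;
   return hw;
}
```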
* i965/vs: Fix gen6+ math operand quirks in one place | Chris Forbes | 2012-12-14 | 2 | -34/+28
  This causes immediate values to get moved to a temp on gen7, which is needed for an upcoming change but hadn't happened in the visitor until then.
  v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by anholt).
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add various plumbing for cubemap arrays | Chris Forbes | 2012-12-14 | 5 | -3/+11
  V4: Fixed style nits
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add empirically-determined instruction latencies for gen7. | Eric Anholt | 2012-12-14 | 1 | -3/+179
  v2: Actually switch on the other math instructions mentioned in the comment.
  v3: Add timing data for textureSize(), and clean up some long comment lines.
  Testing shader_time of fs16 shaders on a few frames of various apps:
    nexuiz improved by 2.9% +/- 1.5% (n=10)
    no difference on GLB2.5 (n=36, outliers removed)
    no difference on GLB2.7 (n=25)
    etqw improved by 2.6% +/- 2.2% (n=25)
    no difference on lightsmark (n=25)
  Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix the clock increment in scheduling. | Eric Anholt | 2012-12-14 | 1 | -3/+15
  I've tested this to be true with various ALU ops on gen7 (with the exception of MADs, which go at either 3 or 4 cycles per dispatch).
  Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Move the old gen4 bspec-based scheduling info to a helper func. | Eric Anholt | 2012-12-14 | 1 | -33/+41
  For gen7 everything changes, and we have actual information on latency.
  Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Set up gen7 UBO loads as sends from GRFs. | Eric Anholt | 2012-12-14 | 5 | -7/+114
  This gives the instruction scheduler a chance to schedule between the loads, whereas before it was restricted due to the dependencies between the MRFs for setting them up.
  For one shader in gles3conform, it goes from getting stuck in register allocation for as long as anybody's bothered to leave it running down to 23 seconds, thanks to the LIFO scheduling.
  Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Before reg alloc, schedule instructions to reduce live ranges. | Eric Anholt | 2012-12-14 | 1 | -6/+41
  This came from an idea by Ben Segovia.
  16-wide pixel shaders are very important for latency hiding on i965, so we want to try really hard to get them. If scheduling an instruction makes some set of instructions available, those are probably the ones that make the instruction's result dead. By choosing those first, we'll have a tendency to reduce the amount of live data as opposed to creating more.
  Previously, we were sometimes getting this behavior out of the scheduler, which was what produced the scheduler's original performance wins on lightsmark. Unfortunately, that was mostly an accident of the lame instruction latency information that I had, which made it impossible to fix the actual scheduling for performance. Now that we've fixed the scheduling for setup for register allocation, we can safely update the latency parameters for the final schedule.
  In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4 shaders that were spilling change how many registers spill, for a reduction of 70/3899 instructions.
  v2: Simplify the new loop.
  Acked-by: Kenneth Graunke <[email protected]>
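The selection rule above (prefer the instructions that the last scheduled instruction just made available) can be sketched as a list scheduler with a LIFO ready stack. The DAG representation and function names are hypothetical, and the sketch ignores latency entirely, whereas the real scheduler also weighs it; it only shows why LIFO order tends to finish one dependency chain before starting another.

```c
#include <assert.h>

#define MAX_INSTS 16

/* Toy dependency DAG: succ[i] lists the instructions consuming i's
 * result; npreds[i] counts producers of i not yet scheduled. */
struct dag {
   int n;
   int nsucc[MAX_INSTS];
   int succ[MAX_INSTS][MAX_INSTS];
   int npreds[MAX_INSTS];
};

static void add_edge(struct dag *g, int from, int to)
{
   g->succ[from][g->nsucc[from]++] = to;
   g->npreds[to]++;
}

/* Always pick the instruction that became ready most recently. Writes
 * the chosen order to `order` and returns how many were scheduled.
 * Consumes g->npreds. */
static int schedule_lifo(struct dag *g, int *order)
{
   int stack[MAX_INSTS], top = 0, count = 0;
   for (int i = 0; i < g->n; i++)
      if (g->npreds[i] == 0)
         stack[top++] = i;
   while (top > 0) {
      int inst = stack[--top];
      order[count++] = inst;
      for (int k = 0; k < g->nsucc[inst]; k++) {
         int s = g->succ[inst][k];
         if (--g->npreds[s] == 0)
            stack[top++] = s;   /* newly available: goes on top */
      }
   }
   return count;
}
```

With two independent chains 0 -> 1 and 2 -> 3, the stack yields 2, 3, 0, 1, so only one chain's intermediate value is live at a time; a FIFO queue would produce 0, 2, 1, 3 and keep both live.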
* i965/fs: Add some optional debug printfs to scheduling. | Eric Anholt | 2012-12-14 | 1 | -0/+21
  Seeing when instructions become available to schedule is really useful.
  Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Schedule instructions both before and after register allocation. | Eric Anholt | 2012-12-14 | 3 | -18/+78
  Acked-by: Kenneth Graunke <[email protected]>
* i965: Make sure that the shader_time report at context destroy happens. | Eric Anholt | 2012-12-14 | 1 | -0/+3
  Otherwise, you end up with some report from within a second of context destroy, which is not what you really want for testing the impact of changes.
* i965: Print a total time for the different shader stages. | Eric Anholt | 2012-12-14 | 1 | -10/+38
  Sometimes I've got a patch for a performance optimization that's not showing a statistically significant performance difference on reported FPS, but still seems like a good idea because it ought to reduce time spent in the shader. If I can see the total number of cycles spent in the shader stage being optimized, it may show that the patch is still worthwhile (or point out that it's actually broken in some way).
* i965: Scale shader_time to compensate for resets. | Eric Anholt | 2012-12-14 | 4 | -9/+83
  Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen.
  Note that this will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.
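One way to express the linear correction, assuming the per-shader counters record the accumulated time, the number of invocations whose time was written, and the number lost to resets. The struct and function names are hypothetical, and the driver's exact formula may differ. As the commit notes, the estimate assumes a reset invocation costs about as much as an average recorded one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulated counters for one shader. */
struct shader_time_counts {
   uint64_t time;    /* cycles summed over invocations that reported a time */
   uint64_t written; /* invocations whose time was recorded */
   uint64_t reset;   /* invocations whose timing was lost to a reset */
};

/* Linear correction: scale the recorded total by (written + reset) /
 * written. Computed in double to avoid 64-bit overflow on large totals. */
static double scaled_shader_time(const struct shader_time_counts *c)
{
   if (c->written == 0)
      return 0.0;
   return (double)c->time * (double)(c->written + c->reset) / (double)c->written;
}
```

For example, 1000 cycles over 10 recorded invocations with 5 resets is scaled to 1500 cycles.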