summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Emit compressed BFI2 instructions on Gen > 7.Matt Turner2014-09-301-1/+1
| | | | | | | IVB had a restriction that prevented us from emitting compressed three-source instructions, and although that was lifted on Haswell, Haswell had a new restriction that said BFI instructions specifically couldn't be compressed.
* i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.Matt Turner2014-09-301-3/+3
| | | | | | | These checks were intended for Gen 7 only. None of these restrictions apply to Gen 8. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.Matt Turner2014-09-301-1/+22
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b The improvement here is that we've broken a dependency between these instructions. Leads to 330 fewer INV instructions and 330 more RSQ. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions. Leads to 80 fewer INV instructions and 80 more RSQ. Occasionally the sqrt's result is no longer used, leading to: instructions in affected programs: 5005 -> 4949 (-1.12%) Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Call opt_algebraic after opt_cse.Matt Turner2014-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch adds an algebraic optimization for the pattern sqrt a, b rcp c, a and turns it into sqrt a, b rsq c, b but many vertex shaders do a = sqrt(b); var1 /= a; var2 /= a; which generates sqrt a, b rcp c, a rcp d, a If we apply the algebraic optimization before CSE, we'll end up with sqrt a, b rsq c, b rcp d, a Applying CSE combines the RCP instructions, preventing this from happening. No shader-db changes. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Extend predicated break pass to predicate WHILE.Matt Turner2014-09-301-0/+36
| | | | | | | | Helps a handful of programs in Serious Sam 3 that use do-while loops. instructions in affected programs: 16114 -> 16075 (-0.24%) Reviewed-by: Ian Romanick <[email protected]>
* gallivm: Fix build for LLVM 3.2Mathias Fröhlich2014-10-014-3/+21
| | | | | | | | | | | | Do not rely on LLVMMCJITMemoryManagerRef being available. The c binding to the memory manager objects only appeared on llvm-3.4. The change is based on an initial patch of Brian Paul. Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* freedreno: destroy transfer pool after blitterRob Clark2014-09-301-2/+2
| | | | | | | | Blitter can still have transfers hanging around which it frees in util_blitter_destroy(). So let it clean up before we yank the transfer_pool from under it. Signed-off-by: Rob Clark <[email protected]>
* freedreno/lowering: fix token calculation for loweringRob Clark2014-09-301-16/+39
| | | | | | | Indirect registers consume an additional token. Try to clean up the token calculation math a bit, and fix it at the same time. Signed-off-by: Rob Clark <[email protected]>
* i965/fs: Don't make a name for a vector splitting temporaryIan Romanick2014-09-301-3/+8
| | | | | | | | | | If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't make a name for the function return variableIan Romanick2014-09-301-4/+7
| | | | | | | | | | If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't allocate a name for ir_var_temporary variablesIan Romanick2014-09-304-0/+28
| | | | | | | | | | | | | | | | Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0 After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0 Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0 After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0 A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Use ir_var_temporary for compiler generated temporariesIan Romanick2014-09-303-3/+4
| | | | | | | | | | These few places were using ir_var_auto for seemingly no reason. The names were not added to the symbol table. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add context-level controls for whether temporaries have real namesIan Romanick2014-09-304-0/+23
| | | | | | | | | No change Valgrind massif results for a trimmed apitrace of dota2. v2: Minor rebase on _mesa_init_constants changes. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Never put ir_var_temporary variables in the symbol tableIan Romanick2014-09-305-5/+14
| | | | | | | | | | | Later patches will give every ir_var_temporary the same name in release builds. Adding a bunch of variables named "compiler_temp" to the symbol table can only cause problems. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add the possibility for ir_variable to have a non-ralloced nameIan Romanick2014-09-303-2/+30
| | | | | | | | | | Specifically, ir_var_temporary variables constructed with a NULL name will all have the name "compiler_temp" in static storage. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits eachIan Romanick2014-09-301-8/+16
| | | | | | | | | | | | | | | | | Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0 Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Squish ir_variable::max_ifc_array_access and ::state_slots togetherIan Romanick2014-09-303-36/+48
| | | | | | | | | | | | | | | | | | | | | | | | At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location. If anyone can think of a better name for the union than "u", I'm all ears. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0 After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0 After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Make ir_variable::num_state_slots and ir_variable::state_slots privateIan Romanick2014-09-3010-56/+78
| | | | | | | | | | | | Also move num_state_slots inside ir_variable_data for better packing. The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Make ir_variable::max_ifc_array_access privateIan Romanick2014-09-305-22/+53
| | | | | | | | | | The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Store ir_variable::depth_layout using 3 bitsIan Romanick2014-09-301-10/+9
| | | | | | | | | | | | | | | | | | | | | | | warn_extension_index was moved to improve packing. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0 After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,921 0 Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0 After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,480 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. v2: Use the enum name with the bit-field and remove the extra casts. Suggested by Ken. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> [v1] Reviewed-by: Tapani Pälli <[email protected]> [v1]
* glsl: Replace ir_variable::warn_extension pointer with an 8-bit indexIan Romanick2014-09-303-10/+31
| | | | | | | | | | | | | | | | | | | | Also move the new warn_extension_index into ir_variable::data. This enables slightly better packing. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,297 0 After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0 Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,056 0 After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Use accessors for ir_variable::warn_extensionIan Romanick2014-09-303-7/+30
| | | | | | | | | | The payoff for this will come in the next patch. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Eliminate unused built-in variables after compilationIan Romanick2014-09-304-0/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After compilation (and before linking) we can eliminate quite a few built-in variables. Basically, any uniform or constant (e.g., gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can be eliminated. System values, vertex shader inputs (with one exception), and fragment shader outputs that are not used and not re-declared in the shader text can also be removed. gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in function ftransform. There are some complications with eliminating these variables (see the comment in the patch), so they are not eliminated. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0 After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0 Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0 After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0 A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit. v2: Don't remove any built-in with Transpose in the name. v3: Fix comment typo noticed by Anuj. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Anuj Phogat <[email protected]> Cc: Eric Anholt <[email protected]>
* glsl: Validate that built-in uniforms have backing stateIan Romanick2014-09-301-0/+8
| | | | | | | | | | | All built-in uniforms are supposed to be backed by some GL state. The state_slots field describes this backing state. This helped me track down a bug in a later patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* vc4: Don't forget to store stencil along with depth when storing either.Eric Anholt2014-09-301-1/+1
| | | | | Otherwise, we'd replace the stencil in our packed depth/stencil with 0s. Fixes about 50 piglit tests.
* llvmpipe: Reuse llvmpipes LLVMContext in the draw context.Mathias Fröhlich2014-09-305-9/+31
| | | | | | | | | | | Reuse the LLVMContext already allocated in llvmpipe_context for draw_llvm if ppossible. This should decrease the memory footprint of an llvmpipe context. v2: Fix compile with llvm disabled. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* llvmpipe: Make a llvmpipe OpenGL context thread safe.Mathias Fröhlich2014-09-304-18/+40
| | | | | | | | | | | | | | | | | This fixes the remaining problem with the recently introduced global jit memory manager. This change again uses a memory manager that is local to gallivm_state. This implementation still frees the majority of the memory immediately after compilation. Only the generated code is deferred until this code is no longer used. This change and the previous one using private LLVMContext instances I can now safely run several independent OpenGL contexts driven by llvmpipe from different threads. v3: Rebase on llvm-3.6 compile fixes. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* llvmpipe: Use two LLVMContexts per OpenGL context instead of a global one.Mathias Fröhlich2014-09-3013-31/+40
| | | | | | | | | | | | This is one step to make llvmpipe thread safe as mandated by the OpenGL standard. Using the global LLVMContext is obviously a problem for that kind of use pattern. The patch introduces two LLVMContext instances that are private to an OpenGL context and used for all compiles. One is put into struct draw_llvm and the other one into struct llvmpipe_context. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* i965/brw_reg: Make the accumulator register take an explicit width.Jason Ekstrand2014-09-303-10/+15
| | | | | | | The big pile of patches I just pushed regresses about 25 piglit tests on SNB. This fixes the regressions. Signed-off-by: Jason Ekstrand <[email protected]>
* llvmpipe: move lp_jit_screen_init() call after allocation of screen objectBrian Paul2014-09-301-3/+5
| | | | | | | | | The screen argument isn't actually used by lp_jit_screen_init() at this time, but let's move the call so that we pass a valid pointer. v2: don't leak screen if lp_jit_screen_init() fails. Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: fix Semantic.Name assignment in tgsi_transform_input_decl()Brian Paul2014-09-301-1/+1
| | | | | | | Assign the sem_name parameter, not TGSI_SEMANTIC_GENERIC. Fixes polygon stipple regression. Reviewed-by: Charmaine Lee <[email protected]>
* util: simplify PIPE_TEXTURE_CUBE case in util_max_layer()Brian Paul2014-09-301-2/+3
| | | | | | | | | For cube resources, the array_size value should be 6. So handle that case as we do for array texture resources. But assert that array_size==6 just to be safe. Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* softpipe: don't special case PIPE_TEXTURE_CUBE in softpipe_resource_layout()Brian Paul2014-09-301-2/+3
| | | | | | As with the previous patch for llvmpipe. Reviewed-by: Ilia Mirkin <[email protected]>
* llvmpipe: remove special case for PIPE_TEXTURE_CUBE in llvmpipe_texture_layout()Brian Paul2014-09-301-3/+6
| | | | | | | layers (aka array_size) should be 6 for cube textures so we don't need to special-case it. But add an assertion just to be safe. Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add doc note about cube textures and can_create_resource()Brian Paul2014-09-301-0/+2
| | | | | | Just to be clear, and echo the description for resource_create(). Reviewed-by: Ilia Mirkin <[email protected]>
* st/mesa: remove unneded PIPE_TEXTURE_CUBE check in st_texture_create()Brian Paul2014-09-301-1/+1
| | | | | | | Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so there's no reason to special-case the pt.array_size = layers assignment. Reviewed-by: Ilia Mirkin <[email protected]>
* mesa: Drop the always-software-primitive-restart paths.Eric Anholt2014-09-305-58/+8
| | | | | | | The core sw primitive restart code is still around, because i965 uses it in some cases, but there are no drivers that want it on all the time. Reviewed-by: Rob Clark <[email protected]>
* gallium: Drop software-only primitive restart support.Eric Anholt2014-09-301-3/+2
| | | | | | | | | | | | | The drivers not flagging primitive restart support are r300 swtcl, svga, nv30, and vc4. The point of primitive restart is to slightly reduce draw call overhead for apps by batching multiple draws. If we do an extra pass to read the index buffer and split back into multiple draws, we've entirely missed the point. This is particularly bad for drivers that otherwise have hardware IB reads, where the readback is probably uncached. Reviewed-by: Rob Clark <[email protected]>
* i965/fs: Properly calculate the number of instructions in ↵Jason Ekstrand2014-09-301-1/+3
| | | | | | | calculate_register_pressure Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for FB writes on gen >= 7Jason Ekstrand2014-09-306-71/+142
| | | | | | | | | | | | | | | On gen 7, the MRF was removed and we gained the ability to do send instructions directly from the GRF. This commit enables that functinoality for FB writes. v2: Make handling of components more sane. i965/fs: Force a high register for the final FB write v2: Renamed the array for the range mappings and added a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Handle COMPR4 in LOAD_PAYLOADJason Ekstrand2014-09-302-1/+36
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Constant propagate into LOAD_PAYLOADJason Ekstrand2014-09-301-0/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payloadJason Ekstrand2014-09-301-0/+2
| | | | | | | | If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then we will need this. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instructionJason Ekstrand2014-09-304-29/+28
| | | | | | | | | Previously, we were use the base_mrf parameter of fs_inst to store the MRF location. In preparation for doing FB writes from the GRF, we now also allow you to set inst->base_mrf to -1 and provide a source register. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructionsJason Ekstrand2014-09-304-16/+24
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for UNTYPED_ATOMIC instructionsJason Ekstrand2014-09-306-25/+36
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a function for getting a component of a 8 or 16-wide registerJason Ekstrand2014-09-301-0/+10
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the instruction execution size directly for texture generationJason Ekstrand2014-09-301-3/+10
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>