path: root/src/glsl
Commit message | Author | Age | Files | Lines
* linker: Add a missing space in an error message | Neil Roberts | 2014-11-13 | 1 | -1/+1
    Reviewed-by: Brian Paul <[email protected]>
* glsl: Swap the order of glsl_type::name and ::length | Ian Romanick | 2014-11-10 | 2 | -8/+8
    On x86-64 this saves 8 bytes of padding in the structure, and this reduces the size of the structure to 32 bytes.

    v2: Fix constructor so that GCC won't warn about the order of initialization.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
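The padding effect described above can be illustrated with a small sketch. The two structs below are hypothetical layouts invented for this example, not the real glsl_type definition; the point is only that placing two 32-bit members next to each other lets them share one 8-byte slot on a 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: each 32-bit member is followed by a pointer, so on a
// typical LP64 target the compiler inserts 4 bytes of padding after each one.
struct PaddedLayout {
    uint32_t gl_type;
    const char *name;
    uint32_t length;
    const void *fields;
};

// Same members, reordered so the two 32-bit values sit side by side.
struct PackedLayout {
    uint32_t gl_type;
    uint32_t length;
    const char *name;
    const void *fields;
};

// Bytes saved by the reordering (typically 8 on LP64, 0 on ILP32).
constexpr std::size_t padding_saved()
{
    return sizeof(PaddedLayout) - sizeof(PackedLayout);
}
```

The exact saving depends on the target ABI, which is why the commit message qualifies its claim with "on x86-64".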
* glsl: Store glsl_type::vector_elements and ::matrix_columns as uint8_t | Ian Romanick | 2014-11-10 | 1 | -2/+2
    Due to the total number of bits used in the bitfield, this does not increase the size of the structure. It does, however, reduce the number of instructions required each time one of these fields is accessed.

    To access ::matrix_columns with the bitfield, three instructions were required:

        movzbl 0x9(%rdx),%eax
        shr    %al
        and    $0x7,%eax

    As a uint8_t, only one instruction is required.

        movzbl 0xa(%rdx),%eax

    These fields are accessed *a lot*. Valgrind callgrind results for a trace of Tesseract:

                         _mesa_Uniform4fv  _mesa_Uniform4f  _mesa_Uniform1i
        Before (64-bit):       48,103,497       16,556,096          676,447
        After (64-bit):        45,722,616       15,737,964          670,607

                         _mesa_Uniform4fv  _mesa_Uniform4f  _mesa_Uniform1i
        Before (32-bit):       61,472,611       21,051,222          821,361
        After (32-bit):        57,987,421       19,872,226          811,609

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* glsl/list: Revert unintentional file mode change in previous commit. | Vinson Lee | 2014-11-07 | 1 | -0/+0
    Signed-off-by: Vinson Lee <[email protected]>
* glsl/list: Move declaration before code. | Vinson Lee | 2014-11-07 | 1 | -1/+3
    Fixes MSVC build error.

        shaderapi.c
        src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
        src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type'
        src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86025
    Signed-off-by: Vinson Lee <[email protected]>
* glsl/list: Add an exec_list_validate function | Jason Ekstrand | 2014-11-07 | 1 | -0/+19
    This can be very useful for trying to debug list corruptions.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
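As a rough illustration of what validating a doubly linked list can look like, here is a minimal sketch. The Node and List types and the list_is_consistent function are hypothetical stand-ins written for this example; they are not the exec_list API, and the real exec_list_validate may check different invariants.

```cpp
// Hypothetical simplified list with a head sentinel and a tail sentinel.
struct Node {
    Node *next = nullptr;
    Node *prev = nullptr;
};

struct List {
    Node head;  // sentinel placed before the first element
    Node tail;  // sentinel placed after the last element

    List() { head.next = &tail; tail.prev = &head; }

    void push_back(Node *n)
    {
        n->prev = tail.prev;
        n->next = &tail;
        tail.prev->next = n;
        tail.prev = n;
    }
};

// Walks the list and checks that every forward link has a matching back link.
bool list_is_consistent(const List &l)
{
    if (l.head.prev != nullptr || l.tail.next != nullptr)
        return false;
    for (const Node *n = &l.head; n != &l.tail; n = n->next) {
        if (n->next == nullptr || n->next->prev != n)
            return false;
    }
    return true;
}
```

Calling such a check after each list mutation while debugging narrows down where a corruption is introduced.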
* glsl: Skip loop-too-large heuristic if indexing arrays of a certain size | Kenneth Graunke | 2014-11-06 | 1 | -1/+12
    A pattern in certain shaders is:

        uniform vec4 colors[NUM_LIGHTS];

        for (int i = 0; i < NUM_LIGHTS; i++) {
           ...use colors[i]...
        }

    In this case, the application author expects the shader compiler to unroll the loop. By doing so, it replaces variable indexing of the array with constant indexing, which is more efficient.

    This patch extends the heuristic to see if arrays accessed within the loop are indexed by an induction variable, and if the array size exactly matches the number of loop iterations. If so, the application author probably intended us to unroll it. If not, we rely on the existing loop-too-large heuristic.

    Improves performance in a phong shading microbenchmark by 2.88x, and a shadow mapping microbenchmark by 1.63x. Without variable indexing, we can upload the small uniform arrays as push constants instead of pull constants, avoiding shader memory access.

    Affects several games, but doesn't appear to impact their performance.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Acked-by: Kristian Høgsberg <[email protected]>
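The decision described above can be sketched as a small predicate. Everything here is hypothetical: the ArrayAccess summary, the should_unroll name, and the cost model (iterations times body cost against a limit) are assumptions made for illustration, not the pass's actual data structures or thresholds.

```cpp
// Hypothetical summary of one array access found inside the loop body.
struct ArrayAccess {
    unsigned array_size;
    bool indexed_by_induction_var;
};

// Returns true if the loop should be unrolled.
bool should_unroll(unsigned iterations, unsigned body_cost, unsigned max_cost,
                   const ArrayAccess *accesses, unsigned count)
{
    // If some array is indexed by the induction variable and its size matches
    // the trip count exactly, skip the size check and unroll.
    for (unsigned i = 0; i < count; i++) {
        if (accesses[i].indexed_by_induction_var &&
            accesses[i].array_size == iterations)
            return true;
    }
    // Otherwise fall back to a (simplified) loop-too-large check.
    return iterations * body_cost <= max_cost;
}
```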
* glsl: Lower constant arrays to uniform arrays. | Kenneth Graunke | 2014-11-06 | 4 | -0/+106
    Consider GLSL code such as:

        const ivec2 offsets[] =
           ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
                   ivec2(0, -1), ivec2(0, 0), ivec2(0, 1),
                   ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));

        ivec2 offset = offsets[<non-constant expression>];

    Both i965 and nv50 currently handle this very poorly. On i965, this becomes a pile of MOVs to load the immediate constants into registers, a pile of scratch writes to move the whole array to memory, and one scratch read to actually access the value - effectively the same as if it were a non-constant array.

    We'd much rather upload large blocks of constant data as uniform data, so drivers can simply upload the data via constbufs, and not have to populate it via shader instructions.

    This is currently non-optional because both i965 and nouveau benefit from it, and according to Marek radeonsi would benefit today as well. (According to Tom, radeonsi may want to handle this itself in the long term, but we can always add a flag when it becomes useful.)

    Improves performance in a terrain rendering microbenchmark by about 2x, and cuts the number of instructions in about half. Helps a lot of "Natural Selection 2" shaders, as well as one "HOARD" shader.

        total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
        instructions in affected programs:     5880 -> 4186 (-28.81%)

    v2: Use ir_var_hidden to avoid exposing the new uniform via the GL uniform introspection API.
    v3: Alphabetize Makefile.sources properly.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
    Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Add infrastructure for "hidden" uniforms. | Kenneth Graunke | 2014-11-06 | 3 | -0/+62
    In the compiler, we'd like to generate implicit uniforms for internal use. These should not be visible via the GL uniform introspection API.

    To support that, we add a new ir_variable::how_declared value of ir_var_hidden, and plumb that through to gl_uniform_storage.

    v2 (idr): Fix some memory management issues in move_hidden_uniforms_to_end. The comment block on the function has more details.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Signed-off-by: Ian Romanick <[email protected]>
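One way to keep hidden uniforms out of an introspection API is to sort them to the end of the storage array and report only the count of visible entries. The sketch below shows that idea with a hypothetical Uniform struct and a hypothetical move_hidden_to_end helper; it is an assumption about the general approach suggested by the function name in the message, not a copy of Mesa's move_hidden_uniforms_to_end.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical uniform record for illustration only.
struct Uniform {
    std::string name;
    bool hidden;
};

// Moves hidden uniforms behind the visible ones, preserving relative order,
// and returns the number of visible uniforms.
std::size_t move_hidden_to_end(std::vector<Uniform> &uniforms)
{
    auto first_hidden = std::stable_partition(
        uniforms.begin(), uniforms.end(),
        [](const Uniform &u) { return !u.hidden; });
    return static_cast<std::size_t>(first_hidden - uniforms.begin());
}
```

With this arrangement, an introspection query only needs to expose indices below the returned count, while internal code can still reach the hidden entries.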
* glsl: Improve the CSE pass debugging output. | Kenneth Graunke | 2014-11-03 | 1 | -1/+8
    The CSE pass now prints out why it thinks a value is not a candidate for adding to the AE set.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Remove now useless dot optimization on basis vect | Matt Turner | 2014-11-03 | 3 | -92/+3
    The optimization in commit d056863b covers these cases, which were the first optimizations I added to the GLSL compiler.

    Reviewed-by: Ian Romanick <[email protected]>
* glsl: Emit mul instead of dot if only one component left. | Matt Turner | 2014-11-03 | 1 | -1/+4
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85683
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85691
    Reviewed-by: Ian Romanick <[email protected]>
* glsl: protect glsl_type with a mutex | Chia-I Wu | 2014-10-30 | 2 | -10/+62
    glsl_type has several static hash tables and a static ralloc context. They need to be protected by a mutex as they are not thread-safe.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69200
    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* glsl: protect anonymous struct id with a mutex | Chia-I Wu | 2014-10-30 | 1 | -2/+8
    There may be two contexts compiling shaders at the same time, and we want the anonymous struct id to be globally unique.

    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
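A minimal sketch of a globally unique, mutex-protected counter follows. It uses std::mutex and a hypothetical function name chosen for this example; the commit itself may use a different locking primitive and a different structure.

```cpp
#include <mutex>

// Hypothetical helper: hands out process-wide unique ids. The lock ensures
// that two threads compiling shaders at once never receive the same value.
unsigned next_anonymous_struct_id()
{
    static std::mutex id_mutex;
    static unsigned next_id = 0;

    std::lock_guard<std::mutex> lock(id_mutex);
    return next_id++;
}
```

An atomic counter would work for this narrow case as well; a mutex becomes necessary once the protected state is larger than one integer, as with the hash tables mentioned in the previous commit.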
* util: add _mesa_strtod and _mesa_strtof | Chia-I Wu | 2014-10-30 | 6 | -135/+9
    Both core mesa and glsl have their own wrappers for strtof_l. Merge and move them to util/.

    They are compiled with a C++ compiler so that we can make them thread-safe in a following commit.

    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Drop constant 0.0 components from dot products. | Matt Turner | 2014-10-29 | 1 | -0/+27
    Helps a small number of vertex shaders in the games Dungeon Defenders and Shank, as well as an internal benchmark.

        instructions in affected programs: 2801 -> 2719 (-2.93%)

    Reviewed-by: Kenneth Graunke <[email protected]>
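The idea of skipping components whose constant operand is 0.0 can be shown on plain arrays. The helpers below are hypothetical and operate on values rather than IR; they only illustrate why the remaining components give the same sum for finite inputs, and why a single surviving component reduces the dot product to one multiply.

```cpp
#include <array>

// Counts the components of the constant vector that are not exactly 0.0.
unsigned live_components(const std::array<float, 4> &c)
{
    unsigned n = 0;
    for (float v : c)
        if (v != 0.0f)
            n++;
    return n;
}

// Dot product that skips components multiplied by a constant 0.0.
// Note: for finite inputs this matches the full dot product; a NaN or Inf in
// a skipped component of v would change the result.
float dot_pruned(const std::array<float, 4> &v, const std::array<float, 4> &c)
{
    float sum = 0.0f;
    for (unsigned i = 0; i < 4; i++)
        if (c[i] != 0.0f)
            sum += v[i] * c[i];
    return sum;
}
```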
* glsl: Standardize names and fix typos | Andres Gomez | 2014-10-24 | 2 | -7/+7
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Silence unused parameter warning in _mesa_clear_shader_program_data | Ian Romanick | 2014-10-24 | 3 | -5/+3
    Just remove the parameter.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* linker: Rely on _mesa_clear_shader_program_data to clear link information | Ian Romanick | 2014-10-24 | 4 | -14/+34
    _mesa_link_shader_program already calls _mesa_clear_shader_program_data before calling link_shaders, so this is already done.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Use signed array index in update_max_array_access() | Anuj Phogat | 2014-10-22 | 1 | -3/+3
    Avoids a crash when a negative array index is used in a shader program.

    Cc: <[email protected]>
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* glsl: Fix crash due to negative array index | Anuj Phogat | 2014-10-22 | 1 | -1/+1
    Currently Mesa crashes with a shader like this:

        [fragment shader]
        float[5] array;
        int idx = -2;

        void main()
        {
           gl_FragColor = vec4(0.0, 1.0, 0.0, array[idx]);
        }

    Cc: <[email protected]>
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
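The following sketch shows one way a negative index can go wrong when it is tracked as an unsigned value. Both functions are hypothetical and written for this example; the source does not say exactly where the crash occurred, so this only demonstrates the general signed-versus-unsigned hazard.

```cpp
// Hypothetical tracker using a signed index: a negative index never raises
// the recorded maximum.
void track_max_access_signed(int idx, int &max_access)
{
    if (idx > max_access)
        max_access = idx;
}

// Hypothetical tracker using an unsigned index: -2 converts to a very large
// value, so the recorded maximum ends up far beyond the array bounds.
void track_max_access_unsigned(unsigned idx, unsigned &max_access)
{
    if (idx > max_access)
        max_access = idx;
}
```

With the unsigned version, any later code that sizes or indexes storage from the recorded maximum would be working with a bogus value.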
* glsl: Delete unused gl_uniform_driver_format enum values. | Kenneth Graunke | 2014-10-21 | 1 | -11/+0
    A while back, Matt made the uniform upload functions simply upload ctx->Const.UniformBooleanTrue for boolean values instead of 0/1, which removed the need to convert it later. We also set UniformBooleanTrue to 1.0f for drivers which want to treat booleans as 0.0/1.0f.

    Nothing ever sets these, so they are dead.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* glsl: fix several use-after-free bugs | Brian Paul | 2014-10-20 | 1 | -3/+7
    The get_variable_being_redeclared() function can free the 'var' argument. Thereafter, we cannot assume that 'var' is a valid pointer. This patch replaces 'var->name' with 'earlier->name' in two places and calls is_gl_identifier(var->name) before 'var' might get freed.

    This fixes several piglit GLSL crashes, including:

        spec/glsl-1.50/execution/geometry/clip-distance-in-param
        spec/glsl-1.50/execution/geometry/clip-distance-bulk-copy
        spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-before-global-redeclaration.geom

    I'm not sure why these were not spotted sooner.

    A similar bug was previously fixed by f9cecca7a.

    Cc: <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* glsl: implement switch flow control using a loop | Tapani Pälli | 2014-10-20 | 2 | -37/+64
    This patch removes the old variable-based logic for handling a break inside a switch. The switch is put inside a loop so that the existing infrastructure for loop flow control can be used for the switch; dead code elimination now also works properly.

    A 'continue' inside a switch now needs special handling, which is done by detecting the continue, breaking out of the switch loop, and calling continue for the outer loop.

    v2: remove one unnecessary ir_expression (Curro)

    Fixes the following Piglit tests:

        fs-exec-after-break.shader_test
        fs-conditional-break.shader_test

    No Piglit or es3conform regressions.

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
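To make the lowering concrete, here is a hand-written C++ analogy: a switch with fall-through, and the same logic expressed as a loop that runs at most once, where `break` is an ordinary loop break. The function names and the fall-through flag are assumptions for illustration; the compiler works on GLSL IR, not on C++ source, and the 'continue' handling mentioned above is not shown.

```cpp
// Reference version using a native switch with fall-through from case 0.
int with_switch(int x)
{
    int r = 0;
    switch (x) {
    case 0:
        r += 1;
        // fall through
    case 1:
        r += 10;
        break;
    default:
        r += 100;
        break;
    }
    return r;
}

// Same behaviour with the switch body wrapped in a loop. A flag records that
// a case label has matched, so later case bodies run until a break is hit.
int with_loop(int x)
{
    int r = 0;
    bool fallthru = false;
    for (;;) {
        if (x == 0)
            fallthru = true;
        if (fallthru)
            r += 1;
        if (x == 1)
            fallthru = true;
        if (fallthru) {
            r += 10;
            break;      // the switch's break is now a plain loop break
        }
        r += 100;       // default case
        break;          // leave the loop after a single pass
    }
    return r;
}
```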
* glsl: Update and fix typos in README. | Andres Gomez | 2014-10-16 | 1 | -8/+8
* glsl: improve accuracy of atan() | Erik Faye-Lund | 2014-10-10 | 1 | -10/+55
    Our current atan()-approximation is pretty inaccurate at 1.0, so let's try to improve the situation by doing a direct approximation without going through atan.

    This new implementation uses an 11th degree polynomial to approximate atan in the [-1..1] range, and the following identity to reduce the entire range to [-1..1]:

        atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)

    This range-reduction idea is taken from the paper "Fast computation of Arctangent Functions for Embedded Applications: A Comparative Analysis" (Ukil et al. 2011).

    The polynomial that approximates atan(x) is:

        x    * 0.9999793128310355 -
        x^3  * 0.3326756418091246 +
        x^5  * 0.1938924977115610 -
        x^7  * 0.1173503194786851 +
        x^9  * 0.0536813784310406 -
        x^11 * 0.0121323213173444

    This polynomial was found with the following GNU Octave script:

        x = linspace(0, 1);
        y = atan(x);
        n = [1, 3, 5, 7, 9, 11];
        format long;
        polyfitc(x, y, n)

    The polyfitc function is not built-in, but too long to include here. It can be downloaded from the following URL:

        http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m

    This fixes the following piglit test:

        shaders/glsl-const-folding-01

    Signed-off-by: Erik Faye-Lund <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
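The polynomial and the range reduction quoted above can be tried out directly. The sketch below is a standalone double-precision C++ function written for this page; the name atan_poly and the Horner-style evaluation are choices made here, and the actual change builds the equivalent computation as GLSL IR rather than C++.

```cpp
#include <cmath>

// Approximates atan(x) with the 11th degree odd polynomial on [-1, 1] and the
// identity atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x) outside that range.
double atan_poly(double x)
{
    const double ax = std::fabs(x);
    const bool reduce = ax > 1.0;
    const double t = reduce ? 1.0 / ax : ax;   // t is always in [0, 1]
    const double t2 = t * t;

    // Horner evaluation of the polynomial in t^2, then one multiply by t.
    double p = -0.0121323213173444;
    p = p * t2 + 0.0536813784310406;
    p = p * t2 - 0.1173503194786851;
    p = p * t2 + 0.1938924977115610;
    p = p * t2 - 0.3326756418091246;
    p = p * t2 + 0.9999793128310355;
    p *= t;

    if (reduce)
        p = 1.5707963267948966 - p;            // 0.5 * pi - atan(1 / |x|)
    return std::copysign(p, x);                // restore the sign of x
}
```

Working on |x| and restoring the sign at the end applies the identity once for both positive and negative inputs.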
* glsl: Optimize min/max expression trees | Iago Toral Quiroga | 2014-10-07 | 4 | -0/+478
    Original patch by Petri Latvala <[email protected]>:

    Add an optimization pass that drops min/max expression operands that can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minmax search, from the field of AI.

    This optimization pass can optimize min/max expressions where operands are min/max expressions. Such code can appear in shaders by itself, or as the result of clamp() or AMD_shader_trinary_minmax functions.

    This optimization pass improves the generated code for piglit's AMD_shader_trinary_minmax tests as follows:

        total instructions in shared programs: 75 -> 67 (-10.67%)
        instructions in affected programs:     60 -> 52 (-13.33%)
        GAINED: 0
        LOST:   0

    All tests (max3, min3, mid3) improved.

    A full shader-db run:

        total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
        instructions in affected programs:     1188 -> 1160 (-2.36%)
        GAINED: 0
        LOST:   0

    Improvements happen in Guacamelee and Serious Sam 3. One shader from Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of dropping of a (constant float (0.00000)) operand, which was compiled to a saturate modifier.

    Version 2 by Iago Toral Quiroga <[email protected]>:

    Changes from review feedback:
    - Squashed various cosmetic changes sent by Matt Turner.
    - Make less_all_components return an enum rather than setting a class member. (Suggested by Matt Turner). Also, renamed it to compare_components.
    - Make less_all_components, smaller_constant and larger_constant static. (Suggested by Matt Turner)
    - Change minmax_range to call its limits "low" and "high" instead of "range[0]" and "range[1]". (Suggested by Connor Abbott).
    - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by Connor Abbott).
    - Make the logic clearer by rearranging the code and commenting. (Suggested by Connor Abbott).
    - Added comment to explain why we need to recurse twice. (Suggested by Connor Abbott).
    - If we cannot prune an expression, do not return early. Instead, attempt to prune its children. (Suggested by Connor Abbott).

    Other changes:
    - Instead of having a global "valid" visitor member, let the various functions that can determine this status return a boolean and check for its value to decide what to do in each case. This is more flexible and allows us to recurse into children of parents that could not be pruned due to invalid ranges (so related to the last bullet in the review feedback).
    - Make sure we always check if a range is valid before working with it. Since any use of get_range, combine_range or range_intersection can invalidate a range, we should check for this situation every time we use any of these functions.

    Version 3 by Iago Toral Quiroga <[email protected]>:

    Changes from review feedback:
    - Now we can make get_range, combine_range and range_intersection static too (suggested by Connor Abbott).
    - Do not return NULL when looking for the smaller or larger constant in mixed vector constants. Instead, produce a new constant by doing a component-wise minmax. With this we can also remove validations when we call into these functions (suggested by Connor Abbott).
    - Add a comment explaining the meaning of the baserange argument in prune_expression (suggested by Connor Abbott).

    Other changes:
    - Eliminate minmax expressions operating on constant vectors with mixed values by resolving them.

    No piglit regressions observed with Version 3.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
    Reviewed-by: Connor Abbott <[email protected]>
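The pruning idea can be shown on scalars. In the sketch below, the Range struct and the two "contributes" predicates are hypothetical simplifications made up for this example; the real pass works on vector-typed IR expressions and tracks ranges in its own way. The two evaluation functions show one concrete case: in min(max(min(x, 5), 0), 3), the inner constant 5 can never be the result because the outer min already limits the value to 3.

```cpp
#include <algorithm>

// Hypothetical limits imposed by enclosing expressions: an outer max supplies
// `low`, an outer min supplies `high`.
struct Range {
    double low;
    double high;
};

// In min(x, c), the constant c can only matter if it is below the limit that
// an enclosing min already enforces.
bool min_constant_contributes(double c, const Range &outer)
{
    return c < outer.high;
}

// In max(x, c), the constant c can only matter if it is above the limit that
// an enclosing max already enforces.
bool max_constant_contributes(double c, const Range &outer)
{
    return c > outer.low;
}

// Expression before pruning.
double original(double x)
{
    return std::min(std::max(std::min(x, 5.0), 0.0), 3.0);
}

// Expression after dropping the non-contributing inner min(x, 5.0).
double pruned(double x)
{
    return std::min(std::max(x, 0.0), 3.0);
}
```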
* glsl: do not emit error for non written varyings on OpenGL ES | Tapani Pälli | 2014-10-07 | 1 | -2/+16
    This patch fixes the following test case from the 'shaders-with-varyings' WebGL conformance suite:

        "vertex shader with unused varying and fragment shader with used varying must succeed"

    v2: still emit a warning if the condition happens (Ian)

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* util: Include in Android builds | Tomasz Figa | 2014-10-03 | 1 | -1/+3
    This patch fixes Android build failures by including the src/util directory in compilation. Files inside this directory are compiled into the libmesa_util static library and linked with the resulting libGLES_mesa.

    Signed-off-by: Tomasz Figa <[email protected]>
    CC: <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* glsl: Fix memory leak in builtin_builder::_image_prototype. | Iago Toral Quiroga | 2014-10-02 | 1 | -3/+5
    in_var calls the ir_variable constructor, which dups the variable name.

    Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: make consistent use of DECLARE_RALLOC_CXX_OPERATORS | Ilia Mirkin | 2014-10-02 | 2 | -47/+3
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Don't make a name for the function return variable | Ian Romanick | 2014-09-30 | 1 | -4/+7
    If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later).

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't allocate a name for ir_var_temporary variables | Ian Romanick | 2014-09-30 | 4 | -0/+28
    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  74  40,578,719,715   67,762,208      62,263,404      5,498,804          0
        After (32-bit):   52  40,565,579,466   66,359,800      61,187,818      5,171,982          0

        Before (64-bit):  74  37,129,541,061   95,195,160      87,369,671      7,825,489          0
        After (64-bit):   76  37,134,691,404   93,271,352      85,900,223      7,371,129          0

    A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Use ir_var_temporary for compiler generated temporaries | Ian Romanick | 2014-09-30 | 3 | -3/+4
    These few places were using ir_var_auto for seemingly no reason. The names were not added to the symbol table.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Add context-level controls for whether temporaries have real names | Ian Romanick | 2014-09-30 | 1 | -0/+1
    No change in Valgrind massif results for a trimmed apitrace of dota2.

    v2: Minor rebase on _mesa_init_constants changes.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Never put ir_var_temporary variables in the symbol table | Ian Romanick | 2014-09-30 | 5 | -5/+14
    Later patches will give every ir_var_temporary the same name in release builds. Adding a bunch of variables named "compiler_temp" to the symbol table can only cause problems.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Add the possibility for ir_variable to have a non-ralloced name | Ian Romanick | 2014-09-30 | 3 | -2/+30
    Specifically, ir_var_temporary variables constructed with a NULL name will all have the name "compiler_temp" in static storage.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each | Ian Romanick | 2014-09-30 | 1 | -8/+16
    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  44  40,577,049,140   68,118,608      62,441,063      5,677,545          0
        After (32-bit):   71  40,583,408,411   67,761,528      62,263,519      5,498,009          0

        Before (64-bit):  63  37,122,829,194   95,153,008      87,333,600      7,819,408          0
        After (64-bit):   67  37,123,303,706   95,150,544      87,333,600      7,816,944          0

    A real savings of 173KiB on 32-bit and no change on 64-bit.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together | Ian Romanick | 2014-09-30 | 3 | -36/+48
    At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location.

    If anyone can think of a better name for the union than "u", I'm all ears.

    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  63  40,574,239,515   68,117,280      62,618,607      5,498,673          0
        After (32-bit):   44  40,577,049,140   68,118,608      62,441,063      5,677,545          0

        Before (64-bit):  53  37,126,451,468   95,150,256      87,711,304      7,438,952          0
        After (64-bit):   63  37,122,829,194   95,153,008      87,333,600      7,819,408          0

    A real savings of 173KiB on 32-bit and 368KiB on 64-bit.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
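Storing two mutually exclusive pointers in one location can be sketched with a union guarded by accessors. The Variable class, the StateSlot struct, and the is_interface flag below are hypothetical; the message only says that "other fields" determine which pointer is in use, so the discriminator here is an assumption.

```cpp
#include <cassert>

// Hypothetical payload type for illustration.
struct StateSlot {
    int tokens[5];
    int swizzle;
};

class Variable {
public:
    explicit Variable(bool is_interface) : is_interface_(is_interface)
    {
        // Initialise whichever member is active for this kind of variable.
        if (is_interface_)
            u_.max_ifc_array_access = nullptr;
        else
            u_.state_slots = nullptr;
    }

    void set_max_ifc_array_access(unsigned *p)
    {
        assert(is_interface_);
        u_.max_ifc_array_access = p;
    }

    unsigned *max_ifc_array_access() const
    {
        assert(is_interface_);
        return u_.max_ifc_array_access;
    }

    void set_state_slots(StateSlot *s)
    {
        assert(!is_interface_);
        u_.state_slots = s;
    }

    StateSlot *state_slots() const
    {
        assert(!is_interface_);
        return u_.state_slots;
    }

private:
    bool is_interface_;   // hypothetical discriminator
    union {
        unsigned *max_ifc_array_access;
        StateSlot *state_slots;
    } u_;                 // both pointers share one pointer-sized slot
};
```

Keeping the union private and asserting in the accessors is what makes the sharing safe, which fits with the earlier commits that first made these fields private.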
* glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private | Ian Romanick | 2014-09-30 | 5 | -33/+55
    Also move num_state_slots inside ir_variable_data for better packing.

    The payoff for this will come in a few more patches.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Make ir_variable::max_ifc_array_access private | Ian Romanick | 2014-09-30 | 5 | -22/+53
    The payoff for this will come in a few more patches.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Store ir_variable::depth_layout using 3 bits | Ian Romanick | 2014-09-30 | 1 | -10/+9
    warn_extension_index was moved to improve packing.

    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  73  40,580,476,304   68,488,400      62,796,151      5,692,249          0
        After (32-bit):   73  40,575,751,558   68,116,528      62,618,607      5,497,921          0

        Before (64-bit):  71  37,124,890,613   95,889,584      88,089,008      7,800,576          0
        After (64-bit):   62  37,123,578,526   95,150,784      87,711,304      7,439,480          0

    A real savings of 173KiB on 32-bit and 368KiB on 64-bit.

    v2: Use the enum name with the bit-field and remove the extra casts. Suggested by Ken.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]> [v1]
    Reviewed-by: Tapani Pälli <[email protected]> [v1]
* glsl: Replace ir_variable::warn_extension pointer with an 8-bit index | Ian Romanick | 2014-09-30 | 3 | -10/+31
    Also move the new warn_extension_index into ir_variable::data. This enables slightly better packing.

    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  82  40,580,040,531   68,488,992      62,973,695      5,515,297          0
        After (32-bit):   73  40,580,476,304   68,488,400      62,796,151      5,692,249          0

        Before (64-bit):  65  37,124,013,542   95,892,768      88,466,712      7,426,056          0
        After (64-bit):   71  37,124,890,613   95,889,584      88,089,008      7,800,576          0

    A real savings of 173KiB on 32-bit and 368KiB on 64-bit.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
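Replacing a string pointer with a small index into a fixed table can be sketched as follows. The table contents, the VariableData struct, and both helper functions are hypothetical names invented for this example; only the idea of an 8-bit index standing in for a pointer comes from the commit message.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical table of extension names. Index 0 is reserved for "none".
static const char *const extension_names[] = {
    "",
    "GL_ARB_example_a",
    "GL_ARB_example_b",
};

// One byte replaces a pointer-sized field.
struct VariableData {
    uint8_t warn_extension_index;
};

// Stores the index of `name`; returns false if the name is not in the table.
bool set_warn_extension(VariableData &d, const char *name)
{
    const unsigned count = sizeof(extension_names) / sizeof(extension_names[0]);
    for (unsigned i = 1; i < count; i++) {
        if (std::strcmp(extension_names[i], name) == 0) {
            d.warn_extension_index = static_cast<uint8_t>(i);
            return true;
        }
    }
    return false;
}

// Returns the stored name, or nullptr when no extension is recorded.
const char *get_warn_extension(const VariableData &d)
{
    return d.warn_extension_index == 0
               ? nullptr
               : extension_names[d.warn_extension_index];
}
```

Hiding the representation behind accessors, as the preceding "Use accessors" commit does, lets the field shrink without touching callers.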
* glsl: Use accessors for ir_variable::warn_extension | Ian Romanick | 2014-09-30 | 3 | -7/+30
    The payoff for this will come in the next patch.

    No change in Valgrind massif results for a trimmed apitrace of dota2.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Eliminate unused built-in variables after compilation | Ian Romanick | 2014-09-30 | 4 | -0/+104
    After compilation (and before linking) we can eliminate quite a few built-in variables. Basically, any uniform or constant (e.g., gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can be eliminated. System values, vertex shader inputs (with one exception), and fragment shader outputs that are not used and not re-declared in the shader text can also be removed.

    gl_ModelViewProjectionMatrix and gl_Vertex are used by the built-in function ftransform. There are some complications with eliminating these variables (see the comment in the patch), so they are not eliminated.

    Valgrind massif results for a trimmed apitrace of dota2:

                           n         time(i)     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
        Before (32-bit):  46  40,661,487,174   75,116,800      68,854,065      6,262,735          0
        After (32-bit):   50  40,564,927,443   69,185,408      63,683,871      5,501,537          0

        Before (64-bit):  64  37,200,329,700  104,872,672      96,514,546      8,358,126          0
        After (64-bit):   59  36,822,048,449   96,526,888      89,113,000      7,413,888          0

    A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.

    v2: Don't remove any built-in with Transpose in the name.
    v3: Fix comment typo noticed by Anuj.

    Signed-off-by: Ian Romanick <[email protected]>
    Suggested-by: Eric Anholt <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
    Cc: Eric Anholt <[email protected]>
* glsl: Validate that built-in uniforms have backing state | Ian Romanick | 2014-09-30 | 1 | -0/+8
    All built-in uniforms are supposed to be backed by some GL state. The state_slots field describes this backing state.

    This helped me track down a bug in a later patch.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* glsl: Allow texture2DProjLod and textureCubeLod in GL ES | Kalyan Kondapally | 2014-09-29 | 1 | -3/+3
    According to the GLES (i.e. 1.0 and above) spec, textureCubeLod and texture2DProjLod are built-in functions. We seem to disable support for these functions with GLES. This patch enables the support.

    Signed-off-by: Kalyan Kondapally <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
* glsl: Recognize open-coded pow(x, y). | Matt Turner | 2014-09-27 | 1 | -0/+14
    pow(x, y) is equivalent to exp(log(x) * y).

        instructions in affected programs: 578 -> 458 (-20.76%)

    Reviewed-by: Kenneth Graunke <[email protected]>
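Recognizing the pattern exp(log(x) * y) amounts to a small tree match. The expression type, the Op enum, and match_open_coded_pow below are hypothetical and far simpler than the compiler's IR; the sketch only shows the shape of the match, including the commuted form exp(y * log(x)).

```cpp
// Hypothetical expression node: `a` and `b` are operands, unused ones null.
enum class Op { Var, Mul, Log, Exp };

struct Expr {
    Op op;
    const Expr *a;
    const Expr *b;
};

// If `e` has the form exp(log(x) * y) or exp(y * log(x)), stores x and y and
// returns true. Otherwise returns false and leaves x and y untouched.
bool match_open_coded_pow(const Expr &e, const Expr *&x, const Expr *&y)
{
    if (e.op != Op::Exp || e.a == nullptr || e.a->op != Op::Mul)
        return false;

    const Expr *mul = e.a;
    if (mul->a == nullptr || mul->b == nullptr)
        return false;

    if (mul->a->op == Op::Log) {
        x = mul->a->a;
        y = mul->b;
        return true;
    }
    if (mul->b->op == Op::Log) {
        x = mul->b->a;
        y = mul->a;
        return true;
    }
    return false;
}
```

Once x and y are extracted, an optimizer can replace the whole subtree with a single pow(x, y) node.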
* glsl: Strip arrayness from ir_type_dereference_variable too | Ian Romanick | 2014-09-26 | 1 | -1/+1
    If the thing being dereferenced is a record or an array of records, it should be treated as row-major. The ir_type_dereference_record path already does this, and I think I intended to do the same for this path in b17a4d5d.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83741
    Cc: [email protected]
* glsl: Round struct size up to at least 16 bytes | Ian Romanick | 2014-09-26 | 1 | -1/+1
    Per rule #9, the size of the structure is vec4 aligned. The MAX2 in the loop ensures that sizes >= 16 bytes are vec4 aligned. The new MAX2 after the loop ensures that sizes < 16 bytes are vec4 aligned.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
    Cc: [email protected]
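The rounding rule can be sketched with a simplified size computation. This assumes that "rule #9" refers to the std140 uniform block layout rules, which the message does not state explicitly; the Member struct and the struct_size function are hypothetical and ignore nested types, arrays, and matrices.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical member description: size and base alignment in bytes.
struct Member {
    unsigned size;
    unsigned align;
};

// Rounds v up to the next multiple of a (a must be non-zero).
unsigned round_up(unsigned v, unsigned a)
{
    return (v + a - 1) / a * a;
}

// Computes a structure size whose alignment is at least that of a vec4.
unsigned struct_size(const std::vector<Member> &members)
{
    unsigned size = 0;
    unsigned max_align = 0;
    for (const Member &m : members) {
        size = round_up(size, m.align) + m.size;
        max_align = std::max(max_align, m.align);
    }
    // Corresponds to the extra MAX2 after the loop: without it, a struct
    // holding only small members would be rounded to less than 16 bytes.
    max_align = std::max(max_align, 16u);
    return round_up(size, max_align);
}
```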