| Commit message | Author | Age | Files | Lines |
|
Signed-off-by: Emil Velikov <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jose Fonseca <[email protected]>
|
Allows us to remove the SCons workaround :-)
Signed-off-by: Emil Velikov <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jose Fonseca <[email protected]>
|
We would like to be able to combine
result.x = bitfieldExtract(src0.x, src1.x, src2.x);
result.y = bitfieldExtract(src0.y, src1.y, src2.y);
result.z = bitfieldExtract(src0.z, src1.z, src2.z);
result.w = bitfieldExtract(src0.w, src1.w, src2.w);
into a single ivec4 bitfieldExtract operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all three operands will be the same,
for simplicity.
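As an illustration of what a per-component operation means here, the following is a minimal C++ sketch, not Mesa code. The `uvec4` alias and both function names are hypothetical, and the scalar helper models only the unsigned case with `offset + bits <= 32`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a GLSL uvec4; this is not a Mesa type.
using uvec4 = std::array<uint32_t, 4>;

// Scalar model of an unsigned bitfield extract: returns `bits` bits of
// `value`, starting at bit `offset`. Assumes offset + bits <= 32.
inline uint32_t bitfield_extract(uint32_t value, uint32_t offset, uint32_t bits)
{
   assert(offset + bits <= 32);
   if (bits == 0)
      return 0;
   if (bits == 32)
      return value; // offset is necessarily 0 in this case
   return (value >> offset) & ((1u << bits) - 1u);
}

// Vector form: all three operands share one vector type, and component i
// of the result depends only on component i of each operand.
inline uvec4 bitfield_extract_vec(const uvec4 &value, const uvec4 &offset,
                                  const uvec4 &bits)
{
   uvec4 result{};
   for (std::size_t i = 0; i < result.size(); ++i)
      result[i] = bitfield_extract(value[i], offset[i], bits[i]);
   return result;
}
```

Because no component reads data from another component, the four scalar assignments and the single vector call compute the same result.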
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
We would like to be able to combine
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all four operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.
These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
Currently, opt_vectorize() tries to combine:
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ir_quadop_bitfield_insert opcode, which operates on
ivec4s. However, GLSL IR's opcodes currently require the bits and
offset parameters to be scalar integers. So, this breaks.
We want to be able to vectorize this eventually, but for now, just
chicken out and make opt_vectorize() bail by marking all the bitfield
insert/extract related opcodes as horizontal. This is a relatively
uncommon case today, so we'll do the simple fix for stable branches,
and fix it properly on master.
Fixes assertion failures when compiling Shadow of Mordor vertex shaders
on i965 in vec4 mode (where OptimizeForAOS enables opt_vectorize()).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
vector_insert takes a vector, a scalar location, and a scalar value,
and produces a new vector with that component updated. As such, it
can't be vectorized properly.
vector_extract takes a vector and a scalar location, and returns
that scalar component of the vector. Vectorization doesn't really
make any sense.
Treating both as horizontal operations makes sure the vectorizer
won't try to touch these.
Found by inspection.
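A minimal C++ sketch of the two operations as described above may make the shape mismatch clearer. It is not Mesa code: the `vec4` alias and the function names are hypothetical stand-ins.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a GLSL vec4; this is not a Mesa type.
using vec4 = std::array<float, 4>;

// vector_extract: (vector, scalar index) -> scalar.
inline float vector_extract(const vec4 &v, std::size_t index)
{
   assert(index < v.size());
   return v[index];
}

// vector_insert: (vector, scalar index, scalar value) -> new vector with
// that one component replaced. The input vector is left untouched.
inline vec4 vector_insert(vec4 v, std::size_t index, float value)
{
   assert(index < v.size());
   v[index] = value;
   return v;
}
```

In both cases the operand shapes differ (a whole vector next to scalars), so there is no per-component form that four scalar copies could be merged into.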
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
There used to be more members but they now share other fields
in order to keep memory use low.
Also making the naming more generic will allow us to reuse the
field for explicit byte offsets within blocks for
ARB_enhanced_layouts.
Reviewed-by: Edward O'Callaghan <[email protected]>
|
The GLSL IR to TGSI/Mesa IR paths for any_nequal have the same
optimizations the ir_unop_any paths had.
Reviewed-by: Ian Romanick <[email protected]>
|
The value will be set in separate-shader programs when an input/output
must remain active, e.g. when dead-code removal isn't allowed because
it would create an interface location/name-matching mismatch.
v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file
v4:
* Fix variable name in assert
v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins
Signed-off-by: Gregory Hainaut <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.
This patch reduces the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().
ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
are available.
A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defined. Before this patch this
would cause a segfault.
Finally, a new variable is added to ir_function_signature to store the
index. This value is validated and the non-explicit values are assigned in
link_assign_subroutine_types().
Reviewed-by: Tapani Pälli <[email protected]>
|
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
We will need this later on when we implement proper support for
precision qualifiers in the drivers and also to do link time checks for
uniforms as indicated by the spec.
This patch also adds compile-time checks for variables without precision
information (currently, Mesa only checks that a default precision is set
for floats in fragment shaders).
As indicated by Ian, the addition of the precision information to
ir_variable has been done using a bitfield and pahole to identify an
available hole so that memory requirements for ir_variable stay the
same.
v2 (Ian):
- Avoid if-ladders by defining arrays of supported sampler names and
indexing into them with type->sampler_array + 2 * type->sampler_shadow
- Make the code that selects the precision qualifier a utility function
- Fix a typo
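The indexing trick from the v2 notes can be sketched as follows. This is a hedged C++ illustration, not the patch's code: the function name is hypothetical, and the table simply lists the four GLSL 2D sampler type names in the order implied by the index expression `sampler_array + 2 * sampler_shadow`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: picks a 2D sampler type name from a 4-entry table
// instead of walking an if-ladder over the two flags.
inline std::string sampler_2d_type_name(bool sampler_array, bool sampler_shadow)
{
   static const char *const names[4] = {
      "sampler2D",            // array = 0, shadow = 0 -> index 0
      "sampler2DArray",       // array = 1, shadow = 0 -> index 1
      "sampler2DShadow",      // array = 0, shadow = 1 -> index 2
      "sampler2DArrayShadow", // array = 1, shadow = 1 -> index 3
   };
   const unsigned index =
      (sampler_array ? 1u : 0u) + 2u * (sampler_shadow ? 1u : 0u);
   assert(index < 4);
   return names[index];
}
```

Each flag contributes one bit of the index, so adding a sampler dimension means adding one more table rather than another branch.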
v3 (Tapani):
- rebased
- squashed in "Precision qualifiers are not allowed on structs"
- fixed select_gles_precision for sampler arrays
- fixed precision_qualifier_allowed for arrays of structs
v4 (Tapani):
- add atomic_uint handling
- do not allow precision qualifier on images
(issues reported by Marta)
v5 (Tapani):
- support precision qualifier on image types
v6 (Tapani):
- set precision qualifier on interface block members
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.
These variables are similar to OpenCL's local/__local variables.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
v2:
* Split from patch to add ir_var_shader_shared (tarceri)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
We get these when we operate on vector variables with array accessors
(i.e. things like a[0] where 'a' is a vec4). When we call variable_referenced()
on these expressions we want to return a reference to 'a' instead of NULL.
This fixes a problem where we pass a[0] as the first argument to an atomic
SSBO function that expects a buffer variable. In order to check this, we use
variable_referenced(), but that is currently returning NULL in this case, since
the underlying rvalue is a vector_extract expression.
Tested-by: Markus Wick <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
The unsized array length is computed with the following formula:
array.length() =
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
Of these, only the buffer size needs to be provided by the backends; the
frontend already knows the values of the other two variables.
This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.
It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.
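The length formula quoted above can be written out as a small C++ sketch. The function name and the choice of signed 64-bit integers are assumptions made for illustration; this is not the lowering pass itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0)
// Signed arithmetic is used so that a buffer smaller than the array's
// offset clamps to zero instead of wrapping around.
inline int64_t unsized_array_length(int64_t buffer_object_size,
                                    int64_t offset_of_array,
                                    int64_t stride_of_array)
{
   assert(stride_of_array > 0);
   return std::max<int64_t>(
      (buffer_object_size - offset_of_array) / stride_of_array, 0);
}
```

For example, a 256-byte buffer whose unsized array starts at byte 16 with a 16-byte stride has room for 15 elements. Only the first argument would come from the driver; the other two are known at link time.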
v2:
- Do not define a triop that will force backends to implement the
entire formula, they should only need to provide the buffer size
since the other values are known by the frontend (Curro).
v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
of using state->ARB_shader_storage_buffer_object_enable (Tapani).
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
They can only be defined in the last position of shader
storage blocks.
When an unsized array is used in different shaders, it might be
converted into arrays of different sizes; avoid getting a linker error
in that case.
v2:
- Rework error condition and error messages (Timothy Arceri)
v3:
- Move OpenGL ES check to its own patch.
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.
V3: remove line wrap change from this patch
V2: store slot number for all non-UBO uniforms to make code more
consistent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.
Reviewed-by: Jason Ekstrand <[email protected]>
|
We initialize gl_GlobalInvocationID based on the extension spec
formula:
gl_GlobalInvocationID =
gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
https://www.opengl.org/registry/specs/ARB/compute_shader.txt
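The formula is applied per component. A minimal C++ sketch follows; the `uvec3` alias and the function name are hypothetical, and this is not the code the compiler inserts into main.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a GLSL uvec3; this is not a Mesa type.
using uvec3 = std::array<uint32_t, 3>;

// gl_GlobalInvocationID =
//    gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
// evaluated independently for x, y and z.
inline uvec3 global_invocation_id(const uvec3 &work_group_id,
                                  const uvec3 &work_group_size,
                                  const uvec3 &local_invocation_id)
{
   uvec3 id{};
   for (std::size_t i = 0; i < id.size(); ++i)
      id[i] = work_group_id[i] * work_group_size[i] + local_invocation_id[i];
   return id;
}
```

With a work group size of (8, 4, 1), the invocation at local position (3, 2, 0) in group (2, 0, 1) gets the global ID (19, 2, 1).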
Signed-off-by: Jordan Justen <[email protected]>
Cc: Ilia Mirkin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
Also rename to _mesa_get_main_function_signature.
We will call it near the end of compilation to insert some code into
main for initializing some compute shader global variables.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
Will be used for textureSamples()
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
builtin_variables.cpp:1062:53: warning: unused parameter 'name_as_gs_input' [-Wunused-parameter]
const char *name_as_gs_input)
^
builtin_functions.cpp:4774:47: warning: unused parameter 'intrinsic_name' [-Wunused-parameter]
const char *intrinsic_name,
^
builtin_functions.cpp:4907:66: warning: unused parameter 'state' [-Wunused-parameter]
_mesa_glsl_find_builtin_function_by_name(_mesa_glsl_parse_state *state,
^
builtin_functions.cpp:4915:49: warning: unused parameter 'num_arguments' [-Wunused-parameter]
unsigned num_arguments,
^
builtin_functions.cpp:4916:49: warning: unused parameter 'flags' [-Wunused-parameter]
unsigned flags)
^
ir_print_visitor.cpp:589:37: warning: unused parameter 'ir' [-Wunused-parameter]
ir_print_visitor::visit(ir_barrier *ir)
^
linker.cpp:3212:48: warning: unused parameter 'ctx' [-Wunused-parameter]
build_program_resource_list(struct gl_context *ctx,
^
standalone_scaffolding.cpp:65:57: warning: unused parameter ‘id’ [-Wunused-parameter]
_mesa_shader_debug(struct gl_context *, GLenum, GLuint *id,
^
v2: Rebase on top of GL_ARB_shader_image_size work (especially
58a86897). Silence more warnings added by that work.
v3: Remove mention of the removed parameter from comments. Suggested by
Iago.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]> [v1]
Reviewed-by: Iago Toral Quiroga <[email protected]>
Cc: "Martin Peres <[email protected]>"
|
This adds an ir_variable which contains the subroutine uniform
and an array rvalue for the deref of that uniform. These
are stored in the ir_call and lowered later.
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
We need to store two sets of info into the ir_function,
if this is a function definition with a subroutine list
(subroutine_def) or if it is a subroutine prototype.
v1.1: add some more documentation.
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
This type will be used to store the name of subroutine types, as in
subroutine void myfunc(void);
which will store myfunc into a subroutine type.
This is required so the parser can identify a subroutine
type in a uniform declaration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with its own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
tests: fix warnings
|
v2: Dropped some unrelated reordering in glsl_parser.yy as Ken suggested.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Since this now checks if a variable is inside a uniform or a shader
storage block.
Reviewed-by: Jordan Justen <[email protected]>
|
This will be used to identify buffer variables inside shader storage
buffer objects, which are very similar to uniforms except for a few
differences, most important of which is that they are writable.
Since buffer variables are so similar to uniforms, we will almost always
want them to go through the same paths as uniforms.
Reviewed-by: Jordan Justen <[email protected]>
|
v2:
* Changes suggested by mattst88
[[email protected]: Add nir support]
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <[email protected]>
|
v2: Don't be lazy. Constify the as_foo functions and use those instead
of ugly casts. Suggested by Curro.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
Now that they're all implemented using macros, this is trivial.
v2: Remove redundant parentheses. Suggested by Curro.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
The downcast functions for non-leaf classes were previously implemented
"by hand." Now they are implemented using macros based on the is_foo
functions added in the previous patch.
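The pattern described above, a downcast generated by a macro on top of an is_foo predicate, could look roughly like this in C++. Every name in the sketch (`node`, `dereference`, `DEFINE_AS_BASE`, the enum values) is hypothetical; it only mirrors the idea and is not the macro from the patch.

```cpp
#include <cassert>

enum node_type {
   type_dereference_array,
   type_dereference_variable,
   type_constant
};

struct dereference;

struct node {
   explicit node(node_type t) : type(t) {}

   // Predicate for a non-leaf class: true for any of its leaf types.
   bool is_dereference() const
   {
      return type == type_dereference_array ||
             type == type_dereference_variable;
   }

   dereference *as_dereference();

   node_type type;
};

// Non-leaf class: several leaf types share it.
struct dereference : node {
   using node::node;
};

// The downcast is generated from the predicate instead of being written
// out by hand for every non-leaf class.
#define DEFINE_AS_BASE(TYPE)                                      \
   inline TYPE *node::as_##TYPE()                                 \
   {                                                              \
      return is_##TYPE() ? static_cast<TYPE *>(this) : nullptr;   \
   }

DEFINE_AS_BASE(dereference)
#undef DEFINE_AS_BASE
```

Adding another non-leaf class then costs one predicate and one macro invocation, and the downcast cannot drift out of sync with the predicate.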
v2: Remove redundant parentheses. Suggested by Curro (on the next
patch).
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
These functions determine when an IR node is one of the non-leaf
classes.
v2: Adjust indentation to line up. Suggested by Matt.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
We use the idiom
ir_foo *x = y->as_foo();
if (x == NULL)
return;
all over the place. GCC generates some quite lovely code for this.
One such example:
340a5b: 83 7d 18 04 cmpl $0x4,0x18(%rbp)
340a5f: 0f 85 06 04 00 00 jne 340e6b
340a65: 48 85 ed test %rbp,%rbp
340a68: 0f 84 fd 03 00 00 je 340e6b
This case used as_expression() (ir_type_expression is 4). Note that it
checks the ir_type, then checks that the pointer isn't NULL. There is
some disconnect in GCC around the condition in the as_foo functions.
return ir_type == ir_type_##TYPE ? (ir_##TYPE *) this : NULL; \
It believes "this" could be NULL, so it emits a check outside the function
just for fun.
This patch uses assume() to tell GCC that it need not bother with extra
NULL checking of the pointer returned by the as_foo functions.
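A hedged C++ sketch of the idea follows. `ASSUME_TRUE` is a hypothetical stand-in for an assume()-style macro, built on the GCC/Clang `__builtin_unreachable()` builtin, and a free function is used instead of a member so the example does not depend on how a compiler treats `this`.

```cpp
#include <cassert>

// Hypothetical assume()-style macro: promises the optimizer that `expr`
// holds. If the promise is broken, behaviour is undefined.
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME_TRUE(expr) ((expr) ? (void) 0 : __builtin_unreachable())
#else
#define ASSUME_TRUE(expr) assert(expr)
#endif

enum node_type { type_expression, type_constant };

struct node {
   explicit node(node_type t) : type(t) {}
   node_type type;
};

struct expression : node {
   expression() : node(type_expression) {}
};

// Downcast helper. The assumption lets the compiler drop a separate
// "is the input pointer NULL?" test around inlined copies of this function.
inline expression *as_expression(node *n)
{
   ASSUME_TRUE(n != nullptr);
   return n->type == type_expression ? static_cast<expression *>(n) : nullptr;
}
```

The caller's own `if (x == NULL)` test on the result remains necessary, since the downcast can still fail; only the redundant check on the input pointer is affected.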
   text    data     bss     dec     hex filename
4836430  158688   26248 5021366  4c9eb6 i965_dri-before.so
4836173  158688   26248 5021109  4c9db5 i965_dri-after.so
v2: Replace 'if (this == NULL) unreachable("this cannot be NULL")' with
assume(this != NULL). Suggested by Ilia Mirkin.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
Reviewed-by: Matt Turner <[email protected]>
|
Create a new search function to look for matching built-in functions by name
and use it to detect built-in function redefinition or overloading in GLSL ES 3.00.
GLSL ES 3.0 spec, chapter 6.1 "Function Definitions", page 71:
"A shader cannot redefine or overload built-in functions."
The GLSL ES 1.0 specification, chapter 8 "Built-in Functions", instead says:
"User code can overload the built-in functions but cannot redefine them."
So this check is specific to GLSL ES 3.00.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_fragment
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
v2: add d2b, more ir_constant stuff (Ilia)
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Ian Romanick <[email protected]>
|
The optimization in commit d056863b covers these cases, which were the
first optimizations I added to the GLSL compiler.
Reviewed-by: Ian Romanick <[email protected]>
|
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 74  40,578,719,715  67,762,208      62,263,404      5,498,804          0
After (32-bit):  52  40,565,579,466  66,359,800      61,187,818      5,171,982          0
Before (64-bit): 74  37,129,541,061  95,195,160      87,369,671      7,825,489          0
After (64-bit):  76  37,134,691,404  93,271,352      85,900,223      7,371,129          0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
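The idea of sharing one statically stored name among all unnamed temporaries can be sketched in C++ as below. The class and member names are hypothetical, the variable mode is ignored, and plain `new[]`/`delete[]` stand in for whatever allocator the real code uses.

```cpp
#include <cassert>
#include <cstring>

// One copy of the placeholder name lives in static storage.
static const char shared_temp_name[] = "compiler_temp";

class variable {
public:
   // A NULL name means "unnamed temporary": point at the shared string
   // instead of allocating a private copy.
   explicit variable(const char *n)
   {
      if (n == nullptr) {
         name_ = shared_temp_name;
      } else {
         char *copy = new char[std::strlen(n) + 1];
         std::strcpy(copy, n);
         name_ = copy;
      }
   }

   // Only privately allocated names are freed.
   ~variable()
   {
      if (name_ != shared_temp_name)
         delete[] name_;
   }

   variable(const variable &) = delete;
   variable &operator=(const variable &) = delete;

   const char *name() const
   {
      assert(name_ != nullptr);
      return name_;
   }

private:
   const char *name_;
};
```

Two unnamed temporaries then return the same pointer from `name()`, so a pointer comparison against the shared string is enough to tell whether a name is owned.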
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 44  40,577,049,140  68,118,608      62,441,063      5,677,545          0
After (32-bit):  71  40,583,408,411  67,761,528      62,263,519      5,498,009          0
Before (64-bit): 63  37,122,829,194  95,153,008      87,333,600      7,819,408          0
After (64-bit):  67  37,123,303,706  95,150,544      87,333,600      7,816,944          0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
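A minimal C++ sketch of this kind of overlap is shown below. All names except the union name `u` are hypothetical, including the discriminating flag and both pointee types; the message does not say which fields are involved, so none of this should be read as the actual layout.

```cpp
#include <cassert>

struct slot_info { int id; };               // hypothetical pointee type
struct access_info { unsigned max_index; }; // hypothetical pointee type

struct variable {
   // Hypothetical field that already exists for other reasons and tells
   // us which of the two pointers can be non-NULL.
   bool uses_slots;

   // Both pointers occupy the same storage.
   union {
      slot_info *slots;     // meaningful only when uses_slots is true
      access_info *access;  // meaningful only when uses_slots is false
   } u;

   // Accessors consult the discriminating field before touching the union.
   slot_info *get_slots() const { return uses_slots ? u.slots : nullptr; }
   access_info *get_access() const { return uses_slots ? nullptr : u.access; }
};

// The union costs one pointer instead of two.
static_assert(sizeof(variable::u) == sizeof(slot_info *),
              "union should be the size of a single pointer");
```

Routing every read through an accessor keeps the "which member is live" rule in one place, at the cost of one extra branch per access.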
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 63  40,574,239,515  68,117,280      62,618,607      5,498,673          0
After (32-bit):  44  40,577,049,140  68,118,608      62,441,063      5,677,545          0
Before (64-bit): 53  37,126,451,468  95,150,256      87,711,304      7,438,952          0
After (64-bit):  63  37,122,829,194  95,153,008      87,333,600      7,819,408          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
The payoff for this will come in a few more patches.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 73  40,580,476,304  68,488,400      62,796,151      5,692,249          0
After (32-bit):  73  40,575,751,558  68,116,528      62,618,607      5,497,921          0
Before (64-bit): 71  37,124,890,613  95,889,584      88,089,008      7,800,576          0
After (64-bit):  62  37,123,578,526  95,150,784      87,711,304      7,439,480          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]> [v1]
Reviewed-by: Tapani Pälli <[email protected]> [v1]
|
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
                  n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit): 82  40,580,040,531  68,488,992      62,973,695      5,515,297          0
After (32-bit):  73  40,580,476,304  68,488,400      62,796,151      5,692,249          0
Before (64-bit): 65  37,124,013,542  95,892,768      88,466,712      7,426,056          0
After (64-bit):  71  37,124,890,613  95,889,584      88,089,008      7,800,576          0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
The payoff for this will come in the next patch.
No change in Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|