| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
No need to allocate more GPRs than are used by the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_after = after;
^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_before = before;
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index. We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Abdiel Janulgue <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them. (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)
This allows expressions such as:
   int foo;
   uint bar;
   uint mod = foo % bar;
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
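As a rough illustration of the conversion rule described in the commit above, here is a C++ analogy (not GLSL, and not the compiler change itself). It models "convert the int operand to uint, then take the modulus" with an explicit cast. The function name mod_int_uint is made up for this sketch, and the wrap-around of negative values shown here is C++ behaviour, which may not match what GLSL guarantees.

```cpp
#include <cstdint>

// C++ analogy only: the explicit cast stands in for the implicit
// int -> uint conversion applied to the left operand of '%'.
// mod_int_uint is a hypothetical name used for illustration.
std::uint32_t mod_int_uint(std::int32_t foo, std::uint32_t bar)
{
    const std::uint32_t converted = static_cast<std::uint32_t>(foo);
    return converted % bar; // both operands are now unsigned
}
```

With non-negative inputs the result is the ordinary remainder; a negative foo is first wrapped modulo 2^32 by the cast, so the remainder is taken of a large unsigned value.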
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems. I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.
It's not that much spam, so I figure we may as well just print it.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.
Nobody uses this today, but I plan to.
v2: Rebase on Matt's changes; simplify.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]> [v1]
|
|
|
|
|
|
|
|
|
|
| |
With the many variants of IO intrinsics, particular sources are often in
different locations. It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
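A minimal sketch of the kind of helper the commit above describes: callers ask for "the offset source" or "the vertex index source" and the helper hides where that source sits for each intrinsic variant. The enum, the function names, and the source positions are all assumptions made for this example; they are not copied from NIR.

```cpp
// Hypothetical intrinsic kinds standing in for the IO intrinsic variants.
enum io_intrinsic {
    LOAD_INPUT,
    LOAD_PER_VERTEX_INPUT,
    STORE_OUTPUT,
    STORE_PER_VERTEX_OUTPUT,
};

// Index of the indirect-offset source, or -1 for an unknown intrinsic.
// The source layouts in the comments are assumed for this sketch.
int offset_src_index(io_intrinsic op)
{
    switch (op) {
    case LOAD_INPUT:              return 0; // (offset)
    case LOAD_PER_VERTEX_INPUT:   return 1; // (vertex, offset)
    case STORE_OUTPUT:            return 1; // (value, offset)
    case STORE_PER_VERTEX_OUTPUT: return 2; // (value, vertex, offset)
    }
    return -1;
}

// Index of the vertex-index source, or -1 if the intrinsic has none.
int vertex_index_src_index(io_intrinsic op)
{
    switch (op) {
    case LOAD_PER_VERTEX_INPUT:   return 0;
    case STORE_PER_VERTEX_OUTPUT: return 1;
    default:                      return -1;
    }
}
```

A backend would then index the intrinsic's source array with the returned value instead of hard-coding a position per intrinsic.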
|
|
|
|
|
|
|
|
| |
We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs. For now, disable it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program. However, this is infeasible for tessellation
control shader outputs.
Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs. These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.
Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.
Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.
v2: Rebase to accommodate Jason's review feedback.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to nir_load_per_vertex_input, but for outputs. This is not
useful in geometry shaders, but will be useful in tessellation shaders.
v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
taking a nir_variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs. There aren't per-patch TCS inputs.
Tessellation evaluation shaders have both per-vertex and per-patch
inputs. Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.
v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
which takes a variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_meta_fast_clear.c: In function 'get_buffer_rect':
brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter]
get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb,
^
brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter]
struct intel_renderbuffer *irb, struct rect *rect)
^
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Cc: "10.6 11.0" <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Some of these are no longer needed since all the backends switched to
NIR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
ann->error = ralloc_strdup(annotation->mem_ctx, error);
^
I initially tried changing the type of ann_count to unsigned (it is
currently int), since that, in addition to the check that it's non-zero
at the beginning of the function, seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.
|
|
|
|
|
|
| |
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The first four values (2 bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.
Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.
After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1. This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).
The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).
In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
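The stride rule stated in the commit above (0 for immediates, 1 for vector immediates) can be sketched as a small function. The enum and function names are hypothetical, and treating V, UV and VF as the vector-immediate types is an assumption of this example rather than something the commit message spells out.

```cpp
// Hypothetical immediate types for illustration.
enum imm_type { IMM_F, IMM_D, IMM_UD, IMM_V, IMM_UV, IMM_VF };

// Assumption: V, UV and VF are the packed vector-immediate types.
bool is_vector_immediate(imm_type t)
{
    return t == IMM_V || t == IMM_UV || t == IMM_VF;
}

// Scalar immediates get stride 0; vector immediates get stride 1.
unsigned immediate_stride(imm_type t)
{
    return is_vector_immediate(t) ? 1u : 0u;
}
```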
|
|
|
|
|
|
|
|
|
| |
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero. The next commit will move
BAD_FILE in the enum so that it's no longer zero.
In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting it to zero.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
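To illustrate why the commit above matters, here is a small sketch of how a zero-filled register stops looking like BAD_FILE once BAD_FILE is no longer the zero enumerator. The enums and structs are simplified, hypothetical stand-ins, not the real register classes.

```cpp
#include <cstring>

// Hypothetical "before" and "after" layouts of the register-file enum.
enum old_file { OLD_BAD_FILE = 0, OLD_VGRF };
enum new_file { NEW_ARF = 0, NEW_VGRF, NEW_BAD_FILE };

struct old_reg { old_file file; };
struct new_reg { new_file file; };

// Zero-filling only yields a "bad" register while BAD_FILE == 0.
bool zeroed_old_reg_is_bad()
{
    old_reg r;
    std::memset(&r, 0, sizeof r);
    return r.file == OLD_BAD_FILE;
}

// After the enum is reordered, the same zero-fill yields file 0 (ARF here).
bool zeroed_new_reg_is_bad()
{
    new_reg r;
    std::memset(&r, 0, sizeof r);
    return r.file == NEW_BAD_FILE;
}

// Explicit initialization keeps working regardless of the enum order.
new_reg make_bad_reg()
{
    new_reg r;
    r.file = NEW_BAD_FILE;
    return r;
}
```

This is why code that relied on zeroed memory has to run the constructor (or otherwise set the file explicitly) before BAD_FILE moves.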
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to combining another field, we get to replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by
   sed -i -e 's/\.bits\././g' *.c *.h *.cpp
   sed -i -e 's/dw1\.//g' *.c *.h *.cpp
and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.
There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.
This is C11 (and gcc has apparently supported it for some time, for
"compatibility with other compilers").
See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Switching from an implicitly-sized type field to a field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
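A minimal sketch of the argument in the commit above: a 4-bit field can hold every type as long as there are at most 2^4 of them, and a compile-time check can guard that assumption. The enum values and struct here are hypothetical, not the real register definitions.

```cpp
// Hypothetical register types; NUM_REG_TYPES counts the enumerators.
enum reg_type : unsigned {
    TYPE_UD,
    TYPE_D,
    TYPE_UW,
    TYPE_W,
    TYPE_F,
    NUM_REG_TYPES
};

constexpr unsigned TYPE_BITS = 4;

// Fails to compile if a future type would no longer fit in the field.
static_assert(NUM_REG_TYPES <= (1u << TYPE_BITS),
              "reg_type must fit in TYPE_BITS bits");

struct packed_reg {
    unsigned type : TYPE_BITS; // explicit width instead of a full enum
    unsigned nr   : 16;        // placeholder for the remaining fields
};

// Store a type in the bit-field and read it back.
reg_type round_trip_type(reg_type t)
{
    packed_reg r = {};
    r.type = t;
    return static_cast<reg_type>(r.type);
}
```

The static_assert is an addition of this sketch; the commit itself relies on the type count and on the compiler warning.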
|
|
|
|
|
|
|
|
| |
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the types of the expression were
   bool ? src_reg : (bool ? brw_reg : brw_reg)
the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,
   bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)
In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,
   src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)
Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
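The data loss described in the commit above can be sketched with two hypothetical classes: once a derived register passes through its base type, the derived-only fields are gone, whereas an if ladder returns the derived object untouched. The class and function names are invented for this example, and the conversion is written out explicitly rather than relying on how the conditional operator picks its result type.

```cpp
// Hypothetical stand-ins for a hardware register and an IR register.
struct hw_reg {
    int nr;
};

struct ir_reg : hw_reg {
    int swizzle; // exists only in the derived class

    ir_reg(int nr_, int swizzle_) : hw_reg{nr_}, swizzle(swizzle_) {}
    // Building from the base type cannot recover derived-only fields.
    explicit ir_reg(const hw_reg &r) : hw_reg(r), swizzle(0) {}
};

// Converting to the base and back drops 'swizzle' (object slicing).
ir_reg through_base(const ir_reg &src)
{
    const hw_reg &base = src;
    return ir_reg(base);
}

// An if ladder keeps the first operand's full type intact.
ir_reg pick(bool use_a, bool use_b,
            const ir_reg &a, const hw_reg &b, const hw_reg &c)
{
    if (use_a)
        return a;
    else if (use_b)
        return ir_reg(b);
    else
        return ir_reg(c);
}
```

Calling through_base on a register with a non-zero swizzle returns one whose swizzle is 0, while pick with use_a set returns the original swizzle.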
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
and ..._cond -> ..._invert
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Not setting the predication bit is sufficient.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
just disable it by not setting the predication bit
Reviewed-by: Nicolai Hähnle <[email protected]>
|