| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
raw copy.
Noticed the problem by inspection while typing in the previous commit.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
raw copy.
This was likely the original intention, and at least register coalesce
relies on it.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
Unlike the FS counterpart of this commit, this was likely not (yet) a
bug, but let's fix it now in preparation for implementing support
for sub-GRF offsets in the VEC4 back-end.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a workaround for this in fs_inst::size_read() for the
SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file
*only*. We should take this possibility into account for the sources
and destinations of all instructions on all optimization passes that
need to quantize dataflow in 32B increments by adding the amount of
misalignment to the size read or written from the regs_read() and
regs_written() helpers respectively.
Reviewed-by: Iago Toral Quiroga <[email protected]>
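To illustrate the quantization described in the commit message above, here is a minimal sketch of a register-count helper that adds the misalignment of the region to its size before rounding up to 32B units. The names `REG_SIZE`, `div_round_up` and `regs_touched` are hypothetical and chosen for this example only; this is not the back-end's actual helper.

```cpp
#include <cassert>

// Assumed size of one GRF in bytes (the commit text speaks of 32B increments).
constexpr unsigned REG_SIZE = 32;

// Integer division rounding up.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Number of whole registers touched by a region that starts `offset` bytes
// into the register file and spans `size` bytes. The misalignment of the
// start (offset % REG_SIZE) is added to the size before rounding up, so a
// misaligned region that straddles a register boundary is counted correctly.
inline unsigned regs_touched(unsigned offset, unsigned size)
{
    assert(size > 0);
    return div_round_up(offset % REG_SIZE + size, REG_SIZE);
}
```

With these assumptions, a 32-byte region starting at byte 16 touches two registers, whereas rounding the size alone would report only one.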
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes regs_written() and regs_read() to return a more accurate
value when the padding left between components due to a stride value
greater than one causes the region bounds given by size_written or
size_read to overflow into the next register. This could become a
problem in optimization passes that keep track of dataflow using
fixed-size arrays with register granularity, because the overflow
register (not actually accessed by the region) may not have been
allocated at all, which could lead to undefined memory access.
An alternative would be to subtract the trailing padding during the
calculation of fs_inst::size_read and ::size_written, but that would
break code that currently assumes that ::size_read and ::size_written
are whole multiples of the component size, and would be hard to
maintain going forward because size_written is assigned from a bunch
of different places.
Reviewed-by: Iago Toral Quiroga <[email protected]>
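The effect of the trailing padding described above can be sketched as follows. This is a simplified stand-alone model, not the back-end code: `trailing_padding` and `regs_touched` are hypothetical names, the stride is counted in components, and a 32B register is assumed.

```cpp
#include <algorithm>
#include <cassert>

// Assumed size of one GRF in bytes.
constexpr unsigned REG_SIZE = 32;

// Integer division rounding up.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Bytes of padding that follow the last component of a strided region.
// With a stride of one (or zero) there is no padding at all.
inline unsigned trailing_padding(unsigned stride, unsigned type_size)
{
    return (std::max(1u, stride) - 1) * type_size;
}

// Registers actually touched by a region of `size` bytes (padding included)
// starting at byte `offset`. The trailing padding is subtracted before
// rounding so that it cannot spill the count into an unused register.
inline unsigned regs_touched(unsigned offset, unsigned size,
                             unsigned stride, unsigned type_size)
{
    assert(type_size > 0);
    const unsigned padding = std::min(size, trailing_padding(stride, type_size));
    return div_round_up(offset % REG_SIZE + size - padding, REG_SIZE);
}
```

For example, eight 4-byte components with a stride of two span 64 bytes. Starting at byte 4, the last component ends at byte 63, so only two registers are touched, even though the unadjusted bounds (4 + 64 = 68 bytes) would round up to three.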
|
|
|
|
|
|
|
| |
This will be useful later on when we start using reg_offset() on fixed
hardware registers.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
This restriction seemed rather artificial... Removing it actually
simplifies things slightly.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
The LINTERP virtual instruction only reads three scalar components
from the first 16B of the second source. We can now teach size_read()
about it since its return value is represented with byte granularity.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
UNIFORM files.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
units.
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <[email protected]>
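The rewrite rule quoted above can be sketched with a toy instruction type. `Inst` and `legacy_regs_read` are hypothetical stand-ins introduced for this example; only the expression `DIV_ROUND_UP(i.size_read(j), reg_unit)` comes from the commit message.

```cpp
#include <cassert>

// Integer division rounding up, standing in for DIV_ROUND_UP.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Hypothetical instruction that reports the bytes read from each source.
struct Inst {
    unsigned bytes_read[3];

    unsigned size_read(unsigned j) const
    {
        assert(j < 3);
        return bytes_read[j];
    }
};

// Recovers the old register-granularity value from the byte-granularity one:
// 'x = i.regs_read(j)' becomes 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'.
inline unsigned legacy_regs_read(const Inst &i, unsigned j, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return div_round_up(i.size_read(j), reg_unit);
}
```

Note that the rounding matters: a source that reads 48 bytes counts as two 32B registers, not one.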
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in bytes.
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <[email protected]>
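As a sketch of the two rewrite rules above, the following toy code models the rvalue and lvalue cases as a getter and a setter. `Inst`, `get_regs_written` and `set_regs_written` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Integer division rounding up, standing in for DIV_ROUND_UP.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Hypothetical instruction that only tracks the bytes it writes.
struct Inst {
    unsigned size_written = 0;
};

// Rvalue rule: 'x = i.regs_written' becomes
// 'x = DIV_ROUND_UP(i.size_written, reg_unit)'.
inline unsigned get_regs_written(const Inst &i, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return div_round_up(i.size_written, reg_unit);
}

// Lvalue rule: 'i.regs_written = x' becomes 'i.size_written = x * reg_unit'.
inline void set_regs_written(Inst &i, unsigned x, unsigned reg_unit)
{
    assert(reg_unit > 0);
    i.size_written = x * reg_unit;
}
```

Setting and then reading back with the same reg_unit round-trips exactly; reading a size that is not a whole multiple of reg_unit rounds up.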
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
::regs_written.
This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for dropping fs_inst::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg::subreg_offset and ::offset fields are now redundant: the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units. The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <[email protected]>
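The subreg_offset rule above can be sketched on a toy register type. `Reg`, `round_down_to`, `get_subreg_offset` and `set_subreg_offset` are hypothetical names for this example; only the two expressions come from the commit message.

```cpp
#include <cassert>

// Hypothetical register with a single byte offset.
struct Reg {
    unsigned offset = 0;
};

// Rounds `v` down to a multiple of `m`, standing in for ROUND_DOWN_TO.
constexpr unsigned round_down_to(unsigned v, unsigned m)
{
    return v - v % m;
}

// Rvalue rule: 'x = r.subreg_offset' becomes 'x = r.offset % reg_unit'.
inline unsigned get_subreg_offset(const Reg &r, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return r.offset % reg_unit;
}

// Lvalue rule: 'r.subreg_offset = x' becomes
// 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.
inline void set_subreg_offset(Reg &r, unsigned x, unsigned reg_unit)
{
    assert(reg_unit > 0 && x < reg_unit);
    r.offset = round_down_to(r.offset, reg_unit) + x;
}
```

The setter keeps the whole-register part of the offset and replaces only the remainder, which is what writing to a separate subreg_offset field used to do.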
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
expressed in bytes.
The dst/src_reg::offset field in byte units introduced in the previous
patch is a more straightforward alternative to an offset
representation split between ::reg_offset and ::subreg_offset fields.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple FS
back-end bugs in the past. To make matters worse, the unit that
reg_offset was expressed in was rather inconsistent: for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
v2: Fix division by the wrong reg_unit in the UNIFORM case of
convert_to_hw_regs(). (Iago)
Reviewed-by: Iago Toral Quiroga <[email protected]>
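The reg_offset rewrite rule above can be sketched on a toy register type. `Reg`, `get_reg_offset` and `set_reg_offset` are hypothetical names for this example, and reg_unit is passed explicitly because, as the message notes, its value depends on the register file and back-end.

```cpp
#include <cassert>

// Hypothetical register with a single byte offset.
struct Reg {
    unsigned offset = 0;
};

// Rvalue rule: 'x = r.reg_offset' becomes 'x = r.offset / reg_unit'.
inline unsigned get_reg_offset(const Reg &r, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return r.offset / reg_unit;
}

// Lvalue rule: 'r.reg_offset = x' becomes
// 'r.offset = r.offset % reg_unit + x * reg_unit'.
// The sub-unit remainder of the old offset is preserved.
inline void set_reg_offset(Reg &r, unsigned x, unsigned reg_unit)
{
    assert(reg_unit > 0);
    r.offset = r.offset % reg_unit + x * reg_unit;
}
```

The same byte offset maps to different reg_offset values depending on the unit: 70 bytes is register 2 in 32B units, 4 in 16B units and 17 in 4B units, which is why dividing by the wrong reg_unit (the v2 fix above) gives a wrong result.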
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make matters worse, the unit that
reg_offset was expressed in was rather inconsistent: for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eero Tamminen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
AEP requires ASTC, which is currently only enabled on Skylake and later.
(It may be possible to extend this to Cherryview/Braswell in the future,
but earlier hardware doesn't have ASTC support.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This is mandatory.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Numeric 2 is actually GLSL_SAMPLER_DIM_3D, which I don't think is what
was intended.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
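A small sketch of why a bare numeric literal is risky here. The enum below is a hypothetical mirror of the first few sampler dimensions, assuming they are numbered 1D = 0, 2D = 1, 3D = 2 (the commit message only confirms that 2 is the 3D value); `coord_components` is likewise made up for illustration.

```cpp
#include <cassert>

// Hypothetical mirror of the first sampler dimensions. The ordering is an
// assumption, except that the value 2 denotes 3D as stated in the commit.
enum SamplerDim {
    SAMPLER_DIM_1D = 0,
    SAMPLER_DIM_2D,
    SAMPLER_DIM_3D
};

// Number of coordinate components for a given dimension.
inline unsigned coord_components(SamplerDim dim)
{
    switch (dim) {
    case SAMPLER_DIM_1D: return 1;
    case SAMPLER_DIM_2D: return 2;
    case SAMPLER_DIM_3D: return 3;
    }
    assert(!"unknown sampler dimension");
    return 0;
}
```

Passing a literal 2 where "2D" was meant silently selects the 3D enumerator under this numbering, so spelling out the enumerator name avoids the mix-up.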
|
|
|
|
|
|
|
| |
I want to re-use this in a different pass, so move it to nir.h.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Turns out it already exists, so don't duplicate it.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
- Pass disp to RETURN_EGL_ERROR so we unlock the display
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
This moves the native pixmap fixup to a helper function so we don't
repeat ourselves.
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
This moves the native window fixup to a helper function so we don't
repeat ourselves.
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Updated eglext.h to revision 33111 from the Khronos repository.
v2:
- Don't (re)move extension includes from eglext.h (Emil Velikov)
- Bump to revision 33111 (Adam Jackson)
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Wayland Scanner pkg-config file is called wayland-scanner.pc.
Fixes: 153539bd9d4445b50411 ("configure: rework wayland_scanner
handling (fix make distcheck)")
Cc: [email protected]
Reviewed-by: Eric Engestrom <[email protected]>
Tested-by: Eric Engestrom <[email protected]>
Signed-off-by: Brendan King <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
e7c8c85785b3a8f29e3f ("gbm: Removed unused function.") forgot to remove
the global array used only by that function.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A possible error (-1) was being lost because it was first converted to an
unsigned int and only then checked.
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Martina Kollarova <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
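The class of bug described above can be sketched as follows. `query_count`, `convert_then_check` and `check_then_convert` are hypothetical names; the actual function touched by the commit is not shown in this log.

```cpp
#include <cassert>
#include <limits>

// Hypothetical API that reports failure by returning -1.
inline int query_count(bool fail)
{
    return fail ? -1 : 3;
}

// Problematic order: the result is converted to unsigned first. The -1
// sentinel wraps to the largest unsigned value, so any later "is it
// negative?" check can never fire and the error is lost.
inline unsigned convert_then_check(bool fail)
{
    return static_cast<unsigned>(query_count(fail));
}

// Safer order: keep the value signed until the error has been checked,
// and only then convert it.
inline bool check_then_convert(bool fail, unsigned &out)
{
    const int n = query_count(fail);
    if (n < 0)
        return false;
    out = static_cast<unsigned>(n);
    return true;
}
```

In the first variant the failure shows up as a huge count instead of an error; in the second the caller sees the failure and `out` is left untouched.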
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.
26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: don't call load_input for fragment shaders in emit_declaration
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
26011 shaders in 14651 tests
Totals:
SGPRS: 1152636 -> 1146340 (-0.55 %)
VGPRS: 728198 -> 727371 (-0.11 %)
Spilled SGPRs: 3776 -> 2218 (-41.26 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35835152 -> 35841268 (0.02 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222372 -> 222559 (0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222221 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)
v2: merge codepaths where possible
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
v2: inline the code and remove the conditional that's a no-op now
Reviewed-by: Nicolai Hähnle <[email protected]>
|