| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Funnily enough, some of these were turned into a compile-time error by gcc
with -fsanitize=undefined ("initializer is not a constant").
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since 1 << 31 complains about undefined behaviour; the others are changed
only for consistency.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit b593737ed8349b280fa29242c35f565b59ab3025.
Apparently it causes GPU hangs on some image load store tests.
Let's turn it back off until we figure out why.
|
|
|
|
|
| |
This is where we handle texop_texture_samples so it makes things more
consistent.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few different fixups that we have to do for texture
destinations that re-arrange channels, fix hardware vs. API mismatches, or
just shrink the result to fit in the NIR destination. These were all being
done in a somewhat haphazard manner. This commit replaces all of the
shuffling with a single LOAD_PAYLOAD operation at the end and makes it much
easier to insert fixups between the texture instruction itself and the
LOAD_PAYLOAD.
Shader-db results on Haswell:
total instructions in shared programs: 6227035 -> 6226669 (-0.01%)
instructions in affected programs: 19119 -> 18753 (-1.91%)
helped: 85
HURT: 0
total cycles in shared programs: 56491626 -> 56476126 (-0.03%)
cycles in affected programs: 672420 -> 656920 (-2.31%)
helped: 92
HURT: 42
|
|
|
|
| |
We are no longer using anything from GLSL IR in the FS backend.
|
|
|
|
|
|
|
| |
The fs_visitor::emit_texture helper originated when we still had both NIR
and IR visitors for the FS backend. Since the old visitor was removed,
emit_texture serves no real purpose beyond arbitrarily splitting
heavily-linked code across two functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, we expect SIMD8 shaders to be more instructions than SIMD4x2
shaders, as it takes four instructions to operate on a vec4, rather than
a single instruction. However, the benefit is that it can process 8
objects per shader thread instead of 2.
Surprisingly, the shader-db statistics show an improvement in both
instruction and cycle counts:
Synmark: -31.25% instructions, -29.27% cycles, 0 hurt.
Tessmark: -36.92% instructions, -37.81% cycles, 0 hurt.
Unigine Heaven: -3.42% instructions, -17.95% cycles, 0 hurt.
Shadow of Mordor:
+13.24% instructions (26 with fewer instructions, 45 with more),
-5.23% cycles (44 with fewer cycles, 27 with more cycles).
Presumably, this is because the SIMD8 URB messages are a much more
natural fit than the SIMD4x2 URB messages - there's a ton less header
setup.
I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e,
and the performance seems to be the same or increase ever so slightly
(< 1 FPS difference). So I believe it's strictly superior.
There's also a lot more optimization potential we can do in scalar mode.
This will also help us finish fp64 support, as scalar support is going
to land much sooner than vec4-mode support.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
According to Timothy, using program_string_id == 0 to identify the
passthrough TCS is going to be problematic for his shader cache work.
So, change it to strcmp() the name at visitor creation time.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on
Gen >= 8 platforms.
Evidently that commit allowed the scheduler to make different choices
that somehow finally ran afoul of a hardware bug in which POW and FDIV
instructions may not be followed by an instruction with two destination
registers (including compressed instructions). I presume the conditions
are more complex than that, but the internal hardware bug report (BDWGFX
bug_de 1696294) does not contain much more information.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924
Reviewed-by: Topi Pohjolainen <[email protected]> [v1]
Tested-by: Mark Janes <[email protected]> [v1]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
This fixes: GL43-CTS.compute_shader.resource-ubo
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
We already are a GLintptr, casting won't help.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
| |
Silences warnings with 32-bit Linux gcc builds and MinGW which doesn't
recognize the ‘t’ conversion character.
Reviewed-by: Sinclair Yeh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Declare loop index variable at loop site (idr)
* Make arrays of MI_MATH instructions 'static const' (idr)
* Remove commented debug code (idr)
* Updated comment in set_query_availability (Ken)
* Replace switch with if/else in hsw_result_to_gpr0 (Ken)
* Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on
hsw and gen8 (Ken)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This matches the byte based offset of brw_load_register_mem*.
The function is also moved into intel_batchbuffer.c like
brw_load_register_mem*.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
MOCS for 3DSTATE_SO_BUFFER has existed for ages.
|
|
|
|
| |
I added this when deleting some unnecessary code in a rebase.
|
|
|
|
|
|
|
|
| |
When computing the offset in the uniform storage table, take into account
the size multiplier so double precision matrices are handled correctly.
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
GEN_LT has a straightforward implementation on which we can build the
GEN_GE and GEN_LE macros.
Suggested-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For opcodes that changed meaning on different generations, we store a
pointer to a secondary table and the table's size in a tagged union in
place of the mnemonic and number of sources.
Acked-by: Francisco Jerez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous commit replaced direct uses of opcode_descs with calls to
the wrapper function, which should be the only method of accessing
opcode_descs's data. As a result gen_from_devinfo() can also be made
static.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I merged opcode_desc into inst_info (instead of the other way around)
because inst_info was sorted by opcode number.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Drop the uses of 'enum gen' to a plain int, so that we don't have to
expose the bitfield definitions and GEN_GE/GEN_LE macros to other users
of brw_eu.h. As a result, s/.gen/.gens/ to avoid confusion with
devinfo->gen.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function takes a device info struct as argument in addition to the
opcode number in order to disambiguate between multiple opcode_desc
entries for different instructions with the same opcode number.
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
[v2] mattst88: Put brw_opcode_desc() in brw_eu.c instead of moving it
there in a later patch.
Reviewed-by: Kenneth Graunke <[email protected]> [v2]
[v3] mattst88: Return NULL if opcode >= ARRAY_SIZE(opcode_descs)
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations. In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask. One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.
Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhemingly the common case) can be handled by a single
thread. This has several intriguing properties:
- Accessing input arrays with gl_InvocationID as the index is a simple
SIMD8 URB read with g1 as the header. No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
concurrency concerns are gone. (We don't do this yet, though.)
v2: Drop first_non_payload_grf change, as it was always adding 0
(caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm about to implement a scalar TCS backend, and I'd rather not
duplicate all of this code there.
One change is that we now write the tessellation levels from all
TCS threads, rather than just the first. This is pretty harmless,
and was easier. The IF/ENDIF needed for that are gone; otherwise
the generated code is basically identical.
I chose to emit load/store intrinsics directly because it was easier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This prevents a crash when a NULL src is passed with a non-NULL length.
fixes: dEQP-GLES31.functional.debug.object_labels.query_length_only
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95252
Signed-off-by: Mark Janes <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|