| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
v2 (Matt):
- Take a DF source argument for the DIM instruction emission
in the visitors.
- Indentation.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
controls.
This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.
It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen-4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs. On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though everything we wanted to do was disable
instruction compression -- The latter seems especially unfortunate on
Gen6+ hardware which have no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.
Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For opcodes that changed meaning on different generations, we store a
pointer to a secondary table and the table's size in a tagged union in
place of the mnemonic and number of sources.
Acked-by: Francisco Jerez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous commit replaced direct uses of opcode_descs with calls to
the wrapper function, which should be the only method of accessing
opcode_descs's data. As a result gen_from_devinfo() can also be made
static.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I merged opcode_desc into inst_info (instead of the other way around)
because inst_info was sorted by opcode number.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Drop the uses of 'enum gen' to a plain int, so that we don't have to
expose the bitfield definitions and GEN_GE/GEN_LE macros to other users
of brw_eu.h. As a result, s/.gen/.gens/ to avoid confusion with
devinfo->gen.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function takes a device info struct as argument in addition to the
opcode number in order to disambiguate between multiple opcode_desc
entries for different instructions with the same opcode number.
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
[v2] mattst88: Put brw_opcode_desc() in brw_eu.c instead of moving it
there in a later patch.
Reviewed-by: Kenneth Graunke <[email protected]> [v2]
[v3] mattst88: Return NULL if opcode >= ARRAY_SIZE(opcode_descs)
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
I want to use this directly from brw_vec4_generator.cpp.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The thread scratch space is thread-local so using the full IA-coherent
stateless surface index (255 since Gen8) is unnecessary and
potentially expensive. On Gen8 and early steppings of Gen9 this is
not a functional change because the kernel already sets bit 4 of
HDC_CHICKEN0 which overrides all HDC memory access to be non-coherent
in order to workaround a hardware bug.
This happens to fix a full system hang when running any spilling code
on a pre-production SKL GT4e machine I have on my desk (forcing all
HDC access to non-coherent from the kernel up to stepping F0 might be
a good idea though regardless of this patch), and improves performance
of the OglPSBump2 SynMark benchmark run with INTEL_DEBUG=spill_fs by
33% (11 runs, 5% significance) on a production SKL GT2 (on which HDC
IA-coherency is apparently functional so it wouldn't make sense to
disable globally).
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
brw_inst.h is only for the brw_inst/brw_compact_inst functions.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Otherwise I'll have to add another later in this series.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.
The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.
This fixes the following two Piglit tests:
arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform
v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
number.
v5: Don't special case immediate values in
brw_pixel_interpolator_query. Make a better wrapper for the
function to set up the PI send instruction. Ensure that the SHL
instructions are scalar. (Francisco Jerez).
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used to implement the Gateway Barrier SEND needed to implement
the barrier function.
v2:
* notify => gateway_notify (Ken)
* combine short lines of brw_barrier proto/decl (mattst88)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used to implement the barrier function.
v2:
* Rename to brw_WAIT (mattst88)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This instruction calculates the index of an arbitrary channel enabled
in the current execution mask. It's expected to be used as input for
the BROADCAST opcode, but it's implemented as a separate instruction
rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has
no dependencies so it can always be CSE'ed with other instances of the
same instruction within a basic block.
v2: Whitespace fixes.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BROADCAST instruction picks the channel from its first source
given by an index passed in as second source. This will be used in
situations where all channels from the same SIMD thread have to agree
on the value of something, e.g. a surface binding table index.
This is in particular the case for UBO, sampler and image arrays,
which can be indexed dynamically with the restriction that all active
SIMD channels access the same index, provided to the shared unit as
part of a single scalar field of the message descriptor. Simply
taking the index value from the first channel as we were doing until
now is incorrect, because it might contain an uninitialized value if
the channel had previously been disabled by non-uniform control flow.
v2: Minor style fixes. Improve commit message.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
surface index as a register instead of a constant and to use
brw_send_indirect_message() to emit the indirect variant of send with
a dynamically calculated message descriptor. This will be required to
support variable indexing of image arrays for
ARB_shader_image_load_store.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This name better matches what it's actually used for. The patch was
generated with the following command:
for file in *; do
sed -i -e s/brw_compile/brw_codegen/g $file
done
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
guess_execution_size() does two things:
1. Cope with small destination registers.
2. Cope with SIMD8 vs SIMD16 mode.
This patch replaces the first with a simple if block in brw_set_dest: if
the destination register width is less than 8, you probably want the
execution size to match. (I didn't put this in the 3src block because
it doesn't seem to matter.)
Since only the FS compiler cares about SIMD16 mode, it's easy to just
set the default execution size there.
This pattern was already been proven in the Gen8+ generator, but we
didn't port it back to the existing generator when we combined the two.
This is based on a patch from Ken from about a year ago. I've rebased it
and and fixed a few bugs.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
_surface_read.
And calculate the message response size based on the number of
components rather than the other way around. This simplifies their
interface somewhat and allows the caller to request a writeback
message with more than one vector component in SIMD4x2 mode.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
descriptor.
This is going to be useful because the Gen7+ uniform and varying pull
constant, texturing, typed and untyped surface read, write, and atomic
generation code on the vec4 and fs back-end all require the same logic
to handle conditionally indirect surface indices. In pseudocode:
| if (surface.file == BRW_IMMEDIATE_VALUE) {
| inst = brw_SEND(p, dst, payload);
| set_descriptor_control_bits(inst, surface, ...);
| } else {
| inst = brw_OR(p, addr, surface, 0);
| set_descriptor_control_bits(inst, ...);
| inst = brw_SEND(p, dst, payload);
| set_indirect_send_descriptor(inst, addr);
| }
This patch abstracts out this frequently recurring pattern so we can
now write:
| inst = brw_send_indirect_message(p, sfid, dst, payload, surface)
| set_descriptor_control_bits(inst, ...);
without worrying about handling the immediate and indirect surface
index cases explicitly.
v2: Rebase. Improve documentatation and commit message. (Topi)
Preserve UW destination type cargo-cult. (Topi, Ken, Matt)
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the SNB PRM, volume 4, part 1, page 193:
"The dual source render target messages only have SIMD8 forms due to
maximum message length limitations. SIMD16 pixel shaders must send two of
these messages to cover all of the pixels. Each message contains two colors
(4 channels each) for each pixel in the message payload."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space. The usual candidate was
the destination of the sampler instruction. However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data. Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we were use the base_mrf parameter of fs_inst to store the MRF
location. In preparation for doing FB writes from the GRF, we now also
allow you to set inst->base_mrf to -1 and provide a source register.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Notice the mistaken (but harmless) argument swapping in brw_math_invert().
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This provides a reasonable place to enforce the hardware restriction
that indirect descriptors must be in a0.0
Signed-off-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Broadwell measures jump distances in bytes, so we need to scale by 16.
v2: Update the function in brw_eu.h, not in brw_eu_emit.c.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Different generations of hardware measure jump distances in different
units. Previously, every function that needed to set a jump target open
coded this scaling, or made a hardcoded assumption (i.e. just used 2).
Most functions start with the number of instructions to jump, and scale
up to the hardware-specific value. So, I made the function match that.
Others start with a byte offset, and divide by a constant (8) to obtain
the jump distance. This is actually 16 / 2 (the jump scale for Gen5-7).
v2: Make the helper a static inline defined in brw_eu.h, instead of
an actual function in brw_eu_emit.c (as suggested by Matt).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
The only difference is setting PopCount on Gen4-5.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
The #ifndef include guards already said the right thing :)
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|