| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions handle the conversion of a vec4 into the form expected
by the dataport unit in message and message return payloads. The
conversion is not always trivial because some messages don't support
SIMD4x2 for some generations, in which case a strided copy may be
necessary.
v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
See "i965/fs: Introduce FS IR builder." for the rationale.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags. Rename "instr"
variable. Initialize cursor to NULL by default and add method to
explicitly point the builder at the end of the program.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:
brw_vs.c, brw_wm.c brw_gs.c:
code to drive code generation and implement precompiling and
cache search.
genX_<stage>_state.c
gen specific implementation of the state emission for the shader
stage.
The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).
To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Outputs from the vertex shader become array inputs in the geomtry shader,
but the arrays are interleaved, so we need to map our inputs accordingly.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This patch will add a brw_vec4_nir.cpp file filled with entry point methods to
the main functionality, following a structure similar to brw_fs_nir.cpp.
Subsequent patches in this series will be adding the implementations for these
methods, incrementally.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit easily.
v2: Drop VEC4 suport.
v3: Reimplement in terms of logical send opcodes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Start trimming the fat from intel_batchbuffer.c. First by moving the set
of routines for emitting PIPE_CONTROLS (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.
I suspect it will be easier to find in a file named after the concept.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose of this change is threefold: First, it improves the
modularity of the compiler back-end by separating the functionality
required to construct an i965 IR program from the rest of the visitor
god-object, what in turn will reduce the coupling between other
components and the visitor allowing a more modular design. This patch
doesn't yet remove the equivalent functionality from the visitor
classes, as it involves major back-end surgery.
Second, it improves consistency between the scalar and vector
back-ends. The FS and VEC4 builders can both be used to generate
scalar code with a compatible interface or they can be used to
generate natural vector width code -- 1 or 4 components respectively.
Third, the approach to IR construction is somewhat different to what
the visitor classes currently do. All parameters affecting code
generation (execution size, half control, point in the program where
new instructions are inserted, etc.) are encapsulated in a stand-alone
object rather than being quasi-global state (yes, anything defined in
one of the visitor classes is effectively global due to the tight
coupling with virtually everything else in the compiler back-end).
This object is lightweight and can be copied, mutated and passed
around, making helper IR-building functions more flexible because they
can now simply take a builder object as argument and will inherit its
IR generation properties in exactly the same way that a discrete
instruction would from the same builder object.
The emit_typed_write() function from my image-load-store branch is an
example that illustrates the usefulness of the latter point: Due to
hardware limitations the function may have to split the untyped
surface message in 8-wide chunks. That means that the several
functions called to help with the construction of the message payload
are themselves required to set the execution width and half control
correctly on the instructions they emit, and to allocate all registers
with half the default width. With the previous approach this would
require the used helper functions to be aware of the parameters that
might differ from the default state and explicitly set the instruction
bits accordingly. With the new approach they would get a modified
builder object as argument that would influence all instructions
emitted by the helper function as if it were the default state.
Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
method. It doesn't actually emit any instructions, they are simply
created and inserted into an exec_list which is returned for the
caller to emit at some location of the program. This sort of two-step
emission becomes unnecessary with the builder interface because the
insertion point is one more of the code generation parameters which
are part of the builder object. The caller can simply pass
VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
location of the program where the effect of the constant load is
desired. This two-step emission (which pervades the compiler back-end
and is in most cases redundant) goes away: E.g. ADD() now actually
adds two registers rather than just creating an ADD instruction in
memory, emit(ADD()) is no longer necessary.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags.
v4: Drop Gen6 IF with inline comparison. Rename "instr" variable.
Initialize cursor to NULL by default and add method to explicitly
point the builder at the end of the program.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Now that everything is running through NIR, this is all dead.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of
the query to determine whether to render. On Gen7+ there is a bit in
the 3DPRIMITIVE command which can be used to disable the primitive
based on the value of a state bit. This state bit can be set based on
whether two registers have different values using the MI_PREDICATE
command. We can load these two registers with the pixel count values
stored in the query begin and end to implement conditional rendering
without stalling.
Unfortunately these two source registers were not in the whitelist of
available registers in the kernel driver until v3.19. This patch uses
the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.
The predicate enable bit is currently only used for drawing 3D
primitives. For blits, clears, bitmaps, copypixels and drawpixels it
still causes a stall. For most of these it would probably just work to
call the new brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of
rendering primitives. However it's a bit trickier for blits because it
can use the BLT ring or the blorp codepath. I think these operations
are less useful for conditional rendering than rendering primitives so
it might be best to leave it for a later patch.
v2: Use the command parser version to detect whether we can write to
the predicate data registers instead of trying to execute a
register load command.
v3: Simple rebase
v4: Changes suggested by Kenneth Graunke: Split the
load_64bit_register function out to a separate patch so it can be
a shared public function. Avoid calling
_mesa_check_conditional_render if we've already determined that
there's no query object. Some styling fixes.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
brw_emit_gpgpu_walker will be implemented in a subsequent patch.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we translated into NIR and did all the optimizations and
lowering as part of running fs_visitor. This meant that we did all of
that work twice for fragment shaders - once for SIMD8, and again for
SIMD16. We also had to redo it every time we hit a state based
recompile.
We now generate NIR once at link time. ARB programs don't have linking,
so we instead generate it at ProgramStringNotify time.
Mesa's fixed function vertex program handling doesn't bother to inform
the driver about new programs at all (which is rather mean), so we
generate NIR at the last minute, if it hasn't happened already.
shader-db runs ~9.4% faster on my i7-5600U, with a release build.
v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother
using _mesa_program_enum_to_shader_stage as we already know it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Fix the spelling of analyze and re-arrange code for better readability
as per Connor's comments.
v3: Make the naming of things more consistent and add a pile of comments
v4: Stop trying to avoid vectors
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 5885407 -> 5940958 (0.94%)
instructions in affected programs: 3617311 -> 3672862 (1.54%)
helped: 3
HURT: 23556
GAINED: 31
LOST: 165
... but will allow us to always emit MAD instructions.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
v2: Jason Ekstrand <[email protected]>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorperated the ptrdiff_t tweaks from commit 225a09790
v3: Jason Ekstrand <[email protected]>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs: 1743737 -> 1729040 (-0.84%)
GAINED: 0
LOST: 12
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.
v2: Jason Ekstrand <[email protected]>:
Make brw_fs_nir build again
Only use NIR of INTEL_USE_NIR is set
whitespace fixes
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%
total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These source files support actual geometry shaders, so using "gs" for
the name makes a lot of sense. We're going to be adding SIMD8 geometry
shader support as well, at which point "vec4_gs" will be a misnomer.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_gs.[ch] and brw_gs_emit.c source files contain code for
emulating fixed-function unit functionality (VF primitive decomposition
or SOL) using the GS unit. They do not contain code to support proper
geometry shaders.
We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key,
brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile,
brw_ff_gs_prog). So it makes sense to make the filenames match.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Geometry shaders in gen6 are significantly different from gen7+ so it is better
to have them implemented in a different file rather than adding gen6 branching
paths all over brw_vec4_gs_visitor.cpp.
This commit adds an initial implementation that only handles point output, which
is the simplest case.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will program the gen6 hiz depth state differently to enable layered
rendering on gen6.
v2:
* Remove unneeded gen6_emit_depthbuffer as suggested by Topi
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We will program the gen6 renderbuffer surface state differently to
enable layered rendering on gen6.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses the infrastructure put in place by previous patches
to implement fast color clears and replicated color clears in terms of
meta operations.
This works all the way back to gen7 where fast clear was introduced and
adds support for fast clear on gen8. It replaces the blorp path
completely and improves on a few cases. Layered clears are now done
using instanced rendering and multiple render-target clears use a
MRT shader with rep16 writes.
Signed-off-by: Kristian Høgsberg <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
No longer needed as of last commit.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
We now use the brw_eu_emit.c code instead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This, together with the meta path, provides a complete implemetation of
ARB_copy_image.
v2: Add a fallback memcpy path for when the texture is too big for the
blitter
v3: Properly support copying between two places on the same texture in the
memcpy fallback
v4: Properly handle blit between the same two images in the fallback path
v5: Properly handle blit between the same two compressed images in the
fallback path
v6: Fix a typo in a comment
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
| |
The code in brw_sampler_state.c now handles all generations; we don't
need the extra Gen7+ only code anymore.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the driver was originally written, it only supported texturing in
the pixel shader backend; vertex and geometry shader texturing came much
later. Originally, the pixel shader was referred to as "WM" (the
Windowizer/Masker unit). So, this code happened to only be relevant for
the WM stage, at the time.
However, sampler state really applies to all stages, so putting "wm" in
the filename doesn't make sense. I dropped it in gen7_sampler_state.c;
at this point the asymmetry just trips people up.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
The #ifndef include guards already said the right thing :)
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The functionality has been merged into brw_disasm.c; use that instead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Will be used to print disassembly after jump targets are set and
instructions are compacted, while still retaining higher-level IR
annotations and basic block information.
An array of 'struct annotation' will live along side the generated
assembly. The generators will populate the array with their IR
annotations, and basic block pointers if the instructions began or ended
a basic block pointer.
We'll then update the instruction offset when we compact instructions
and then using the annotations print the disassembly.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Create the intel renderbuffer with level hardcoded to zero instead
of overriding it in the surface state configuration. Also moved the
dimension adjustments for tiling, mip level, msaa into the render
buffer creation. Finally prepares for another blit path needed for
miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"
Cc: "10.2" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 1653399 -> 1651790 (-0.10%)
instructions in affected programs: 92157 -> 90548 (-1.75%)
GAINED: 2
LOST: 2
Also significantly reduces the number of optimization loop iterations:
total loop iterations in shared programs: 39724 -> 31651 (-20.32%)
loop iterations in affected programs: 21617 -> 13544 (-37.35%)
Including some great pathological cases, like 29 -> 3 in Strike Suit
Zero and 24 -> 3 in Dota2.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
The function has gotten large, and brw_fs.cpp is the largest source file
in the driver.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gets us equivalent code paths on BDW and pre-BDW, except for stencil
(where we don't have MSAA stencil resolve code yet)
Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces
DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).
v2: Move the new meta code to brw_meta_updownsample.c, name it
brw_meta_updownsample(), add a comment about
intel_rb_storage_first_mt_slice(), and rename that function and move
the RB generation into it (review ideas by Ken).
v3: Fix 2 src vs dst pasteos in previous change.
v4: Skip this path pre-gen8 for now, until we can analyze the glxgears
performance delta some more.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This lowering pass will be useful for gallium drivers as well, in order to support
the GL TG4 oddity that is textureGatherOffsets.
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is quite similar to the Gen7 code. The main changes:
- 48-bit relocations
- Thread count is specified as U/2-1 instead of U-1.
- An extra DWord (DW9) with clip planes, URB entry output length/offsets
- We need to program the "Expected Vertex Count" (VerticesIn)
v2: Set the number of binding table entries so they can be prefetched
(requested by Eric Anholt).
v3: Add a WARN_ONCE for a missing workaround.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|