| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
These functions' prototypes are marked with extern "C", which apparently
overrides a lack of extern "C" at the definition site if the prototype
has been seen first.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Added in commits 36fd65381 and 337dad8ce even though the existing
include was in view.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:
brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
^
Avoid this by marking brw_reg with the scope resolution operator.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to do this, we have to change the signature of the
backend_reg(brw_reg) constructor to take a reference to a brw_reg in
order to avoid unresolvable ambiguity about which constructor is
actually being called in the other modifications in this patch.
As far as I understand it, the rule in C++ is that if multiple
constructors are available for parent classes, the one closest to you in
the class heirarchy is closen, but if one of them didn't take a
reference, that screws things up.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
Kind of a handy function. And I'll want it available outside of i965
for common nir-pass helpers.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that nir_lower_tex can do texture swizzle lowering, we can use that
instead of repeating more-or-less the same code in both backends. This
both allows us to share code and means that things like the tg4
work-arounds are somewhat simpler because they don't have to take the
swizzle into account.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen. In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At the moment, brw_create_nir just calls the three stages in sequence so
there's not much difference. Soon, however, we will want to start doing
variants in NIR at which point the postprocessing step will have to move
from shader create time to codegen time.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in b1a83b5d1 that caused basically all
shaders to fail to compile on 32-bit platforms.
Reported-by: Mark Janes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.
Fixes the ext_framebuffer_multisample-fast-clear Piglit test.
v2: Handle luminance and intensity formats
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: do the same in tgsi_to_nir (Samuel)
v3: added missing cases after rebase (Iago)
v4: Add a blank space after '#' in one of the comments (Matt)
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There are various restrictions on what the hstride can be that depend on
the Gen, and now that we're using hstride == 2 for packing/unpacking
doubles, we're going to run into these restrictions a lot more often.
Pull them out into a separate function, and move the one restriction we
checked previously into it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can happen when the source of the compare was split by the SIMD
lowering pass. Potentially, we could allow the case where the exec size
of scan_inst is larger, and scan_inst has the right quarter selected,
but doing that seems a little more risky.
v2: Merge the bail condition into the the previous if/break block (Matt)
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we tried to get/set something that was exactly 64 bits, we would
try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
1's like we want.
v2 (Iago)
- Replace ~0 by ~0ull
- Removed unnecessary parenthesis
v3 (Kristian)
- Avoid the conditional
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
- Simplify code (Iago)
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed that brw_vs.c does this.
I believe the point is that nir->num_uniforms is either counted in
scalar components (in scalar mode), or vec4 slots (in vector mode).
But we want param_count to be in scalar components regardless, so
we have to scale up in vector mode.
We don't have to scale up in scalar mode, though.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not
that that makes a whole lot of sense to me), and only nir_inputs is a
src_reg.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
In most cases (when the negate is copy propagated and the MOV removed),
this is two instructions on Gen >= 8 and only two instructions on
earlier platforms -- and it doesn't use the flag register.
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
|
|
|
|
| |
The region fields are unioned with the immediate storage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).
v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b.
It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
| |
This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).
It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)
v3: Patch from Neil whose feedback I originally missed.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the information originally in this commit message is now in the patch
before this.
SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".
The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.
v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)
v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.
v4: Use better quote from spec (Topi)
Cc: Chad Versace <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Background: Prior to Skylake and since Ivybridge Intel hardware has had the
ability to use a MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth. This
hardware was either used for MSAA compression, or fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".
Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display features adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.
Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.
v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
- related changes to the code to reflect this.
- remove excess ccs (Chad)
v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)
Requested-by: Chad Versace <[email protected]>
Requested-by: Neil Roberts <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch was originally called:
i965/skl: Enable fast color clears on SKL
Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff and separately reviewable work.
Based on a patch originally from Kristian.
Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
/* In release builds, fall through */
case I915_TILING_Y:
*width_px = 32 / mt->cpp;
- *height = 4;
+ if (brw->gen >= 9)
+ *height = 2;
+ else
+ *height = 4;
v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the vec4 backend, textureSamplesIdentical() will always return
false. There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation. We also don't think
anyone is likely to miss it.
v2: Handle immediate value for MCS smarter. Rebase on changes to
nir_texop_sampels_identical (missing second parameter). Suggested by
Jason.
v3: Add Neil's code to handle 16x MSAA in the FS. Also rebase on top of
f9a9ba5e. Stub out the vec4 implementation.
Signed-off-by: Ian Romanick <[email protected]>
Signed-off-by: Neil Roberts <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> [v2]
Reviewed-by: Chris Forbes <[email protected]> [v2]
|
|
|
|
|
|
|
|
| |
v2: Rebase on top of f9a9ba5e.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is the NIR analog to GLSL IR ir_samples_identical.
v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise, passing -1 gets you:
error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
The previous two commits make this unnecessary.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Cuts 1.5k of .text.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.
Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.
text data bss dec hex filename
5204535 214112 27784 5446431 531b1f i965_dri.so before
5193977 214112 27784 5435873 52f1e1 i965_dri.so after
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f.
I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.
Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.
brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.
All indirects are handled via the pull model. We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data. Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly. We'd need to be careful.
v2: Use updated MOV_INDIRECT opcode.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Abdiel Janulgue <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|