| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.
This patch reduces the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().
ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
is available.
A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defied, before this patch this
would cause a segfault.
Finally a new variable is added to ir_function_signature to store the
index. This value is validated and the non explicit values assigned in
link_assign_subroutine_types().
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).
v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b.
It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
| |
This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).
It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)
v3: Patch from Neil whose feedback I originally missed.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the information originally in this commit message is now in the patch
before this.
SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".
The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.
v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)
v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.
v4: Use better quote from spec (Topi)
Cc: Chad Versace <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Background: Prior to Skylake and since Ivybridge Intel hardware has had the
ability to use a MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth. This
hardware was either used for MSAA compression, or fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".
Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display features adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.
Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.
v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
- related changes to the code to reflect this.
- remove excess ccs (Chad)
v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)
Requested-by: Chad Versace <[email protected]>
Requested-by: Neil Roberts <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch was originally called:
i965/skl: Enable fast color clears on SKL
Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff and separately reviewable work.
Based on a patch originally from Kristian.
Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
/* In release builds, fall through */
case I915_TILING_Y:
*width_px = 32 / mt->cpp;
- *height = 4;
+ if (brw->gen >= 9)
+ *height = 2;
+ else
+ *height = 4;
v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
| |
v2 + v3: forgot null-pointer checks (spotted by Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It is easy enough to pre-determine the required size, and arrays are
generally better behaved especially when they get large.
v2: make sure init_perf_monitor returns true when no counters are active
(spotted by Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, when a performance monitor was initialized, an inner loop through
all driver queries with string comparisons for each enabled performance
monitor counter was used. This hurts when a driver exposes lots of queries.
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says
A performance monitor consists of a number of hardware and software
counters that can be sampled by the GPU and reported back to the
application.
I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.
v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
With the earlier issues resolved we can expose the extension.
Signed-off-by: Boyan Ding <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In preparation for supporting GL_KHR_debug in OpenGL ES
v2: add a missing hunk in _mesa_IsEnabled (Emil)
Signed-off-by: Boyan Ding <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As defined in the spec
when implemented in an OpenGL ES context, all entry points defined
by this extension must have a "KHR" suffix.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the vec4 backend, textureSamplesIdentical() will always return
false. There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation. We also don't think
anyone is likely to miss it.
v2: Handle immediate value for MCS smarter. Rebase on changes to
nir_texop_sampels_identical (missing second parameter). Suggested by
Jason.
v3: Add Neil's code to handle 16x MSAA in the FS. Also rebase on top of
f9a9ba5e. Stub out the vec4 implementation.
Signed-off-by: Ian Romanick <[email protected]>
Signed-off-by: Neil Roberts <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> [v2]
Reviewed-by: Chris Forbes <[email protected]> [v2]
|
|
|
|
|
|
|
|
| |
v2: Rebase on top of f9a9ba5e.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is the NIR analog to GLSL IR ir_samples_identical.
v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise, passing -1 gets you:
error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
The previous two commits make this unnecessary.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Cuts 1.5k of .text.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.
Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.
text data bss dec hex filename
5204535 214112 27784 5446431 531b1f i965_dri.so before
5193977 214112 27784 5435873 52f1e1 i965_dri.so after
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f.
I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.
Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.
brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable developers to know if the table's alphabetical sorting
is maintained or lost.
v2: Move "*" next to pointer name (Matt)
Include extensions_table.h instead of extensions.h (Ian)
Remove extra " *" in comment (Ian)
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Make it easier to determine where to add new extensions.
Performed with the vim sort command.
v2: Insert newline after last #define (Matt)
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There are no shaders, so it doesn't even make sense to expose the
extension.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.
All indirects are handled via the pull model. We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data. Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly. We'd need to be careful.
v2: Use updated MOV_INDIRECT opcode.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Abdiel Janulgue <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader. Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.
Reviewed-by: Rob Clark <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Failing to call nir_metadata_preserve() can have nasty consequences:
some pass breaks dominance information, but leaves it marked as valid,
causing some subsequent pass to go haywire and probably crash.
This pass adds a simple validation mechanism to ensure passes handle
this properly. We add a new bogus metadata flag that isn't used for
anything in particular, set it before each pass, and ensure it *isn't*
still set after the pass. nir_metadata_preserve will reset the flag,
so correct passes will work, and bad passes will assert fail.
(I would have made these functions static inline, but nir.h is included
in C++, so we can't bit-or enums without lots of casting...)
Thanks to Dylan Baker for the idea.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OPT() is the normal macro for passes that return booleans, while OPT_V()
is a variant that works for passes that don't properly report progress.
(Such passes should be fixed to return a boolean, eventually.)
These macros take care of calling nir_validate_shader() and setting
progress appropriately. In the future, it would be easy to add shader
dumping similar to INTEL_DEBUG=optimizer by extending the macro.
v2 (Jason Ekstrand):
- Fix an unused variable warning
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
3DSTATE_TE has partitioning, output topology, and domain fields,
each of which has several enumerated values. We'll also need to
switch on the domain, so enums (rather than #defines) seem like a
natural fit.
I chose to put these in brw_compiler.h because they'll be stored
in struct brw_tes_prog_data, which will live there.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "10.6 11.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
prevents fast clears for MSRTs on SKL altogether so that when fast
clears are eventually enabled it will only be for single-sampled
surfaces. The assumption is that clears are probably more likely to be
used in single-sampled applications anyway so we can at least get them
working and we can enable MSRTs later once we understand the problem
better.
This patch should have no functional effect other than perhaps
receiving fewer perf_debug messages on SKL+.
v2: Improve the commit message to avoid saying the patch disables fast
clears because it will be merged before fast clears are enabled
for any surfaces so it doesn't actually disable anything.
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
PIPE_CONTOL!!!
|
|
|
|
|
|
|
|
|
|
| |
This helps address a coverity warning and prevents future questions about this
code.
Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We basically just need to uncomment Ben's code.
v2: Fix obvious bugs caught by Ben.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though both tessellation shader stages must be used together, I
still think it makes sense to add separate debug flags for each stage.
It makes it possible to read the TCS/HS, rule out problems, then read
the TES/DS separately, without sifting through as much printed text.
I decided to add both the GL names (tcs/tes) and hardware names (hs/ds)
so they can be used interchangeably.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|