| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL's dual color blending feature was specified so that an
implementation could support both multiple render targets (MRT) and
dual source blending. Fragment shader outputs specify both "location"
(the render target number) and "index" (either color 0 or 1).
I believe DirectX only has the notion of "location" - if using dual
color blending, location 0 or 1 will specify the operands. If not,
then location means the render target index. The two features can't
be used together.
As such, some applications mistakenly try to use <loc = 0, index = 0>
and <loc = 1, index = 0> in a shader used for dual color blending with
a single render target, rather than the correct <loc = 0, index = 0>
and <loc = 0, index = 1>.
In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug.
Unigine is aware of the problem, and quickly developed a fix, but has
not bothered to change the download link on their website to a working
copy in over a year. People were still using the broken version and
complaining. We tried working around this by disabling dual color
blending, but that apparently hurts performance, and people were once
again unhappy.
On i965, dual source blending is achieved by using different framebuffer
write messages than normal rendering. So, we have to compile different
code for the two cases. We're not being pedantic: we actually have to
know in order to function.
Normally, dual source blending is detectable in the shader: if a shader
has an output with index = 1, then it's meant for blending, not MRT.
With the broken inputs, they're indistinguishable, so we can only tell
by looking at the current GL state.
This patch implements a new drirc workaround:
export dual_color_blend_by_location=true
which makes the i965 driver detect when OpenGL state is configured for
dual source blending, and recompile the fragment shader to use the right
messages. In that case, we allow either location = 1 or index = 1 to
specify the second source for the blending equations.
It also re-enables GL_ARB_blend_func_extended for Unigine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This appeared in brw_vs.c and brw_wm.c, should have appeared in
brw_gs.c, and was soon going to have to be in brw_tcs.c and brw_tes.c as
well.
So, instead, move it to a central location (which has to know about both
struct brw_context and perf_debug()).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
We were including it in headers, which then caused it to be included in
tons of places it wasn't needed.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.
Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.
Acked-by: Ben Widawsky <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
This commit moves the common/modern stuff. Some legacy stuff such as
setting use_alt_mode was left because it needs to know whether or not we're
an ARB program.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the VS and FS stages that use ARB_vertex_program or
ARB_fragment_program we don't have a shader program, however,
when debuging is enabled, we call brw_dump_ir like this:
brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base);
where vs will be NULL (since prog is NULL).
As pointed out by Chris, this &vs->base is not really a dereference,
it simply computes a new address that just happens to be 0x0 because
the offset of base in brw_shader is 0. Then brw_dump_ir will see a
NULL pointer and not do anything. This is why this does not crash at
the moment. However, this does not look very safe (it would crash
for any location of base that is not the first in brw_shader), so
patch it to prevent a potential (even if unlikely) problem in the
future.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks the Jasons recent
refactoring, we can just get the index and pass to the emit functions
instead.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We move these calls one level up into the codegen functions.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Now that everything comes in through NIR, we can pick this directly out of
the shader source and don't need to reference the gl_fragment_program.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor. This makes uniform setup more of a
language/API thing than a backend compiler thing. This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data. This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had a bunch of code in each stage to figure out how many
slots we needed in stage_prog_data.param. This code was mostly identical
across the stages and had been copied and pasted around. Unfortunately,
this meant that any time you did something special, you had to add code for
it to each of these places. In particular, none of the stages took
subroutines into account; they were working entirely by accident. By
taking this data from the NIR shader, we know the exact number of entries
we need and everything goes a bit smoother.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
They are no longer used.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
load/store.
v2: Store early fragment test mode in brw_wm_prog_data instead of
getting it from core mesa data structures (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
program.
v2: Add CS support.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used to pass image meta-data to the shader when we cannot
use typed surface reads and writes. All entries except surface_idx
and size are otherwise unused and will get eliminated by the uniform
packing pass. size will be used for bounds checking with some image
formats and will be useful for ARB_shader_image_size too. surface_idx
is always used.
v2: Add CS support. Move the image_params array back to
brw_stage_prog_data.
v3: Improve documentation.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change references to gl_framebuffer::Width, Height, MaxNumLayers
and Visual::samples to use the _mesa_geometry_ convenience functions
for those places where the geometry of the gl_framebuffer is needed
(in contrast to the geometry of the intersection of the attachments
of the gl_framebuffer).
This patch is to pave the way to enable GL_ARB_framebuffer_no_attachments
on Gen7 and higher in i965.
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Kevin Rogovin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This makes INTEL_DEBUG=perf report shader recompiles due to CMS vs.
UMS/IMS differences and Sandybridge textureGather workarounds.
Previously, we just flagged them as "Something else".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This name better matches what it's actually used for. The patch was
generated with the following command:
for file in *; do
sed -i -e s/brw_compile/brw_codegen/g $file
done
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for these functions to be called from other
files.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dirty-bit checking from each brw_upload_<stage>_prog function is
split out into its a new brw_<stage>_state_dirty function.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Across the board of the various generations, the intial few atoms in
all of the atom lists are basically the same, (performing uploads for
the various programs). The only difference is that prior to gen6
there's an ff_gs upload in place of the later gs upload.
In this commit, instead of using the atom lists for this program state
upload, we add a new function brw_upload_programs that calls into the
per-stage upload functions which in turn check dirty bits and return
immediately if nothing needs to be done.
This commit is intended to have no functional change. The motivation
is that future code, (such as the shader cache), wants to have a
single function within which to perform various operations before and
after program upload, (with some local variables holding state across
the upload).
It may be worth looking at whether some of the other functionality
currently handled via atoms might also be more cleanly handled in a
similar fashion.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB
programs so that 0^0 == 1. The choice is based entirely on the shader
source language.
Previously, our code to determine which mode we wanted was duplicated
in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8).
The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing
as well - we use CurrentProgram (non-derived state), but _Shader
(derived state). It also relies on knowing that ARB programs don't
use gl_shader_program structures today. The compiler already makes
this assumption in a few places, but I'd rather keep that assumption
out of the state upload code.
With this patch, we select the mode at compile time, and store that
choice in prog_data. The state upload code simply uses that decision.
This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "Pixel Shader Computed Depth Mode" value is entirely based on the
shader program, so we can easily do it at compile time. This avoids the
if+switch on every 3DSTATE_WM (Gen7)/3DSTATE_PS_EXTRA (Gen8+) upload,
and shares a bit more code.
This also simplifies the PMA stall code, making it match the formula
more closely, and drops a BRW_NEW_FRAGMENT_PROGRAM dependency. (Note
that the previous comment was wrong - the code and the documentation
have != PSCDEPTH_OFF, not ==.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives. A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.
Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism. The driconf option seems superfluous.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
BRW_CACHE_VS_PROG is more easily associated with program caches than
plain BRW_VS_PROG.
While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away
from the outdated Windowizer/Masker name.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.
This saves us from having to look an extra key value in a couple of
places, including in the generator.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.
This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.
There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a long time, we've wanted a place to put utility code which isn't
directly tied to Mesa or Gallium internals. This patch creates a new
src/util directory for exactly that purpose, and builds the contents as
libmesautil.la.
ralloc seemed like a good first candidate. These days, it's directly
used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl
didn't make much sense.
Signed-off-by: Kenneth Graunke <[email protected]>
v2 (Jason Ekstrand): More realloc uses and some scons fixes
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.
This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa
Signed-off-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise, the performance warning for shader recompiles will just say
"something else".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new hardware actually supports this OpenGL 1.x feature natively,
so we can finally drop our shader workarounds.
Not many applications use GL_CLAMP, and most use it unintentionally, but
it's trivial to do right, so we should.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We already have a perfectly good copy of the program key, and nobody is
going to modify it. The only reason we copied it was because the
brw_wm_compile structure embedded the key rather than pointing to it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead, just pass the key and prog_data as separate parameters.
This moves it up a level - one step further toward getting rid of it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
'c' is going away, but we still need a memory context that lives
for the duration of the compile.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well. The caller has no
access to it anyway, so it's purely leaked.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.
We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.
In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm probably not the only person that has tried to kill _ReallyEnabled.
This does the mechanical part of the work, and cleans _ReallyEnabled from
i965.
I think that using _Current makes texture management clearer: You can't
have multiple targets in use in the same texture image unit at the same
time, because there's just that one pointer.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|