| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These go in wm_prog_key so they're part of the compiler interface.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the on-disk shader cache we want to be able to differentiate
between a program that was linked and one that was loaded from cache.
V2:
- don't return the new enum directly to the application when queried,
instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome
corruptions when using cache.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We had five copies of the same "walk the cache and look for an
existing shader variant for this program" code. Now we have one
helper function that returns the key.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the "Gather4 R32G32_FLOAT Bug" internal documentation
page, the R32G32_UINT and R32G32_SINT formats are affected by the
same bug as R32G32_FLOAT. Applying the same workarounds should be
viable - apparently the R32G32_FLOAT_LD format shouldn't corrupt
integer data which is NaN or other sketchy floating point values.
One irritating caveat is that, because it's a FLOAT format, the
alpha channel or any set to SCS_ONE return 0x3f8 (1.0) rather than
integer 1. So we need shader code to whack those channels to 1.
Fixes GL45-CTS.texture_gather.plain-gather-int-cube-rg on Haswell.
v2: Fix swizzle component zeroing (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
We no longer need it.
While we are at it we mark the vs, gs, and wm codegen functions as static.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
We can now just get the data needed from the gl_shader_program_data
pointer in gl_program.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
brw_assign_common_binding_table_offsets()
We now get everything we need directly from gl_program so there is
no need for this.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather then passing gl_shader_program.
The only field use was Name which is the same as the Id field in
gl_program.
For wm and vs we also make the functions static and move them before
the codegen functions.
This change reduces the codegen functions dependency on gl_shader_program.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
gl_program
This removes another dependency on gl_shader_program in the codegen
functions.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This removes another dependency on gl_shader_program in the codegen
functions which will help allow us to use gl_program in the
CurrentProgram array rather than gl_shader_program.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This allows us to delete brw_shader and removes the last use of
gl_linked_shader in the codegen paths.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
gl_shader_program
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no point in performing depth writes when the depth test
comparison function is set to GL_EQUAL - it would just write out
the same value that's already there (if it is written at all). While
this is harmless from a functional perspective, it hurts performance.
Obviously, writing to memory is not free, but there's another more
subtle impact as well: it can prevent early depth optimizations.
Depth writes aren't supposed to happen for pixels that are killed
by fragment shader discard statements or the alpha test. So, with
depth writes enabled and either of those, the pixel shader must be
invoked to determine whether or not to perform the write. This is
fairly stupid in the EQUAL case - we're running a shader to decide
whether to replace the existing depth value with itself.
By disabling these pointless writes, we allow early depth even with
discards and alpha testing, allowing the hardware to skip the pixel
shader altogether if the depth test fails.
Improves performance of Unigine Valley:
- Skylake GT2: +17.8%
- Broadwell GT3e: +11.5%
- Cherrytrail: +19.4%
Huge thanks to Mark Janes for building frameretrace [1], the performance
analysis tool that helped us find this issue, and to Robert Bragg for
providing us performance metrics on Linux. Mark also spent the time to
analyze Valley performance on Windows vs. Linux and discovered a
discrepancy in early depth test metrics. Once he had isolated a draw
call and drawn attention to the problem, fixing it was pretty simple.
[1] https://github.com/janesma/apitrace/wiki/frameretrace-branch
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This is a step towards freeing gl_linked_shader after linking.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we started releasing GLSL IR after linking the only time we can
print GLSL IR is during linking. When regenerating variants only NIR
will be available.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Here we move OriginUpperLeft and PixelCenterInteger into gl_program
all other fields have been replace by shader_info.
V2: Don't use anonymous union/structs to hold vertex/fragment fields
suggested by Ian.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier
array in gl_fragment_program which will allow us to remove it.
This change also makes the code which is only used by gen4/5 more self contained
as it now has its own gen5_fragment_program struct rather than storing the map
in brw_context. This means the interpolation map will only get processed once
and will get stored in the in memory cache rather than being processed everytime
the fs changes.
Also by calling this from the fs compile code rather than from the upload code
and using the interpolation assigned there we can get rid of the
BRW_NEW_INTERPOLATION_MAP flag.
It might not seem ideal to add a gen5_fragment_program struct however by the end
of this series we will have gotten rid of all the brw_{shader_stage}_program
structs and replaced them with a generic brw_program struct so there will only
be two program structs which is better than what we have now.
V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following
patch to fix build error.
V3 - Suggestions by Jason:
- name struct gen4_fragment_program rather than gen5_fragment_program
- don't use enum with memset()
- create interp mode set helper and simplify logic to call it
- add assert when calling function to show prog will never be NULL for
gen4/5 i.e. no Vulkan
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.
There are other advantages such as being able to reuse the
shader info populated by GLSL IR.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->wm.base.prog_data = &brw->wm.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_wm_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
|
|
|
|
|
|
|
|
| |
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).
This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
"intelScreen" is wordy and also doesn't fit our style guidelines.
"screen" is shorter, which is nice, because we use it fairly often.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.
The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use. The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:
MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch
(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and lead to
assertion failures).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This would cause gl_FragStencilRef to be counted as a color output
incorrectly during the precompile phase, which leads to unnecessary
recompilation on master and could trigger an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.
v2: Handle BO allocation manually instead of relying on
brw_get_scratch_bo (Ken).
Cc: <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears we were over-allocating these arrays.
Previously we would use nir->num_uniforms directly for scalar
programs, and multiply it by 4 for vec4 programs.
Instead we should have been dividing by 4 in both cases to convert
from bytes to a gl_constant_value count. The size of gl_constant_value
is 4 bytes.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.
This patch actually eliminates the associated recompiles. The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling. Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.
Reviewed-by: Francisco Jerez <[email protected]>
Tested-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes 4 dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_offset.no_qualifiers.default_framebuffer
- interpolate_at_offset.centroid_qualifier.default_framebuffer
- interpolate_at_offset.sample_qualifier.default_framebuffer
- interpolate_at_offset.array_element.default_framebuffer
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data. The previous approach had three different
key bits that had slightly different and hard-to-decern meanings while the
new bits are far more clear. This commit changes it to two easily
understood bits that communicate everything we need:
1) key->persample_interp: means that the user has requested persample
interpolation through the API. This is equivalent to having
SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
enough that you actually get multiple per-sample invocations.
2) key->multisample_fbo: means that the shader will be running on an
actual multi-sampled framebuffer.
This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode. This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.
In theory, this shuffle could mean more recompiles. However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch. This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.
The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value. So we
can just eliminate the check and generalize the field.
v2: Also update the Vulkan driver.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This was only used by the old gl_SampleID calculations. The new code
doesn't need to handle 2x specially.
v2: Delete it from the Vulkan driver, too.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|