aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_wm.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Move the back-end compiler to src/intel/compilerJason Ekstrand2017-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Reduce cross-pollination between the DRI driver and compilerJason Ekstrand2017-03-011-1/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move some gen4 WM defines to brw_compiler.hJason Ekstrand2017-03-011-16/+16
| | | | | | | | These go in wm_prog_key so they're part of the compiler interface. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* st/mesa/i965: create link status enumTimothy Arceri2017-02-091-1/+1
| | | | | | | | | | | | For the on-disk shader cache we want to be able to differentiate between a program that was linked and one that was loaded from cache. V2: - don't return the new enum directly to the application when queried, instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome corruptions when using cache. Reviewed-by: Anuj Phogat <[email protected]>
* i965: Make a helper for finding an existing shader variant.Kenneth Graunke2017-01-171-17/+5
| | | | | | | | | We had five copies of the same "walk the cache and look for an existing shader variant for this program" code. Now we have one helper function that returns the key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Fix textureGather with RG32I/UI on Gen7.Kenneth Graunke2017-01-131-7/+34
| | | | | | | | | | | | | | | | | | | According to the "Gather4 R32G32_FLOAT Bug" internal documentation page, the R32G32_UINT and R32G32_SINT formats are affected by the same bug as R32G32_FLOAT. Applying the same workarounds should be viable - apparently the R32G32_FLOAT_LD format shouldn't corrupt integer data which is NaN or other sketchy floating point values. One irritating caveat is that, because it's a FLOAT format, the alpha channel or any set to SCS_ONE return 0x3f8 (1.0) rather than integer 1. So we need shader code to whack those channels to 1. Fixes GL45-CTS.texture_gather.plain-gather-int-cube-rg on Haswell. v2: Fix swizzle component zeroing (caught by Jordan Justen). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: stop passing gl_shader_program to the precompile and codegen functionsTimothy Arceri2017-01-061-9/+4
| | | | | | | | We no longer need it. While we are at it we mark the vs, gs, and wm codegen functions as static. Reviewed-by: Eric Anholt <[email protected]>
* i965: make use of new is_arb_asm flagTimothy Arceri2017-01-061-7/+6
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: stop passing gl_shader_program to brw_nir_setup_glsl_uniforms()Timothy Arceri2017-01-061-1/+1
| | | | | | | We can now just get the data needed from the gl_shader_program_data pointer in gl_program. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: stop passing gl_shader_program to ↵Timothy Arceri2017-01-061-5/+2
| | | | | | | | | brw_assign_common_binding_table_offsets() We now get everything we need directly from gl_program so there is no need for this. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: pass gl_program to the brw_*_debug_recompile() functionsTimothy Arceri2017-01-061-63/+62
| | | | | | | | | | | | | | Rather then passing gl_shader_program. The only field use was Name which is the same as the Id field in gl_program. For wm and vs we also make the functions static and move them before the codegen functions. This change reduces the codegen functions dependency on gl_shader_program. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: get InfoLog and LinkStatus via the shader program data pointer in ↵Timothy Arceri2017-01-031-2/+2
| | | | | | | | | gl_program This removes another dependency on gl_shader_program in the codegen functions. Reviewed-by: Eric Anholt <[email protected]>
* i965: update brw_get_shader_time_index() not to take gl_shader_programTimothy Arceri2017-01-031-2/+5
| | | | | | | | This removes another dependency on gl_shader_program in the codegen functions which will help allow us to use gl_program in the CurrentProgram array rather than gl_shader_program. Reviewed-by: Eric Anholt <[email protected]>
* i965: move compiled_once flag to brw_programTimothy Arceri2016-12-301-7/+3
| | | | | | | This allows us to delete brw_shader and removes the last use of gl_linked_shader in the codegen paths. Reviewed-by: Eric Anholt <[email protected]>
* st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in ↵Timothy Arceri2016-11-191-2/+2
| | | | | | gl_shader_program Reviewed-by: Emil Velikov <[email protected]>
* i965: Disable depth writes when depth test is GL_EQUAL.Kenneth Graunke2016-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point in performing depth writes when the depth test comparison function is set to GL_EQUAL - it would just write out the same value that's already there (if it is written at all). While this is harmless from a functional perspective, it hurts performance. Obviously, writing to memory is not free, but there's another more subtle impact as well: it can prevent early depth optimizations. Depth writes aren't supposed to happen for pixels that are killed by fragment shader discard statements or the alpha test. So, with depth writes enabled and either of those, the pixel shader must be invoked to determine whether or not to perform the write. This is fairly stupid in the EQUAL case - we're running a shader to decide whether to replace the existing depth value with itself. By disabling these pointless writes, we allow early depth even with discards and alpha testing, allowing the hardware to skip the pixel shader altogether if the depth test fails. Improves performance of Unigine Valley: - Skylake GT2: +17.8% - Broadwell GT3e: +11.5% - Cherrytrail: +19.4% Huge thanks to Mark Janes for building frameretrace [1], the performance analysis tool that helped us find this issue, and to Robert Bragg for providing us performance metrics on Linux. Mark also spent the time to analyze Valley performance on Windows vs. Linux and discovered a discrepancy in early depth test metrics. Once he had isolated a draw call and drawn attention to the problem, fixing it was pretty simple. [1] https://github.com/janesma/apitrace/wiki/frameretrace-branch Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: get num_images from shader_info rather than gl_linked_shaderTimothy Arceri2016-11-171-2/+1
| | | | | | This is a step towards freeing gl_linked_shader after linking. Reviewed-by: Emil Velikov <[email protected]>
* i965: only try print GLSL IR once when using INTEL_DEBUG to dump irTimothy Arceri2016-11-171-4/+4
| | | | | | | | Since we started releasing GLSL IR after linking the only time we can print GLSL IR is during linking. When regenerating variants only NIR will be available. Reviewed-by: Emil Velikov <[email protected]>
* i965: Fix alpha-to-coverage and alpha test enabled checksAnuj Phogat2016-11-071-2/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: replace brw_fragment_program with brw_programTimothy Arceri2016-10-261-6/+4
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* st/mesa/r200/i915/i965: eliminate gl_fragment_programTimothy Arceri2016-10-261-13/+12
| | | | | | | | | | Here we move OriginUpperLeft and PixelCenterInteger into gl_program all other fields have been replace by shader_info. V2: Don't use anonymous union/structs to hold vertex/fragment fields suggested by Ian. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/mesa/st/swrast: set fs shader_info directly and switch to using itTimothy Arceri2016-10-261-13/+9
| | | | | | | Note we access shader_info from the program struct rather than the nir_shader pointer because shader cache won't create a nir_shader. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: rewrite brw_setup_vue_interpolation()Timothy Arceri2016-10-261-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier array in gl_fragment_program which will allow us to remove it. This change also makes the code which is only used by gen4/5 more self contained as it now has its own gen5_fragment_program struct rather than storing the map in brw_context. This means the interpolation map will only get processed once and will get stored in the in memory cache rather than being processed everytime the fs changes. Also by calling this from the fs compile code rather than from the upload code and using the interpolation assigned there we can get rid of the BRW_NEW_INTERPOLATION_MAP flag. It might not seem ideal to add a gen5_fragment_program struct however by the end of this series we will have gotten rid of all the brw_{shader_stage}_program structs and replaced them with a generic brw_program struct so there will only be two program structs which is better than what we have now. V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following patch to fix build error. V3 - Suggestions by Jason: - name struct gen4_fragment_program rather than gen5_fragment_program - don't use enum with memset() - create interp mode set helper and simplify logic to call it - add assert when calling function to show prog will never be NULL for gen4/5 i.e. no Vulkan Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/i965/anv/radv/gallium: make shader info a pointerTimothy Arceri2016-10-261-10/+10
| | | | | | | | | | When restoring something from shader cache we won't have and don't want to create a nir_shader this change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: get inputs read from nir infoTimothy Arceri2016-10-061-5/+9
| | | | | | | | | | This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: get outputs written from nir infoTimothy Arceri2016-10-061-3/+7
| | | | | | | | | | This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: remove remaining tabs in brw_wm.cTimothy Arceri2016-10-061-44/+44
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: get uses discard from nir infoTimothy Arceri2016-10-061-2/+4
| | | | | | | | | | This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: get uses texture gather from nir infoTimothy Arceri2016-10-061-2/+3
| | | | | | | | | | This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Eliminate brw->wm.prog_data pointer.Kenneth Graunke2016-10-051-5/+5
| | | | | | | | | | | | Just say no to: - brw->wm.base.prog_data = &brw->wm.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_wm_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: make vs and fs key generation helpers available to shader cacheCarl Worth2016-09-271-1/+1
| | | | | Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-2/+3
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/i965: make gen_device_info mutableLionel Landwerlin2016-09-231-1/+1
| | | | | | | | | | | | Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change, it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename intelScreen to screen.Kenneth Graunke2016-09-201-2/+2
| | | | | | | | "intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-1/+1
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Allocate space in the binding table for non-coherent FB fetch.Francisco Jerez2016-08-251-3/+10
| | | | | | | | | | | | | | Unfortunately due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use. The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.Francisco Jerez2016-08-251-0/+6
| | | | | | | | | | | | | | | | | Some of the following changes in this series are specific to the non-coherent path, so I need some way to tell whether the coherent or non-coherent path is in use. The flag defaults to the value of the gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be overridden easily on hardware that supports both framebuffer fetch extensions in order to test the non-coherent path, like: MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch (Of course trying to force-enable the coherent framebuffer fetch extension on hardware without native support won't work and lead to assertion failures). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Don't consider the stencil output to be a color output.Francisco Jerez2016-08-241-1/+2
| | | | | | | | | This would cause gl_FragStencilRef to be counted as a color output incorrectly during the precompile phase, which leads to unnecessary recompilation on master and could trigger an assertion failure in fs_visitor::emit_fb_writes() on my i965-fb-fetch branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: make more effective use of SamplersUsedTimothy Arceri2016-07-051-7/+5
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Keep track of the per-thread scratch allocation in brw_stage_state.Francisco Jerez2016-06-131-4/+3
| | | | | | | | | | | | | | | | This will be used to find out what per-thread slot size a previously allocated scratch BO was used with in order to fix a hardware race condition without introducing additional stalls or memory allocations. Instead of calling brw_get_scratch_bo() manually from the various codegen functions, call a new helper function that keeps track of the per-thread scratch size and conditionally allocates a larger scratch BO. v2: Handle BO allocation manually instead of relying on brw_get_scratch_bo (Ken). Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Shrink stage_prog_data param array lengthJordan Justen2016-05-291-1/+1
| | | | | | | | | | | | | | It appears we were over-allocating these arrays. Previously we would use nir->num_uniforms directly for scalar programs, and multiply it by 4 for vec4 programs. Instead we should have been dividing by 4 in both cases to convert from bytes to a gl_constant_value count. The size of gl_constant_value is 4 bytes. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Invoke lowering pass for YUV texturesKristian Høgsberg Kristensen2016-05-241-0/+28
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965: Delete brw_wm_prog_key::render_to_fbo and drawable_height.Kenneth Graunke2016-05-201-45/+0
| | | | | | | | | | | | Now that we handle flipping and other gl_FragCoord transformations via a uniform, these key fields have no users. This patch actually eliminates the associated recompiles. The Tomb Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable number. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add an allow_spilling flag to brw_compile_fsJason Ekstrand2016-05-171-1/+2
| | | | | | | | | This allows us to disable spilling for blorp shaders since blorp state setup doesn't handle spilling. Without this, blorp fails hard if you run with INTEL_DEBUG=spill. Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Francisco Jerez <[email protected]>
* i965: Mark brw const in brw_state_dirty and callers.Kenneth Graunke2016-05-161-1/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Flip interpolateAtOffset's y offset when necessary.Kenneth Graunke2016-05-151-1/+2
| | | | | | | | | | | Fixes 4 dEQP-GLES31.functional.shaders.multisample_interpolation tests: - interpolate_at_offset.no_qualifiers.default_framebuffer - interpolate_at_offset.centroid_qualifier.default_framebuffer - interpolate_at_offset.sample_qualifier.default_framebuffer - interpolate_at_offset.array_element.default_framebuffer Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Rework the persample shading key/prog_data bitsJason Ekstrand2016-05-141-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | This commit reworks and simplifies the way we handle persample shading in the shader key and prog_data. The previous approach had three different key bits that had slightly different and hard-to-decern meanings while the new bits are far more clear. This commit changes it to two easily understood bits that communicate everything we need: 1) key->persample_interp: means that the user has requested persample interpolation through the API. This is equivalent to having SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high enough that you actually get multiple per-sample invocations. 2) key->multisample_fbo: means that the shader will be running on an actual multi-sampled framebuffer. This commit also adds a new "persample_dispatch" bit to prog_data which indicates that the shader should be run in persample mode. This way the state setup code doesn't have to look at the fragment program or GL state and can just pull that data out of the prog_data. In theory, this shuffle could mean more recompiles. However, in practice, we were shoving enough state into the key before that we were probably hitting a recompile on every per-sample shader anyway. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/wm: Don't sample lossless compressed as multisampledTopi Pohjolainen2016-05-121-1/+5
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.Kenneth Graunke2016-04-201-5/+4
| | | | | | | | | | | | | | | | I'm going to need a key entry meaning "we have a multisample FBO, and multisampling is enabled" in an upcoming patch. This is basically wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID system value is read. The only use of wm_key->compute_sample_id is in emit_sampleid_setup(), which is only called when handling the SAMPLE_ID system value. So we can just eliminate the check and generalize the field. v2: Also update the Vulkan driver. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Delete now dead persample_2x FS program key flag.Kenneth Graunke2016-04-201-4/+0
| | | | | | | | | | This was only used by the old gl_SampleID calculations. The new code doesn't need to handle 2x specially. v2: Delete it from the Vulkan driver, too. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>